Ray

An open-source distributed computing framework for scaling compute-intensive workloads in Python.

Created: by Pradeep Gowda Updated: Mar 30, 2024 Tagged: ray · python · distributed

Moritz, Philipp, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, et al. “Ray: A Distributed Framework for Emerging AI Applications,” 2018. https://arxiv.org/abs/1712.05889.

Ray unifies task-parallel and actor programming models in a single dynamic task graph and employs a scalable architecture enabled by the global control store and a bottom-up distributed scheduler. The programming flexibility, high throughput, and low latencies simultaneously achieved by this architecture is particularly important for emerging artificial intelligence workloads, which produce tasks diverse in their resource requirements, duration, and functionality.
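Both models are exposed through the same @ray.remote decorator: a function becomes a stateless task, a class becomes a stateful actor. A minimal sketch of the two side by side (square and Counter are illustrative names, not from the paper):

import ray

ray.init()

# Task-parallel model: a stateless function becomes a remote task.
@ray.remote
def square(x):
    return x * x

# Actor model: a class becomes a stateful worker process.
@ray.remote
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

# .remote() returns futures immediately; ray.get() blocks for the results.
print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]

counter = Counter.remote()
print(ray.get(counter.increment.remote()))  # 1

Tasks and actor method calls both return object references, which is what lets Ray place them in the same dynamic task graph.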

Learning Ray by Max Pumperla, Edward Oakes, and Richard Liaw (PDF).

In short, Ray sets up and manages clusters of computers so that you can run distributed tasks on them. A Ray cluster consists of nodes connected to one another over a network. You program against the so-called driver, the program root, which lives on the head node. The driver can run jobs, that is, collections of tasks, which are executed on the nodes of the cluster. Specifically, the individual tasks of a job run in worker processes on worker nodes.

This cluster can also be local, on your laptop. The default number of worker processes is the number of CPUs available on your machine.

GitHub: https://github.com/ray-project/ray

$ pip install "ray[rllib,serve,tune]==2.10.0"

import ray
ray.init()  # start a local cluster, or connect to an existing one
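To make the driver/worker split from above concrete, here is a small sketch (where_am_i is an illustrative name): each task executes in one of the worker processes, so the reported PIDs differ from the driver's.

import os
import ray

ray.init()  # local cluster; one worker process per CPU by default

@ray.remote
def where_am_i():
    # runs inside a worker process, not in the driver
    return os.getpid()

# fan eight tasks out across the pool of worker processes
pids = ray.get([where_am_i.remote() for _ in range(8)])
print(set(pids))                # several distinct worker PIDs
print(ray.cluster_resources())  # e.g. {'CPU': 8.0, ...}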

Libraries:

  • data processing – Ray Data (built on Apache Arrow)
  • model training – Ray Train (PyTorch, TensorFlow)
  • hyperparameter tuning – Ray Tune (see the sketch after this list)
  • model serving – Ray Serve
  • reinforcement learning – RLlib
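As a taste of Ray Tune, a minimal sketch (the objective and search space are toy examples): define a trainable that reports a metric, declare a search space, and let the Tuner schedule trials as parallel tasks on the cluster.

from ray import tune

def objective(config):
    # toy objective: minimize (x - 3)^2
    return {"score": (config["x"] - 3) ** 2}

tuner = tune.Tuner(
    objective,
    param_space={"x": tune.uniform(-10, 10)},
    tune_config=tune.TuneConfig(metric="score", mode="min", num_samples=20),
)
results = tuner.fit()
print(results.get_best_result().config)  # a config with x close to 3

Because each trial runs as a Ray task, the same script scales from a laptop to a multi-node cluster without code changes.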