Run a RayCluster
This page shows how to leverage Kueue’s scheduling and resource management capabilities when running RayClusters.
This guide is for batch users who have a basic understanding of Kueue. For more information, see Kueue’s overview.
Before you begin
- Make sure you are using Kueue v0.6.0 or newer and KubeRay v1.1.0 or newer.
- Check Administer cluster quotas for details on the initial Kueue setup.
- See KubeRay Installation for installation and configuration details of KubeRay (a quick-install sketch follows this list).
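If neither component is installed yet, the commands below sketch one common installation path; the release versions shown are examples, so substitute the versions you need and follow the linked guides for the authoritative steps:

# Install Kueue (v0.6.0 shown as an example release)
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.6.0/manifests.yaml
# Install the KubeRay operator via Helm (v1.1.0 shown as an example release)
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.1.0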
RayCluster definition
When running RayClusters on Kueue, take into consideration the following aspects:
a. Queue selection
The target local queue should be specified in the metadata.labels
section of the RayCluster configuration.
metadata:
  name: raycluster-sample
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: local-queue
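The label assumes a LocalQueue named local-queue already exists in the default namespace. If it does not, a minimal LocalQueue definition looks like the following sketch, assuming a ClusterQueue named cluster-queue was created during the initial Kueue setup:

apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: local-queue
  namespace: default
spec:
  clusterQueue: cluster-queue  # assumed ClusterQueue name; use the one from your Kueue setup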
b. Configure the resource needs
The resource needs of the workload can be configured in the spec, under the head and worker group pod templates.
headGroupSpec:
  template:
    spec:
      affinity: {}
      containers:
      - env: []
        image: rayproject/ray:2.7.0
        imagePullPolicy: IfNotPresent
        name: ray-head
        resources:
          limits:
            cpu: "1"
            memory: 2G
          requests:
            cpu: "1"
            memory: 2G
        securityContext: {}
        volumeMounts:
        - mountPath: /tmp/ray
          name: log-volume
workerGroupSpecs:
- template:
    spec:
      affinity: {}
      containers:
      - env: []
        image: rayproject/ray:2.7.0
        imagePullPolicy: IfNotPresent
        name: ray-worker
        resources:
          limits:
            cpu: "1"
            memory: 1G
          requests:
            cpu: "1"
            memory: 1G
Note that a RayCluster will hold resource quotas while it exists. For optimal resource management, you should delete a RayCluster that is no longer in use.
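Once a workload is done, releasing the quota is a single command; for the sample cluster in this guide:

kubectl -n default delete raycluster raycluster-sample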
c. Limitations
- Limited worker groups: Because a Kueue workload can have a maximum of 8 PodSets, the maximum number of spec.workerGroupSpecs is 7.
- In-tree autoscaling disabled: Kueue manages resource allocation for the RayCluster; therefore, the cluster’s internal autoscaling mechanisms must be disabled (see the sketch after this list).
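With KubeRay, in-tree autoscaling is controlled by the spec.enableInTreeAutoscaling field; leaving it unset (it defaults to false) or setting it explicitly, as in this minimal sketch, keeps the autoscaler off:

spec:
  enableInTreeAutoscaling: false  # Kueue manages the quota; the Ray autoscaler must stay off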
Example RayCluster
The RayCluster looks like the following:
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-sample
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: local-queue
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: 0.0.0.0
    serviceType: ClusterIP
    template:
      metadata:
        annotations: {}
      spec:
        affinity: {}
        containers:
        - env: []
          image: rayproject/ray:2.7.0
          imagePullPolicy: IfNotPresent
          name: ray-head
          resources:
            limits:
              cpu: "1"
              memory: 2G
            requests:
              cpu: "1"
              memory: 2G
          securityContext: {}
          volumeMounts:
          - mountPath: /tmp/ray
            name: log-volume
        imagePullSecrets: []
        nodeSelector: {}
        tolerations: []
        volumes:
        - emptyDir: {}
          name: log-volume
  workerGroupSpecs:
  - groupName: workergroup
    maxReplicas: 10
    minReplicas: 1
    rayStartParams: {}
    replicas: 4
    template:
      metadata:
        annotations: {}
      spec:
        affinity: {}
        containers:
        - env: []
          image: rayproject/ray:2.7.0
          imagePullPolicy: IfNotPresent
          name: ray-worker
          resources:
            limits:
              cpu: "1"
              memory: 1G
            requests:
              cpu: "1"
              memory: 1G
          securityContext: {}
          volumeMounts:
          - mountPath: /tmp/ray
            name: log-volume
        imagePullSecrets: []
        nodeSelector: {}
        tolerations: []
        volumes:
        - emptyDir: {}
          name: log-volume
You can submit a Ray job using the CLI, or log in to the Ray head and execute a job following this example with a kind cluster.
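As a sketch of the end-to-end flow, assuming the manifest above is saved as raycluster-sample.yaml (the service name raycluster-sample-head-svc follows KubeRay’s <cluster-name>-head-svc convention and is an assumption here):

# Create the RayCluster; Kueue admits it once quota is available
kubectl apply -f raycluster-sample.yaml
# Check that the corresponding Kueue Workload object was admitted
kubectl -n default get workloads
# Forward the Ray dashboard port and submit a job with the Ray CLI
kubectl port-forward svc/raycluster-sample-head-svc 8265:8265
ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"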