Run A RayJob
This page shows how to leverage Kueue’s scheduling and resource management capabilities when running KubeRay’s RayJob.
This guide is for batch users who have a basic understanding of Kueue. For more information, see Kueue’s overview.
Before you begin
- Check Administer cluster quotas for details on the initial Kueue setup.
- See KubeRay Installation for installation and configuration details of KubeRay.
RayJob definition
When running RayJobs on Kueue, take into consideration the following aspects:
a. Queue selection
The target local queue should be specified in the metadata.labels section of the RayJob configuration.
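As a minimal sketch of queue selection: Kueue reads the target queue from the kueue.x-k8s.io/queue-name label. The RayJob name and the LocalQueue name user-queue below are hypothetical placeholders; the LocalQueue is assumed to already exist in the RayJob's namespace.

```yaml
metadata:
  name: ray-job-sample                      # hypothetical RayJob name
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # hypothetical LocalQueue name
```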
b. Configure the resource needs
The resource needs of the workload can be configured in spec.rayClusterSpec.
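A sketch of where the resource requests live, assuming a single head container. The image tag and the CPU and memory values are illustrative assumptions, not recommendations; worker groups declare their requests the same way under spec.rayClusterSpec.workerGroupSpecs.

```yaml
spec:
  rayClusterSpec:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:2.9.0   # illustrative image tag
            resources:
              requests:                   # illustrative values
                cpu: "1"
                memory: 2Gi
```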
c. Limitations
- A Kueue managed RayJob cannot use an existing RayCluster.
- The RayCluster should be deleted at the end of the job execution, so spec.shutdownAfterJobFinishes should be true.
- Because Kueue will reserve resources for the RayCluster, spec.rayClusterSpec.enableInTreeAutoscaling should be false.
- Because a Kueue workload can have a maximum of 8 PodSets, the maximum number of spec.rayClusterSpec.workerGroupSpecs is 7.
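The limitations above map onto the RayJob spec roughly as follows. This is a sketch, not a complete manifest: the group name is hypothetical, and the note about clusterSelector assumes that field is how a RayJob would otherwise point at an existing RayCluster.

```yaml
spec:
  # Assumption: leave clusterSelector unset so the RayJob creates its own RayCluster.
  shutdownAfterJobFinishes: true      # delete the RayCluster when the job ends
  rayClusterSpec:
    enableInTreeAutoscaling: false    # Kueue reserves the resources up front
    workerGroupSpecs:                 # at most 7 entries
    - groupName: small-group          # hypothetical group name
      replicas: 1
      rayStartParams: {}
```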
Example RayJob
In this example, the code is provided to the Ray framework via a ConfigMap.
The RayJob looks like the following:
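The original manifest is not reproduced here. The following is a minimal sketch of what such a setup could look like, not the page's own example. All names (ray-job-code-sample, ray-job-sample, user-queue, small-group), the image tag, the mount path, the resource values, and the sample Python code are assumptions. The apiVersion ray.io/v1 assumes a KubeRay release that serves that version.

```yaml
# Hypothetical ConfigMap holding the Python code that the RayJob runs.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-job-code-sample
data:
  sample_code.py: |
    import ray

    ray.init()

    @ray.remote
    def square(x):
        return x * x

    print(ray.get([square.remote(i) for i in range(4)]))
---
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: ray-job-sample
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # hypothetical LocalQueue
spec:
  entrypoint: python /home/ray/samples/sample_code.py
  shutdownAfterJobFinishes: true            # required when managed by Kueue
  rayClusterSpec:
    rayVersion: "2.9.0"
    enableInTreeAutoscaling: false          # required when managed by Kueue
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            resources:
              requests:
                cpu: "1"
                memory: 2Gi
            volumeMounts:
            - name: code-sample
              mountPath: /home/ray/samples
          volumes:
          - name: code-sample               # mounts the ConfigMap as files
            configMap:
              name: ray-job-code-sample
    workerGroupSpecs:
    - groupName: small-group
      replicas: 1
      minReplicas: 1
      maxReplicas: 1
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray:2.9.0
            resources:
              requests:
                cpu: "1"
                memory: 1Gi
```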
You can run this RayJob with the following commands: