Fallback Client

The FallbackClient class provides an interface to the fallback service, which lets users run jobs on a fallback cloud such as RunPod. It uses SkyPilot to manage the cluster and includes methods for creating, managing, and retrieving jobs.

class compute_horde_sdk.v1.fallback.FallbackClient

A fallback client that provides the same API as ComputeHordeClient.

__init__(cloud, idle_minutes=15, **kwargs)
Parameters:
  • cloud (str)

  • idle_minutes (int)

  • kwargs (Any)

Return type:

None

async create_job(job_spec)

Run a fallback job in the SkyPilot cluster. This method does not retry a failed job. Use run_until_complete() if you want failed jobs to be automatically retried.

Parameters:

job_spec (FallbackJobSpec) – Job specification to run.

Returns:

A FallbackJob class instance representing the created job.

Return type:

FallbackJob

async run_until_complete(job_spec, job_attempt_callback=None, timeout=None, max_attempts=3)

Run a fallback job in the SkyPilot cluster until it succeeds. This method calls create_job() repeatedly until the job succeeds, max_attempts is reached, or timeout expires.

Parameters:
  • job_spec (FallbackJobSpec) – Job specification to run.

  • job_attempt_callback (Callable[[FallbackJob], None] | Callable[[FallbackJob], Awaitable[None]] | Callable[[FallbackJob], Coroutine[Any, Any, None]] | None) – A callback function that is called after every attempt to run the job. It is called immediately after an attempt is made to run the job, before waiting for the job to complete. The function must take one argument of type FallbackJob. It can be a regular or an async function.

  • timeout (float | None) – Maximum total number of seconds to wait, across all attempts.

  • max_attempts (int) – Maximum number of times the job will be attempted within timeout seconds. A value of 0 or less means unlimited attempts.

Returns:

A FallbackJob class instance representing the created job. If the job was rerun, it will represent the last attempt.

Return type:

FallbackJob
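
The retry behavior described for run_until_complete() can be illustrated with a small, self-contained sketch. This is not the SDK's implementation: StubJob, its wait() method and succeeded property, and run_until_complete_sketch() are hypothetical stand-ins, and what the real method does when attempts or time run out is an assumption here (the sketch simply returns the last attempt). It shows the documented ordering, where the callback fires right after each attempt is created and before the job is awaited, and how both regular and async callbacks can be supported.

```python
import asyncio
import inspect
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Tuple, Union


@dataclass
class StubJob:
    """Hypothetical stand-in for FallbackJob (not the SDK class)."""

    attempt: int
    will_succeed: bool
    finished: bool = False

    async def wait(self) -> None:
        # Pretend the job runs to completion.
        await asyncio.sleep(0)
        self.finished = True

    @property
    def succeeded(self) -> bool:
        return self.finished and self.will_succeed


AttemptCallback = Callable[[StubJob], Union[None, Awaitable[None]]]


async def run_until_complete_sketch(
    create_job: Callable[[], Awaitable[StubJob]],
    job_attempt_callback: Optional[AttemptCallback] = None,
    timeout: Optional[float] = None,
    max_attempts: int = 3,
) -> StubJob:
    """Retry create_job() until success, the attempt limit, or the deadline."""
    deadline = None if timeout is None else time.monotonic() + timeout
    attempt = 0
    while True:
        attempt += 1
        job = await create_job()
        # The callback runs right after the attempt is created, before waiting.
        if job_attempt_callback is not None:
            result = job_attempt_callback(job)
            if inspect.isawaitable(result):  # supports async callbacks too
                await result
        await job.wait()
        if job.succeeded:
            return job
        # max_attempts <= 0 means unlimited attempts.
        out_of_attempts = max_attempts > 0 and attempt >= max_attempts
        out_of_time = deadline is not None and time.monotonic() >= deadline
        if out_of_attempts or out_of_time:
            return job  # assumption: hand back the last attempt


async def demo() -> Tuple[StubJob, List[int]]:
    outcomes = iter([False, False, True])  # third attempt succeeds
    seen: List[int] = []
    counter = 0

    async def create_job() -> StubJob:
        nonlocal counter
        counter += 1
        return StubJob(attempt=counter, will_succeed=next(outcomes))

    job = await run_until_complete_sketch(
        create_job, job_attempt_callback=lambda j: seen.append(j.attempt)
    )
    return job, seen
```

Calling asyncio.run(demo()) drives three attempts in this sketch. With the real client, the equivalent call would pass a FallbackJobSpec as job_spec rather than a factory function.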

async get_job(job_uuid)

Retrieve information about a job from the SkyPilot cluster.

Parameters:

job_uuid (str) – The UUID of the job to retrieve.

Returns:

A FallbackJob instance representing this job.

Raises:

FallbackNotFoundError – If the job with this UUID does not exist.

Return type:

FallbackJob
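
Because get_job() raises FallbackNotFoundError for an unknown UUID, callers that treat a missing job as a normal outcome may want to wrap the call. The sketch below is illustrative only: the FallbackNotFoundError class defined here is a local stand-in (the SDK's import path is not shown in this reference), StubClient is a hypothetical in-memory replacement for FallbackClient, and find_job() is a made-up helper name.

```python
import asyncio
from typing import Dict, Optional


class FallbackNotFoundError(Exception):
    """Local stand-in for the SDK exception of the same name."""


class StubClient:
    """Hypothetical in-memory replacement for FallbackClient."""

    def __init__(self, jobs: Dict[str, str]) -> None:
        self._jobs = dict(jobs)

    async def get_job(self, job_uuid: str) -> str:
        try:
            return self._jobs[job_uuid]
        except KeyError:
            raise FallbackNotFoundError(job_uuid) from None


async def find_job(client: StubClient, job_uuid: str) -> Optional[str]:
    """Return the job, or None if no job with this UUID exists."""
    try:
        return await client.get_job(job_uuid)
    except FallbackNotFoundError:
        return None


client = StubClient({"1111": "job-a"})
found = asyncio.run(find_job(client, "1111"))
missing = asyncio.run(find_job(client, "9999"))
```

Returning None keeps the "not found" case out of the exception path for callers that poll for jobs; code that considers a missing job a bug should let the exception propagate instead.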

async get_jobs()

Retrieve information about your jobs from the SkyPilot cluster.

Returns:

A list of FallbackJob instances representing your jobs.

Return type:

list[FallbackJob]

async iter_jobs()

Retrieve information about your jobs from the SkyPilot cluster.

Returns:

An async iterator of FallbackJob instances representing your jobs.

Return type:

AsyncIterator[FallbackJob]
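
An async iterator is consumed with async for, which lets a caller stop early instead of materializing the full list as get_jobs() does. The following sketch assumes iter_jobs() behaves like an async generator; StubClient and first_matching() are hypothetical names used only for illustration, and plain strings stand in for FallbackJob instances.

```python
import asyncio
from typing import AsyncIterator, Callable, Iterable, Optional


class StubClient:
    """Hypothetical stand-in for FallbackClient; jobs are plain strings."""

    def __init__(self, jobs: Iterable[str]) -> None:
        self._jobs = list(jobs)

    async def iter_jobs(self) -> AsyncIterator[str]:
        for job in self._jobs:
            await asyncio.sleep(0)  # simulate fetching each job lazily
            yield job


async def first_matching(
    client: StubClient, predicate: Callable[[str], bool]
) -> Optional[str]:
    """Return the first job accepted by predicate, or None if there is none."""
    async for job in client.iter_jobs():
        if predicate(job):
            return job  # stop iterating as soon as a match is found
    return None


client = StubClient(["train-1", "eval-1", "eval-2"])
match = asyncio.run(first_matching(client, lambda j: j.startswith("eval")))
no_match = asyncio.run(first_matching(client, lambda j: j.startswith("deploy")))
```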