Fallback
Usage
Important
In order to use the fallback functionality, please install ComputeHorde SDK with fallback enabled (see Installation).
Important
This package uses ApiVer, so make sure to import from compute_horde_sdk.v1.fallback.
Running Jobs on a Fallback Cloud
If ComputeHorde is not operational for some reason, you can provide logic to run the job on a fallback cloud such as Runpod.
The fallback functionality uses the SkyPilot cluster management utility.
Running on Runpod
To run your job on Runpod if ComputeHorde raises any error:
import asyncio
from os import environ

import bittensor_wallet
from compute_horde_sdk.v1 import ComputeHordeClient, ComputeHordeJobSpec, ExecutorClass
from compute_horde_sdk.v1.fallback import FallbackClient, FallbackJobSpec


async def main():
    # Define your job up front so the spec is also available to the fallback path
    job_spec = ComputeHordeJobSpec(
        executor_class=ExecutorClass.always_on__llm__a6000,
        job_namespace="SN123.0",
        docker_image="my-username/my-image:latest",
    )
    try:
        wallet = bittensor_wallet.Wallet(name="...", hotkey="...")
        compute_horde_client = ComputeHordeClient(
            hotkey=wallet.hotkey,
            compute_horde_validator_hotkey="...",  # usually the ss58_address of the hotkey above
        )
        # Run the job
        job = await compute_horde_client.run_until_complete(job_spec)
    except Exception:
        # Create the fallback client for Runpod
        fallback_client = FallbackClient("runpod", api_key=environ.get("RUNPOD_API_KEY"))
        # Define your fallback job based on the ComputeHorde spec
        fallback_spec = FallbackJobSpec.from_job_spec(job_spec, work_dir="/app", region="US")
        # Run the fallback job
        job = await fallback_client.run_until_complete(fallback_spec)
    print(job.status)  # should be "Completed".


asyncio.run(main())
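The try/except shape of the example can be factored into a small reusable helper. The sketch below is not part of the ComputeHorde SDK: run_with_fallback, run_on_compute_horde and run_on_fallback_cloud are hypothetical names, and the two coroutines are stand-ins for the respective run_until_complete(...) calls so that the snippet runs without any credentials.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_fallback(
    primary: Callable[[], Awaitable[T]],
    fallback: Callable[[], Awaitable[T]],
) -> T:
    """Await primary(); if it raises, report the error and await fallback()."""
    try:
        return await primary()
    except Exception as exc:  # same broad catch as the example above
        print(f"primary failed ({exc!r}); switching to fallback")
        return await fallback()


# Hypothetical stand-in for compute_horde_client.run_until_complete(job_spec).
async def run_on_compute_horde() -> str:
    raise ConnectionError("ComputeHorde unavailable")


# Hypothetical stand-in for fallback_client.run_until_complete(fallback_spec).
async def run_on_fallback_cloud() -> str:
    return "Completed"


def demo() -> str:
    """Run the simulated primary failure and return the fallback's result."""
    return asyncio.run(run_with_fallback(run_on_compute_horde, run_on_fallback_cloud))
```

Passing zero-argument callables rather than already-created coroutines means the fallback coroutine is only created when the primary path actually fails.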
Note
The cloud argument passed to FallbackClient is forwarded to SkyPilot. Valid values include:
"runpod", "aws", "gcp", and others supported by SkyPilot.
We officially support "runpod", but you’re welcome to try other providers.
If you do, let us know how it goes!
See the SkyPilot Cloud Setup Docs for details.
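Because the cloud name is a plain string, it can be chosen at deploy time rather than hard-coded. The following is a minimal sketch of one way to do that; the FALLBACK_CLOUD environment variable and the resolve_fallback_cloud helper are assumptions made for illustration and are not defined by the SDK or by SkyPilot.

```python
import os
import warnings

# Only "runpod" is officially supported, per the note above.
OFFICIALLY_SUPPORTED = {"runpod"}


def resolve_fallback_cloud(default: str = "runpod") -> str:
    """Return the cloud name from FALLBACK_CLOUD (hypothetical variable) or the default."""
    cloud = (os.environ.get("FALLBACK_CLOUD") or default).strip().lower()
    if cloud not in OFFICIALLY_SUPPORTED:
        # Other values are still returned, since they are forwarded to SkyPilot as-is.
        warnings.warn(f"{cloud!r} is forwarded to SkyPilot but is not officially supported")
    return cloud
```

The returned string can then be passed as the first argument to FallbackClient, in place of the literal "runpod" used in the example. Other providers need their own credentials configured as described in the SkyPilot cloud setup documentation.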