Runtime environment setup

You can reproduce this deployment guide on localhost; nevertheless, it is intended as a blueprint for production. All environment variables, package sources, dependencies, and other settings are declared in YAML configuration files, which is the preferred approach to production settings with Ray. See the Ray Production Guide.

As a consequence, much of this guide is about managing YAML files. Kodosumi simplifies the management of these configuration files by splitting the configuration into one base configuration and multiple app configurations.

This deployment is based on the company_news service built in the development workflow.

Let’s start by creating a root directory to host our runtime environment:

mkdir ~/kodosumi
cd ~/kodosumi

Create a Python Virtual Environment with

python -m venv .venv  # depends on OS setup
source .venv/bin/activate

[!NOTE] The location of your system’s Python executable python might vary.

Next, install Kodosumi from the Python package index

pip install kodosumi

If you prefer to install the latest development trunk instead, run

pip install "kodosumi @ git+https://github.com/masumi-network/kodosumi.git@dev"

Start your Ray cluster with

ray start --head

In the previous examples you ran koco start, which launches the Kodosumi spooler daemon plus the Kodosumi admin panel web app and API. In the current example we start the spooler and the panel separately. We start with the spooler

koco spool

This starts the spooler in the background as a daemon. You can review the daemon status with

koco spool --status

Stop the spooler later with koco spool --stop.

The spooler automatically creates the directory ./data/config to host Ray Serve configuration files, as specified by the configuration parameter YAML_BASE. The YAML base defaults to ./data/config/config.yaml and is resolved relative to the directory where you start koco spool and koco serve.

Create the file ./data/config/config.yaml with the Ray Serve base configuration. The following YAML configuration is a good starting point. For further details read Ray’s documentation about Serve Config Files.

# ./data/config/config.yaml
proxy_location: EveryNode
http_options:
  host: 0.0.0.0
  port: 8001
grpc_options:
  port: 9001
  grpc_servicer_functions: []
logging_config:
  encoding: TEXT
  log_level: DEBUG
  logs_dir: null
  enable_access_log: true

We will deploy the agentic-workflow-example service as the package company_news.

Create the first app configuration named company_news.yaml with the following content:

# ./data/config/company_news.yaml
name: company_news
route_prefix: /company_news
import_path: company_news.query:fast_app
runtime_env: 
  py_modules:
  - https://github.com/plan-net/agentic-workflow-example/archive/45aabddf234cf8beb7118b400e7cb567776e458a.zip
  pip:
  - openai
  env_vars:
    OTEL_SDK_DISABLED: "true"
    OPENAI_API_KEY: <-- your-api-key -->
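
If you prefer not to hardcode the API key in the file, a small helper script can fill the placeholder from your shell environment before you deploy. This is a hypothetical convenience, not part of Kodosumi; it assumes PyYAML is installed and that OPENAI_API_KEY is set in your shell:

# hypothetical helper: inject OPENAI_API_KEY from the environment
import os
import yaml  # pip install pyyaml

path = "./data/config/company_news.yaml"
with open(path) as fh:
    app = yaml.safe_load(fh)

# replace the <-- your-api-key --> placeholder
app["runtime_env"]["env_vars"]["OPENAI_API_KEY"] = os.environ["OPENAI_API_KEY"]

with open(path, "w") as fh:
    yaml.safe_dump(app, fh, sort_keys=False)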

Test and deploy your configuration set with

koco deploy --dry-run --file ./data/config/config.yaml
koco deploy --run --file ./data/config/config.yaml

This applies your base configuration from ./data/config/config.yaml and adds the records from ./data/config/company_news.yaml under an applications key.
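
To illustrate the merge, the following sketch assembles the same structure by hand (assuming PyYAML is installed; the exact output of koco deploy may differ in detail):

import yaml  # pip install pyyaml

# load the base configuration and the app configuration
with open("./data/config/config.yaml") as fh:
    base = yaml.safe_load(fh)
with open("./data/config/company_news.yaml") as fh:
    app = yaml.safe_load(fh)

# the deployed Ray Serve config is the base plus an applications list
merged = {**base, "applications": [app]}
print(yaml.safe_dump(merged, sort_keys=False))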

With Ray, the spooler, and the app running, we now start the Kodosumi panel and register the Ray deployments

koco serve --register http://localhost:8001/-/routes
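
The URL passed to --register is Ray Serve’s route listing endpoint on the proxy (port 8001 from the base configuration above). As a quick sanity check you can query it directly; a short sketch with httpx (the exact payload shape depends on your Ray version):

import httpx

# Ray Serve's proxy lists registered route prefixes at /-/routes
resp = httpx.get("http://localhost:8001/-/routes")
print(resp.json())  # the /company_news prefix should appear here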

See Configure Ray Serve Deployments for additional options on your deployment. Be advised to gather some experience with Ray core components before you roll out your services. Understand remote resource requirements and how to limit concurrency to avoid OOM issues.

Deployment API

[!NOTE] The deployment API at /deploy and /serve is experimental.

Use the Kodosumi panel API to change your Ray Serve deployments at runtime. The panel API ships with a simple CRUD interface to create, read, update and delete deployment configurations, including the base configuration config.yaml.

The following Python snippets demonstrate API usage with the example service kodosumi_examples.prime.

import httpx
from pprint import pprint

# login
resp = httpx.get("http://localhost:3370/login?name=admin&password=admin")
cookies = resp.cookies

# retrieve Ray serve deployments status
resp = httpx.get("http://localhost:3370/deploy", cookies=cookies)
pprint(resp.json())

Before we deploy the prime service, let us first remove all existing deployments except the base configuration config.yaml and then stop Ray Serve.

# retrieve active deployments
scope = httpx.get("http://localhost:3370/deploy", cookies=cookies)
for name in scope.json():
    # remove deployment
    print(name)
    resp = httpx.delete(f"http://localhost:3370/deploy/{name}", cookies=cookies)
    assert resp.status_code == 204
# stop Ray serve
resp = httpx.delete("http://localhost:3370/serve", cookies=cookies)
assert resp.status_code == 204

Verify that there are no deployments with GET /deploy and that the base configuration exists with GET /deploy/config.

# verify no deployments
resp = httpx.get("http://localhost:3370/deploy", cookies=cookies)
assert resp.json() == {}
# verify base configuration
resp = httpx.get("http://localhost:3370/deploy/config", cookies=cookies)
print(resp.content.decode())

This yields the content of the base configuration ./data/config/config.yaml, for example

proxy_location: EveryNode

http_options:
  host: 127.0.0.1
  port: 8001

grpc_options:
  port: 9001
  grpc_servicer_functions: []

logging_config:
  encoding: TEXT
  log_level: DEBUG
  logs_dir: null
  enable_access_log: true

If the base configuration does not exist and GET /deploy/config returns 404 Not Found, create it, for example with

base = """
proxy_location: EveryNode

http_options:
  host: 127.0.0.1
  port: 8001

grpc_options:
  port: 9001
  grpc_servicer_functions: []

logging_config:
  encoding: TEXT
  log_level: DEBUG
  logs_dir: null
  enable_access_log: true
"""
resp = httpx.post("http://localhost:3370/deploy/config", 
                  cookies=cookies,
                  content=base)
assert resp.status_code == 201

Deploy the prime service with the corresponding Ray Serve configuration.

prime = """
name: prime
route_prefix: /prime
import_path: kodosumi_examples.prime.app:fast_app
runtime_env: 
  py_modules:
  - https://github.com/masumi-network/kodosumi-examples/archive/2db907d955de65bed5dde6513f6359aeb18ebff1.zip
deployments:
  - name: PrimeDistribution
    num_replicas: auto
    ray_actor_options:
      num_cpus: 0.1
"""
resp = httpx.post("http://localhost:3370/deploy/prime", 
                  cookies=cookies,
                  content=prime)
assert resp.status_code == 201

Verify that the prime service is in the to-deploy state.

resp = httpx.get("http://localhost:3370/deploy", cookies=cookies)
assert resp.status_code == 200
assert resp.json() == {'prime': 'to-deploy'}

To request that Ray Serve enter this state, POST /serve with

resp = httpx.post("http://localhost:3370/serve", cookies=cookies, timeout=30)
assert resp.status_code == 201

Watch the timeout, because the Ray Serve response might take a while.
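
If you prefer not to rely on a single long timeout, you can poll the Ray Serve proxy until the new route appears. A minimal sketch, assuming the /-/routes endpoint and the /prime route prefix from above:

import time
import httpx

# poll the Ray Serve proxy until the prime route shows up
for _ in range(30):
    try:
        routes = httpx.get("http://localhost:8001/-/routes").json()
        if "/prime" in routes:
            print("prime is deployed:", routes["/prime"])
            break
    except httpx.HTTPError:
        pass  # proxy may not be reachable yet
    time.sleep(2)
else:
    raise RuntimeError("prime deployment did not come up in time")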