The Rime Partner Service is available in CLI version 1.39.0 and later.
The Rime Partner Service is in beta. It’s available to all users and ready for
production workloads, but expect occasional rough edges while the integration
matures. Reach out to support if you hit any issues.
Cerebrium’s partnership with Rime enables text-to-speech (TTS) deployment with low latency and region selection for data privacy compliance.
Setup
- Create a Rime account and get an API key. Add the key as a secret in Cerebrium with the name “RIME_API_KEY”.
- Create a Cerebrium app with the CLI:
- Rime services use a simplified TOML configuration with the [cerebrium.runtime.rime] section. Create a cerebrium.toml file with the following:
[cerebrium.deployment]
name = "rime"
disable_auth = true
[cerebrium.runtime.rime]
port = 8001
# model_name = "arcana" # Optional: specify a Rime model (e.g. "arcana", "mist", "mistv2")
# language = "en" # Optional: specify language code (e.g. "en", "es")
[cerebrium.hardware]
cpu = 4
memory = 30
compute = "AMPERE_A10"
gpu_count = 1
[cerebrium.scaling]
min_replicas = 1
max_replicas = 2
cooldown = 120
replica_concurrency = 50
Set disable_auth = true because requests are authenticated by the Rime API key sent in the Authorization header.
The Rime server validates the API key directly.
- Run cerebrium deploy to deploy the Rime service. The output should look like this:
App Dashboard: https://dashboard.cerebrium.ai/projects/p-xxxxxxxx/apps/p-xxxxxxxx-rime
- Send requests to the HTTP Rime service using the deployment URL from the output:
curl --location 'https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/rime' \
--header 'Authorization: Bearer <RIME_API_KEY>' \
--header 'Content-Type: application/json' \
--header 'Accept: audio/pcm' \
--data '{
"text": "I would love to have a conversation with you.",
"speaker": "joy",
"modelId": "mist"
}'
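The same request can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: the endpoint is the placeholder URL from the curl example, and the helper names `build_tts_request` and `synthesize_to_file` are hypothetical. The response is written to disk as raw bytes, since the audio parameters of the returned PCM stream are not specified here.

```python
import json
import urllib.request

# Placeholder URL copied from the curl example; replace with your deployment URL.
ENDPOINT = "https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/rime"


def build_tts_request(api_key: str, text: str, speaker: str = "joy",
                      model_id: str = "mist",
                      endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"text": text, "speaker": speaker, "modelId": model_id})
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # the Rime API key, not a Cerebrium token
            "Content-Type": "application/json",
            "Accept": "audio/pcm",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, text: str, out_path: str = "speech.pcm") -> int:
    """Send the request and write the raw audio bytes to out_path (hypothetical helper)."""
    request = build_tts_request(api_key, text)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of audio bytes received
```

Calling `synthesize_to_file("<RIME_API_KEY>", "I would love to have a conversation with you.")` sends the request and saves the audio. Keeping request construction separate from sending makes the headers and body easy to inspect before any network call.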
For WebSockets, connect to the following URL and pass the API key in the Authorization header:
wss://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/rime/ws2?audioFormat=mp3&speaker=cove&modelId=mistv2&phonemizeBetweenBrackets=true
Authorization: Bearer <RIME_API_KEY>
Then send a sequence of JSON messages such as:
{"text": "This "},
{"text": "is "},
{"text": "a "},
{"text": "test against the "},
{"text": "websockets endpoint of the "},
{"text": "api image. "},
{"operation": "flush"},
{"text": "This "},
{"text": "is "},
{"text": "an "},
{"text": "incomplete "},
{"text": "phrase "},
{"operation": "eos"}
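The connection URL and message sequence above can be generated programmatically. The sketch below builds the query string and the list of JSON messages, and leaves the actual sending to whichever WebSocket client you use. The function names are hypothetical, the base URL is the placeholder from the example, and the flush and eos placement simply mirrors the order shown above.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL from the example above; replace with your deployment URL.
BASE_WS_URL = "wss://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/rime/ws2"


def build_ws_url(base_url: str = BASE_WS_URL, **params) -> str:
    """Append query parameters, rendering booleans as lowercase true/false."""
    normalized = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    return f"{base_url}?{urlencode(normalized)}"


def text_frames(phrase: str) -> list[str]:
    """Split a phrase into one JSON text message per word, each with a trailing space."""
    return [json.dumps({"text": word + " "}) for word in phrase.split()]


def build_session(phrases: list[str]) -> list[str]:
    """Emit text messages, a flush between phrases, and a final eos."""
    frames: list[str] = []
    for index, phrase in enumerate(phrases):
        frames.extend(text_frames(phrase))
        if index < len(phrases) - 1:
            frames.append(json.dumps({"operation": "flush"}))
    frames.append(json.dumps({"operation": "eos"}))
    return frames
```

For example, `build_session(["This is a test.", "This is an incomplete phrase"])` yields the text messages for the first phrase, a flush, the text messages for the second phrase, and then eos. Each string in the list is sent as its own WebSocket message.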
Runtime Configuration
The [cerebrium.runtime.rime] section supports the following parameters:
| Option | Type | Default | Description |
|---|---|---|---|
| port | integer | required | Port the Rime server listens on. Typically 8001. |
| model_name | string | — | Rime model to load (e.g. "arcana", "mist", "mistv2"). Defaults to Rime’s server default if not set. |
| language | string | — | Language code for the model (e.g. "en", "es"). Defaults to Rime’s server default if not set. |
Example with optional parameters:
[cerebrium.runtime.rime]
port = 8001
model_name = "arcana"
language = "en"
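Before deploying, it can help to check the runtime section locally against the table above. The following is a minimal sketch of such a check, not part of the Cerebrium CLI: `validate_rime_runtime` and `load_and_validate` are hypothetical helpers, and the file loader assumes Python 3.11 or later for the standard-library `tomllib` module.

```python
def validate_rime_runtime(config: dict) -> list[str]:
    """Return a list of problems found in the [cerebrium.runtime.rime] table."""
    rime = config.get("cerebrium", {}).get("runtime", {}).get("rime")
    if rime is None:
        return ["missing [cerebrium.runtime.rime] section"]

    problems = []
    port = rime.get("port")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(port, int) or isinstance(port, bool):
        problems.append("port is required and must be an integer")

    # model_name and language are optional, but must be strings when present.
    for key in ("model_name", "language"):
        if key in rime and not isinstance(rime[key], str):
            problems.append(f"{key} must be a string when set")
    return problems


def load_and_validate(path: str = "cerebrium.toml") -> list[str]:
    """Parse a TOML file and validate its Rime runtime section."""
    import tomllib  # standard library in Python 3.11+

    with open(path, "rb") as f:
        return validate_rime_runtime(tomllib.load(f))
```

An empty list means the section matches the documented types. The check only covers the three options listed in the table and does not validate model names or language codes against what the Rime server accepts.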
Scaling and Concurrency
Rime services support independent scaling configurations:
- min_replicas: Minimum instances to maintain (0 for scale-to-zero). Recommended: 1.
- max_replicas: Maximum instances during high load.
- replica_concurrency: Concurrent requests per instance. Recommended: 3.
- cooldown: Time window (in seconds) that must pass at reduced concurrency before scaling down. Recommended: 50.
- compute: Instance type. Recommended: AMPERE_A10.
Adjust these parameters based on traffic patterns and latency requirements. Consult the Rime team
for concurrency and scalability guidance.
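As a rough aid when choosing these values, the arithmetic relating in-flight requests to replicas can be sketched as follows. This is back-of-envelope estimation under the assumption that each replica handles up to replica_concurrency requests at once; it is not Cerebrium's autoscaling algorithm, and the function names are hypothetical.

```python
import math


def max_in_flight(max_replicas: int, replica_concurrency: int) -> int:
    """Upper bound on concurrent requests when every replica is fully loaded."""
    return max_replicas * replica_concurrency


def replicas_needed(in_flight: int, replica_concurrency: int,
                    min_replicas: int, max_replicas: int) -> int:
    """Estimate the replica count for a given load, clamped to the configured range."""
    wanted = math.ceil(in_flight / replica_concurrency)
    return max(min_replicas, min(wanted, max_replicas))


# Values from the example cerebrium.toml: min 1, max 2, concurrency 50.
print(max_in_flight(2, 50))           # 100
print(replicas_needed(60, 50, 1, 2))  # 2
print(replicas_needed(0, 50, 1, 2))   # 1
```

If expected peak load exceeds the value returned by `max_in_flight`, raising max_replicas or replica_concurrency is the lever to consider, subject to the latency guidance from the Rime team.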
For further documentation on Rime, see the Rime documentation.