feat: add StepFun, Xiaomi, and Qwen generation drivers by bezko · Pull Request #1 · bezko/OpenMontage

bezko · 2026-06-25T16:56:22Z

New drivers:

- stepfun_tts: StepAudio 2.5 TTS (Chinese/English, included in plan)
- stepfun_image: Step Image Edit 2 (instruction-based image editing)
- xiaomi_tts: MiMo V2.5 TTS with voice cloning and voice design
- qwen_image: Wanx image synthesis via DashScope (turbo + plus)

All providers reviewed:

- StepFun: TTS + image editing (stepaudio-2.5-tts, step-image-edit-2)
- Xiaomi: TTS with voice clone/design (mimo-v2.5-tts variants)
- Qwen/DashScope: image generation (wanx2.1-t2i)
- Kimi/Moonshot: text-only, no generation APIs
- Featherless: text-only LLM gateway
- OpenAI Sora: discontinued (no driver added)


qodo-code-review · 2026-06-25T16:57:24Z

PR Summary by Qodo

Add StepFun, Xiaomi, and Qwen generation drivers
✨ Enhancement 📝 Documentation 🧪 Tests ⚙️ Configuration changes 🕐 40+ Minutes

Description

• Add StepFun and Xiaomi TTS drivers, including voice clone/design options for Xiaomi.
• Add Qwen (DashScope) text-to-image and StepFun instruction-based image editing drivers.
• Document new providers and add contract tests plus example env vars for required API keys.

Diagram

graph TD
  A["Tool runner / Agent"] --> B(["StepFunTTS tool"]) --> E{{"StepFun API"}} --> H[("Local files")]
  A --> C(["XiaomiTTS tool"]) --> F{{"Xiaomi MiMo API"}} --> H
  A --> D(["QwenImage tool"]) --> G{{"DashScope API"}} --> H
  A --> I(["StepFunImage tool"]) --> E --> H

  subgraph Legend
    direction LR
    _runner["Caller"] ~~~ _tool(["Tool driver"]) ~~~ _ext{{"External API"}} ~~~ _fs[("Filesystem")]
  end

High-Level Assessment

The following are alternative approaches to this PR:

1. Introduce a shared OpenAI-compatible API base driver

➕ Reduces duplicated request/headers/error-handling logic across StepFun/Xiaomi
➕ Centralizes retry/timeouts/status-code mapping and output writing
➕ Makes adding future OpenAI-compatible providers faster and more consistent
➖ Requires a small refactor and careful interface design upfront
➖ May complicate provider-specific payload differences (e.g., Xiaomi reference audio, DashScope async polling)

2. Use official/provider SDKs instead of raw requests

➕ Potentially better long-term API compatibility and typed request models
➕ May simplify auth, polling, and error decoding
➖ Adds dependencies and increases packaging/installation surface area
➖ SDK behavior can be opaque and harder to patch quickly

3. Standardize async job handling for generators

➕ Reusable polling/backoff logic for any async image/video provider
➕ More consistent timeout/cancellation semantics across tools
➖ Extra abstraction may be premature if DashScope is the only async flow today

Recommendation: Current approach (self-contained tool drivers using requests) is acceptable for quickly integrating new providers with minimal dependency impact. If more OpenAI-compatible providers are expected, a shared base driver for common request/response patterns is the highest-leverage follow-up refactor; keep DashScope polling as a specialized path (or extract a generic async-job helper once a second provider needs it).
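If a second async provider appears, the generic async-job helper mentioned in the recommendation could start as small as the sketch below. It is illustrative only: `poll_until_done`, `JobTimeout`, the default terminal states, and the interval/timeout defaults are hypothetical choices, not code from the PR. The sleep and clock functions are injectable so the backoff loop can be tested without real waiting.

```python
import time
from typing import Callable


class JobTimeout(Exception):
    """Raised when an async job does not reach a terminal state in time."""


def poll_until_done(
    fetch_status: Callable[[], str],
    terminal: frozenset[str] = frozenset({"SUCCEEDED", "FAILED"}),  # assumed state names
    poll_interval: float = 2.0,
    max_interval: float = 10.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status() until it returns a terminal status.

    The delay between polls grows by 1.5x per attempt, capped at max_interval.
    Returns the terminal status; raises JobTimeout if the next wait would
    pass the deadline.
    """
    deadline = clock() + timeout
    interval = poll_interval
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() + interval > deadline:
            raise JobTimeout(f"job still {status!r}; next poll would exceed the {timeout}s timeout")
        sleep(interval)
        interval = min(interval * 1.5, max_interval)
```

A DashScope driver would pass a `fetch_status` closure that performs the task-status request and returns the status string; a future video provider could reuse the same loop with its own terminal states.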

Files changed (8) +1088 / -3

Enhancement (4) +813 / -0

stepfun_tts.py · Introduce StepFun StepAudio 2.5 TTS tool driver +176/-0

• Adds an API-backed TTS tool targeting StepFun's OpenAI-compatible Step Plan endpoint. Implements input schema (text/voice/model/format/speed/output_path), authentication via STEPFUN_API_KEY, and writes returned audio bytes to disk.

tools/audio/stepfun_tts.py
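Because StepFun's endpoint is described as OpenAI-compatible, the request body plausibly mirrors OpenAI's `/audio/speech` fields (`model`, `input`, `voice`, `response_format`, `speed`). The sketch below maps the tool's input schema onto that shape without sending anything. The base URL, default voice, default format, and speed bounds are placeholders or assumptions, not values taken from the PR or from StepFun's documentation.

```python
import os
from typing import Any

# Placeholders (hypothetical): check StepFun's docs for the real endpoint and voices.
STEPFUN_BASE_URL = "https://api.example-stepfun.invalid/v1"
DEFAULT_VOICE = "hypothetical-default-voice"


def build_speech_request(inputs: dict[str, Any]) -> tuple[str, dict[str, str], dict[str, Any]]:
    """Return (url, headers, json_body) for an OpenAI-style speech request."""
    api_key = os.environ.get("STEPFUN_API_KEY")
    if not api_key:
        raise RuntimeError("STEPFUN_API_KEY is not set")
    text = inputs.get("text", "").strip()
    if not text:
        raise ValueError("'text' is required")
    speed = float(inputs.get("speed", 1.0))
    # OpenAI documents 0.25-4.0; assumed (not verified) to apply to StepFun too.
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed {speed} out of range")
    body = {
        "model": inputs.get("model", "stepaudio-2.5-tts"),
        "input": text,
        "voice": inputs.get("voice", DEFAULT_VOICE),
        "response_format": inputs.get("format", "mp3"),  # default format is an assumption
        "speed": speed,
    }
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    return f"{STEPFUN_BASE_URL}/audio/speech", headers, body
```

A driver would then send it with `requests.post(url, headers=headers, json=body, timeout=60)` and, after checking `response.status_code`, write `response.content` to `output_path`.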

xiaomi_tts.py · Introduce Xiaomi MiMo V2.5 TTS tool with voice clone/design support +211/-0

• Adds an API-backed TTS tool for Xiaomi's token plan endpoint with model variants for base TTS, voice cloning, and voice design. Supports optional reference audio (base64-embedded) and voice description inputs, and writes audio bytes to disk.

tools/audio/xiaomi_tts.py

qwen_image.py · Add Qwen Wanx image generation tool via DashScope (async polling) +249/-0

• Implements a DashScope text-to-image driver with model selection (turbo/plus), size/style presets, and multi-image outputs. Submits an async job, polls task status until completion, downloads result URLs, and saves images to output paths.

tools/graphics/qwen_image.py
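The submit/poll/download loop hinges on reading the task-status response. The sketch below parses a payload shaped the way DashScope's async task API is commonly documented (`output.task_status` with values such as `PENDING`, `RUNNING`, `SUCCEEDED`, `FAILED`, and `output.results[].url` on success). Treat that shape, and the choice of which states count as failures, as assumptions to verify against the current DashScope reference; the PR does not confirm them.

```python
from typing import Any

# Failure-like terminal states; names assumed from DashScope's documentation.
TERMINAL_FAILURES = {"FAILED", "CANCELED", "UNKNOWN"}


def parse_task_status(payload: dict[str, Any]) -> tuple[str, list[str]]:
    """Return (status, result_urls) from an async task-status response.

    Raises RuntimeError for failed tasks so the caller stops polling.
    result_urls stays empty until the task reports SUCCEEDED.
    """
    output = payload.get("output") or {}
    status = str(output.get("task_status", "UNKNOWN")).upper()
    if status in TERMINAL_FAILURES:
        message = output.get("message") or payload.get("message") or "no message"
        raise RuntimeError(f"task {output.get('task_id', '?')} {status}: {message}")
    if status != "SUCCEEDED":
        return status, []
    # Skip per-image entries that carry an error instead of a URL.
    urls = [r["url"] for r in output.get("results", []) if isinstance(r, dict) and r.get("url")]
    if not urls:
        raise RuntimeError("task succeeded but returned no result URLs")
    return status, urls
```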

stepfun_image.py · Add StepFun instruction-based image editing tool (text + image) +177/-0

• Implements StepFun's image edit flow by base64-embedding a source image and submitting an edit prompt to the Step Plan endpoint. Handles URL or base64 responses and writes the edited image to the requested output path.

tools/graphics/stepfun_image.py

Tests (1) +189 / -0

test_new_generation_drivers.py · Add contract tests for new StepFun/Xiaomi/Qwen tool drivers +189/-0

• Introduces a new pytest suite verifying required BaseTool fields, input schema shape, and basic execute/dry_run expectations. Adds TTS- and image-specific schema checks plus missing-API-key failure expectations.

tests/tools/test_new_generation_drivers.py

Documentation (2) +73 / -3

PROVIDERS.md · Document Qwen, StepFun, and Xiaomi provider capabilities and setup +71/-3

• Adds provider sections for Qwen (Wanx via DashScope), StepFun (StepAudio TTS + Step Image Edit 2), and Xiaomi (MiMo TTS variants), including env vars, models, and pricing/free-tier notes. Updates provider tables and clarifies OpenAI tool coverage (adds Sora video tool entry).

docs/PROVIDERS.md

image-provider-usage.md · Add Qwen image provider guidance and positioning +2/-0

• Adds qwen_image to the provider comparison table and recommends it as a budget-friendly option. Updates the selection matrix to include a Qwen-focused use case.

skills/creative/image-provider-usage.md

Other (1) +13 / -0

.env.example · Add API key env vars for Qwen/DashScope, StepFun, and Xiaomi +13/-0

• Documents new environment variables (DASHSCOPE_API_KEY, STEPFUN_API_KEY, XIAOMI_API_KEY) required by the added drivers. Includes links/notes on where to obtain keys.

.env.example

qodo-code-review · 2026-06-25T17:03:38Z

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📜 Skill insights (0)

1. Execute signature mismatch 🐞 Bug ≡ Correctness

Description

New generation drivers define `execute(self, inputs, runtime)`, but `BaseTool`'s contract is `execute(self, inputs)` and the selectors invoke tools with a single positional argument, so these providers will raise `TypeError` when run through `tts_selector`/`image_selector`. The accompanying tests reinforce the wrong arity by calling `execute(..., None)`, and they try to unset API keys by passing the *API key value* (or an empty string) from `_get_api_key()` into `monkeypatch.delenv()`, which may fail to remove the real env var or raise.

Code

tools/audio/stepfun_tts.py[109]

+    def execute(self, inputs: dict[str, Any], runtime: ToolRuntime) -> ToolResult:

Evidence

The framework defines a single-argument `BaseTool.execute(inputs)` contract, and the selectors call `tool.execute(inputs)` with one positional argument. The new tools' `execute(self, inputs, runtime)` signature requires an additional parameter, which triggers a missing-argument `TypeError` when invoked via the standard selector/registry path. In the tests, `_get_api_key()` is used as the key for `delenv()`, but `_get_api_key()` returns the secret value (or `None`), so the test can end up calling `monkeypatch.delenv("")` or unsetting the wrong thing. The tests also call `execute` with two arguments (`..., None`), baking in behavior that contradicts the base tool contract.

tools/base_tool.py[296-301]
tools/audio/tts_selector.py[120-145]
tools/graphics/image_selector.py[196-197]
tools/audio/stepfun_tts.py[106-110]
tools/audio/xiaomi_tts.py[128-133]
tools/graphics/qwen_image.py[106-111]
tools/graphics/stepfun_image.py[94-99]
tests/tools/test_new_generation_drivers.py[136-146]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
Unify the new generation drivers and their tests with the repository’s tool interface: `BaseTool` and selector orchestration expect `execute(self, inputs)` (single positional argument), but the new tools implement `execute(self, inputs, runtime)` and tests call `execute(..., None)`, causing runtime `TypeError` when selectors invoke these tools. Additionally, the tests attempt to unset API keys by passing the *API key value* (or an empty string) returned by `_get_api_key()` into `monkeypatch.delenv()`, which can fail to unset the real environment variable and/or raise.

## Issue Context
- `BaseTool.execute` is abstract and uses a single-argument signature.
- `TTSSelector` and `ImageSelector` invoke tools as `tool.execute(inputs)`.
- The four new tools currently require an extra positional `runtime` argument.
- `test_missing_api_key_returns_failure` uses `_get_api_key()` (returns the env var *value* or `None`) as the `delenv()` key, which can become `""` and does not correspond to the actual env var name.

## Fix Focus Areas
- tools/audio/stepfun_tts.py[106-110]
- tools/audio/xiaomi_tts.py[128-133]
- tools/graphics/qwen_image.py[106-111]
- tools/graphics/stepfun_image.py[94-99]
- tests/tools/test_new_generation_drivers.py[136-175]
- tools/base_tool.py[296-301]


2. Status ignores API key 🐞 Bug ☼ Reliability

Description

The new API tools leave `dependencies = []` and do not override `get_status()`, so they are reported AVAILABLE even when their required API keys are missing. As a result, `ToolRegistry` menus and provider selection can route to these tools, which then fail immediately at execution time.

Code

tools/graphics/qwen_image.py[R37-42]

+    dependencies = []
+    install_instructions = (
+        "Set DASHSCOPE_API_KEY to your Alibaba Cloud DashScope API key.\n"
+        "  export DASHSCOPE_API_KEY=your_key_here\n"
+        "Get a key at https://dashscope.aliyun.com/"
+    )

Evidence
BaseTool.get_status() only checks dependencies, and registry menus categorize providers based on
get_status(). The new tools have empty dependencies and no get_status() override, unlike
existing API tools that explicitly return UNAVAILABLE without keys.
tools/base_tool.py[199-224]
tools/tool_registry.py[270-303]
tools/graphics/openai_image.py[97-101]
tools/graphics/qwen_image.py[37-42]
tools/audio/stepfun_tts.py[41-46]
tools/audio/xiaomi_tts.py[42-47]
tools/graphics/stepfun_image.py[43-48]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
New API-backed tools currently report AVAILABLE even when their required API key env vars are not set, because they neither declare env dependencies nor override `get_status()`.

## Issue Context
- `ToolRegistry.provider_menu()` and selectors use `tool.get_status()` to decide what’s configured/available.
- `BaseTool.get_status()` only checks `dependencies` via `check_dependencies()`.

## Fix Focus Areas
- tools/audio/stepfun_tts.py[41-46]
- tools/audio/xiaomi_tts.py[42-47]
- tools/graphics/qwen_image.py[37-42]
- tools/graphics/stepfun_image.py[43-48]

## Implementation notes
Pick one consistent pattern used in the repo:
1) Override `get_status()` in each tool to return UNAVAILABLE when the corresponding env var is missing (see `OpenAIImage.get_status()`), OR
2) Declare env dependencies, e.g. `dependencies = ["env:STEPFUN_API_KEY"]`, `dependencies = ["env:XIAOMI_API_KEY"]`, `dependencies = ["env:DASHSCOPE_API_KEY"]`.

Either approach will make registry menus and provider selection reflect reality.
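Both options come down to checking the environment before reporting availability. The sketch below implements option 2's `env:` convention as a standalone function and wires it into a `get_status()` override (option 1). `ToolStatus`, `missing_dependencies`, and the treatment of non-`env:` entries as executables on `PATH` are assumptions; the real `check_dependencies()` is not shown in the review.

```python
import os
import shutil
from collections.abc import Mapping
from enum import Enum


class ToolStatus(Enum):  # hypothetical stand-in for the repo's status enum
    AVAILABLE = "available"
    UNAVAILABLE = "unavailable"


def missing_dependencies(dependencies: list[str],
                         environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the dependencies that are not satisfied.

    "env:NAME" entries need a non-empty environment variable; any other entry
    is treated (as an assumption) as an executable that must be on PATH.
    """
    missing = []
    for dep in dependencies:
        if dep.startswith("env:"):
            if not environ.get(dep[len("env:"):]):
                missing.append(dep)
        elif shutil.which(dep) is None:
            missing.append(dep)
    return missing


class QwenImage:
    dependencies = ["env:DASHSCOPE_API_KEY"]

    def get_status(self, environ: Mapping[str, str] = os.environ) -> ToolStatus:
        if missing_dependencies(self.dependencies, environ):
            return ToolStatus.UNAVAILABLE
        return ToolStatus.AVAILABLE
```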


3. Nonstandard output contract 🐞 Bug ⚙ Maintainability

Description

The new drivers write output files but don't populate `ToolResult.artifacts`, and they return nonstandard keys (`output_path`/`output_paths`) instead of the repo's common `data['output']` plus optional `data['outputs']`. Downstream code and tests that assume the standard contract will not find the produced artifacts.

Code

tools/graphics/qwen_image.py[R238-249]

+        return ToolResult(
+            success=True,
+            data={
+                "provider": "qwen-dashscope",
+                "model": model,
+                "prompt": prompt,
+                "output_paths": saved_paths,
+                "style": style,
+                "size": size,
+                "n": len(saved_paths),
+            },
+        )

Evidence
Other providers in this repo consistently return data['output'] and populate artifacts, while
the new drivers return output_path/output_paths without artifacts.
tools/audio/openai_tts.py[151-163]
tools/graphics/grok_image.py[280-292]
tools/audio/stepfun_tts.py[166-176]
tools/audio/xiaomi_tts.py[201-211]
tools/graphics/qwen_image.py[238-249]
tools/graphics/stepfun_image.py[168-177]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
New drivers return result payloads that deviate from existing provider conventions: they omit `ToolResult.artifacts` and don’t provide `data['output']` (primary output path). This can break callers that rely on consistent fields.

## Issue Context
Existing image/TTS providers typically:
- set `data['output']` to the primary file path
- optionally set `data['outputs']` to a list
- set `artifacts=[...]` to produced file paths

## Fix Focus Areas
- tools/audio/stepfun_tts.py[166-176]
- tools/audio/xiaomi_tts.py[201-211]
- tools/graphics/qwen_image.py[238-249]
- tools/graphics/stepfun_image.py[168-177]

## Implementation notes
- For TTS tools: add `data['output']=str(out)` and `artifacts=[str(out)]` (you can keep `output_path` too for backwards-compat).
- For QwenImage: set `data['output']=saved_paths[0]`, `data['outputs']=saved_paths`, and `artifacts=saved_paths`.
- For StepFunImage: set `data['output']=str(out)` and `artifacts=[str(out)]` (keep `output_path` if desired).
- Consider also setting `model=...` on ToolResult for consistency with other providers.


qodo-code-review · 2026-06-25T17:03:39Z

+    def _get_api_key(self) -> str | None:
+        return os.environ.get("STEPFUN_API_KEY")
+
+    def execute(self, inputs: dict[str, Any], runtime: ToolRuntime) -> ToolResult:


1. Execute signature mismatch 🐞 Bug ≡ Correctness


qodo-code-review · 2026-06-25T17:03:39Z

+    dependencies = []
+    install_instructions = (
+        "Set DASHSCOPE_API_KEY to your Alibaba Cloud DashScope API key.\n"
+        "  export DASHSCOPE_API_KEY=your_key_here\n"
+        "Get a key at https://dashscope.aliyun.com/"
+    )


2. Status ignores API key 🐞 Bug ☼ Reliability


bezko merged commit e5e7b3e into main Jun 25, 2026

bezko merged 1 commit into main from pr-174
