
feat: add StepFun, Xiaomi, and Qwen generation drivers #1

Merged: bezko merged 1 commit into main from pr-174 on Jun 25, 2026

@bezko (Owner) commented Jun 25, 2026

New drivers:

  • stepfun_tts: StepAudio 2.5 TTS (Chinese/English, included in plan)
  • stepfun_image: Step Image Edit 2 (instruction-based image editing)
  • xiaomi_tts: MiMo V2.5 TTS with voice cloning and voice design
  • qwen_image: Wanx image synthesis via DashScope (turbo + plus)

All providers reviewed:

  • StepFun: TTS + image editing (stepaudio-2.5-tts, step-image-edit-2)
  • Xiaomi: TTS with voice clone/design (mimo-v2.5-tts variants)
  • Qwen/DashScope: image generation (wanx2.1-t2i)
  • Kimi/Moonshot: text-only, no generation APIs
  • Featherless: text-only LLM gateway
  • OpenAI Sora: discontinued (no driver added)

@bezko merged commit e5e7b3e into main on Jun 25, 2026

@qodo-code-review


PR Summary by Qodo

Add StepFun, Xiaomi, and Qwen generation drivers
✨ Enhancement 📝 Documentation 🧪 Tests ⚙️ Configuration changes 🕐 40+ Minutes


Description

• Add StepFun and Xiaomi TTS drivers, including voice clone/design options for Xiaomi.
• Add Qwen (DashScope) text-to-image and StepFun instruction-based image editing drivers.
• Document new providers and add contract tests plus example env vars for required API keys.
Diagram

graph TD
  A["Tool runner / Agent"] --> B(["StepFunTTS tool"]) --> E{{"StepFun API"}} --> H[("Local files")]
  A --> C(["XiaomiTTS tool"]) --> F{{"Xiaomi MiMo API"}} --> H
  A --> D(["QwenImage tool"]) --> G{{"DashScope API"}} --> H
  A --> I(["StepFunImage tool"]) --> E --> H

  subgraph Legend
    direction LR
    _runner["Caller"] ~~~ _tool(["Tool driver"]) ~~~ _ext{{"External API"}} ~~~ _fs[("Filesystem")]
  end
High-Level Assessment

The following are alternative approaches to this PR:

1. Introduce a shared OpenAI-compatible API base driver
  • ➕ Reduces duplicated request/headers/error-handling logic across StepFun/Xiaomi
  • ➕ Centralizes retry/timeouts/status-code mapping and output writing
  • ➕ Makes adding future OpenAI-compatible providers faster and more consistent
  • ➖ Requires a small refactor and careful interface design upfront
  • ➖ May complicate provider-specific payload differences (e.g., Xiaomi reference audio, DashScope async polling)
2. Use official/provider SDKs instead of raw requests
  • ➕ Potentially better long-term API compatibility and typed request models
  • ➕ May simplify auth, polling, and error decoding
  • ➖ Adds dependencies and increases packaging/installation surface area
  • ➖ SDK behavior can be opaque and harder to patch quickly
3. Standardize async job handling for generators
  • ➕ Reusable polling/backoff logic for any async image/video provider
  • ➕ More consistent timeout/cancellation semantics across tools
  • ➖ Extra abstraction may be premature if DashScope is the only async flow today

Recommendation: Current approach (self-contained tool drivers using requests) is acceptable for quickly integrating new providers with minimal dependency impact. If more OpenAI-compatible providers are expected, a shared base driver for common request/response patterns is the highest-leverage follow-up refactor; keep DashScope polling as a specialized path (or extract a generic async-job helper once a second provider needs it).
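
The "generic async-job helper" mentioned in the recommendation could look roughly like the sketch below. The function name `poll_until_done`, its parameters, and the status strings are assumptions made for illustration; none of them are taken from the PR or from a provider's documented API.

```python
import time
from typing import Any, Callable


def poll_until_done(
    fetch_status: Callable[[], dict[str, Any]],
    *,
    timeout_s: float = 120.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_status() until it reports a terminal state or time runs out.

    fetch_status is a caller-supplied function that returns the task payload
    as a dict with a "status" key (hypothetical shape).
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        task = fetch_status()
        status = task.get("status")
        if status == "SUCCEEDED":
            return task
        if status in ("FAILED", "CANCELED"):
            raise RuntimeError(f"async task ended with status {status}")
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"task still {status!r} after {timeout_s}s")
        sleep(delay)
        # Exponential backoff, capped at max_delay_s.
        delay = min(delay * 2, max_delay_s)
```

Injecting `sleep` keeps the helper testable without real waiting; a driver would pass a closure that performs the provider's status request.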

Files changed (8) +1088 / -3

Enhancement (4) +813 / -0
stepfun_tts.py: Introduce StepFun StepAudio 2.5 TTS tool driver +176/-0

• Adds an API-backed TTS tool targeting StepFun's OpenAI-compatible Step Plan endpoint. Implements input schema (text/voice/model/format/speed/output_path), authentication via STEPFUN_API_KEY, and writes returned audio bytes to disk.

tools/audio/stepfun_tts.py

xiaomi_tts.py: Introduce Xiaomi MiMo V2.5 TTS tool with voice clone/design support +211/-0

• Adds an API-backed TTS tool for Xiaomi's token plan endpoint with model variants for base TTS, voice cloning, and voice design. Supports optional reference audio (base64-embedded) and voice description inputs, and writes audio bytes to disk.

tools/audio/xiaomi_tts.py

qwen_image.py: Add Qwen Wanx image generation tool via DashScope (async polling) +249/-0

• Implements a DashScope text-to-image driver with model selection (turbo/plus), size/style presets, and multi-image outputs. Submits an async job, polls task status until completion, downloads result URLs, and saves images to output paths.

tools/graphics/qwen_image.py

stepfun_image.py: Add StepFun instruction-based image editing tool (text + image) +177/-0

• Implements StepFun's image edit flow by base64-embedding a source image and submitting an edit prompt to the Step Plan endpoint. Handles URL or base64 responses and writes the edited image to the requested output path.

tools/graphics/stepfun_image.py
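
The base64-embedding step described for `stepfun_image.py` (and for Xiaomi reference audio) might be sketched as follows. This is only an illustration: the helper name `encode_file_as_data_url` is hypothetical, and whether the providers expect a `data:` URL or a bare base64 string is not stated in the PR summary, so treat the output format as an assumption.

```python
from __future__ import annotations

import base64
import mimetypes
from pathlib import Path


def encode_file_as_data_url(path: str | Path) -> str:
    """Read a local file and return it as a base64 data URL.

    The MIME type is guessed from the file extension and falls back to
    application/octet-stream when the extension is unknown.
    """
    file_path = Path(path)
    mime, _ = mimetypes.guess_type(file_path.name)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Embedding the file this way keeps the request self-contained, at the cost of roughly a one-third size increase from base64 encoding.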

Tests (1) +189 / -0
test_new_generation_drivers.py: Add contract tests for new StepFun/Xiaomi/Qwen tool drivers +189/-0

• Introduces a new pytest suite verifying required BaseTool fields, input schema shape, and basic execute/dry_run expectations. Adds TTS- and image-specific schema checks plus missing-API-key failure expectations.

tests/tools/test_new_generation_drivers.py

Documentation (2) +73 / -3
PROVIDERS.md: Document Qwen, StepFun, and Xiaomi provider capabilities and setup +71/-3

• Adds provider sections for Qwen (Wanx via DashScope), StepFun (StepAudio TTS + Step Image Edit 2), and Xiaomi (MiMo TTS variants), including env vars, models, and pricing/free-tier notes. Updates provider tables and clarifies OpenAI tool coverage (adds Sora video tool entry).

docs/PROVIDERS.md

image-provider-usage.md: Add Qwen image provider guidance and positioning +2/-0

• Adds qwen_image to the provider comparison table and recommends it as a budget-friendly option. Updates the selection matrix to include a Qwen-focused use case.

skills/creative/image-provider-usage.md

Other (1) +13 / -0
.env.example: Add API key env vars for Qwen/DashScope, StepFun, and Xiaomi +13/-0

• Documents new environment variables (DASHSCOPE_API_KEY, STEPFUN_API_KEY, XIAOMI_API_KEY) required by the added drivers. Includes links/notes on where to obtain keys.

.env.example

@qodo-code-review


Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📜 Skill insights (0)



Action required

1. Execute signature mismatch 🐞 Bug ≡ Correctness
Description
New generation drivers define execute(self, inputs, runtime) even though BaseTool’s contract is
execute(self, inputs) and selectors invoke tools with a single positional argument, so these
providers will raise TypeError when run through tts_selector/image_selector. The accompanying
tests also reinforce the wrong arity by calling execute(..., None) and incorrectly try to unset
API keys by passing the *API key value* (or empty string) from _get_api_key() into
monkeypatch.delenv(), which may fail to remove the real env var or raise.
Code

tools/audio/stepfun_tts.py[109]

+    def execute(self, inputs: dict[str, Any], runtime: ToolRuntime) -> ToolResult:
Evidence
The framework defines a single-argument BaseTool.execute(inputs) contract and the selectors call
tool.execute(inputs) with one positional argument; however, the new tools’ `execute(self, inputs,
runtime)` signature requires an additional parameter, which will trigger a missing-argument
TypeError when invoked via the standard selector/registry path. In the tests, _get_api_key() is
used as the key for delenv(), but _get_api_key() returns the secret value (or None), so the
test can end up calling monkeypatch.delenv("") or unsetting the wrong thing, and the tests further
call execute with two arguments (..., None), baking in behavior that contradicts the base tool
contract.

tools/base_tool.py[296-301]
tools/audio/tts_selector.py[120-145]
tools/graphics/image_selector.py[196-197]
tools/audio/stepfun_tts.py[106-110]
tools/audio/xiaomi_tts.py[128-133]
tools/graphics/qwen_image.py[106-111]
tools/graphics/stepfun_image.py[94-99]
tests/tools/test_new_generation_drivers.py[136-146]
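
The arity mismatch described in the evidence can be reproduced in isolation. The classes below are simplified stand-ins written for this illustration, not the repository's actual `BaseTool` or selectors; they model only the single-argument `execute(inputs)` contract and a caller that passes one positional argument.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseTool(ABC):
    """Stand-in base class: only the execute(inputs) contract is modeled."""

    @abstractmethod
    def execute(self, inputs: dict[str, Any]) -> dict[str, Any]:
        ...


class MismatchedTool(BaseTool):
    # Adds a second required positional parameter, as the reviewed drivers do.
    def execute(self, inputs, runtime):
        return {"success": True}


class ConformingTool(BaseTool):
    # Matches the base contract: one positional argument.
    def execute(self, inputs):
        return {"success": True}


def run_via_selector(tool: BaseTool, inputs: dict[str, Any]) -> dict[str, Any]:
    """Mimics a selector that calls tool.execute(inputs) with one argument."""
    return tool.execute(inputs)
```

Calling `run_via_selector(MismatchedTool(), {})` raises `TypeError` for the missing `runtime` argument, while `ConformingTool` runs normally.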

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Unify the new generation drivers and their tests with the repository’s tool interface: `BaseTool` and selector orchestration expect `execute(self, inputs)` (single positional argument), but the new tools implement `execute(self, inputs, runtime)` and tests call `execute(..., None)`, causing runtime `TypeError` when selectors invoke these tools. Additionally, the tests attempt to unset API keys by passing the *API key value* (or an empty string) returned by `_get_api_key()` into `monkeypatch.delenv()`, which can fail to unset the real environment variable and/or raise.

## Issue Context
- `BaseTool.execute` is abstract and uses a single-argument signature.
- `TTSSelector` and `ImageSelector` invoke tools as `tool.execute(inputs)`.
- The four new tools currently require an extra positional `runtime` argument.
- `test_missing_api_key_returns_failure` uses `_get_api_key()` (returns the env var *value* or `None`) as the `delenv()` key, which can become `""` and does not correspond to the actual env var name.

## Fix Focus Areas
- tools/audio/stepfun_tts.py[106-110]
- tools/audio/xiaomi_tts.py[128-133]
- tools/graphics/qwen_image.py[106-111]
- tools/graphics/stepfun_image.py[94-99]
- tests/tools/test_new_generation_drivers.py[136-175]
- tools/base_tool.py[296-301]
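
For the test half of this issue, a corrected test would pass the environment variable's name to `monkeypatch.delenv()` rather than the value returned by `_get_api_key()`. The sketch below assumes pytest's `monkeypatch` fixture; `get_api_key` and `execute_stub` are hypothetical stand-ins for the driver under test, and the failure payload shape is an assumption.

```python
from __future__ import annotations

import os

API_KEY_ENV = "STEPFUN_API_KEY"  # the variable name, never its value


def get_api_key() -> str | None:
    """Stand-in for the driver's _get_api_key()."""
    return os.environ.get(API_KEY_ENV)


def execute_stub(inputs: dict) -> dict:
    """Stand-in for a single-argument execute(); fails without a key."""
    if get_api_key() is None:
        return {"success": False, "error": f"{API_KEY_ENV} is not set"}
    return {"success": True}


def test_missing_api_key_returns_failure(monkeypatch):
    # raising=False tolerates the variable already being unset.
    monkeypatch.delenv(API_KEY_ENV, raising=False)
    result = execute_stub({"text": "hello"})
    assert result["success"] is False
```

Because `monkeypatch` restores the environment after the test, a real key on the developer's machine is left untouched.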



2. Status ignores API key 🐞 Bug ☼ Reliability
Description
The new API tools leave dependencies = [] and do not override get_status(), so they will be
reported AVAILABLE even when their required API keys are missing. This can cause ToolRegistry menus
and provider selection to route to these tools and then immediately fail at execution time.
Code

tools/graphics/qwen_image.py[R37-42]

+    dependencies = []
+    install_instructions = (
+        "Set DASHSCOPE_API_KEY to your Alibaba Cloud DashScope API key.\n"
+        "  export DASHSCOPE_API_KEY=your_key_here\n"
+        "Get a key at https://dashscope.aliyun.com/"
+    )
Evidence
BaseTool.get_status() only checks dependencies, and registry menus categorize providers based on
get_status(). The new tools have empty dependencies and no get_status() override, unlike
existing API tools that explicitly return UNAVAILABLE without keys.

tools/base_tool.py[199-224]
tools/tool_registry.py[270-303]
tools/graphics/openai_image.py[97-101]
tools/graphics/qwen_image.py[37-42]
tools/audio/stepfun_tts.py[41-46]
tools/audio/xiaomi_tts.py[42-47]
tools/graphics/stepfun_image.py[43-48]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
New API-backed tools currently report AVAILABLE even when their required API key env vars are not set, because they neither declare env dependencies nor override `get_status()`.

## Issue Context
- `ToolRegistry.provider_menu()` and selectors use `tool.get_status()` to decide what’s configured/available.
- `BaseTool.get_status()` only checks `dependencies` via `check_dependencies()`.

## Fix Focus Areas
- tools/audio/stepfun_tts.py[41-46]
- tools/audio/xiaomi_tts.py[42-47]
- tools/graphics/qwen_image.py[37-42]
- tools/graphics/stepfun_image.py[43-48]

## Implementation notes
Pick one consistent pattern used in the repo:
1) Override `get_status()` in each tool to return UNAVAILABLE when the corresponding env var is missing (see `OpenAIImage.get_status()`), OR
2) Declare env dependencies, e.g. `dependencies = ["env:STEPFUN_API_KEY"]`, `dependencies = ["env:XIAOMI_API_KEY"]`, `dependencies = ["env:DASHSCOPE_API_KEY"]`.

Either approach will make registry menus and provider selection reflect reality.
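
Option 1 from the implementation notes might look like the sketch below. `ToolStatus` and `QwenImageSketch` are hypothetical stand-ins; the repository's real status enum and `OpenAIImage.get_status()` are not shown in this review, so their exact names and return values are assumptions.

```python
import os
from enum import Enum


class ToolStatus(Enum):
    """Hypothetical stand-in for the repository's status enum."""

    AVAILABLE = "available"
    UNAVAILABLE = "unavailable"


class QwenImageSketch:
    """Minimal stand-in for an API-backed tool that needs an API key."""

    api_key_env = "DASHSCOPE_API_KEY"

    def get_status(self) -> ToolStatus:
        # Treat an unset or empty variable as "not configured".
        if not os.environ.get(self.api_key_env):
            return ToolStatus.UNAVAILABLE
        return ToolStatus.AVAILABLE
```

With this check in place, a registry that filters on `get_status()` would stop offering the tool when its key is absent, instead of failing at execution time.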




Remediation recommended

3. Nonstandard output contract 🐞 Bug ⚙ Maintainability
Description
The new drivers write output files but don’t populate ToolResult.artifacts and return nonstandard
keys (output_path/output_paths) instead of the repo’s common data['output'] + optional
data['outputs']. Downstream code/tests that assume the standard contract will not find the
produced artifacts.
Code

tools/graphics/qwen_image.py[R238-249]

+        return ToolResult(
+            success=True,
+            data={
+                "provider": "qwen-dashscope",
+                "model": model,
+                "prompt": prompt,
+                "output_paths": saved_paths,
+                "style": style,
+                "size": size,
+                "n": len(saved_paths),
+            },
+        )
Evidence
Other providers in this repo consistently return data['output'] and populate artifacts, while
the new drivers return output_path/output_paths without artifacts.

tools/audio/openai_tts.py[151-163]
tools/graphics/grok_image.py[280-292]
tools/audio/stepfun_tts.py[166-176]
tools/audio/xiaomi_tts.py[201-211]
tools/graphics/qwen_image.py[238-249]
tools/graphics/stepfun_image.py[168-177]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
New drivers return result payloads that deviate from existing provider conventions: they omit `ToolResult.artifacts` and don’t provide `data['output']` (primary output path). This can break callers that rely on consistent fields.

## Issue Context
Existing image/TTS providers typically:
- set `data['output']` to the primary file path
- optionally set `data['outputs']` to a list
- set `artifacts=[...]` to produced file paths

## Fix Focus Areas
- tools/audio/stepfun_tts.py[166-176]
- tools/audio/xiaomi_tts.py[201-211]
- tools/graphics/qwen_image.py[238-249]
- tools/graphics/stepfun_image.py[168-177]

## Implementation notes
- For TTS tools: add `data['output']=str(out)` and `artifacts=[str(out)]` (you can keep `output_path` too for backwards-compat).
- For QwenImage: set `data['output']=saved_paths[0]`, `data['outputs']=saved_paths`, and `artifacts=saved_paths`.
- For StepFunImage: set `data['output']=str(out)` and `artifacts=[str(out)]` (keep `output_path` if desired).
- Consider also setting `model=...` on ToolResult for consistency with other providers.
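
The QwenImage note above can be sketched as a small result builder. The `ToolResult` dataclass here is a simplified stand-in with only the fields this review mentions (`success`, `data`, `artifacts`); the real class may have more, and `build_image_result` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolResult:
    """Simplified stand-in for the repository's ToolResult."""

    success: bool
    data: dict[str, Any] = field(default_factory=dict)
    artifacts: list[str] = field(default_factory=list)


def build_image_result(saved_paths: list[str], model: str, prompt: str) -> ToolResult:
    """Build a result that follows the data['output'] / artifacts convention."""
    if not saved_paths:
        return ToolResult(success=False, data={"error": "no images were saved"})
    return ToolResult(
        success=True,
        data={
            "provider": "qwen-dashscope",
            "model": model,
            "prompt": prompt,
            "output": saved_paths[0],           # primary output path
            "outputs": list(saved_paths),       # all output paths
            "output_paths": list(saved_paths),  # kept for backwards compatibility
            "n": len(saved_paths),
        },
        artifacts=list(saved_paths),
    )
```

Keeping `output_paths` alongside the standard keys lets existing callers of the new drivers continue to work while downstream code reads `output` and `artifacts`.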




    def _get_api_key(self) -> str | None:
        return os.environ.get("STEPFUN_API_KEY")

    def execute(self, inputs: dict[str, Any], runtime: ToolRuntime) -> ToolResult:


Action required

1. Execute signature mismatch 🐞 Bug ≡ Correctness


Comment on lines +37 to +42
    dependencies = []
    install_instructions = (
        "Set DASHSCOPE_API_KEY to your Alibaba Cloud DashScope API key.\n"
        "  export DASHSCOPE_API_KEY=your_key_here\n"
        "Get a key at https://dashscope.aliyun.com/"
    )


Action required

2. Status ignores API key 🐞 Bug ☼ Reliability

