feat: add StepFun, Xiaomi, and Qwen generation drivers by bezko · Pull Request #174 · calesthio/OpenMontage

bezko · 2026-06-24T17:55:41Z

Summary

Adds 4 new generation drivers to OpenMontage, covering 3 providers with TTS, image editing, and image generation capabilities.

New Drivers

| Driver | Provider | Model | Type | Env Var |
| --- | --- | --- | --- | --- |
| `stepfun_tts` | StepFun | stepaudio-2.5-tts | TTS | `STEPFUN_API_KEY` |
| `stepfun_image` | StepFun | step-image-edit-2 | Image editing | `STEPFUN_API_KEY` |
| `xiaomi_tts` | Xiaomi | mimo-v2.5-tts (+ voiceclone, voicedesign) | TTS | `XIAOMI_API_KEY` |
| `qwen_image` | Qwen/DashScope | wanx2.1-t2i-turbo/plus | Image gen | `DASHSCOPE_API_KEY` |
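
The Env Var column suggests that each driver is usable only when its API key is set. A minimal sketch of such an availability check follows. The `DRIVER_ENV_VARS` mapping and the `available_drivers` helper are hypothetical names used for illustration and are not taken from the PR's code; only the driver names and variable names come from the table.

```python
import os

# Driver name -> environment variable, transcribed from the table above.
# The mapping and helper names are hypothetical, not OpenMontage's API.
DRIVER_ENV_VARS = {
    "stepfun_tts": "STEPFUN_API_KEY",
    "stepfun_image": "STEPFUN_API_KEY",
    "xiaomi_tts": "XIAOMI_API_KEY",
    "qwen_image": "DASHSCOPE_API_KEY",
}


def available_drivers(env=None):
    """Return the sorted driver names whose API key is present and non-empty."""
    env = os.environ if env is None else env
    return sorted(name for name, var in DRIVER_ENV_VARS.items() if env.get(var))
```

Calling `available_drivers()` with no argument reads the process environment; passing a plain dict makes the check easy to exercise without touching real credentials.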

Provider Review

All providers were evaluated for TTS, image, and video generation:

| Provider | TTS | Image | Video | Status |
| --- | --- | --- | --- | --- |
| StepFun | stepaudio-2.5-tts | step-image-edit-2 | No | Plan included, $0 |
| Xiaomi | mimo-v2.5-tts + variants | No | No | Plan included, $0 |
| Qwen/DashScope | No | wanx2.1-t2i | No | Free tier + paid |
| Kimi/Moonshot | No | No | No | Text-only |
| Featherless | No | No | No | Text LLMs only |
| OpenAI Sora | No | No | ~~discontinued~~ | No longer available |
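
The review table can also be read as a capability matrix. The sketch below encodes it as plain data and looks up which providers cover a given capability. The `PROVIDER_MODELS` structure and the `providers_for` helper are assumptions made for illustration; the PR does not state that the drivers are selected this way.

```python
# Capability matrix transcribed from the provider review table.
# None means the review found no model for that capability.
# Structure and helper name are hypothetical, for illustration only.
PROVIDER_MODELS = {
    "StepFun": {"tts": "stepaudio-2.5-tts", "image": "step-image-edit-2", "video": None},
    "Xiaomi": {"tts": "mimo-v2.5-tts", "image": None, "video": None},
    "Qwen/DashScope": {"tts": None, "image": "wanx2.1-t2i", "video": None},
    "Kimi/Moonshot": {"tts": None, "image": None, "video": None},
    "Featherless": {"tts": None, "image": None, "video": None},
    "OpenAI Sora": {"tts": None, "image": None, "video": None},  # discontinued
}


def providers_for(capability):
    """List providers, in table order, that have a model for the capability."""
    return [
        name
        for name, models in PROVIDER_MODELS.items()
        if models.get(capability)
    ]
```

With this data, a lookup for `"video"` returns an empty list, which matches the review's finding that none of the evaluated providers currently offers video generation.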

Key Features

- StepFun TTS: Chinese/English, multilingual, $0 cost within plan
- StepFun Image: Instruction-based editing (text + image input)
- Xiaomi TTS: Voice cloning from reference audio, voice design from text description
- Qwen Image: 10 style presets, multilingual, cost-effective

Files Changed

- `tools/audio/stepfun_tts.py` (new)
- `tools/audio/xiaomi_tts.py` (new)
- `tools/graphics/stepfun_image.py` (new)
- `tools/graphics/qwen_image.py` (new)
- `docs/PROVIDERS.md` (updated with all providers)
- `.env.example` (added `STEPFUN_API_KEY`, `XIAOMI_API_KEY`, `DASHSCOPE_API_KEY`)

New drivers:
- stepfun_tts: StepAudio 2.5 TTS (Chinese/English, included in plan)
- stepfun_image: Step Image Edit 2 (instruction-based image editing)
- xiaomi_tts: MiMo V2.5 TTS with voice cloning and voice design
- qwen_image: Wanx image synthesis via DashScope (turbo + plus)

All providers reviewed:
- StepFun: TTS + image editing (stepaudio-2.5-tts, step-image-edit-2)
- Xiaomi: TTS with voice clone/design (mimo-v2.5-tts variants)
- Qwen/DashScope: image generation (wanx2.1-t2i)
- Kimi/Moonshot: text-only, no generation APIs
- Featherless: text-only LLM gateway
- Sora: no longer available

Sora driver removed (API discontinued).

New drivers:
- stepfun_tts: StepAudio 2.5 TTS (Chinese/English, included in plan)
- stepfun_image: Step Image Edit 2 (instruction-based image editing)
- xiaomi_tts: MiMo V2.5 TTS with voice cloning and voice design
- qwen_image: Wanx image synthesis via DashScope (turbo + plus)

All providers reviewed:
- StepFun: TTS + image editing (stepaudio-2.5-tts, step-image-edit-2)
- Xiaomi: TTS with voice clone/design (mimo-v2.5-tts variants)
- Qwen/DashScope: image generation (wanx2.1-t2i)
- Kimi/Moonshot: text-only, no generation APIs
- Featherless: text-only LLM gateway
- OpenAI Sora: discontinued (no driver added)

Co-authored-by: Mathieu <mathieu@kwot.in>

Co-authored-by: calesthio <celesthioailabs@gmail.com>

* docs: add provider viability review guidance

* fix: use --props=<path> equals form for Remotion render

  On Windows, passing --props and the JSON path as two separate CLI arguments causes Remotion to mis-parse the value due to platform quote escaping, failing with "neither valid JSON nor a file path to a valid JSON file". Switch to the --props=<path> equals form, which Remotion recommends for file paths and which works consistently across platforms.

  Fixes calesthio#172

* chore: add issue templates and a pull-request template

  The repo had no .github issue or PR templates, so bug reports arrived with inconsistent detail and questions landed on the tracker instead of Discussions. Add GitHub community templates:
  - ISSUE_TEMPLATE/bug_report.yml: structured form (OS, pipeline, runtime, repro, expected vs actual, logs)
  - ISSUE_TEMPLATE/feature_request.yml: problem / solution / alternatives
  - ISSUE_TEMPLATE/config.yml: disables blank issues and routes questions, ideas, and show-and-tell to the existing Discussions categories
  - PULL_REQUEST_TEMPLATE.md: summary, linked issue, testing, checklist

  Additive only; no source changes.

  Closes calesthio#189

* fix: resolve skill loading warnings and correct video-toolkit naming

* ci: add GitHub Actions validation pipeline

---------

Co-authored-by: calesthio <celesthioailabs@gmail.com>
Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
Co-authored-by: Harsh Dadiya Wappnet <harsh.dadiya@wappnet.com>
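
The `--props=<path>` change described in the commit message above can be sketched as follows. This is a minimal illustration and not the repository's actual render code: the `remotion_render_args` helper, the composition id, and the file names are hypothetical. It assumes Remotion is invoked through `npx remotion render` with the entry point omitted.

```python
def remotion_render_args(composition_id, output_path, props_path):
    """Build the argument list for a Remotion render call.

    The props file travels as one `--props=<path>` token, so no separate
    path argument exists for the platform's quoting rules to mangle.
    Helper name and parameters are hypothetical.
    """
    return [
        "npx",
        "remotion",
        "render",
        composition_id,
        str(output_path),
        f"--props={props_path}",
    ]


# Example with placeholder values; nothing is executed here.
args = remotion_render_args("Main", "out.mp4", "props.json")
```

The resulting list could then be handed to `subprocess.run(args, check=True)` without `shell=True`, which keeps each element as a single argument.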

bprasun366

Rrjfo jodhpur hunch hub hdgj unwritten hgssjk reduce hhsyjc. Uhgnk

bezko requested a review from calesthio as a code owner June 24, 2026 17:55

bezko force-pushed the main branch from 76935c2 to c3e276d Compare June 24, 2026 18:07

bezko changed the title ~~feat: add OpenAI Sora and Qwen image generation drivers~~ feat: add Qwen image generation driver (DashScope) Jun 24, 2026

bezko force-pushed the main branch from c3e276d to b795f92 Compare June 24, 2026 18:30

bezko changed the title ~~feat: add Qwen image generation driver (DashScope)~~ feat: add StepFun, Xiaomi, and Qwen generation drivers Jun 24, 2026

bezko and others added 3 commits June 25, 2026 12:56

docs: add provider viability review guidance (#2)

3bbf8ca

Co-authored-by: calesthio <celesthioailabs@gmail.com>

bprasun366 reviewed Jun 27, 2026

bezko wants to merge 4 commits into calesthio:main from bezko:main
