feat: add StepFun, Xiaomi, and Qwen generation drivers #174

Open
bezko wants to merge 4 commits into calesthio:main from bezko:main

bezko commented Jun 24, 2026

Summary

Adds 4 new generation drivers to OpenMontage, covering 3 providers with TTS, image editing, and image generation capabilities.

New Drivers

| Driver | Provider | Model | Type | Env Var |
|---|---|---|---|---|
| stepfun_tts | StepFun | stepaudio-2.5-tts | TTS | STEPFUN_API_KEY |
| stepfun_image | StepFun | step-image-edit-2 | Image editing | STEPFUN_API_KEY |
| xiaomi_tts | Xiaomi | mimo-v2.5-tts (+ voiceclone, voicedesign) | TTS | XIAOMI_API_KEY |
| qwen_image | Qwen/DashScope | wanx2.1-t2i-turbo/plus | Image gen | DASHSCOPE_API_KEY |
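
As a rough illustration of how the table above maps drivers to credentials, the sketch below reports which of the new drivers are usable in a given environment. The `DRIVER_ENV_VARS` mapping and the `available_drivers` helper are hypothetical names chosen for this example and are not taken from the PR's code; only the driver names and environment variable names come from the table.

```python
import os

# Driver name -> required environment variable, transcribed from the table.
# The mapping itself is an illustrative assumption, not code from this PR.
DRIVER_ENV_VARS = {
    "stepfun_tts": "STEPFUN_API_KEY",
    "stepfun_image": "STEPFUN_API_KEY",
    "xiaomi_tts": "XIAOMI_API_KEY",
    "qwen_image": "DASHSCOPE_API_KEY",
}


def available_drivers(env=None):
    """Return the sorted driver names whose API key is set and non-empty."""
    env = os.environ if env is None else env
    return sorted(name for name, var in DRIVER_ENV_VARS.items() if env.get(var))
```

Because both StepFun drivers share `STEPFUN_API_KEY`, setting that single key would enable `stepfun_tts` and `stepfun_image` together under this sketch.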

Provider Review

All providers evaluated for TTS, image, and video generation:

| Provider | TTS | Image | Video | Status |
|---|---|---|---|---|
| StepFun | stepaudio-2.5-tts | step-image-edit-2 | No | Plan included, $0 |
| Xiaomi | mimo-v2.5-tts + variants | No | No | Plan included, $0 |
| Qwen/DashScope | No | wanx2.1-t2i | No | Free tier + paid |
| Kimi/Moonshot | No | No | No | Text-only |
| Featherless | No | No | No | Text LLMs only |
| OpenAI Sora | No | No | discontinued | No longer available |
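
The review above can also be read as a capability matrix. The following is a minimal sketch of that reading, assuming a plain dictionary representation; `PROVIDER_CAPABILITIES`, `providers_with`, and `viable_providers` are hypothetical names for illustration and do not appear in the PR.

```python
# Capability matrix transcribed from the review table.
# None means the provider offers no usable model for that capability.
PROVIDER_CAPABILITIES = {
    "StepFun": {"tts": "stepaudio-2.5-tts", "image": "step-image-edit-2", "video": None},
    "Xiaomi": {"tts": "mimo-v2.5-tts", "image": None, "video": None},
    "Qwen/DashScope": {"tts": None, "image": "wanx2.1-t2i", "video": None},
    "Kimi/Moonshot": {"tts": None, "image": None, "video": None},
    "Featherless": {"tts": None, "image": None, "video": None},
    # Video is listed as discontinued in the table, so it is treated as unavailable.
    "OpenAI Sora": {"tts": None, "image": None, "video": None},
}


def providers_with(capability):
    """Return providers (in table order) that offer the given capability."""
    return [name for name, caps in PROVIDER_CAPABILITIES.items() if caps[capability]]


def viable_providers():
    """Return providers that offer at least one generation capability."""
    return [name for name, caps in PROVIDER_CAPABILITIES.items() if any(caps.values())]
```

Under this representation, the three providers that receive drivers in this PR are exactly the ones returned by `viable_providers()`, which matches the outcome of the review.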

Key Features

  • StepFun TTS: Chinese/English, multilingual, $0 cost within plan
  • StepFun Image: Instruction-based editing (text+image input)
  • Xiaomi TTS: Voice cloning from reference audio, voice design from text description
  • Qwen Image: 10 style presets, multilingual, cost-effective

Files Changed

  • tools/audio/stepfun_tts.py — new
  • tools/audio/xiaomi_tts.py — new
  • tools/graphics/stepfun_image.py — new
  • tools/graphics/qwen_image.py — new
  • docs/PROVIDERS.md — updated with all providers
  • .env.example — added STEPFUN_API_KEY, XIAOMI_API_KEY, DASHSCOPE_API_KEY

bezko requested a review from calesthio as a code owner June 24, 2026 17:55
bezko changed the title from "feat: add OpenAI Sora and Qwen image generation drivers" to "feat: add Qwen image generation driver (DashScope)" Jun 24, 2026
New drivers:
- stepfun_tts: StepAudio 2.5 TTS (Chinese/English, included in plan)
- stepfun_image: Step Image Edit 2 (instruction-based image editing)
- xiaomi_tts: MiMo V2.5 TTS with voice cloning and voice design
- qwen_image: Wanx image synthesis via DashScope (turbo + plus)

All providers reviewed:
- StepFun: TTS + image editing (stepaudio-2.5-tts, step-image-edit-2)
- Xiaomi: TTS with voice clone/design (mimo-v2.5-tts variants)
- Qwen/DashScope: image generation (wanx2.1-t2i)
- Kimi/Moonshot: text-only, no generation APIs
- Featherless: text-only LLM gateway
- Sora: no longer available

Sora driver removed (API discontinued).
bezko changed the title from "feat: add Qwen image generation driver (DashScope)" to "feat: add StepFun, Xiaomi, and Qwen generation drivers" Jun 24, 2026
bezko and others added 3 commits June 25, 2026 12:56
New drivers:
- stepfun_tts: StepAudio 2.5 TTS (Chinese/English, included in plan)
- stepfun_image: Step Image Edit 2 (instruction-based image editing)
- xiaomi_tts: MiMo V2.5 TTS with voice cloning and voice design
- qwen_image: Wanx image synthesis via DashScope (turbo + plus)

All providers reviewed:
- StepFun: TTS + image editing (stepaudio-2.5-tts, step-image-edit-2)
- Xiaomi: TTS with voice clone/design (mimo-v2.5-tts variants)
- Qwen/DashScope: image generation (wanx2.1-t2i)
- Kimi/Moonshot: text-only, no generation APIs
- Featherless: text-only LLM gateway
- OpenAI Sora: discontinued (no driver added)

Co-authored-by: Mathieu <mathieu@kwot.in>
Co-authored-by: calesthio <celesthioailabs@gmail.com>
* docs: add provider viability review guidance

* fix: use --props=<path> equals form for Remotion render

On Windows, passing --props and the JSON path as two separate CLI
arguments causes Remotion to mis-parse the value due to platform quote
escaping, failing with "neither valid JSON nor a file path to a valid
JSON file". Switch to the --props=<path> equals form, which Remotion
recommends for file paths and which works consistently across
platforms.

Fixes calesthio#172
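
A minimal sketch of the equals form described in this commit message, assuming the render is launched from Python as an argument list. The helper name `build_render_command` and the example entry point, composition ID, and paths are hypothetical; the actual call site changed by the commit is not shown in this conversation.

```python
def build_render_command(entry, composition_id, output, props_path):
    """Build an argv list for `npx remotion render`.

    The props file is passed as a single `--props=<path>` token rather than
    as two separate arguments (`--props`, `<path>`).
    """
    return [
        "npx", "remotion", "render",
        str(entry), str(composition_id), str(output),
        f"--props={props_path}",
    ]


# Hypothetical values, for illustration only.
cmd = build_render_command("src/index.ts", "Main", "out/video.mp4", "out/props.json")
```

A list like `cmd` could then be handed to `subprocess.run(cmd, check=True)`, which keeps the path and the flag together in one argument regardless of platform quoting.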

* chore: add issue templates and a pull-request template

The repo had no .github issue or PR templates, so bug reports arrived
with inconsistent detail and questions landed on the tracker instead of
Discussions.

Add GitHub community templates:
- ISSUE_TEMPLATE/bug_report.yml: structured form (OS, pipeline, runtime,
  repro, expected vs actual, logs)
- ISSUE_TEMPLATE/feature_request.yml: problem / solution / alternatives
- ISSUE_TEMPLATE/config.yml: disables blank issues and routes questions,
  ideas, and show-and-tell to the existing Discussions categories
- PULL_REQUEST_TEMPLATE.md: summary, linked issue, testing, checklist

Additive only; no source changes.

Closes calesthio#189

* fix: resolve skill loading warnings and correct video-toolkit naming

* ci: add GitHub Actions validation pipeline

---------

Co-authored-by: calesthio <celesthioailabs@gmail.com>
Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
Co-authored-by: Harsh Dadiya Wappnet <harsh.dadiya@wappnet.com>

bprasun366 left a comment

Rrjfo jodhpur hunch hub hdgj unwritten hgssjk reduce hhsyjc. Uhgnk
