fix(qwen3_5): add linear_attn entries to base_model_tp_plan by muhamedfazalps · Pull Request #46847 · huggingface/transformers

muhamedfazalps · 2026-06-23T13:56:29Z

Qwen3.5 uses linear_attention in ~75% of layers. base_model_tp_plan was missing linear_attn entries, causing OOM at TP>1.
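Here is a rough back-of-the-envelope sketch of why the missing entries matter. A module that no plan entry matches is assumed to stay replicated in full on every rank, so only the planned share of the weights shrinks as TP grows. The layer count and per-layer sizes below are hypothetical placeholders, not Qwen3.5's real configuration, and MLP weights are left out for simplicity.

```python
def per_rank_params(n_layers, linear_frac, attn_params, linear_params, tp, linear_sharded):
    """Token-mixer parameters held by one rank (hypothetical, arbitrary units).

    Full-attention layers are assumed to be sharded by the existing plan.
    Linear-attention layers are divided by tp only when linear_sharded is True;
    otherwise every rank keeps a full replica.
    """
    n_linear = round(n_layers * linear_frac)  # e.g. ~75% of layers
    n_full = n_layers - n_linear
    full_part = n_full * attn_params / tp
    linear_part = n_linear * linear_params / (tp if linear_sharded else 1)
    return full_part + linear_part


# Placeholder sizes: 48 layers, 100 units of mixer weights per layer.
print(per_rank_params(48, 0.75, 100, 100, tp=1, linear_sharded=False))  # 4800.0
print(per_rank_params(48, 0.75, 100, 100, tp=4, linear_sharded=False))  # 3900.0
print(per_rank_params(48, 0.75, 100, 100, tp=4, linear_sharded=True))   # 1200.0
```

With these made-up numbers, going to TP=4 only trims per-rank mixer weights from 4800 to 3900 units instead of 1200. A gap like this can show up as OOM.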

Fix: add colwise_gather_output for in_proj_qkv, in_proj_z, in_proj_b, in_proj_a, out_proj.

This shards the weights and all-gathers the activations before conv1d, which needs the full channel dimension.
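The sketch below shows roughly what the added entries might look like, plus a small check for projections that no plan key covers. The `layers.*.linear_attn.<name>` key pattern is an assumption modeled on how self_attn/mlp keys are commonly written in `base_model_tp_plan`, and `old_plan` is an abbreviated illustration, not the real Qwen3.5 plan. The `uncovered` helper is hypothetical and uses glob matching via `fnmatch.fnmatchcase`, which may differ from the library's own key matcher.

```python
from fnmatch import fnmatchcase

# Assumed key shape; the five projection names come from this PR.
LINEAR_ATTN_ENTRIES = {
    f"layers.*.linear_attn.{name}": "colwise_gather_output"
    for name in ("in_proj_qkv", "in_proj_z", "in_proj_b", "in_proj_a", "out_proj")
}


def uncovered(module_names, tp_plan):
    """Return module names that no tp_plan key matches (hypothetical helper)."""
    return [m for m in module_names if not any(fnmatchcase(m, key) for key in tp_plan)]


# Abbreviated, illustrative "before" plan: only self_attn/mlp keys.
old_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.mlp.down_proj": "rowwise",
}
new_plan = {**old_plan, **LINEAR_ATTN_ENTRIES}

modules = [
    "layers.0.linear_attn.in_proj_qkv",
    "layers.0.linear_attn.out_proj",
    "layers.3.self_attn.q_proj",
]
print(uncovered(modules, old_plan))  # ['layers.0.linear_attn.in_proj_qkv', 'layers.0.linear_attn.out_proj']
print(uncovered(modules, new_plan))  # []
```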

Fixes huggingface#46846

Qwen3.5 is a hybrid model where ~75% of decoder layers use linear_attention (Qwen3_5GatedDeltaNet). The base_model_tp_plan only covered self_attn and mlp layers, leaving linear_attn weights unsharded. This caused OOM at TP>1 and a RuntimeError during generate.

Added colwise_gather_output entries for all linear_attn projections:
- in_proj_qkv, in_proj_z, in_proj_b, in_proj_a, out_proj

colwise_gather_output shards weights across ranks (fixing OOM) and all-gathers activations before the depthwise Conv1d, which requires the full channel dimension.
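The following is a toy, single-process sketch of the shard-then-gather idea. Each simulated rank keeps only a slice of the projection's output rows and computes its partial output. Concatenating the partials in rank order stands in for the all-gather and recovers the unsharded activation exactly, so a depthwise conv whose kernels cover all channels (assumed unsharded here) sees the same input as in the single-GPU case. Plain Python lists stand in for tensors and for the collective; this is not how transformers implements `colwise_gather_output`.

```python
def matvec(W, x):
    # y[i] = sum_j W[i][j] * x[j] for an nn.Linear-style [out, in] weight.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def shard_rows(W, tp):
    # "Colwise": split the output features (rows of W) evenly across tp ranks.
    assert len(W) % tp == 0
    step = len(W) // tp
    return [W[r * step:(r + 1) * step] for r in range(tp)]


def all_gather(parts):
    # Stand-in for the collective: concatenate each rank's slice in rank order.
    return [v for part in parts for v in part]


def depthwise_conv1d(seq, kernels):
    # Causal depthwise conv with zero left-padding: channel c at time t
    # mixes only channel c at times t-k+1..t, so it needs every channel present.
    k = len(kernels[0])
    return [
        [sum(kernels[c][i] * seq[t - (k - 1) + i][c]
             for i in range(k) if t - (k - 1) + i >= 0)
         for c in range(len(kernels))]
        for t in range(len(seq))
    ]


W = [[1, 0], [0, 1], [1, 1], [2, -1]]  # 4 output channels, 2 input features
x_seq = [[1, 2], [3, 4], [5, 6]]       # 3 time steps
tp = 2

full = [matvec(W, x) for x in x_seq]
# Each rank only holds 2 of the 4 channels; the gather restores all 4.
gathered = [all_gather([matvec(Ws, x) for Ws in shard_rows(W, tp)]) for x in x_seq]
assert gathered == full

kernels = [[1, 1]] * 4  # one size-2 kernel per channel
print(depthwise_conv1d(gathered, kernels))  # [[1, 2, 3, 0], [4, 6, 10, 2], [8, 10, 18, 6]]
```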

…lar file

The check_modular_conversion CI step compares the configuration generated from modular_qwen3_5.py with configuration_qwen3_5.py. The modular file includes linear_attn entries in base_model_tp_plan, but the configuration file was missing them, causing the consistency check to fail.
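The consistency failure can be pictured with a small sketch: render the plan as it would be generated from the modular source, then diff it against the checked-in configuration. This only illustrates the idea with `difflib`. The repository's real `check_modular_conversion` utility works on whole files with its own logic, and the `plan_block` and `out_of_sync` helpers below are hypothetical.

```python
import difflib


def plan_block(entries):
    # Render a tp_plan dict as source lines (hypothetical, simplified formatting).
    lines = ["base_model_tp_plan = {"]
    lines += [f'    "{k}": "{v}",' for k, v in entries.items()]
    lines.append("}")
    return lines


def out_of_sync(generated, checked_in):
    """Return unified-diff lines; an empty list means the two versions agree."""
    return list(difflib.unified_diff(
        checked_in, generated,
        fromfile="configuration_qwen3_5.py",
        tofile="generated from modular_qwen3_5.py",
        lineterm="",
    ))


checked_in = {"layers.*.mlp.down_proj": "rowwise"}  # missing linear_attn entries
modular = {**checked_in, "layers.*.linear_attn.out_proj": "colwise_gather_output"}

diff = out_of_sync(plan_block(modular), plan_block(checked_in))
print("\n".join(diff))  # non-empty: the check would fail
print(out_of_sync(plan_block(modular), plan_block(modular)))  # [] once synced
```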

github-actions · 2026-06-23T15:58:57Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_5

github-actions · 2026-06-23T16:05:20Z

CI Dashboard: View test results in Grafana

Rocketknight1 · 2026-06-24T10:52:34Z

cc @vasqu since you were replying in the original issue!

muhamedfazalps added 2 commits June 23, 2026 19:26

muhamedfazalps wants to merge 2 commits into huggingface:main from muhamedfazalps:fix/qwen3-5-tp-plan