fix(qwen3_5): add linear_attn entries to base_model_tp_plan#46847
Open
muhamedfazalps wants to merge 2 commits into
Fixes huggingface#46846

Qwen3.5 is a hybrid model where ~75% of decoder layers use linear_attention (Qwen3_5GatedDeltaNet). The base_model_tp_plan only covered self_attn and mlp layers, leaving linear_attn weights unsharded. This caused OOM at TP>1 and a RuntimeError during generate.

Added colwise_gather_output entries for all linear_attn projections:
- in_proj_qkv
- in_proj_z
- in_proj_b
- in_proj_a
- out_proj

colwise_gather_output shards weights across ranks (fixing OOM) and all-gathers activations before the depthwise Conv1d, which requires the full channel dimension.
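For illustration, the added entries could look roughly like the sketch below. The `layers.*.` key prefix follows the usual base_model_tp_plan pattern style and is an assumption here, as is the helper `tp_style_for`, which is hypothetical and only mimics how a glob-style plan key would map a module name to a sharding style. It is not the library's matching code.

```python
from fnmatch import fnmatchcase

# Assumed excerpt of the plan: only the five linear_attn projections named
# in this PR are listed; the exact key strings are an assumption.
LINEAR_ATTN_TP_PLAN = {
    "layers.*.linear_attn.in_proj_qkv": "colwise_gather_output",
    "layers.*.linear_attn.in_proj_z": "colwise_gather_output",
    "layers.*.linear_attn.in_proj_b": "colwise_gather_output",
    "layers.*.linear_attn.in_proj_a": "colwise_gather_output",
    "layers.*.linear_attn.out_proj": "colwise_gather_output",
}


def tp_style_for(module_name, plan):
    """Hypothetical helper: return the style of the first matching pattern, else None."""
    for pattern, style in plan.items():
        if fnmatchcase(module_name, pattern):
            return style
    return None  # no entry: the module would be left unsharded


if __name__ == "__main__":
    print(tp_style_for("layers.3.linear_attn.in_proj_qkv", LINEAR_ATTN_TP_PLAN))
    print(tp_style_for("layers.3.self_attn.q_proj", LINEAR_ATTN_TP_PLAN))
```

A module with no matching entry returns None in this sketch, which corresponds to the unsharded linear_attn weights described above.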
…lar file

The check_modular_conversion CI step compares the generated configuration (from modular_qwen3_5.py) with configuration_qwen3_5.py. The modular file includes linear_attn entries in base_model_tp_plan but the configuration file was missing them, causing the consistency check to fail.
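The mismatch can be pictured as a key difference between two plan dictionaries. The sketch below is not the actual CI script; `missing_tp_entries` and both example dictionaries are hypothetical, and the key strings are assumed.

```python
def missing_tp_entries(modular_plan, config_plan):
    """Hypothetical helper: plan keys present in the modular file but absent from the config file."""
    return sorted(key for key in modular_plan if key not in config_plan)


# Assumed, shortened plans for illustration only.
modular_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.linear_attn.in_proj_qkv": "colwise_gather_output",
    "layers.*.linear_attn.out_proj": "colwise_gather_output",
}
config_plan = {
    "layers.*.self_attn.q_proj": "colwise",
}

if __name__ == "__main__":
    # A non-empty result stands in for the failing consistency check.
    print(missing_tp_entries(modular_plan, config_plan))
```

Once the same entries are added to the configuration file, the difference is empty and the two files agree.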
Contributor
[For maintainers] Suggested jobs to run (before merge) run-slow: qwen3_5
Contributor
CI Dashboard: View test results in Grafana
Member
cc @vasqu since you were replying in the original issue!
Fixes #46846
Qwen3.5 uses linear_attention in ~75% of layers. base_model_tp_plan was missing linear_attn entries, causing OOM at TP>1.
Fix: add colwise_gather_output for in_proj_qkv, in_proj_z, in_proj_b, in_proj_a, out_proj.
This shards the weights across ranks and all-gathers the activations before the conv1d, which needs the full channel dimension.
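A minimal pure-Python sketch of the idea, assuming the output features divide evenly by the number of ranks: each rank holds a slice of the projection's output rows, computes its part of the output, and the parts are concatenated (standing in for the all-gather) so the next op sees every channel. The function names are hypothetical and the ranks are simulated in a loop; no distributed calls are made.

```python
def matvec(weight, x):
    """Compute y = W x for a weight stored as a list of output rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def shard_output_rows(weight, world_size):
    """Split the output rows evenly across ranks (assumes divisibility)."""
    per_rank = len(weight) // world_size
    return [weight[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]


def colwise_then_gather(weight, x, world_size):
    """Each simulated rank computes its output slice; concatenation plays the all-gather."""
    partial_outputs = [matvec(shard, x) for shard in shard_output_rows(weight, world_size)]
    return [value for part in partial_outputs for value in part]


if __name__ == "__main__":
    weight = [[1, 0], [0, 1], [2, 3], [4, 5]]  # 4 output channels, 2 input features
    x = [1, 2]
    full = matvec(weight, x)
    gathered = colwise_then_gather(weight, x, world_size=2)
    print(full == gathered)
```

Each simulated rank stores only half of the weight rows, which is the memory saving, while the gathered result matches the unsharded output channel for channel.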