Skip to content

[ROCm] gpt-oss: route FA3 to aiter-flash-attn, generate ROCm fixtures#46837

Draft
Abdennacer-Badaoui wants to merge 7 commits into
huggingface:mainfrom
Abdennacer-Badaoui:fix-gptoss
Draft

[ROCm] gpt-oss: route FA3 to aiter-flash-attn, generate ROCm fixtures#46837
Abdennacer-Badaoui wants to merge 7 commits into
huggingface:mainfrom
Abdennacer-Badaoui:fix-gptoss

Conversation

@Abdennacer-Badaoui

Copy link
Copy Markdown
Member

On ROCm, gpt-oss now routes the FA3-style attention to kernels-community/aiter-flash-attn (_compatible_flash_implementations) and generates separate ROCm fixtures (non-distributed + tp_size=2). Configs that depend on kernels-community/megablocks are skipped because it doesn't ship a ROCm build for torch 2.10 (the version the AMD CI runs today) we have the ones for 2.11 and 2.12, and configs that hit kernels-community/sonic-moe are skipped since that kernel has no ROCm build yet.
Fixes around 140 failing tests.

@Abdennacer-Badaoui Abdennacer-Badaoui marked this pull request as draft June 23, 2026 08:49
@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Abdennacer-Badaoui Abdennacer-Badaoui marked this pull request as ready for review June 23, 2026 09:53
@Abdennacer-Badaoui Abdennacer-Badaoui requested a review from vasqu June 23, 2026 09:53

@vasqu vasqu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a few comments, especially hesitant on the tests side because we change a few things that would break the original cuda fixtures so gotta be a bit careful

Comment thread src/transformers/models/gpt_oss/modeling_gpt_oss.py
Comment thread tests/models/gpt_oss/test_modeling_gpt_oss.py Outdated
Comment thread tests/models/gpt_oss/test_modeling_gpt_oss.py
Comment thread tests/models/gpt_oss/test_modeling_gpt_oss.py Outdated
Comment thread tests/models/gpt_oss/test_modeling_gpt_oss.py Outdated
Comment thread tests/models/gpt_oss/test_modeling_gpt_oss.py Outdated
@github-actions

Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_oss, openai_privacy_filter

# Generate key to look up expected outputs
key = generate_config_key(quantized, model_size, kernels, attn_impl, mode)

if os.environ.get("WRITE_FIXTURES") == "1":

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ahah, i was about to remove it , thanks

@github-actions

Copy link
Copy Markdown
Contributor

CI Dashboard: View test results in Grafana

vasqu
vasqu previously approved these changes Jun 23, 2026

@vasqu vasqu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, merging 🤗

@vasqu vasqu added this pull request to the merge queue Jun 23, 2026
@vasqu vasqu removed this pull request from the merge queue due to a manual request Jun 23, 2026
Comment on lines +186 to +216
"device=rocm|quantized=false|model=20b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=eval": [
"Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
"How are you? Tell me the name of the president of the president of the name of the president of the name of the president of the name of the president"
],
"device=rocm|quantized=false|model=20b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=train": [
"Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
"How are you? Tell me the name of the president of the president of the name of the president of the name of the president of the name of the president"
],
"device=rocm|quantized=true|model=20b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=eval": [
"Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
"How are you? Tell me the name of the president of the president of the name of the president of the name of the president of the name of the president"
],
"device=rocm|quantized=true|model=20b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=train": [
"Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
"How are you? Tell me the name of the president of the president of the name of the president of the name of the president of the name of the president"
],
"device=rocm|quantized=false|model=120b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=eval": [
"Roses are red, violets red, red, red, red, red, red,,,,,,,,,",
"How are you? Tell me the name of the president of the\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
],
"device=rocm|quantized=false|model=120b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=train": [
"Roses are red, violets red, red, red, red, red, red,,,,,,,,,",
"How are you? Tell me the name of the president of the\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
],
"device=rocm|quantized=true|model=120b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=eval": [
"Roses are red, violets red, red, red, red, red, red,,,,,,,,,",
"How are you? Tell me the name of the president of the\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
],
"device=rocm|quantized=true|model=120b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=train": [
"Roses are red, violets red, red, red, red, red, red,,,,,,,,,",
"How are you? Tell me the name of the president of the\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sorry to intervene again, I just noticed that we have significantly different output between FA and eager attention. this smells fishy to me - are we sure that something is not broken?

Especially these repeating outputs are weird. Is there a different naming convention for the s_aux for example?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Thanks.
I didn't check the fixtures as i was generating them directly to the file.
Let me check what's happening here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a real bug. transformers passes the attention sinks to the FA kernel under the name s_aux (what vllm-fa3 expects) or learnable_sink (what FA4 expects). Our aiter-flash-attn kernel calls the same argument sink, which transformers doesn't recognize as a sink name, so it never passes the sink tensor to the kernel. The attention then runs without sinks, which is why we see the repetitive degenerate output. I'll rename the kernel's public arg to s_aux so transformers picks it up, then regenerate the ROCm FA fixtures.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vasqu vasqu dismissed their stale review June 23, 2026 14:23

Noticed significanlty different outputs across eager vs fa

@Abdennacer-Badaoui

Copy link
Copy Markdown
Member Author

This PR allows multiple different names for the same argument (especially for sinks): #45153.
Once it's merged, our aiter-flash-attn can keep using sink as the argument name.

@Abdennacer-Badaoui Abdennacer-Badaoui marked this pull request as draft June 23, 2026 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants