[ROCm] gpt-oss: route FA3 to aiter-flash-attn, generate ROCm fixtures by Abdennacer-Badaoui · Pull Request #46837 · huggingface/transformers

Abdennacer-Badaoui · 2026-06-23T08:42:59Z

On ROCm, gpt-oss now routes FA3-style attention to kernels-community/aiter-flash-attn (via _compatible_flash_implementations) and generates separate ROCm fixtures (non-distributed and tp_size=2). Configs that depend on kernels-community/megablocks are skipped because that kernel doesn't ship a ROCm build for torch 2.10, the version the AMD CI runs today (builds exist for 2.11 and 2.12). Configs that hit kernels-community/sonic-moe are also skipped, since that kernel has no ROCm build yet.
Fixes around 140 failing tests.
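The routing and skip rules described above can be sketched as follows. This is a minimal sketch under stated assumptions: the names FA3_STYLE_IMPLS, resolve_attn_impl and rocm_skip_reason are hypothetical, the set of "FA3-style" implementation names is a guess, and the real _compatible_flash_implementations mapping in transformers may be structured quite differently. Only the kernel repo names and torch versions come from the PR description.

```python
# Hypothetical sketch; these names are illustrative, not the transformers API.
FA3_STYLE_IMPLS = {"flash_attention_3", "kernels-community/vllm-flash-attn3"}  # assumed FA3-style names
ROCM_FA3_REPLACEMENT = "kernels-community/aiter-flash-attn"

# Torch minor versions for which each hub kernel ships a ROCm build (per the PR description).
ROCM_KERNEL_BUILDS = {
    "kernels-community/megablocks": {"2.11", "2.12"},
    "kernels-community/sonic-moe": set(),  # no ROCm build yet
}


def resolve_attn_impl(attn_impl: str, is_rocm: bool) -> str:
    """On ROCm, swap FA3-style attention for the aiter kernel; otherwise pass through unchanged."""
    if is_rocm and attn_impl in FA3_STYLE_IMPLS:
        return ROCM_FA3_REPLACEMENT
    return attn_impl


def rocm_skip_reason(required_kernels, torch_version: str):
    """Return a skip message if a required kernel has no ROCm build for this torch, else None."""
    # Strip the local version suffix, then keep major.minor: "2.10.0+rocm7.0" -> "2.10"
    minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    for kernel in required_kernels:
        builds = ROCM_KERNEL_BUILDS.get(kernel)
        if builds is not None and minor not in builds:
            return f"{kernel} has no ROCm build for torch {minor}"
    return None
```

In a test, the returned message would typically be handed to self.skipTest(...) or pytest.skip(...) so the skip reason shows up in the CI report instead of a hard failure.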

HuggingFaceDocBuilderDev · 2026-06-23T08:57:56Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

vasqu

Have a few comments. I'm especially hesitant on the test side, because we change a few things that could break the original CUDA fixtures, so we need to be a bit careful.

github-actions · 2026-06-23T13:28:48Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_oss, openai_privacy_filter

vasqu · 2026-06-23T13:45:38Z

        # Generate key to look up expected outputs
        key = generate_config_key(quantized, model_size, kernels, attn_impl, mode)

+        if os.environ.get("WRITE_FIXTURES") == "1":


Yes haha, I was about to remove it, thanks!
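The test looks up expected outputs by a key from generate_config_key. Judging by the fixture entries this PR adds (for example device=rocm|quantized=false|model=20b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=eval), the key is a pipe-separated list of field=value pairs. The hypothetical builder below reproduces that format; make_fixture_key is our own name, and unlike generate_config_key in the snippet it takes the device explicitly, so treat it as an illustration rather than the test file's implementation.

```python
import itertools


# Hypothetical reimplementation: matches the key strings in this PR's fixture
# file, but the real generate_config_key may build them differently.
def make_fixture_key(device, quantized, model_size, kernels, attn_impl, mode):
    fields = [
        ("device", device),
        ("quantized", str(quantized).lower()),  # True -> "true"
        ("model", model_size),
        ("kernels", str(kernels).lower()),
        ("attn_impl", attn_impl),
        ("mode", mode),
    ]
    return "|".join(f"{name}={value}" for name, value in fields)


# The eight ROCm FA entries: 2 model sizes x 2 quantization settings x 2 modes.
ROCM_FA_KEYS = [
    make_fixture_key("rocm", quantized, model, False, "kernels-community/aiter-flash-attn", mode)
    for model, quantized, mode in itertools.product(("20b", "120b"), (False, True), ("eval", "train"))
]
```

Because the device is part of the key, ROCm entries can presumably sit next to the existing CUDA entries in the same file without overwriting them, which is the property the CUDA-fixture concern above depends on.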

github-actions · 2026-06-23T14:02:32Z

CI Dashboard: View test results in Grafana

vasqu

Thanks, merging 🤗

vasqu · 2026-06-23T14:22:58Z

+  "device=rocm|quantized=false|model=20b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=eval": [
+    "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
+    "How are you? Tell me the name of the president of the president of the name of the president of the name of the president of the name of the president"
+  ],
+  "device=rocm|quantized=false|model=20b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=train": [
+    "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
+    "How are you? Tell me the name of the president of the president of the name of the president of the name of the president of the name of the president"
+  ],
+  "device=rocm|quantized=true|model=20b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=eval": [
+    "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
+    "How are you? Tell me the name of the president of the president of the name of the president of the name of the president of the name of the president"
+  ],
+  "device=rocm|quantized=true|model=20b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=train": [
+    "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
+    "How are you? Tell me the name of the president of the president of the name of the president of the name of the president of the name of the president"
+  ],
+  "device=rocm|quantized=false|model=120b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=eval": [
+    "Roses are red, violets red, red, red, red, red, red,,,,,,,,,",
+    "How are you? Tell me the name of the president of the\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
+  ],
+  "device=rocm|quantized=false|model=120b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=train": [
+    "Roses are red, violets red, red, red, red, red, red,,,,,,,,,",
+    "How are you? Tell me the name of the president of the\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
+  ],
+  "device=rocm|quantized=true|model=120b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=eval": [
+    "Roses are red, violets red, red, red, red, red, red,,,,,,,,,",
+    "How are you? Tell me the name of the president of the\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
+  ],
+  "device=rocm|quantized=true|model=120b|kernels=false|attn_impl=kernels-community/aiter-flash-attn|mode=train": [
+    "Roses are red, violets red, red, red, red, red, red,,,,,,,,,",
+    "How are you? Tell me the name of the president of the\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"


Ok, sorry to intervene again: I just noticed that we get significantly different outputs between FA and eager attention. This smells fishy to me; are we sure nothing is broken?

These repeating outputs in particular are weird. Is there a different naming convention for s_aux, for example?

Good catch, thanks!
I didn't check the fixtures, since I was writing them directly to the file.
Let me look into what's happening here.

There's a real bug. Transformers passes the attention sinks to the FA kernel under the name s_aux (what vllm-fa3 expects) or learnable_sink (what FA4 expects). Our aiter-flash-attn kernel names the same argument sink, which transformers doesn't recognize as a sink name, so the sink tensor is never passed to the kernel. The attention then runs without sinks, which is why we see the repetitive, degenerate output. I'll rename the kernel's public argument to s_aux so transformers picks it up, then regenerate the ROCm FA fixtures.
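Why silently dropping the sinks changes the output so much: in gpt-oss each attention head has a learned sink logit that, roughly as in the eager gpt-oss attention path, is appended as an extra column before the softmax and discarded afterwards. It therefore absorbs part of the probability mass without contributing a value vector. The single-row sketch below uses plain Python, and softmax_with_sink is our own illustrative name, not a transformers function.

```python
import math


def softmax_with_sink(scores, sink_logit=None):
    """Attention weights for one query row.

    The optional sink adds a logit to the softmax normalization but has no
    value vector, so its share of the probability mass is simply discarded.
    """
    logits = list(scores) + ([sink_logit] if sink_logit is not None else [])
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs[: len(scores)]  # drop the sink column


scores = [1.0, 2.0, 3.0]
without_sink = softmax_with_sink(scores)      # weights over real tokens sum to 1
with_sink = softmax_with_sink(scores, 3.0)    # weights sum to roughly 0.6
```

Without the sink, a head is forced to spread all of its attention over real tokens, which is a different computation from the one the model was trained with, so diverging from eager outputs is expected.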

huggingface/kernels-community#992

Noticed significantly different outputs across eager vs fa

Abdennacer-Badaoui · 2026-06-23T15:33:12Z

This PR allows multiple names for the same argument (notably for sinks): #45153.
Once it's merged, our aiter-flash-attn can keep using sink as the argument name.
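The idea of accepting several names for the sink argument can be sketched as follows. This is a minimal sketch assuming the dispatcher discovers the kernel's parameter names via inspect.signature; the alias tuple, build_kernel_kwargs and the stub kernels are all hypothetical, and #45153 may implement the lookup differently. With only s_aux and learnable_sink recognized, a kernel whose parameter is named sink receives nothing, which reproduces the bug above.

```python
import inspect

# Hypothetical alias list: s_aux (vllm-fa3), learnable_sink (FA4), sink (aiter-flash-attn).
SINK_ARG_ALIASES = ("s_aux", "learnable_sink", "sink")


def build_kernel_kwargs(kernel_fn, sinks, aliases=SINK_ARG_ALIASES):
    """Forward the sink tensor under whichever alias the kernel's signature declares."""
    params = inspect.signature(kernel_fn).parameters
    for name in aliases:
        if name in params:
            return {name: sinks}
    # No recognized name: the sinks are silently dropped, as happened here.
    return {}


def aiter_like_kernel(q, k, v, sink=None):  # hypothetical stand-in for aiter-flash-attn
    return sink


def fa3_like_kernel(q, k, v, s_aux=None):  # hypothetical stand-in for vllm-fa3
    return s_aux


old_behaviour = build_kernel_kwargs(aiter_like_kernel, "S", aliases=("s_aux", "learnable_sink"))
new_behaviour = build_kernel_kwargs(aiter_like_kernel, "S")
```

Returning an empty dict mirrors the failure mode; a stricter design would raise or warn when a model has sinks but the kernel exposes no known sink argument, so the mismatch surfaces as an error instead of degenerate generations.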

Abdennacer-Badaoui marked this pull request as draft June 23, 2026 08:49

Abdennacer-Badaoui marked this pull request as ready for review June 23, 2026 09:53

Abdennacer-Badaoui requested a review from vasqu June 23, 2026 09:53

vasqu reviewed Jun 23, 2026


Abdennacer-Badaoui added 6 commits June 23, 2026 13:27

fix gpt-oss

333587a

fix some fixtures

6102680

style

2d9ed52

repo cons

02aadba

stylllle

64fd3ae

fix after review

cae1c1f

Abdennacer-Badaoui force-pushed the fix-gptoss branch from 71f135c to cae1c1f Compare June 23, 2026 13:27

vasqu reviewed Jun 23, 2026


cleaning

c7eb3f2

vasqu previously approved these changes Jun 23, 2026


vasqu added this pull request to the merge queue Jun 23, 2026

vasqu removed this pull request from the merge queue due to a manual request Jun 23, 2026

vasqu reviewed Jun 23, 2026


Abdennacer-Badaoui marked this pull request as draft June 23, 2026 15:33


3 participants