[fix](fe) Bound length in MysqlProto.readLenEncodedString #63604

Merged
morrySnow merged 4 commits into apache:master from MarkLee131:fix-mysql-lenenc-bound on Jun 3, 2026

@MarkLee131

What problem does this PR solve?

Issue Number: close #63603

Problem Summary:

MysqlProto.readLenEncodedString reads a length-encoded integer and passes it straight to new byte[(int) length] with no bound. The length is fully attacker-controlled (a 0xFE lead byte carries an 8-byte value), and it is read before authentication from MysqlAuthPacket.readFrom (the auth-response field at MysqlAuthPacket.java:93 and the connection-attributes loop at MysqlAuthPacket.java:110-118). A small handshake response can therefore request a ~2 GiB allocation, and a length with bit 31 set (the sign bit of the int it is cast to) casts to a negative size (NegativeArraySizeException).

This PR rejects a length that is negative or larger than the bytes remaining in the buffer before allocating. A well-formed length-encoded string's payload always fits in the remaining buffer, so valid input is unaffected. One guard covers both reach paths.
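For illustration, the guard described above can be sketched as a standalone parser over a `java.nio.ByteBuffer`. This is a hypothetical sketch, not the code merged in this PR: the class name `LenEncSketch`, the helper `readUnsignedLE`, and the choice to reject the `0xFB` and `0xFF` prefixes are assumptions. The length-encoded integer layout follows the MySQL client/server protocol: a first byte below `0xFB` is the value itself, and `0xFC`, `0xFD`, and `0xFE` are followed by 2, 3, and 8 little-endian bytes.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; names and structure are illustrative, not the Doris source.
final class LenEncSketch {

    private LenEncSketch() {}

    // Reads a MySQL length-encoded integer from the buffer.
    static long readLenEncodedInt(ByteBuffer buf) {
        int first = buf.get() & 0xFF;
        if (first < 0xFB) {
            return first;
        }
        switch (first) {
            case 0xFC:
                return readUnsignedLE(buf, 2);
            case 0xFD:
                return readUnsignedLE(buf, 3);
            case 0xFE:
                // The 8-byte form can set the top bit, which makes the Java long negative.
                return readUnsignedLE(buf, 8);
            default:
                // Assumption of this sketch: 0xFB (NULL marker) and 0xFF are not valid here.
                throw new IllegalArgumentException("invalid length prefix: 0x" + Integer.toHexString(first));
        }
    }

    // Assembles n little-endian bytes into a long.
    private static long readUnsignedLE(ByteBuffer buf, int n) {
        long v = 0;
        for (int i = 0; i < n; i++) {
            v |= (long) (buf.get() & 0xFF) << (8 * i);
        }
        return v;
    }

    // Bounded variant: validate the declared length against the bytes actually
    // left in the packet before allocating anything.
    static byte[] readLenEncodedString(ByteBuffer buf) {
        long length = readLenEncodedInt(buf);
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("length-encoded string declares " + length
                    + " bytes but only " + buf.remaining() + " remain");
        }
        // Safe cast: 0 <= length <= remaining() <= Integer.MAX_VALUE.
        byte[] out = new byte[(int) length];
        buf.get(out);
        return out;
    }
}
```

Because the comparison happens on the `long` before the cast, a single check covers both failure modes: a negative value (top bit of the 8-byte form set) and any value the packet cannot hold, including values whose `(int)` cast would be negative.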

Release note

None

Check List (For Author)

  • Test: Unit Test
  • Behavior changed: No.
  • Does this need documentation? No.

The Unit Test is MysqlProtoLenEncStringTest (added in this PR): oversized and negative-cast lengths are rejected with IllegalArgumentException; a normal short length-encoded string still parses.
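The three cases can be expressed as concrete wire bytes. The helpers below are hypothetical and only build inputs; they are not taken from `MysqlProtoLenEncStringTest`, whose contents are not shown here. The oversized case declares `0x7FFFFFFF` bytes through the `0xFE` form, the negative-cast case declares `0x80000000` (a positive `long` whose `(int)` cast is negative), and the valid case is a one-byte length followed by its payload.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical input builders; not taken from MysqlProtoLenEncStringTest.
final class LenEncInputs {

    private LenEncInputs() {}

    // 0xFE prefix followed by the full 8-byte little-endian value.
    static byte[] feLength(long value) {
        byte[] out = new byte[9];
        out[0] = (byte) 0xFE;
        for (int i = 0; i < 8; i++) {
            out[1 + i] = (byte) (value >>> (8 * i));
        }
        return out;
    }

    // A well-formed short string: one length byte (below 0xFB) followed by the payload.
    static byte[] shortString(String s) {
        byte[] payload = s.getBytes(StandardCharsets.UTF_8);
        if (payload.length >= 0xFB) {
            throw new IllegalArgumentException("payload too long for a one-byte length: " + payload.length);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(payload.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    // Declares ~2 GiB while the packet carries no payload bytes.
    static byte[] oversized() {
        return feLength(0x7FFF_FFFFL);
    }

    // Bit 31 set: positive as a long, negative after an (int) cast.
    static byte[] negativeCast() {
        return feLength(0x8000_0000L);
    }
}
```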

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Problem Summary:
MysqlProto.readLenEncodedString reads a length-encoded integer from the MySQL
packet and passes it directly to new byte[(int) length] with no upper bound.
The length is attacker-controlled (a 0xFE lead byte carries a full 8-byte
value), and it is read before authentication from MysqlAuthPacket.readFrom (the
auth-response field and the connection-attributes loop). A small handshake
response can therefore request a ~2 GiB allocation, and a length with bit 31
set (the sign bit of the int it is cast to) casts to a negative size.
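Both failure modes come from Java's narrowing conversion from `long` to `int`, which keeps only the low 32 bits. A minimal demonstration (the class `NarrowingDemo` is hypothetical): `0x7FFFFFFF` narrows to `Integer.MAX_VALUE` (2,147,483,647 bytes, just under 2 GiB), `0x80000000` narrows to `Integer.MIN_VALUE`, and `new byte[...]` with a negative size throws `NegativeArraySizeException`. Values above 2^32 are silently truncated, which is another reason to compare the full `long` against the remaining bytes before casting.

```java
// Hypothetical demo of Java's long-to-int narrowing, which keeps only the low 32 bits.
final class NarrowingDemo {

    private NarrowingDemo() {}

    static int narrow(long length) {
        return (int) length;
    }

    // Intended for small or negative-casting values only; a value near
    // Integer.MAX_VALUE would attempt a ~2 GiB allocation.
    static String allocate(long length) {
        try {
            return "allocated " + new byte[(int) length].length;
        } catch (NegativeArraySizeException e) {
            return "NegativeArraySizeException";
        }
    }
}
```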

Reject a length that is negative or larger than the bytes remaining in the
buffer before allocating. A well-formed length-encoded string's payload always
fits in the remaining buffer, so valid input is unaffected.

### Release note

None

### Check List (For Author)

- Test: Unit Test (MysqlProtoLenEncStringTest)
- Behavior changed: No
- Does this need documentation: No

@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morrySnow

/review

@morrySnow

run buildall

@github-actions github-actions Bot added the approved label (Indicates a PR has been approved by one committer.) on Jun 1, 2026
@github-actions

github-actions Bot commented Jun 1, 2026

PR approved by at least one committer and no changes requested.

@github-actions

github-actions Bot commented Jun 1, 2026

PR approved by anyone and no changes requested.

@github-actions github-actions Bot left a comment

No blocking issues found. The change is small and focused: readLenEncodedString now rejects negative or packet-overrunning length-encoded string sizes before allocation, and the added unit test covers oversized, negative-cast, and valid payload cases.

Critical checkpoint conclusions:

  • Goal and proof: The PR addresses attacker-controlled length-encoded string allocation before authentication; the implementation meets the goal, with unit coverage for the key cases.
  • Scope: The modification is minimal and localized to the shared parser used by both auth response and connection attributes.
  • Concurrency and lifecycle: No new shared state, locking, threads, or lifecycle-sensitive objects are introduced.
  • Configuration and compatibility: No new config, storage format, function symbol, or FE/BE protocol compatibility issue found. Malformed packets now fail earlier, which is consistent with parser validation.
  • Parallel paths: The common helper is used by the known length-encoded string readers, so both call paths are covered.
  • Error handling: The new guard fails explicitly with IllegalArgumentException; callers that already catch runtime parsing failures continue to do so, and handshake parsing already treats malformed packets as failed negotiation.
  • Tests: Added FE unit coverage is appropriate. I attempted mvn -pl fe-core -Dtest=MysqlProtoLenEncStringTest test -Dskip.doc=true, but the local runner could not resolve org.apache.doris:fe-foundation:1.2-SNAPSHOT, so the test did not execute here.
  • Observability, transaction/persistence, data writes, memory tracking: Not applicable to this parser-only FE change.
  • Performance: The guard is constant-time and avoids pathological allocation; no additional performance issue found.

User focus: No additional user-provided review focus was present.

@morrySnow

run buildall

@hello-stephen

FE UT Coverage Report

Increment line coverage 66.67% (2/3) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

FE Regression Coverage Report

Increment line coverage 0.00% (0/4) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

TPC-H: Total hot run time: 29035 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e2e3cac4261a2d27f6b6fb5e0f2ea80b0d2b0fc8, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17660	4362	4222	4222
q2
q3	10756	1425	831	831
q4	4690	480	357	357
q5	7500	863	618	618
q6	189	180	141	141
q7	797	816	637	637
q8	9802	1560	1660	1560
q9	6703	4502	4500	4500
q10	6800	1827	1512	1512
q11	438	276	247	247
q12	634	441	303	303
q13	18160	3374	2799	2799
q14	267	266	241	241
q15
q16	818	763	700	700
q17	1073	1005	880	880
q18	6978	5791	5489	5489
q19	1841	1360	1045	1045
q20	506	401	256	256
q21	5829	2610	2398	2398
q22	436	358	299	299
Total cold run time: 101877 ms
Total hot run time: 29035 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4529	4417	4420	4417
q2
q3	4498	4942	4350	4350
q4	2130	2215	1398	1398
q5	4452	4332	4365	4332
q6	238	185	134	134
q7	1999	1924	1737	1737
q8	2618	2256	2168	2168
q9	8129	7982	8117	7982
q10	4844	4728	4345	4345
q11	575	442	430	430
q12	880	759	546	546
q13	3291	3572	3022	3022
q14	298	321	281	281
q15
q16	768	739	668	668
q17	1410	1358	1375	1358
q18	8012	7428	7370	7370
q19	1156	1108	1098	1098
q20	2230	2231	1945	1945
q21	5356	4678	4476	4476
q22	537	458	406	406
Total cold run time: 57950 ms
Total hot run time: 52463 ms

@hello-stephen

TPC-DS: Total hot run time: 171138 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e2e3cac4261a2d27f6b6fb5e0f2ea80b0d2b0fc8, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4328	650	529	529
query6	331	220	213	213
query7	4231	561	306	306
query8	327	233	244	233
query9	8787	4010	3998	3998
query10	449	337	303	303
query11	5832	2397	2158	2158
query12	183	128	127	127
query13	1277	650	459	459
query14	6036	5471	5112	5112
query14_1	4460	4464	4513	4464
query15	218	208	189	189
query16	1018	459	437	437
query17	1159	743	606	606
query18	2527	488	368	368
query19	233	215	173	173
query20	140	142	135	135
query21	216	147	121	121
query22	13632	13573	13429	13429
query23	17244	16470	16197	16197
query23_1	16240	16435	16227	16227
query24	7425	1811	1297	1297
query24_1	1326	1316	1340	1316
query25	589	514	411	411
query26	1312	318	173	173
query27	2692	556	344	344
query28	4424	2039	2003	2003
query29	995	631	505	505
query30	318	237	198	198
query31	1117	1072	957	957
query32	97	74	72	72
query33	541	344	289	289
query34	1188	1123	680	680
query35	828	790	696	696
query36	1386	1393	1221	1221
query37	155	108	91	91
query38	3295	3236	3121	3121
query39	963	970	947	947
query39_1	916	922	936	922
query40	236	151	137	137
query41	66	63	62	62
query42	112	121	110	110
query43	328	346	301	301
query44	
query45	221	208	204	204
query46	1122	1209	738	738
query47	2418	2430	2311	2311
query48	430	413	290	290
query49	625	500	429	429
query50	1012	382	263	263
query51	4525	4366	4344	4344
query52	111	110	97	97
query53	266	281	214	214
query54	321	273	261	261
query55	96	104	94	94
query56	319	320	326	320
query57	1487	1445	1322	1322
query58	304	279	276	276
query59	1595	1656	1444	1444
query60	329	329	321	321
query61	157	163	159	159
query62	696	650	584	584
query63	258	207	211	207
query64	2419	798	669	669
query65	
query66	1693	505	366	366
query67	29754	29785	29712	29712
query68	
query69	471	335	307	307
query70	1064	1029	986	986
query71	307	279	278	278
query72	3059	2678	2382	2382
query73	878	741	430	430
query74	5117	4948	4826	4826
query75	2707	2628	2285	2285
query76	2269	1157	806	806
query77	416	434	354	354
query78	12368	12569	11831	11831
query79	1479	1040	746	746
query80	688	564	493	493
query81	463	284	253	253
query82	1388	161	159	159
query83	369	281	251	251
query84	252	147	109	109
query85	872	535	436	436
query86	403	347	325	325
query87	3438	3366	3258	3258
query88	3689	2779	2771	2771
query89	445	404	354	354
query90	1966	183	182	182
query91	178	166	137	137
query92	82	78	76	76
query93	1584	1494	861	861
query94	565	331	290	290
query95	670	479	345	345
query96	1121	739	338	338
query97	2750	2703	2568	2568
query98	240	229	229	229
query99	1175	1162	1033	1033
Total cold run time: 254977 ms
Total hot run time: 171138 ms

@hello-stephen

TPC-H: Total hot run time: 28919 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5a2bd9ee94f58700cccc6aae0439a5b733d4a3b2, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17776	4049	3973	3973
q2
q3	10773	1382	822	822
q4	4687	475	336	336
q5	7522	849	576	576
q6	181	167	137	137
q7	787	829	653	653
q8	9459	1611	1749	1611
q9	6313	4455	4449	4449
q10	6808	1797	1555	1555
q11	442	270	253	253
q12	652	426	294	294
q13	18212	3400	2785	2785
q14	264	260	234	234
q15
q16	826	769	706	706
q17	1274	988	801	801
q18	6791	5603	5566	5566
q19	1498	1400	1023	1023
q20	498	388	271	271
q21	5832	2782	2558	2558
q22	435	374	316	316
Total cold run time: 101030 ms
Total hot run time: 28919 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4804	4833	4678	4678
q2
q3	5031	5160	4696	4696
q4	2108	2168	1378	1378
q5	4902	4764	4758	4758
q6	232	177	129	129
q7	1860	1721	1482	1482
q8	2420	2140	2065	2065
q9	7368	7402	7415	7402
q10	4722	4682	4213	4213
q11	526	385	356	356
q12	721	750	527	527
q13	3072	3369	2819	2819
q14	277	292	264	264
q15
q16	679	709	615	615
q17	1290	1259	1246	1246
q18	7245	6775	6819	6775
q19	1085	1092	1090	1090
q20	2225	2228	1949	1949
q21	5219	4584	4415	4415
q22	497	437	405	405
Total cold run time: 56283 ms
Total hot run time: 51262 ms

@hello-stephen

TPC-DS: Total hot run time: 171439 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5a2bd9ee94f58700cccc6aae0439a5b733d4a3b2, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4322	671	532	532
query6	331	235	218	218
query7	4232	565	319	319
query8	331	242	256	242
query9	8840	4053	4039	4039
query10	459	355	294	294
query11	5791	2357	2166	2166
query12	188	134	132	132
query13	1315	614	448	448
query14	6152	5459	5147	5147
query14_1	4472	4488	4468	4468
query15	218	211	190	190
query16	1034	466	459	459
query17	1177	753	628	628
query18	2793	514	374	374
query19	235	210	179	179
query20	153	137	140	137
query21	221	140	123	123
query22	13796	13589	13415	13415
query23	17353	16524	16213	16213
query23_1	16357	16274	16356	16274
query24	7486	1756	1328	1328
query24_1	1318	1327	1303	1303
query25	592	518	455	455
query26	1307	333	177	177
query27	2672	556	354	354
query28	4476	2060	2046	2046
query29	1059	670	527	527
query30	312	245	202	202
query31	1131	1087	961	961
query32	109	82	79	79
query33	559	376	312	312
query34	1155	1169	664	664
query35	774	842	708	708
query36	1438	1426	1287	1287
query37	154	107	91	91
query38	3210	3182	3062	3062
query39	943	913	900	900
query39_1	874	878	887	878
query40	239	151	130	130
query41	70	73	76	73
query42	114	116	113	113
query43	332	340	291	291
query44	
query45	212	206	197	197
query46	1047	1205	721	721
query47	2363	2312	2240	2240
query48	396	394	288	288
query49	641	489	401	401
query50	1027	345	251	251
query51	4386	4339	4341	4339
query52	109	116	97	97
query53	264	283	213	213
query54	326	277	256	256
query55	96	92	83	83
query56	327	327	325	325
query57	1434	1417	1307	1307
query58	314	282	280	280
query59	1621	1695	1452	1452
query60	323	334	310	310
query61	170	160	155	155
query62	706	664	588	588
query63	259	205	225	205
query64	2421	823	623	623
query65	
query66	1689	520	366	366
query67	29864	29801	29703	29703
query68	
query69	451	339	303	303
query70	1059	1030	1021	1021
query71	308	277	284	277
query72	3039	2706	2492	2492
query73	869	799	434	434
query74	7211	4940	4795	4795
query75	2708	2625	2286	2286
query76	2310	1125	763	763
query77	424	426	330	330
query78	12586	12379	11843	11843
query79	1516	1012	769	769
query80	1325	537	458	458
query81	504	282	240	240
query82	1314	162	126	126
query83	347	287	248	248
query84	258	141	116	116
query85	941	547	450	450
query86	450	361	325	325
query87	3436	3375	3246	3246
query88	3657	2765	2701	2701
query89	455	391	346	346
query90	1937	184	186	184
query91	176	166	138	138
query92	81	79	76	76
query93	1451	1575	920	920
query94	752	350	322	322
query95	691	474	358	358
query96	1008	797	358	358
query97	2741	2775	2571	2571
query98	242	229	227	227
query99	1133	1182	1029	1029
Total cold run time: 258441 ms
Total hot run time: 171439 ms

@morrySnow

/review

@github-actions github-actions Bot left a comment

Review opinion: no blocking issues found.

Critical checkpoint conclusions:

  • Goal and proof: The PR prevents attacker-controlled MySQL length-encoded string lengths from being cast directly to array sizes when the declared payload length exceeds the packet bytes remaining. The implementation accomplishes this for the shared MysqlProto.readLenEncodedString helper, and adds focused unit tests for oversized, negative-cast, and valid payload cases.
  • Scope/focus: The modification is small and localized to the parser helper plus tests.
  • Concurrency/lifecycle: No new shared state, locks, threads, or lifecycle-sensitive objects are introduced.
  • Configuration/compatibility: No configuration, storage format, thrift/protocol symbol, or rolling-upgrade compatibility concern found. Valid MySQL length-encoded string payloads remain accepted; malformed packets now fail earlier.
  • Parallel paths: The shared helper is used by the auth response, connect attributes, and OIDC token extraction paths, so the bound is applied consistently to current FE callers.
  • Error handling: The new IllegalArgumentException is an unchecked parse failure. In the handshake path it is contained by AcceptListener.handleConnection cleanup; OIDC extraction already catches RuntimeException and falls back to the raw auth response.
  • Tests: Added unit coverage targets the allocation-risk cases and a valid case. I attempted ./run-fe-ut.sh --run org.apache.doris.mysql.MysqlProtoLenEncStringTest, but the run failed during generated-code setup because gensrc/proto/Makefile reported thirdparty/installed/bin/protoc: No such file or directory, so the test did not execute in this runner.
  • Observability/performance: No new observability need. The added bound is constant-time and avoids pathological allocations.
  • Transaction/persistence/data writes: Not applicable.
  • User focus: No additional user-provided review focus was specified.

Residual risk: The broader handshake parser still relies on outer connection cleanup for malformed packet runtime exceptions, and this PR does not add end-to-end negotiation tests. I did not find that to be a blocker for this narrowly scoped fix.
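The error-handling point above relies on `IllegalArgumentException` being a `RuntimeException`, so a caller that already catches `RuntimeException` around packet parsing keeps working unchanged. A hypothetical sketch of that catch-and-fall-back pattern follows; the class and method names are illustrative and are not the Doris OIDC extractor.

```java
import java.util.function.Supplier;

// Hypothetical helper; not the Doris OIDC extraction code.
final class ParseFallback {

    private ParseFallback() {}

    // Runs the parser; any RuntimeException (including the IllegalArgumentException
    // thrown for a rejected length) is treated as "not in the expected format".
    static <T> T parseOrFallback(Supplier<T> parser, T fallback) {
        try {
            return parser.get();
        } catch (RuntimeException e) {
            return fallback;
        }
    }
}
```

Catching `RuntimeException` broadly keeps the fallback path working for any malformed-buffer failure (for example `BufferUnderflowException` as well as the new guard), at the cost of not distinguishing between them.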

@hello-stephen

FE Regression Coverage Report

Increment line coverage 0.00% (0/102) 🎉
Increment coverage report
Complete coverage report

@morrySnow

/review

@github-actions github-actions Bot left a comment

Review result: no blocking issues found.

Critical checkpoint conclusions:

  • Goal and proof: The PR adds a remaining-bytes bound for MySQL length-encoded string payloads, preventing attacker-controlled length values from being cast into oversized or negative byte-array allocations. The new unit test covers oversized 0xFE/8-byte lengths, negative int-cast lengths, and a valid payload.
  • Scope: The change is small and focused in MysqlProto.readLenEncodedString, with a targeted test class.
  • Concurrency and lifecycle: No shared mutable state, locking, thread lifecycle, or static initialization behavior is introduced.
  • Configuration and compatibility: No config items, storage formats, function symbols, or FE/BE protocol fields are added. Valid MySQL length-encoded strings remain accepted; malformed strings now fail before allocation.
  • Parallel paths: Existing callers of readLenEncodedString use the same helper, so the bound applies consistently to auth response, connection attributes, and OIDC token extraction parsing.
  • Error handling: The new malformed-packet failure is explicit. Existing callers already operate on packet parsing helpers that may throw runtime exceptions on malformed buffers; the OIDC extractor catches runtime exceptions and falls back as before.
  • Tests: Targeted tests are present. I attempted mvn -pl fe-core -Dtest=MysqlProtoLenEncStringTest test -Dskip.doc=true, which failed because fe-foundation SNAPSHOT was not resolved when running fe-core alone. I retried with mvn -pl fe-core -am -Dtest=MysqlProtoLenEncStringTest test -Dskip.doc=true -DfailIfNoTests=false, but the runner lacks thirdparty/installed/bin/thrift, so the reactor stopped in fe-thrift before reaching fe-core.
  • Observability: No additional logs or metrics are needed for this local packet validation helper.
  • Transaction/persistence/data writes: Not applicable.
  • Performance: The added check is O(1), before allocation, and avoids pathological allocation behavior.

User focus points: No additional user-provided review focus was present.

@morrySnow morrySnow merged commit 6a2f56a into apache:master Jun 3, 2026
30 checks passed
github-actions Bot pushed a commit that referenced this pull request Jun 3, 2026
### What problem does this PR solve?

Issue Number: close #63603

Problem Summary:

`MysqlProto.readLenEncodedString` reads a length-encoded integer and
passes it straight to `new byte[(int) length]` with no bound. The length
is fully attacker-controlled (a `0xFE` lead byte carries an 8-byte
value), and it is read before authentication from
`MysqlAuthPacket.readFrom` (the auth-response field at
`MysqlAuthPacket.java:93` and the connection-attributes loop at
`MysqlAuthPacket.java:110-118`). A small handshake response can
therefore request a ~2 GiB allocation, and a length with bit 31 set
(the sign bit of the int it is cast to) casts to a negative size
(`NegativeArraySizeException`).

This PR rejects a length that is negative or larger than the bytes
remaining in the buffer before allocating. A well-formed length-encoded
string's payload always fits in the remaining buffer, so valid input is
unaffected. One guard covers both reach paths.
github-actions Bot pushed a commit that referenced this pull request Jun 3, 2026
yiguolei pushed a commit that referenced this pull request Jun 4, 2026
…63604 (#64059)

Cherry-picked from #63604

Co-authored-by: MarkLee131 <kaixuan.li@ntu.edu.sg>

Labels

approved (Indicates a PR has been approved by one committer.), dev/3.1.x, dev/4.0.x, dev/4.1.2-merged, reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] FE: unbounded allocation in MysqlProto.readLenEncodedString from attacker-controlled length

4 participants