Extend FUNNEL_COUNT to support multiple CORRELATE_BY columns #18760
tarun11Mavani wants to merge 4 commits into
Enable funnel analysis that tracks users through steps within a composite key (e.g., per user per device category) by accepting multiple columns in CORRELATE_BY(col1, col2, ...). The single-key path is preserved as a zero-overhead fast path with separate addSingleKey/addMultiKey abstract methods and dedicated aggregation loops, ensuring no regression for existing single-column queries. Multi-key composite ID mapping uses stride-based arithmetic when the product of dictionary sizes fits in int, with a HashMap fallback for large key spaces. Co-authored-by: Cursor <cursoragent@cursor.com>
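The stride-based mapping described in this commit can be sketched roughly as follows. This is a minimal illustration, assuming row-major strides computed from the per-column dictionary sizes; the class and method names (`StrideCompositeId`, `computeStrides`, `toCompositeId`, `reverse`) are hypothetical and are not taken from the PR's `DictIdsWrapper`.

```java
/** Hypothetical sketch of stride-based composite IDs; not the PR's actual code. */
final class StrideCompositeId {
  // Null when the product of dictionary sizes does not fit in an int.
  private final int[] strides;

  StrideCompositeId(int[] dictSizes) {
    this.strides = computeStrides(dictSizes);
  }

  /** Returns row-major strides, or null if the key space exceeds Integer.MAX_VALUE. */
  static int[] computeStrides(int[] dictSizes) {
    int[] strides = new int[dictSizes.length];
    long product = 1; // tracked as long so the overflow check itself cannot overflow
    for (int i = dictSizes.length - 1; i >= 0; i--) {
      strides[i] = (int) product; // safe: product was checked on the previous iteration
      product *= dictSizes[i];
      if (product > Integer.MAX_VALUE) {
        return null; // caller would switch to a map-based fallback
      }
    }
    return strides;
  }

  boolean fitsInInt() {
    return strides != null;
  }

  /** Maps one dictionary ID per column to a single int. Requires fitsInInt(). */
  int toCompositeId(int[] dictIds) {
    int id = 0;
    for (int i = 0; i < dictIds.length; i++) {
      id += dictIds[i] * strides[i];
    }
    return id;
  }

  /** Recovers the per-column dictionary IDs from a composite ID. Requires fitsInInt(). */
  int[] reverse(int compositeId) {
    int[] dictIds = new int[strides.length];
    int remainder = compositeId;
    for (int i = 0; i < strides.length; i++) {
      dictIds[i] = remainder / strides[i];
      remainder %= strides[i];
    }
    return dictIds;
  }
}
```

With dictionary sizes 3, 4, and 5 the strides are 20, 5, and 1, so the dictionary IDs (2, 3, 4) map to 2*20 + 3*5 + 4 = 59, and dividing back through the strides recovers the original triple.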
Benchmark was used for local validation only; not needed in the PR. Co-authored-by: Cursor <cursoragent@cursor.com>
Performance Validation (JMH)
Single-key path, before (baseline) vs. after (this PR):
*theta_sketch and partitioned_sorted show large error bars indicating JVM warmup variance, not a real regression. Scores overlap within error margins.
Multi-key path (new feature, this PR only):
Single-key path shows no statistically significant regression. All deltas are within error margins. The bitmap/set/partitioned strategies (which dominate real workloads) are within ±2% of baseline, effectively identical.
Keep the original `add(Dictionary, A, int, int)` abstract method unchanged. The new multi-key method is added as `addMultiKey(A, int, Dictionary[], int[])`. Co-authored-by: Cursor <cursoragent@cursor.com>
e1d2196 to d6bb092
…egationResult double-count
- Add DictIdsWrapperTest covering the HashMap fallback path (large-cardinality composite keys where the product of dict sizes exceeds Integer.MAX_VALUE): path selection, sequential ID assignment, same-key idempotency, key-order sensitivity, and round-trip for 2- and 3-column keys. Also covers the stride-path reverseCompositeId round-trip. Add an isHashMapPath() predicate to DictIdsWrapper for test introspection (avoids widening _strides visibility).
- Add SortedAggregationResultTest with multi-key extraction scenarios.
- Fix SortedAggregationResult.extractResult(): clear _secondaryKeySteps after flushMultiKeyGroup() so that a second (defensive) call returns zeros rather than double-counting the last open primary group.
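The properties listed for the HashMap fallback (sequential ID assignment, same-key idempotency, key-order sensitivity, round-trip) can be illustrated with a small sketch. The PR summary describes a `HashMap<IntArrayList, Integer>`; the version below assumes a plain JDK `List<Integer>` key instead so that it has no external dependencies, and the names `HashMapCompositeId`, `toCompositeId`, and `reverse` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a map-based composite ID fallback; not the PR's actual code. */
final class HashMapCompositeId {
  private final Map<List<Integer>, Integer> idsByKey = new HashMap<>();
  private final List<int[]> keysById = new ArrayList<>();

  /** Returns the existing ID for this key, or assigns the next sequential one. */
  int toCompositeId(int[] dictIds) {
    List<Integer> key = new ArrayList<>(dictIds.length);
    for (int dictId : dictIds) {
      key.add(dictId);
    }
    Integer existing = idsByKey.get(key);
    if (existing != null) {
      return existing; // same key, same ID
    }
    int id = keysById.size(); // IDs are handed out as 0, 1, 2, ...
    idsByKey.put(key, id);
    keysById.add(dictIds.clone()); // copy so later caller mutations do not leak in
    return id;
  }

  /** Recovers the per-column dictionary IDs for a previously assigned composite ID. */
  int[] reverse(int compositeId) {
    return keysById.get(compositeId).clone();
  }
}
```

Because list equality is positional, the keys (1, 2) and (2, 1) receive different IDs, which is the key-order sensitivity the tests check.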
Codecov Report
@@ Coverage Diff @@
## master #18760 +/- ##
============================================
+ Coverage 64.67% 64.75% +0.08%
Complexity 1309 1309
============================================
Files 3381 3380 -1
Lines 209821 209784 -37
Branches 32805 32850 +45
============================================
+ Hits 135697 135843 +146
+ Misses 63230 63013 -217
- Partials 10894 10928 +34
Summary
Extends `FUNNEL_COUNT` to accept multiple columns in `CORRELATE_BY(col1, col2, ...)`,
enabling funnel analysis that tracks users through steps within a composite key
(e.g., per user per device category), not just a single dimension.
Design
Doc with example: https://docs.google.com/document/d/1gWQ7XBbJdQcUdZvBevFnGTVbCVJ3fN49biIsSOtRdhM/edit?tab=t.0
The single-key aggregation path is preserved as a zero-overhead fast path — structurally
identical to the original single-column implementation — so existing queries see no
regression. Multi-key support is added as a separate code path selected once per block.
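As a rough illustration of selecting the path once per block, the sketch below checks the number of CORRELATE_BY columns a single time and then runs a dedicated loop, so the single-key loop contains no per-row branch. The signatures are simplified assumptions (the real methods take dictionaries and aggregation state), and `BlockAggregator`, `aggregateBlock`, and `CountingAggregator` are hypothetical names.

```java
/** Hypothetical sketch of per-block path selection; signatures are simplified. */
abstract class BlockAggregator {
  abstract void addSingleKey(int step, int dictId);

  abstract void addMultiKey(int step, int[] dictIds);

  /**
   * keyColumns[c][row] is the dictionary ID of CORRELATE_BY column c at the given row;
   * steps[row] is the funnel step index matched by that row.
   */
  final void aggregateBlock(int length, int[] steps, int[][] keyColumns) {
    if (keyColumns.length == 1) {
      // Fast path: the branch above is evaluated once, not once per row.
      int[] dictIds = keyColumns[0];
      for (int row = 0; row < length; row++) {
        addSingleKey(steps[row], dictIds[row]);
      }
    } else {
      // Multi-key path: gather one ID per column into a buffer reused across rows,
      // so implementations must copy it if they need to retain the values.
      int[] buffer = new int[keyColumns.length];
      for (int row = 0; row < length; row++) {
        for (int c = 0; c < keyColumns.length; c++) {
          buffer[c] = keyColumns[c][row];
        }
        addMultiKey(steps[row], buffer);
      }
    }
  }
}

/** Trivial implementation that only counts which path was taken. */
final class CountingAggregator extends BlockAggregator {
  int singleKeyCalls;
  int multiKeyCalls;

  @Override
  void addSingleKey(int step, int dictId) {
    singleKeyCalls++;
  }

  @Override
  void addMultiKey(int step, int[] dictIds) {
    multiKeyCalls++;
  }
}
```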
- `AggregationStrategy`: Split into two abstract methods (`addSingleKey`/`addMultiKey`) with separate aggregation loops for single-key and multi-key, eliminating per-row branching on the dominant single-key path.
- `DictIdsWrapper`: Added composite-key mapping for multi-column CORRELATE_BY. Uses stride-based arithmetic when the product of dictionary sizes fits in `int`, falling back to a `HashMap<IntArrayList, Integer>` for large key spaces. Also adds `toCompositeString` for length-prefix encoded composite string keys used during result extraction.
- `SortedAggregationResult`: Updated to handle multi-key by tracking secondary keys via a `HashMap` within each primary-key group (data is sorted on the primary column only).
- `BitmapAggregationStrategy`, `SortedAggregationStrategy`, `ThetaSketchAggregationStrategy`: Implement both `addSingleKey` and `addMultiKey`.
- `SetResultExtractionStrategy`, `BitmapResultExtractionStrategy`: Updated to reverse-map composite IDs back to per-column dictionary values during result extraction.
- `FunnelCountSortedAggregationFunction`: Propagates multi-dictionary context through the sorted aggregation result extraction pipeline.
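To show why a length prefix is useful for composite string keys, here is a minimal sketch. The exact wire format used by `toCompositeString` is not given in this description, so the `<length>:<value>` layout below is an assumption, as are the names `LengthPrefixedKey`, `encode`, and `decode`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of length-prefix encoding; the format is assumed, not the PR's. */
final class LengthPrefixedKey {
  private LengthPrefixedKey() {
  }

  /** Encodes each part as its character count, a colon, then the part itself. */
  static String encode(String... parts) {
    StringBuilder sb = new StringBuilder();
    for (String part : parts) {
      sb.append(part.length()).append(':').append(part);
    }
    return sb.toString();
  }

  /** Splits an encoded key back into its parts by reading each length prefix. */
  static List<String> decode(String encoded) {
    List<String> parts = new ArrayList<>();
    int pos = 0;
    while (pos < encoded.length()) {
      int colon = encoded.indexOf(':', pos);
      int length = Integer.parseInt(encoded.substring(pos, colon));
      int start = colon + 1;
      parts.add(encoded.substring(start, start + length));
      pos = start + length; // skip past the value, even if it contains ':'
    }
    return parts;
  }
}
```

Plain concatenation would map both ("a", "bc") and ("ab", "c") to "abc"; with the prefix they become "1:a2:bc" and "2:ab1:c", so distinct column values can never collide, and values that themselves contain the separator still decode correctly.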
Example Query
Test Plan