>>> import pyarrow as pa
Registry has 519 pre-compiled functions
>>> import pandas as pd
>>> import numpy as np
>>> import pyarrow.gandiva as gandiva
>>> import timeit
>>>
>>> from matplotlib import pyplot as plt
>>> for scale in range(25, 26):
...     frame_data = 1.0 * np.random.randint(0, 100, size=(2**scale, 2))
...     df = pd.DataFrame(frame_data).add_prefix("col")
...     table = pa.Table.from_pandas(df)
...
>>>
>>> def float64_add(table):
...     builder = gandiva.TreeExprBuilder()
...     node_a = builder.make_field(table.schema.field_by_name("col0"))
...     node_b = builder.make_field(table.schema.field_by_name("col1"))
...     sum = builder.make_function(b"add", [node_a, node_b], pa.float64())
...     field_result = pa.field("c", pa.float64())
...     expr = builder.make_expression(sum, field_result)
...     projector = gandiva.make_projector(table.schema, [expr], pa.default_memory_pool())
...     return projector
...
>>> projector = float64_add(table)
>>> projector.evaluate(table.to_batches()[0])
[1] 36393 segmentation fault python
This is caused by an integer overflow in Gandiva, at arrow/cpp/src/gandiva/projector.cc, line 141 in 1a6545a:
    status = AllocArrayData(field->type(), static_cast<int>(batch.num_rows()), pool,
The cast should be to int64_t instead of int.
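As a rough illustration of why 2**25 rows can be enough to trigger this, here is a minimal sketch in plain Python. It assumes (this is not stated in the issue) that the allocation size is derived from the row count multiplied by the 64-bit width of a float64, using 32-bit arithmetic. The helper `as_int32` is hypothetical and only mimics the wraparound of a C `int`.

```python
import ctypes


def as_int32(value):
    """Truncate a Python int to a signed 32-bit C int (two's-complement wraparound)."""
    return ctypes.c_int32(value).value


num_rows = 2**25      # row count used in the reproduction
bits_per_value = 64   # width of a float64; assumed multiplier, not taken from the issue

# The row count on its own still fits in a 32-bit int.
rows_int32 = as_int32(num_rows)

# The product does not fit: it wraps around in 32 bits but is fine in 64 bits.
bits_int32 = as_int32(num_rows * bits_per_value)
bits_int64 = num_rows * bits_per_value

print(rows_int32, bits_int32, bits_int64)
```

Under that assumption, the 32-bit value is negative while the 64-bit value is the intended size, which is consistent with the suggestion to use int64_t.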
Reporter: Siyuan Zhuang / @suquark
Assignee: Siyuan Zhuang / @suquark
PRs and other links:
Note: This issue was originally created as ARROW-3698. Please see the migration documentation for further details.