Skip to content

ARROW-3880: [Rust] Implement simple math operations for numeric arrays#3033

Closed
andygrove wants to merge 19 commits into
apache:masterfrom
andygrove:ARROW-3880
Closed

ARROW-3880: [Rust] Implement simple math operations for numeric arrays#3033
andygrove wants to merge 19 commits into
apache:masterfrom
andygrove:ARROW-3880

Conversation

@andygrove

Copy link
Copy Markdown
Member

No description provided.

@codecov-io

codecov-io commented Nov 26, 2018

Copy link
Copy Markdown

Codecov Report

Merging #3033 into master will increase coverage by 3.43%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3033      +/-   ##
==========================================
+ Coverage      87%   90.43%   +3.43%     
==========================================
  Files         496       13     -483     
  Lines       70563     2092   -68471     
==========================================
- Hits        61393     1892   -59501     
+ Misses       9069      200    -8869     
+ Partials      101        0     -101
Impacted Files Coverage Δ
rust/src/lib.rs 100% <ø> (ø) ⬆️
rust/src/array.rs 90.97% <100%> (+1.25%) ⬆️
rust/src/tensor.rs 93.11% <100%> (ø) ⬆️
rust/src/csv/reader.rs 87.5% <100%> (+0.12%) ⬆️
rust/src/error.rs 75% <100%> (+8.33%) ⬆️
rust/src/buffer.rs 93.53% <100%> (+0.02%) ⬆️
python/pyarrow/ipc.pxi
cpp/src/parquet/column_page.h
cpp/src/parquet/bloom_filter-test.cc
cpp/src/plasma/client.cc
... and 479 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 427a219...06bbc4a. Read the comment docs.

@andygrove

Copy link
Copy Markdown
Member Author

@paddyhoran @sunchao @kszucs Looking for a review please

Comment thread rust/src/array.rs Outdated
}

pub fn divide(&self, other: &PrimitiveArray<$native_ty>) -> PrimitiveArray<$native_ty> {
self.math_helper(other, |a, b| a / b)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if b is zero and push a null if it is?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe divide by zero should be an error, as it would be usually

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, we don't want a panic ... so probably divide should return a Result<> and should check first for any 0 values before attempting the operation

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, maybe returning a Result is more appropriate.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky ... the code is generic and using macros which makes it hard to check for literal zero ... I'm going to have to think about this some more. I probably have to refactor this a bit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use something like this.

In ARROW-3878 @sunchao is introducing the following:
pub trait ArrowNumericType: ArrowPrimitiveType {}

We could make this:
pub trait ArrowNumericType: ArrowPrimitiveType + Zero {} and you could make the above generic over ArrowNumericType.

I think it's fine to merge as is and open a JIRA for the follow up work, as once ARROW-3878 is merged the solution will change.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @paddyhoran ... that num crate made things much nicer and now we no longer panic on divide by zero. Glad to see ArrowNumericType coming along ... I actually tried to implement that myself this week and failed.

Comment thread rust/src/array.rs Outdated
self.math_helper(other, |a, b| a - b)
}

pub fn multiply(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this work for boolean types as well?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is only implemented for the numeric types (as with the existing min and max methods).

Comment thread rust/src/array.rs Outdated
if self.is_null(i) || other.is_null(i) {
b.push_null().unwrap();
} else {
b.push(op(self.value(index), other.value(index))).unwrap();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check length: what if other.len < self.len?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've added a check for that.

Comment thread rust/src/array.rs Outdated
}

//TODO: need help here ...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paddyhoran @sunchao I wonder if you can help me figure this out. Since the change to use specialization, I don't know how to make this work for numeric types but not boolean types. I'll keep trying but I feel I am missing something.

@andygrove

Copy link
Copy Markdown
Member Author

Well, I made it work using macros .... probably not ideal, but gives me the functionality I want.

Comment thread rust/src/array.rs Outdated
}
}

macro_rules! def_numeric_math_ops {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of macro, you can do:

impl<T: ArrowNumericType> PrimitiveArray<T>
where
    T::Native: Add<Output = T::Native>
        + Sub<Output = T::Native>
        + Mul<Output = T::Native>
        + Div<Output = T::Native>
        + Zero,
{
  // actual impl
}

Comment thread rust/src/array.rs Outdated
// specific language governing permissions and limitations
// under the License.

use num::Zero;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you move this line below std and above the use array_data::{ArrayData, ArrayDataRef};?

Comment thread rust/src/array.rs Outdated
macro_rules! def_numeric_math_ops {
( $ty:ident, $native_ty:ident ) => {
impl PrimitiveArray<$ty> {
fn math_op<F>(self, other: &PrimitiveArray<$ty>, op: F) -> Result<PrimitiveArray<$ty>>

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking whether we should expose some basic functions on primitive array, such as map, fold, etc. The math_op can be generalized to accept any function that takes two arrays. In functional world, it can be achieved via zip plus map, but I'm not sure how to implement zip on multiple primitive arrays..

@andygrove

Copy link
Copy Markdown
Member Author

@sunchao Thanks for the help, I re-implemented using generics. Should be good now. I agree with supporting zip and map and will experiment with that and create a separate PR if I figure it out.

@sunchao sunchao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove . Code-wise this looks good to me. However I don't have enough knowledge to decide whether this will be good addition to the core array functionalities. Seems other languages are not doing this. Another option could be to make a math module and implement these there? seems these functions are not accessing array internals.

Maybe @wesm , @xhochy , @kszucs can help reviewing this too?

Comment thread rust/src/array.rs Outdated
})
}

pub fn lt_eq(self, other: &PrimitiveArray<T>) -> Result<PrimitiveArray<BooleanType>> {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we use BooleanArray instead of PrimitiveArray<BooleanType>? same below.

Comment thread rust/src/array.rs Outdated
"Cannot perform math operation on two batches of different length".to_string(),
));
}
let mut b = PrimitiveArrayBuilder::<BooleanType>::new(self.len());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we use BooleanArrayBuilder instead of PrimitiveArrayBuilder::<BooleanType>?

@andygrove

Copy link
Copy Markdown
Member Author

@sunchao Maybe you are right .. I have started a discussion on the mailing list about this.

@andygrove

Copy link
Copy Markdown
Member Author

@sunchao I moved the math and comparison operators out of array and into a new array_ops source file.

@sunchao sunchao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @andygrove !

@paddyhoran

Copy link
Copy Markdown
Contributor

+1 LGTM, thanks @andygrove

@wesm

wesm commented Dec 11, 2018

Copy link
Copy Markdown
Member

Needs rebase

@andygrove andygrove closed this in 2428945 Dec 11, 2018
@andygrove andygrove deleted the ARROW-3880 branch March 30, 2019 22:33
@andygrove andygrove restored the ARROW-3880 branch March 30, 2019 22:34
@andygrove andygrove deleted the ARROW-3880 branch March 30, 2019 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants