ARROW-3880: [Rust] Implement simple math operations for numeric arrays#3033
andygrove wants to merge 19 commits into
Codecov Report
@@ Coverage Diff @@
## master #3033 +/- ##
==========================================
+ Coverage 87% 90.43% +3.43%
==========================================
Files 496 13 -483
Lines 70563 2092 -68471
==========================================
- Hits 61393 1892 -59501
+ Misses 9069 200 -8869
+ Partials 101 0 -101
@paddyhoran @sunchao @kszucs Looking for a review please
}

pub fn divide(&self, other: &PrimitiveArray<$native_ty>) -> PrimitiveArray<$native_ty> {
    self.math_helper(other, |a, b| a / b)
Should we check if b is zero and push a null if it is?
I believe divide by zero should be an error, as it usually would be.
However, we don't want a panic, so divide should probably return a Result<> and check for any 0 values before attempting the operation.
Yeah, maybe returning a Result is more appropriate.
This is tricky: the code is generic and uses macros, which makes it hard to check for a literal zero. I'm going to have to think about this some more. I probably have to refactor this a bit.
We could use something like this.
In ARROW-3878 @sunchao is introducing the following:
pub trait ArrowNumericType: ArrowPrimitiveType {}
We could make this:
pub trait ArrowNumericType: ArrowPrimitiveType + Zero {} and you could make the above generic over ArrowNumericType.
I think it's fine to merge as is and open a JIRA for the follow up work, as once ARROW-3878 is merged the solution will change.
Thanks @paddyhoran. That num crate made things much nicer, and now we no longer panic on divide by zero. Glad to see ArrowNumericType coming along; I actually tried to implement that myself this week and failed.
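The outcome of this thread (return a Result and test the divisor through a Zero bound instead of panicking) can be sketched as follows. This is only an illustration under stated assumptions: the local `Zero` trait is a hypothetical stand-in for `num::Zero` so the snippet compiles without external crates, `Option<T>` models a nullable slot, and `checked_divide` is an invented name, not the function in this PR.

```rust
use std::ops::Div;

/// Hypothetical stand-in for `num::Zero`, defined locally so the sketch
/// has no external dependencies.
pub trait Zero: PartialEq + Sized {
    fn zero() -> Self;

    fn is_zero(&self) -> bool {
        *self == Self::zero()
    }
}

impl Zero for i32 {
    fn zero() -> Self {
        0
    }
}

impl Zero for f64 {
    fn zero() -> Self {
        0.0
    }
}

/// Element-wise division over nullable values (`None` models a null slot).
/// Returns an error instead of panicking when a non-null pair has a zero
/// divisor. Length validation is deliberately left out of this sketch.
pub fn checked_divide<T>(
    left: &[Option<T>],
    right: &[Option<T>],
) -> Result<Vec<Option<T>>, String>
where
    T: Copy + Div<Output = T> + Zero,
{
    let mut out = Vec::with_capacity(left.len());
    for (l, r) in left.iter().zip(right.iter()) {
        match (l, r) {
            (Some(a), Some(b)) => {
                if b.is_zero() {
                    return Err("Divide by zero".to_string());
                }
                out.push(Some(*a / *b));
            }
            // A null on either side yields a null result.
            _ => out.push(None),
        }
    }
    Ok(out)
}
```

Because the zero test goes through the trait, the same function body serves integer and floating-point element types, which is the difficulty the macro-based version ran into.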
    self.math_helper(other, |a, b| a - b)
}

pub fn multiply(
Does this work for boolean types as well?
No, this is only implemented for the numeric types (as with the existing min and max methods).
if self.is_null(i) || other.is_null(i) {
    b.push_null().unwrap();
} else {
    b.push(op(self.value(index), other.value(index))).unwrap();
Should we check the length? What if other.len < self.len?
Good point. I've added a check for that.
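A minimal sketch of the length check combined with the null handling from the snippet above. It assumes nullable slots are modelled as `Option<T>` in plain slices, and `math_op` here is a free function written for illustration, not the method in this PR. The error message is the one quoted later in the diff.

```rust
/// Applies `op` pairwise to two nullable columns.
/// Fails up front if the lengths differ, so no index can go out of bounds.
pub fn math_op<T, F>(
    left: &[Option<T>],
    right: &[Option<T>],
    op: F,
) -> Result<Vec<Option<T>>, String>
where
    T: Copy,
    F: Fn(T, T) -> T,
{
    if left.len() != right.len() {
        return Err(
            "Cannot perform math operation on two batches of different length".to_string(),
        );
    }
    let mut out = Vec::with_capacity(left.len());
    for i in 0..left.len() {
        out.push(match (left[i], right[i]) {
            // Both slots are valid: apply the operation.
            (Some(a), Some(b)) => Some(op(a, b)),
            // A null on either side produces a null.
            _ => None,
        });
    }
    Ok(out)
}
```

Checking once before the loop keeps the per-element path free of bounds errors and reports the mismatch as a recoverable error.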
}

//TODO: need help here ...
@paddyhoran @sunchao I wonder if you can help me figure this out. Since the change to use specialization, I don't know how to make this work for numeric types but not boolean types. I'll keep trying but I feel I am missing something.
Well, I made it work using macros. It's probably not ideal, but it gives me the functionality I want.
    }
}

macro_rules! def_numeric_math_ops {
Instead of a macro, you can do:
impl<T: ArrowNumericType> PrimitiveArray<T>
where
T::Native: Add<Output = T::Native>
+ Sub<Output = T::Native>
+ Mul<Output = T::Native>
+ Div<Output = T::Native>
+ Zero,
{
// actual impl
}
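To make the suggestion concrete, here is a self-contained sketch of the same pattern outside Arrow. All names are hypothetical stand-ins: `NumericType` plays the role of `ArrowNumericType`, `NumericArray` the role of `PrimitiveArray`, and `Option` models nulls. Since no boolean type implements the marker trait, boolean arrays do not get these methods.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical marker trait standing in for `ArrowNumericType`.
pub trait NumericType {
    type Native: Copy;
}

pub struct Int32Type;
impl NumericType for Int32Type {
    type Native = i32;
}

pub struct Float64Type;
impl NumericType for Float64Type {
    type Native = f64;
}

/// Simplified stand-in for `PrimitiveArray<T>`; `None` models a null slot.
pub struct NumericArray<T: NumericType> {
    pub values: Vec<Option<T::Native>>,
}

// One generic impl block replaces the per-type macro expansion. The
// operator bounds sit on the associated native type, as in the suggestion.
impl<T: NumericType> NumericArray<T>
where
    T::Native: Add<Output = T::Native> + Sub<Output = T::Native> + Mul<Output = T::Native>,
{
    /// Shared helper: applies `op` where both slots are non-null.
    fn apply<F>(&self, other: &Self, op: F) -> Self
    where
        F: Fn(T::Native, T::Native) -> T::Native,
    {
        let values = self
            .values
            .iter()
            .zip(&other.values)
            .map(|(a, b)| match (a, b) {
                (Some(x), Some(y)) => Some(op(*x, *y)),
                _ => None,
            })
            .collect();
        NumericArray { values }
    }

    pub fn add(&self, other: &Self) -> Self {
        self.apply(other, |a, b| a + b)
    }

    pub fn subtract(&self, other: &Self) -> Self {
        self.apply(other, |a, b| a - b)
    }

    pub fn multiply(&self, other: &Self) -> Self {
        self.apply(other, |a, b| a * b)
    }
}
```

Length checks and division are omitted to keep the sketch focused on the trait bounds.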
// specific language governing permissions and limitations
// under the License.

use num::Zero;
nit: can you move this line below std and above the use array_data::{ArrayData, ArrayDataRef};?
macro_rules! def_numeric_math_ops {
    ( $ty:ident, $native_ty:ident ) => {
        impl PrimitiveArray<$ty> {
            fn math_op<F>(self, other: &PrimitiveArray<$ty>, op: F) -> Result<PrimitiveArray<$ty>>
I'm wondering whether we should expose some basic functions on primitive arrays, such as map, fold, etc. math_op could be generalized to accept any function that takes two arrays. In the functional world, this can be achieved via zip plus map, but I'm not sure how to implement zip on multiple primitive arrays.
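One way the zip-plus-map idea could look, sketched over plain slices of `Option` values instead of Arrow arrays. `zip_map` and `fold_valid` are hypothetical names chosen for illustration. Letting the output type differ from the input types means the same helper could back both the math operators and comparisons such as lt_eq.

```rust
/// Combines two nullable columns element by element; a null on either
/// side yields a null. The output element type `C` is independent of the
/// input types, so `f` may return a number or a `bool`.
pub fn zip_map<A, B, C, F>(left: &[Option<A>], right: &[Option<B>], f: F) -> Vec<Option<C>>
where
    A: Copy,
    B: Copy,
    F: Fn(A, B) -> C,
{
    left.iter()
        .zip(right.iter())
        .map(|(a, b)| match (a, b) {
            (Some(x), Some(y)) => Some(f(*x, *y)),
            _ => None,
        })
        .collect()
}

/// Folds over the non-null values only, skipping null slots.
pub fn fold_valid<A, Acc, F>(values: &[Option<A>], init: Acc, f: F) -> Acc
where
    A: Copy,
    F: Fn(Acc, A) -> Acc,
{
    values.iter().flatten().fold(init, |acc, v| f(acc, *v))
}
```

Note that `Iterator::zip` stops at the shorter input, so a real implementation would still need the explicit length check discussed earlier in the review.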
@sunchao Thanks for the help, I re-implemented using generics. Should be good now. I agree with supporting
sunchao left a comment
Thanks @andygrove. Code-wise this looks good to me. However, I don't know enough to decide whether this would be a good addition to the core array functionality. It seems other languages are not doing this. Another option could be to make a math module and implement these there? It seems these functions are not accessing array internals.
Maybe @wesm, @xhochy, @kszucs can help review this too?
    })
}

pub fn lt_eq(self, other: &PrimitiveArray<T>) -> Result<PrimitiveArray<BooleanType>> {
nit: could we use BooleanArray instead of PrimitiveArray<BooleanType>? Same below.
        "Cannot perform math operation on two batches of different length".to_string(),
    ));
}
let mut b = PrimitiveArrayBuilder::<BooleanType>::new(self.len());
nit: could we use BooleanArrayBuilder instead of PrimitiveArrayBuilder::<BooleanType>?
@sunchao Maybe you are right. I have started a discussion on the mailing list about this.
@sunchao I moved the math and comparison operators out of array and into a new array_ops source file.
+1 LGTM, thanks @andygrove
Needs rebase