GPU Inference

Currently only training seems to run on the GPU. Inference should see even larger speedups, since it consists mostly of large matrix multiplications. We replaced the stock `recommend_all` with a simple TensorFlow implementation and saw speedups of 40x or more, with exactly the same results as the native implementation. It would be nice if the library supported GPU inference by default.
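
For reference, here is a minimal sketch of the general idea, not our actual replacement code. It assumes dense `user_factors` and `item_factors` arrays and uses NumPy as a CPU stand-in so it runs anywhere. The function name `recommend_all_dense` and the `batch_size` parameter are hypothetical, and it does not filter out items a user has already interacted with.

```python
import numpy as np


def recommend_all_dense(user_factors, item_factors, k=10, batch_size=1024):
    """Return the indices of the top-k scoring items for every user.

    Hypothetical helper: scores are plain dot products between user and
    item factors, computed one batch of users at a time to bound memory.
    """
    n_users = user_factors.shape[0]
    k = min(k, item_factors.shape[0])
    top_items = np.empty((n_users, k), dtype=np.int64)

    for start in range(0, n_users, batch_size):
        batch = user_factors[start:start + batch_size]
        # One dense matmul per batch: (batch, factors) x (factors, n_items).
        scores = batch @ item_factors.T

        # Select the k best items per row (unordered), then sort only those k.
        part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
        part_scores = np.take_along_axis(scores, part, axis=1)
        order = np.argsort(-part_scores, axis=1)
        top_items[start:start + batch.shape[0]] = np.take_along_axis(part, order, axis=1)

    return top_items
```

The same two steps map onto TensorFlow as `tf.matmul` followed by `tf.math.top_k`, which is where the GPU does the heavy lifting. Batching over users keeps the score matrix small enough to fit in device memory.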