Add IndexBinaryIDMap2 support to index binary factory#4603
Add IndexBinaryIDMap2 support to index binary factory#4603ashishresu2001 wants to merge 4 commits into
Conversation
ashishresu2001
left a comment
There was a problem hiding this comment.
Hi!
Could someone please review this PR: Add IndexBinaryIDMap2 support to index binary factory?
It adds support for IDMap2/IDMap prefix and suffix parsing in the index binary factory, normalizes description strings, and includes new unit tests for these cases.
All relevant tests have been run (pytest tests/test_binary_factory.py) and the CLA is signed.
Thank you!
|
Hi @ashishresu2001 , sorry for the delay, I will review this today |
|
@mnorris11 has imported this pull request. If you are a Meta employee, you can view this in D84619819. |
| desc_str.erase( | ||
| std::remove_if(desc_str.begin(), desc_str.end(), ::isspace), | ||
| desc_str.end()); |
There was a problem hiding this comment.
We do not need this part, we can assume it is incorrect if there are incorrect spaces. (We can follow the pattern of the other parts of the index_factory which do not have this handling)
| desc_str.end()); | ||
|
|
||
| // Check for IDMap2 or IDMap as prefix | ||
| if (desc_str.rfind("IDMap2,", 0) == 0 && desc_str.size() > 7) { |
There was a problem hiding this comment.
Are you able to use the same approach of re_match as the other IDMap2 handling for non-binary index? i.e. like here:
Lines 703 to 709 in 041ac84
(ideally we can keep it consistent)
This lets us consolidate to fewer conditions, since prefix and suffix should behave the same.
|
|
||
| int ncentroids = -1; | ||
| int M, nhash, b; | ||
| const char* desc_cstr = desc_str.c_str(); |
There was a problem hiding this comment.
Why did we need to convert to a C style string here, if the input is a C style string?
|
Thanks for this contribution @ashishresu2001 ! I just left a few small comments: we can make it closer to the existing approach. Otherwise looks great. |
| def test_factory_IDMap2_with_spaces(self): | ||
| index = faiss.index_binary_factory(16, " IDMap2 , BFlat ") | ||
| assert isinstance(index, faiss.IndexBinaryIDMap2) | ||
| assert index.index.code_size == 2 |
There was a problem hiding this comment.
We are very grateful you added tests! We can remove this spaces one though.
There was a problem hiding this comment.
Hi @mnorris11 , Thanks for the review. I’ve made the requested updates. Removed whitespace handling, switched to re_match for consistency, dropped the redundant C-string conversion, and removed the extra test.
…, eliminate C-string conversion
|
@mnorris11 merged this pull request in 484dd97. |
…ch#4603) Summary: - Add support for IDMap2/IDMap prefix and suffix parsing in index_binary_factory - Normalize description string to handle whitespace - Add unit tests for IDMap2 prefix, suffix, and whitespace cases ## Testing - Ran: pytest tests/test_binary_factory.py Fixes facebookresearch#3875 Pull Request resolved: facebookresearch#4603 Reviewed By: subhadeepkaran, junjieqi Differential Revision: D84619819 Pulled By: mnorris11 fbshipit-source-id: 5a2ac77bdce36f3322d4e3d60bb857767e551c52
…ch#4603) Summary: - Add support for IDMap2/IDMap prefix and suffix parsing in index_binary_factory - Normalize description string to handle whitespace - Add unit tests for IDMap2 prefix, suffix, and whitespace cases ## Testing - Ran: pytest tests/test_binary_factory.py Fixes facebookresearch#3875 Pull Request resolved: facebookresearch#4603 Reviewed By: subhadeepkaran, junjieqi Differential Revision: D84619819 Pulled By: mnorris11 fbshipit-source-id: 5a2ac77bdce36f3322d4e3d60bb857767e551c52
…ch#4603) Summary: - Add support for IDMap2/IDMap prefix and suffix parsing in index_binary_factory - Normalize description string to handle whitespace - Add unit tests for IDMap2 prefix, suffix, and whitespace cases ## Testing - Ran: pytest tests/test_binary_factory.py Fixes facebookresearch#3875 Pull Request resolved: facebookresearch#4603 Reviewed By: subhadeepkaran, junjieqi Differential Revision: D84619819 Pulled By: mnorris11 fbshipit-source-id: 5a2ac77bdce36f3322d4e3d60bb857767e551c52
Summary
Testing
Fixes #3875