Don't index English words

Nicholas Rempel 2021-05-31 10:09:22 -07:00 committed by Dirkjan Ochtman
parent fabe10271d
commit 67778600ef
1 changed file with 5 additions and 0 deletions

@@ -68,6 +68,11 @@ async def download_build_index():
if lang == "en":
    word_map[value] = embedding
else:
    # Don't index words that exist in english
    # to improve the quality of the results.
    if value in word_map:
        continue

    # We track values here to build the instant-distance index
    # Every value is prepended with 2 character language code.
    # This allows us to determine language output later.
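
The filter added in this hunk can be tried in isolation with the minimal sketch below. It is not the code from the commit: the helper `build_index_inputs`, the `rows` input shape, and the sample vectors are hypothetical names and data chosen for illustration. It also assumes the English rows are processed before the other languages, since the membership check only works once `word_map` holds the English vocabulary.

```python
from typing import Dict, Iterable, List, Tuple

Embedding = List[float]


def build_index_inputs(
    rows: Iterable[Tuple[str, str, Embedding]],
) -> Tuple[Dict[str, Embedding], List[str], List[Embedding]]:
    """Hypothetical helper: split (lang, value, embedding) rows into the
    English lookup map and the values/points that would be indexed."""
    word_map: Dict[str, Embedding] = {}
    values: List[str] = []
    points: List[Embedding] = []

    for lang, value, embedding in rows:
        if lang == "en":
            # English words are only used to look up the query embedding.
            word_map[value] = embedding
        else:
            # Skip words that also exist in English (assumes English rows
            # were seen first, so word_map is already populated).
            if value in word_map:
                continue

            # Prefix the value with the 2 character language code so the
            # output language can be recovered later.
            values.append(lang + value)
            points.append(embedding)

    return word_map, values, points


# Illustrative data only: "hotel" appears in both vocabularies.
rows = [
    ("en", "hotel", [0.1, 0.2]),
    ("en", "cat", [0.3, 0.4]),
    ("it", "hotel", [0.1, 0.25]),
    ("it", "gatto", [0.3, 0.45]),
]
word_map, values, points = build_index_inputs(rows)
print(values)  # ['itgatto']
```

With this input the Italian "hotel" row is dropped because the same spelling is already in the English map, so only "itgatto" would reach the index.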