How to find closeness between two Keras pad_sequences?

I am writing a small proof of concept where I turn a catalog into a JSON file in which each entry has a URL and a label describing the web page. I read this JSON in Python, tokenize it, and create padded sequences with pad_sequences().

I then need to compare some free-form text against these to find which index of the padded sequences contains the most words from that text.

I am also generating a padded sequence from the free-form text with pad_sequences(), but I am not sure whether I can compare the two sequences for closeness.

Please help.



Solution 1:[1]

You can use cosine similarity or Euclidean distance to compare two vectors.

https://www.tensorflow.org/api_docs/python/tf/keras/metrics/CosineSimilarity

https://www.tutorialexample.com/calculate-euclidean-distance-in-tensorflow-a-step-guide-tensorflow-tutorial/

For sequences, first map each one to an embedding vector of the same length, then compare those vectors.
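As a rough illustration, here is a minimal sketch in plain Python rather than TensorFlow. It assumes the padded sequences hold integer token IDs with 0 as the padding value, which is the pad_sequences() default. A simple bag-of-words count vector stands in for a learned embedding. The helper names (`count_vector`, `closest_index`), the vocabulary size, and the sample data are all made up for the example.

```python
import math
from collections import Counter

PAD_ID = 0  # assumption: sequences are padded with 0 (the pad_sequences default)


def count_vector(padded_seq, vocab_size):
    """Map a padded sequence of token IDs to a fixed-length count vector.

    Position i - 1 holds how often token ID i occurs; padding is ignored.
    """
    counts = Counter(t for t in padded_seq if t != PAD_ID)
    return [counts.get(i, 0) for i in range(1, vocab_size + 1)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_index(catalog_seqs, query_seq, vocab_size):
    """Return the index of the catalog sequence most similar to the query, plus all scores."""
    query_vec = count_vector(query_seq, vocab_size)
    scores = [
        cosine_similarity(count_vector(seq, vocab_size), query_vec)
        for seq in catalog_seqs
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical data: three padded catalog labels and one padded query text.
catalog = [
    [0, 0, 1, 2, 3],
    [0, 4, 5, 2, 6],
    [0, 0, 0, 7, 8],
]
query = [0, 0, 4, 5, 6]

best, scores = closest_index(catalog, query, vocab_size=8)
print(best, scores)
```

Comparing the raw padded ID sequences directly would not be meaningful, because token IDs are arbitrary labels and their positions shift with padding. That is why the sketch converts each sequence to a fixed-length vector first. Swapping `cosine_similarity` for `euclidean_distance` (and picking the minimum instead of the maximum) gives the distance-based variant.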

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Peter Pirog