This code snippet uses TensorFlow 2.0; some of the code may not be compatible with earlier versions, so make sure you are on TF 2.0 before executing it.
The tf.keras.losses.cosine_similarity function in TensorFlow computes the cosine similarity between labels and predictions. It returns a negative quantity between -1 and 0, where 0 indicates orthogonality (less similarity) and values closer to -1 indicate greater similarity.
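To make that sign convention concrete, here is a minimal pure-Python sketch of the same quantity (the negated cosine of the angle between two vectors); the function name cosine_similarity_loss is a hypothetical helper, not part of TensorFlow:

```python
import math

def cosine_similarity_loss(a, b):
    # Negative cosine of the angle between a and b, mirroring the
    # sign convention of tf.keras.losses.cosine_similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return -dot / (norm_a * norm_b)

# Identical vectors: result is close to -1 (greatest similarity).
print(cosine_similarity_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
# Orthogonal vectors: result is 0 (no similarity).
print(cosine_similarity_loss([1.0, 0.0], [0.0, 1.0]))
```

This is why, in the output below, the identical sentences score near -1 while the dissimilar pair sits closer to 0.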
In this code we use transfer learning to load a pre-trained token-based embedding model, which outputs a 128-dimensional embedding vector for each input. "ex_sentence" is a list of five sentences; we will calculate sentence similarity using TensorFlow's cosine similarity function.
import tensorflow as tf
import tensorflow_hub as hub
ex_sentence = ["this is test", "this is second test",
"this is third test", "not similar to others in this list",
"this is test"]
# Load the pre-trained NNLM English embedding model (128 dimensions) from TF Hub.
embed = hub.load("https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1")
# One 128-dimensional embedding per sentence; shape (5, 128).
embeddings = embed(ex_sentence)
# Output should be very close to -1, as ex_sentence[0] and ex_sentence[4]
# are identical sentences.
print("Sentences are having greater similarity")
print(tf.keras.losses.cosine_similarity(embeddings[0], embeddings[4]))

# Output should be closer to 0, as ex_sentence[0] and ex_sentence[3]
# are different sentences.
print("Sentences are having less similarity")
print(tf.keras.losses.cosine_similarity(embeddings[0], embeddings[3]))
Output of the similarity comparison between sentences:
Sentences are having greater similarity
tf.Tensor(-0.99999994, shape=(), dtype=float32)
Sentences are having less similarity
tf.Tensor(-0.3791005, shape=(), dtype=float32)
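The same comparison extends naturally to every pair of sentences in the list, e.g. to find the most similar pair. A pure-Python sketch of that idea, using small made-up 3-dimensional vectors in place of the real 128-dimensional NNLM embeddings (both the toy vectors and the cosine_similarity_loss helper are illustrative assumptions):

```python
import math

def cosine_similarity_loss(a, b):
    # Negated cosine similarity, matching the TF sign convention:
    # values near -1 mean greater similarity.
    dot = sum(x * y for x, y in zip(a, b))
    return -dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy "embeddings" standing in for the real embed(ex_sentence) output.
toy_embeddings = [
    [0.9, 0.1, 0.2],   # "this is test"
    [0.8, 0.2, 0.3],   # "this is second test"
    [0.1, 0.9, 0.4],   # "not similar to others in this list"
    [0.9, 0.1, 0.2],   # "this is test" (identical to the first)
]

# The most similar pair is the one whose loss is closest to -1.
best = min(
    ((i, j) for i in range(len(toy_embeddings)) for j in range(i + 1, len(toy_embeddings))),
    key=lambda p: cosine_similarity_loss(toy_embeddings[p[0]], toy_embeddings[p[1]]),
)
print(best)  # (0, 3): the two identical sentences
```

With the real TF Hub embeddings, the same pairwise loop over embeddings would pick out the duplicated "this is test" sentences, just as the output above shows for the single pair.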