TensorFlow | NLP | Create embeddings with pre-trained models

This post explains how to create word embedding vectors in TensorFlow using pre-trained models.

TensorFlow Hub

TensorFlow Hub is a repository of trained machine learning models that are ready for fine-tuning and deployable anywhere. In this tutorial we will use a pre-trained text-embedding model from TensorFlow Hub to create word embedding vectors.

What are Embeddings

A word embedding is a way of representing text for machine learning, typically as a real-valued vector that encodes the meaning of a word, so that words that are closer together in the vector space are expected to be similar in meaning.
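
As a quick illustration (the helper below is our own example, not part of the tutorial's code), the "closeness" of two embedding vectors is commonly measured with cosine similarity:

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the vectors divided by the
    # product of their lengths; values near 1 mean "close" vectors.
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 2-dimensional vectors, just to show the idea
print(cosine_similarity([0.9, 0.1], [0.8, 0.2]))   # close to 1
print(cosine_similarity([0.9, 0.1], [-0.7, 0.6]))  # much lower (negative)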

Create word embeddings using TensorFlow pre-trained models

1. Install tensorflow-hub
   
pip install tensorflow-hub
 

2. Import tensorflow-hub and load the pre-trained model
   
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1")
 

There are many word embedding models available on TensorFlow Hub; details of these models are available on the TensorFlow Hub site. For this demo we are using https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1, a model trained on the English Google News 130GB corpus that outputs embedding vectors with 20 dimensions.
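
The same module can also be wrapped as a Keras layer and fine-tuned as part of a larger model. The small classifier below is only a sketch to show the idea; the layer sizes and the binary classification head are our own choices, not part of this tutorial:

import tensorflow as tf
import tensorflow_hub as hub

hub_layer = hub.KerasLayer(
    "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1",
    input_shape=[],      # each example is a single string
    dtype=tf.string,
    trainable=True)      # allow the embedding weights to be fine-tuned

model = tf.keras.Sequential([
    hub_layer,                                    # 20-dim vector per sentence
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1)                      # e.g. a binary classification head
])
model.summary()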

3. Generate embeddings for sentences using the pre-trained model
   
# sample list of sentences
train_examples = ["this is first sentence",
                  "post on transfer learning"]

# create embeddings
embeddings = embed(train_examples)

4. Check the shape of the embeddings
   
print(embeddings.shape)
 

5. Embedding vector and shape for the first sentence
   
print("Embedding for sentence 1")
print(embeddings[0])
print(embeddings[0].shape)
     

6. Embedding vector and shape for the second sentence
   
print("Embeddings for sentence 2")
print(embeddings[1])
print(embeddings[1].shape) 
     

Output
   
(2, 20)
Embedding for sentence 1
tf.Tensor(
[-1.0672672  -1.1043963   0.4294139   0.56906974  0.00213611 -0.65244186
 -0.78387123 -0.52442074  0.99164987 -0.7627919  -0.25447875 -0.4340241
  0.1354623   0.20142603 -0.6427126   1.4065914   0.00409365 -0.87753415
 -0.48709276 -0.27135906], shape=(20,), dtype=float32)
(20,)
Embeddings for sentence 2
tf.Tensor(
[-0.09212524 -0.9034263   0.99333376  1.2055938   0.27041954 -0.15681976
  0.6756444  -0.3102063  -0.53932023 -0.5933851  -1.2455785   0.5302402
 -0.68134886  0.16809608  0.42006266  0.13688701  1.0491474  -0.76284564
 -0.36677927  0.10497689], shape=(20,), dtype=float32)
(20,)
 

Complete code snippet to create embeddings in TensorFlow

   
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1")

# sample list of sentences
train_examples = ["this is first sentence",
                  "post on transfer learning"]

# create embeddings
embeddings = embed(train_examples)

# Check the shape of the embeddings
print(embeddings.shape)

# Check the embedding vector and shape for each sentence
print("Embedding for sentence 1")
print(embeddings[0])
print(embeddings[0].shape)

print("Embeddings for sentence 2")
print(embeddings[1])
print(embeddings[1].shape) 
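
Once the embeddings are created they can be compared directly. For example (this comparison is our own addition, reusing the same two sentences as above), the cosine similarity between the two sentence vectors can be computed with plain TensorFlow ops:

import tensorflow as tf
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1")
embeddings = embed(["this is first sentence",
                    "post on transfer learning"])

# L2-normalize the vectors; the dot product then equals cosine similarity
normalized = tf.math.l2_normalize(embeddings, axis=1)
similarity = tf.reduce_sum(normalized[0] * normalized[1])
print(float(similarity))  # value in [-1, 1]; closer to 1 means more similar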
    
 

Category: TensorFlow