Embeddings

An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space. An embedding can be learned and reused across models.
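The idea that an embedding translates a high-dimensional sparse vector into a low-dimensional dense one can be sketched in a few lines of NumPy. This is a minimal illustration (the vocabulary size, dimension, and random weights are all made up): multiplying a one-hot vector by an embedding matrix is the same as looking up one row of that matrix.

```python
import numpy as np

# Hypothetical example: a vocabulary of 6 items embedded in 2 dimensions.
# The embedding matrix has one d-dimensional row per item.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(6, 2))  # shape: (vocab_size, d)

# A sparse one-hot input selects a single item...
one_hot = np.array([0, 0, 1, 0, 0, 0])

# ...so multiplying by the embedding matrix is just a row lookup.
via_matmul = one_hot @ embedding_matrix
via_lookup = embedding_matrix[2]
assert np.allclose(via_matmul, via_lookup)
```

This equivalence is why embedding layers are implemented as lookups rather than full matrix multiplications.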

Motivation From Collaborative Filtering

  • Input: 1,000,000 movies that 500,000 users have chosen to watch
  • Task: Recommend movies to users

To solve this problem some method is needed to determine which movies are similar to each other.

Organizing Movies by Similarity (1d)

A list of movies ordered on a single line. From left to right: 'Shrek', 'The Incredibles', 'The Triplets of Belleville', 'Harry Potter', 'Star Wars', 'Bleu', 'The Dark Knight Rises', and 'Memento'.

Organizing Movies by Similarity (2d)

The same list of movies as in the previous slide, but arranged across two dimensions, so that, for example, 'Shrek' is above and to the left of 'The Incredibles'.

Two-Dimensional Embedding

Similar to the previous diagram, but with axes and a label for each quadrant:

  • Upper-right quadrant, Adult Blockbusters: 'Star Wars' and 'The Dark Knight Rises', with 'Hero' and 'Crouching Tiger, Hidden Dragon' added.
  • Lower-right quadrant, Adult Arthouse: 'Bleu' and 'Memento', with 'Waking Life' added.
  • Lower-left quadrant, Children Arthouse: 'The Triplets of Belleville', with 'Wallace and Gromit' added.
  • Upper-left quadrant, Children Blockbusters: 'Shrek', 'The Incredibles', and 'Harry Potter', with 'School of Rock' added.

Two-Dimensional Embedding

The same arrangement as the last slide. 'Shrek' and 'Bleu' are highlighted, with their coordinates shown as example points in the 2d embedding plane.

d-Dimensional Embeddings

  • Assumes user interest in movies can be roughly explained by d aspects
  • Each movie becomes a d-dimensional point, where the value in each dimension represents how strongly the movie exhibits that aspect
  • Embeddings can be learned from data
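Once each movie is a d-dimensional point, similarity becomes a geometric question. The following sketch uses invented 2-d coordinates loosely matching the quadrant diagram above (children vs. adult on one axis, arthouse vs. blockbuster on the other); the coordinates and the choice of cosine similarity are illustrative, not from the slides.

```python
import numpy as np

# Hypothetical 2-d embeddings loosely matching the quadrant diagram:
# x axis: children (negative) vs. adult (positive)
# y axis: arthouse (negative) vs. blockbuster (positive)
movies = {
    "Shrek": np.array([-1.0, 0.95]),
    "The Incredibles": np.array([-0.8, 0.9]),
    "Bleu": np.array([0.5, -0.7]),
    "Star Wars": np.array([0.9, 1.0]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Nearby points in the same quadrant score high;
# points in opposite quadrants score low.
print(cosine_similarity(movies["Shrek"], movies["The Incredibles"]))
print(cosine_similarity(movies["Shrek"], movies["Bleu"]))
```

A recommender could then suggest the watched movies' nearest neighbors under this similarity measure.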

Learning Embeddings in a Deep Network

  • No separate training process needed -- the embedding layer is just a hidden layer with one unit per dimension
  • Supervised information (e.g. users watched the same two movies) tailors the learned embeddings for the desired task
  • Intuitively the hidden units discover how to organize the items in the d-dimensional space in a way to best optimize the final objective
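The bullets above can be made concrete with a minimal sketch (an assumed toy setup, not the slides' exact model): the embedding layer is an ordinary weight matrix, and one step of gradient descent updates it like any hidden layer. Note that only the rows for the movies actually present in the example receive a gradient.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, d = 8, 3
E = rng.normal(scale=0.1, size=(vocab_size, d))  # embedding layer weights
w = rng.normal(scale=0.1, size=d)                # output layer weights

watched = [1, 3]   # indices of movies the user watched
label = 1.0        # supervised signal for this example

# Forward pass: sum the embeddings of the watched movies, then score them.
h = E[watched].sum(axis=0)
pred = 1.0 / (1.0 + np.exp(-(h @ w)))  # sigmoid output

# Backward pass for a logistic loss: compute gradients, then update.
E_before = E.copy()  # snapshot, to show which rows change
grad_logit = pred - label
grad_w = grad_logit * h
grad_h = grad_logit * w
w -= 0.1 * grad_w
# Only the rows for the watched movies are updated, so the embedding
# organizes the items in a way that serves the final objective.
E[watched] -= 0.1 * grad_h
```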

Input Representation

  • Each example (a row in this matrix) is a sparse vector of features (movies) that the user has watched
  • A dense representation of this example would be: (0, 1, 0, 1, 0, 0, 0, 1)

This dense representation is not efficient in terms of space or time.

A table where each column header is a movie and each row represents a user and the movies they have watched.

Input Representation

  • Build a dictionary mapping each feature to an integer from 0, ..., # movies - 1
  • Efficiently represent the sparse vector as just the movies the user watched. Based on the column positions of the movies in the sparse vector displayed on the right, the movies 'The Triplets of Belleville', 'Wallace and Gromit', and 'Memento' can be efficiently represented as (0, 1, 999999)

A sparse vector represented as a table with each column representing a movie and each row representing a user. The table contains the movies from the previous diagrams, with columns numbered from 0 to 999999. Each cell is checked if the user has watched that movie.
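The two representations can be compared directly in code. This sketch uses the small eight-movie vocabulary from the earlier diagrams rather than the full million-movie catalog, but the principle is the same: the sparse form stores only the indices of the watched movies.

```python
# Small vocabulary from the earlier diagrams (illustrative).
vocabulary = ["Shrek", "The Incredibles", "The Triplets of Belleville",
              "Harry Potter", "Star Wars", "Bleu",
              "The Dark Knight Rises", "Memento"]

# Dictionary mapping each feature to an integer from 0, ..., # movies - 1.
movie_to_index = {title: i for i, title in enumerate(vocabulary)}

watched = ["The Incredibles", "Harry Potter", "Memento"]

# Dense representation: one slot per movie in the vocabulary.
dense = [1 if title in watched else 0 for title in vocabulary]

# Sparse representation: just the indices of the watched movies.
sparse = sorted(movie_to_index[title] for title in watched)

print(dense)   # [0, 1, 0, 1, 0, 0, 0, 1]
print(sparse)  # [1, 3, 7]
```

With a million-movie vocabulary, the dense vector would carry 1,000,000 entries per user while the sparse form carries only as many as the user has watched.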

An Embedding Layer in a Deep Network

Regression problem to predict home sales prices:

A diagram of a deep neural network used to predict home sale prices. A sparse vector encoding of the words in a real estate ad feeds into a hidden three-dimensional embedding layer; additional latitude and longitude input features join the embedding, and all features feed into multiple hidden layers that output the predicted sale price, trained with an L2 loss.
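A forward pass through this kind of architecture might look as follows. All sizes, weights, and inputs here are illustrative assumptions, not values from the slides; the point is how the word embedding and the latitude/longitude features are concatenated before the hidden layers.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab_size, d = 1000, 3

E = rng.normal(scale=0.1, size=(vocab_size, d))    # 3-d embedding of ad words
W1 = rng.normal(scale=0.1, size=(d + 2, 16))       # hidden layer over embedding + lat/long
W2 = rng.normal(scale=0.1, size=(16, 1))           # output: predicted sale price

word_indices = [12, 87, 301]           # sparse encoding of words in the ad
lat_long = np.array([37.77, -122.42])  # additional input features

embedded = E[word_indices].mean(axis=0)        # collapse the words to one 3-d vector
features = np.concatenate([embedded, lat_long])
hidden = np.maximum(0.0, features @ W1)        # ReLU hidden layer
predicted_price = (hidden @ W2).item()

# L2 loss against the observed sale price (hypothetical target).
true_price = 850_000.0
l2_loss = (predicted_price - true_price) ** 2
```

Training would backpropagate the L2 loss through both the hidden layers and the embedding rows of the words that appeared in the ad.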

An Embedding Layer in a Deep Network

Multiclass Classification to predict a handwritten digit:

Diagram of a deep neural network used to predict handwritten digits. A sparse vector encoding of the raw bitmap of the hand-drawn digit feeds into a hidden three-dimensional embedding layer; other optional features may feed in alongside it. All features pass through multiple hidden layers to a logit layer over the digits 0-9, which is compared against a "one-hot" target probability distribution (the target class label) using a softmax loss.
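A forward pass for the classification variant can be sketched the same way. Again the sizes, weights, and inputs are illustrative assumptions; the structural difference from the regression model is the 10-way logit layer and the softmax loss against a one-hot target.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pixels, d = 784, 3

E = rng.normal(scale=0.1, size=(n_pixels, d))  # 3-d embedding layer
W1 = rng.normal(scale=0.1, size=(d, 32))       # hidden layer
W2 = rng.normal(scale=0.1, size=(32, 10))      # logit layer over digits 0-9

on_pixels = [5, 40, 200, 300]  # sparse encoding: indices of "on" pixels
target = 7                     # target class label

h = np.maximum(0.0, E[on_pixels].sum(axis=0) @ W1)
logits = h @ W2

# Softmax loss against the one-hot target distribution.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
softmax_loss = -np.log(probs[target])
```

Minimizing the softmax loss pushes the probability of the target class toward 1, and the gradients flow back into the embedding rows of the active pixels.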
