# Embeddings

An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space. An embedding can be learned and reused across models.

• Input: 1,000,000 movies that 500,000 users have chosen to watch
• Task: Recommend movies to users

To solve this problem, we need a way to determine which movies are similar to each other.

• Assumes user interest in movies can be roughly explained by d aspects
• Each movie becomes a d-dimensional point, where the value in each dimension represents how much the movie fits the corresponding aspect
• Embeddings can be learned from data
• No separate training process needed -- the embedding layer is just a hidden layer with one unit per dimension
• Supervised information (e.g. users watched the same two movies) tailors the learned embeddings for the desired task
• Intuitively the hidden units discover how to organize the items in the d-dimensional space in a way to best optimize the final objective
• Each example (a row in this matrix) is a sparse vector of features (movies) that have been watched by the user
• A dense representation of this example, such as (0, 1, 0, 1, 0, 0, 0, 1), is not efficient in terms of space and time

• Build a dictionary mapping each feature to an integer from 0, ..., # movies - 1
• Efficiently represent the sparse vector as just the indices of the movies the user watched. For the dense example (0, 1, 0, 1, 0, 0, 0, 1), this is the index list (1, 3, 7)
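The dictionary and index-list representation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the movie titles (`movie_0` to `movie_7`) and the helper names `build_vocabulary`, `to_sparse`, and `to_dense` are hypothetical, chosen only to reproduce the 8-movie example.

```python
def build_vocabulary(titles):
    """Map each movie title to an integer in 0 .. len(titles) - 1."""
    return {title: index for index, title in enumerate(titles)}


def to_sparse(watched_titles, vocabulary):
    """Represent a user as the sorted indices of the movies they watched."""
    return sorted(vocabulary[title] for title in watched_titles)


def to_dense(indices, num_movies):
    """Expand an index list back into a 0/1 vector with one slot per movie."""
    dense = [0] * num_movies
    for index in indices:
        dense[index] = 1
    return dense


# Hypothetical catalogue of 8 movies (placeholder titles).
titles = [f"movie_{i}" for i in range(8)]
vocabulary = build_vocabulary(titles)

sparse = to_sparse(["movie_7", "movie_1", "movie_3"], vocabulary)
dense = to_dense(sparse, len(vocabulary))

print(sparse)  # [1, 3, 7]
print(dense)   # [0, 1, 0, 1, 0, 0, 0, 1]
```

With 1,000,000 movies the dense vector would need 1,000,000 entries per user, while the index list only grows with the number of movies that user actually watched.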

[Figure: Regression problem to predict home sales prices]

[Figure: Multiclass classification to predict a handwritten digit]

[Figure: Collaborative filtering to predict movies to recommend]

Deep Network

• Each of the hidden units corresponds to a dimension (latent feature)
• The edge weights between a movie and the hidden layer are that movie's coordinate values
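To make the "edge weights are coordinates" view concrete, the sketch below treats the embedding layer as a plain weight matrix with one row per movie and one column per hidden unit. The sizes (`NUM_MOVIES = 8`, `EMBED_DIM = 3`), the random weights, and the function names are assumptions for illustration; in a real model the weights would be learned by training on the final objective.

```python
import random

NUM_MOVIES = 8  # assumed toy catalogue size
EMBED_DIM = 3   # assumed number of hidden units (the d dimensions)

random.seed(0)
# weights[m][k] is the edge weight from movie m to hidden unit k.
# Row m is therefore the d-dimensional embedding of movie m.
weights = [
    [random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(NUM_MOVIES)
]


def hidden_from_dense(dense, weights):
    """Hidden activations as a full product: 0/1 input vector times weights."""
    dim = len(weights[0])
    return [
        sum(dense[m] * weights[m][k] for m in range(len(dense)))
        for k in range(dim)
    ]


def hidden_from_sparse(indices, weights):
    """Same activations, computed by summing only the watched movies' rows."""
    dim = len(weights[0])
    hidden = [0.0] * dim
    for m in indices:
        for k in range(dim):
            hidden[k] += weights[m][k]
    return hidden


dense_input = [0, 1, 0, 1, 0, 0, 0, 1]
sparse_input = [1, 3, 7]

via_dense = hidden_from_dense(dense_input, weights)
via_sparse = hidden_from_sparse(sparse_input, weights)

# A single movie's embedding is just its row of edge weights.
movie_3_embedding = hidden_from_sparse([3], weights)
```

Both functions produce the same hidden activations, but the sparse version touches only three rows instead of all eight, which is why the index-list input is preferred at scale.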

Geometric view of a single movie embedding

• Higher-dimensional embeddings can more accurately represent the relationships between input values
• But more dimensions increase the chance of overfitting and lead to slower training
• Empirical rule of thumb (a good starting point, but it should be tuned using the validation data): $$\text{dimensions} \approx \sqrt[4]{\text{possible values}}$$
• Embeddings map items (e.g. movies, text,...) to low-dimensional real vectors in a way that similar items are close to each other
• Embeddings can also be applied to dense data (e.g. audio) to create a meaningful similarity metric
• Jointly embedding diverse data types (e.g. text, images, audio, ...) defines a similarity between them
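Two of the points above lend themselves to a short sketch: the fourth-root rule of thumb for choosing the number of dimensions, and the idea that similar items sit close together in the embedding space. The example assumes cosine similarity as the closeness measure (Euclidean distance would be an equally reasonable choice), and the three 2-dimensional embeddings are invented for illustration, not learned from data.

```python
import math


def suggested_embedding_dim(num_possible_values):
    """Rule-of-thumb starting point: fourth root of the number of values."""
    return round(num_possible_values ** 0.25)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(name, embeddings):
    """Return the other item whose embedding is closest to `name`."""
    query = embeddings[name]
    others = (other for other in embeddings if other != name)
    return max(
        others,
        key=lambda other: cosine_similarity(query, embeddings[other]),
    )


# Hypothetical embeddings, made up for this example only.
embeddings = {
    "movie_a": [0.9, 0.1],
    "movie_b": [0.8, 0.2],
    "movie_c": [-0.7, 0.6],
}

print(suggested_embedding_dim(1_000_000))   # 32
print(most_similar("movie_a", embeddings))  # movie_b
```

For the 1,000,000-movie catalogue the rule suggests starting near 32 dimensions; the value is only a starting point and should still be tuned on validation data.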