You can solve the core problems of sparse input data by mapping your high-dimensional data into a lower-dimensional space.
As you saw in the movie exercises earlier, even a small multi-dimensional space provides the freedom to group semantically similar items together and keep dissimilar items far apart. Position (distance and direction) in the vector space can encode semantics in a good embedding. For example, the following visualizations of real embeddings show geometrical relationships that capture semantic relations like the relation between a country and its capital:
Figure 4. Embeddings can produce remarkable analogies.
This sort of meaningful space gives your machine learning system opportunities to detect patterns that may help with the learning task.
Shrinking the network
While we want enough dimensions to encode rich semantic relations, we also want an embedding space that is small enough to allow us to train our system more quickly. A useful embedding may be on the order of hundreds of dimensions. This is likely several orders of magnitude smaller than the size of your vocabulary for a natural language task.