Let's start with a quick review of a key idea from Machine Learning Crash Course. Look at the distribution in the chart below.

**Figure 1: House prices versus latitude.**

For the following question, click the desired arrow to check your answer:

In cases like the latitude example, you need to divide the latitudes into
buckets to learn something different about housing values for each
bucket. This transformation of numeric features into categorical features, using
a set of thresholds, is called
**bucketing**
(or binning). In this bucketing example, the boundaries are equally spaced.

**Figure 2: House prices versus latitude, now divided into buckets.**

## Quantile Bucketing

Let's revisit our car price dataset with buckets added. With one feature per bucket, the model uses as much capacity for a single example in the >45000 range as for all the examples in the 5000-10000 range. This seems wasteful. How might we improve this situation?

**Figure 3: Number of cars sold at different prices.**

The problem is that equally spaced buckets don’t capture this distribution well.
The solution lies in creating buckets that each have the same number of points.
This technique is called **quantile
bucketing**. For example, the
following figure divides car prices into quantile buckets. In order to get the
same number of examples in each bucket, some of the buckets encompass a narrow
price span while others encompass a very wide price span.

**Figure 4: Quantile bucketing gives each bucket about the same number of cars.**

## Bucketing Summary

If you choose to bucketize your numerical features, be clear about how you are setting the boundaries and which type of bucketing you’re applying:

**Buckets with equally spaced boundaries**: the boundaries are fixed and encompass the same range (for example, 0-4 degrees, 5-9 degrees, and 10-14 degrees, or $5,000-$9,999, $10,000-$14,999, and $15,000-$19,999). Some buckets could contain many points, while others could have few or none.**Buckets with quantile boundaries**: each bucket has the same number of points. The boundaries are not fixed and could encompass a narrow or wide span of values.