The following questions cover concepts that you should have a solid grasp of
before moving on to more advanced courses.
Which of the following suggest a potential problem in using ML
for your project?
You only make predictions.
You want to make decisions, not just predictions. Your product should take
action on the output of the model. ML is better at making decisions than at
giving you insights.
You have a clear use case.
Start with the problem, not the solution. Focus on problems that would
be difficult to solve with traditional programming. Make sure you aren't
treating ML as a hammer for your problems.
You have access to historical data.
Actually, you're good, this is what you want!
Machine learning is about finding patterns in relevant data and applying
them to data you haven't seen before. This requires that you have (or can
get) existing relevant data.
When using supervised machine learning, your ML problem is well-defined
if you have:
BOTH inputs and outputs identified
A well-defined problem has both inputs and outputs. Inputs are the
features. Outputs are the labels to predict.
EITHER inputs or outputs identified
If you're missing inputs or outputs, then your problem isn't well-defined.
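
As a rough illustration of the inputs/outputs distinction above, the sketch below represents each example as a set of features plus a label and checks that both are present. The rental-price scenario and every name in it (Example, is_well_defined, square_meters, and so on) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Example(NamedTuple):
    """One supervised-learning example (hypothetical structure)."""
    features: Dict[str, float]   # inputs the model sees
    label: Optional[float]       # output to predict; None if not identified


def is_well_defined(examples: List[Example]) -> bool:
    """True only if every example has BOTH inputs and an output."""
    return bool(examples) and all(
        bool(ex.features) and ex.label is not None for ex in examples
    )


# Made-up rental data: features are apartment attributes, label is monthly rent.
labeled = [
    Example({"square_meters": 50.0, "bedrooms": 1.0}, 900.0),
    Example({"square_meters": 80.0, "bedrooms": 3.0}, 1390.0),
]
# Same inputs, but nobody has decided what to predict yet.
unlabeled = [Example({"square_meters": 50.0, "bedrooms": 1.0}, None)]

print(is_well_defined(labeled))    # both inputs and outputs identified
print(is_well_defined(unlabeled))  # outputs missing
```

If the second check fails for your own data, that usually means the label (what you want to predict) still needs to be pinned down before any modeling starts.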
How many features should you pick when you are first starting a machine
learning project?
Pick 1-3 features that seem to have strong predictive power
It's best for your data collection pipeline to start with only one to
three features. This will help you confirm that ML is a viable approach to
your problem. Also, when you build a baseline from a couple of features,
you'll feel like you're making progress!
Pick 4-6 features that seem to have strong predictive power
You might eventually use this many features, but it's still better to
start with fewer.
Pick as many features as you can, so you can start observing which
features have the strongest predictive power.
Start smaller. The more features you begin with, the harder it is to
see what's working. Fewer features usually means fewer unnecessary
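
One way to picture "start with one to three features" is a tiny baseline comparison: predict the average label for everyone, then see whether a single feature does better. The sketch below assumes a hypothetical rent-prediction task with made-up numbers and a plain least-squares line; it measures error on the same rows it fit, so a real check would use held-out data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(preds, ys):
    """Average absolute difference between predictions and labels."""
    return mean(abs(p - y) for p, y in zip(preds, ys))


# Made-up data: one feature (apartment size) and one label (monthly rent).
sizes = [30, 45, 50, 65, 80, 95]
rents = [620, 810, 900, 1150, 1390, 1600]

# Baseline with zero features: always predict the average rent.
baseline_mae = mean_abs_error([mean(rents)] * len(rents), rents)

# Model with a single feature.
slope, intercept = fit_line(sizes, rents)
model_mae = mean_abs_error([slope * x + intercept for x in sizes], rents)

print(f"baseline MAE: {baseline_mae:.1f}")
print(f"one-feature MAE: {model_mae:.1f}")
```

If one feature already beats the baseline clearly, that is an early sign ML may be viable for the problem; more features can be added one at a time so it stays visible which of them helps.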
Should you collect data and look for correlations before defining your
ML problem?
Searching for correlations in existing data dumps is hard because the
correlations you find might be spurious. This is only advisable if you
have HUGE amounts of data and can conduct live experiments.
Warning: if you try enough experiments, you'll find something that
works, but there's no guarantee that it'll be useful in production
(or even that it's a real phenomenon).
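
The warning about spurious correlations can be reproduced with pure noise. The sketch below is an illustrative simulation, not a method from the course: it generates a random label and 1,000 random "features" that are unrelated to it by construction, picks the feature that correlates best on half of the rows, then rechecks that same feature on the other half. The row counts, feature count, and seed are arbitrary assumptions.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(0)          # fixed seed so the run is repeatable
N_ROWS, N_FEATURES = 20, 1000   # arbitrary sizes for the demonstration
HALF = N_ROWS // 2

# Label and features are independent noise: no real relationship exists.
label = [rng.gauss(0, 1) for _ in range(N_ROWS)]
features = [[rng.gauss(0, 1) for _ in range(N_ROWS)] for _ in range(N_FEATURES)]

# "Dredge" the first half of the rows for the strongest correlation.
best = max(features, key=lambda col: abs(pearson(col[:HALF], label[:HALF])))
search_r = pearson(best[:HALF], label[:HALF])

# Recheck the winning feature on rows that were not used in the search.
fresh_r = pearson(best[HALF:], label[HALF:])

print(f"best correlation found while searching: {search_r:+.2f}")
print(f"same feature on fresh rows:             {fresh_r:+.2f}")
```

With this many candidates and so few rows, a strong-looking correlation turns up in the search half even though nothing real is there, which is why a correlation found by searching should be confirmed on fresh data or in a live experiment.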