ML practitioners spend far more time evaluating, cleaning, and transforming
data than building models.
Data is so important that this course devotes three entire units to the topic:
This unit focuses on
numerical data,
meaning integers or floating-point values
that behave like numbers. That is, they are additive, countable, ordered,
and so on. The next unit focuses on
categorical data, which can
include numbers that behave like categories. The third unit focuses on how to
prepare your data to ensure high-quality results when training and evaluating
your model.
Examples of numerical data include:
- Temperature
- Weight
- The number of deer wintering in a nature preserve
In contrast, US postal codes, despite
being five-digit or nine-digit numbers, don't behave like numbers or represent
mathematical relationships. Postal code 40004 (in Nelson County, Kentucky) is
not twice the quantity of postal code 20002 (in Washington, D.C.). These numbers
represent categories, specifically geographic areas, and are considered
categorical data.
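The following minimal Python sketch illustrates this distinction. It assumes a few hypothetical records; the field names and values are invented for illustration only. Numerical features such as temperature, weight, and deer count support meaningful arithmetic like averaging. The postal code is kept as a string and mapped to an arbitrary category index, so its digits carry no magnitude. One-hot encoding is used here only as one common way to represent categories, not as the method prescribed by this course.

```python
# Hypothetical example records; field names and values are illustrative only.
examples = [
    {"temperature_c": 4.5, "weight_kg": 61.0, "deer_count": 120, "postal_code": "40004"},
    {"temperature_c": -2.0, "weight_kg": 74.5, "deer_count": 95, "postal_code": "20002"},
    {"temperature_c": 1.5, "weight_kg": 68.0, "deer_count": 130, "postal_code": "40004"},
]

NUMERICAL_FEATURES = ["temperature_c", "weight_kg", "deer_count"]


def mean(values):
    """Arithmetic mean: meaningful only for values that behave like numbers."""
    return sum(values) / len(values)


# Numerical features: averaging produces a value with a real-world meaning.
feature_means = {
    name: mean([ex[name] for ex in examples]) for name in NUMERICAL_FEATURES
}

# Postal codes stay strings. Converting them to int would invite meaningless
# arithmetic and would also drop leading zeros (int("02134") == 2134).
vocabulary = sorted({ex["postal_code"] for ex in examples})
code_to_index = {code: i for i, code in enumerate(vocabulary)}


def one_hot(code):
    """Represent a postal code as a category: one slot per distinct code."""
    vec = [0] * len(vocabulary)
    vec[code_to_index[code]] = 1
    return vec


if __name__ == "__main__":
    print(feature_means)
    print(vocabulary)
    print(one_hot("40004"))
```

With this representation, "40004" and "20002" are simply two different slots in a vector. No feature value implies that one code is twice, or greater than, the other.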
Last updated 2024-10-09 UTC.