Production ML Systems

There's a lot more to machine learning than just implementing an ML algorithm. A production ML system involves a significant number of components.

ML system diagram containing the following components: data collection, feature extraction, process management tools, data verification, configuration, machine resource management, monitoring, serving infrastructure, and ML code. The ML code part of the diagram is dwarfed by the other components.
  • No, you don't have to build everything yourself.
    • Re-use generic ML system components wherever possible.
    • Google CloudML solutions include Dataflow and TF Serving.
    • Components can also be found in other platforms like Spark, Hadoop, etc.
    • How do you know what you need?
      • Understand a few ML system paradigms & their requirements

Video Lecture Summary

So far, Machine Learning Crash Course has focused on building ML models. However, as the following figure suggests, real-world production ML systems are large ecosystems of which the model is just a single part.

ML system diagram containing the following components: data collection, feature extraction, process management tools, data verification, configuration, machine resource management, monitoring, serving infrastructure, and ML code. The ML code part of the diagram is dwarfed by the other components.

Figure 1. Real-world production ML system.

The ML code is at the heart of a real-world ML production system, but that box often represents only 5% or less of the system's total code. (That's not a misprint.) Notice that an ML production system devotes considerable resources to input data: collecting it, verifying it, and extracting features from it. Furthermore, notice that a serving infrastructure must be in place to put the ML model's predictions into practical use in the real world.
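To make the proportions concrete, here is a minimal sketch in plain Python of how a few of the boxes in Figure 1 might fit together. It is a toy illustration, not how any particular platform implements these components: every name in it (Record, verify, extract_features, predict, Server) is hypothetical, and the "model" is a stand-in linear function with made-up weights.

```python
from dataclasses import dataclass
from typing import Optional

# All names below are hypothetical and exist only for this illustration.


@dataclass
class Record:
    """A raw input record as it might arrive from data collection."""
    rooms: Optional[float]
    area_m2: Optional[float]


def verify(records):
    """Data verification: keep only records with complete, positive values."""
    valid = []
    for r in records:
        if r.rooms is None or r.area_m2 is None:
            continue
        if r.rooms <= 0 or r.area_m2 <= 0:
            continue
        valid.append(r)
    return valid


def extract_features(record):
    """Feature extraction: turn a verified record into a numeric vector."""
    return [record.rooms, record.area_m2, record.area_m2 / record.rooms]


def predict(features):
    """The 'ML code' box: a stand-in linear model with made-up weights."""
    weights = [10.0, 2.0, 0.5]
    return sum(w * x for w, x in zip(weights, features))


class Server:
    """Serving plus basic monitoring: answer requests and count outcomes."""

    def __init__(self):
        self.served = 0
        self.rejected = 0

    def handle(self, record):
        # Reject inputs that fail verification instead of predicting on them.
        if not verify([record]):
            self.rejected += 1
            return None
        self.served += 1
        return predict(extract_features(record))


server = Server()
raw = [Record(3, 60.0), Record(None, 45.0), Record(2, -1.0)]
predictions = [server.handle(r) for r in raw]
print(predictions, server.served, server.rejected)
```

Even in this toy version, the model itself is a few lines, while most of the code handles input data, serving, and bookkeeping.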

Fortunately, many of the components in Figure 1 are reusable, so you don't have to build all of them yourself.

TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines.

Subsequent modules will help guide your design decisions in building a production ML system.