Machine Learning Engineering for Production

tanhim islam
3 min read · Jun 16, 2021

Getting any machine learning model into production requires several experimentation cycles to identify the right ML model for the business value it must deliver. This experimentation phase adds complexity to any ML project, including the following concerns:

  • Preparing and maintaining high-quality data for training ML models.
  • Tracking models in production to detect performance degradation.
  • Performing ongoing experimentation with new data sources, ML algorithms, and hyperparameters, and tracking these experiments.
  • Maintaining the veracity of models by continuously retraining them on fresh data.
  • Avoiding training-serving skew caused by inconsistencies in data and in runtime dependencies between training and serving environments.
  • Handling concerns about model fairness and adversarial attacks.
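To make the "performance degradation" and "fresh data" concerns concrete, here is a minimal sketch of how a monitoring job might flag feature drift. The `drift_score` function and the one-standard-deviation threshold are illustrative choices for this article, not a standard API:

```python
import statistics

def drift_score(train_values, serve_values):
    """Mean shift between training and serving data, scaled by the
    training standard deviation -- a crude, model-free drift signal."""
    mu_train = statistics.mean(train_values)
    mu_serve = statistics.mean(serve_values)
    sigma = statistics.stdev(train_values) or 1.0  # guard against zero spread
    return abs(mu_serve - mu_train) / sigma

# Flag a feature for retraining review when its serving distribution has
# shifted by more than one training standard deviation (an illustrative
# threshold; real systems tune this per feature).
train = [10.0, 12.0, 11.0, 9.0, 10.5, 11.5]
serve = [15.0, 16.0, 14.5, 15.5, 16.5, 15.2]
needs_retraining = drift_score(train, serve) > 1.0
```

In practice such a check runs on a schedule over logged serving data, and a positive signal becomes one of the retraining triggers discussed below.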

To manage this complexity, we need well-defined structures, processes, and proper software tools that manage ML artefacts and cover the machine learning cycle.

MLOps is the process of taking an experimental machine learning model into a production system. The term is a compound of “Machine Learning” and DevOps, the continuous development practice from software engineering. Machine learning models are developed and tested in isolated experimental systems.

Figure: MLOps Workflow
  • The core activity during this ML development phase is experimentation. As data scientists and ML researchers prototype model architectures and training routines, they create labeled datasets, and they use features and other reusable ML artifacts that are governed through the data and model management process. The primary output of this process is a formalized training procedure, which includes data preprocessing, model architecture, and model training settings.
  • If the ML system requires continuous training (repeated retraining of the model), the training procedure is operationalized as a training pipeline. This requires a CI/CD routine to build, test, and deploy the pipeline to the target execution environment.
  • The continuous training pipeline is executed repeatedly based on retraining triggers, and it produces a model as output. The model is retrained as new data becomes available, or if model performance decay is detected. Other training artifacts and metadata that are produced by a training pipeline are also tracked. If the pipeline produces a successful model candidate, that candidate is then tracked by the model management process as a registered model.
  • The registered model is annotated, reviewed, and approved for release, and is then deployed to a production environment. This process can be relatively simple if we are using a no-code solution, or it can involve building a custom CI/CD pipeline for progressive delivery.
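The continuous-training loop described above can be sketched in plain Python. `ModelRegistry`, `continuous_training_run`, and the accuracy quality gate are hypothetical names used for illustration, not any specific tool's API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ModelRegistry:
    """Minimal stand-in for a model registry: every successful candidate
    is stored with a version number and an approval status."""
    _models: List[Dict] = field(default_factory=list)

    def register(self, model, metrics):
        entry = {"version": len(self._models) + 1, "model": model,
                 "metrics": metrics, "status": "pending_review"}
        self._models.append(entry)
        return entry

def continuous_training_run(load_data: Callable, train: Callable,
                            evaluate: Callable, registry: ModelRegistry,
                            quality_gate: float):
    """One triggered execution of the retraining pipeline: train on fresh
    data, evaluate, and register the candidate only if it passes the gate."""
    data = load_data()
    model = train(data)
    metrics = evaluate(model, data)
    if metrics["accuracy"] >= quality_gate:
        return registry.register(model, metrics)
    return None  # failed candidates are tracked, but never registered

# A run fired by a retraining trigger; toy callables stand in for real steps.
registry = ModelRegistry()
candidate = continuous_training_run(
    load_data=lambda: [(1, 0), (2, 1)],
    train=lambda data: "model-object",
    evaluate=lambda model, data: {"accuracy": 0.93},
    registry=registry,
    quality_gate=0.90,
)
```

The key design point is that the registry, not the pipeline, owns the approval status: a candidate enters as "pending_review" and only a separate review step promotes it to release.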
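Progressive delivery, mentioned in the last step, often starts with a canary split. A minimal, deterministic sketch follows; the function name and the hash-bucket scheme are assumptions for illustration, not a particular platform's feature:

```python
import hashlib

def route_request(request_id: str, canary_fraction: float = 0.1) -> str:
    """Deterministically send a stable fraction of traffic to the newly
    approved model; the same request id always lands on the same side,
    which keeps user experience consistent during the rollout."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "candidate" if bucket < canary_fraction * 100 else "production"
```

If the candidate's live metrics hold up, `canary_fraction` is raised in steps toward 1.0; if they degrade, setting it back to 0.0 is an instant rollback.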

To summarize, performing machine learning in a production environment doesn’t just mean deploying a model as an API for prediction. Rather, it means deploying a machine learning pipeline that can automate the retraining and deployment of new models. Setting up a CI/CD system enables us to automatically test and deploy new pipeline implementations, and lets us cope with rapid changes in data and in the business environment. We don’t have to move every process from one level to the next immediately; we can adopt these practices gradually to improve the automation of ML system development and production.
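One concrete practice such a CI/CD system can enforce is the training-serving skew guard from the list above: share a single preprocessing code path between the training pipeline and the serving endpoint, and protect it with a test. A minimal sketch with hypothetical feature names:

```python
def preprocess(record: dict) -> dict:
    """Single feature-engineering function imported by BOTH the training
    pipeline and the serving endpoint, so the two environments can never
    drift apart in how they compute features."""
    return {
        "age_scaled": record["age"] / 100.0,
        "income_bucket": min(int(record["income"] // 10_000), 9),
    }

# CI-style check: feature values the training pipeline produced for a raw
# record must be reproduced exactly by the serving code path.
def test_training_serving_consistency():
    raw = {"age": 42, "income": 55_000}
    assert preprocess(raw) == {"age_scaled": 0.42, "income_bucket": 5}

test_training_serving_consistency()
```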
