Making your trained model available to serve predictions on new data.
Model deployment is the process of integrating a machine learning model into a production environment, where it can take in input and return predictions. A model sitting on a data scientist's laptop is just an artifact; deployment is what turns it into a valuable business tool.

There are several common patterns for deploying models. One of the most popular is to wrap the model in a REST API. Using a web framework like Flask or FastAPI in Python, you create an endpoint that accepts new data (e.g., in JSON format), passes it to the trained model for prediction, and returns the model's output. This API can then be consumed by other applications, like a website or a mobile app; a minimal sketch is shown below.

Another pattern is Batch Prediction. In this scenario, the model runs on a schedule (e.g., once a day) to make predictions on a large batch of new data at once. For example, a bank might run a batch job every night to score all of its customers for loan default risk; a batch-scoring sketch is shown below.

For applications requiring very low latency, a third pattern is Edge Deployment: the model is deployed directly onto the device where the data is generated, such as a smartphone or an IoT sensor. This avoids the need for a network round-trip to a central server; a sketch of a typical model-conversion step is shown below.

Regardless of the pattern, deployment also involves saving and versioning your trained model (using tools like MLflow, or simply pickling it), monitoring its performance in production, and having a strategy for retraining and redeploying it as new data becomes available; a sketch of the saving step is shown below.
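To make the REST API pattern concrete, here is a minimal FastAPI sketch. The model file name (`model.pkl`) and the two-field feature schema are assumptions for illustration, not part of any particular project; any scikit-learn-style model with a `predict` method would fit.

```python
# Minimal REST API sketch, assuming a scikit-learn model pickled to
# "model.pkl" and a two-feature input schema (both are illustrative).
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the trained model once at startup, not on every request.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


class Features(BaseModel):
    # Hypothetical feature schema for the incoming JSON body.
    age: float
    income: float


@app.post("/predict")
def predict(features: Features):
    # Arrange the input into the 2D shape scikit-learn expects.
    row = [[features.age, features.income]]
    prediction = model.predict(row)[0]
    return {"prediction": float(prediction)}
```

You would then run this with an ASGI server such as uvicorn (e.g., `uvicorn main:app`), and any client that can POST JSON can consume the endpoint.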
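The batch pattern can be as simple as a scheduled script, triggered by cron or an orchestrator like Airflow. The sketch below assumes a pickled classifier exposing `predict_proba` and a CSV of customer features; the file paths and column names are illustrative.

```python
# Nightly batch-scoring sketch, assuming a pickled classifier and a CSV
# of customer features (paths and column names are illustrative).
import pickle

import pandas as pd

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Load the latest batch of customers to score.
customers = pd.read_csv("customers_latest.csv")
feature_cols = ["age", "income", "balance"]

# Score every row in one vectorized call; keep the probability of the
# positive class (here, loan default) as the risk score.
customers["default_risk"] = model.predict_proba(customers[feature_cols])[:, 1]

# Write the scores somewhere downstream systems can pick them up.
customers[["customer_id", "default_risk"]].to_csv("scores.csv", index=False)
```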
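Edge deployment typically begins by converting the trained model into a compact, device-friendly format. One common route (an assumption here, since edge runtimes vary) is TensorFlow Lite; this sketch assumes a trained Keras model saved as `my_model.keras`.

```python
# Edge-conversion sketch, assuming a trained Keras model on disk
# (the model file and its contents are illustrative).
import tensorflow as tf

model = tf.keras.models.load_model("my_model.keras")

# Convert to the compact TFLite format used on phones and IoT devices;
# default optimizations apply quantization to shrink the file further.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Ship this file with the app; inference then runs on-device,
# with no network round-trip to a central server.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```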
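Finally, the two saving approaches mentioned above look roughly like this side by side. The toy model, parameter, and metric names are illustrative; the point is that MLflow records the model together with the run that produced it, while plain pickling leaves versioning up to you.

```python
# Model-saving sketch: plain pickling vs. MLflow tracking
# (the toy model and logged names are illustrative).
import pickle

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression().fit(X, y)

# Option 1: simple pickling -- easy, but version bookkeeping is manual.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Option 2: MLflow -- logs the model alongside its parameters and
# metrics, so each training run is reproducible and comparable.
with mlflow.start_run():
    mlflow.log_param("C", model.C)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")
```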