Customer Segmentation Classification (Part-3)

MLflow is an open-source, end-to-end platform for managing the machine learning lifecycle, originally developed by Databricks.

Why do I need MLflow?

Most data scientists and ML engineers are able to create a model on their laptops and desktops. They may use Anaconda, Jupyter or some other IDE to code their ML models. Problems arise when they have to improve model performance over time and when multiple members of a team work on the same model. They need to:

Keep track of all the parameters tuned and tweaked in the model.
Keep track of the outputs, accuracy and error scores.
Maintain the record of models and their related data objects (scalers, imputers, encoders etc.)
Version their models.
Share the model with team members, including the prerequisites and setup other members need in order to run the model on their systems.
Wrap their models in an API and deploy it, which requires extra coding and tech stack knowledge.

This is where MLflow comes into the picture. It relieves the pain of learning an entirely new tech stack to maintain, track and deploy models, and it provides simple APIs that you can integrate into your model code, putting you on your way to deploying your model to production.

Quick Hands-on with MLflow

Before we race off to build the greatest classifier ever seen, we need to set up MLflow on our local system. Install the mlflow library with pip:

!pip3 install mlflow

Add MLflow to your code

For many popular ML libraries, you make a single function call: mlflow.autolog(). If you are using one of the supported libraries, this will automatically log the parameters, metrics, and artifacts of your run. For instance, the following autologs a scikit-learn run:

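import mlflow
import pandas as pd

# Enable automatic logging before any model is trained
mlflow.autolog()

# Load the training data and preview it
data = pd.read_csv("Train.csv")
data.head(4553)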

Add mlflow.autolog() before model training

Comparing Models Using the MLflow UI

Once you’ve run your code, you can view the results in MLflow’s tracking UI. To start the UI, run the following command in your command prompt, from the directory that contains the mlruns folder:

mlflow ui

This starts a web interface on your local host, by default at http://localhost:5000/.

Open that link in your browser to see the MLflow UI page.

The MLflow UI allows you to track and compare different model experiments easily. The UI can also be used to do the following:

Model Access and Download: Every trained model can be accessed and downloaded directly from the MLflow UI.
Metrics and Parameters View: Model metrics and parameters are presented in a sortable format, enabling easy analysis and comparison.
Identify Consistently Performing Parameters: The UI helps identify parameters that consistently yield good results, reducing the need for extensive grid search training.
Interactive Dashboard: The dashboard-like interface makes sorting and filtering through numerous iterations intuitive and efficient.
Customizable Column Selection: Users can choose to display specific columns like parameters and metrics, providing a more focused view.
Export Runs Information: The UI allows users to download all runs' information into a CSV file, facilitating further analysis using external tools like Excel.
Metrics Visualization: The UI provides tools to plot and visualize metrics, aiding in understanding model performance.
Model Registration and Deployment: Once the final model is selected, it can be registered and served for predictions, streamlining the deployment process.

The mlruns Folder

Here is a brief look at what goes on under the hood. When MLflow is used with its default local storage, it creates a folder named mlruns, which acts as the repository for the project's runs.

The mlruns folder holds one folder per experiment, and inside it a separate folder is created for each run, named by its run ID.
Each run folder has 4 subfolders:
- artifacts
- metrics
- params
- tags
The artifacts folder holds the conda environment used by the model and the model's pkl file. Here is a sample conda.yaml file from artifacts:

channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.8.5
  - pip
  - pip:
    - mlflow
    - scikit-learn==0.23.2
    - cloudpickle==1.6.0
name: mlflow-env

The metrics folder logs metrics such as RMSE/MAE, custom metrics, etc.
The params folder logs features and model parameters (max_depth, max_iter, learning_rate, verbose, etc.).
The tags folder tracks history and usage details, for example:
[
  {
    "run_id": "1a2bef6340dd4610841234d860c35f2d",
    "artifact_path": "catboost-reg-model",
    "utc_time_created": "2021-07-09 06:15:51.956861",
    "flavors": {
      "python_function": {
        "model_path": "model.pkl",
        "loader_module": "mlflow.sklearn",
        "python_version": "3.8.5",
        "env": "conda.yaml"
      },
      "sklearn": {
        "pickled_model": "model.pkl",
        "sklearn_version": "0.23.2",
        "serialization_format": "cloudpickle"
      }
    }
  }
]
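
To make the folder layout concrete, here is a small sketch that reads the params and metrics of one run folder using only the standard library. It rests on assumptions about the local file layout: each file under params is named after a parameter and contains its value, and each file under metrics contains lines of the form "timestamp value step". The helper name read_run and the fake run folder are hypothetical; for real projects, MLflow's own APIs are the safer way to query runs.

```python
import tempfile
from pathlib import Path


def read_run(run_dir):
    """Return the params and latest metric values stored in one run folder."""
    run_dir = Path(run_dir)
    params = {}
    metrics = {}

    params_dir = run_dir / "params"
    if params_dir.is_dir():
        for param_file in sorted(params_dir.iterdir()):
            if param_file.is_file():
                # Assumed layout: file name is the parameter, content is its value.
                params[param_file.name] = param_file.read_text().strip()

    metrics_dir = run_dir / "metrics"
    if metrics_dir.is_dir():
        for metric_file in sorted(metrics_dir.iterdir()):
            if metric_file.is_file():
                lines = metric_file.read_text().splitlines()
                if lines:
                    # Assumed line layout: "<timestamp> <value> <step>"; keep the last value.
                    metrics[metric_file.name] = float(lines[-1].split()[1])

    return {"params": params, "metrics": metrics}


# Build a tiny fake run folder so the sketch runs without MLflow installed.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "mlruns" / "0" / "example_run"
    (run_dir / "params").mkdir(parents=True)
    (run_dir / "metrics").mkdir(parents=True)
    (run_dir / "params" / "max_depth").write_text("5")
    (run_dir / "metrics" / "accuracy").write_text(
        "1625811351956 0.81 0\n1625811352956 0.84 1\n"
    )
    summary = read_run(run_dir)

print(summary)
```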

Model Deployment on Flask

Model deployment is the process of making a trained machine learning model available for use in real-world applications. After a model is trained and evaluated, deploying it involves hosting it on servers or cloud-based platforms so that it can be accessed by other systems or applications to make predictions or perform specific tasks based on the model's capabilities.

What is Flask

Flask is a web application framework written in Python. It has multiple modules that make it easier for a web developer to write applications without having to worry about the details like protocol management, thread management, etc.

Flask gives us a variety of choices for developing web applications, along with the tools and libraries needed to build them.

How to install flask

Installing Flask on Windows is easy and straightforward.
To install Flask, run the following command:

!pip install flask

Save the model

To save a trained machine learning model in Python, you can use libraries like joblib or pickle. These libraries serialize the model object and save it to a file, which can be loaded later for making predictions. You can also download the trained model from the MLflow UI.

In the MLflow UI, open the run's artifacts and select the model.pkl file. A download option appears on the right side; click it to download your model.
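
As a rough sketch of the pickle route, the helpers below write an object to model.pkl and read it back. The names save_model and load_model are hypothetical, and a plain dictionary stands in for the trained classifier so that the example runs without any ML library; a fitted scikit-learn estimator can be saved the same way.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize a model object to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model object back from disk."""
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a trained model (an assumption made for this example).
trained_model = {"weights": [0.4, -1.2], "bias": 0.1}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    save_model(trained_model, model_path)
    restored = load_model(model_path)

print(restored == trained_model)
```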

Server

In this file, we will use the Flask web framework to handle the POST requests.
First, import the methods and libraries that we are going to use in the code.

import pickle

import numpy as np
from flask import Flask, request, render_template

Here we have imported numpy to build the array of requested data, pickle to load our trained model, and the Flask pieces needed to read the request and render the HTML page.

In the following section of the code, we create an instance of Flask() and load the trained model into the variable model.

app = Flask(__name__)

# Load the trained model from disk
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

Here, we bind the / route to home(), which renders the input page, and the /predict route to predict(). The predict() method reads the values submitted through the HTML form, converts them into a 2D array with a single row, and passes that array to model.predict(). The result is stored in the variable named output, and we return it to the user by rendering index.html with the prediction text.

@app.route('/')
def home():
    return render_template('index.html')


@app.route('/predict', methods=['POST'])
def predict():
    # Read the submitted form values and shape them as a single-row 2D array
    int_features = [int(x) for x in request.form.values()]
    final_features = [np.array(int_features)]
    prediction = model.predict(final_features)
    output = round(prediction[0], 2)
    return render_template('index.html', prediction_text='Predicted output: {}'.format(output))

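Finally, we run our server with the following code section. Here I have used port 5000 and set debug=True so that any error shows a detailed traceback we can use to debug it. Note that the MLflow UI also uses port 5000 by default, so stop it first or pick a different port for one of the two.

if __name__ == '__main__':
    app.run(port=5000, debug=True)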

Now our server is ready to serve requests. Here is the whole code of server.py.

# Import libraries
import pickle

import numpy as np
from flask import Flask, request, render_template

app = Flask(__name__)

# Load the model
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/predict', methods=['POST'])
def predict():
    # Read the submitted form values and shape them as a single-row 2D array
    int_features = [int(x) for x in request.form.values()]
    final_features = [np.array(int_features)]
    prediction = model.predict(final_features)
    output = round(prediction[0], 2)
    return render_template('index.html', prediction_text='Predicted output: {}'.format(output))


if __name__ == '__main__':
    app.run(port=5000, debug=True)
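
If the model should be consumed by other programs rather than through a browser form, the same prediction logic can return JSON instead of HTML. The following is only a sketch under stated assumptions: the /api route, the "features" key and the DummyModel class are hypothetical, and DummyModel replaces the pickled classifier so the example runs without a model.pkl file. It uses Flask's test client to call the route without starting a server.

```python
from flask import Flask, jsonify, request


class DummyModel:
    """Hypothetical stand-in for the pickled classifier."""

    def predict(self, rows):
        # Label each row 1 if its features sum to a positive number, else 0.
        return [int(sum(row) > 0) for row in rows]


app = Flask(__name__)
model = DummyModel()


@app.route("/api", methods=["POST"])
def predict_api():
    payload = request.get_json(silent=True)
    features = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(features, list) or not features:
        return jsonify(error="expected a non-empty 'features' list"), 400
    try:
        row = [float(x) for x in features]
    except (TypeError, ValueError):
        return jsonify(error="features must be numeric"), 400

    # The model expects a 2D input: one row per sample.
    prediction = model.predict([row])[0]
    return jsonify(prediction=prediction)


# Exercise the route without starting a server.
client = app.test_client()
response = client.post("/api", json={"features": [1, 2, 3]})
print(response.get_json())
```

Validating the payload before calling the model keeps malformed requests from surfacing as server errors.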

Template

In Flask, the templates folder holds the HTML files that define the appearance of the web pages, and Flask looks for it next to the application file. CSS and JavaScript files are usually served from a separate static folder. Keeping the visual elements apart from the backend logic makes collaboration and maintenance easier. When a web page is requested, the appropriate template is combined with backend data to generate an HTML page for display in the user's web browser.

In the "index.html" file, you will create a form, laid out as a table, with one input for each independent variable used to train the model. These variables are the inputs or features that the model relies on to make predictions. By displaying them in a table format, users can easily understand and enter the data that influences the model's behavior.

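How to start Flask

To run the Flask app on your system, type the following command in your command prompt, from the folder that contains server.py:

flask --app server run

Alternatively, running python server.py starts the same app through the app.run() call at the bottom of the file. You should see the Flask app running, and you can access it by visiting http://localhost:5000/ in your web browser.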
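
Besides the browser, the /predict route can be called from a script. The sketch below only builds the form-encoded POST request with the standard library and leaves the actual network call commented out, so it can be run without the server. The field names age and family_size are hypothetical placeholders for whatever inputs your index.html form defines.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_predict_request(features, base_url="http://localhost:5000"):
    """Build a form-encoded POST request for the /predict route."""
    body = urlencode(features).encode("utf-8")
    return Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical field names; use the input names from your own form.
req = build_predict_request({"age": 35, "family_size": 3})
print(req.get_method(), req.full_url)
print(req.data)

# With the Flask server running, the request could be sent like this:
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     html = resp.read().decode("utf-8")
```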

CONCLUSION

In this end-to-end article, we covered the complete workflow of building a classification model, starting from data preprocessing, model training, and finally, deploying the model on Flask. Data preprocessing ensures the data is clean and relevant, improving the model's performance. Model training allows us to build an accurate classifier through selecting and tuning appropriate algorithms.

MLflow enables efficient experimentation, providing better insights into the model's performance and helping us make informed decisions. By deploying the model on Flask, we turn it into a user-friendly web service that, once hosted on a reachable server, can be accessed by anyone with an internet connection. This end-to-end process allows us to not only develop but also share our classification model with others, encouraging collaboration and further advancements in the field.

As the field of machine learning continues to evolve, it is essential to stay updated with the latest tools and methodologies to build robust and scalable classification systems. This article serves as a foundational guide for anyone seeking to dive into classification tasks, and I hope it inspires further exploration and innovation in the field of machine learning.

For First Part click here

For Second Part click here

Written by Ankit Mandal