Django for Data Scientists | Tutorial 1: How To Serve A Machine Learning Model with Django

by Leon Feb. 6, 2019

In my previous blog post, 5 tips that make your data scientist resume shine, I mentioned that one way to improve your data scientist resume is to add unique skills that make you stand out. In this tutorial, I will walk you through how to productize a machine learning model by serving it through a web API server with Django.

This is the first article in our Django for Data Scientists tutorial series, which aims to help data scientists become more 'full stack' and stand out among their peers.

Here are a few reasons to consider if you are wondering how web development skills can help with a data scientist's career.

  1. Web dev skills make your resume stand out among other candidates. They can be a killer skill for hiring teams that struggle to get engineering resources to productize machine learning models;
  2. Decision makers in a company are often not very technical. If you can develop a prototype that they can interact with in a web browser, it helps them realize the power of your model and helps you get your project greenlit;
  3. If you have a revolutionary idea about using AI to make a dent in the universe, you can quickly convert your idea into a web product, make it available over the internet, and let the whole world know about it.

Our Goals:

I will show you how to productize a machine learning model and create a web service hosting your model in 3 steps:

  1. Set up your Django project with Cookiecutter, a great tool to jump start your Django project;
  2. Train a sentiment classification model using 2000 movie reviews;
  3. Productize your model locally by setting up your web service API.

Let's jump right in. To make it simple, I have created a GitHub repository where you can download all the files:

https://github.com/skills-ai/classification_model

Creating a python virtual environment

  1. I am assuming you are using a Mac; the same steps should apply to Unix/Linux-based operating systems;
  2. Follow the step-by-step instructions below to install Cookiecutter and generate the project from the cookiecutter-django template, a great way to jumpstart your Django project:

pip install "cookiecutter>=1.4.0"

cookiecutter https://github.com/pydanny/cookiecutter-django

Leons-iMac:projects leon$ cookiecutter https://github.com/pydanny/cookiecutter-django
You've downloaded /Users/leon/.cookiecutters/cookiecutter-django before. Is it okay to delete and re-download it? [yes]:
project_name [My Awesome Project]: Classification Project
project_slug [classification_project]:
description [Behold My Awesome Project!]: My Classification Project
author_name [Daniel Roy Greenfeld]: leon
domain_name [example.com]:
email [leon@example.com]: leon@skills.ai
version [0.1.0]:
Select open_source_license:
1 - MIT
2 - BSD
3 - GPLv3
4 - Apache Software License 2.0
5 - Not open source
Choose from 1, 2, 3, 4, 5 [1]: 5
timezone [UTC]: US/Pacific
windows [n]: n
use_pycharm [n]: y
use_docker [n]: n
Select postgresql_version:
1 - 10.5
2 - 10.4
3 - 10.3
4 - 10.2
5 - 10.1
6 - 9.6
7 - 9.5
8 - 9.4
9 - 9.3
Choose from 1, 2, 3, 4, 5, 6, 7, 8, 9 [1]: 1
Select js_task_runner:
1 - None
2 - Gulp
Choose from 1, 2 [1]: 1
custom_bootstrap_compilation [n]: n
use_compressor [n]: y
use_celery [n]: n
use_mailhog [n]: n
use_sentry [n]: n
use_whitenoise [n]: y
use_heroku [n]: y
use_travisci [n]: n
keep_local_envs_in_vcs [y]: n
debug [n]: n
 [WARNING]: Cookiecutter Django does not support Python 2. Stability is guaranteed with Python 3.6+ only, are you sure you want to proceed (y/n)?
y
 [SUCCESS]: Project initialized, keep up the good work!

Once the Django project is created, we create a virtual environment that uses Python 3 (3.6 or later, as the warning above notes), so the libraries installed for this project do not conflict with those of other projects.

Leons-iMac:projects leon$ cd classification_project/

Leons-iMac:classification_project leon$ ls
Procfile                              docs                                  pytest.ini                            setup.cfg
README.rst                            locale                                requirements                          utility
classification_project                manage.py                             requirements.txt
config                                merge_production_dotenvs_in_dotenv.py runtime.txt

Leons-iMac:classification_project leon$ virtualenv -p python3 venv

Leons-iMac:classification_project leon$ ls
Procfile                              docs                                  pytest.ini                            setup.cfg
README.rst                            locale                                requirements                          utility
classification_project                manage.py                             requirements.txt                      venv
config                                merge_production_dotenvs_in_dotenv.py runtime.txt

Notice the newly created venv folder, which contains all the necessary files that will be used for your virtual environment.

Then we enter the virtual environment

leons-iMac:classification_project leon$ source venv/bin/activate
(venv) leons-iMac:classification_project leon$

Notice the (venv) in front of your shell prompt, which indicates that you are now in the virtual environment. To leave the virtual environment, simply run deactivate on the command line.
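If the prompt prefix is not reassurance enough, you can also check from inside Python. The following is only a small sketch and not part of the original setup: the helper name in_virtualenv is made up for illustration, and it relies on virtual environments changing sys.prefix (older virtualenv releases expose sys.real_prefix instead of sys.base_prefix).

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases keep the original location in
    # sys.base_prefix; outside an environment it equals sys.prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

Run it once with the environment activated and once after deactivate to compare the two results.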

Install all the Django libraries that are needed for your local dev environment:

(venv) leons-iMac:classification_project leon$ pip install -r requirements/local.txt

Now start your Django development server,

(venv) leons-iMac:classification_project leon$ python manage.py runserver

And you will see this error message:

django.db.utils.OperationalError: FATAL: database "classification_project" does not exist

That is simply because Django tries to access the default Postgres database, which does not exist yet. Let’s fix that.

(Optional) If you have not installed Postgres on your computer, you can install it with Homebrew. If you do not have Homebrew yet, get it first:

mkdir homebrew && curl -L https://github.com/Homebrew/brew/tarball/master | tar xz --strip 1 -C homebrew

Then install Postgres

brew install postgres

Now that your Postgres server is installed, we can go ahead and create a database for this project.

createdb classification_project

Then start the Django dev server again:

(venv) leons-iMac:classification_project leon$ python manage.py runserver
Performing system checks...

System check identified no issues (0 silenced).

You have 23 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): account, admin, auth, contenttypes, sessions, sites, socialaccount, users.
Run 'python manage.py migrate' to apply them.

January 28, 2019 - 23:05:20
Django version 2.0.10, using settings 'config.settings.local'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Open your web browser (e.g., Chrome), then go to http://127.0.0.1:8000/

Voilà, your Django website is initialized and up and running. Congratulations!

Now we stop the server with CONTROL-C and run the migrate command so that Django creates the first set of tables in the database.

^C(venv) leons-iMac:classification_project leon$ python manage.py migrate
Operations to perform:
  Apply all migrations: account, admin, auth, contenttypes, sessions, sites, socialaccount, users
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0001_initial... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying users.0001_initial... OK
  Applying account.0001_initial... OK
  Applying account.0002_email_max_length... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying sessions.0001_initial... OK
  Applying sites.0001_initial... OK
  Applying sites.0002_alter_domain_unique... OK
  Applying sites.0003_set_site_domain_and_name... OK
  Applying socialaccount.0001_initial... OK
  Applying socialaccount.0002_token_max_lengths... OK
  Applying socialaccount.0003_extra_data_default_dict... OK

With the local Django dev project created, we now move on to building our model.

Since our main focus in this article is how to host a machine learning model, we will not go into much detail about tuning the model's parameters, but the same serving method can be applied to other models.

Start a Django app for modeling

(venv) leons-iMac:classification_project leon$ django-admin startapp modeling

(venv) leons-iMac:classification_project leon$ cd modeling/

(venv) leons-iMac:modeling leon$ ls
__init__.py admin.py    apps.py     migrations  models.py   tests.py    views.py

After that, add 'modeling' to your installed apps in the project settings file: config/settings/base.py.

Notice that the app is currently located directly in our project root directory. Many of you may prefer to have Django apps inside the project_slug directory (classification_project/classification_project/ instead of classification_project/). To achieve that, follow these 3 simple steps:

  1. From the project root, move the entire app directory into classification_project/classification_project/ and change into it:

mv modeling classification_project/
cd classification_project/modeling/

  2. Open apps.py and change `name = 'modeling'` to `name = "classification_project.modeling"`.

  3. If you followed the steps above, make sure the installed apps section of the settings file config/settings/base.py lists classification_project.modeling.apps.ModelingConfig in place of the plain 'modeling' entry.
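For reference, here is a sketch of what apps.py might look like after the move. It assumes the ModelingConfig class that startapp generated and changes only the name attribute; the file path in the comment follows the directory layout described above.

```python
# classification_project/modeling/apps.py (path relative to the project root)
from django.apps import AppConfig


class ModelingConfig(AppConfig):
    # Full dotted path, because the app now lives inside the project_slug package.
    name = "classification_project.modeling"
```

With the app in this location, the dotted path classification_project.modeling.apps.ModelingConfig points at this class.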

Now we need to install the scikit-learn library to train the model and predict on incoming samples.

(venv) leons-iMac:modeling leon$ pip install scikit-learn==0.20.2

We should also add scikit-learn to the requirements file to make sure it is installed when deploying to production (run this from the project root, where the requirements folder lives).

echo 'scikit-learn==0.20.2' >> requirements/base.txt

Now we can download the movie review data set, which includes two preprocessed sets: positive reviews and negative reviews.

(venv) leons-iMac:classification_project leon$ cd modeling/

(venv) leons-iMac:modeling leon$ python
Python 3.7.2 (default, Jan 13 2019, 12:50:01)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

Then execute the following script. The original script can be found on scikit-learn's official GitHub page:

https://github.com/scikit-learn/scikit-learn/blob/master/doc/tutorial/text_analytics/data/movie_reviews/fetch_data.py

import os
import tarfile
from contextlib import closing
try:
    from urllib import urlopen
except ImportError:
    from urllib.request import urlopen


URL = ("http://www.cs.cornell.edu/people/pabo/"
       "movie-review-data/review_polarity.tar.gz")

ARCHIVE_NAME = URL.rsplit('/', 1)[1]
DATA_FOLDER = "txt_sentoken"


if not os.path.exists(DATA_FOLDER):

    if not os.path.exists(ARCHIVE_NAME):
        print("Downloading dataset from %s (3 MB)" % URL)
        opener = urlopen(URL)
        with open(ARCHIVE_NAME, 'wb') as archive:
            archive.write(opener.read())

    print("Decompressing %s" % ARCHIVE_NAME)
    with closing(tarfile.open(ARCHIVE_NAME, "r:gz")) as archive:
        archive.extractall(path='.')
    os.remove(ARCHIVE_NAME)

Now we exit Python and get back to the command line.

(venv) leons-iMac:modeling leon$ ls
__init__.py        apps.py            model.file         poldata.README.2.0 txt_sentoken
admin.py           migrations         models.py          tests.py           views.py

Notice the new folder txt_sentoken. That’s where the 2000 preprocessed movie review files live, under two folders: pos (positive reviews) and neg (negative reviews).
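Before training, it can be worth confirming that the archive unpacked as expected. The snippet below is a small sketch and not part of the scikit-learn script: the helper name count_reviews is hypothetical, and it assumes the layout described here, with one subfolder per class holding plain-text .txt review files.

```python
import os


def count_reviews(data_folder="txt_sentoken"):
    """Count the .txt files in each class subfolder of data_folder."""
    counts = {}
    for entry in sorted(os.listdir(data_folder)):
        class_dir = os.path.join(data_folder, entry)
        if not os.path.isdir(class_dir):
            continue  # skip stray files such as READMEs
        counts[entry] = sum(
            1 for name in os.listdir(class_dir) if name.endswith(".txt")
        )
    return counts


if __name__ == "__main__" and os.path.isdir("txt_sentoken"):
    # The per-class counts should add up to the 2000 reviews mentioned above.
    print(count_reviews())
```

If the totals look off, deleting txt_sentoken and re-running the download script is the simplest way to start over.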

Next, we train the model and save it into a pickle file. We launch Python again and paste the following code.

Original script and detailed explanations can be found here

import sys
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split
from sklearn import metrics


movie_reviews_data_folder = 'txt_sentoken'
dataset = load_files(movie_reviews_data_folder, shuffle=False)
print("n_samples: %d" % len(dataset.data))

docs_train, docs_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.25, random_state=None)

pipeline = Pipeline([
    ('vect', TfidfVectorizer(min_df=3, max_df=0.95)),
    ('clf', LinearSVC(C=1000)),
])

# Parameter grid for the search: unigrams only vs. unigrams plus bigrams
parameters = {
    'vect__ngram_range': [(1, 1), (1, 2)],
}
grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1)
grid_search.fit(docs_train, y_train)
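The split above holds back 25% of the reviews in docs_test and y_test, but the script never uses them. One way to check the fitted model is to compare its predictions on that held-out set with the true labels. The sketch below is an illustration under assumptions: the helper name holdout_accuracy is made up, and it relies only on the model having a predict method, which the fitted grid_search object does.

```python
def holdout_accuracy(model, docs, labels):
    """Return the fraction of docs whose predicted label matches labels."""
    predictions = model.predict(docs)
    correct = sum(1 for predicted, actual in zip(predictions, labels)
                  if predicted == actual)
    return correct / len(labels)


# In the training session above, this would be called as:
#   holdout_accuracy(grid_search, docs_test, y_test)
```

For a fuller picture, scikit-learn's metrics module (already imported in the training script) provides metrics.classification_report(y_test, grid_search.predict(docs_test)).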

Now that the model is trained using grid_search, we need to save the ‘best’ model so we can use it to serve incoming requests and make predictions. Run the following in Python:

from sklearn.externals import joblib

joblib.dump(grid_search.best_estimator_, 'model.file', compress = 1)

You have now saved your classifier model into a binary file, ‘model.file’. That’s a lot of code, but stay with me, we are almost there. Now let’s serve the model by creating a local API.

Open views.py from the modeling app, and add the following code.

from django.shortcuts import render
import os

from django.http import JsonResponse
from sklearn.externals import joblib

CURRENT_DIR = os.path.dirname(__file__)
model_file = os.path.join(CURRENT_DIR, 'model.file')

model = joblib.load(model_file)


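# Create your views here.
def api_sentiment_pred(request):
    # Read the review text from the query string, e.g. ?review=This movie is great
    review = request.GET['review']
    # load_files labels the class folders alphabetically: neg -> 0, pos -> 1
    prediction = model.predict([review])[0]
    result = 'Positive' if prediction == 1 else 'Negative'
    return JsonResponse(result, safe=False)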
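The view above raises an error if the review parameter is missing from the request. Below is a hedged sketch of one way to guard against that: the prediction logic is pulled into a plain function (sentiment_label, a hypothetical name) that can be tried without Django. It assumes that label 0 means negative and 1 means positive, which is how load_files numbers the neg and pos folders.

```python
LABELS = {0: 'Negative', 1: 'Positive'}


def sentiment_label(model, review):
    """Return 'Positive' or 'Negative' for a review, or None if it is empty."""
    if review is None or not review.strip():
        return None  # nothing to classify
    prediction = model.predict([review])[0]
    return LABELS[int(prediction)]


# Inside the Django view, this could be used roughly as follows
# (status=400 marks the request as a client error):
#
#   label = sentiment_label(model, request.GET.get('review'))
#   if label is None:
#       return JsonResponse({'error': 'missing review'}, status=400)
#   return JsonResponse(label, safe=False)
```

Keeping the model call in its own function also makes it easy to test with a stand-in object that only implements predict.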

Now that we have a predict function, we need to bind it to a URL. In the modeling folder, create a urls.py file and enter the following code:

from django.urls import path

from .views import api_sentiment_pred

urlpatterns = [
    path('api/predict/', api_sentiment_pred, name='api_sentiment_pred'),    
]

Now we need to include this URL configuration in the project.

Open the classification_project/config/urls.py file and add the following to urlpatterns:

# Your stuff: custom urls includes go here
path('model/', include('classification_project.modeling.urls')),

Now let's start the server from the project root, enter the following:

python manage.py runserver

Once the Django server is up and running (it might take a few seconds to load the model), go to your browser and enter the following URL:

http://localhost:8000/model/api/predict/?review=This movie is great

If everything is running as expected, you will see the predicted result:

"Positive"

You can also try a few more examples such as:

http://localhost:8000/model/api/predict/?review=I really liked this movie
http://localhost:8000/model/api/predict/?review=This movie is long and boring

Alternatively, you can use curl to submit a web request from your command line:

curl 'http://localhost:8000/model/api/predict/?review=This%20movie%20sucks'
"Negative"

Congratulations! You have now successfully created a web server to host your machine learning model on your local machine.
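The same request can also be sent from Python, which is handy when you want to script a batch of test reviews. This is a minimal sketch using only the standard library; the helper names build_predict_url and fetch_prediction are made up for illustration, and the URL assumes the local route set up in this tutorial.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000/model/api/predict/"


def build_predict_url(review, base_url=BASE_URL):
    """Return the prediction URL with the review safely URL-encoded."""
    return base_url + "?" + urlencode({"review": review})


def fetch_prediction(review, base_url=BASE_URL):
    """Call the local API and return the decoded JSON response."""
    with urlopen(build_predict_url(review, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_predict_url("This movie is great"))
    # Requires the Django dev server from this tutorial to be running:
    # print(fetch_prediction("This movie is great"))
```

Using urlencode takes care of spaces and special characters, so you do not have to write %20 by hand as in the curl example.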

Conclusion

In this tutorial:

We used cookiecutter to jumpstart a Django project;

We then trained a classification model on 2000 movie reviews;

We created a local HTTP server that handles web traffic, takes a review text, and outputs a predicted sentiment.

We've accomplished a lot in this tutorial. If you have followed each step and successfully seen a predicted result, you can proudly say that you have hosted your machine learning model and converted it into an HTTP service.

In the next tutorial, we will walk you through how to deploy your model to Heroku and let the whole internet use your machine learning service. Stay tuned.

Leon
Machine Learning Expert, former Amazon research scientist