Develop a model using DEEP DS template

1. Prepare DEEP DS environment

Install cookiecutter (if not yet done)

$ pip install cookiecutter

Run the DEEP DS cookiecutter template

$ cookiecutter https://github.com/indigo-dc/cookiecutter-data-science

Answer all the questions asked by the DEEP DS cookiecutter template, paying particular attention to repo_name (the name of your GitHub repositories). This creates two project directories:

~/DEEP-OC-your_project
~/your_project

Go to github.com/your_account and create the corresponding repositories, DEEP-OC-your_project and your_project. Then run git push origin master in both created directories. This puts your initial code on GitHub.

2. Improve the initial code of the model

The structure of your_project created using DEEP DS template contains the following core items needed to develop a DEEP DS model:

requirements.txt
data/
models/
{{repo_name}}/dataset/make_dataset.py
{{repo_name}}/features/build_features.py
{{repo_name}}/models/model.py

2.1 Installing development requirements

Modify requirements.txt according to your needs (e.g. add more libraries) then run

$ pip install -r requirements.txt

You can modify and add more source files and put them accordingly into the directory structure.

2.2 Make datasets

Source files in this directory aim to manipulate raw datasets. The output of this step is also raw data, but cleaned and/or pre-formatted.

{{repo_name}}/dataset/make_dataset.py
{{repo_name}}/dataset/
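
As an illustration of what this step might contain, here is a minimal sketch that cleans a raw CSV file using only the standard library. The function names clean_rows and make_dataset, the CSV input format and the cleaning rules (strip whitespace, drop incomplete rows, drop duplicates) are assumptions made for this example; the template does not prescribe them.

```python
import csv
from pathlib import Path


def clean_rows(rows):
    """Strip whitespace, then drop malformed, incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        if None in row:
            continue  # row had more fields than the header: treat as malformed
        row = {key: (value or "").strip() for key, value in row.items()}
        if any(value == "" for value in row.values()):
            continue  # skip rows with missing fields
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue  # skip exact duplicates
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


def make_dataset(raw_path, out_path):
    """Read a raw CSV file, write a cleaned copy, return the number of rows kept."""
    with open(raw_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        rows = clean_rows(reader)
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as make_dataset("data/raw/input.csv", "data/interim/input_clean.csv") would then produce the cleaned file; both paths are placeholders.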

2.3 Build features

This step takes the output of the previous step (Make datasets) and creates the train, test and validation ML data from the cleaned and pre-formatted raw data. How this step is implemented depends on the concrete use case, the aim of the application and the available technology (e.g. high-performance support for data processing).

{{repo_name}}/features/build_features.py
{{repo_name}}/features/
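
One possible way to produce the train, validation and test sets is a reproducible random split. The sketch below is only an assumption about how this could be done: the function name split_dataset, the default fractions and the fixed seed are illustrative, and a real use case may need a stratified or time-based split instead.

```python
import random


def split_dataset(samples, val_fraction=0.1, test_fraction=0.2, seed=42):
    """Shuffle samples reproducibly and split them into train/val/test lists."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(samples)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Fixing the seed means that repeated runs give the same split, which keeps evaluation results comparable between training runs.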

2.4 Develop models

This step deals with the most interesting phase in ML, i.e. modelling. The core of a DEEP DS model is model.py, which contains the implementations of the DEEP entry points. DEEP entry points are defined using API methods. You don’t need to implement all of them, just the ones you need.

{{repo_name}}/models/model.py
{{repo_name}}/models/
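
To give an idea of the shape such a file can take, here is a minimal sketch of a model.py with three entry points. The function names (get_metadata, train, predict), their signatures and the returned dictionaries are assumptions made for illustration; the names and arguments that DEEPaaS actually expects depend on its version and should be taken from the API documentation. The "model" is a deliberately trivial stand-in that predicts the mean of its training data.

```python
# Stand-in for real trained weights (hypothetical, for illustration only).
_MODEL = {"mean": None}


def get_metadata():
    """Return basic, human-readable information about the model."""
    return {
        "name": "your_project",  # placeholder
        "version": "0.1.0",      # placeholder
        "summary": "Toy model that predicts the mean of its training data.",
    }


def train(values):
    """'Train' the toy model by storing the mean of the given numbers."""
    values = list(values)
    if not values:
        raise ValueError("training data is empty")
    _MODEL["mean"] = sum(values) / len(values)
    return {"status": "done", "n_samples": len(values)}


def predict(inputs):
    """Return one prediction per input, using the stored mean."""
    if _MODEL["mean"] is None:
        raise RuntimeError("model is not trained yet")
    return [_MODEL["mean"] for _ in inputs]
```

Entry points you do not need can simply be left out, as noted above.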

3. Create a python installable package for your model

To create an installable Python package, the initial directory structure should look something like this:

your_model_package/
        your_model_package/
                __init__.py
        setup.py
        setup.cfg
        requirements.txt
        LICENSE
        README
  • The top level directory will be the root of your repo, e.g. your_model_package.git. The subdir, also called your_model_package, is the actual python module.
  • setup.py is the build script for setuptools. It tells setuptools about your package (such as the name and version) as well as which code files to include. You can find an example of a setup.py file here. For the official documentation on how to write your setup script, you can go here.
  • setup.cfg can be used to get some information from the user, or from the user’s system, in order to proceed. Configuration files also let you provide default values for any command option. An example of a setup.cfg file can be found here. The official python documentation on the setup configuration file can be found here.
  • requirements.txt contains any external requirement needed to run the package. An example of a requirements file can be found here.
  • The README file will contain information on how to run the package or anything else that you may find useful for someone running your package.
  • LICENSE: it’s important for every package uploaded to the Python Package Index to include a license. This tells users who install your package the terms under which they can use it. For help choosing a license, go here.
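
The relation between setup.py and requirements.txt described above can be sketched as follows. This is only one possible arrangement, not the one used by the linked examples: the helper names parse_requirements and build_setup_kwargs, the version string and the package name are placeholders. It also assumes that requirements.txt contains plain requirement specifiers only, since pip options such as -r or -e are not valid entries for install_requires.

```python
from pathlib import Path


def parse_requirements(text):
    """Return requirement specifiers, skipping blank lines and comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def build_setup_kwargs(requirements_file="requirements.txt"):
    """Collect the keyword arguments that setup.py passes to setuptools.setup()."""
    path = Path(requirements_file)
    requirements = parse_requirements(path.read_text()) if path.exists() else []
    return {
        "name": "your_model_package",        # placeholder
        "version": "0.1.0",                  # placeholder
        "packages": ["your_model_package"],  # inner directory holding __init__.py
        "install_requires": requirements,
    }


# In setup.py this would be followed by:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
```

Reading the dependencies from requirements.txt keeps a single list of requirements instead of two that can drift apart.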

To see how to install your model package, check the Dockerfile in the next section.

4. Create a docker container for your model

Once your model is in place, you can encapsulate it by creating a docker container. For this you need to create a Dockerfile. This file describes the Docker image, including the operating system you want to run on and the packages that must be installed for your package to run.

The simplest Dockerfile could look like this:

FROM ubuntu:18.04

WORKDIR /srv

# Install the tools used below; the base image does not ship git, pip or wget
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y git python3-pip wget

# Download and install your model package
RUN git clone https://github.com/your_git/your_model_package && \
cd your_model_package && \
python3 -m pip install -e . && \
cd ..

# Install DEEPaaS
RUN python3 -m pip install deepaas

# Install rclone, then clean up caches to keep the image small
RUN wget https://downloads.rclone.org/rclone-current-linux-amd64.deb && \
dpkg -i rclone-current-linux-amd64.deb && \
apt-get install -f -y && \
rm rclone-current-linux-amd64.deb && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /root/.cache/pip/* && \
rm -rf /tmp/*

# Expose API on port 5000 and tensorboard on port 6006
EXPOSE 5000 6006

CMD deepaas-run --listen-ip 0.0.0.0

For more details on rclone or on the DEEPaaS API, you can check here and here respectively.

If you want to see an example of a more complex Dockerfile, you can check it here.

To build the image from the Dockerfile, choose a name for the image (your_container_name below) and use the docker build command:

docker build -t your_container_name -f Dockerfile .

You can then upload it to Docker Hub so that the already built image can be downloaded directly. To do so, follow the instructions here.