DEEP Data Science template
To simplify development and make it easy to integrate your model with the DEEPaaS API, a project template, cookiecutter-data-science [*], is provided in our GitHub.
To create your project based on the template, install and then run the cookiecutter tool as follows:
$ cookiecutter https://github.com/indigo-dc/cookiecutter-data-science
For each parameter you are first shown an [Info] line describing it, and on the next line you enter its value. You will be asked to configure the following (an example session is sketched after the list):
- git_base_url: Remote URL to host your new repositories (git), e.g. https://github.com/deephdc
- project_name: Project name
- repo_name: Name of your new repository, to be added after "git_base_url" (see above); aka <your_project> in the following
- author_name: Author name(s) (and/or your organization/company/team). If many, separate by comma
- author_email: E-Mail(s) of main author(s) (or contact person). If many, separate by comma
- description: Short description of the project
- app_version: Application version (expects X.Y.Z (Major.Minor.Patch))
- open_source_license: Choose an open source license; the default is MIT. For more info: https://opensource.org/licenses
- dockerhub_user: User account at hub.docker.com, e.g. 'deephdc' in https://hub.docker.com/u/deephdc
- docker_baseimage: Docker image your Dockerfile starts from (FROM <docker_baseimage>) (don't provide the tag here), e.g. tensorflow/tensorflow
- baseimage_cpu_tag: CPU tag for the base image, e.g. 1.14.0-py3. Has to match python3!
- baseimage_gpu_tag: GPU tag for the base image, e.g. 1.14.0-gpu-py3. Has to match python3!
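For illustration, the beginning of such a session might look as follows. The answers after each colon are made-up examples, and the exact prompt texts and default values come from cookiecutter.json and may differ:

$ cookiecutter https://github.com/indigo-dc/cookiecutter-data-science
git_base_url [...]: https://github.com/your_account
project_name [...]: image_classifier
repo_name [...]: image_classifier
author_name [...]: Jane Doe
...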
Note
These parameters are defined in cookiecutter.json in the cookiecutter-data-science source.
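A cookiecutter.json simply maps each parameter name to its default value (or a list of choices). An abridged excerpt consistent with the questions above might look like the following; the defaults shown here are illustrative only, so check the actual file in the template source:

{
  "git_base_url": "https://github.com/deephdc",
  "project_name": "project_name",
  "repo_name": "{{ cookiecutter.project_name }}",
  "author_name": "Your Name",
  "open_source_license": ["MIT", "BSD-3-Clause", "No license file"],
  ...
}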
When all these questions are answered, the following two repositories will be created locally and immediately linked to your git_base_url:
~/DEEP-OC-your_project
~/your_project
Each repository has two branches: ‘master’ and ‘test’.
<your_project> repo
The main repository for integrating your model, with the following structure:
|
├── data Placeholder for the data
│ └── raw The original, immutable data dump.
│
├── docs Documentation on the project; see sphinx-doc.org for details
│
├── models Trained and serialized models, model predictions, or model summaries
│
├── notebooks Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials (if several people are developing),
│ and a short `_` delimited description,
│ e.g. `1.0-jqp-initial_data_exploration.ipynb`.
│
├── references Data dictionaries, manuals, and all other explanatory materials.
│
├── reports Generated analysis as HTML, PDF, LaTeX, etc.
│
├── your_project Main source code of the project
│ │
│ ├── __init__.py Makes your_project a Python module
│ │
│ ├── dataset Scripts to download and manipulate raw data
│ │ └── make_dataset.py
│ │
│ ├── features Scripts to prepare raw data into features for modeling
│ │ └── build_features.py
│ │
│ ├── models Scripts to train models and then use trained models to make predictions
│ │ └── deep_api.py Main script for the integration with the DEEPaaS API (see the sketch below)
│ │
│ ├── tests Scripts to perform code testing
│ │
│ └── visualization Scripts to create exploratory and results oriented visualizations
│ └── visualize.py
│
├── .dockerignore Describes what files and directories to exclude for building a Docker image
│
├── .gitignore Specifies intentionally untracked files that Git should ignore
│
├── Jenkinsfile Describes basic Jenkins CI/CD pipeline
│
├── LICENSE License file
│
├── README.md The top-level README for developers using this project.
│
├── requirements.txt The requirements file for reproducing the analysis environment,
│ e.g. generated with `pip freeze > requirements.txt`
│
├── setup.cfg Makes the project pip installable (pip install -e .)
│
├── setup.py Makes the project pip installable (pip install -e .)
│
├── test-requirements.txt The requirements file for the test environment
│
└── tox.ini Tox file with settings for running tox; see tox.testrun.org
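The central file for the DEEPaaS integration is your_project/models/deep_api.py from the tree above. With DEEPaaS API V2, the API looks up and calls a small set of module-level functions. A minimal sketch of what such a file contains is shown below; the bodies are placeholders, the generated template provides more complete stubs, and the exact signatures should be checked against the DEEPaaS documentation:

def get_metadata():
    """Return metadata describing the model (name, author, license, ...)."""
    return {
        "name": "your_project",
        "description": "Short description of the project",
        "author": "Author name(s)",
        "license": "MIT",
    }

def get_predict_args():
    """Declare the arguments the predict endpoint accepts (e.g. webargs fields)."""
    return {}

def predict(**kwargs):
    """Apply the trained model to the incoming data and return the result."""
    raise NotImplementedError("Add your inference code here")

def get_train_args():
    """Declare the arguments the train endpoint accepts."""
    return {}

def train(**kwargs):
    """(Re)train the model and return e.g. a dict with final metrics."""
    raise NotImplementedError("Add your training code here")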
Certain files, e.g. README.md, Jenkinsfile, setup.cfg, tox.ini, etc., are pre-populated based on the answers you provided during the cookiecutter call (see above).
<DEEP-OC-your_project>
Repository for the integration of the DEEPaaS API and your_project in one Docker image.
├─ Dockerfile Describes the main steps for integrating the DEEPaaS API and
│ the your_project application in one Docker image (see the sketch below)
│
├─ Jenkinsfile Describes basic Jenkins CI/CD pipeline
│
├─ LICENSE License file
│
├─ README.md README for developers and users.
│
├─ docker-compose.yml Allows running the application with various configurations via docker-compose
│
├─ metadata.json Defines information propagated to the DEEP Open Catalog, https://marketplace.deep-hybrid-datacloud.eu
All these files are filled in with the information provided during the cookiecutter execution (see above).
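To give an idea of what the Dockerfile does, here is a simplified sketch, assuming the tensorflow/tensorflow example answers from above and a hypothetical https://github.com/your_account/your_project repository; the real file produced by the template is more elaborate (e.g. it handles CPU/GPU tags and branch selection):

FROM tensorflow/tensorflow:1.14.0-py3
# Install git and the DEEPaaS API
RUN apt-get update && apt-get install -y git && \
    pip install deepaas
# Install your application
RUN git clone https://github.com/your_account/your_project && \
    pip install -e your_project
# Expose the DEEPaaS port and start the API
EXPOSE 5000
CMD ["deepaas-run", "--listen-ip", "0.0.0.0"]

The accompanying docker-compose.yml then lets you start the same image, e.g. with different port mappings or device configurations, via docker-compose up.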
Step-by-step guide
- (if not yet done) install cookiecutter, e.g. pip install cookiecutter
- run cookiecutter https://github.com/indigo-dc/cookiecutter-data-science
- answer all the questions and pay attention to the Docker tags!
- two directories will be created: <your_project> and <DEEP-OC-your_project> (each with two git branches: master and test)
- go to github.com/user_account and create the corresponding repositories <your_project> and <DEEP-OC-your_project>
- in your terminal, go to <your_project> and run git push origin --all
- in your terminal, go to <DEEP-OC-your_project> and run git push origin --all
- your GitHub repositories are now updated with the initial commits
- you can build the <deep-oc-your_project> Docker image locally: go to the <DEEP-OC-your_project> directory and run docker build -t dockerhubuser/deep-oc-your_project .
- you can now run DEEPaaS as docker run -p 5000:5000 dockerhubuser/deep-oc-your_project (see the quick check after this list)
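Once the container is running, you can do a quick check that the API is up. With DEEPaaS V2 the loaded models should be listed under /v2/models (the endpoint path here is an assumption for recent DEEPaaS versions):

$ curl http://localhost:5000/v2/models/

Alternatively, open http://localhost:5000 in your browser to explore the API interactively.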
[*] The more general cookiecutter-data-science template was adapted for the purpose of DEEP.