DEEP Data Science template

To simplify development and make it easy to integrate your model with the DEEPaaS API, a project template, cookiecutter-data-science [*], is provided in our GitHub.

To create a project based on the template, install the cookiecutter tool and run it as follows:

$ cookiecutter https://github.com/indigo-dc/cookiecutter-data-science

For each parameter you are first shown an [Info] line describing it; on the next line you enter its value. You will be asked to configure:

  • Remote URL to host your new repositories (git), e.g. https://github.com/deephdc, git_base_url
  • Project name, project_name
  • Name of your new repository, to be added after “git_base_url” (see above), repo_name (aka <your_project> in the following)
  • Author name(s) (and/or your organization/company/team). If several, separate them with commas, author_name
  • E-mail(s) of the main author(s) (or contact person). If several, separate them with commas, author_email
  • Short description of the project, description
  • Application version (expects X.Y.Z (Major.Minor.Patch)), app_version
  • Choose open source license, default is MIT. For more info: https://opensource.org/licenses, open_source_license
  • User account at hub.docker.com, e.g. ‘deephdc’ in https://hub.docker.com/u/deephdc, dockerhub_user
  • Docker image your Dockerfile starts from (FROM <docker_baseimage>) (don’t provide the tag here), e.g. tensorflow/tensorflow, docker_baseimage
  • CPU tag for the baseimage, e.g. 1.14.0-py3. Has to match python3!, baseimage_cpu_tag
  • GPU tag for the baseimage, e.g. 1.14.0-gpu-py3. Has to match python3!, baseimage_gpu_tag

Note

These parameters are defined in cookiecutter.json in the cookiecutter-data-science source.
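As a rough illustration, cookiecutter.json collects the parameters listed above as key/default pairs. The defaults below are assumptions for illustration only, not the actual file contents:

```json
{
  "git_base_url": "https://github.com/deephdc",
  "project_name": "project_name",
  "repo_name": "your_project",
  "author_name": "Author Name",
  "author_email": "author@example.org",
  "description": "A short description of the project.",
  "app_version": "0.1.0",
  "open_source_license": "MIT",
  "dockerhub_user": "deephdc",
  "docker_baseimage": "tensorflow/tensorflow",
  "baseimage_cpu_tag": "1.14.0-py3",
  "baseimage_gpu_tag": "1.14.0-gpu-py3"
}
```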

Once these questions are answered, the following two repositories will be created locally and immediately linked to your git_base_url:

~/DEEP-OC-your_project
~/your_project

Each repository has two branches: ‘master’ and ‘test’.

<your_project> repo

The main repository, in which you integrate your model; it has the following structure:

|
├── data                   Placeholder for the data
│   └── raw                   The original, immutable data dump.

├── docs                   Documentation on the project; see sphinx-doc.org for details

├── models                 Trained and serialized models, model predictions, or model summaries

├── notebooks              Jupyter notebooks. Naming convention is a number (for ordering),
│                            the creator's initials (if several people develop),
│                            and a short `_` delimited description,
│                            e.g. `1.0-jqp-initial_data_exploration.ipynb`.

├── references             Data dictionaries, manuals, and all other explanatory materials.

├── reports                Generated analysis as HTML, PDF, LaTeX, etc.

├── your_project           Main source code of the project
│    │
│    ├── __init__.py          Makes your_project a Python module
│    │
│    ├── dataset              Scripts to download and manipulate raw data
│    │   └── make_dataset.py
│    │
│    ├── features             Scripts to prepare raw data into features for modeling
│    │   └── build_features.py
│    │
│    ├── models               Scripts to train models and then use trained models to make predictions
│    │   └── deep_api.py         Main script for the integration with DEEP API
│    │
│    ├── tests                Scripts to perform code testing
│    │
│    └── visualization        Scripts to create exploratory and results oriented visualizations
│        └── visualize.py

├── .dockerignore          Describes what files and directories to exclude for building a Docker image

├── .gitignore             Specifies intentionally untracked files that Git should ignore

├── Jenkinsfile            Describes basic Jenkins CI/CD pipeline

├── LICENSE                License file

├── README.md              The top-level README for developers using this project.

├── requirements.txt       The requirements file for reproducing the analysis environment,
│                             e.g. generated with `pip freeze > requirements.txt`

├── setup.cfg              Makes project pip installable (pip install -e .)

├── setup.py               Makes project pip installable (pip install -e .)

├── test-requirements.txt  The requirements file for the test environment

└── tox.ini                tox file with settings for running tox; see tox.testrun.org

Certain files, e.g. README.md, Jenkinsfile, setup.cfg, and tox.ini, are pre-populated based on the answers you gave during the cookiecutter call (see above).
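The notebook naming convention shown in the tree above can be sanity-checked with a small shell snippet. The regular expression is an illustrative assumption, not part of the template:

```shell
# Check a notebook filename against the convention:
# <number>-<initials>-<short_description>.ipynb,
# e.g. 1.0-jqp-initial_data_exploration.ipynb
name="1.0-jqp-initial_data_exploration.ipynb"
if echo "$name" | grep -Eq '^[0-9]+(\.[0-9]+)*-[a-z]+-[a-z0-9_]+\.ipynb$'; then
    echo "ok: $name follows the convention"
else
    echo "warning: $name does not follow the convention"
fi
```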

<DEEP-OC-your_project>

Repository for the integration of the DEEPaaS API and your_project in one Docker image.

├─ Dockerfile     Describes the main steps to integrate the DEEPaaS API and
│                     the your_project application in one Docker image

├─ Jenkinsfile    Describes basic Jenkins CI/CD pipeline

├─ LICENSE        License file

├─ README.md      README for developers and users.

├─ docker-compose.yml     Allows running the application with various configurations via docker-compose

├─ metadata.json          Defines information propagated to the DEEP Open Catalog, https://marketplace.deep-hybrid-datacloud.eu

All files are filled with the information you provided during the cookiecutter execution (see above).
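As a rough sketch, the generated Dockerfile follows the pattern below. The exact generated content differs; the base image, paths, and deepaas-run entry point shown here are assumptions for illustration:

```dockerfile
# Illustrative sketch only -- the generated Dockerfile is more elaborate
FROM tensorflow/tensorflow:1.14.0-py3

# Install the DEEPaaS API and the application
RUN pip install deepaas
RUN git clone https://github.com/deephdc/your_project /srv/your_project && \
    pip install -e /srv/your_project

# Expose the DEEPaaS API port and start the service
EXPOSE 5000
CMD ["deepaas-run", "--listen-ip", "0.0.0.0"]
```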

Step-by-step guide

  1. Install cookiecutter (if not yet done), e.g. pip install cookiecutter
  2. Run cookiecutter https://github.com/indigo-dc/cookiecutter-data-science
  3. Answer all the questions; pay attention to the docker tags!
  4. Two directories will be created: <your_project> and <DEEP-OC-your_project> (each with two git branches: master and test)
  5. Go to github.com/user_account and create the corresponding repositories <your_project> and <DEEP-OC-your_project>
  6. In your terminal, go to <your_project> and run git push origin --all
  7. In your terminal, go to <DEEP-OC-your_project> and run git push origin --all
  8. Your GitHub repositories are now updated with the initial commits
  9. You can build the <deep-oc-your_project> Docker image locally: go to the <DEEP-OC-your_project> directory and run docker build -t dockerhubuser/deep-oc-your_project .
  10. You can now run the DEEPaaS API as docker run -p 5000:5000 dockerhubuser/deep-oc-your_project
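Steps 6-10 above can be condensed into a few commands. The paths and names are the placeholders used throughout this page, and this assumes the GitHub repositories from step 5 already exist:

```shell
cd ~/your_project
git push origin --all        # pushes both the 'master' and 'test' branches

cd ~/DEEP-OC-your_project
git push origin --all

# Build the Docker image and run the DEEPaaS API on port 5000
docker build -t dockerhubuser/deep-oc-your_project .
docker run -p 5000:5000 dockerhubuser/deep-oc-your_project
```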

[*] The more general cookiecutter-data-science template was adapted for the purposes of DEEP.