DEEP Data Science template
$ cookiecutter https://github.com/indigo-dc/cookiecutter-data-science
For each parameter, cookiecutter first prints an [Info] line describing it; on the next line you enter its value. You will be asked to configure:
- Remote URL to host your new repositories (git), e.g. https://github.com/deephdc,
- Project name,
- Name of your new repository (repo_name, aka <your_project> in the following), to be appended to “git_base_url” (see above),
- Author name(s) (and/or your organization/company/team). If many, separate by comma,
- E-Mail(s) of main author(s) (or contact person). If many, separate by comma,
- Short description of the project,
- Application version (expects X.Y.Z (Major.Minor.Patch)),
- Choose open source license, default is MIT. For more info: https://opensource.org/licenses,
- User account at hub.docker.com, e.g. ‘deephdc’ in https://hub.docker.com/u/deephdc,
- Docker image your Dockerfile starts from (FROM <docker_baseimage>) (don’t provide the tag here), e.g. tensorflow/tensorflow,
- CPU tag for the baseimage, e.g. 1.14.0-py3. Has to match python3!,
- GPU tag for the baseimage, e.g. 1.14.0-gpu-py3. Has to match python3!,
These parameters are defined in cookiecutter.json in the cookiecutter-data-science source.
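Some of the constraints listed above can be checked before running the template. The following is a minimal sketch and not part of the template itself: only `git_base_url`, `repo_name` and `docker_baseimage` are names taken from the text above, while `app_version`, `cpu_tag`, `gpu_tag` and the helpers `check_answers` and `repo_url` are hypothetical names chosen for illustration. The `py3` check is a simple heuristic based on the example tags, not a rule enforced by cookiecutter.

```python
import re

# X.Y.Z (Major.Minor.Patch), digits only
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")


def check_answers(answers):
    """Return a list of human-readable problems found in the answers."""
    problems = []

    if not VERSION_RE.match(answers.get("app_version", "")):
        problems.append("app_version must look like X.Y.Z")

    # The base image is given without a tag; only the last path component
    # is inspected, so a registry host with a port is not flagged.
    image_name = answers.get("docker_baseimage", "").rsplit("/", 1)[-1]
    if ":" in image_name:
        problems.append("docker_baseimage must not include a tag")

    # Heuristic (assumption): Python 3 tags contain 'py3', as in the examples.
    for key in ("cpu_tag", "gpu_tag"):
        if "py3" not in answers.get(key, ""):
            problems.append(f"{key} does not look like a python3 tag")

    return problems


def repo_url(answers):
    """Repository URL: repo_name appended to git_base_url."""
    return answers["git_base_url"].rstrip("/") + "/" + answers["repo_name"]


answers = {
    "git_base_url": "https://github.com/deephdc",
    "repo_name": "your_project",
    "app_version": "0.1.0",
    "docker_baseimage": "tensorflow/tensorflow",
    "cpu_tag": "1.14.0-py3",
    "gpu_tag": "1.14.0-gpu-py3",
}
print(check_answers(answers))  # []
print(repo_url(answers))       # https://github.com/deephdc/your_project
```

An empty list means that none of the checks found a problem; it does not guarantee that the chosen tags actually exist for the base image.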
Once these questions are answered, the following two repositories are created locally and immediately linked to your remote (“git_base_url”, see above); each repository has two branches: ‘master’ and ‘test’.
The main repository, <your_project>, in which you integrate your model, has the following structure:
├── data                   Placeholder for the data
│   └── raw                The original, immutable data dump.
│
├── docs                   Documentation on the project; see sphinx-doc.org for details
│
├── models                 Trained and serialized models, model predictions, or model summaries
│
├── notebooks              Jupyter notebooks. Naming convention is a number (for ordering),
│                          the creator's initials (if there are several developers),
│                          and a short `_` delimited description,
│                          e.g. `1.0-jqp-initial_data_exploration.ipynb`.
│
├── references             Data dictionaries, manuals, and all other explanatory materials.
│
├── reports                Generated analysis as HTML, PDF, LaTeX, etc.
│
├── your_project           Main source code of the project
│   │
│   ├── __init__.py        Makes your_project a Python module
│   │
│   ├── dataset            Scripts to download and manipulate raw data
│   │   └── make_dataset.py
│   │
│   ├── features           Scripts to prepare raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models             Scripts to train models and then use trained models to make predictions
│   │   └── deep_api.py    Main script for the integration with DEEP API
│   │
│   ├── tests              Scripts to perform code testing
│   │
│   └── visualization      Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
├── .dockerignore          Describes what files and directories to exclude for building a Docker image
│
├── .gitignore             Specifies intentionally untracked files that Git should ignore
│
├── Jenkinsfile            Describes basic Jenkins CI/CD pipeline
│
├── LICENSE                License file
│
├── README.md              The top-level README for developers using this project.
│
├── requirements.txt       The requirements file for reproducing the analysis environment,
│                          e.g. generated with `pip freeze > requirements.txt`
│
├── setup.cfg              makes project pip installable (pip install -e .)
│
├── setup.py               makes project pip installable (pip install -e .)
│
├── test-requirements.txt  The requirements file for the test environment
│
└── tox.ini                tox file with settings for running tox; see tox.testrun.org
Certain files, e.g. tox.ini, are pre-populated based on the answers you provided during the cookiecutter call (see above).
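The notebook naming convention from the structure above (a number for ordering, the creator's initials, and a short `_` delimited description) can be applied with a small helper. This is only a sketch: the function name `notebook_name` is hypothetical, and lower-casing the description is an assumption drawn from the single example given.

```python
import re


def notebook_name(order, initials, description):
    """Build '<order>-<initials>-<short_description>.ipynb'."""
    # Lower-case the description and join its words with underscores.
    words = re.findall(r"[a-z0-9]+", description.lower())
    return f"{order}-{initials}-{'_'.join(words)}.ipynb"


print(notebook_name("1.0", "jqp", "Initial data exploration"))
# 1.0-jqp-initial_data_exploration.ipynb
```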
The second repository, <DEEP-OC-your_project>, integrates the DEEPaaS API and your_project in one Docker image:
├─ Dockerfile             Describes main steps on integrating DEEPaaS API and
│                         your_project application in one Docker image
│
├─ Jenkinsfile            Describes basic Jenkins CI/CD pipeline
│
├─ LICENSE                License file
│
├─ README.md              README for developers and users.
│
├─ docker-compose.yml     Allows running the application with various configurations via docker-compose
│
└─ metadata.json          Defines information propagated to the DEEP Open Catalog, https://marketplace.deep-hybrid-datacloud.eu
All files get filled with the info provided during cookiecutter execution (see above).
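The base image and the CPU/GPU tags are entered separately during the cookiecutter call, so it can help to see how they combine into the image reference a Dockerfile starts from. The sketch below uses the standard Docker `name:tag` form with the example values from the parameter list; the helper `base_image_ref` is hypothetical, and how the generated Dockerfile actually passes the tag is not shown here.

```python
def base_image_ref(docker_baseimage, tag):
    """Combine an image name and a tag into a Docker reference 'name:tag'."""
    return f"{docker_baseimage}:{tag}"


# Example values from the parameter list: one CPU tag, one GPU tag.
for tag in ("1.14.0-py3", "1.14.0-gpu-py3"):
    print("FROM", base_image_ref("tensorflow/tensorflow", tag))
# FROM tensorflow/tensorflow:1.14.0-py3
# FROM tensorflow/tensorflow:1.14.0-gpu-py3
```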
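- (if not yet done) install cookiecutter, e.g.
pip install cookiecutter
- run the template (see above):
cookiecutter https://github.com/indigo-dc/cookiecutter-data-science
- answer all the questions, paying attention to the Docker tags!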
- two directories will be created: <your_project> and <DEEP-OC-your_project> (each with two git branches: master and test)
- go to github.com/user_account and create the corresponding repositories <your_project> and <DEEP-OC-your_project>
- in your terminal, go to <your_project> and run
git push origin --all
- in your terminal, go to <DEEP-OC-your_project> and run
git push origin --all
- your github repositories are now updated with initial commits
- you can build the <deep-oc-your_project> Docker image locally: go to the <DEEP-OC-your_project> directory and run
docker build -t dockerhubuser/deep-oc-your_project .
- you can now run DEEPaaS with
docker run -p 5000:5000 dockerhubuser/deep-oc-your_project
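The push, build and run steps above follow a fixed pattern derived from the repository name and the Docker Hub account. As a sketch, the commands can be generated for a given project and reviewed before running them; the helper `build_steps` is hypothetical, and lower-casing the image name is an assumption based on Docker's requirement that repository names be lowercase.

```python
def build_steps(repo_name, dockerhub_user):
    """Return (working_directory, command) pairs for the steps above."""
    oc_dir = f"DEEP-OC-{repo_name}"
    # Assumption: lower-cased, since Docker repository names must be lowercase.
    image = f"{dockerhub_user}/deep-oc-{repo_name}".lower()
    return [
        (repo_name, ["git", "push", "origin", "--all"]),
        (oc_dir, ["git", "push", "origin", "--all"]),
        (oc_dir, ["docker", "build", "-t", image, "."]),
        (oc_dir, ["docker", "run", "-p", "5000:5000", image]),
    ]


# Dry run: only print the commands, nothing is executed.
for cwd, cmd in build_steps("your_project", "dockerhubuser"):
    print(f"(in {cwd}) $ {' '.join(cmd)}")
```

To execute the steps instead of printing them, each pair could be passed to `subprocess.run(cmd, cwd=cwd, check=True)`, provided git and Docker are installed and the GitHub repositories already exist.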
[*] The more general cookiecutter-data-science template was adapted for the purpose of DEEP.