When new machine learning processes are developed, scientists and engineers spend time not only on choosing the right algorithm and dataset, but also on preprocessing the data, evaluating the model, and manually managing computational resources. This involves setting up the computing infrastructure, deploying code, starting the necessary processes, and keeping track of intermediate artifacts. These additional tasks slow down development, make it error-prone, and hurt motivation.
To improve the developer experience, we are currently working on a set of libraries that can be easily integrated into Python scripts and automate many of these tasks by providing the following features:
- Splitting Python scripts into independent logical tasks that can be run individually
- Easy integration using decorators
- Support for running tasks with different Python interpreters
- Asynchronous execution of tasks
- Automatic deployment and running on different computational platforms (local, cluster via SSH, etc.)
- Tracking of intermediate artifacts
- Reusable configuration
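
To make the decorator-based approach more concrete, here is a minimal sketch of how splitting a script into tasks, running them asynchronously, and tracking their outputs could look. It uses only the Python standard library. The names `task`, `TASKS`, `ARTIFACTS`, and `run_async` are hypothetical and chosen for illustration; they are not the API of the planned libraries, and a thread pool stands in for real deployment to a cluster.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict

# Hypothetical in-memory registries (assumptions for this sketch only).
TASKS: Dict[str, Callable[..., Any]] = {}
ARTIFACTS: Dict[str, Any] = {}


def task(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register func as an individually runnable task and record its result."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        # Keep the return value as an intermediate artifact, keyed by task name.
        ARTIFACTS[func.__name__] = result
        return result

    TASKS[func.__name__] = wrapper
    return wrapper


@task
def preprocess(raw):
    # Scale all values by the largest one.
    largest = max(raw)
    return [x / largest for x in raw]


@task
def evaluate(data):
    # Stand-in for a real model evaluation: the mean of the inputs.
    return sum(data) / len(data)


def run_async(executor: ThreadPoolExecutor, name: str, *args: Any) -> Future:
    """Submit a registered task by name and return a Future for its result."""
    return executor.submit(TASKS[name], *args)


with ThreadPoolExecutor(max_workers=2) as pool:
    cleaned = run_async(pool, "preprocess", [1, 2, 4]).result()
    score = run_async(pool, "evaluate", cleaned).result()
```

In this sketch each decorated function stays callable on its own, which is what allows tasks to be run individually; a real implementation would additionally need to serialize arguments and results to run tasks under different interpreters or on remote machines.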
The first product is planned to be available in Q4 2021.