Typically, machine learning processes are developed in multiple sequential steps. Unfortunately, most of the time the process isn't as straightforward as one would want it to be. During development, issues come up that require changes to earlier steps, forcing you to go back and start over. In reality, the process therefore looks more like this:
This leads to computing the same data multiple times, creating the same intermediate results over and over again.
Additionally, these steps have different run-time behavior and diverse computational requirements, yet they are often all run on the same computation platform.
Our solution keeps track of all individual tasks together with their input and output data. It collects profiling information about each task to give you insight into the efficiency of your whole process. This enables you to:
- Determine the computational cost of your model and its individual parts
- Avoid unnecessary re-computation of intermediate results
- Find hot spots in your code that consume most of your resources
- Discover tasks that reserve more resources than they actually use
- Run tasks on the most cost-effective machines
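The core idea of tracking tasks, caching their outputs, and profiling their run time can be sketched as follows. This is a minimal illustration, not the actual implementation; the `Task` wrapper and its attribute names are hypothetical.

```python
import hashlib
import json
import time


class Task:
    """Hypothetical sketch: cache a task's output keyed by a hash of its
    inputs, and record wall-clock profiling data for each real execution."""

    def __init__(self, fn):
        self.fn = fn
        self.cache = {}    # input hash -> cached output
        self.profile = []  # (input hash, seconds) per actual computation

    def _key(self, args, kwargs):
        # Hash the inputs so identical calls map to the same cache entry.
        payload = json.dumps([args, kwargs], sort_keys=True, default=str)
        return hashlib.sha256(payload.encode()).hexdigest()

    def __call__(self, *args, **kwargs):
        key = self._key(args, kwargs)
        if key in self.cache:
            # Avoid unnecessary re-computation of intermediate results.
            return self.cache[key]
        start = time.perf_counter()
        result = self.fn(*args, **kwargs)
        self.profile.append((key, time.perf_counter() - start))
        self.cache[key] = result
        return result


@Task
def preprocess(data):
    return [x * 2 for x in data]


out1 = preprocess([1, 2, 3])  # computed and profiled
out2 = preprocess([1, 2, 3])  # served from the cache
```

After both calls, `preprocess.profile` contains a single timing record, since the second call never re-ran the function; aggregating such records across all tasks is what reveals the hot spots in a pipeline.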