
Keep your data in order by using Boxs

Boxs 0.1 joins the ranks of Kantai's open-source libraries

Good news everyone!

With the first release of our data management library “Boxs”, we are happy to complete our set of foundational libraries for creating machine learning processes.

Boxs helps you organize the data and artifacts created in your workflow. No more constantly changing file paths and S3 keys, and no more code sprinkled with functions that write values to files or upload them to the cloud: Boxs takes care of all this with its simple API. Define your own set of boxes and put related values together in the same box. All artifacts are tracked, together with their lineage, across multiple runs of the same script. A command-line interface lets you inspect your data easily and even compare the same data item across different runs.
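To make the idea of boxes, runs and lineage more concrete, here is a toy sketch in plain Python. It is not the Boxs API: the names ToyBox, DataRef, store, load, lineage and compare are hypothetical and only illustrate the concept under simple assumptions (everything lives in memory, and lineage is only followed within one box).

```python
"""Hypothetical toy model of boxes, runs and lineage. NOT the Boxs API."""
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRef:
    """Hypothetical reference to one stored item: box, run and item name."""
    box_id: str
    run_id: str
    name: str


@dataclass
class Record:
    value: object
    parents: tuple  # DataRefs this value was derived from (its lineage)


class ToyBox:
    """Hypothetical in-memory box that groups related values across runs."""

    def __init__(self, box_id):
        self.box_id = box_id
        self._records = {}

    def store(self, name, value, run_id, parents=()):
        """Store a value for a given run and remember what it was derived from."""
        ref = DataRef(self.box_id, run_id, name)
        self._records[ref] = Record(value, tuple(parents))
        return ref

    def load(self, ref):
        return self._records[ref].value

    def lineage(self, ref):
        """Return all ancestors of ref by walking parent links within this box."""
        seen = []
        stack = list(self._records[ref].parents)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                stack.extend(self._records[parent].parents)
        return seen

    def compare(self, name, run_a, run_b):
        """Return the same named item from two different runs."""
        a = self.load(DataRef(self.box_id, run_a, name))
        b = self.load(DataRef(self.box_id, run_b, name))
        return a, b
```

Because each item is addressed by box, run and name, the same item from two runs can be looked up side by side, which is the kind of comparison the command-line interface described above offers for real Boxs data.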

Figure: Organization of data across multiple boxes and runs

The new library can easily be integrated into Bandsaw, our tool for breaking up a process into individual parts.

A more thorough description of the library can be found at our documentation hub.

What is coming next?

Boxs is currently limited to storing data in the file system, so using it in a distributed process requires a distributed file system. To remove this limitation, the next version of Boxs will include storage implementations for cloud storage services, so that workflows running in different regions or across different cloud platforms are supported, too.
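Supporting several storage services usually comes down to putting a common interface in front of each backend. The sketch below illustrates that general pattern; it is not how Boxs implements storage. Storage, LocalFileStorage, InMemoryStorage and save_artifact are hypothetical names, and InMemoryStorage merely stands in for a remote object store such as S3 so that the example runs without cloud credentials.

```python
"""Hypothetical pluggable-storage pattern. NOT the Boxs API."""
import abc
import pathlib
import tempfile


class Storage(abc.ABC):
    """Hypothetical interface every storage backend implements."""

    @abc.abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abc.abstractmethod
    def read(self, key: str) -> bytes: ...


class LocalFileStorage(Storage):
    """Stores each item as a file below a root directory."""

    def __init__(self, root):
        self.root = pathlib.Path(root)

    def write(self, key, data):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)  # create box/run folders
        path.write_bytes(data)

    def read(self, key):
        return (self.root / key).read_bytes()


class InMemoryStorage(Storage):
    """Stand-in for a remote object store: keys map directly to bytes."""

    def __init__(self):
        self._objects = {}

    def write(self, key, data):
        self._objects[key] = bytes(data)

    def read(self, key):
        return self._objects[key]


def save_artifact(storage: Storage, box_id: str, run_id: str, name: str, data: bytes) -> str:
    """Workflow code only depends on the Storage interface, not on the backend."""
    key = f"{box_id}/{run_id}/{name}"
    storage.write(key, data)
    return key
```

With this design, moving a workflow from a shared file system to cloud storage means swapping the backend object, while the code that saves artifacts stays unchanged.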

Additionally, work has begun on our first commercial product: a service that lets you monitor your processes and discover optimization potential. Stay tuned!

Cheers,

Christoph

References

Boxs

2022-02-03