D1.4 – Data Management Plan

This deliverable focuses on the management of the data in BigDataStack.

The following kinds of data will have to be handled:

  • The first kind is the data sets provided by the use cases (either directly, as in the case of Danaos, or through their customers, as in the cases of Atos Worldline and GFT) and used to validate the project. Each use case's data has its own specific requirements. Although anonymized, these data sets are consortium confidential. Moreover, they may be complemented by open data sets. This has already happened in the first half of the project, where NOAA [2] weather data was used to complement the vessel data provided by Danaos. In the second half of the project, GFT intends to augment insurance data with public data sets for the Insurance use case.
  • The second kind of data is the publications that have been and will be published. Here the main concern is to ensure that the FAIR principles are adhered to.
  • The third kind of data is the deliverables. Except for the very few that are consortium private, all deliverables have been and will be made publicly and freely accessible from the project web site.
  • The fourth kind of data is the open source software artifacts. As of September 2019, substantial code output of the project has already been upstreamed to big open source projects such as OpenStack. During the second part of the project, we envision further contributions to open source projects.
  • The fifth and last kind of data is artifacts of research value obtained from the BigDataStack infrastructure. For instance, logs or playbooks generated by applications executed over BigDataStack may be of research interest. In this case, FAIR principles will be applied to them.