Big Data Stack EU Project: A European Open Source Initiative

The Big Data Stack EU research project aims to provide a complete infrastructure management system that bases its management and deployment decisions on data from current and past application and infrastructure deployments. It focuses on:

Performance and dynamicity
Optimization and scalability
Automation, agility, quality
Dimensioning and automation
Openness and extensibility
Sustainability and competitive advantage
Applicability and validation

The project included a task focused on increasing the impact of its research activities, as well as their sustainability, to make innovation easier:

The European Open Source Initiative

This European open source initiative is an effort by Red Hat to share its expertise on how research outcomes in Europe can actually be exploited rather than simply dropped into an open repository. It was already present informally in other EU research projects in which Red Hat participated (Orbit and Superfluidity). This time it has been done more formally, with Red Hat coordinating the effort around two main objectives:

Promote Big Data Stack outcomes as open source artefacts merged in related upstream projects, ensuring sustainability and reusability.
Provide the expertise and know-how to the EU research community for creating valuable open source artefacts by engaging with related upstream communities, maximising their impact and adoption.

One common misconception is that Open Source is about making your code public. While this is one of the prerequisites, that alone does not make it a “real” open source project. If nobody is checking or using your code, the fact that the code is open does not mean much. To fully embrace the open source way and increase the chances of building a successful (widely used, supported) project, you need to create a community around it. There are a few extra steps to keep in mind that will lead to a more successful result than just adding your modifications to your own fork of the project:

Build on top of existing knowledge and projects, even if you need to modify them to fit your use case. Do not start from scratch just to make it easier for yourself (as engineers like to do…).
Make your modifications as part of the related open source project (e.g., OpenStack, OpenShift, Linux kernel). Ensure you are covering a specific gap in that project or improving an existing functionality.
Engage with the related upstream communities to ensure the feature is aligned with their needs, and to work out how best to implement it so that it can be reused in other, slightly different use cases.
Engage with the community also to understand their standards for writing and submitting code, for example the test coverage that ensures your feature (if accepted) is not broken by functionality added after yours.
Review code from others. It will make a better product, give you more knowledge about other pieces of the project and allow you to better understand the code “lifecycle” (e.g., API changes, upgrades from one release to another, etc.).
Merge code upstream! (and celebrate!)

When you engage with an existing community and build on top of it, there is a greater chance that your idea gets exposure, that more people contribute to it, and that you get the benefits of being “open”. In other words, your project is more likely to become a successful one.

The process of achieving the above points and publicly releasing a software artefact is not seamless, though. Advice and feedback may be needed to achieve a remarkable result in terms of the exploitability of the produced software. In fact, for people not used to it, the process can be frustrating because of the extra time and effort needed (https://assafmuller.com/2016/12/02/upstream-contribution-give-up-or-double-down/), although that effort should pay off in the end. For this reason, we have initiated the European Open Source Initiative. Red Hat has the know-how and is highly engaged in many upstream open source communities (kernel, OpenStack, Kubernetes, OpenShift, …). Through the European Open Source Initiative, it has helped other partners identify relevant communities for their components, as well as the standard way of coding and working with them (e.g., the current trend towards containerization and application management through operators).

As part of the Big Data Stack project, through the European Open Source Initiative, we have:

Organized the “Red Hat Research Day”, an event held twice a year (once in Europe, once in the US) and dedicated to research activities where Red Hat is involved. Its goal is to bring together international researchers with Red Hat engineers (and anyone else interested) to share knowledge about the latest research findings and move great research ideas into open source communities. https://research.redhat.com/research-day/
Made use of several de facto standards (Kubernetes, OpenStack, Linux, Spark, Ansible) and tried to follow their models when writing components, which helps reusability by other projects. As an example, we deploy Big Data Stack components in a fully containerized way, and some components, such as the Triple Monitoring Engine, follow the operator model to manage the deployment and configuration of the application itself.
Made contributions to many related upstream projects, such as Spark, Linux Kernel, Kuryr-kubernetes, Octavia, Neutron, Kubernetes, OpenShift Installer, Cluster Network Operator, Gophercloud, Terraform, …
Participated in upstream community gatherings, such as KubeCon or the Open Infrastructure Summit, including giving talks related to our contributions to Kuryr.
Participated in events that foster open source contributions, such as DevConf.
Got involved in the Outreachy program (https://www.outreachy.org/) to mentor interns from underrepresented minorities in the Open Source Way and its upstream communities.

Examples of success stories:

The process usually takes longer for partners who do not normally follow this model than for partners already familiar with it (such as Red Hat or IBM). However, with the main objective of increasing adoption and impact, as well as maintainability after the project ends, many partners have come on board and targeted open source communities for their components, following similar existing projects (e.g., Sandbox projects) as well as the current standards for software development and deployment.

Even though this process takes time, and full open source contribution may only happen after the project finishes, some outstanding contributions have already been made:

Open Source contributions to many projects, already merged upstream: Spark, Linux Kernel, Kuryr-kubernetes, Octavia, Neutron, Kubernetes, OpenShift Installer, Cluster Network Operator, Gophercloud, Terraform, …
Components developed with Sandbox projects in mind, like the Triple Monitoring Engine, which is managed and deployed through an operator. This allows easy deployment and testing of the component by other projects and people, and if it receives enough attention (GitHub stars) it could eventually make it in as a new component there, probably after extra contributors provide additional functionality that makes it more general and flexible for different use cases.

To conclude this blog post, here is another success story, from a previous project, the Orbit EU project, where the upstream contributions were finalized after the project finished. In this case, the feature to implement was “post-copy live migration of VMs”, and it required modifications at different levels of the stack (from lower to upper layers):

Modifications at the Linux Kernel for virtual pages management
Modifications at the Qemu level to make use of the new kernel functionality and offer the new migration option
Modifications at the Libvirt level to orchestrate the migration based on the Qemu functionality
Modifications at the OpenStack (Nova) level to make use of the new live migration feature exposed by Libvirt

In this development, and within the time frame of that project, Red Hat completed and upstreamed the features for the first two layers (Kernel and Qemu). Red Hat also collaborated with Umea University and helped them develop the Libvirt and OpenStack extensions, mentoring the people involved on how to approach both the Libvirt and OpenStack communities to get the features merged upstream. This eventually happened shortly after the project finished (less than half a year later). As a result, the feature is still supported by OpenStack (more than 4 years later), and many users and customers around the world can rely on it (some even transparently) on a daily basis.

Conclusions

More and more companies are adopting the Open Source development model. A lot of the innovation in the world is happening in the open. The European Open Source Initiative is a way to bring more and more companies and researchers on board this journey, to ease their way in, and to benefit as a society from collaboration and transparency. We believe in this model and we are pushing to make it widely accepted.