Demonstrating Data as a Service in BigDataStack
The Data as a Service block offers a set of data services that map to the major phases of Big Data processing. The architecture and design of these data services are achieved through dedicated techniques, contextualized in the BigDataStack environment so that they run on top of the data-driven infrastructure management system and provide the required data services.
- The adaptable distributed storage component is based on the LeanXcale relational datastore;
- Big Data Layout and data skipping covers automated big data layout, as well as state-of-the-art data skipping techniques, in order to improve SQL analytics on rectangular data in object storage. This component also aims to research automatic algorithms for dynamic data layout and for creating data skipping indexes;
- The Data Quality mechanisms offer domain-agnostic data cleaning, veracity and enhancement;
- The Predictive and Process Analytics component uses multiple process mining algorithms to analyse, structure and process models derived from event-driven data;
- The Complex Event Processing component will run in geo-distributed environments in order to avoid processing delays and optimize resource consumption;
- The Seamless data analytics framework builds on top of the LeanXcale database and IBM Cloud Object Storage.
This set of data services fits two particularly important scenarios: data ingestion and data querying. The goal is to show not only the strength of solutions built from BigDataStack data services but also the ease with which they can be assembled.
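
To make the data skipping idea from the list above more concrete, here is a minimal Python sketch of one common form of it: a min/max index kept per stored object. It is an illustration only, not the BigDataStack implementation. The names `StoredObject`, `build_minmax_index` and `query_greater_than`, as well as the `temp` column and the sample values, are hypothetical. The sketch assumes each object in object storage holds a batch of rows and that the query is a simple `column > threshold` filter.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, float]


@dataclass
class StoredObject:
    """One object in object storage holding a batch of rows (hypothetical layout)."""
    name: str
    rows: List[Row]


def build_minmax_index(objects: List[StoredObject], column: str) -> Dict[str, Tuple[float, float]]:
    """Record the minimum and maximum of `column` for every object."""
    index: Dict[str, Tuple[float, float]] = {}
    for obj in objects:
        values = [row[column] for row in obj.rows]
        index[obj.name] = (min(values), max(values))
    return index


def query_greater_than(
    objects: List[StoredObject],
    index: Dict[str, Tuple[float, float]],
    column: str,
    threshold: float,
) -> Tuple[List[Row], List[str]]:
    """Return the matching rows and the names of the objects that were actually read."""
    matches: List[Row] = []
    scanned: List[str] = []
    for obj in objects:
        _, highest = index[obj.name]
        if highest <= threshold:
            # No row in this object can satisfy column > threshold, so skip it.
            continue
        scanned.append(obj.name)
        matches.extend(row for row in obj.rows if row[column] > threshold)
    return matches, scanned


# Hypothetical sample data: three objects with non-overlapping value ranges.
objects = [
    StoredObject("part-0", [{"temp": 10.0}, {"temp": 14.5}]),
    StoredObject("part-1", [{"temp": 21.0}, {"temp": 35.2}]),
    StoredObject("part-2", [{"temp": 15.0}, {"temp": 19.9}]),
]
index = build_minmax_index(objects, "temp")
rows, scanned = query_greater_than(objects, index, "temp", 20.0)
print(scanned)  # only the objects whose max exceeds the threshold are read
print(rows)
```

In this sketch only `part-1` is read for the filter `temp > 20.0`; the other two objects are ruled out from their metadata alone. It also hints at why data layout matters: skipping is most effective when rows with similar values are stored together, so that the per-object ranges overlap as little as possible.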