Finally out: the final version of the BigDataStack architecture, following the previous releases describing our “Conceptual Model and Reference Architecture” (namely D2.4 and D2.5).
The final version of the overall conceptual architecture includes the information flows and capabilities provided by each of the main building blocks. It also defines the role of each user of the platform, shedding light on their responsibilities, their functions and the possibilities the platform provides. This document serves as design documentation for the individual components of the architecture, presenting the design outcomes of the final integrated prototypes together with the experimentation and validation results obtained.
The users – who is our platform intended for?
In BigDataStack we want to ease the working life of managers, developers and data scientists in their decision-making processes. Our platform has been tuned and tested on the needs of three specific industrial sectors – shipping, retail and insurance – but it is domain-agnostic and can easily fit virtually any business.
We have identified three major players in the dataflow analysis, each with a very specific role:
The business analyst is responsible for defining the business processes through specific objectives. The Process Modelling Framework allows him or her to design the whole data flow via a graphical interface (a Business Process Modeling Notation). The output of this process is a graph with a high-level description of the workflow from the business analyst’s perspective, along with the related end-to-end business objectives. The Process Mapping component then interprets the graph and maps and converts the information it contains into algorithms.
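To give a flavour of this mapping step, here is a minimal sketch in Python. It is illustrative only – the node names, the algorithm names and the lookup table are hypothetical, not the actual BigDataStack Process Mapping interface – but it shows the idea of walking a high-level process graph and resolving each business task to a concrete algorithm.

```python
# A toy business-process graph: each node lists its downstream tasks.
# All task and algorithm names below are invented for illustration.
process_graph = {
    "ingest-orders": ["clean-orders"],
    "clean-orders": ["forecast-demand"],
    "forecast-demand": [],
}

# Hypothetical lookup from high-level business tasks to concrete algorithms.
task_to_algorithm = {
    "ingest-orders": "kafka_ingest",
    "clean-orders": "dedup_and_normalise",
    "forecast-demand": "gradient_boosted_forecast",
}

# Resolve every task in the graph to the algorithm that implements it.
pipeline = [task_to_algorithm[node] for node in process_graph]
print(pipeline)  # ['kafka_ingest', 'dedup_and_normalise', 'gradient_boosted_forecast']
```

In the real platform this translation is of course far richer, taking the end-to-end business objectives into account as well.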
The data analyst will then finalise the set-up of the data flow in a few more steps:
- Lowering the allocated resources if the selected algorithms perform sufficiently well.
- Defining the data sources from where the datasets will be ingested.
- Defining any data curation tasks necessary for the algorithms.
- Tweaking existing algorithms or designing new ones and analysis tasks, which are then stored in the Catalogue of Predictive and Process Analytics so they can be re-used in the future.
- Selecting performance metrics to evaluate the algorithm/model and resource configurations.
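The steps above could be captured in a single configuration object. The following Python sketch is purely illustrative – the `DataFlowConfig` class, its fields and the resource-lowering helper are our own invention, not the BigDataStack API – but it shows how data sources, curation tasks, catalogued algorithms, metrics and adjustable resources fit together:

```python
from dataclasses import dataclass, field

# Hypothetical configuration a data analyst might finalise; all names are
# illustrative, not part of the actual BigDataStack platform.
@dataclass
class DataFlowConfig:
    data_sources: list          # where the datasets are ingested from
    curation_tasks: list        # curation steps the algorithms need
    algorithms: list            # analytics tasks, catalogued for re-use
    performance_metrics: list   # metrics to evaluate algorithms/configurations
    resource_limits: dict = field(default_factory=dict)

    def lower_resources(self, key: str, factor: float) -> None:
        """Scale down an allocated resource, e.g. after a good evaluation."""
        if key in self.resource_limits:
            self.resource_limits[key] *= factor

flow = DataFlowConfig(
    data_sources=["postgres://sales"],
    curation_tasks=["deduplicate", "normalise-dates"],
    algorithms=["demand-forecast-v2"],
    performance_metrics=["rmse", "latency_p95"],
    resource_limits={"cpu_cores": 8},
)
flow.lower_resources("cpu_cores", 0.5)  # algorithm performed well: halve the CPUs
print(flow.resource_limits["cpu_cores"])  # 4.0
```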
BigDataStack offers the Application Dimensioning Workbench to enable application owners and engineers to experiment with their application and obtain dimensioning outcomes regarding the required resources for specific data needs and data-related properties.
The key results of BigDataStack are reflected in a set of components and blocks in the corresponding overall architecture of the stack. The final version of the critical functionalities of the overall architecture, the interactions between the main building blocks and their components are all included in this document.
The main components described in this document are:
- Resources Management - Container-based and Virtual Machine-based application management on cloud and on-premise infrastructures
- Data-Driven Network Management - optimisation and management for computing, storage and networking resources.
- Dynamic Orchestrator - redeployment of applications during runtime to ensure they comply with their Service Level Objectives (SLOs)
- Triple Monitoring and QoS Evaluation - API and methods for gathering metrics from different sources, evaluation of SLOs
- Applications & Data Services/ Realization Engine - converting user-defined application meta-code into actual running deployments and managing them.
- Data Quality Assessment - set of algorithms to enable domain-agnostic error detection
- Real-Time Complex Events Processing (CEP) - real-time analysis of data collected from heterogeneous data sources at high rates
- Process Mapping & Analytics - predict and apply the best algorithm from a set
- Seamless Analytics Framework - analysis of datasets stored in one or more underlying physical data stores
- Application Dimensioning Workbench - provide insights regarding the required infrastructure resources for the data services components, linking the used resources with load and expected QoS levels.
- Big Data Layout and Data Skipping - avoiding reading unnecessary data from Object Storage and sending it across the network
- Process Modelling Framework - provides an interface to business users to model their processes and workflows and obtain recommendations for their optimization
- Data Toolkit - design and support data analysis workflows
- Adaptable Visualization - integrate data from several components and display them in a visualisation dashboard
- Adaptable Distributed Storage - dynamic data load balancing, requesting resources from the infrastructure to accomplish the process needs
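To make the interplay between the Triple Monitoring engine and the Dynamic Orchestrator concrete, here is a hedged sketch of an SLO check: metrics gathered by monitoring are compared against the objectives, and any violation would trigger a redeployment decision. The function, metric names and thresholds are illustrative assumptions, not the project's actual interfaces:

```python
# Illustrative SLO evaluation (not the BigDataStack API): compare measured
# metrics against Service Level Objectives and report violations that the
# orchestrator could act on, e.g. by redeploying the application.
def evaluate_slos(metrics: dict, slos: dict) -> list:
    """Return the names of SLOs whose measured value exceeds the objective."""
    return [name for name, limit in slos.items()
            if metrics.get(name, 0.0) > limit]

violations = evaluate_slos(
    metrics={"latency_ms_p95": 180.0, "error_rate": 0.002},
    slos={"latency_ms_p95": 150.0, "error_rate": 0.01},
)
print(violations)  # ['latency_ms_p95'] -> would trigger a redeployment decision
```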
It should be noted that further design details and evaluation results for all components of the architecture will be delivered in the corresponding follow-up deliverables addressing the user interaction block, the data as a service block and the infrastructure management block. Stay tuned!