The BigDataStack technology Data Skipping has been published in the EC Innovation Radar, and its developer, IBM Research, has been identified as a key innovator.
The EC Innovation Radar platform builds on the information and data gathered by independent experts involved in reviewing ongoing research and innovation projects funded by the European Commission. These experts also provided an independent view regarding the innovations in the projects and their market potential.
The aim is to make information about EU-funded innovations from high-quality projects visible and accessible to the public via the EU's Innovation Radar platform. This will show citizens the many excellent technological and scientific advances being delivered by researchers and innovators around Europe, funded on their behalf by the European Commission. This initiative has the support of EU Member States and, to date, Ministers from 23 countries have signed the Innovation Radar declaration confirming their support for this initiative.
What is Data Skipping?
IBM data skipping pertains to cloud storage. For a given dataset, it builds summary metadata for each object of the dataset. For example, if the dataset has a column which represents the temperature, the summary metadata could include the minimum and maximum temperatures of all the data records contained in the object. This summary metadata, which is significantly smaller than the data itself, can then be indexed. SQL queries which apply a predicate on the temp column (for example, queries looking for temperatures >30C) can then benefit from the index by skipping all the objects whose metadata proves that no record they contain has a temperature matching the predicate (e.g., all objects with a maximum temperature strictly less than 30C can be skipped, since they cannot contain records relevant to the query). Beyond this extremely simple example, the IBM team has extended the Data Skipping technology in numerous novel ways, such as supporting User Defined Functions and handling boolean conditions over SQL predicates. Read how you can reduce I/O and accelerate SQL performance by orders of magnitude using data skipping in this IBM blog.
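The temperature example above can be sketched in a few lines of Python. This is only an illustrative sketch of min/max skipping, not IBM's implementation: the names ObjectSummary, build_summaries and objects_to_read, as well as the toy dataset, are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ObjectSummary:
    """Min/max summary of the temp column for one stored object (hypothetical)."""
    name: str
    min_temp: float
    max_temp: float


def build_summaries(objects: Dict[str, List[float]]) -> List[ObjectSummary]:
    """Scan each object once and record the min and max of its temp values."""
    return [
        ObjectSummary(name, min(temps), max(temps))
        for name, temps in objects.items()
        if temps  # an empty object holds no records, so it never needs reading
    ]


def objects_to_read(summaries: List[ObjectSummary], threshold: float) -> List[str]:
    """Return the objects that may hold a record with temp > threshold.

    Any object whose maximum is not above the threshold is skipped,
    because its summary proves that no record in it can match.
    """
    return [s.name for s in summaries if s.max_temp > threshold]


# Toy dataset (assumed): object name -> temperatures of the records it contains.
dataset = {
    "part-0": [12.5, 18.0, 21.3],
    "part-1": [25.0, 31.2, 28.4],
    "part-2": [29.9, 22.0, 15.1],
}

summaries = build_summaries(dataset)
candidates = objects_to_read(summaries, threshold=30.0)
print(candidates)  # ['part-1']: the other two objects are never read
```

In this sketch the query for temperatures above 30C reads one object out of three. The summaries are built once and reused by every later query, which is where the I/O saving comes from.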
Identified Market Potential of BigDataStack Data Skipping during distributed reads for SQL Queries
BigDataStack Data Skipping during distributed reads for SQL Queries has been identified for the EC Innovation Radar. Based on the method described in the paper "Innovation Radar: Identifying the maturity of innovations in EU-funded research and innovation projects", the Market Maturity of the innovation has been identified as 'Tech Ready'. The Data Skipping technology has already reached the "open beta" level in the IBM SQL Cloud service, as detailed here, and has reached the General Availability (GA) level for three other IBM services:
- IBM Watson Studio: the data skipping library is pre-installed on all Spark environments in Watson Studio.
The EC identified the Market Creation Potential of the innovation as addressing the needs of existing markets.
Watch the video with Yosef Moatti (IBM Research) on Data Skipping in BigDataStack. At the time of recording, Data Skipping was in closed beta; we’re proud to say it is now in open beta.