Data Integration and Management Component (DIM)

Goal

The DIM component is one of the core components of the S2CP platform. It handles the data-related aspects of the platform through a collection of microservices exposed via APIs. As part of DIM, S2CP will provide the following services for use by the different CRFS actors.

1.) Integration and analysis service: The integration and analysis service is responsible for acquiring data from different sources and converting it to the CITIES2030 data model format. Because the incoming data will follow different data models, the integration service will create a central database view that places data from all the sources in a cloud data warehouse.
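As a rough illustration of the mapping step described above, the sketch below renames source-specific fields to a shared schema and converts units where needed. The field names, source names and target schema are all hypothetical, since the CITIES2030 data model is not specified in this section, and plain Python dictionaries stand in for the actual database view.

```python
from typing import Any, Callable, Dict, List

# Hypothetical mapping from each source's field names to a common schema.
# The actual CITIES2030 data model is not specified in this section.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "retailer_a": {"item": "product", "weight_kg": "quantity_kg", "ts": "timestamp"},
    "market_b": {"produce_name": "product", "qty_g": "quantity_kg", "recorded_at": "timestamp"},
}

# Optional per-source unit conversions (assumed: market_b reports grams).
CONVERTERS: Dict[str, Dict[str, Callable[[Any], Any]]] = {
    "market_b": {"quantity_kg": lambda grams: grams / 1000.0},
}


def to_common_record(source: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rename one raw record's fields to the common schema and convert units."""
    record: Dict[str, Any] = {"source": source}
    for src_field, common_field in FIELD_MAPS[source].items():
        value = raw[src_field]
        convert = CONVERTERS.get(source, {}).get(common_field)
        record[common_field] = convert(value) if convert else value
    return record


def integrate(batches: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Flatten per-source batches into one list of common-schema records."""
    return [
        to_common_record(source, raw)
        for source, rows in batches.items()
        for raw in rows
    ]
```

Keeping the mappings as data rather than code means a new source can be added by extending `FIELD_MAPS`, without touching the integration logic.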

The collated data then passes through two modular components to meet the underlying objective and provide insights and business intelligence to the different actors:

     Data processing: Data processing consists of the transformations required on the data before it is passed down the pipeline to the analytical models. The methodology includes approaches such as data clustering, featurization and feature engineering.

     Data analysis: The data analysis pipeline uses approaches such as federated and ensemble learning to deal with the heterogeneity of the data and to address the different objectives of all the actors involved.
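To make the two stages above more concrete, here is a minimal sketch of one featurization step (z-score standardization) followed by a simple averaging ensemble. Both function names are illustrative and the models are arbitrary callables. The text does not say which transformations or ensemble method the platform uses, so this is one possible instance rather than the actual pipeline. In practice the listed libraries (for example scikit-learn) would likely provide these steps.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def standardize(column: Sequence[float]) -> List[float]:
    """Z-score a numeric feature column; a constant column maps to zeros."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def ensemble_predict(
    models: Sequence[Callable[[Sequence[float]], float]],
    features: Sequence[float],
) -> float:
    """Average the predictions of several models for one feature vector."""
    return mean(model(features) for model in models)
```

Averaging is only the simplest way to combine models; weighted voting or stacking would follow the same interface.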

2.) Data governance service: The data governance service is responsible for the data security and privacy aspects of the platform. The service includes the following features:

     Data discovery: this component makes the data formats and models of the various data sources discoverable by the components that require them.

     Data value exchange negotiation: this component handles data exchange and data interoperability for the underlying objective, ensuring that data is exchanged in the specified format.

     Multi-party incentivized data sharing: this component incentivizes data exchange between different actors working towards a common objective, and demonstrates the positive impact of data sharing through the data-rich insights generated by the analytical models.

     Integration with smart contracts: this component connects to the underlying blockchain platform and provides smart contract functionality for the data exchange.
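The data discovery and format-checking features above could be sketched as a small in-memory catalog. The class and function names (`DatasetEntry`, `DataCatalog`, `conforms`) are hypothetical and not part of the platform's API. Incentives and smart contracts are left out because the text does not describe their mechanics.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata describing one data source (all attributes are assumptions)."""
    name: str
    owner: str
    data_format: str            # e.g. "json" or "csv"
    fields: Tuple[str, ...]     # field names exposed by the source


class DataCatalog:
    """In-memory registry that makes registered data sources discoverable."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def discover(
        self,
        required_fields: Iterable[str],
        data_format: Optional[str] = None,
    ) -> List[DatasetEntry]:
        """Return entries exposing all required fields, optionally filtered by format."""
        required = set(required_fields)
        return [
            entry
            for entry in self._entries.values()
            if required <= set(entry.fields)
            and (data_format is None or entry.data_format == data_format)
        ]


def conforms(payload: Dict[str, Any], agreed_fields: Iterable[str]) -> bool:
    """Check that an exchanged record carries exactly the agreed fields."""
    return set(payload) == set(agreed_fields)
```

A real deployment would back the catalog with one of the listed databases instead of a dictionary.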

 

A modular representation of the DIM component is provided in Figure 1.

[Image: DM S2CP architecture]
Figure 1 - Modular representation of Data Integration and Management Component in S2CP Platform

Tools and Technologies

· Programming: Python (scripting and development), scikit-learn, NumPy, SciPy, Matplotlib, pandas

· Database and APIs: MySQL, CouchDB, Flask, JSON, WSGI server (e.g., Gunicorn)

Use-case example: Reducing Food Wastage – Sustainable and Efficient Food Supply Chain

This use case relies on a small AI-enabled camera that can perform real-time calculations at the edge. An AI model can be trained to estimate the volume of produce in a display bin, so that the system can determine in real time how much produce is available. If a shopper takes some produce, the estimate is automatically updated to reflect that there is now less produce in the bin; if the produce is restocked (or the shopper puts the selected item back), the estimate is updated again. Such a system can provide two primary outputs:

  1. Alerts for store personnel: staff can be alerted when the produce in a display bin is depleted or out of stock, so that they can restock it.
  2. Supply chain analytics: by taking time-series snapshots of the produce, store personnel can gain deep analytical insights into hours to sell, days to sell, replenishment rates, restocking rates and purchasing rates. These insights can be used to tune the supply chain on behalf of the retailer.
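The two outputs above can be illustrated with a short sketch that works on a series of bin snapshots. It assumes each snapshot is a `(time, fill fraction)` pair produced by the camera model. The function names and the 0.2 alert threshold are arbitrary choices for illustration, not values from the platform.

```python
from datetime import datetime
from typing import List, Tuple

# (time of snapshot, estimated fill fraction of the bin between 0 and 1)
Snapshot = Tuple[datetime, float]


def needs_restock(fill_fraction: float, threshold: float = 0.2) -> bool:
    """Output 1: flag a bin whose estimated fill level is at or below the threshold."""
    return fill_fraction <= threshold


def restock_count(snapshots: List[Snapshot]) -> int:
    """Count level increases between consecutive snapshots.

    An increase may also be a shopper returning an item, so this is an upper bound.
    """
    return sum(
        1
        for (_, prev), (_, curr) in zip(snapshots, snapshots[1:])
        if curr > prev
    )


def sold_per_hour(snapshots: List[Snapshot]) -> float:
    """Output 2: total decrease in fill fraction divided by elapsed hours."""
    if len(snapshots) < 2:
        return 0.0
    sold = sum(
        prev - curr
        for (_, prev), (_, curr) in zip(snapshots, snapshots[1:])
        if curr < prev
    )
    hours = (snapshots[-1][0] - snapshots[0][0]).total_seconds() / 3600.0
    return sold / hours if hours > 0 else 0.0
```

Metrics such as hours to sell or days to sell could be derived in the same way from the times between a restock and the next alert.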

Such systems, working in parallel at different geographical locations in a city (and extendable to the national and international level), can update and exchange model parameters via federated learning. This can further tune the supply chain and make it more efficient at both regional and global level, as part of the decision support system provided.
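One common way to combine parameters from several sites is a sample-weighted average of each site's model weights, in the style of federated averaging. The text does not state which aggregation scheme the platform uses, so the sketch below is an assumption. The function name and the flat list representation of the weights are also illustrative.

```python
from typing import List, Sequence, Tuple


def federated_average(
    client_updates: Sequence[Tuple[Sequence[float], int]],
) -> List[float]:
    """Combine per-site model weights into one global weight vector.

    Each update is (weights, num_samples); sites with more local samples
    contribute proportionally more. Raw data never leaves a site, only
    the weights are shared.
    """
    if not client_updates:
        raise ValueError("at least one client update is required")
    total_samples = sum(n for _, n in client_updates)
    if total_samples <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, num_samples in client_updates:
        if len(weights) != dim:
            raise ValueError("all weight vectors must have the same length")
        for i, w in enumerate(weights):
            averaged[i] += w * num_samples / total_samples
    return averaged
```

For example, a store with 300 local samples would influence the global model three times as much as a store with 100.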