Data Aggregation and Integration System
The Data Aggregation and Integration System is a fundamental component of our SaaS platform, designed with a single objective: to ensure a steady flow of accurate, up-to-date, and relevant data that drives the due diligence and Third-Party Risk Management (TPRM) processes. Leveraging an array of technologies and frameworks, the system aggregates and integrates data from public sources, third-party providers, and offline copies of databases to provide a comprehensive dataset for conducting due diligence.
Components
The Data Aggregation and Integration System is made up of the following primary components:
Public Data Scrapers
Built with technologies such as Scrapy, Selenium, and Python requests, our public data scrapers crawl and scrape the public data sources required for due diligence. They are programmed to navigate complex web structures, handle common scraping obstacles such as dynamic content, and extract data efficiently while respecting the rules of each data source.
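As a minimal illustration of the extraction step, the sketch below parses a fetched HTML page with the standard library's html.parser (the real spiders use Scrapy/Selenium; the class name and the "company-name" CSS class are hypothetical):

```python
from html.parser import HTMLParser


class CompanyNameExtractor(HTMLParser):
    """Collects the text of every <h2 class="company-name"> element."""

    def __init__(self):
        super().__init__()
        self._in_target = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "company-name") in attrs:
            self._in_target = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_target = False

    def handle_data(self, data):
        if self._in_target and data.strip():
            self.names.append(data.strip())


def extract_company_names(html: str) -> list[str]:
    """Return the company names found in a scraped page."""
    parser = CompanyNameExtractor()
    parser.feed(html)
    return parser.names
```

In production this logic would live in a Scrapy spider callback, with the selector expressed as a CSS or XPath query.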
Data Warehouse and ETL Pipelines
Our Data Warehouse is constructed using PostgreSQL, providing a robust and scalable solution for storing vast amounts of data. Paired with SQLAlchemy, it facilitates complex queries and data analysis required for our due diligence processes.
The Extract-Transform-Load (ETL) pipelines, built in Python, work in harmony with the Data Warehouse. They are responsible for processing data collected by the scrapers, turning raw data into structured, useful information ready for analysis. This includes tasks such as data cleansing, normalization, aggregation, and transformation.
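The transform stage can be sketched as a pure function over one scraped record; the field names and normalization rules below are illustrative, not the actual pipeline schema:

```python
from datetime import datetime


def transform_record(raw: dict) -> dict:
    """Cleanse and normalize one raw scraped record (hypothetical schema)."""
    return {
        # Collapse internal whitespace and apply consistent casing.
        "name": " ".join(raw.get("name", "").split()).title(),
        # Normalize country codes to uppercase; empty values become None.
        "country": raw.get("country", "").strip().upper() or None,
        # Parse the scrape timestamp into a proper datetime.
        "retrieved_at": datetime.fromisoformat(raw["retrieved_at"]),
    }
```

Keeping each transform a small, side-effect-free function makes the pipeline steps easy to unit-test and compose.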
The ETL pipelines also enable periodic updates of our offline database copies, ensuring the freshness and accuracy of data in the warehouse.
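One way to picture such a periodic update is a keyed upsert: merge a freshly downloaded snapshot into the stored copy, inserting new rows and overwriting changed ones. This is a simplified in-memory sketch, not the warehouse's actual refresh mechanism:

```python
def refresh_snapshot(current: dict, fresh_rows: list[dict], key: str = "id") -> dict:
    """Merge a fresh snapshot into the stored copy (keyed upsert).

    Returns a new mapping; the original store is left untouched.
    """
    updated = dict(current)
    for row in fresh_rows:
        updated[row[key]] = row  # insert new rows, overwrite stale ones
    return updated
```

In the warehouse itself the same idea is expressed as an SQL upsert (e.g., PostgreSQL's INSERT ... ON CONFLICT).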
Data Services
Our Data Services layer acts as the primary interface for other systems to access the information housed in our Data Warehouse. It also manages the integration with third-party data providers. The services are developed using Nameko, a Python microservices framework, offering the benefits of loose coupling and high cohesion.
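A sketch of what one such service might look like. In the real system the public methods would be exposed over RPC with Nameko's @rpc decorator; a plain class and an in-memory repository are used here so the example stays self-contained, and all names are hypothetical:

```python
class InMemoryRepository:
    """Stand-in for a thin data-access wrapper over the warehouse."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def get(self, entity_id: int):
        return next((r for r in self._rows if r["id"] == entity_id), None)

    def all(self) -> list[dict]:
        return list(self._rows)


class DataService:
    """Hypothetical data service; Nameko would expose these methods as RPC."""

    name = "data_service"  # Nameko uses this as the service name

    def __init__(self, repository):
        self._repository = repository

    def get_entity(self, entity_id: int):
        return self._repository.get(entity_id)

    def search_entities(self, term: str) -> list[dict]:
        term = term.lower()
        return [e for e in self._repository.all() if term in e["name"].lower()]


service = DataService(InMemoryRepository([
    {"id": 1, "name": "Acme Corp"},
    {"id": 2, "name": "Globex"},
]))
```

Because the service depends only on a repository interface, the warehouse backend or a third-party provider can be swapped in without changing callers, which is the loose coupling the microservices design aims for.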
Summary
The Data Aggregation and Integration System is the backbone of our platform's data infrastructure. It guarantees the consistent availability of high-quality data that forms the foundation of the detailed, actionable insights our platform provides to clients. With this system, we ensure transparency, accuracy, and robustness in our due diligence and TPRM processes.