Public Data Scrapers
Public Data Scrapers are an integral part of our Data Aggregation and Integration System, built to collect and deliver the wide range of public data our platform needs for due diligence and Third-Party Risk Management (TPRM). These tools give us insight into many aspects of an entity under scrutiny, from litigation and regulatory compliance to watchlists, sanctions, and more.
Our scrapers cover a diverse range of sources, including court websites, regulator databases, watchlists, media outlets, and corporate registries, both domestically and globally.
Categories of Data Scraped
Our Public Data Scrapers cover the following primary categories of data:
Litigation Data
We employ Python scripts and spiders tailored to extract litigation data from sources such as the e-courts mobile API, multiple high court websites, and several tribunal websites. These tools capture comprehensive litigation data associated with a target entity, including past and ongoing cases.
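Once raw case data is fetched, it has to be normalised into a consistent shape before it can be attached to an entity's profile. The sketch below is illustrative only: the field names (`cnr_number`, `case_title`, and so on) are hypothetical and will differ from the real e-courts API response.

```python
# Illustrative sketch of normalising court-API case records.
# The payload shape and field names here are hypothetical; the real
# e-courts API response will differ and needs source-specific mapping.

def parse_case_records(payload: dict) -> list[dict]:
    """Flatten a hypothetical court-API response into normalised case rows."""
    cases = []
    for raw in payload.get("cases", []):
        cases.append({
            "cnr": raw.get("cnr_number", "").strip(),     # case reference number
            "title": raw.get("case_title", "").strip(),
            "status": raw.get("status", "UNKNOWN").upper(),
            "filed_on": raw.get("filing_date"),           # None if absent
        })
    return cases


sample = {"cases": [{"cnr_number": " ABC123 ",
                     "case_title": "X v. Y",
                     "status": "pending"}]}
rows = parse_case_records(sample)
# rows[0] -> {"cnr": "ABC123", "title": "X v. Y", "status": "PENDING", "filed_on": None}
```

Keeping the normalisation step separate from the fetch step makes each court-specific spider small: it only has to map its source's fields into this common row format.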
Regulatory Data
Our platform leverages Python scripts and spiders to access data from various regulatory bodies, including defaults and compliance information from sources such as CIBIL, EPF, GST, RBI, and SEBI. Our spiders also crawl Indian regulator watchlists to fetch data on debarred entities, defaulters, and more.
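Many regulator watchlists publish their debarred-entity lists as plain HTML tables. As a minimal sketch of how such a page can be turned into rows, the stdlib `html.parser` is enough; real pages need site-specific handling (pagination, nested markup, session handling), which is where Scrapy spiders come in.

```python
from html.parser import HTMLParser


class WatchlistTableParser(HTMLParser):
    """Collect <td> cell text from an HTML table into rows.

    A minimal sketch for a hypothetical regulator 'debarred entities'
    page; production spiders handle pagination and messier markup.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # completed data rows
        self._row = None      # cells of the row being built
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


html = ("<table><tr><th>Name</th><th>Year</th></tr>"
        "<tr><td>Acme Ltd</td><td>2021</td></tr></table>")
parser = WatchlistTableParser()
parser.feed(html)
# parser.rows -> [["Acme Ltd", "2021"]]  (header row with <th> is skipped)
```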
Criminal Watchlists, PEP, and Sanctions Data
Understanding the potential risks associated with politically exposed persons (PEPs) and sanctions is a critical part of due diligence. For this purpose, our system includes Python scripts and spiders designed to fetch data from various global watchlists, sanctions lists, and PEP databases. This helps identify any entity or individual that could pose a higher risk due to political affiliation or sanctions imposed by a regulatory body.
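Once watchlist entries are collected, the core operation is screening a target name against them, tolerating spelling variants. The sketch below uses stdlib `difflib` similarity as a stand-in; production screening typically relies on dedicated matching (phonetic and transliteration aware), so treat the threshold and scoring here as illustrative.

```python
from difflib import SequenceMatcher


def screen_name(candidate: str, watchlist: list[str],
                threshold: float = 0.85) -> list[tuple[str, float]]:
    """Return watchlist entries whose similarity to `candidate` meets the threshold.

    Illustrative sketch: plain ratio matching, case-insensitive.
    """
    normalised = candidate.casefold().strip()
    hits = []
    for entry in watchlist:
        score = SequenceMatcher(None, normalised, entry.casefold().strip()).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 3)))
    # Strongest matches first, for reviewer triage.
    return sorted(hits, key=lambda h: h[1], reverse=True)


hits = screen_name("Jon Smith", ["John Smith", "Alice Doe"])
# hits -> [("John Smith", 0.947)]
```

Returning scored candidates rather than a yes/no answer matters in TPRM workflows: borderline matches go to a human reviewer instead of being silently dropped.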
Media Data
Media mentions can reveal a great deal about an entity's public perception, potential risks, and opportunities. Our platform's Python scripts gather media coverage from sources such as Google News, providing valuable insight into how the target entity is portrayed in the press.
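Google News, like most news aggregators, exposes results in standard RSS 2.0, which the stdlib can parse without extra dependencies. The sketch below extracts the usual `title`/`link`/`pubDate` fields from a feed string; the sample feed content is made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml: str) -> list[dict]:
    """Extract title, link, and publication date from an RSS 2.0 feed string."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


# Hypothetical feed fragment, for illustration only.
feed = ("<rss><channel><item>"
        "<title>Acme fined by regulator</title>"
        "<link>http://example.com/a</link>"
        "<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>"
        "</item></channel></rss>")
items = parse_rss_items(feed)
# items[0]["title"] -> "Acme fined by regulator"
```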
Target Overview
Target overview data provides a basic understanding of the entity in question. Our Python scripts and spiders collect data from various databases and websites such as MCA, DataGov, Zaubacorp, and credit rating agencies. This data helps form a comprehensive profile of the entity, including company master data, directors' details, credit ratings, and more.
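Because the same entity appears in several sources (registry, data portals, rating agencies), the per-source records have to be merged into one profile. The sketch below applies one simple precedence rule, assumed for illustration: later sources fill gaps but never overwrite earlier non-empty values. The field names are hypothetical; real pipelines also track per-field provenance.

```python
def build_profile(*sources: dict) -> dict:
    """Merge per-source records into one entity profile.

    Precedence rule (illustrative): sources are passed most-trusted
    first; a later source can only fill fields still missing or empty.
    """
    profile: dict = {}
    for record in sources:
        for key, value in record.items():
            if value in (None, "", []):
                continue  # skip empty values so a later source can fill them
            profile.setdefault(key, value)
    return profile


# Hypothetical records from two sources.
registry_record = {"cin": "U12345", "name": "Acme Pvt Ltd", "rating": ""}
rating_record = {"rating": "AA-", "name": "ACME PRIVATE LIMITED"}
profile = build_profile(registry_record, rating_record)
# profile -> {"cin": "U12345", "name": "Acme Pvt Ltd", "rating": "AA-"}
```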
Summary
Public Data Scrapers, employing technologies like Python, Scrapy, and Selenium, provide the crucial data needed to assess the potential risks associated with an entity. Their ability to navigate complex web structures and handle common scraping obstacles ensures that our SaaS platform can offer a complete and detailed due diligence report. By continually extracting and updating data from a diverse array of sources, they form the beating heart of our Data Aggregation and Integration System.
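Two of the most common obstacles are transient failures and user-agent filtering. As a minimal sketch of the kind of resilience layer the scrapers rely on, the helper below retries a fetch callable with exponential backoff and rotates user-agent strings; the agent strings and parameters are illustrative, and frameworks like Scrapy provide richer built-in retry middleware.

```python
import itertools
import time

# Illustrative user-agent strings; real deployments maintain a larger pool.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch(url, headers=...)`, retrying with exponential backoff.

    `fetch` is any callable that raises on failure (e.g. a thin wrapper
    around an HTTP client); injecting `sleep` keeps the helper testable.
    """
    ua_cycle = itertools.cycle(USER_AGENTS)
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url, headers={"User-Agent": next(ua_cycle)})
        except Exception as exc:
            last_error = exc
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise last_error
```

Separating the retry policy from the fetch logic lets every spider share one hardened code path instead of re-implementing backoff per source.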