3.1.2.1 SEBI Defaulters List Spider

Source Used

The spider's source is the official website of the Securities and Exchange Board of India (SEBI), specifically the section that lists entities SEBI has categorized as defaulters.

Inputs

The SEBI Defaulters List Spider takes no input parameters because the defaulters list is publicly available. It runs on a periodic schedule to keep the stored data up to date.

Mechanism

The spider is written in Python with the Scrapy library. It navigates the SEBI website's structure to reach the defaulters list section, then follows the link to each defaulter's individual profile.

The crawl proceeds in four steps:

Navigating to the SEBI website's homepage
Identifying and following the correct link to reach the defaulters list
Accessing each defaulter's individual profile
Scraping the relevant data from these profiles

The spider uses Scrapy's AutoThrottle extension to crawl politely and efficiently: the extension automatically adjusts the download delay based on the load on both the Scrapy process and the SEBI web server.
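
As an illustration, throttling of this kind is usually switched on through Scrapy settings. The setting names below are Scrapy's own, but every value is an assumption chosen for the example, not the configuration this spider actually runs with. The dictionary name `AUTOTHROTTLE_SETTINGS` is also hypothetical.

```python
# Candidate values for a spider's `custom_settings` attribute or the project's
# settings.py. All values are illustrative assumptions.
AUTOTHROTTLE_SETTINGS = {
    "AUTOTHROTTLE_ENABLED": True,
    "AUTOTHROTTLE_START_DELAY": 5.0,         # initial download delay, in seconds
    "AUTOTHROTTLE_MAX_DELAY": 60.0,          # upper bound on the delay under high latency
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 1.0,  # average parallel requests per remote site
    "AUTOTHROTTLE_DEBUG": False,             # True logs throttling stats for every response
    "DOWNLOAD_DELAY": 1.0,                   # AutoThrottle never drops below this delay
    "ROBOTSTXT_OBEY": True,                  # honor the site's robots.txt rules
}
```

A target concurrency of 1.0 keeps the crawl close to one request in flight at a time, which is a conservative choice for a small public list.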

Output

The spider extracts the following fields from each defaulter's profile:

Name of the Defaulter
Date of Default
Amount in Default
Detailed Remarks about the Default

The spider formats this data as structured JSON, which is then stored in our Data Warehouse. JSON was chosen for its versatility, readability, and compatibility with various data analysis tools and our internal systems.

Here is an example of the output format:

{
    "defaulter_name": "ABC Corporation",
    "date_of_default": "2021-05-15",
    "amount_in_default": "500000",
    "detailed_remarks": "Failed to repay the agreed amount."
}
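
A record of this shape could be checked before it is written to the warehouse. The sketch below is one possible approach using only the standard library. The `DefaulterRecord` class and `to_json` helper are hypothetical names, and the validation rules (non-empty name, ISO 8601 date) are assumptions inferred from the example above rather than documented requirements.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class DefaulterRecord:
    defaulter_name: str
    date_of_default: str    # ISO 8601 date, e.g. "2021-05-15"
    amount_in_default: str  # kept as a string, as in the example output
    detailed_remarks: str

    def __post_init__(self):
        # Reject records with a blank name or a malformed date.
        if not self.defaulter_name.strip():
            raise ValueError("defaulter_name must not be empty")
        date.fromisoformat(self.date_of_default)  # raises ValueError if invalid


def to_json(record):
    """Serialize a record to a JSON object with the documented field names."""
    return json.dumps(asdict(record), ensure_ascii=False)


record = DefaulterRecord(
    defaulter_name="ABC Corporation",
    date_of_default="2021-05-15",
    amount_in_default="500000",
    detailed_remarks="Failed to repay the agreed amount.",
)
print(to_json(record))
```

Validating at construction time means a malformed profile fails early instead of producing an inconsistent row downstream.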
