3.1.2.1 SEBI Defaulters List Spider
Source Used
The spider draws its data from the official website of the Securities and Exchange Board of India (SEBI), specifically the section listing entities that have been categorized as defaulters.
Inputs
The SEBI Defaulters List Spider requires no inputs, since the information is publicly available. It is scheduled to run periodically so that the data stays up to date.
Mechanism
The spider is written in Python using the Scrapy framework. It is built around the specific structure of the SEBI website: it locates the defaulters list section and then follows the link to each defaulter's individual profile.
The crawl proceeds in four steps:
- Navigating to the SEBI website's homepage
- Identifying and following the correct link to reach the defaulters list
- Accessing each defaulter's individual profile
- Scraping the relevant data from these profiles
The spider uses Scrapy's AutoThrottle extension to crawl respectfully and efficiently. AutoThrottle adjusts the delay between requests automatically, based on the load on both the machine running the spider and the SEBI web server.
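As an illustration of how auto-throttling is typically switched on, the settings below use Scrapy's documented AutoThrottle setting names. The numeric values and the name `AUTOTHROTTLE_SETTINGS` are assumptions chosen for the example; the values actually used by this spider are not stated here.

```python
# Hypothetical settings dictionary; all values are illustrative assumptions.
AUTOTHROTTLE_SETTINGS = {
    # Turn the AutoThrottle extension on.
    "AUTOTHROTTLE_ENABLED": True,
    # Initial download delay, in seconds.
    "AUTOTHROTTLE_START_DELAY": 5.0,
    # Upper bound on the delay when the server responds slowly.
    "AUTOTHROTTLE_MAX_DELAY": 60.0,
    # Average number of requests to send in parallel to the server.
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 1.0,
    # AutoThrottle never sets a delay lower than DOWNLOAD_DELAY.
    "DOWNLOAD_DELAY": 1.0,
    # Honor the site's robots.txt rules.
    "ROBOTSTXT_OBEY": True,
}
```

A dictionary like this can be assigned to a spider's `custom_settings` class attribute, or the same keys can be set project-wide in `settings.py`.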
Output
The spider scrapes the following details from each defaulter's profile:
- Name of the Defaulter
- Date of Default
- Amount in Default
- Detailed Remarks about the Default
The spider collects this data and serializes it as structured JSON, which is then stored in our Data Warehouse. JSON was chosen for its versatility, readability, and compatibility with common data analysis tools and our internal systems.
Here is an example of the output format:
{
  "defaulter_name": "ABC Corporation",
  "date_of_default": "2021-05-15",
  "amount_in_default": "500000",
  "detailed_remarks": "Failed to repay the agreed amount."
}
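A record like the one above could be built and serialized with the Python standard library, for example as follows. This is a sketch only: the `DefaulterRecord` class and the `to_json` helper are hypothetical names, and the ISO 8601 date check is an assumption inferred from the sample value rather than a documented rule.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class DefaulterRecord:
    """One scraped profile, mirroring the four output fields."""

    defaulter_name: str
    date_of_default: str    # ISO 8601 date string, as in the sample
    amount_in_default: str  # kept as a string, as in the sample
    detailed_remarks: str

    def __post_init__(self):
        # Assumption: dates are ISO 8601. Raises ValueError otherwise.
        date.fromisoformat(self.date_of_default)


def to_json(record: DefaulterRecord) -> str:
    """Serialize a record to a JSON object with the documented keys."""
    return json.dumps(asdict(record), ensure_ascii=False)


record = DefaulterRecord(
    defaulter_name="ABC Corporation",
    date_of_default="2021-05-15",
    amount_in_default="500000",
    detailed_remarks="Failed to repay the agreed amount.",
)
print(to_json(record))
```

Validating at construction time means a malformed date fails at scrape time instead of surfacing later in the Data Warehouse.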