Automation
Logs
- Data logs: data_generation.log
- Output generation logs: output_file_generation.log
Scripts
Udyam.py
- Takes Udyam numbers as input.
- Crawls live data and stores fetched HTML in: msme/udyam/html/
- Parses HTML for Udyam details.
- Details are categorized into Detailed and Normal Detailed Checks.
- Stores extracted details in: msme/udyam/details/
- Compiles output object and stores it in: msme/udyam/output/
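The flow above can be sketched as follows. This is a minimal illustration, not the actual Udyam.py: the function name, the injected `fetch_html`, `parse_details`, and `store` callables, the `.html`/`.json` extensions, and the shape of the output object are all assumptions. Only the three storage prefixes come from this document.

```python
from typing import Callable, Dict

# Storage prefixes as listed in this document.
HTML_PREFIX = "msme/udyam/html/"
DETAILS_PREFIX = "msme/udyam/details/"
OUTPUT_PREFIX = "msme/udyam/output/"


def process_udyam_number(
    udyam_number: str,
    fetch_html: Callable[[str], str],      # hypothetical crawler
    parse_details: Callable[[str], Dict],  # hypothetical HTML parser
    store: Callable[[str, object], None],  # hypothetical storage writer
) -> Dict:
    """Crawl -> store HTML -> parse -> store details -> store output."""
    html = fetch_html(udyam_number)
    store(f"{HTML_PREFIX}{udyam_number}.html", html)  # extension is assumed

    details = parse_details(html)
    store(f"{DETAILS_PREFIX}{udyam_number}.json", details)  # extension is assumed

    # The real output object's fields are not documented; this shape is illustrative.
    output = {"udyam_number": udyam_number, "details": details}
    store(f"{OUTPUT_PREFIX}{udyam_number}", output)
    return output
```

Passing the crawler, parser, and writer in as callables keeps the sketch testable without network or S3 access.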
Udyog.py
- Follows the same process as Udyam.py, but for Udyog numbers.
Lambda Functions
gst-function-for-gst-automation
- Accepts PAN or GSTIN.
- Crawls GST site for details.
- Used for GST Normal & Detailed Checks.
- Output stored in: gst/gst-normal-check/
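Because this Lambda accepts either a PAN or a GSTIN, a caller may want to tell the two apart before invoking it. The sketch below does this by format only, using the commonly published layouts (a 10-character PAN and a 15-character GSTIN that embeds the PAN). The function name is hypothetical, and how the Lambda itself validates its input is not documented here.

```python
import re

# PAN: 5 letters, 4 digits, 1 letter (10 characters).
PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")
# GSTIN: 2-digit state code + PAN + entity code + 'Z' + check character (15 characters).
GSTIN_RE = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")


def classify_identifier(value: str) -> str:
    """Return 'PAN', 'GSTIN', or 'UNKNOWN' based on format only."""
    value = value.strip().upper()
    if PAN_RE.fullmatch(value):
        return "PAN"
    if GSTIN_RE.fullmatch(value):
        return "GSTIN"
    return "UNKNOWN"
```

This checks shape only; it does not verify the GSTIN check character or confirm that the identifier exists.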
gst-function-for-micro-checks
- Accepts PAN.
- Fetches legal name, trade name, and address.
- Used in Struck Off Detailed Checks.
gst-function-msme-urn-verification
- Accepts PAN.
- Fetches GST details and stores them at: msme/pan-to-gst-data/
msme-pan-verification
- Accepts PAN.
- Checks registration status via Udyam.
- Stores results at: msme/msme-verification/{PAN}-{date}.json
Common Script
Used across MSME, GST, and Struck-Off automation processes.
File Paths
Input (S3 xlsx):
- MSME: micro-checks/msme/input-files/
- GST: micro-checks/gst/input-files/
- Struck Off: micro-checks/struck_of/input-files/
Output (S3 xlsx):
- MSME: micro-checks/msme/output-files/
- GST: micro-checks/gst/output-files/
- Struck Off: micro-checks/struck_of/output-files/
Fuzzy (S3 xlsx):
- MSME: micro-checks/msme/fuzzy-files/
- GST: micro-checks/gst/fuzzy-files/
- Struck Off: micro-checks/struck_of/fuzzy-files/
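The nine prefixes above follow one pattern, `micro-checks/{check}/{kind}/`, which a small helper can capture. This is a sketch only; the enum and function names are hypothetical, and the `struck_of` spelling is kept exactly as it appears in the paths.

```python
from enum import Enum


class CheckType(Enum):
    MSME = "msme"
    GST = "gst"
    STRUCK_OFF = "struck_of"  # spelled as in the documented paths


class FileKind(Enum):
    INPUT = "input-files"
    OUTPUT = "output-files"
    FUZZY = "fuzzy-files"


def s3_prefix(check: CheckType, kind: FileKind) -> str:
    """Build the S3 key prefix for a given check type and file kind."""
    return f"micro-checks/{check.value}/{kind.value}/"
```

For example, `s3_prefix(CheckType.GST, FileKind.OUTPUT)` yields `micro-checks/gst/output-files/`.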
Cron Tabs
Data Cron Tab
- Handles PENDING orders.
- Retrieves input from: type/sub_type/order_id.json
- Uses Lambda, DB, and live crawling.
- Stores output as JSON in S3.
- If all entities are processed → Status: DATA_FETCHED
- If not → Input updated, status remains/reverts to PENDING
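The status decision at the end of a data run can be sketched as below. The function name and data shapes are hypothetical, and reading "Input updated" as "the input is reduced to the entities still missing data" is an assumption, not something this document states.

```python
from typing import Dict, List, Tuple


def next_status_after_data_run(
    entities: List[str],
    fetched: Dict[str, dict],
) -> Tuple[str, List[str]]:
    """Return (next status, entities still to process) for one order."""
    # An entity counts as processed once it has an entry in `fetched`.
    remaining = [e for e in entities if e not in fetched]
    if not remaining:
        return "DATA_FETCHED", []
    # Assumed meaning of "Input updated": keep only the unprocessed entities.
    return "PENDING", remaining
```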
Output File Cron Tab
- Handles DATA_FETCHED orders.
- Verifies data completeness.
- If complete → Output file created, status: COMPLETED, client emailed.
- If incomplete → Status: PENDING
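A minimal sketch of this step, assuming "complete" means every expected entity has a data entry. The function name and the `create_output_file` and `notify_client` callables are hypothetical stand-ins for the real file generation and email code.

```python
from typing import Callable, Dict, List


def run_output_step(
    expected_entities: List[str],
    data: Dict[str, dict],
    create_output_file: Callable[[Dict[str, dict]], str],  # hypothetical
    notify_client: Callable[[str], None],                  # hypothetical
) -> str:
    """Return the order's next status after the output-file run."""
    missing = [e for e in expected_entities if e not in data]
    if missing:
        # Incomplete data: hand the order back to the data cron tab.
        return "PENDING"
    output_key = create_output_file(data)
    notify_client(output_key)
    return "COMPLETED"
```

Nothing is written or emailed on the incomplete branch, so an order can cycle between the two cron tabs without producing a partial output file.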
crontab.sh
- Monitors and runs PENDING status jobs.
- Ensures cron tabs are triggered as needed.
MSME
NORMAL_CHECKS
- Input: PAN
- Output: Registration Status (Udyam Registered, Udyog Registered, Not Registered)
- Lambda: msme-pan-verification
- Output path: msme/msme-verification/{PAN}-{date}.json
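As an illustration of the result this check produces, the sketch below models the three registration statuses and builds the output key. The JSON field names and the ISO format used for `{date}` are assumptions; the document specifies only the status labels and the path template.

```python
import json
from datetime import date
from enum import Enum
from typing import Tuple


class RegistrationStatus(Enum):
    UDYAM_REGISTERED = "Udyam Registered"
    UDYOG_REGISTERED = "Udyog Registered"
    NOT_REGISTERED = "Not Registered"


def build_result(pan: str, status: RegistrationStatus, on: date) -> Tuple[str, str]:
    """Return (storage key, JSON body) for one NORMAL_CHECKS result."""
    # The {date} format is not documented; ISO 8601 (YYYY-MM-DD) is assumed here.
    key = f"msme/msme-verification/{pan}-{on.isoformat()}.json"
    # Field names are illustrative, not the Lambda's actual schema.
    body = json.dumps({"pan": pan, "registration_status": status.value})
    return key, body
```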
DETAILED_CHECKS
- Input: Udyam/Udyog Numbers
- Process: Live crawling using Udyam.py & Udyog.py
- Output: Udyam/Udyog details
- Output Path:
  - Udyog: msme/udyog/output/{number}
  - Udyam: msme/udyam/output/{number}
CERTIFICATE_CHECKS
- HTML files parsed from: msme/udyam/html/
- Converted to PDF
- Stored in: msme/certificates/udyam-certificates/{filename}.pdf
NORMAL_DETAILED_CHECKS
- Input: PAN
- Output: GST details
- GST details used for matching with local DB
- Matching based on Registration Status (Udyam/Udyog)
- Uses in-house Udyam/Udyog name-matching APIs
- Output: msme/pan-to-urn-identification/{PAN}-{date}.json
- Crawled Data Output: msme/normal-detailed/{PAN}-{date}.json
NORMAL_DETAILED_CERTIFICATE_CHECKS
- Input: PAN
- Similar to detailed checks
- HTML → PDF
- Output path: msme/certificates/udyam-certificates/{filename}.pdf
GST
NORMAL_CHECKS
- Input: PAN/GSTIN
- Output: GST details
- Lambda: gst_function_for_gst_automation
- Logs: gst_normal_check.log
DETAILED_CHECKS
- Same as NORMAL_CHECKS, but also includes GST filings
- Logs: gst_normal_check.log
Struck Off
DETAILED_CHECKS
- Input: PAN/CIN
- If PAN given:
  - Uses gst-function-for-micro-checks
  - Legal name/trade name stored: micro-checks/struck_of/cin/
  - Cleaned names → in-house company_suggestions API
  - CINs stored at: micro-checks/struck_of/company-suggestions/
  - Fuzzy match performed (score > 95 passes)
- If CIN given:
  - Uses company_full_details API
  - Output: micro-checks/struck_of/data/
- Logs: automation/struck_off_checks/struck_data_generation.log
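The fuzzy-match step in the PAN branch can be illustrated as follows. The document does not say which matching library or scoring scale is used, so this sketch substitutes Python's standard `difflib` and assumes a 0-100 scale. The function names and the name normalization are hypothetical; only the "score > 95 passes" rule comes from the text above.

```python
from difflib import SequenceMatcher
from typing import List

THRESHOLD = 95  # from this document: score > 95 passes


def fuzzy_score(a: str, b: str) -> float:
    """Similarity on a 0-100 scale (difflib ratio, an assumed stand-in)."""
    # Assumed cleaning: lowercase and collapse runs of whitespace.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return 100.0 * SequenceMatcher(None, norm_a, norm_b).ratio()


def passing_candidates(name: str, candidates: List[str]) -> List[str]:
    """Keep only the suggested company names whose score exceeds the threshold."""
    return [c for c in candidates if fuzzy_score(name, c) > THRESHOLD]
```

The comparison is strict, so a score of exactly 95 does not pass, matching the "score > 95" wording.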