Automation

Logs

  • Data logs: data_generation.log
  • Output generation logs: output_file_generation.log

Scripts

Udyam.py

  • Takes Udyam numbers as input.
  • Crawls live data and stores fetched HTML in: msme/udyam/html/
  • Parses HTML for Udyam details.
  • Details are categorized for Detailed Checks and Normal Detailed Checks.
  • Stores extracted details in: msme/udyam/details/
  • Compiles output object and stores it in: msme/udyam/output/
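
The Udyam.py stages can be sketched as below. This is a minimal illustration, not the actual script: a plain dict stands in for S3, `fetch_html` stands in for the live crawler, the `.html`/`.json` key suffixes are assumptions, and the two-column label/value table that `parse_details` expects is a hypothetical page layout.

```python
import json
from html.parser import HTMLParser
from typing import Callable, Dict, List

# Prefixes from the Udyam.py description; a dict stands in for S3 in this sketch.
HTML_PREFIX = "msme/udyam/html/"
DETAILS_PREFIX = "msme/udyam/details/"
OUTPUT_PREFIX = "msme/udyam/output/"


class CellCollector(HTMLParser):
    """Collects the raw text of every <td> cell, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def parse_details(html: str) -> Dict[str, str]:
    """Pairs cells as label/value (hypothetical two-column table layout)."""
    collector = CellCollector()
    collector.feed(html)
    cells = [c.strip() for c in collector.cells]
    return dict(zip(cells[0::2], cells[1::2]))


def process_udyam(number: str, fetch_html: Callable[[str], str],
                  store: Dict[str, str]) -> dict:
    html = fetch_html(number)                        # 1. crawl live data
    store[f"{HTML_PREFIX}{number}.html"] = html      # 2. keep the raw HTML
    details = parse_details(html)                    # 3. parse Udyam details
    store[f"{DETAILS_PREFIX}{number}.json"] = json.dumps(details)
    output = {"udyam_number": number, "details": details}  # 4. compile output object
    store[f"{OUTPUT_PREFIX}{number}"] = json.dumps(output)
    return output
```

Injecting `fetch_html` and `store` keeps the parsing and key layout testable without network or S3 access.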

Udyog.py

  • Follows the same process as Udyam.py, but for Udyog numbers.

Lambda Functions

gst-function-for-gst-automation

  • Accepts PAN or GSTIN.
  • Crawls GST site for details.
  • Used for GST Normal & Detailed Checks.
  • Output stored in: gst/gst-normal-check/

gst-function-for-micro-checks

  • Accepts PAN.
  • Fetches legal name, trade name, and address.
  • Used in Struck Off Detailed Checks.

gst-function-msme-urn-verification

  • Accepts PAN.
  • Fetches GST details and stores them at: msme/pan-to-gst-data/

msme-pan-verification

  • Accepts PAN.
  • Checks registration status via Udyam.
  • Stores results at: msme/msme-verification/{PAN}-{date}.json
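
A sketch of how the result key could be built from a PAN. The ISO `YYYY-MM-DD` date format is an assumption (the page only says `{date}`), and `verification_key` is a hypothetical helper name; the PAN check uses the standard five-letters, four-digits, one-letter layout.

```python
import re
from datetime import date
from typing import Optional

# PAN layout: five letters, four digits, one letter (e.g. ABCDE1234F).
PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def verification_key(pan: str, on: Optional[date] = None) -> str:
    """Builds msme/msme-verification/{PAN}-{date}.json (ISO date format assumed)."""
    pan = pan.strip().upper()
    if not PAN_RE.fullmatch(pan):
        raise ValueError(f"not a valid PAN: {pan!r}")
    on = on or date.today()
    return f"msme/msme-verification/{pan}-{on.isoformat()}.json"
```

For example, `verification_key("abcde1234f", date(2024, 1, 31))` gives `msme/msme-verification/ABCDE1234F-2024-01-31.json`.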

Common Script

Used across MSME, GST, and Struck-Off automation processes.

File Paths

Input (S3 xlsx):

  • MSME: micro-checks/msme/input-files/
  • GST: micro-checks/gst/input-files/
  • Struck Off: micro-checks/struck_of/input-files/

Output (S3 xlsx):

  • MSME: micro-checks/msme/output-files/
  • GST: micro-checks/gst/output-files/
  • Struck Off: micro-checks/struck_of/output-files/

Fuzzy (S3 xlsx):

  • MSME: micro-checks/msme/fuzzy-files/
  • GST: micro-checks/gst/fuzzy-files/
  • Struck Off: micro-checks/struck_of/fuzzy-files/
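
The nine prefixes above follow one pattern, so they can be derived from two lookups. A minimal sketch; the dictionary keys (`msme`, `gst`, `struck_off`, `input`, `output`, `fuzzy`) are hypothetical identifiers, while the folder names (including the `struck_of` spelling) are copied from the paths listed.

```python
# Process -> S3 root folder, copied from the paths above (note "struck_of").
PROCESS_ROOTS = {
    "msme": "micro-checks/msme",
    "gst": "micro-checks/gst",
    "struck_off": "micro-checks/struck_of",
}
# File kind -> sub-folder.
FILE_KINDS = {"input": "input-files", "output": "output-files", "fuzzy": "fuzzy-files"}


def xlsx_prefix(process: str, kind: str) -> str:
    """Returns the S3 prefix for a process/kind pair; raises KeyError if unknown."""
    return f"{PROCESS_ROOTS[process]}/{FILE_KINDS[kind]}/"
```

For example, `xlsx_prefix("struck_off", "fuzzy")` returns `micro-checks/struck_of/fuzzy-files/`.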

Cron Tabs

Data Cron Tab

  • Handles PENDING orders.
  • Retrieves input from: {type}/{sub_type}/{order_id}.json
  • Uses Lambda, DB, and live crawling.
  • Stores output as JSON in S3.
  • If all entities are processed → Status: DATA_FETCHED
  • If not → Input is updated, and status remains at (or reverts to) PENDING
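
One pass of the Data Cron Tab over a single order might look like the sketch below. Assumptions: `fetch` stands in for the Lambda/DB/live-crawl step and returns `None` when an entity could not be fetched yet; "input updated" is read as rewriting the input with only the unprocessed entities; the per-entity output key is hypothetical; a dict stands in for S3.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Order:
    order_id: str
    check_type: str
    sub_type: str
    entities: List[str]
    status: str = "PENDING"

    @property
    def input_key(self) -> str:
        # Input location from the description: {type}/{sub_type}/{order_id}.json
        return f"{self.check_type}/{self.sub_type}/{self.order_id}.json"


def run_data_cron(order: Order, fetch: Callable[[str], Optional[dict]],
                  store: Dict[str, str]) -> Order:
    """Processes a PENDING order once and updates its status."""
    if order.status != "PENDING":
        return order
    remaining = []
    for entity in order.entities:
        result = fetch(entity)  # Lambda, DB lookup, or live crawl; None = not fetched yet
        if result is None:
            remaining.append(entity)
        else:
            # Hypothetical output key; the real layout depends on the check type.
            key = f"{order.check_type}/{order.sub_type}/{order.order_id}/{entity}.json"
            store[key] = json.dumps(result)
    if remaining:
        order.entities = remaining  # input rewritten with the unprocessed entities
        store[order.input_key] = json.dumps(remaining)
        order.status = "PENDING"
    else:
        order.status = "DATA_FETCHED"
    return order
```

Re-running the cron on the same order then only retries the entities that failed on the previous pass.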

Output File Cron Tab

  • Handles DATA_FETCHED orders.
  • Verifies data completeness.
  • If complete → Output file created, status: COMPLETED, client emailed.
  • If incomplete → Status: PENDING
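
The Output File Cron Tab's decision can be sketched as a status transition. `build_output_file` and `email_client` are hypothetical callables standing in for xlsx generation and the client email, and "complete" is assumed to mean every entity in the order has fetched data.

```python
from typing import Callable, Dict, List


def run_output_cron(
    order_id: str,
    status: str,
    entities: List[str],
    results: Dict[str, dict],
    build_output_file: Callable[[str, Dict[str, dict]], None],
    email_client: Callable[[str], None],
) -> str:
    """Returns the order's new status after one pass of the output-file cron."""
    if status != "DATA_FETCHED":
        return status
    # Completeness check: every entity in the order must have fetched data.
    if any(entity not in results for entity in entities):
        return "PENDING"
    build_output_file(order_id, {e: results[e] for e in entities})
    email_client(order_id)
    return "COMPLETED"
```

Sending an incomplete order back to PENDING hands it to the Data Cron Tab again rather than producing a partial file.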

crontab.sh

  • Monitors for jobs in PENDING status and runs them.
  • Ensures cron tabs are triggered as needed.

MSME

NORMAL_CHECKS

  • Input: PAN
  • Output: Registration Status (Udyam Registered, Udyog Registered, Not Registered)
  • Lambda: msme-pan-verification
  • Output path: msme/msme-verification/{PAN}-{date}.json

DETAILED_CHECKS

  • Input: Udyam/Udyog Numbers
  • Process: Live crawling using Udyam.py & Udyog.py
  • Output: Udyam/Udyog details
  • Output Path:
    • Udyog: msme/udyog/output/{number}
    • Udyam: msme/udyam/output/{number}
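
Routing a Detailed Check input to the right script and output path could be sketched as follows. The number formats are assumptions: Udyam numbers as `UDYAM-XX-00-0000000` and Udyog Aadhaar numbers as 12 characters such as `MH19E0012345`; `route_detailed_check` is a hypothetical helper.

```python
import re
from typing import Tuple

# Assumed formats: UDYAM-MH-19-0012345 (Udyam) and MH19E0012345 (Udyog Aadhaar).
UDYAM_RE = re.compile(r"UDYAM-[A-Z]{2}-\d{2}-\d{7}")
UDYOG_RE = re.compile(r"[A-Z]{2}\d{2}[A-Z]\d{7}")


def route_detailed_check(number: str) -> Tuple[str, str]:
    """Returns (script, output key) for a Udyam or Udyog number."""
    n = number.strip().upper()
    if UDYAM_RE.fullmatch(n):
        return "Udyam.py", f"msme/udyam/output/{n}"
    if UDYOG_RE.fullmatch(n):
        return "Udyog.py", f"msme/udyog/output/{n}"
    raise ValueError(f"unrecognised registration number: {number!r}")
```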

CERTIFICATE_CHECKS

  • HTML files parsed from: msme/udyam/html/
  • Converted to PDF
  • Stored in: msme/certificates/udyam-certificates/{filename}.pdf
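
A sketch of the source-to-destination mapping for certificates. It assumes `{filename}` is the HTML file's name without its `.html` extension; the HTML-to-PDF rendering itself is left out (a library such as WeasyPrint could do it, but the tool actually used is not stated here).

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable

HTML_PREFIX = "msme/udyam/html/"
CERT_PREFIX = "msme/certificates/udyam-certificates/"


def certificate_key(html_key: str) -> str:
    """Maps msme/udyam/html/{name}.html to msme/certificates/udyam-certificates/{name}.pdf."""
    if not html_key.startswith(HTML_PREFIX):
        raise ValueError(f"not under {HTML_PREFIX}: {html_key!r}")
    return f"{CERT_PREFIX}{PurePosixPath(html_key).stem}.pdf"


def certificate_jobs(keys: Iterable[str]) -> Dict[str, str]:
    """Pairs every .html key under the HTML prefix with its PDF destination."""
    return {k: certificate_key(k) for k in keys
            if k.startswith(HTML_PREFIX) and k.endswith(".html")}
```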

NORMAL_DETAILED_CHECKS

  • Input: PAN
  • Output: GST details
  • GST details used for matching with local DB
  • Matching based on Registration Status (Udyam/Udyog)
  • Uses in-house Udyam/Udyog name-matching APIs
  • Output path: msme/pan-to-urn-identification/{PAN}-{date}.json
  • Crawled data output path: msme/normal-detailed/{PAN}-{date}.json
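
The status-based dispatch to the name-matching APIs might be sketched like this. The in-house Udyam/Udyog matchers are injected as hypothetical callables (names in, matched registration numbers out), the `legal_name`/`trade_name` field names are assumptions, and the ISO date format in the key is assumed.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

# Registration status (from the Normal Check) -> which matcher to call.
MATCHER_FOR_STATUS = {"Udyam Registered": "udyam", "Udyog Registered": "udyog"}

Matcher = Callable[[List[str]], List[str]]


def identify_urn(pan: str, gst_details: Dict[str, str], status: str,
                 matchers: Dict[str, Matcher], on: date) -> Tuple[str, dict]:
    """Returns (output key, record) for the PAN-to-URN identification step."""
    key = f"msme/pan-to-urn-identification/{pan}-{on.isoformat()}.json"
    kind = MATCHER_FOR_STATUS.get(status)  # None for "Not Registered"
    # Candidate names taken from the GST details (field names are hypothetical).
    names = [gst_details[f] for f in ("legal_name", "trade_name") if gst_details.get(f)]
    matches = matchers[kind](names) if kind and names else []
    return key, {"pan": pan, "registration_status": status, "matches": matches}
```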

NORMAL_DETAILED_CERTIFICATE_CHECKS

  • Input: PAN
  • Similar to Detailed Checks
  • HTML → PDF
  • Output path: msme/certificates/udyam-certificates/{filename}.pdf

GST

NORMAL_CHECKS

  • Input: PAN/GSTIN
  • Output: GST details
  • Lambda: gst-function-for-gst-automation
  • Logs: gst_normal_check.log
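
Since the GST checks accept either a PAN or a GSTIN, a first step could classify the input. This sketch relies on the standard GSTIN layout (2-digit state code, the 10-character PAN, an entity code, `Z`, and a check character); it does not verify the check character, and `classify_gst_input` is a hypothetical helper.

```python
import re
from typing import Tuple

PAN_RE = re.compile(r"[A-Z]{5}\d{4}[A-Z]")
# GSTIN: 2-digit state code + 10-character PAN + entity code + "Z" + check character.
GSTIN_RE = re.compile(r"\d{2}[A-Z]{5}\d{4}[A-Z][1-9A-Z]Z[0-9A-Z]")


def classify_gst_input(value: str) -> Tuple[str, str]:
    """Returns ("GSTIN" or "PAN", embedded PAN); the check character is not verified."""
    v = value.strip().upper()
    if GSTIN_RE.fullmatch(v):
        return "GSTIN", v[2:12]
    if PAN_RE.fullmatch(v):
        return "PAN", v
    raise ValueError(f"neither a PAN nor a GSTIN: {value!r}")
```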

DETAILED_CHECKS

  • Same as Normal Checks, but also includes GST filings
  • Logs: gst_normal_check.log

Struck Off

DETAILED_CHECKS

  • Input: PAN/CIN

  • If PAN given:

    • Uses gst-function-for-micro-checks
    • Legal name/trade name stored at: micro-checks/struck_of/cin/
    • Cleaned names are passed to the in-house company_suggestions API
    • CINs stored at: micro-checks/struck_of/company-suggestions/
    • Fuzzy match performed (score > 95 passes)
  • If CIN given:

    • Uses company_full_details API
    • Output: micro-checks/struck_of/data/
  • Logs: automation/struck_off_checks/struck_data_generation.log
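
The PAN/CIN branching and the "score > 95" filter can be sketched as below. The name-cleaning rules are assumptions, and `difflib.SequenceMatcher` scaled to 0-100 is a stand-in for whatever fuzzy scorer the automation actually uses, so scores may differ. `company_suggestions` results are modeled as hypothetical `(CIN, company name)` pairs.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple

PAN_RE = re.compile(r"[A-Z]{5}\d{4}[A-Z]")
# CIN: L/U, 5-digit industry code, state, year, ownership type, 6-digit number.
CIN_RE = re.compile(r"[LU]\d{5}[A-Z]{2}\d{4}[A-Z]{3}\d{6}")


def struck_off_source(value: str) -> str:
    """Picks the data source for a Struck Off input."""
    v = value.strip().upper()
    if CIN_RE.fullmatch(v):
        return "company_full_details"
    if PAN_RE.fullmatch(v):
        return "gst-function-for-micro-checks"
    raise ValueError(f"neither a PAN nor a CIN: {value!r}")


def clean_name(name: str) -> str:
    """Uppercases, replaces punctuation with spaces, and collapses whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", name.upper())
    return " ".join(cleaned.split())


def fuzzy_score(a: str, b: str) -> float:
    # Stand-in scorer: difflib similarity of the cleaned names, on a 0-100 scale.
    return SequenceMatcher(None, clean_name(a), clean_name(b)).ratio() * 100


def passing_cins(name: str, suggestions: List[Tuple[str, str]],
                 threshold: float = 95.0) -> List[str]:
    """Keeps CINs whose company name scores strictly above the threshold."""
    return [cin for cin, company in suggestions if fuzzy_score(name, company) > threshold]
```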