Indian job market · fraud detection

The job post
looked real.
It wasn't.

A hybrid ML system that detects fraudulent job postings in the Indian market. It combines TF-IDF text analysis with 15 hand-engineered fraud signals and is deployed as a live API and a Chrome extension.

94% Test accuracy
0.94 Fraud F1 score
2,118 Training examples
6 Fraud archetypes
30k+ TF-IDF features
Live API · Real-time analysis

Paste a posting.
Get a risk assessment.


Legitimate posting

Extension showing legitimate result

Suspicious posting

Extension showing suspicious result

About the extension

The Chrome extension scrapes job postings on LinkedIn and runs them against the live API in real time. Results appear as an overlay on each listing. No copy-pasting required.

Falls back to local heuristics if the API is cold-starting

No data stored — requests are discarded after analysis

Works on "linkedin.com/jobs/view/" and "collections" pages
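The fallback behaviour can be sketched as follows. This is a minimal illustration in Python, not the extension's actual code: the function names, the two local patterns, and the shape of the result dictionary are all assumptions, and the API call is passed in as a plain callable so the sketch stays self-contained.

```python
import re
from typing import Callable

# Hypothetical local rules; the extension's real fallback heuristics are not documented here.
LOCAL_PATTERNS = [
    re.compile(r"registration fee|security deposit", re.IGNORECASE),
    re.compile(r"whatsapp", re.IGNORECASE),
]


def local_heuristic(text: str) -> dict:
    """Count how many local patterns match and flag the posting if any do."""
    hits = sum(1 for pattern in LOCAL_PATTERNS if pattern.search(text))
    return {"source": "local", "suspicious": hits > 0, "hits": hits}


def assess(text: str, call_api: Callable[[str], dict]) -> dict:
    """Prefer the API result; fall back to local rules if the call fails."""
    try:
        result = call_api(text)
    except (TimeoutError, ConnectionError):
        # Assumed failure modes for an API that is still cold-starting.
        return local_heuristic(text)
    return {**result, "source": "api"}


def cold_api(text: str) -> dict:
    """Stand-in for an API that has not finished starting up."""
    raise TimeoutError("cold start")


def warm_api(text: str) -> dict:
    """Stand-in for a healthy API response."""
    return {"suspicious": False}


fallback_result = assess("Pay a registration fee via WhatsApp", cold_api)
api_result = assess("Software engineer, Bengaluru", warm_api)
```

Passing the API client in as a callable keeps the fallback path easy to exercise without a network connection.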

Model architecture · Feature design · Signal library

How the detection works.

01

Feature extraction

Each posting is vectorised with TF-IDF across 30,000 unigram and bigram features, then combined with 15 hand-crafted regex features targeting India-specific fraud patterns.

TF-IDF · 1–2 n-grams · 30k vocab · 15 regex signals
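The feature extraction step might look roughly like this with scikit-learn and SciPy. It is a sketch under stated assumptions: only three of the 15 signals are shown, their regex patterns are illustrative guesses rather than the project's real ones, and the two postings are made-up examples.

```python
import re

import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative subset of the regex signals; the patterns are assumptions.
SIGNALS = {
    "fee_payment_required": re.compile(r"registration fee|security deposit|pay\s*₹", re.I),
    "whatsapp_contact": re.compile(r"whatsapp", re.I),
    "urgency_language": re.compile(r"limited seats|immediate joiners", re.I),
}


def regex_features(texts):
    """Return one binary column per signal (1.0 if the pattern matches)."""
    rows = [[1.0 if p.search(t) else 0.0 for p in SIGNALS.values()] for t in texts]
    return csr_matrix(np.array(rows))


postings = [
    "Urgent hiring! Pay registration fee of ₹999. WhatsApp: limited seats.",
    "Senior backend engineer at a Bengaluru fintech. Apply via the careers page.",
]

# Unigrams and bigrams, vocabulary capped at 30,000 terms as described above.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=30_000)
X_text = vectorizer.fit_transform(postings)
X_regex = regex_features(postings)

# Stack the text and regex blocks side by side into one sparse matrix.
X = hstack([X_text, X_regex]).tocsr()
```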
02

Logistic regression

The TF-IDF matrix and the 15 regex features are horizontally stacked (hstack) into a single sparse matrix, which feeds a logistic regression model (C=1.0, class_weight='balanced'). The model outputs a calibrated fraud probability between 0 and 1.

balanced classes · calibrated proba · sklearn
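A minimal training sketch for this step, assuming scikit-learn's LogisticRegression with the stated C=1.0 and class_weight='balanced'. The four postings, their labels, the single regex column, and the featurise helper are hypothetical stand-ins for the real 2,118-example dataset and 15 signals. Any calibration step the project applies is not shown.

```python
import re

import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# One illustrative regex signal; the real system uses 15.
FEE = re.compile(r"registration fee|security deposit", re.I)

# Toy data (1 = fraud, 0 = legitimate); made up for illustration only.
texts = [
    "Pay a registration fee to confirm your data entry job",
    "Software engineer role, apply through the company careers page",
    "Security deposit required, guaranteed income from home",
    "Accountant needed at an audit firm, three years experience",
]
labels = np.array([1, 0, 1, 0])

vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=30_000)


def featurise(batch, fit=False):
    """Build the stacked sparse matrix: TF-IDF columns plus the regex column."""
    X_text = vectorizer.fit_transform(batch) if fit else vectorizer.transform(batch)
    X_regex = csr_matrix(np.array([[1.0 if FEE.search(t) else 0.0] for t in batch]))
    return hstack([X_text, X_regex]).tocsr()


model = LogisticRegression(C=1.0, class_weight="balanced")
model.fit(featurise(texts, fit=True), labels)

# Column 1 of predict_proba is the probability of the positive (fraud) class.
proba = model.predict_proba(featurise(texts))[:, 1]
```

With class_weight='balanced', scikit-learn reweights examples inversely to class frequency, which matters when fraudulent postings are the minority class.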
03

Explainable output

Each prediction returns the top contributing model features, the matched regex signals with descriptions, a risk band, and a confidence score, so every decision is auditable.

feature weights · matched signals · risk band
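For a linear model, one common way to produce this kind of explanation is to rank features by coefficient times feature value. The sketch below illustrates that idea in plain Python. The weights, intercept, band thresholds (0.4 and 0.7), and output field names are all assumptions, and the confidence score is left out because its definition is not given here.

```python
import math

# Hypothetical learned coefficients and intercept, for illustration only.
WEIGHTS = {
    "registration fee": 2.1,
    "whatsapp": 1.4,
    "careers page": -1.2,
    "fee_payment_required": 1.8,
}
INTERCEPT = -1.0


def risk_band(probability: float) -> str:
    """Map a probability to a band; the cut-offs are assumed values."""
    if probability >= 0.7:
        return "high"
    if probability >= 0.4:
        return "medium"
    return "low"


def explain(features: dict, top_k: int = 3) -> dict:
    """Score one posting and list the features that moved the score most."""
    contributions = {name: WEIGHTS.get(name, 0.0) * value for name, value in features.items()}
    logit = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    return {
        "fraud_probability": probability,
        "risk_band": risk_band(probability),
        "top_features": top,
    }


report = explain({"registration fee": 0.5, "whatsapp": 0.5, "fee_payment_required": 1.0})
```

Here the regex signal fee_payment_required contributes most, followed by the two TF-IDF terms, and the posting lands in the high band.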

Six Indian fraud archetypes.

01 · Registration Scam

Deposit / Fee Required

Candidates are asked to pay ₹500–5,000 as a registration fee or security deposit before the role is confirmed.

02 · WFH Trap

Data Entry from Home

Offers of ₹30,000–80,000/month for copy-paste work. Candidates buy training kits upfront; payouts never materialise.

03 · Messaging App Recruitment

WhatsApp / Telegram Only

Entire recruitment over messaging apps. No verifiable company presence, offers sent as screenshots.

04 · Brand Impersonation

Fake MNC Listings

Posts impersonating Infosys, TCS, Wipro with near-identical domains or logo abuse.

05 · Overseas Lure

Gulf / Singapore Placement

Promises placements abroad. Agents charge ₹50,000 to ₹2 lakh for visa processing that leads nowhere.

06 · Fake Internship

Certification Fee Fraud

Targets fresh graduates with internships that require a certification fee. The companies don't exist.

The signal library.

fee_payment_required

Matches "registration fee", "pay ₹", "security deposit" and 12 variants.

● High
whatsapp_contact

Contact restricted to a WhatsApp number, often "WhatsApp: +91 …".

● High
inflated_salary

Salary implausibly high for the role type, or "earn X daily" patterns.

● High
guaranteed_outcome

"100% placement guaranteed", "assured income" — language no real employer uses.

● High
urgency_language

"Limited seats", "apply within 24 hours", "immediate joiners only".

● Medium
overseas_placement

References to Gulf, Dubai, Singapore with placement guarantees or visa assistance.

● High
no_experience_required

"Freshers welcome", "no qualification required", "anyone can apply".

● Medium
personal_email_domain

Contact email uses gmail.com or yahoo.com instead of a company domain.

● Medium
vague_company

Generic names like "HR Solutions", "Placement Services Pvt Ltd".

● Medium
certification_fee

Payment required for training certificates or "mandatory certifications".

● High
suspicious_domain

Domain mimicking a known brand — infosysjobs.in, tcs-careers.co.

● High
data_entry_wfh_combo

"Data entry" + "work from home" + salary claims. High-precision fraud combo.

● High
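A few of these signals could be expressed as a small regex table, sketched below. The patterns are illustrative guesses built from the example phrases above, not the project's actual expressions, and the phone number and email address in the sample posting are placeholders.

```python
import re

# (name, severity, pattern); patterns are assumptions based on the descriptions above.
SIGNALS = [
    ("fee_payment_required", "High", re.compile(r"registration fee|security deposit|pay\s*₹", re.I)),
    ("whatsapp_contact", "High", re.compile(r"whatsapp\s*:?\s*\+91", re.I)),
    ("guaranteed_outcome", "High", re.compile(r"100%\s*placement guaranteed|assured income", re.I)),
    ("urgency_language", "Medium", re.compile(r"limited seats|apply within 24 hours|immediate joiners", re.I)),
    ("personal_email_domain", "Medium", re.compile(r"[\w.+-]+@(?:gmail|yahoo)\.com", re.I)),
]


def data_entry_wfh_combo(text: str) -> bool:
    """Fire only when all three parts of the combo appear together."""
    lowered = text.lower()
    has_salary = re.search(r"₹\s*[\d,]+", text) is not None
    return "data entry" in lowered and "work from home" in lowered and has_salary


def match_signals(text: str) -> list:
    """Return (signal name, severity) for every signal that matches."""
    hits = [(name, severity) for name, severity, pattern in SIGNALS if pattern.search(text)]
    if data_entry_wfh_combo(text):
        hits.append(("data_entry_wfh_combo", "High"))
    return hits


suspicious = match_signals(
    "Data entry, work from home, earn ₹40,000 monthly. Limited seats! "
    "WhatsApp: +91 00000 00000. Mail hr.jobs@gmail.com"
)
legitimate = match_signals("Backend engineer at a Pune startup. Apply at careers@example.com.")
```

Keeping each signal as a named entry with a severity makes it straightforward to return the matched signals alongside the model's probability.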