Why did the customer need Azati's platform?

The staffing agency required a solution to automatically collect resumes, classify candidates, enhance CVs with missing skills, and provide efficient search across the candidate database to speed up recruitment.

How does the web scraping engine work?

Custom Selenium scrapers are assigned to specific websites and use proxies and user-agent rotation to bypass anti-scraping measures, collecting unstructured data for further processing.

What role does machine learning play?

Machine learning predicts missing skills, classifies candidates by expertise, and tags them with relevant keywords and technologies, improving search accuracy and candidate matching.

How is the solution scalable?

The system is hosted in the cloud. Docker containers run multiple scraping instances in parallel, and asynchronous processing reduces bottlenecks between scraping and analysis. A React front end provides the interactive UI for recruiters.

What benefits did the platform bring?

It processes ~17,000 webpages daily, increases relevant candidates by 127%, and classifies each candidate in ~4 seconds, speeding up recruitment and improving match quality.

Custom Search Platform for Recruitment Agency

Azati designed and developed a custom recruitment platform for a staffing agency based in New Jersey. The platform uses a network of interconnected microservices to streamline resume search, candidate evaluation, and hiring, which speeds up the recruitment process.

127% increase in relevant candidates identified
4 sec average time to classify and tag a candidate
17K webpages processed per day

All technologies used

Python, Keras, TensorFlow, Flask, Rails, React, Ruby, NumPy, MongoDB, Selenium, Redis

Motivation

The customer needed a custom solution to automatically collect resumes from various websites, create a database, classify candidates, enhance their CVs with missing skills, and enable efficient search across the candidate database. The goal was to improve the hiring process, reducing time and costs for recruitment while ensuring high-quality matches between candidates and job descriptions.

Main challenges

Recruiters were overwhelmed by the manual effort required to search multiple websites and databases for candidate resumes. Candidates often had incomplete or outdated profiles across different platforms, leading to missed opportunities and poor matching between candidates and job roles.

Resume data was scattered across various websites, inconsistent, and sometimes contradictory. This made it difficult for recruiters to build comprehensive candidate profiles and slowed down decision-making.

Recruiters lacked the technical knowledge to understand complex skills, programming languages, and frameworks. Identifying the right candidates for specialized roles was time-consuming and error-prone.

Many websites restricted automated scraping to prevent abuse. Ensuring continuous data collection while respecting these limitations was a significant technical challenge.

Our approach

Custom Web Scraping Network

We developed dedicated Selenium-based scrapers for each target website, managing proxies and rotating user agents to bypass anti-scraping measures. This ensured reliable extraction of resumes while maintaining a 'normal user' profile, reducing the risk of IP bans or interruptions.

Unstructured Data Analysis and Structuring

The collected resumes were often incomplete or inconsistent. We transformed this raw data into structured profiles stored in a NoSQL database, enabling efficient searching, merging duplicate information, and enriching candidate data for downstream processing.

Machine Learning for Skill Prediction and Classification

A machine learning model analyzed candidate resumes, classified them into skill groups, and predicted missing competencies by correlating known technologies, frameworks, and programming languages. This enhanced incomplete profiles and helped recruiters quickly identify suitable candidates.

Cloud-Based Scalable Architecture

All services were deployed in the cloud using Docker containers for scalable web scraping and asynchronous processing. React was used for a responsive, interactive user interface, reducing latency and providing recruiters with real-time access to enriched candidate data.

Integrated Search and Filtering

A Search API allowed recruiters to query candidates by skills, predicted competencies, and other attributes. This reduced the time needed to find suitable candidates, improved match accuracy, and made the overall hiring process more efficient.

Solution

Web Scraping Engine

This module automatically collects resumes and candidate data from multiple job sites such as LinkedIn, Indeed, Stack Overflow, and Toptal. Each scraper is assigned to a specific website and configured with proxy management and user-agent rotation to bypass anti-scraping measures. The engine extracts unstructured information and forwards it asynchronously for processing. This enables continuous data collection without manual intervention and helps keep candidate profiles complete and up to date.

Key capabilities:

Automated scraping from multiple recruitment websites
Proxy management and user-agent rotation to bypass limitations
Asynchronous processing for faster data collection
Merging of duplicate or partial resumes into a single candidate profile
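
The proxy and user-agent rotation described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual scraper code: the proxy addresses, the user-agent strings, and the function name rotating_browser_args are all hypothetical. The sketch only builds Chrome command-line arguments, so it runs without a browser installed.

```python
from itertools import cycle

# Hypothetical pools; real values would come from configuration.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


def rotating_browser_args(proxies, user_agents):
    """Yield Chrome command-line arguments, cycling through both pools."""
    proxy_pool = cycle(proxies)
    agent_pool = cycle(user_agents)
    while True:
        yield [
            f"--proxy-server={next(proxy_pool)}",
            f"--user-agent={next(agent_pool)}",
        ]


# Each new browser session takes the next proxy and user agent in turn.
args = rotating_browser_args(PROXIES, USER_AGENTS)
first_session = next(args)
second_session = next(args)
third_session = next(args)
```

With Selenium, each string in such a list could be passed to ChromeOptions.add_argument before the driver is started, so that every scraping session presents a different proxy and user agent.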

Unstructured Data Analysis Module

After scraping, raw candidate data is often incomplete or inconsistent. This module processes and normalizes the data, converting it into structured formats suitable for search and analytics. Using a NoSQL database, the system can efficiently store diverse data types, merge overlapping information from different sources, and provide recruiters with a comprehensive view of each candidate.

Key capabilities:

Normalization and structuring of raw resume data
Merging multiple resumes for a single candidate
Storage in scalable NoSQL databases for flexible querying
Preparation of enriched data for machine learning and classification
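
As an illustration of the merging step, the sketch below combines records scraped from different sites into one profile per candidate. The case study does not say how duplicates are detected, so matching on a normalized email address is an assumption made here for simplicity, and the field names (email, name, skills, source) are hypothetical.

```python
def merge_profiles(records):
    """Merge scraped records that share a normalized email into one profile."""
    merged = {}
    for record in records:
        # Assumption: the email address identifies a candidate across sites.
        key = record["email"].strip().lower()
        profile = merged.setdefault(
            key, {"email": key, "name": None, "skills": set(), "sources": []}
        )
        # Keep the first non-empty name encountered.
        if profile["name"] is None and record.get("name"):
            profile["name"] = record["name"]
        # Union of skills, compared case-insensitively.
        profile["skills"].update(s.lower() for s in record.get("skills", []))
        profile["sources"].append(record["source"])
    return list(merged.values())


scraped = [
    {"email": "Jane@Example.com", "name": "Jane Doe", "skills": ["Python"], "source": "site_a"},
    {"email": "jane@example.com ", "name": "", "skills": ["python", "Flask"], "source": "site_b"},
    {"email": "sam@example.com", "name": "Sam Roe", "skills": ["Ruby"], "source": "site_a"},
]
profiles = merge_profiles(scraped)
```

Here the two records for Jane collapse into a single profile that lists both sources and the union of her skills. A document store such as MongoDB suits this kind of profile because fields can vary from one candidate to the next.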

Resume Classification & Skill Prediction

This module leverages machine learning to classify candidates by expertise and predict missing skills based on known technologies, frameworks, and programming languages. Candidates are tagged with relevant groups, keywords, and competencies, improving search accuracy and helping recruiters quickly identify the most suitable candidates.

Key capabilities:

Machine learning-based classification of candidate expertise
Prediction of missing skills and competencies
Tagging of candidates with keywords, languages, and frameworks
Enhanced search and filtering for recruiters
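
The idea of predicting missing skills by correlating known technologies can be shown with a much simpler technique than the platform's machine learning model. The sketch below is a plain co-occurrence heuristic, swapped in purely for illustration: it is not the Keras/TensorFlow model, and the function names, the sample data, and the 0.6 threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(profiles):
    """Count how often each ordered pair of skills appears in the same profile."""
    skill_counts = Counter()
    pair_counts = defaultdict(Counter)
    for skills in profiles:
        unique = set(skills)
        skill_counts.update(unique)
        for a, b in permutations(unique, 2):
            pair_counts[a][b] += 1
    return skill_counts, pair_counts


def predict_missing_skills(known, skill_counts, pair_counts, threshold=0.6):
    """Suggest skills whose average conditional frequency, given the
    known skills, reaches the threshold."""
    scores = Counter()
    for skill in known:
        for other, together in pair_counts[skill].items():
            if other not in known:
                scores[other] += together / skill_counts[skill]
    return sorted(s for s, total in scores.items() if total / len(known) >= threshold)


# Hypothetical training profiles.
training = [
    ["python", "flask", "redis"],
    ["python", "flask"],
    ["python", "numpy", "keras"],
    ["ruby", "rails", "redis"],
    ["ruby", "rails"],
]
counts, pairs = build_cooccurrence(training)
ruby_suggestions = predict_missing_skills({"ruby"}, counts, pairs)
python_suggestions = predict_missing_skills({"python"}, counts, pairs)
```

In this toy data every Ruby profile also lists Rails, so a resume that mentions only Ruby is tagged with Rails as a likely missing skill. A trained classifier can capture richer patterns, but the enrichment step it feeds is the same.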

Search API & Candidate Matching

The Search API provides recruiters with fast, accurate access to the enriched candidate database. It supports complex queries, keyword searches, and filtering by skills, experience, and predicted competencies, reducing the time needed to find suitable candidates.

Key capabilities:

Fast, accurate candidate search with advanced filtering
Real-time matching between job descriptions and candidates
Support for queries using enriched, predicted, and merged data
Integration with recruiter dashboards and front-end interfaces
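
A search over enriched profiles might look roughly like the sketch below. It is an assumption-heavy illustration rather than the actual Search API: the Candidate fields, the search function, and the ranking rule (confirmed skills rank above predicted ones) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    skills: set = field(default_factory=set)
    predicted_skills: set = field(default_factory=set)
    years_experience: int = 0


def search(candidates, required_skills, min_years=0, include_predicted=True):
    """Return names of candidates covering all required skills, strongest match first."""
    required = {s.lower() for s in required_skills}
    matches = []
    for c in candidates:
        if c.years_experience < min_years:
            continue
        confirmed = required & c.skills
        predicted = (required - confirmed) & c.predicted_skills if include_predicted else set()
        if confirmed | predicted == required:
            # Rank by confirmed skills first, then by experience.
            matches.append((len(confirmed), c.years_experience, c))
    matches.sort(key=lambda m: (m[0], m[1]), reverse=True)
    return [c.name for _, _, c in matches]


pool = [
    Candidate("Ana", {"python", "flask"}, {"redis"}, 5),
    Candidate("Ben", {"python"}, {"flask"}, 7),
    Candidate("Cy", {"ruby", "rails"}, set(), 4),
]
with_predicted = search(pool, ["Python", "Flask"])
confirmed_only = search(pool, ["Python", "Flask"], include_predicted=False)
```

Including predicted skills widens the result set (Ben appears only because Flask was predicted for him), which is one plausible way enrichment could increase the number of relevant candidates a recruiter sees.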

Cloud-Based Architecture

The platform is fully hosted in the cloud, allowing it to scale efficiently. Docker containers run multiple instances of the web scraping engine to speed up data collection, while asynchronous processing reduces bottlenecks. The React front-end provides a highly interactive user interface, and cloud infrastructure ensures reliability and cost efficiency.

Key capabilities:

Scalable cloud-based deployment of all services
Containerized scraping engine for faster HTML processing
Interactive React front-end for recruiters
Reduced maintenance costs and high system reliability
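
The way asynchronous processing decouples scraping from analysis can be sketched with a producer/consumer queue. The case study does not describe the actual queueing mechanism between containers, so the in-process asyncio.Queue below is a stand-in, and the function names and placeholder work are hypothetical.

```python
import asyncio


async def scrape(site, queue):
    """Stand-in for a scraper: emits raw pages without waiting for processing."""
    for page in range(2):
        await asyncio.sleep(0)  # placeholder for network I/O
        await queue.put(f"{site}/page-{page}")


async def process(queue, results):
    """Worker that structures raw pages as they arrive."""
    while True:
        raw = await queue.get()
        results.append(raw.upper())  # placeholder for parsing and normalization
        queue.task_done()


async def run_pipeline(sites, worker_count=2):
    queue = asyncio.Queue()
    results = []
    workers = [asyncio.create_task(process(queue, results)) for _ in range(worker_count)]
    # All scrapers run concurrently and only push to the queue.
    await asyncio.gather(*(scrape(site, queue) for site in sites))
    await queue.join()  # wait until every queued page has been processed
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


processed = asyncio.run(run_pipeline(["site_a", "site_b"]))
```

Because scrapers never wait on the processing step, adding more scraper instances or more workers raises throughput independently. In a containerized deployment the same pattern would typically use an external broker between services instead of an in-memory queue.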

Business Value

Enhanced Recruitment Efficiency: Automated scraping, skill prediction, and candidate classification significantly reduced time and manual effort required for hiring.

Scalability and Flexibility: Cloud-based architecture and containerized services allowed the platform to scale with growing data and recruiter demand without additional infrastructure costs.

Improved Candidate Matching: Machine learning-based skill prediction and tagging increased the pool of relevant candidates by 127%, improving job fit and reducing hiring errors.

Faster Decision-Making: Average classification and tagging time per candidate decreased to ~4 seconds, allowing recruiters to make faster, more informed hiring decisions.

Data Accuracy and Reliability: The system merges partial and duplicate resumes into comprehensive profiles, ensuring recruiters have complete and accurate information.

Customer Satisfaction: The staffing agency now experiences faster, more efficient recruitment processes and views Azati as a reliable partner for future technology-driven solutions.
