Custom Search Platform for Recruitment Agency

Azati designed and built a recruitment platform for a staffing firm. The system comprises several interconnected modules (microservices). Our solution significantly improves resume search and candidate evaluation and speeds up the overall hiring process.

CUSTOMER:

Our customer is a staffing agency in New Jersey, founded in 2007. The agency focuses on building highly skilled in-house teams for large IT corporations. The IT recruitment process is time-consuming and costly.

Even for an experienced candidate, finding a new job can be a frustrating experience. The customer understands the importance of employee happiness: high engagement leads to greater productivity, and workers who feel valued are motivated to make real, tangible contributions to their companies. That is why it is essential to match the right candidates with the right jobs.

The platform we developed helps the customer discover candidates who closely match open positions, eliminating unnecessary fuss and making the process less stressful.

OBJECTIVE:

The customer wanted us to build a custom solution that could automatically collect resumes from various sites, build a database, classify candidates, enrich their CVs with missing skills, and provide an effective search across the database.

We faced two technical challenges while developing this solution:

CHALLENGE #1:

There are several social networks and many popular websites where job seekers leave resumes. The most popular include LinkedIn, Indeed, Toptal, Remote.co, and Stack Overflow.

It is common for a candidate to leave slightly different resumes on multiple websites: on one site the information is outdated, on another essential details are missing entirely. Building a complete candidate profile is critical to ensure the candidate matches the job description.

It is well known that job sites actively block behavior they consider abusive, such as web scraping or bulk data extraction. The main complication was to build a tool that could automatically extract unstructured information from web pages, work around these restrictions, and merge resumes from different sources into a complete candidate profile.

CHALLENGE #2:

Tech recruitment differs from traditional recruitment in many ways. One of them is that recruiters often lack knowledge of specific technologies, programming languages, and frameworks.

As there are thousands of tools developers use in their day-to-day work, it is impossible for a recruiter to know, track, and remember them all. The client asked us to build a module that could help recruiters overcome this issue.

We wanted to train a machine learning model that builds relationships between these technologies and determines how they are interconnected.

PROCESS:

During the initial business analysis, we discovered a considerable number of restrictions that prevent automated parsers from scraping job websites.

We therefore built a network of custom web scrapers based on Selenium: each instance was assigned to a specific website and driven by a set of per-site content extraction rules. This design makes it easy to tune a scraper to extract data from a new site.
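For illustration, a rule-driven scraper of this kind might look like the sketch below. The site key and CSS selectors are hypothetical placeholders, not the rules we actually used:

```python
# A minimal sketch of a rule-driven Selenium scraper. The site key and
# CSS selectors are hypothetical placeholders for illustration only.
from selenium import webdriver
from selenium.webdriver.common.by import By

# Per-site extraction rules: field name -> CSS selector.
EXTRACTION_RULES = {
    "example-job-board": {
        "name": "h1.candidate-name",
        "title": "div.headline",
        "skills": "ul.skills li",
    },
}

def scrape_profile(driver: webdriver.Chrome, url: str, site: str) -> dict:
    """Load a profile page and extract fields using the site's rule set."""
    rules = EXTRACTION_RULES[site]
    driver.get(url)
    profile = {}
    for field, selector in rules.items():
        elements = driver.find_elements(By.CSS_SELECTOR, selector)
        texts = [el.text.strip() for el in elements if el.text.strip()]
        # Repeated elements (e.g. skills) stay a list; single hits collapse.
        profile[field] = texts if len(texts) > 1 else (texts[0] if texts else None)
    return profile
```

With this layout, supporting a new site amounts to adding one more entry to the rule table.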

To work around these limitations, our engineers built a subsystem that manages the pool of proxies and user agents Selenium uses to crawl websites. To the target site, our crawler looks like an ordinary user.

We integrated the subsystem with a third-party proxy provider, so the customer can manage expenses, website templates, the proxy pool, and the user agent list from a single unified interface.
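The rotation idea can be sketched roughly as follows, assuming a Chrome-based Selenium setup. The proxy addresses and user-agent strings are made-up examples; in the real system the pools come from the provider rather than hard-coded lists:

```python
# A minimal sketch of proxy and user-agent rotation for Selenium (Chrome).
# The proxy addresses and user-agent strings are made-up examples.
import random
from selenium import webdriver

PROXIES = ["203.0.113.10:8080", "203.0.113.11:8080"]  # placeholder pool
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

def new_driver() -> webdriver.Chrome:
    """Create a driver with a random proxy and user agent, so each crawl
    session looks like a different ordinary visitor to the target site."""
    options = webdriver.ChromeOptions()
    options.add_argument(f"--proxy-server=http://{random.choice(PROXIES)}")
    options.add_argument(f"--user-agent={random.choice(USER_AGENTS)}")
    return webdriver.Chrome(options=options)
```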

After the crawlers extract the required data, our solution analyzes it and searches the database for related records to fill in the remaining details about the specific candidate.

Our engineers created a database where a recruiter can find complete information about a candidate. We convert the unstructured data extracted from web pages into structured records that are stored in a NoSQL database and used for further lookups.
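A minimal sketch of this merge-and-store step is below. MongoDB via pymongo is an assumption here (the text only says "NoSQL database"), and matching candidates by email is a simplified stand-in for real entity resolution:

```python
# A minimal sketch of merging a scraped resume into the candidate store.
# MongoDB/pymongo is an assumption (the source only says "NoSQL"), and
# matching by email is a simplification of real entity resolution.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection
candidates = client["recruiting"]["candidates"]

def merge_profile(scraped: dict) -> None:
    """Merge a freshly scraped profile into the stored candidate record."""
    existing = candidates.find_one({"email": scraped["email"]}) or {}
    all_skills = set(existing.get("skills", [])) | set(scraped.get("skills", []))
    candidates.update_one(
        {"email": scraped["email"]},
        {"$set": {**scraped, "skills": sorted(all_skills)}},
        upsert=True,  # create the record the first time we see the candidate
    )
```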

After we built the database, we tackled another challenge: the vast number of frameworks and libraries tied to specific technologies and programming languages. A few popular Python frameworks illustrate the problem.

We arranged these frameworks into four groups: web, machine learning, cloud computing, and data science. Some of them are language-specific, like Django, Flask, Tornado, and PyTorch, while others, such as TensorFlow and Apache Spark, are not.

This means a candidate proficient in Flask must know Python and is probably capable of web development, even if Python is missing from the programming languages section of the resume. Conversely, if a candidate lists Apache Spark and TensorFlow, we cannot be sure they know Python, but the probability is high, because Python is the most popular programming language for machine learning.
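As a toy illustration of this reasoning, skill implication can be written as a small lookup table. In the real system these relationships were learned by a model rather than hard-coded, and the confidence values below are invented:

```python
# A hand-written toy version of the skill-inference idea. In production
# the relationships were learned by a model; these entries are invented.
IMPLIES = {
    # framework -> [(implied skill, confidence), ...]
    "flask":        [("python", 1.0), ("web development", 0.9)],
    "django":       [("python", 1.0), ("web development", 0.9)],
    "pytorch":      [("python", 1.0), ("machine learning", 0.9)],
    "tensorflow":   [("python", 0.8), ("machine learning", 0.9)],
    "apache spark": [("python", 0.6), ("data engineering", 0.8)],
}

def infer_skills(listed: set[str]) -> dict[str, float]:
    """Return skills the resume does not list, with a confidence score."""
    inferred: dict[str, float] = {}
    for skill in listed:
        for implied, confidence in IMPLIES.get(skill.lower(), []):
            if implied not in listed:
                inferred[implied] = max(inferred.get(implied, 0.0), confidence)
    return inferred

# infer_skills({"flask", "tensorflow"})
# -> {"python": 1.0, "web development": 0.9, "machine learning": 0.9}
```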

Our engineers trained a machine learning model that analyzes the content of a resume, classifies the candidate, and predicts with a certain level of confidence which additional skills the candidate is likely to have. Based on this information, our system tags each candidate with multiple groups and associates those groups with specific keywords, programming languages, and frameworks.
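A minimal sketch of such a classifier follows, assuming scikit-learn with TF-IDF features and one-vs-rest logistic regression; the case study does not name the actual model or library, and the training data here is a toy example:

```python
# A toy multi-label resume classifier. scikit-learn, TF-IDF features and
# one-vs-rest logistic regression are assumptions; the source does not
# say which model or library was actually used.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy training data: resume text -> known skill tags.
resumes = [
    "Built REST services with Flask and SQLAlchemy",
    "Trained deep networks with PyTorch and CUDA",
    "Developed Android applications in Kotlin",
]
tags = [["python", "web"], ["python", "machine learning"], ["kotlin", "mobile"]]

binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(tags)  # tags -> binary indicator matrix

model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(resumes, y)

# One probability per tag; keep only the reasonably confident predictions.
probs = model.predict_proba(["Streaming pipelines with Apache Spark"])[0]
predicted = {t: round(p, 2) for t, p in zip(binarizer.classes_, probs) if p > 0.5}
```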

The idea worked brilliantly, and our customer was truly impressed with this feature: we discovered that tech resumes are almost always incomplete and missing something important.

As a result, our solution not only helps recruiters find relevant candidates faster but also surfaces opportunities job seekers never thought about.

SOLUTION:

The final solution comprises five modules hosted in the cloud. Cloud architecture helps us cut down maintenance costs and avoid on-site personnel training. We enjoy building cloud applications because of their flexibility, scalability, and cost-effectiveness.

The system consists of five modules:

- Web Scraping Engine
- Unstructured Data Analysis Module
- Resume Classification Module
- Search API
- User Interface API

The biggest bottleneck of web scraping is the delay between sending a request and getting a response from the target site: a single page can take several seconds to load. We therefore tried to make this module work asynchronously.
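The concurrent variant can be sketched with a thread pool, since Selenium calls are blocking. This reuses the hypothetical new_driver and scrape_profile helpers from the sketches above:

```python
# A minimal sketch of overlapping slow page loads with a thread pool.
# new_driver() and scrape_profile() are the hypothetical helpers from
# the earlier sketches.
from concurrent.futures import ThreadPoolExecutor

def scrape_one(task: tuple[str, str]) -> dict:
    url, site = task
    driver = new_driver()
    try:
        return scrape_profile(driver, url, site)
    finally:
        driver.quit()  # always release the browser instance

def scrape_many(tasks: list[tuple[str, str]], workers: int = 8) -> list[dict]:
    """Run many scrapes concurrently instead of waiting for each page."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scrape_one, tasks))
```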

But even this technique was not enough: data extraction remained slow. We figured out that the real bottleneck was Selenium itself, the tool we used to emulate user-like behavior; it does not work well with proxies and headless mode combined.

Our DevOps specialists proposed scaling the web scraping engine in the cloud by launching multiple instances in separate Docker containers. This technique helped us speed up overall HTML processing.

As a library for the user interface, our engineers chose React: it provides a high level of interactivity for the customer and speeds up the overall development process. React is a powerful tool our engineers enjoy using.

TECHNOLOGIES:

Selenium, Docker, React, NoSQL database

RESULTS:

The solution processes half a million web pages every month to provide relevant information about candidates to researchers and recruiters. The system we delivered speeds up and simplifies the recruitment process for the customer, leaving its employees more motivated and less stressed.

A FEW NUMBERS:

~17K webpages processed by our system per day
+127% extra candidates after inspecting the data
~4 seconds to classify and tag a candidate

NOW:

We launched the solution in late 2017, and a small dedicated team now maintains the system. Some websites have since strengthened their privacy protection for GDPR compliance, making it much harder to extract data with web scrapers, but the system still delivers a large number of candidates and satisfies the customer.

Drop us a line

If you are interested in the development of a custom solution, send us a message and we'll schedule a call to discuss it.