Azati designed and built a recruitment platform for a staffing firm. The system comprises several interconnected microservices. Our solution significantly improves resume search and candidate evaluation and speeds up the overall hiring process.
Our customer is a staffing agency in New Jersey founded in 2007. The customer focuses on building highly skilled in-house teams for large IT corporations. The IT recruitment process is time-consuming and costly.
Even for an experienced candidate, finding a new job can be a frustrating experience. The customer understands the importance of employee happiness: high engagement leads to greater productivity, and workers who feel valued are motivated to make real, tangible contributions to their companies. That is why it is essential to match the right candidates with the right jobs.
The platform we developed helps the customer discover candidates that perfectly suit the jobs, eliminating additional fuss and making the process less stressful.
The customer wanted us to build a custom solution that could automatically collect resumes from various sites, build a database, classify candidates, enrich their CVs with missing skills, and provide proper search across this database.
We faced two technical challenges while developing this solution:
There are several social networks and many popular websites where job seekers leave resumes. The most popular are LinkedIn, Indeed, Toptal, Remote.co, and Stack Overflow.
It was common for a candidate to leave slightly different resumes on multiple websites: on one site the information was outdated, on another the essential information was missing entirely. It is critical to build a complete candidate profile to make sure that he or she suits the job description.
It is well known that job sites prevent abusive behavior such as web scraping or bulk data extraction. The main complication was to build a tool that could automatically extract unstructured information from webpages, work around these restrictions, and merge resumes from different sources into a full candidate profile.
Tech recruitment differs from traditional recruitment in many aspects. One of these aspects is a lack of knowledge about specific technologies, programming languages, and frameworks.
As there are thousands of specific tools developers use in their day-to-day work, it is impossible to know, track, and remember them all. The client asked us to build a module that could help recruiters overcome this issue.
We wanted to train a machine learning model that builds relationships between these technologies and determines how they are interconnected.
During the initial business analysis, we found a considerable number of restrictions that prevent web parsers from scraping job websites.
To address this, we built a network of custom web scrapers based on Selenium: we assigned each instance to a specific website and used a set of rules for content extraction. It is easy to tune a scraper to extract data from any site.
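The per-site rule sets can be sketched roughly as follows. All site names, selectors, and field names here are hypothetical, and a plain dict stands in for the rendered page; in the real system each rule set would drive a Selenium instance that looks elements up in the live DOM.

```python
# Hypothetical rule sets: one per target site, mapping profile fields
# to the CSS selectors where that site keeps them.
SCRAPER_RULES = {
    "example-job-board": {
        "name": "h1.candidate-name",
        "title": "div.headline",
        "skills": "ul.skills li",
    },
}

def extract_profile(site: str, page: dict) -> dict:
    """Apply a site's rule set to a rendered page.

    `page` maps CSS selectors to the text found at that selector --
    in production this lookup is a Selenium find_element call.
    """
    rules = SCRAPER_RULES[site]
    return {field: page.get(selector) for field, selector in rules.items()}
```

Because the rules live in data rather than code, tuning a scraper for a new site means adding one more entry to the rule table.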
To bypass these limitations, our engineers built a subsystem that manages the list of proxies and user agents Selenium uses to crawl websites. To a target site, our crawler looks like an ordinary user.
We integrated the subsystem with a third-party proxy provider. This way, the customer could manage expenses, website templates, the proxy list, and the list of user agents from one unified user interface.
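The core of such a rotation subsystem can be sketched in a few lines. The proxy endpoints and user-agent strings below are placeholders (in production the proxy list comes from the provider's API), and the chosen values would be handed to Selenium through browser options such as Chrome's `--proxy-server` and `--user-agent` flags.

```python
import itertools
import random

# Placeholder proxies and (truncated) user-agent strings.
PROXIES = ["http://proxy-1:3128", "http://proxy-2:3128", "http://proxy-3:3128"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...",
]

# Round-robin over proxies spreads requests evenly across exit IPs.
_proxy_cycle = itertools.cycle(PROXIES)

def next_identity() -> dict:
    """Pick the next proxy and a random user agent for one crawl session."""
    return {
        "proxy": next(_proxy_cycle),
        "user_agent": random.choice(USER_AGENTS),
    }
```

Each crawl session gets a fresh identity, so no single IP or browser fingerprint accumulates enough traffic to trip a site's rate limits.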
After the crawlers extract the required data, our solution analyzes the information and searches for matches in the database to enrich the remaining details about the specific candidate.
Our engineers created a database where a recruiter can find complete information about a candidate. We convert the unstructured data extracted from web pages into structured records that are stored in a NoSQL database and used for further data lookups.
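The merge step can be illustrated with a simplified sketch (field names are illustrative, and the real matching logic is more involved): partial resumes for the same candidate are folded into one structured document, with later sources filling fields earlier ones left empty and skill lists unioned.

```python
def merge_profiles(profiles: list[dict]) -> dict:
    """Merge partial resumes scraped from different sites into one
    structured candidate document.

    Scalar fields keep the first non-empty value seen; skills are
    unioned, so the result is at least as complete as the most
    complete single resume.
    """
    merged: dict = {"skills": set()}
    for profile in profiles:
        for field, value in profile.items():
            if field == "skills":
                merged["skills"].update(value)
            elif value and not merged.get(field):
                merged[field] = value
    merged["skills"] = sorted(merged["skills"])
    return merged  # ready to be stored as one NoSQL document
```

The merged dict maps directly onto a document-store record, which is one reason a NoSQL database is a comfortable fit for this kind of semi-structured profile data.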
After we built the database, we tackled another challenge: the vast number of different frameworks and libraries related to specific technologies and programming languages. To better understand the problem, check out the screenshot below.
The screenshot shows several popular Python frameworks. We've arranged these frameworks into four groups: web, machine learning, cloud computing, and data science. Some of them are language-specific, like Django, Flask, Tornado, and PyTorch, and some are not: TensorFlow, Apache Spark.
This means that a candidate proficient in Flask must know Python and is probably capable of web development, even if he or she omitted Python from the programming languages section. Conversely, if a candidate's skills include Apache Spark and TensorFlow, we cannot say for certain that he or she knows Python, but the probability is high, because Python is the most popular programming language for machine learning.
Our engineers trained a machine learning model that analyzes the content of a resume to classify each candidate and predict, with a certain level of confidence, what additional skills the candidate might have. Based on this information, our system classifies the resumes, tags each candidate into multiple groups, and associates these groups with specific keywords, programming languages, and frameworks.
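The idea behind the prediction can be shown with a deliberately simplified stand-in for the trained model: hand-written implication rules with confidence scores mirroring the Flask and Spark examples above. The rules and scores here are illustrative, not the model's real outputs.

```python
# skill -> [(implied skill, confidence)]; values are illustrative only.
SKILL_IMPLICATIONS = {
    "flask": [("python", 0.99), ("web-development", 0.90)],
    "django": [("python", 0.99), ("web-development", 0.95)],
    "apache spark": [("python", 0.70), ("big-data", 0.90)],
    "tensorflow": [("python", 0.85), ("machine-learning", 0.95)],
}

def infer_skills(declared: list[str], threshold: float = 0.8) -> dict:
    """Predict skills a candidate likely has but did not list.

    For each declared skill, collect the implied skills that are not
    already declared, keeping the highest confidence seen, and return
    only those above the threshold.
    """
    declared_set = {s.lower() for s in declared}
    predicted: dict = {}
    for skill in declared_set:
        for implied, confidence in SKILL_IMPLICATIONS.get(skill, []):
            if implied not in declared_set:
                predicted[implied] = max(predicted.get(implied, 0.0), confidence)
    return {s: c for s, c in predicted.items() if c >= threshold}
```

A resume listing only Flask would thus be tagged with Python and web development, while Apache Spark alone would fall below the threshold for Python, matching the "high probability, but no certainty" reasoning above.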
Our customer was truly impressed with this feature: we discovered that tech resumes are almost always incomplete and miss something important.
As a result, our solution not only helps recruiters find relevant candidates quicker, but also offers job seekers opportunities they never thought about.
The final solution comprises five modules hosted in the cloud. Cloud architecture helps us cut down maintenance costs and avoid on-site personnel training. We enjoy building cloud applications because of their flexibility, scalability, and cost-effectiveness.
The system consists of five modules:
The biggest bottleneck of web scraping is the delay between the moment a request is sent and the moment our module gets a response from the target site. It may take several seconds to load a single page. For this reason, we tried to make this module work asynchronously.
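Concurrency of this kind can be sketched with a thread pool that overlaps the network waits. The `fetch` function below is a stub standing in for a real (and much slower) Selenium page load.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    """Stand-in for a Selenium page load; the real call blocks on the
    network for up to several seconds per page."""
    return f"<html>{url}</html>"

def fetch_all(urls: list[str], workers: int = 8) -> list[str]:
    """Load pages in a thread pool so the waits overlap.

    With enough workers, total time approaches that of the slowest
    page rather than the sum of all page loads.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Since each worker spends most of its time waiting on I/O rather than computing, threads are a reasonable fit despite Python's GIL.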
But even this technique was not enough: the data extraction process was still slow. We figured out that the bottleneck was Selenium, the tool we used to emulate user-like behavior. It does not work well with proxies or in headless mode.
Our DevOps engineers proposed scaling the web scraping engine in the cloud by launching multiple instances in separate Docker containers. This technique helped us speed up overall HTML processing.
For the user interface, our engineers used React: it provides a high level of interactivity for the customer and speeds up the overall development process. React is a powerful tool our engineers enjoy using.
The solution processes half a million webpages every month to provide relevant information about candidates to researchers and recruiters. The system we delivered speeds up and simplifies the recruitment process for the customer, making its employees more motivated and less stressed.
A FEW NUMBERS:
- Our system processes per day
- after inspecting the data
- It takes to classify and tag a candidate
We launched the solution in late 2017, and a small dedicated team now maintains the system. Some websites have since improved their privacy protection for GDPR compliance, so it has become much more complicated to extract data with web scrapers. But the system still provides a massive number of candidates and satisfies the customer.