Search Engine Algorithm Improvement For An Employment Platform


An employment platform is designed to help recruiters and job seekers in the IT community find each other.

The platform has two types of users: recruiters and job candidates. Each candidate’s profile includes the necessary data, such as personal information, education, work experience, skills, contact details, etc.

When looking for a candidate, a recruiter specifies the skills the candidate must have for the job. The recruiter then gets a list of candidates ordered by how closely each candidate’s profile matches the requirements.

One of the most important matching criteria is the candidate’s set of skills. The problem was that the original algorithm considered only perfectly matched skills. Job seekers often list skills very similar to what a recruiter is looking for, but because the search engine matched skills strictly, those candidates were excluded from the search results altogether. For example, if a recruiter looked for someone with strong knowledge of C++, and a job seeker had indicated that they excelled at “C++ embedded,” that job seeker wouldn’t be shown among the search results, although the skills are very much alike.

The original algorithm also ignored the connections that exist between skills, such as those between programming languages and their frameworks. For instance, a candidate with the skill “Django” is likely to have the skill “Python” (as Django is a Python framework), but the original algorithm was unable to take this into account, so relevant candidates would be missed.

Our objective was to improve the skill matching algorithm of the employment platform, so it would behave more like an experienced and knowledgeable human recruiter.


We enhanced the algorithm by implementing fuzzy matching of skills to make it consider similarity between related skills and the same skills under different names.

When checking whether a candidate satisfies the recruiter’s skill requirements, the algorithm compares each required skill with the skills in the candidate’s set, so the pool of required skills is not forced to match the candidate’s skills exactly. This makes it possible to evaluate the similarity between the two skills being compared. The problem thus reduces to finding the extent to which a skill $s_r$ entered by a recruiter matches some candidate’s skill $s_c$.
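The comparison above can be sketched in a few lines. The article does not say how the pairwise scores for one required skill are combined across a candidate’s skills, so taking the best (maximum) score is an assumption made here for illustration. The names `best_match` and `toy_similarity` and the scores in the table are hypothetical as well.

```python
from typing import Callable, Iterable

# Hypothetical pairwise scores, keyed as (required skill, candidate skill).
TOY_SCORES = {
    ("C++", "C++ embedded"): 0.9,
    ("Python", "Django"): 0.8,
}


def toy_similarity(required: str, candidate: str) -> float:
    """Stand-in for a real pairwise score: 1.0 for identical names, else a table lookup."""
    if required == candidate:
        return 1.0
    return TOY_SCORES.get((required, candidate), 0.0)


def best_match(
    required: str,
    candidate_skills: Iterable[str],
    similarity: Callable[[str, str], float],
) -> float:
    """Score one required skill against a candidate's whole skill set.

    Assumption: the best pairwise score wins. An empty skill set scores 0.0.
    """
    return max((similarity(required, skill) for skill in candidate_skills), default=0.0)


score = best_match("C++", ["Git", "C++ embedded"], toy_similarity)
```

With an exact-match engine the candidate above would score nothing for “C++”; here the related skill “C++ embedded” still contributes.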

The employment platform allows candidates to import skills from their LinkedIn accounts instead of typing them manually. This works because candidates on the platform enter the same skills as those found on LinkedIn.

It’s worth noting that the similarity between the skills $s_r$ and $s_c$ is asymmetric. For example, the skill “AngularJS” usually implies the skill “JavaScript” (as AngularJS is a JavaScript framework), but the opposite is not true: knowing JavaScript doesn’t necessarily mean knowing AngularJS.

Let’s denote the set of skills of user $u$ as $\mathrm{skills}(u)$.

Then, as an estimate of the match degree between a recruiter-entered skill $s_r$ and a candidate skill $s_c$, we can use the conditional probability that a user has skill $s_r$ given that the user has skill $s_c$, i.e. $P(s_r \in \mathrm{skills}(u) \mid s_c \in \mathrm{skills}(u))$.

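According to the definition of conditional probability, we have the following equation:

$$P(s_r \in \mathrm{skills}(u) \mid s_c \in \mathrm{skills}(u)) = \frac{P(s_r \in \mathrm{skills}(u) \cap s_c \in \mathrm{skills}(u))}{P(s_c \in \mathrm{skills}(u))} = \frac{P(\{s_r, s_c\} \subseteq \mathrm{skills}(u))}{P(s_c \in \mathrm{skills}(u))}.$$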

Thus, in order to calculate the match degree between $s_r$ and $s_c$, we need to estimate two kinds of probabilities:

  • $P(s \in \mathrm{skills}(u))$ for a given skill $s$.
  • $P(\{s_1, s_2\} \subseteq \mathrm{skills}(u))$ for a given pair of skills $s_1$ and $s_2$.

To estimate these probabilities, we used Existing Tags, a data source where each topic is labelled with one or more tags; its topics proved relevant to skill matching. We were able to map a significant part of the LinkedIn skills to Existing Tags tags by combining the following simple techniques:

  • case-insensitive comparison
  • ignoring spaces and punctuation marks (e.g. colons and dashes)
  • abbreviation expansion
  • word normalization using lemmatization
  • full-text search in the topics description
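
As an illustration, the first three techniques in the list can be sketched as a single normalization function. This is a minimal sketch and not the platform’s actual code: the abbreviation table, the set of ignored separator characters, and the names `normalize_skill` and `map_skills` are all assumptions. Lemmatization and full-text search are left out because they need an external library or index.

```python
import re

# Hypothetical abbreviation table; the article does not list the real one.
ABBREVIATIONS = {
    "oop": "object oriented programming",
    "ml": "machine learning",
}

# Characters treated as ignorable separators (an assumption).
# "+" and "#" are kept on purpose so that "C++" and "C#" stay distinct from "C".
_SPLIT = re.compile(r"[\s:\-_/]+")
_STRIP = re.compile(r"[\s:\-_/.]+")


def normalize_skill(name: str) -> str:
    """Return a canonical key for a skill name."""
    lowered = name.casefold()                       # case-insensitive comparison
    words = [w for w in _SPLIT.split(lowered) if w]
    expanded = [ABBREVIATIONS.get(w, w) for w in words]  # abbreviation expansion
    return _STRIP.sub("", "".join(expanded))        # ignore spaces and punctuation


def map_skills(linkedin_skills, tags):
    """Map each LinkedIn skill to the tag that shares its normalized key, if any."""
    index = {normalize_skill(tag): tag for tag in tags}
    mapping = {}
    for skill in linkedin_skills:
        key = normalize_skill(skill)
        if key in index:
            mapping[skill] = index[key]
    return mapping


mapping = map_skills(["Node.JS", "OOP", "Cobol"], ["node.js", "object-oriented-programming", "c++"])
```

In this toy run “Node.JS” and “OOP” are mapped, while “Cobol” has no matching tag and is simply left out of the mapping.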

The following considerations explain how we calculate the extent to which a candidate skill $s_c$ matches the skill $s_r$ entered by a recruiter in the employment platform.

Let’s denote the set of tags of an Existing Tags topic $t$ as $\mathrm{tags}(t)$.

Let $T$ be the set of all topics on Existing Tags. Having the LinkedIn to Existing Tags mapping $f$, we can approximate $P(s \in \mathrm{skills}(u))$ by $P(f(s) \in \mathrm{tags}(t))$ for a random topic $t \in T$.

Similarly, $P(\{s_1, s_2\} \subseteq \mathrm{skills}(u))$ can be approximated by $P(\{f(s_1), f(s_2)\} \subseteq \mathrm{tags}(t))$ for a random topic $t \in T$.

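$P(s' \in \mathrm{tags}(t))$ for a given Existing Tags skill $s'$ and a random topic $t \in T$ is estimated to be $\frac{1}{|T|} \sum_{k \in T} [s' \in \mathrm{tags}(k)]$, where the bracket $[\cdot]$ equals 1 if the condition inside it holds and 0 otherwise.

Similarly, $P(\{s'_1, s'_2\} \subseteq \mathrm{tags}(t))$ for given Existing Tags skills $s'_1$, $s'_2$ and a random topic $t \in T$ is estimated to be $\frac{1}{|T|} \sum_{k \in T} [\{s'_1, s'_2\} \subseteq \mathrm{tags}(k)]$.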

Therefore, the extent to which candidate skill $s_c$ matches the skill $s_r$ entered by a recruiter is calculated by the formula:

$$\frac{\sum_{t \in T} [\{f(s_r), f(s_c)\} \subseteq \mathrm{tags}(t)]}{\sum_{t \in T} [f(s_c) \in \mathrm{tags}(t)]}.$$
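
The ratio above is a count of co-occurrences divided by a count of occurrences, which can be sketched directly. The topic data below is invented for illustration, the function name `match_degree` is hypothetical, and the skills are assumed to have already been mapped to tags by $f$. Returning 0.0 when the candidate’s tag never occurs is also an assumption, since the article does not cover that case.

```python
from typing import Sequence, Set


def match_degree(required_tag: str, candidate_tag: str, topics: Sequence[Set[str]]) -> float:
    """Estimate P(required_tag | candidate_tag) from tag co-occurrence counts."""
    with_candidate = 0  # topics tagged with the candidate's skill
    with_both = 0       # topics tagged with both skills
    for tags in topics:
        if candidate_tag in tags:
            with_candidate += 1
            if required_tag in tags:
                with_both += 1
    if with_candidate == 0:
        return 0.0  # assumption: an unseen tag gives no evidence of a match
    return with_both / with_candidate


# Invented toy corpus: each set holds the tags of one topic.
TOPICS = [
    {"python", "django"},
    {"python", "django"},
    {"python", "django"},
    {"django"},
    {"python"},
    {"python", "numpy"},
    {"python"},
    {"python"},
    {"javascript"},
    {"python"},
]

# Recruiter asks for Python, candidate lists Django: 3 of 4 Django topics mention Python.
django_implies_python = match_degree("python", "django", TOPICS)  # 0.75
# Recruiter asks for Django, candidate lists Python: 3 of 8 Python topics mention Django.
python_implies_django = match_degree("django", "python", TOPICS)  # 0.375
```

The two calls give different values for the same pair of skills, which reflects the asymmetry discussed earlier: the score depends on which skill the recruiter asked for and which one the candidate listed.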

The resulting smart-matching algorithm provides more reasonable and precise search results. It has greatly improved recruiters’ ability to find suitable candidates: on average, search results now contain 27% more relevant candidates than before.
