# Search Engine Algorithm Improvement

##### Business Situation

MyAmichi is a recruiting platform created mainly to help recruiters and job seekers in the IT community find each other.

The platform has two types of users: recruiters and job candidates. Each candidate has a profile containing personal information, education, work experience, skills, contact information, and so on.

When looking for a candidate, a recruiter specifies the skills that the candidate must have for the job. The recruiter then gets a list of candidates ordered by how well each candidate’s profile matches those requirements.

One of the most important matching criteria is the candidate’s set of skills. The problem was that the algorithm considered only exactly matching skills. Job seekers quite often list skills very similar to what a recruiter is looking for, but because of the strict matching mechanism those candidates were left out of the search results altogether. For example, if a recruiter looked for someone with strong knowledge of C++, and a job seeker had indicated that they excelled at “C++ embedded”, that job seeker wouldn’t be shown among the search results, although the skills are very much alike.

A second shortcoming was that the algorithm ignored the relationships that exist between skills, such as those between programming languages and their frameworks. For instance, a candidate with the skill “Django” is likely to have the skill “Python” (as Django is a Python framework), but the algorithm didn’t take this into account when assigning scores, so some relevant candidates could be missed.

Our objective was to improve the skill matching algorithm to make it work properly in these cases.

##### Solution

We enhanced the algorithm by implementing fuzzy matching of skills to make it consider similarity between related skills and the same skills under different names.

When checking whether a candidate satisfies the recruiter’s skill requirements, the algorithm compares each required skill with the candidate’s set of skills, rather than comparing the whole set of required skills with the candidate’s set at once. This approach makes it possible to evaluate the similarity between the two individual skills being compared. The problem can thereby be reduced to finding the extent to which a recruiter-entered skill $S_r$ matches some candidate’s skill $S_c$.

MyAmichi allows candidates to import skills from their LinkedIn accounts instead of typing them manually. Such an import is possible because MyAmichi skills are mostly the same as the skills on the LinkedIn website. Therefore, the problem is to find skill similarity in the LinkedIn skills domain.
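The per-skill comparison described above can be sketched in Python. This is only an illustration under stated assumptions: it supposes that, for each required skill, the candidate’s best-matching skill is what counts, and that the per-requirement scores are averaged. Neither the aggregation rule nor the names `best_match`, `candidate_score`, `toy_similarity`, and the values in `TOY_TABLE` come from the original description.

```python
def best_match(required_skill, candidate_skills, similarity):
    """Highest similarity between one required skill and any candidate skill."""
    return max(
        (similarity(required_skill, skill) for skill in candidate_skills),
        default=0.0,  # a candidate with no skills matches nothing
    )


def candidate_score(required_skills, candidate_skills, similarity):
    """Average best match over all required skills (an assumed aggregation)."""
    if not required_skills:
        return 0.0
    total = sum(
        best_match(required, candidate_skills, similarity)
        for required in required_skills
    )
    return total / len(required_skills)


# Hypothetical similarity values, keyed as (required skill, candidate skill).
TOY_TABLE = {
    ("Python", "Django"): 0.75,
    ("C++", "C++ embedded"): 0.5,
}


def toy_similarity(required, candidate):
    """Placeholder similarity: 1.0 for identical names, else a table lookup."""
    if required == candidate:
        return 1.0
    return TOY_TABLE.get((required, candidate), 0.0)


score = candidate_score(["Python", "C++"], ["Django", "C++ embedded"], toy_similarity)
print(score)
```

Passing the similarity in as a function keeps the ranking logic independent of how the pairwise match degree is actually estimated.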

It’s worth noting that the similarity between the skills $S_r$ and $S_c$ is asymmetric. For example, the skill “AngularJS” usually implies the skill “JavaScript” (as AngularJS is a JavaScript framework), but the opposite is not true: knowing JavaScript doesn’t necessarily mean knowing AngularJS.

Let’s denote the set of skills of user $u$ as $\operatorname{skills}(u)$. Then, as an estimate of the match degree between a recruiter-entered skill $S_r$ and a candidate skill $S_c$, we can use the conditional probability of having skill $S_r$ given that the candidate has skill $S_c$, i.e.

$$P\big(S_r \in \operatorname{skills}(u) \mid S_c \in \operatorname{skills}(u)\big)$$

According to the definition of conditional probability, we have the following equation:

$$P\big(S_r \in \operatorname{skills}(u) \mid S_c \in \operatorname{skills}(u)\big) = \frac{P\big(S_r \in \operatorname{skills}(u) \wedge S_c \in \operatorname{skills}(u)\big)}{P\big(S_c \in \operatorname{skills}(u)\big)}$$

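Thus, in order to calculate the match degree between $S_r$ and $S_c$, we need to estimate two kinds of probabilities:

– $P\big(s \in \operatorname{skills}(u)\big)$ for a given skill $s$

– $P\big(s_1 \in \operatorname{skills}(u) \wedge s_2 \in \operatorname{skills}(u)\big)$ for a given pair of skills $s_1$ and $s_2$.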
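To make the asymmetry concrete, consider purely hypothetical figures (not taken from MyAmichi or StackOverflow data): suppose 40% of users list JavaScript, 10% list AngularJS, and 9% list both. Applying the definition of conditional probability in each direction gives:

$$P(\text{JavaScript} \mid \text{AngularJS}) = \frac{0.09}{0.10} = 0.9, \qquad P(\text{AngularJS} \mid \text{JavaScript}) = \frac{0.09}{0.40} = 0.225$$

Under these assumed numbers, a candidate who lists AngularJS is a strong match for a JavaScript requirement, while a candidate who lists only JavaScript is a weak match for an AngularJS requirement.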

To estimate these probabilities, we used an external data source: StackOverflow. Each topic on the site is labelled with one or more tags, so the way tags occur together on topics turned out to be relevant to skill matching.

We mapped a significant part of LinkedIn skills to StackOverflow tags combining the following simple techniques:

– case-insensitive comparison

– ignoring punctuation marks (e.g. spaces, colons and dashes)

– abbreviation expansion

– word normalization using lemmatization

– full-text search in topic descriptions
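The first three techniques in the list above can be sketched in Python as follows. This is a minimal illustration, not the platform’s actual mapping code: the function names `normalize_skill` and `map_skills_to_tags` and the contents of `ABBREVIATIONS` are hypothetical, and lemmatization and full-text search are left out to keep the example dependency-free.

```python
import re

# Hypothetical abbreviation table; the real one is not given in the text.
ABBREVIATIONS = {
    "js": "javascript",
    "ror": "ruby on rails",
}


def normalize_skill(name: str) -> str:
    """Reduce a skill or tag name to a canonical key for comparison."""
    key = name.strip().lower()           # case-insensitive comparison
    key = ABBREVIATIONS.get(key, key)    # abbreviation expansion
    key = re.sub(r"[\s:\-]+", "", key)   # ignore spaces, colons and dashes
    return key


def map_skills_to_tags(skills, tags):
    """Map each skill to the tag sharing its normalized key, if there is one."""
    tag_index = {normalize_skill(tag): tag for tag in tags}
    mapping = {}
    for skill in skills:
        key = normalize_skill(skill)
        if key in tag_index:
            mapping[skill] = tag_index[key]
    return mapping


mapping = map_skills_to_tags(
    ["Ruby on Rails", "C++", "Cobol"],
    ["ruby-on-rails", "c++", "python"],
)
print(mapping)
```

Characters such as `+` and `#` are deliberately kept, since removing them would merge distinct skills such as “C”, “C++” and “C#”.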

The following considerations explain how to calculate the extent to which a candidate skill $S_c$ matches the skill $S_r$ entered by a recruiter.

Let’s denote the set of tags of a StackOverflow topic $t$ as $\operatorname{tags}(t)$.

Let $T$ be the set of all topics on StackOverflow.

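Given the LinkedIn-to-StackOverflow mapping $f$, we can approximate $P\big(s \in \operatorname{skills}(u)\big)$ by $P\big(f(s) \in \operatorname{tags}(t)\big)$ for a random topic $t \in T$. Similarly, $P\big(s_1 \in \operatorname{skills}(u) \wedge s_2 \in \operatorname{skills}(u)\big)$ can be approximated by $P\big(f(s_1) \in \operatorname{tags}(t) \wedge f(s_2) \in \operatorname{tags}(t)\big)$ for a random topic $t \in T$.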

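For a given StackOverflow skill $s'$ and a random topic $t \in T$, $P\big(s' \in \operatorname{tags}(t)\big)$ is estimated to be

$$\frac{\big|\{t \in T : s' \in \operatorname{tags}(t)\}\big|}{|T|}$$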

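Similarly, for given StackOverflow skills $s'_1, s'_2$ and a random topic $t \in T$, $P\big(s'_1 \in \operatorname{tags}(t) \wedge s'_2 \in \operatorname{tags}(t)\big)$ is estimated to be

$$\frac{\big|\{t \in T : s'_1 \in \operatorname{tags}(t) \wedge s'_2 \in \operatorname{tags}(t)\}\big|}{|T|}$$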
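These count-based estimates are simple to compute. The Python sketch below is an illustration under assumptions, not the production implementation: it treats each topic as a plain set of tags, and the names `count_tags` and `match_degree` and the toy topic data are invented for the example. When the pair estimate is divided by the single-tag estimate, the number of topics cancels, so only two counts are needed.

```python
from collections import Counter
from itertools import combinations


def count_tags(topics):
    """Count single tags and unordered tag pairs over an iterable of tag sets."""
    single = Counter()
    pair = Counter()
    for tags in topics:
        unique = sorted(set(tags))
        single.update(unique)
        pair.update(combinations(unique, 2))  # pairs are stored in sorted order
    return single, pair


def match_degree(required_tag, candidate_tag, single, pair):
    """Estimate P(required_tag in tags(t) | candidate_tag in tags(t))."""
    if single[candidate_tag] == 0:
        return 0.0  # the candidate's tag never occurs, so nothing can be inferred
    if required_tag == candidate_tag:
        return 1.0
    key = tuple(sorted((required_tag, candidate_tag)))
    return pair[key] / single[candidate_tag]


# Made-up topics for illustration only.
topics = [
    {"javascript", "angularjs"},
    {"javascript", "angularjs"},
    {"javascript"},
    {"javascript", "jquery"},
    {"python", "django"},
]
single, pair = count_tags(topics)
print(match_degree("javascript", "angularjs", single, pair))  # 2 / 2 = 1.0
print(match_degree("angularjs", "javascript", single, pair))  # 2 / 4 = 0.5
```

On this toy data the two directions give different values, which reflects the asymmetry of the measure.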

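Therefore, the extent to which a candidate skill $S_c$ matches the skill $S_r$ entered by a recruiter is calculated by the formula:

$$P\big(S_r \in \operatorname{skills}(u) \mid S_c \in \operatorname{skills}(u)\big) \approx \frac{\big|\{t \in T : f(S_r) \in \operatorname{tags}(t) \wedge f(S_c) \in \operatorname{tags}(t)\}\big|}{\big|\{t \in T : f(S_c) \in \operatorname{tags}(t)\}\big|}$$

The developed smart-matching algorithm provides more reasonable and precise search results. It has greatly improved recruiters’ ability to find suitable candidates: on average, search results now contain 27% more relevant candidates than before.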