Drop us a line
If you are interested in developing a custom solution, send us a message and we'll schedule a call to discuss it.
Azati designed and developed a semantic search engine powered by machine learning. It extracts the actual meaning from the search query and looks for the most relevant results across huge scientific datasets.
A US company focused on developing in vitro diagnostic (IVD) and biopharmaceutical products. It provides products and services that support research and development activities and accelerate time to market.
The customer offers clinical trial management services, biological materials, central laboratory testing, and other solutions that enable product development and research in infectious diseases, oncology, rheumatology, endocrinology, cardiology, and genetic disorders.
Many companies suffer from the lack of an accurate, fast search engine that can handle substantial scientific datasets. Scientific datasets are known for their structural complexity and a vast number of interconnected terms and abbreviations, which make data processing quite tricky.
The customer was looking for a partner who could overcome this challenge.
The customer wanted us to build an intelligent search engine to improve its internal inventory search. The inventory included a considerable number of blood samples. Each blood sample was described with several tags, grouped into subcategories, which were in turn grouped into larger categories, and so on.
The customer’s employees had to select many tags by hand to get the information they wanted. A single search took several minutes. Even more frustrating, if an employee made a single mistake or entered an inaccurate query, he or she got an empty result page.
The entire data lookup process was a constant headache for the personnel and the customer. Several challenges had to be overcome to improve the customer’s workflow.
Every blood sample was described with a textual description and specific tags, manually mapped by an external data entry vendor according to the description.
It was common for a blood sample to have misstatements in both the description and the tags. This meant that any approach to improving search by tag or by description would fail due to inconsistent data.
The first thing we thought about was cleansing the data. We faced two interconnected issues. The first was the lack of knowledge about all possible factors that can differentiate one blood sample from another.
The other was the lack of knowledge about all alternative disease names: for example, Hepatitis B, HBV DNA, Hepatitis B Virus, HBV PCR, and Hepatitis B Virus Genotype by Sequencing all basically mean the same thing.
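The simplest way to picture this normalization problem is a lookup table from known aliases to one canonical label. The sketch below is purely illustrative (the alias table is invented, not the customer's actual thesaurus):

```python
# Hypothetical sketch: map alternative disease names to one canonical
# label. The alias table here is illustrative, not the real thesaurus.
CANONICAL = {
    "hepatitis b": "Hepatitis B",
    "hbv dna": "Hepatitis B",
    "hepatitis b virus": "Hepatitis B",
    "hbv pcr": "Hepatitis B",
    "hepatitis b virus genotype by sequencing": "Hepatitis B",
}

def normalize(term: str) -> str:
    """Return the canonical disease name for a raw term, if known."""
    return CANONICAL.get(term.strip().lower(), term)

print(normalize("HBV DNA"))            # Hepatitis B
print(normalize("Rheumatoid factor"))  # unknown term passes through unchanged
```

The hard part, of course, is building that table when the full set of aliases is unknown — which is exactly where the machine learning approach below comes in.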
Another challenge was the amount of data. There was a significant number of entries to process, and, more importantly, there was no sample data for training an algorithm to match the tags automatically.
From the very beginning, the customer provided a list of keywords describing the blood samples. Our team soon discovered that this list was incomplete and required additional research; on its own, it was not enough to complete the project. Still, such issues could not prevent our team from delivering on time.
So we decided to split the final solution into two pluggable modules. The first handled intelligent matching: it determined the level of confidence when tagging a blood sample. The second extracted all possible tags from search queries, effectively turning unstructured user input into structured data.
The first challenge our engineers overcame was the lack of sample data. We trained a custom model on a hundred thousand life science documents related to blood samples from open data sources. Our data scientists used Word2vec to analyze the connections between the most common words from the thesaurus, find synonyms, and determine how these words relate to each other.
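Word2vec represents each term as a dense vector; terms that appear in similar contexts end up close together, which is how synonyms surface. As a minimal, self-contained illustration of the idea, here is cosine similarity over toy three-dimensional vectors — the real trained model has far more dimensions and vocabulary:

```python
import math

# Toy vectors standing in for a trained Word2vec model (invented values).
VECTORS = {
    "hepatitis_b": [0.90, 0.10, 0.00],
    "hbv":         [0.85, 0.15, 0.05],
    "oncology":    [0.10, 0.90, 0.20],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def most_similar(term):
    """Rank all other terms by similarity to `term`, closest first."""
    v = VECTORS[term]
    others = [(t, cosine(v, u)) for t, u in VECTORS.items() if t != term]
    return sorted(others, key=lambda p: p[1], reverse=True)

print(most_similar("hepatitis_b"))  # "hbv" ranks well above "oncology"
```

In practice this is what a library call like gensim's `most_similar` computes over the learned embeddings; the sketch just makes the mechanism visible.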
The module responsible for entity detection in search queries was partially ready. We had already built a similar module while developing a platform for custom chatbot development. All that was left to do was to retrain the model on the relevant list of entities: sample types, geography, diseases, genders, etc.
To achieve a high level of confidence, we analyzed a massive number of user search queries collected from open data sources. In the end, we compiled a collection of patterns used to form search queries.
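A much-simplified sketch of the idea — assuming plain dictionary-based matching rather than the retrained model, with invented entity lists — might look like this:

```python
import re

# Illustrative entity lists only; the production module was a retrained
# NER model covering sample types, geography, diseases, genders, etc.
ENTITIES = {
    "sample_type": ["plasma", "serum", "whole blood"],
    "disease": ["hepatitis b", "oncology", "rheumatology"],
    "gender": ["male", "female"],
}

def extract_entities(query: str) -> dict:
    """Turn a free-text query into structured search filters."""
    q = query.lower()
    found = {}
    for label, terms in ENTITIES.items():
        for term in terms:
            # Word boundaries keep "male" from matching inside "female".
            if re.search(r"\b" + re.escape(term) + r"\b", q):
                found.setdefault(label, []).append(term)
    return found

print(extract_entities("female plasma samples with Hepatitis B"))
# {'sample_type': ['plasma'], 'disease': ['hepatitis b'], 'gender': ['female']}
```

The structured output is what the search engine module consumes, instead of the raw query string.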
The final solution consists of three separate, interconnected modules hosted in the cloud. This approach lets us maintain the system remotely, avoiding on-site personnel training. Cloud architecture also makes the application more flexible, cutting down development and maintenance costs.
We are proud to say that two of the modules are powered by machine learning. The Query Analysis module uses natural language processing algorithms to extract entities from search queries, while the Search Engine module matches those entities with synonyms to perform an accurate and fast search.
The modules are built as independent RESTful microservices, which lets us scale the final solution to any size in the cloud.
We significantly optimized traditional search algorithms. Instead of searching across the whole dataset, we processed about 150,000 samples with about 100 tags and performed the search among those tags. We cached all processed samples with Redis, which gave us in-memory data lookups and avoided the bottleneck of reading and writing data to disk.
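To illustrate the tag-index approach, here is a minimal sketch using an in-memory inverted index from tag to sample IDs; a plain dict stands in for the Redis cache used in production, and the sample data is invented:

```python
from collections import defaultdict

# Invented sample data: id -> set of tags (production had ~150,000
# samples and ~100 tags, cached in Redis rather than a local dict).
SAMPLES = {
    1: {"plasma", "hepatitis b", "female"},
    2: {"serum", "oncology", "male"},
    3: {"plasma", "oncology", "female"},
}

# Build the inverted index once, then reuse it for every query:
# searching becomes set intersection instead of a full dataset scan.
INDEX = defaultdict(set)
for sample_id, tags in SAMPLES.items():
    for tag in tags:
        INDEX[tag].add(sample_id)

def search(*tags):
    """Return ids of samples carrying all requested tags."""
    sets = [INDEX[t] for t in tags]
    return set.intersection(*sets) if sets else set()

print(search("plasma", "female"))  # {1, 3}
```

Because the index and the cached samples live in memory, each lookup is a handful of set operations rather than a pass over the full dataset.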
These optimizations let us deliver outstanding search quality at blazing speed.
We successfully implemented a commercial semantic search engine that can handle massive scientific datasets. We used modern natural language processing technologies to extract entities from search queries and categorize scientific texts by tags. The algorithms we built helped the customer eliminate ineffective search results and significantly improve employees’ satisfaction with the data lookup process.
A considerable number of samples helped us train the machine learning model effectively.
Advanced caching and algorithm improvements helped us build a blazing-fast search engine.
Our engineers built a scalable system that can easily be retrained on any number of similar samples.
We successfully launched the semantic search engine in mid-March. We now maintain the application, process new datasets, improve search quality, and scale the system in the cloud.