Inventory Search Engine for Auto Parts Retailer
Azati designed and developed an intelligent search engine for inventory search that enhances traditional search algorithms. It analyzes the user input and looks for a specific entry. If the algorithm can't find the requested item in the inventory, it explores the characteristics of the object and returns a list of similar products.
The customer is a Brazilian online retailer of automotive parts and accessories for cars, vans, trucks, and sport utility vehicles. The customer owns and manages a chain of auto parts stores, car workshops, and an engineering lab. The customer is known for a broad selection of automotive parts and accessories and has structured a discount pricing scheme for individual consumers.
The customer has a complex supply chain that helps it deliver auto parts across the country in the shortest possible time. There are about 500,000 parts and about 100,000 auto-related products in its catalog.
The customer suffered from a technical issue: search across the huge catalog was inaccurate and quite slow. It took about 2 minutes to complete a data lookup. The search algorithms had never been optimized, so there were several ways we could improve the situation.
After joining the project, we discovered that there was no single catalog of all auto parts. There were several catalogs from different manufacturers and several catalogs for internal use; the existing solution was performing inefficient request chains from one data source to another.
Originally, there was a single catalog that included an enormous number of auto parts and accessories. Today, other catalogs also store a massive amount of data about auto parts and accessories, and the total amount of data has increased dramatically.
For this reason we decided to stop improving the existing solution and design a new system that could handle this amount of data. There were several challenges we faced.
There were several catalog formats: XLSX files (Excel), TXT files, CSV files (open-source spreadsheets), databases (MySQL), and APIs. Most of the files were autogenerated by other ERP software.
For this reason our team decided to develop several universal connectors (one for each catalog) that handle data extraction and convert the data from different sources into one well-structured format.
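The connector idea can be illustrated with a minimal sketch. It is not the production code: the `PartRecord` fields, the `sku|name` layout of the TXT files, and the function names are assumptions made for the example, and only two of the formats are covered.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PartRecord:
    """Common structure every connector converts its source into (hypothetical fields)."""
    sku: str
    name: str
    source: str


def read_csv_catalog(text: str, source: str) -> List[PartRecord]:
    """Connector for comma-separated catalogs with a 'sku,name' header row (assumed)."""
    rows = csv.DictReader(io.StringIO(text))
    return [PartRecord(r["sku"].strip(), r["name"].strip(), source) for r in rows]


def read_txt_catalog(text: str, source: str) -> List[PartRecord]:
    """Connector for plain-text catalogs with one 'sku|name' pair per line (assumed)."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sku, name = line.split("|", 1)
        records.append(PartRecord(sku.strip(), name.strip(), source))
    return records


# One connector per format; every connector returns the same record type.
CONNECTORS: Dict[str, Callable[[str, str], List[PartRecord]]] = {
    "csv": read_csv_catalog,
    "txt": read_txt_catalog,
}


def load_catalogs(sources: Iterable[Tuple[str, str, str]]) -> List[PartRecord]:
    """Run the matching connector for each (format, source name, raw text) triple."""
    merged: List[PartRecord] = []
    for fmt, source, text in sources:
        merged.extend(CONNECTORS[fmt](text, source))
    return merged
```

Because every connector returns the same record type, the rest of the pipeline does not need to know where a part came from.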
Once the data is extracted, it is time for the intelligent search. To make an accurate, fast, and secure search engine, we decided to retrieve attributes from every auto part and use that data for smart tagging. This means that every auto part has a unique set of tags, but products from the same category have similar tags.
The user query is analyzed in search of object attributes. Based on the tags found in the query, the system determines the type of object and its category and performs a data lookup.
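The tagging approach described above can be sketched roughly as follows. This is a simplified illustration under assumptions: the attribute names (`category`, `brand`, `name`), the tokenizer, and the overlap-count ranking are invented for the example and stand in for the real tagging and scoring rules.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_tags(part: Dict[str, str]) -> Set[str]:
    """Derive a tag set from a part's attributes (assumed field names)."""
    tags: Set[str] = set()
    for field in ("category", "brand", "name"):
        tags |= tokenize(part.get(field, ""))
    return tags


def search(query: str, catalog: List[Dict[str, str]]) -> List[str]:
    """Return SKUs ordered by how many query tokens match the part's tags."""
    query_tags = tokenize(query)
    scored = []
    for part in catalog:
        overlap = len(query_tags & build_tags(part))
        if overlap:
            scored.append((overlap, part["sku"]))
    # Highest overlap first; ties are broken by SKU for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sku for _, sku in scored]
```

Parts from the same category share tags such as the category name, so a query that does not match one exact item still surfaces related products.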
The customer wanted us to build a system where a user can look for any auto part from a single search field. Our engine had to analyze the user query and perform a data lookup in real time, rebuilding the result page. The content of the page had to refresh automatically whenever the user adds new words to the query, as Google does.
For this reason we decided to give up using a database and perform an in-memory search to avoid read-write disk bottlenecks.
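An in-memory lookup that supports search-as-you-type could look like the sketch below. It is an illustration, not the engine itself: the `InMemoryIndex` class, its method names, and the prefix-matching strategy (an inverted index plus a sorted tag list searched with `bisect`) are assumptions chosen to show the idea.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class InMemoryIndex:
    """Inverted index kept entirely in RAM: tag -> set of SKUs (hypothetical design)."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)
        self._sorted_tags: List[str] = []

    def add(self, sku: str, tags: Iterable[str]) -> None:
        for tag in tags:
            self._postings[tag.lower()].add(sku)

    def freeze(self) -> None:
        """Sort the tags once after loading so prefixes can be found by binary search."""
        self._sorted_tags = sorted(self._postings)

    def _match_prefix(self, prefix: str) -> Set[str]:
        """Collect SKUs for every tag that starts with the given prefix."""
        i = bisect.bisect_left(self._sorted_tags, prefix)
        found: Set[str] = set()
        while i < len(self._sorted_tags) and self._sorted_tags[i].startswith(prefix):
            found |= self._postings[self._sorted_tags[i]]
            i += 1
        return found

    def search(self, query: str) -> Set[str]:
        """Every word of the query must prefix-match at least one tag of the part."""
        words = query.lower().split()
        if not words:
            return set()
        result = self._match_prefix(words[0])
        for word in words[1:]:
            result &= self._match_prefix(word)
        return result
```

Since each keystroke only touches dictionaries and lists in memory, the result set can be recomputed on every change of the query without any disk access.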
At Azati we enjoy building modular systems when developing small projects: such systems are more straightforward to build, test, scale, and maintain. This project is no exception. Our team decided to create two modules: one for user interface generation, another for data processing.
From the very beginning, as the customer wanted the system to regenerate the result page content every time the user changes the input, we decided to use React as the front-end framework for UI generation. It is a powerful tool that can modify the page content without reloading the browser tab. The first module is based on React, with Python Tornado as a back end for request handling.
The second module is responsible for entity extraction, parts tagging, and data lookup. There were a lot of different files that contained the information we needed, so our team developed a custom rule-based parser that can identify the document type and provider on the fly.
For every provider, we described a set of rules that the parser uses to extract data. The parser takes a document as input and provides structured data as output.
There were several data sources where the information about auto parts was stored in MySQL and Access databases. For these databases, we built an algorithm that asynchronously sends requests to multiple databases at the same time, extracting the data without waiting for each database in turn.
As the core of the search engine, we used an enhanced pairwise comparison algorithm, developed by our PhDs and well tested on bioinformatics samples during recent projects. As a result, we developed a prototype that was presented to the customer. The customer was quite impressed and asked us to improve the prototype and turn it into a complete solution.
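The concurrent database requests mentioned above can be sketched with Python's `asyncio`. This is an assumption-laden illustration: `fetch_parts`, `fetch_all`, and the `FAKE_DATABASES` stand-in data are hypothetical, and the simulated delay replaces real MySQL or Access queries made through a database driver.

```python
import asyncio
from typing import Dict, List

# Stand-in data: real code would query MySQL or Access through a driver.
FAKE_DATABASES: Dict[str, List[str]] = {
    "supplier_a": ["brake pad", "brake disc"],
    "supplier_b": ["oil filter"],
}


async def fetch_parts(db_name: str) -> List[str]:
    """Pretend to query one database; the sleep simulates network and query latency."""
    await asyncio.sleep(0.01)
    return FAKE_DATABASES[db_name]


async def fetch_all(db_names: List[str]) -> Dict[str, List[str]]:
    """Start all requests at once and wait for them together."""
    # asyncio.gather returns results in the same order as its arguments.
    results = await asyncio.gather(*(fetch_parts(name) for name in db_names))
    return dict(zip(db_names, results))


def load_all(db_names: List[str]) -> Dict[str, List[str]]:
    """Synchronous entry point for callers that are not already in an event loop."""
    return asyncio.run(fetch_all(db_names))
```

With this pattern the total waiting time is close to that of the slowest database rather than the sum of all of them.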
The final solution includes two interconnected modules: one for user interface generation, another for document processing and data lookup. These modules are hosted in a self-built cloud inside the client's infrastructure. As there is a vast amount of information describing auto parts, it takes about 3 GB of RAM to store all the attributes and tags in memory. If the customer loads a bigger dataset, the system automatically allocates additional memory.
The search engine includes two modules:
We successfully built an intelligent search engine that aggregates information from various data sources, from documents to external databases. Our team implemented an enhanced search algorithm to provide quick and accurate search results. As all data is stored in memory, data lookups are very fast. The engine also ranks results with advanced scoring so that the most relevant ones come first.
A FEW NUMBERS:
were analyzed while developing a prototype
were generated from the sample dataset
It takes to examine the search query and return a result
We successfully launched this project in late November 2017. We are now maintaining the solution and collecting usage statistics, which help us understand the users better. We will propose that the client improve the search with machine learning, which would make it even faster and more accurate.