Drop us a line
If you are interested in the development of a custom solution — send us the message and we'll schedule a talk about it.
Identifying and eliminating the probable issues of the Extract, Transform, Load (ETL) process for reference data that comes from several operational systems.
The healthcare sector is one of the most important domains that impacts the entire global population and is closely linked to the development of any country. It also plays a crucial role in how a country is perceived in maintaining economic stability. As such, healthcare systems form a key part of government strategies across the world, and the level of industry expenditure is projected to increase at an annual rate of 4.1 per cent globally between 2017-2021.
The US national leader in customized insurance, claims and patient safety & risk solutions for physicians, surgeons, dentists and other healthcare professionals, as well as hospitals, senior care and other healthcare facilities.
The ETL process receives data from multiple sources. Some of them provide more detailed and complete information than the others. In certain cases, the attribute values received from more detailed sources could be replaced by empty values of the same attributes received from less detailed sources. Therefore, the incomplete data might be loaded into the data warehouse.
The primary task was to prevent the ETL process from the issues mentioned above, so that it provides the most complete, accurate and consistent information at the time.
We addressed the issue by adding a transformation step to the ETL process that clearly defines whether to keep or to rewrite an attribute value.
First, we analyzed all the attributes from every source and set priorities for each of them. Then we developed a Survivorship Matrix, which contains attributes and priority relationships. Its main purpose is to define whether to keep or rewrite an attribute value judging by the priority of the source. The Survivorship Matrix logic is incorporated within one SQL logic, allowing to escape any pre-processing.
By implementing the Survivorship Matrix, we’ve achieved data deduplication, data completeness and data reliability. Such an approach eliminates redundant overwriting and ensures consistency for any period of time. Additionally, we’ve increased flexibility of the system, since the priority values can be easily changed if needed. The solution became easily scalable, as the new attribute priorities are added outside the ETL process. Moreover, the performance requirement of the total running time to be less than 30 minutes was successfully met, as it was reduced to less than 5 minutes.
If you are interested in the development of a custom solution — send us the message and we'll schedule a talk about it.
JavaScript, Ruby
HR Planning SoftwareThe customer asked Azati to audit the existing solution in terms of general performance to create a roadmap of future improvements. Our team also increased application performance and delivered several new features.
Python
Stock Market Trend Discovery with Machine LearningAt Azati Labs, our engineers developed an AI-powered prototype of a tool that can spot a stock market trend. Online trading applications may use this information to calculate the actual stock market price change.
Python
Semantic Search Engine for Bioinformatics CompanyAzati designed and developed a semantic search engine powered by machine learning. It extracts the actual meaning from the search query and looks for the most relevant results across huge scientific datasets.
Java, JavaScript
E-health Web Portal for International Software IntegratorAzati helped a well-known software integrator to eliminate legacy code, rebuild a complex web application, and fix the majority of mission-critical bugs.
JavaScript, Ruby
Custom Platform for Logistics and Goods TransportationAzati helped a European startup to create a custom logistics platform. It helps shippers to track goods in a real-time, as well as guarantees that the buyer will receive the product in a perfect condition.