As part of Azati Labs, our data scientists built a prototype of a system that detects road defects by analyzing images and videos. The prototype can be useful to municipal governments: it simplifies road defect detection and makes it possible to calculate road repair costs automatically. Automotive manufacturers can also use the extracted information to help smart cars avoid potholes and reduce overall repair costs.


Road defects are one of the most common reasons for suspension repairs and tire replacements. According to general statistics, even the most careful drivers need suspension repairs every three years or so. Bad roads are a well-known problem in all Eastern European countries, especially Belarus, Ukraine, and Russia.
The goal of this project was to train a computer vision model to detect road defects, especially potholes. There are a vast number of pre-made machine learning models for object detection and classification trained by Google and Facebook. Unfortunately, there were no pre-made models for pothole detection, so we decided to train a custom model from scratch.


The machine learning process was exciting and worrying at the same time: we were among the first to train computer vision for this purpose. We faced several challenges while developing the prototype.


The very first challenge our team faced was a lack of data. Data scientists need large, cleansed datasets to train a machine learning model successfully. In our case, there were no high-quality images of potholes and other road defects on the Internet. Our engineers tried to use pictures extracted from open sources, but the training results were quite disappointing. Our team therefore decided to use live data: footage collected on Belarusian roads.


We spent some time driving and filming the roads of Belarus, and after that we faced another issue: low data quality. Potholes have different shapes and look different in sunny and cloudy weather. Also, footage captured from different vehicles had different fields of view because the cameras were mounted at different points. We could not use the collected data “as is” because of this inconsistency. The only option was to map all the footage manually.


As mapping an entire video is quite a tricky process, we split the videos into sets of keyframes. For video recorded at 30 frames per second, it took about an hour to map all the keyframes in one minute of footage. During data mapping, we also considered the footage quality. If a clip was recorded in poor quality or at low resolution, it was unusable for model training: low-quality clips do more harm than good because they distort the data.
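To make the frame arithmetic concrete, here is a minimal sketch of how keyframes could be picked from a clip. The article does not say how keyframes were chosen, so the fixed sampling rate, the function name `keyframe_indices`, and its parameters are assumptions made for illustration only.

```python
from typing import List


def keyframe_indices(fps: int, duration_s: int, keyframes_per_second: int = 1) -> List[int]:
    """Return the indices of the frames to map manually.

    Assumption: keyframes are taken at a fixed rate. The sampling rate
    actually used for the prototype is not stated in the article.
    """
    if fps <= 0 or duration_s < 0 or keyframes_per_second <= 0:
        raise ValueError("fps and keyframes_per_second must be positive, duration non-negative")
    total_frames = fps * duration_s                 # e.g. 30 fps * 60 s = 1800 frames
    stride = max(fps // keyframes_per_second, 1)    # distance between sampled frames
    return list(range(0, total_frames, stride))


# One minute of 30 fps video holds 1800 frames; sampling one keyframe
# per second (a hypothetical rate) leaves 60 frames to map.
one_minute = keyframe_indices(fps=30, duration_s=60)
```

Sampling at a fixed stride is only one possible choice, but it shows why splitting into keyframes matters: the number of frames that need manual mapping drops by a factor equal to the stride.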


While solving these challenges, we developed a small script written in Python. The prototype takes an image or a video clip as input. If the script receives a video, it splits it into a set of frames and examines each frame separately. Once the script has processed all the data, it joins the frames back into a single video. This is how those fancy computer vision clips are made. As a result, we get an image or a video in which potential potholes and other road defects are outlined with squares. Check out the screenshots below to see what the results look like.
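The frame-by-frame flow described above can be sketched roughly as follows. This is not the actual Azati script: the detector is passed in as a hypothetical callable, frames are plain nested lists of grayscale values so the example stays self-contained, and the names `outline` and `process_clip` are made up for illustration. A real script would read and write frames with a video library instead.

```python
from typing import Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height) in pixels
Frame = List[List[int]]           # grayscale frame stored as rows of pixel values


def outline(frame: Frame, box: Box, value: int = 255) -> Frame:
    """Return a copy of the frame with the border of `box` set to `value`."""
    x, y, w, h = box
    out = [row[:] for row in frame]
    height = len(out)
    width = len(out[0]) if out else 0
    # Top and bottom edges of the rectangle.
    for cx in range(max(x, 0), min(x + w, width)):
        for cy in (y, y + h - 1):
            if 0 <= cy < height:
                out[cy][cx] = value
    # Left and right edges of the rectangle.
    for cy in range(max(y, 0), min(y + h, height)):
        for cx in (x, x + w - 1):
            if 0 <= cx < width:
                out[cy][cx] = value
    return out


def process_clip(frames: Iterable[Frame],
                 detect: Callable[[Frame], List[Box]]) -> List[Frame]:
    """Examine each frame separately and return the annotated frames in order."""
    annotated = []
    for frame in frames:
        result = frame
        for box in detect(frame):   # `detect` stands in for the trained model
            result = outline(result, box)
        annotated.append(result)
    return annotated
```

A single image is simply a clip with one frame, so the same function covers both input types. Joining the annotated frames back into a video is left to whichever video library is in use.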





The development process was quite challenging for our data scientists. We showed that it is possible to find potholes and other road defects using machine learning and computer vision, and we delivered a proof-of-concept prototype. The model finds road defects quite accurately, but there are some issues with classifying them. It is quite hard to identify a pothole if it looks similar to a manhole cover. For complex object classification, the model requires additional data and extra training. The more data we provide, the more accurately the model classifies road defects. But because the data requires manual mapping, this takes a lot of time and makes data processing and cleansing quite expensive.
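A back-of-the-envelope sketch shows how quickly manual mapping adds up. The time needed per image is not given in the article, so the `seconds_per_image` value below is purely hypothetical, as is the function name `labeling_hours`; only the 12 million image count comes from the article, which estimates it in the next paragraph.

```python
def labeling_hours(n_images: int, seconds_per_image: float) -> float:
    """Estimate the person-hours needed to map `n_images` by hand."""
    if n_images < 0 or seconds_per_image < 0:
        raise ValueError("inputs must be non-negative")
    return n_images * seconds_per_image / 3600.0


# Hypothetical rate of 10 seconds per image (an assumption, not a measured
# figure), applied to the 12 million images estimated for the single-model approach.
hours = labeling_hours(12_000_000, seconds_per_image=10)
```

Even under an optimistic per-image rate, the total runs into tens of thousands of person-hours, which is why the amount of required training data dominates the cost.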


The prototype uses a single model to find and classify road defects. It is simple to understand how it works and why it provides predictable results. When data is processed by a single model, we know where and when something went wrong. According to our calculations, making this model classify defects accurately would take about 12 million manually mapped images, which is entirely unaffordable for the majority of companies. Our scientists therefore suggest processing the data in several steps and using multiple methods: combining traditional object classification algorithms with machine learning. We found several ways to improve the prototype, so if you want to learn more, contact us and we will have a chat about it!