Trend Discovery with AI
At Azati Labs, our engineers developed an AI-powered prototype of a tool that can spot stock market trends. Online trading applications could use this information to anticipate actual stock price changes.
Stock trading is one of those industries that no one fully understands. Many different factors can affect stock prices, and hundreds of software applications try to analyze them all to predict a price change.
One of the most crucial factors is offline events: everything that happens in our day-to-day life, from a US election to a local mall fire. Everyone knows that the media shapes public opinion. The press covers each event differently, presenting alternative points of view to the general audience.
Advanced sentiment analysis algorithms can help us predict whether the trend will go up or down. As complex text processing is not something typical software handles well, off-the-shelf applications are not an option.
Our engineers were not the first to analyze how news impacts the stock market, yet there was not much information about this topic on the Internet. If someone makes a breakthrough and discovers something new in this field, they are unlikely to tell anyone about it, since keeping the discovery secret helps them earn more.
We created an MVP from scratch, but our developers faced several major technical and data-related challenges along the way. Let's take a closer look at the most critical ones.
The first challenge was a lack of data. When you create something from scratch, you try to cut overall costs by using pre-built libraries, frameworks, and cleansed datasets. As our team had completed several projects related to natural language processing, our data scientists had already mastered the most popular text processing frameworks. But, as usual, data was the key to success.
There was no dataset that could satisfy our needs. All the information the team had at that moment was a history of stock price changes and several news articles about a well-known tech company. That was not enough: the team trained several machine learning models on the existing data, but they were unsuccessful.
Engineers prepared all the data manually and built a set of web scrapers for this purpose. Because the news articles had inconsistent structure, it took additional time and resources to extract, map, and filter all the data we needed.
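The article does not show the scrapers themselves. As an illustration only, here is a minimal stdlib sketch of the kind of extraction step such a scraper needs; the `ArticleExtractor` class, the sample HTML, and the choice of tags are all hypothetical, and real news sites need per-site extraction rules:

```python
from html.parser import HTMLParser

class ArticleExtractor(HTMLParser):
    """Hypothetical minimal scraper step: pull the title and paragraph
    text out of an article page. Production scrapers must cope with
    inconsistent markup across news sites."""

    def __init__(self):
        super().__init__()
        self._tag = None          # tag we are currently collecting text for
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        if self._tag == "title":
            self.title += data
        elif self._tag == "p":
            self.paragraphs.append(data.strip())

# Illustrative input; a real scraper would fetch pages over HTTP.
page = ("<html><head><title>Acme shares rally</title></head>"
        "<body><p>Acme stock rose 4% after the launch.</p></body></html>")
parser = ArticleExtractor()
parser.feed(page)
```

After `feed`, `parser.title` holds the page title and `parser.paragraphs` the article body, ready for mapping and filtering.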
Record preparation and data mapping were another challenge. Most text-based information is classified as unstructured data, which means it is not suitable for automatic processing and requires manual mapping. No one at Azati was responsible for manual data mapping at the time, so it was a good opportunity to hire several data entry specialists and make our team even stronger.
Two specialists were responsible for the initial data mapping. They looked through the articles and highlighted the main keywords that could affect stock market trends. Our engineers then used this information to train several machine learning models.
However, this approach wasn't successful either: making an application understand what stands behind words is a time-consuming process. At that moment, we did not have enough time or human resources to continue working in this direction. After a series of failures, our team looked at the project from a different point of view.
Our engineers moved in another direction: if we cannot learn which words and phrases affect stock market trends, we can find out how the trend changed over time and what news was released around each change. Let's take a closer look at an example.
If a tech company releases a successful device, the trend goes up within two or three days of the official launch. This is easy to spot, since a large number of reviewers get access to the device before the release date. If these reviewers leave positive feedback, the company's stock price may make a huge leap when the device launches.
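This idea can be sketched as a simple labeling step: for each article's publication date, look at the relative price change over the following days. The prices, dates, two-day horizon, and 2% threshold below are illustrative assumptions, not the project's actual parameters:

```python
from datetime import date, timedelta

# Hypothetical daily closing prices around a device launch.
prices = {
    date(2020, 3, 1): 100.0,
    date(2020, 3, 2): 101.0,
    date(2020, 3, 3): 104.0,
    date(2020, 3, 4): 109.0,
}

def label_article(published, prices, horizon_days=2, threshold=0.02):
    """Label an article's date 'up', 'down', or 'flat' by the relative
    price change over the following `horizon_days` days; None if a
    price is missing for either endpoint."""
    start = prices.get(published)
    end = prices.get(published + timedelta(days=horizon_days))
    if start is None or end is None:
        return None
    change = (end - start) / start
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "flat"
```

Articles labeled this way become training examples that tie news text to subsequent market moves, without anyone hand-picking keywords.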
But how do we train a machine to understand whether a review is positive or negative? Our engineers used long short-term memory (LSTM) neural networks, which helped us with narrative and sentiment analysis. It is extremely complicated to train a model that can analyze the sentiment of arbitrary text. This makes the technology costly to apply: for every new industry, you have to build a module trained only on texts from that industry. It is therefore hardly possible to build a single solution that discovers stock trends for both financial and construction companies.
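To illustrate why LSTMs suit this kind of task, here is a minimal NumPy sketch of a single LSTM step (not the project's actual model; the weight layout and demo dimensions are arbitrary). The additive cell-state update is what lets the network carry sentiment cues across long passages of text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D) input weights, U: (4H, H) recurrent
    weights, b: (4H,) bias, with gate order [input, forget, output,
    candidate]. The cell state c is updated additively (f*c_prev + i*g),
    which preserves context over long sequences."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:H])            # input gate
    f = sigmoid(z[H:2 * H])       # forget gate
    o = sigmoid(z[2 * H:3 * H])   # output gate
    g = np.tanh(z[3 * H:])        # candidate cell values
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

# Tiny demo: one step over a 3-dim "word embedding" with hidden size 4.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```

In practice one would stack such steps over every token of a review and feed the final hidden state into a classifier, typically via a framework rather than hand-rolled NumPy.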
The prototype comprises several interconnected scripts written in Python. It does not include any code responsible for text collection, as existing tools or custom applications can easily handle that task.
All scripts can be divided into three groups:

- Preprocessing scripts that translate all text into the format used for further data processing.
- Training scripts responsible for the machine learning models later used for sentiment and narrative analysis.
- Aggregation scripts that, after all articles are processed, combine the results to calculate the final probability.
The MVP takes text as input, processes and analyzes the in-text information, and outputs a probability and a trend direction: whether it goes up or down.
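The aggregation step is not detailed in the article; below is a hypothetical sketch of how per-article sentiment probabilities might be combined into a single probability and direction. The mean-probability rule and the 0.5 threshold are assumptions for illustration, not the prototype's actual formula:

```python
def combine_predictions(article_probs, threshold=0.5):
    """Combine per-article 'positive sentiment' probabilities into one
    trend call: the mean probability, and 'up' if it clears the
    threshold. A stand-in for the prototype's aggregation scripts."""
    if not article_probs:
        raise ValueError("need at least one article prediction")
    p = sum(article_probs) / len(article_probs)
    direction = "up" if p >= threshold else "down"
    return p, direction

# A batch of mostly positive articles suggests an upward trend.
prob, direction = combine_predictions([0.9, 0.7, 0.8])
```

A weighted scheme (e.g., by source credibility or article recency) would be a natural refinement of this simple mean.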
At first, we wanted to build a small API that invoked these scripts one by one. After several tries, we found that the more text we uploaded, the longer processing took: sometimes around twenty minutes for about one thousand articles. This prevented us from building a real-time API.
We cannot say the project was one hundred percent successful, as our developers expected different results. But the general idea works: you can statistically analyze news to find out how the stock market will react.
As a result, we created an MVP that takes a set of documents and predicts the trend they imply. The average accuracy is not that high, close to 65%, but the more data the solution analyzes, the more accurate its predictions become.
Ways of improvement:
As mentioned above, the more data the prototype processes, the more accurate it becomes. In fact, we built a fully functioning algorithm that can spot stock market trends, but it still lacks some accuracy and stability.
The main concern at the moment is the lack of cleansed data: it is hard to make any predictions from a single article, even one written by an industry expert. To improve accuracy, the algorithm should process thousands of articles to understand how they shape public opinion.
And the biggest problem is data collection: it is very costly to process thousands of websites in search of valuable data. Web scraping requires constant attention, which means additional human resources.
The knowledge gained while doing this research and training machine learning models helped our team develop several amazing projects for the Oil and Gas and Healthcare industries.
As for the prototype, it is still in the development phase: from time to time, our engineers look through related technologies for valuable updates that could improve the prototype's stability and overall accuracy.