Voice-Command-Based Restaurant Operations Management

Azati developers built a restaurant system that receives voice commands, automatically processes them, and turns them into tasks, ensuring effective restaurant management through the automation of routine workflows.

Idea

The project idea revolves around enhancing the dining experience through innovative technology. Each table is equipped with a sophisticated sound system allowing customers to naturally issue commands like: “call the waiter”, “give the bill”, “bring the bread”, “provide the menu”, etc.

These commands seamlessly integrate into the control system, where they undergo interpretation, context analysis, and the extraction of any necessary supplementary details. Tasks are then generated automatically, assigned to the right staff member, and given precise timers for performance tracking.

Once assigned, tasks are sent to the staff member’s device together with voice commands for the specific actions, creating a personalized to-do list. Reminders are sent when a task’s timer runs out.
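To illustrate the flow described above, here is a minimal sketch of how such a task and its reminder timer might be modeled; the field names, statuses, and the reminder callback are assumptions for illustration rather than the production schema.

```python
# Illustrative sketch of a task generated from a recognized voice command.
# Field names, statuses, and the reminder mechanism are assumptions for
# demonstration only, not the production schema.
import threading
import time
from dataclasses import dataclass, field


@dataclass
class Task:
    command: str          # e.g. "bring the bill"
    table: int            # table the command came from
    assignee: str         # staff member the task is routed to
    deadline_sec: float   # performance-tracking timer
    status: str = "open"
    created_at: float = field(default_factory=time.time)

    def start_timer(self, on_expire) -> threading.Timer:
        """Schedule a reminder that fires only if the task is still open."""
        timer = threading.Timer(
            self.deadline_sec,
            lambda: on_expire(self) if self.status == "open" else None,
        )
        timer.start()
        return timer


def send_reminder(task: Task) -> None:
    # In the real system this would push a notification to the assignee's device.
    print(f"Reminder: '{task.command}' for table {task.table} is still open")


if __name__ == "__main__":
    task = Task(command="bring the bill", table=5, assignee="waiter_2", deadline_sec=120)
    task.start_timer(send_reminder)
```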

Conversely, the waitstaff also engage with the system through voice. For instance, they might say, “I’ve taken an order for table 7: one black coffee and one croissant,” or “Please arrange a taxi for table 3,” and the corresponding task is created. They also provide timely status updates on completed tasks, such as “The bill has been settled for the guest at table 5,” which triggers automatic task completion in the queue.

All information is mirrored on internal resources, dashboards, and screens. The system makes it possible to monitor current processes, quickly find bottlenecks, and identify and fix problems.

Objective

The project aimed to develop and implement a system utilizing machine learning to recognize and analyze speech from both restaurant employees and customers. Key tasks involved converting speech into text, extracting commands and their attributes, and discerning when commands were completed.

Furthermore, the team implemented the application architecture, established a task management system, and enabled seamless communication between the server and wireless devices for prompt command processing.
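As a rough illustration of server-to-device communication, the sketch below pushes a newly created task to a staff member's device over WebSockets; the `websockets` library, port, connection protocol, and message format are assumptions, since the actual transport is not detailed here.

```python
# Sketch of pushing a newly created task from the server to a staff member's
# wireless device over WebSockets. Library choice, port, and message format
# are illustrative assumptions.
import asyncio
import json

import websockets

CONNECTED_DEVICES = {}  # staff_id -> websocket connection


async def handler(websocket):
    # A device identifies itself with its staff id on connect (hypothetical protocol).
    staff_id = await websocket.recv()
    CONNECTED_DEVICES[staff_id] = websocket
    try:
        await websocket.wait_closed()
    finally:
        CONNECTED_DEVICES.pop(staff_id, None)


async def push_task(staff_id: str, task: dict) -> None:
    """Send a task payload to a connected device, if it is online."""
    ws = CONNECTED_DEVICES.get(staff_id)
    if ws is not None:
        await ws.send(json.dumps(task))


async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())
```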

The analytics and optimization work includes analyzing the efficiency of all processes and identifying weaknesses and bottlenecks in the system, which makes it possible to improve its productivity and operational efficiency.
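A small sketch of the kind of analysis this involves: averaging completion times per command type to surface bottlenecks. The task records below are invented for illustration.

```python
# Sketch of the efficiency analysis described above: average completion time
# per command type, used to spot bottlenecks. Records are made-up examples of
# what the system's task log might contain.
from collections import defaultdict
from statistics import mean

completed_tasks = [
    {"command": "bring the bill", "seconds_to_complete": 240},
    {"command": "bring the bill", "seconds_to_complete": 310},
    {"command": "call the waiter", "seconds_to_complete": 45},
    {"command": "provide the menu", "seconds_to_complete": 90},
]

durations = defaultdict(list)
for task in completed_tasks:
    durations[task["command"]].append(task["seconds_to_complete"])

# Commands with the highest average completion time are candidate bottlenecks.
for command, secs in sorted(durations.items(), key=lambda kv: -mean(kv[1])):
    print(f"{command}: avg {mean(secs):.0f}s over {len(secs)} tasks")
```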

Challenges

CHALLENGE#1:

Multilingualism was a key challenge in this project.
The objective here was to ensure the system’s proficiency in accommodating multiple languages. This entails support for core languages such as English, Spanish, and French, along with their possible combinations, to meet the needs of multilingual customers. Achieving this goal required creating and integrating mechanisms designed to reliably recognize and process a wide array of language-based requests.
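One way to approach this, sketched below with the open-source Whisper package, is to detect the spoken language first and then transcribe in that language; the model size and audio file name are placeholders.

```python
# Sketch of multilingual handling with the open-source Whisper package:
# detect the spoken language first, then transcribe in that language.
import whisper

model = whisper.load_model("small")

# Load 30 seconds of audio and compute the log-Mel spectrogram.
audio = whisper.load_audio("guest_request.wav")
audio = whisper.pad_or_trim(audio)
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Detect the most probable language among those Whisper supports.
_, probs = model.detect_language(mel)
language = max(probs, key=probs.get)
print(f"Detected language: {language}")

# Transcribe using the detected language.
result = model.transcribe("guest_request.wav", language=language)
print(result["text"])
```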

CHALLENGE#2:

Voice commands must not require a fixed format.
Customers and employees alike should be able to express their requests and instructions in natural language, without following a strict template. The system must automatically identify commands in their speech, determine the attributes of those commands, and create the appropriate tasks for execution.
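The examples below illustrate the requirement: freely phrased utterances on one side, the structured command and attributes the system must recover on the other. Both the utterances and the target schema are made up for demonstration.

```python
# Illustration of the requirement: the same underlying command can be phrased
# freely, and the system must still recover the command type and its attributes.
# Example utterances and the target schema are made up for demonstration.
examples = [
    {
        "utterance": "Could you bring us the bill when you get a chance?",
        "parsed": {"command": "bring_bill", "table": 5},
    },
    {
        "utterance": "I've taken an order for table 7: one black coffee and one croissant",
        "parsed": {
            "command": "register_order",
            "table": 7,
            "items": ["black coffee", "croissant"],
        },
    },
    {
        "utterance": "The bill has been settled for the guest at table 5",
        "parsed": {"command": "complete_task", "task_type": "bring_bill", "table": 5},
    },
]

for ex in examples:
    print(ex["utterance"], "->", ex["parsed"])
```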

CHALLENGE#3:

Deploying the model on weak hardware.
Another significant requirement was the need to deploy the model on weak hardware: the customer wanted a resource-efficient solution that could run locally, even on ordinary laptops. This meant the model had to be optimized to operate under limited computing conditions.
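A sketch of what such a lightweight deployment can look like with whisper.cpp: a quantized model run through the project's command-line tool with a limited thread count. The flag names follow the whisper.cpp README; the binary, model, and file paths are placeholders.

```python
# Sketch of running a quantized Whisper model through whisper.cpp on a
# laptop-class machine. Flags follow the whisper.cpp README
# (-m model, -f input file, -l language, -t threads); paths are placeholders.
import subprocess

result = subprocess.run(
    [
        "./main",
        "-m", "models/ggml-small-q5_0.bin",  # quantized ggml model for low memory use
        "-f", "table_request.wav",
        "-l", "auto",                        # let the model detect the language
        "-t", "4",                           # limit CPU threads on modest hardware
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```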

Process

The journey of crafting a speech processing system involved numerous stages where our accumulated expertise proved invaluable. Commencing with speech recording and digitization, we progressed to the audio-to-text transformation, a pivotal phase in translating audio data into comprehensible information for the system. Subsequently, we delved into text content analysis, deciphering not just the words but also the emotions and intonations, enabling a profound grasp of the statements’ meanings and users’ needs. This meticulous, multi-step process served as the bedrock for the creation of a speech processing system that is both precise and highly responsive.
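A condensed sketch of the first stages of this pipeline, recording a short clip and converting it to text with Whisper; the use of the sounddevice library and the five-second window are assumptions for illustration.

```python
# Condensed sketch of the first pipeline stages: record a short clip, digitize
# it, and turn it into text with Whisper. The sounddevice library and the
# 5-second window are illustrative assumptions.
import numpy as np
import sounddevice as sd
import whisper

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
SECONDS = 5

# 1. Record and digitize.
recording = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
sd.wait()
audio = np.squeeze(recording)

# 2. Audio-to-text transformation.
model = whisper.load_model("small")
result = model.transcribe(audio)

# 3. The resulting text would then go on to content analysis and task creation.
print(result["text"])
```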

Solution

To implement the ML part of the project, we deployed and configured machine learning models. This process involved training the model to identify several dozen core commands, which required collecting and annotating a large amount of data. We then conducted preliminary testing and verification of the model, creating a Proof of Concept (POC) to ensure its functionality and effectiveness. Finally, we successfully demonstrated the developed ML system, showing its ability to accurately recognize and process a variety of commands, which confirmed its readiness for integration into the overall project architecture.
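A toy sketch of the kind of preliminary verification used for such a POC: comparing predicted commands against annotated labels and reporting accuracy. The test utterances and the keyword-based stand-in for the model are placeholders.

```python
# Sketch of preliminary POC verification: compare the commands predicted by the
# model against annotated labels and report accuracy. The annotated examples
# and the predict_command stub are placeholders.
test_set = [
    ("please bring the bill to table 5", "bring_bill"),
    ("can we get the menu",              "provide_menu"),
    ("call the waiter please",           "call_waiter"),
]


def predict_command(utterance: str) -> str:
    # Stand-in for the trained model; a trivial keyword rule keeps the sketch runnable.
    if "bill" in utterance:
        return "bring_bill"
    if "menu" in utterance:
        return "provide_menu"
    return "call_waiter"


correct = sum(predict_command(text) == label for text, label in test_set)
print(f"Command accuracy: {correct / len(test_set):.0%}")
```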

Results

We verified and showcased the functionality of the essential components within the machine learning domain: speech analysis, command identification, interpretation, attribute search, and task formulation. A proof of concept (POC) was meticulously prepared and presented to the client, complete with detailed calculations and a comprehensive commercial proposal.

Techstack

For the speech2text task, we utilized the Whisper model, enhancing its capabilities through additional training on data provided by our customer. To deploy this model effectively on modest hardware, we used whisper.cpp.

Addressing the challenge of identifying commands, attributes, and completion markers, we explored various approaches:

While LLMs (GPT, LLaMA, MPT) displayed promising results, achieving consistent stability proved challenging.

QA models (BERT, XLNet) delivered satisfactory outcomes and were relatively straightforward to train. However, we opted for a more promising avenue.

Recognizing the potential of NER (named-entity recognition), we chose it as our best course of action. We meticulously trained the model with our own entity classes, ensuring optimal performance.
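The write-up does not name the NER toolkit, so the sketch below uses spaCy purely to illustrate training with custom classes; the labels, the single training sentence, and its character offsets are made up.

```python
# Sketch of training an NER model with custom classes. spaCy is used here only
# for illustration; the labels, training example, and offsets are made up.
import random

import spacy
from spacy.training import Example

TRAIN_DATA = [
    ("please bring the bill to table 5",
     {"entities": [(7, 21, "COMMAND"), (25, 32, "TABLE")]}),
]

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for label in ("COMMAND", "TABLE", "ITEM"):
    ner.add_label(label)

optimizer = nlp.initialize()
for _ in range(30):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], drop=0.2, sgd=optimizer, losses=losses)

doc = nlp("could you bring the bill to table 5")
print([(ent.text, ent.label_) for ent in doc.ents])
```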

Drop us a line

If you are interested in the development of a custom solution, send us a message and we'll schedule a call to discuss it.