Conversational AI is a subset of artificial intelligence that allows consumers to interact with computer applications as if they were interacting with another human.
According to Deloitte, the global conversational AI market is set to grow at around 22% annually between 2022 and 2025, reaching $14 billion by 2025. By providing enhanced language customisations that cater to a highly diverse and vast group of hyperlocal audiences, the technology has found practical applications in financial services, hospital wards and conferences, and can take the form of a translation app or a chatbot. Indeed, 70% of white-collar workers purportedly interact with conversational platforms regularly, but this is just a drop in the ocean of what can unfold this decade.
In spite of the exciting potential within the space, there is one significant hurdle: the data used to train conversational AI models does not adequately account for the subtleties of dialect, language, speech patterns and inflection.
When using a translation app, for example, an individual speaks in their source language, and the AI processes that input and converts it into the target language. When the source speaker deviates from a standardised learned accent, for example by speaking in a regional accent or using regional slang, the efficacy rate of live translation dips. Not only does this provide a subpar experience, it also inhibits users' ability to interact in real time, whether with friends and family or in a business setting.
The need for humanity in AI
To avoid a drop in efficacy rates, AI must draw on a diverse data set, for example one that accurately represents speakers across the UK (at both a regional and national level), so it can provide better active translation and speed up interactions between speakers of different languages and dialects.
The idea of using training data in ML programs is a simple concept, but it is also foundational to the way these technologies work. Training data, often refined through reinforcement learning, helps a program understand how to apply technologies like neural networks to learn and produce sophisticated results. The wider the pool of people represented on the back end, for example speakers with speech impediments or stutters, the better the resulting translation experience will be.
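As an illustrative sketch (not from the original article), a training corpus built for diversity might tag each utterance with accent and speech-pattern metadata, so that under-represented speakers can be sampled deliberately. All names below are hypothetical:

```python
from dataclasses import dataclass, field
import random

@dataclass
class Utterance:
    """One training example: audio plus the metadata needed to audit diversity."""
    audio_path: str
    transcript: str
    accent: str  # e.g. "RP", "Geordie", "Scouse"
    speech_traits: list = field(default_factory=list)  # e.g. ["stutter"]

def sample_balanced(corpus, per_accent, rng=random):
    """Draw up to `per_accent` utterances per accent so no dialect dominates a batch."""
    by_accent = {}
    for utt in corpus:
        by_accent.setdefault(utt.accent, []).append(utt)
    batch = []
    for utts in by_accent.values():
        batch.extend(rng.sample(utts, min(per_accent, len(utts))))
    return batch
```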
Specifically within the translation space, focusing on how a user speaks rather than what they speak about is the key to augmenting the end-user experience. The darker side of reinforcement learning was illustrated recently by Meta, which came under fire for a chatbot that spewed insensitive comments it had learned from public interaction. Training should therefore always have a human in the loop (HITL), in which a person ensures the overarching algorithm is accurate and fit for purpose.
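What HITL gating looks like in practice varies; one minimal sketch, assuming a scalar confidence score and a human review step (both hypothetical here), routes uncertain outputs to a person before they are accepted:

```python
def review_with_human_in_the_loop(candidate_pairs, confidence, threshold=0.8,
                                  ask_human=input):
    """Accept high-confidence (source, translation) pairs automatically and
    route everything else to a human reviewer. `ask_human` stands in for a
    real review interface."""
    accepted = []
    for source, translation in candidate_pairs:
        if confidence(source, translation) >= threshold:
            accepted.append((source, translation))
        elif ask_human(f"Approve '{source}' -> '{translation}'? [y/n] ").strip().lower() == "y":
            accepted.append((source, translation))
    return accepted
```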
Accounting for the active nature of human conversation
Of course, human interaction is incredibly nuanced, and building conversational design that can navigate its complexity is a perennial challenge. Once achieved, however, well-structured, fully realised conversational design can lighten the load on customer service teams and translation apps, and improve customer experiences. Beyond regional dialects and slang, training data also needs to account for active conversation between two or more speakers interacting with each other. The bot must learn from their speech patterns, the time taken to voice an interjection, the pauses between speakers and the responses that follow, as sketched below.
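As a sketch of what learning from speech patterns might mean concretely, the timing of each turn could be recorded alongside its content; every field name here is invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class TurnFeatures:
    """Timing features for one conversational turn between two or more speakers."""
    speaker_id: str
    words_per_second: float        # speech rate within the turn
    interjection_latency_s: float  # time taken to voice an interjection
    pause_before_turn_s: float     # silence between the previous turn and this one
    overlaps_previous: bool        # True if this turn barged in on the last one
```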
Prioritising balance is also a great way to ensure that conversations remain an active experience for the user, and one way to do so is by eliminating dead-end responses. Think of this as akin to an improv setting, in which "yes, and" sentences are foundational: you accept your partner's world-building while bringing a new element to the table. The most effective bots operate similarly, phrasing responses in an open way that encourages additional inquiries. Offering options and additional, relevant choices can help ensure all end users' needs are met.
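To make the "yes, and" idea concrete, here is a hypothetical sketch of a response builder that refuses to produce a dead end: every reply gives the answer and then attaches relevant follow-up options:

```python
def build_response(answer, follow_ups):
    """Phrase a reply in an open way: give the answer, then offer relevant
    next steps so the conversation never dead-ends."""
    if not follow_ups:
        raise ValueError("every response should offer at least one follow-up")
    options = "; ".join(follow_ups)
    return f"{answer} Would you also like to: {options}?"

# Example:
# build_response("Your parcel arrives on Tuesday.",
#                ["track it live", "change the delivery address"])
```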
Many people have trouble remembering long strings of thought, or take a little longer to process them, so translation apps would do well to give users enough time to finish before treating a pause as the end of an interjection. Training a bot to recognise filler words ("so", "erm", "well", "like" in English, for example) and to associate a longer lead time with them is a good way of allowing users to engage in a more realistic real-time conversation. Offering targeted "barge-in" programming (chances for users to interrupt the bot) is another way of more accurately simulating the active nature of conversation.
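A minimal sketch of filler-word-aware endpointing, assuming a stream of transcribed words with measured trailing silence (the timeout values are illustrative, not from the article):

```python
FILLER_WORDS = {"so", "erm", "well", "like", "um", "uh"}

BASE_SILENCE_TIMEOUT_S = 0.7    # a normal pause that ends a turn
FILLER_SILENCE_TIMEOUT_S = 2.0  # longer grace period after a filler word

def turn_has_ended(last_word, trailing_silence_s):
    """Decide whether the user has finished speaking. A pause following a
    filler word usually signals the speaker is still thinking, so it gets a
    longer timeout before the bot treats the turn as over."""
    timeout = (FILLER_SILENCE_TIMEOUT_S
               if last_word.lower().strip(",.") in FILLER_WORDS
               else BASE_SILENCE_TIMEOUT_S)
    return trailing_silence_s >= timeout
```

Barge-in support would be the mirror image of this logic: the bot keeps listening while it speaks and yields the floor as soon as the user starts talking.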
Future innovations in the space
Conversational AI still has some way to go before all users feel accurately represented. Accounting for the subtleties of dialect, the time speakers take to think and the active nature of a conversation will be pivotal to propelling this technology forward. Specifically within the realm of translation apps, accounting for pauses and words associated with thinking will improve the experience for everyone involved and simulate a more natural, active conversation. Drawing on a wider data set in the back-end process, for example learning from both RP English and Geordie inflections, will prevent translation efficacy from dropping because of accent-related processing issues. These innovations hold exciting potential, and it is high time translation apps and bots accounted for linguistic subtleties and speech patterns.
See the original article published on VentureBeat here