Welcome to the first blog in our NLP Blog Series!
Natural Language Processing, widely known as NLP, continues to be a rising star within the data space, creating more and more job opportunities in data as new tools and technologies arise. With this in mind, we wanted to dedicate a three-part blog series to this exciting aspect of the data space.
In this first blog, we will explore what NLP is, its growing use across different sectors, and the types of jobs you can enter within the NLP field. We also share insights from our data specialists Niall Wharton and Matthew Jones on some of the challenges and limitations currently faced in the NLP/AI space.
What is NLP?
NLP is defined by IBM as ‘the branch of computer science—and more specifically, the branch of artificial intelligence or AI’ that enables computers ‘to understand text and spoken words in much the same way human beings can.’
By fusing ‘computational linguistics’, which IBM describes as ‘rule-based modelling of human language’, ‘with statistical, machine learning, and deep learning models’, NLP allows ‘computers to process human language in the form of text or voice data and to ‘understand’ its full meaning, complete with the speaker or writer’s intent and sentiment.’
IBM defines deep learning as a subset of machine learning ‘modelled to work like the human brain’ that learns from vast quantities of data. With the help of deep learning, the accuracy of the NLP solutions—in this case, the text itself, as well as how the machine understands the text’s true sentiment and meaning—improves over time. As deep learning algorithms ‘make predictions repeatedly’, they progressively learn and improve ‘the accuracy of the outcome over time.’
You have probably witnessed the increase in accuracy in NLP tools yourself in recent years: think of what Siri can understand and help you with now, as opposed to just a few years ago.
NLP Tools, Techniques and Approaches
It is worth noting that we will not explore specific language models such as GPT-3 and BERT here. NLP is used for a variety of automated processes, and thus the specific tools and techniques used vary depending on what process is involved. As SAS notes, NLP techniques vary from ‘statistical and machine learning methods’ to more ‘rules-based and algorithmic approaches’ to account for the variety of text and voice data NLP can analyse.
NLP tasks can include ‘tokenization and parsing, lemmatization/stemming, part-of-speech tagging, language detection and identification of semantic relationships’. Overall, the aim is to simplify language into ‘elemental pieces’ to better understand ‘relationships between the pieces and explore how the pieces work together to create meaning.’
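To make two of these tasks concrete, here is a minimal sketch of tokenization and stemming in plain Python. The suffix list and the function names (tokenize, naive_stem) are our own illustrative assumptions, not taken from SAS or any particular library; real stemmers and lemmatizers use far richer rules.

```python
import re

# Hypothetical suffix list for illustration only; real stemmers use richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_stem(token):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenize("The chatbots tracked parcels quickly.")
stems = [naive_stem(t) for t in tokens]
print(tokens)  # ['the', 'chatbots', 'tracked', 'parcels', 'quickly']
print(stems)   # ['the', 'chatbot', 'track', 'parcel', 'quickly']
```

Even this toy version shows the idea of breaking a sentence into ‘elemental pieces’: once ‘chatbots’ and ‘chatbot’ reduce to the same stem, a program can treat them as the same concept.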
Everyday Uses of NLP
It is highly likely that you have come across NLP, or even use it every day, knowingly or unknowingly. From voice-operated GPS apps to automated customer service chatbots, many of the tools we now take for granted when solving day-to-day problems quickly are driven by NLP.
IBM lists a variety of real-world examples of NLP, from spam detection (using NLP to scan an email’s text and determine whether it indicates phishing) to virtual assistants (cue ‘Hey Siri?’), to machine translation (such as Google Translate), many of which you will have interacted with already.
Overall, whatever NLP is being utilised for, it is all about automating and streamlining processes. As IBM notes, the use of NLP is expanding across every type of enterprise to help ‘streamline business operations, increase employee productivity, and simplify mission-critical business processes.’
As consumers, we want fast answers about where our parcel is. By asking a virtual chatbot to perform a simple task such as tracking the status of our order on a database, we can get an answer in seconds. If we have a huge body of text that needs summarizing or a report that needs condensing, NLP tools can help speed up this process. When we want rapid answers and simple task-based solutions, NLP can help.
But what about providing more complex solutions, understanding the true meaning of text when the complexities of human language get in the way, and the wider problematic issues surrounding bias that AI-based tools and techniques such as NLP unveil?
Current Limitations of NLP: Complexities of Natural Human Language and Bias
Complexities of Human Language: That’s Just Human Nature
The complexities of human language, with its web of ambiguities, emotions and sentiments, are almost always guaranteed to affect the meaning of the words in any given sentence. Then there is the physical aspect of speech to think about, which adds another level of complexity to human language: from diction to accent to the speed at which we speak.
Let us take voice recognition technology as just one example. We are not robots. We say things differently depending on who we are with and how we feel all the time (as well as having different accents, dialects, use of slang, speech rate and the like). Unravelling, analysing and quantifying the complexities of human language both physically and metaphysically is just one of the fascinating challenges that current NLP specialists are engaging with.
And it is improving all the time. From brands being able to detect irony across a set of Twitter posts, to chatbots being able to identify anger in a customer service message, to enterprises being able to gauge the average sentiment from their customers based on reviews or email campaign replies, NLP tools are only going to get ‘more human’ as they are fed more and more data. The fusion of automation and what we think of as uniquely human characteristics or privileges (emotion, empathy, free will, and even consciousness) is one of the most fascinating aspects of NLP and AI generally, especially when we try to envisage where these technologies could take us.
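As a rough illustration of gauging average sentiment from reviews, the sketch below scores text against a tiny word list. The lexicon, the function names and the sample reviews are all hypothetical and invented for this example; production tools typically learn these signals from large amounts of data rather than a hand-written list.

```python
import re

# Hypothetical mini-lexicon for illustration; real systems learn this from data.
LEXICON = {"great": 1, "love": 1, "fast": 1, "slow": -1, "broken": -1, "terrible": -1}


def sentiment_score(text):
    """Sum the lexicon scores of the words in a text (0 means neutral)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def average_sentiment(reviews):
    """Mean sentiment score across a list of reviews."""
    return sum(sentiment_score(review) for review in reviews) / len(reviews)


# Invented sample reviews, not real customer data.
reviews = [
    "Great service and fast delivery",
    "Terrible, the parcel arrived broken",
    "Love it",
]
print([sentiment_score(review) for review in reviews])  # [2, -2, 1]
print(sentiment_score("Oh great, another broken parcel"))  # 0
```

The last line hints at why irony is hard: the sarcastic ‘great’ cancels out ‘broken’, so a simple word count reads an annoyed message as neutral.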
But then arises a wider AI-related issue which has yielded worrying headlines across the globe. NLP, as with any AI tool, can pose serious problems in regard to bias.
Niall Wharton, our Team Lead in Data Science, ML & Big Data Engineering hiring, comments on the dangers of overreliance on text matching within a sector like recruitment, giving the example of CV screening.
“An overreliance on this could lead to very simplistic CV screening from recruitment software products, which may lead to inaccurate CV evaluation.
One of the criticisms levelled at human-centric recruitment is that less experienced consultants may need to resort to “keyword-matching” – at Xcede we focus on individual areas of expertise to get around this issue and actually understand the quality of applications that we get. If the text algorithms are too simple, that is when problems arise.”
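The keyword-matching problem described above can be sketched in a few lines. This is a deliberately simplistic, hypothetical screener written for illustration (the function name, keyword list and CV snippets are invented), not a description of how any real recruitment product works.

```python
import re


def keyword_match_score(cv_text, keywords):
    """Fraction of required keywords that appear verbatim in the CV text."""
    words = set(re.findall(r"[a-z0-9+#]+", cv_text.lower()))
    hits = [keyword for keyword in keywords if keyword.lower() in words]
    return len(hits) / len(keywords)


# Hypothetical job requirements and invented CV snippets.
required = ["nlp", "python", "transformers"]
cv_a = "Experienced in NLP, Python and transformers."
cv_b = "Built text classification and language models in Python using BERT."

score_a = keyword_match_score(cv_a, required)
score_b = keyword_match_score(cv_b, required)
print(score_a)            # 1.0
print(round(score_b, 2))  # 0.33
```

The second CV describes hands-on NLP work, yet scores poorly because it never uses the exact keywords. This is the kind of inaccurate evaluation that overly simple text algorithms can produce.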
This is illustrative of the wider issues surrounding where AI / ML products are at in general right now.
“AI/ML have notoriously been discriminatory because the algorithms they’re run on are only as good as the data that they’re fed.
Any bias or prejudice that exists in human hiring processes could quite easily filter into the behaviour of the products from the training data.
There have been numerous high-profile cases of this when not spotted early on”.
Xcede’s Associate Director of Data, Matthew Jones, also comments on the dangers of over-relying on NLP when scanning the landscape for new candidates.
“Recommendation algorithms can parse CVs and help rank candidates in predicted order of suitability for a position, but do not aid diverse hiring.”
The use of NLP is growing across all sectors. From implementing NLP tools and techniques for an organization to utilize in the first place, to advancing NLP models to find solutions to the complex challenges around bias and natural language adoption, to say NLP specialists are in demand is an understatement.
Jobs in NLP
Statistics clearly illustrate the exponential growth of NLP in recent years. In 2017 the NLP space was valued at about $3bn; by 2025, experts expect this to reach $43bn (source: TechShout).
As SAS notes, NLP and text analytics in general can be applied to many roles, across many sectors. This can include everything from crime investigation, ‘to identify patterns and clues in emails or written reports to help detect and solve crimes’ to social media marketing and advertising, ‘to track awareness and sentiment about specific topics and identify key influencers’.
Asked about the current demand, Xcede’s Associate Director of Data, Matthew Jones, notes:
“The demand for NLP is generally growing across every sector. For example, insurance companies use it to rapidly parse and summarise huge documents, retailers and brands use it to better understand how their brand is perceived on social media, and customer-facing brands might want to use it if they've got an online platform and they need a chatbot, for example.
As text is everywhere, once companies have the foundation of a data team in place, they want to do NLP work. It is a really interesting growth.”
As you can see, anywhere that text is present, NLP can be utilized to automate and streamline the task at hand. And as businesses undergo digital transformations to better automate and streamline their digital infrastructure, the use of NLP is only going to continue to grow.
Looking to make your next career move within NLP or discuss how you can progress your career in data? Get in touch with our dedicated data team today.