Natural language processing (NLP) has come a long way since its inception in the middle of the 20th century. Without boasting, I’m a direct witness to the transformation of this field, and can testify to how far we’ve come from tiny, rule-based systems to today’s most advanced artificial intelligence models.
As well as changing the way machines understand and process human language, NLP has transformed the way we interact with technology.
The beginnings
It all started with fairly primitive attempts to analyze and understand human language. And so was born the Georgetown-IBM experiment in 1954, which was groundbreaking at the time because it demonstrated the potential of machine translation.
Even though the experiment was limited by the technology and hardware of the era, it caught the eye of investors with a particular interest in NLP. As computing power grew and linguistic theories evolved, NLP systems progressed from rule-based approaches to statistical methods in the 1980s and 1990s.
The language learning machine
Basically, NLP is about machines understanding, interpreting and generating human language. Achieving this requires a complex analysis of syntax, semantics and pragmatics. These terms may sound difficult to the average person, but they are in fact the elements that make up linguistic understanding.
Syntax
Syntax deals with the structure of language, looking at how words come together to form sentences. Natural language processing uses grammar to analyze sentences, identifying the parts of speech and how words relate to each other.
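As a toy illustration of part-of-speech tagging (the tiny lexicon below is made up for this example; real taggers learn their labels from annotated corpora), a dictionary-based tagger might look like this:

```python
# Toy part-of-speech lexicon -- real taggers learn these labels from data
POS_LEXICON = {"the": "DET", "dog": "NOUN", "sat": "VERB", "on": "ADP", "sofa": "NOUN"}

def pos_tag(sentence):
    # Look each lowercased word up in the lexicon; unknown words get "UNK"
    return [(word, POS_LEXICON.get(word.lower(), "UNK")) for word in sentence.split()]

print(pos_tag("The dog sat on the sofa"))
# → [('The', 'DET'), ('dog', 'NOUN'), ('sat', 'VERB'), ('on', 'ADP'), ('the', 'DET'), ('sofa', 'NOUN')]
```

A real system would also resolve ambiguity ("run" can be a noun or a verb) by looking at the surrounding words, which is exactly where syntax analysis earns its keep.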
Semantics
Semantics is about the meaning of words and sentences. It enables machines to understand what a text actually says, a level of understanding that goes far beyond how the text is formed grammatically.
Pragmatics
Now, pragmatics takes context into account, which helps natural language processing systems to understand the meaning they are intended to convey, according to the situation at hand.
I’d also like to point out that these components work together, enabling NLP systems to process language in a way that mimics the way humans use that language.
Digital linguists at work
Before NLPs can examine a text, the raw input must first be processed. This is an important step, as it transforms unstructured text into a format more suitable for automatic examination. Pre-processing text requires a number of techniques, each of which is important in preparing the data for NLP tasks.
Tokenization
Tokenization is often the first step, breaking the text down into smaller units called tokens. These can be words, phrases or even individual characters, depending on the NLP task. For example, the sentence "The dog sat on the sofa." can be transformed into ["The", "dog", "sat", "on", "the", "sofa", "."].
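A minimal sketch of word-level tokenization in Python, using only the standard library (real toolkits like NLTK or spaCy handle many more edge cases, such as contractions and abbreviations):

```python
import re

def tokenize(text):
    # Match runs of word characters, or single punctuation marks
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("The dog sat on the sofa."))
# → ['The', 'dog', 'sat', 'on', 'the', 'sofa', '.']
```

Notice that the period comes out as its own token, which is exactly what downstream steps like parsing expect.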
Stemming and lemmatization
Then there’s stemming and lemmatization, two techniques we use to reduce words to their base form.
Stemming is a quick and simple process, a little crude you might say, that cuts off the ends of words. For example, “running” becomes “run”.
Lemmatization, on the other hand, uses morphological analysis to return a word to its dictionary form. So stemming may crudely reduce “better” to “bet”, but lemmatization knows perfectly well that “good” is its base form.
Silicon polyglots
If you haven’t already heard, machine translation has been the Holy Grail of NLP since its inception. Early systems used direct translation: they’d take a word, translate it, take another word, translate it, and so on, often with funny results (early versions of Google Translate worked a bit like this).
The translation systems we have today, on the other hand, use highly sophisticated neural networks trained on large quantities of multilingual data.
And you and I both know that today’s systems can translate around a hundred languages with astonishing accuracy. They don’t just translate word by word; they understand context and idiomatic expressions, and they can maintain the tone of the original text.
Still, there are always challenges when it comes to handling multiple languages. Languages with totally different structures, or languages lacking digital resources, can pose problems. Researchers are therefore exploring techniques such as zero-shot translation and unsupervised machine translation to meet these challenges.
Sentiment detectives
There’s something called sentiment analysis, and it’s all about identifying the emotional tone of a text.
So whether you want to analyze customer comments or gauge political opinion on social media, these sentiment analysis tools have become indispensable to businesses and organizations.
You should know that these systems have evolved away from simple keyword-based approaches.
Now they’ve become advanced models that can understand negation, sarcasm and sentiments of all kinds. I’d also add that they can classify texts as positive, negative or neutral (if you’re a blogger and use Rank Math for your SEO, you’ll notice it gives you a point when you use a positive or negative word in your article title), and some systems can even detect specific emotions like anger, joy or surprise.
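For context, here is roughly what the old keyword-based approach looked like, the one modern models have moved beyond (the word lists are tiny, illustrative samples; real lexicons contain thousands of scored entries):

```python
# Tiny illustrative word lists -- real sentiment lexicons are far larger
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "angry"}

def keyword_sentiment(text):
    # Count positive vs. negative words; the sign of the score decides the label
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(keyword_sentiment("I love this great product"))  # → 'positive'
```

The weakness is obvious: “not good” still counts as positive, and sarcasm sails right past it, which is precisely why the field moved to context-aware models.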
I’d also like you to know that the applications are numerous. For example, companies are using sentiment analysis to monitor what people think of their brand, so they can improve customer service.
For their part, financial institutions analyze market sentiment to predict stock market movements.
And social media platforms use these tools to detect and curb the spread of negative or harmful content.
Question answering systems
Here, I’m going to talk about question answering (QA) systems. Early on, these systems were limited to specific domains and pre-programmed answers, so they simply followed rules. Today, they can understand and respond to complex requests.
Now, QA systems benefit enormously from deep learning and natural language understanding: they process questions in context, search vast stores of knowledge and generate human-like answers.
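A retrieval-style QA system can be sketched in a few lines (the knowledge base and the word-overlap scoring are deliberately simplistic stand-ins; real systems use dense embeddings and neural readers):

```python
import re

# Toy knowledge base -- these snippets are made up for illustration
KNOWLEDGE = [
    "Paris is the capital of France",
    "The Nile is the longest river in Africa",
]

def answer(question):
    # Return the snippet sharing the most words with the question
    q_words = set(re.findall(r"\w+", question.lower()))
    return max(
        KNOWLEDGE,
        key=lambda snippet: len(q_words & set(re.findall(r"\w+", snippet.lower()))),
    )

print(answer("What is the capital of France?"))
# → 'Paris is the capital of France'
```

Modern systems keep this retrieve-then-answer shape but replace word overlap with learned representations, which is where transformers come in.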
This is what powers Alexa and Siri, customer service chatbots and enterprise search tools.
When transformer models like BERT and GPT were developed, they greatly improved the performance of question answering systems.
Chatbots and digital assistants
Let’s face it, we’ve all been blown away by the evolution of conversational AI. Back in the 1960s, when the first chatbots like ELIZA were created, they used simple pattern matching to simulate conversation.
But now, it’s sophisticated NLPs that power these chatbots and digital assistants we have today. Not only can they understand context, they are also able to retain long-term memory and even adopt a kind of personality of their own.
If we take customer service as an example, they’re there to deal with routine inquiries, leaving human agents to concentrate on other things.
If I take the field of healthcare, chatbots are able to provide simple symptom assessment and mental health support.
And if I go to the education side, they can be used to help students with their questions and offer personalized learning experiences (provided, of course, the chatbots don’t lose their marbles and tell the students some nonsense).
NLP in the wild
I’d like to delve a little deeper into what NLPs are capable of in healthcare. So, I’d like to mention that NLPs can analyze medical records, help with diagnosis and retrieve information from scientific literature.
There are even legal professionals who use NLP tools to review contracts, and the tools are just as indispensable for carrying out due diligence and analyzing case law.
I’m also thinking of finance, where NLPs power algorithmic trading systems that analyze news and social media to make investment decisions.
Marketing teams use NLP to improve content, analyze trends and personalize advertising.
What’s crazy, funny and surprising at the same time is that creative fields aren’t leaving NLP behind: it’s used to generate content, and it helps write automatic subtitles for videos.
The ubiquity of NLP
I myself have seen how NLP is developing, and how its presence in our everyday lives keeps growing. Just last week, I smiled at how fluidly I could communicate with my smart home device. It was just crazy: it understood when I said “dim the lights a bit”, with no other precise command, and it set the brightness to a level I found genuinely COMFORTABLE. I was fine!
Even in my blogging work I use NLP tools: Grammarly corrects my grammar, suggests style improvements and helps me maintain a consistent tone. And tools like Surfer SEO or Ahrefs sometimes help me generate content ideas. (I confess the idea for this article actually came from ChatGPT. I wanted to test whether its generated content ideas were any good and whether they had a medium level of competition, because if you ask GPT to generate low-competition keywords, it will often generate keywords that have no competition at all, like a score of zero.)
The impact of NLP on society
According to a report from Syracuse University, the impact of NLP on society is profound and far-reaching. The report states:
NLP is used in AI chat bots and automated phone support to help diagnose issues without the need for a person in a call center. NLP is also used in automatic language transcription and translation, such as with automatic subtitles in YouTube. Researchers use NLP regularly to scrub websites for information and analyze that information for keywords or phrases.
This report also says that NLP’s roots go back to 1906, when the Swiss linguistics professor Ferdinand de Saussure began teaching several courses at the University of Geneva. In these courses, he described language as a system in which sounds represented concepts, and those concepts carried meanings that changed according to context.
This work, later published under the title Cours de Linguistique Générale (Course in General Linguistics), laid the foundations for what we know today as NLP.
According to Syracuse University, in the future we’ll (perhaps) have machines that pass the Turing test with flying colors, and if all goes well, we’ll also have better real-time translation of speech and text.
Conclusion
Chatterbox that I am, I know this article has been long, and I apologize for that. If you’ve made it this far, I hope it’s because you’re dying to learn about natural language processing, or because you’re a technophile.
Thank you for reading this far. I’ll leave you with this: I totally agree that NLP will continue to improve our world, whether in communication or in enabling more natural interactions with AIs. But if it’s developed irresponsibly, without taking privacy, bias and its impact on society into account, we’ll have a serious problem.
At the same time, I’m discovering how ingenious humans are, and how they’re always trying to make machines understand better. It’s totally interesting!