Natural Language Processing (NLP)


This entry is part 1 of 2 in the series NLP

What is natural language processing (NLP)? Wikipedia says: “Natural language processing is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.” It’s a branch of artificial intelligence (AI) concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. Have a look at our post on ChatGPT.

IBM says “NLP combines computational linguistics—rule-based modeling of human language—with statistical, machine learning, and deep learning models. Together, these technologies enable computers to process human language in the form of text or voice data and to ‘understand’ its full meaning, complete with the speaker or writer’s intent and sentiment.”

Natural language processing is concerned with giving computers the ability to support and manipulate human language. Can a computer “understand” the contents of documents? Understanding requires awareness of context. Can a computer accurately extract the information and insights contained in the documents? These tasks are broad and difficult, so computer scientists break them down into smaller tasks such as translation, summarization, classification, question answering, and more.

NLP Use Cases

Suppose you have a collection of documents and you want to group them by topic. Or suppose you are sifting through thousands of articles in search of those that cover a certain topic. NLP can help with these tasks.

Text is a type of unstructured data. A collection of texts is sometimes called a “corpus”.

With the documents in front of you, the first thing you could do is count the words in each document and keep track of how often each one occurs.
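
As a rough illustration of that first step, here is a minimal Python sketch that counts word frequencies per document. The two-document corpus, the function name word_frequencies, and the very simple tokenization rule (lowercase everything, keep runs of letters and apostrophes) are all assumptions made up for this example; real projects usually need more careful tokenization.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercase word to its count in text."""
    # Lowercase the text, then treat runs of letters/apostrophes as words.
    # This is a deliberately simple tokenizer, not a production-grade one.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical corpus, invented purely for illustration.
corpus = {
    "doc1": "The cat sat on the mat.",
    "doc2": "The dog chased the cat.",
}

# Build one frequency table per document.
frequencies = {name: word_frequencies(text) for name, text in corpus.items()}

# Show the three most frequent words in each document.
for name, counts in frequencies.items():
    print(name, counts.most_common(3))
```

Keeping one Counter per document, rather than a single total, makes it possible to compare documents with each other later, which is what grouping by topic would need.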

Optical Character Recognition (OCR)

Optical character recognition (OCR) is an AI technique designed to extract characters from images and turn them into machine- and human-readable text.
