What are N-Grams?
N-gram refers to a contiguous sequence of N words, depending on what value of N we take. If N is 1 it is called a unigram, if N is 2 it is called a bigram, and so on.
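To make the definition concrete, here is a minimal Python sketch (assuming naive lowercase whitespace tokenization, just for illustration) that extracts the N-grams of a sentence:

```python
# Minimal sketch: extract N-grams by sliding a window of size n
# over a naively tokenized sentence (lowercase + split on spaces).
def ngrams(sentence, n):
    words = sentence.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("The earth is not round", 1))  # unigrams
print(ngrams("The earth is not round", 2))  # bigrams, e.g. ('not', 'round')
```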
Why N-Grams?
The N-gram model is one of the simplest models in NLP. It uses the previous N-1 words to predict the next word. It is mainly used in predictive texting, grammar correction, machine translation, and speech recognition (where the previous words improve the prediction of the next word).
How do we calculate N-Grams?
Let’s say you have to compute the probability of the word ‘proved’ occurring after “The earth is not round was”. How would you calculate that mathematically? Conditional probability, right?
P(proved|The earth is not round was)
And to calculate this we first need count(proved ∩ The earth is not round was) and count(The earth is not round was). Imagine computing those counts over all the history books ever written. Nothing short of a nightmare, right? To get around this we use a bigram model, P(proved | was), or a trigram model, P(proved | round was), where we condition on just the previous one or two words respectively rather than on the complete preceding sentence. This approximation works well in practice and is justified by the Markov assumption, which states that the probability of a word depends only on a limited window of its history.
The general equation for an N-gram model, which approximates the probability of the next word in a sequence using only the previous N-1 words, is:

P(w_n | w_1 … w_(n-1)) ≈ P(w_n | w_(n-N+1) … w_(n-1))
An intuitive way to estimate these probabilities is called maximum likelihood estimation, or MLE. We get the MLE estimate for the parameters of an N-gram model by taking counts from a corpus and normalizing them so that they lie between 0 and 1.
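For a bigram model, for example, the MLE estimate is

P(w_n | w_(n-1)) = Count(w_(n-1) w_n) / Count(w_(n-1))

which is exactly the ratio used in the worked example below.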
Let’s consider a small corpus and calculate unigram and bigram probabilities over it:
The boy ate an ice cream.
The girl bought an ice cream.
The girl then ate the ice cream.
The boy bought a toy.
Now let’s calculate the unigram probability of the sentence below:
The boy bought an ice cream
So we need the probability of each word of the sentence, i.e., its count in the corpus divided by the total number of tokens (24):
<The> <boy> <bought> <an> <ice> <cream>
= 5/24 * 2/24 * 2/24 * 2/24 * 3/24 * 3/24
= 360/24⁶
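This hand calculation is easy to verify with a short Python sketch (again assuming naive lowercase whitespace tokenization):

```python
from collections import Counter

# Count every token in the four-sentence corpus above.
corpus = [
    "The boy ate an ice cream",
    "The girl bought an ice cream",
    "The girl then ate the ice cream",
    "The boy bought a toy",
]
tokens = [w.lower() for line in corpus for w in line.split()]
counts = Counter(tokens)
total = len(tokens)  # 24 tokens in this corpus

# Unigram probability of the sentence = product of count(w)/total.
sentence = "The boy bought an ice cream".lower().split()
prob = 1.0
for w in sentence:
    prob *= counts[w] / total

print(prob)         # ≈ 1.88e-06
print(360 / 24**6)  # same value as the hand calculation
```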
Now let’s calculate the bigram probability of the same sentence:
<The boy> <boy bought> <bought an> <an ice> <ice cream>
=Cnt(The ∩ boy)/Cnt(The) * Cnt(boy ∩ bought)/Cnt(boy) * Cnt(bought ∩ an)/Cnt(bought) * Cnt(an ∩ ice)/Cnt(an) * Cnt(ice ∩ cream)/Cnt(ice)
= 2/5 * 1/2 * 1/2 * 2/2 * 3/3
= 1/10 = 0.1
So, as we observed while calculating the bigram probability of (W₁ W₂), we just count how often W₁ and W₂ occur together and divide by the frequency of W₁. The same concept extends to trigrams and so on, as the sketch below shows.
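Here is a minimal sketch of that recipe in Python (same naive tokenization as before):

```python
from collections import Counter

# Count unigrams and adjacent word pairs (bigrams) in the corpus.
corpus = [
    "The boy ate an ice cream",
    "The girl bought an ice cream",
    "The girl then ate the ice cream",
    "The boy bought a toy",
]
unigrams = Counter()
bigrams = Counter()
for line in corpus:
    words = line.lower().split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

# P(sentence) under the bigram model: product of
# Count(w1 w2) / Count(w1) over consecutive word pairs.
def bigram_prob(sentence):
    words = sentence.lower().split()
    prob = 1.0
    for w1, w2 in zip(words, words[1:]):
        prob *= bigrams[(w1, w2)] / unigrams[w1]
    return prob

print(bigram_prob("The boy bought an ice cream"))  # ≈ 0.1
```

Note that this sketch, like the hand calculation above, ignores sentence-boundary markers; a full implementation would pad each sentence with start and end symbols before counting.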
Author: Sahil Pahuja