What is NLP (Natural Language Processing)?

In this post, we discuss what NLP is and how it differs from computer vision (CV) and the basic data science algorithms we have seen before.

To understand a very basic and unique feature of text, let’s consider a few examples. Suppose I ask you to recite the alphabet from A to Z as fast as possible.

Record the time you took. Now try to recite the alphabet again, but this time start from Z and move backward towards A. Can you do it as quickly as before?

The same goes for a song or a poem: reciting it backward is much harder than reciting it forward.

So one of the most distinctive features of text is that it is sequential. When it comes to language, it’s not just the words but also their order that makes a meaningful and complete sentence.
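To see why order matters, here is a minimal Python sketch (our own illustration) that compares two sentences using only word counts, a so-called bag-of-words view. Both example sentences contain exactly the same words, so the word counts cannot tell them apart, even though their meanings differ; only the sequence distinguishes them.

```python
from collections import Counter

def bag_of_words(sentence):
    # Count each lowercase word, discarding the order in which words appear.
    return Counter(sentence.lower().split())

def word_sequence(sentence):
    # Keep the lowercase words in their original order.
    return sentence.lower().split()

a = "Dog bites man"
b = "Man bites dog"

# Same words with the same counts: a bag-of-words view sees no difference.
same_counts = bag_of_words(a) == bag_of_words(b)
# The ordered sequences differ, and that is where the meaning lies.
same_sequence = word_sequence(a) == word_sequence(b)

print(same_counts)    # True
print(same_sequence)  # False
```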


Problem Scoping

So what exactly is the outcome we desire from an NLP task? We want an AI model that can understand not only words but also their order, understand the sentiment associated with a sentence, respond to texts and perform translations.

Language translation in particular has a few unique characteristics:

  1. There is more than one correct way to translate a given sentence into another language.
  2. The output length is not fixed; it depends on the words being used.
  3. As a result, we need a loss function that can deal with such variable, non-unique outputs.
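Scoring a translation therefore has to cope with several acceptable answers of different lengths. One well-known way to compare an output against multiple reference translations is word overlap, as in the BLEU evaluation metric. The Python sketch below is a simplified, unigram-only version of BLEU's clipped precision; it is an evaluation score rather than a training loss, and the example sentences are made up for illustration.

```python
from collections import Counter

def clipped_unigram_precision(candidate, references):
    """Fraction of candidate words found in the references, where each word's
    count is clipped to the maximum number of times it occurs in any single
    reference (so repeating a correct word is not rewarded)."""
    cand_counts = Counter(candidate.lower().split())
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref.lower().split()).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[word]) for word, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

# Two equally valid reference translations of different lengths.
references = ["the cat is on the mat", "there is a cat on the mat"]

# A candidate whose words all appear in some reference scores 1.0.
score_paraphrase = clipped_unigram_precision("a cat is on the mat", references)
# Repeating a valid word is penalized by clipping: 2 of 7 words count.
score_repeated = clipped_unigram_precision("the the the the the the the", references)

print(score_paraphrase)
print(score_repeated)
```

Clipping is what stops a degenerate output from scoring well simply by repeating a common word.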

Problems with Text Data

Building models on text data requires a lot of cleaning and preprocessing.

Real-life text data comes from internet sources, surveys and other places where people with different educational backgrounds write “content”. Consider a case where data is collected from the YouTube comments section.

Let’s list a few problems we might face as a result:

  1. Spelling mistakes can ruin data, as a computer won’t consider “beotiful” and “beautiful” to be the same word.
  2. Slang words like “wanna” and “gotcha” would be treated as different words from their standard forms.
  3. Tasty, delicious, tastiest and yummy are four different words, but they convey essentially the same information.
  4. Many words just increase the vocabulary size without helping much in model building. Words like “is”, “as”, “a” and “an” (often called stop words) exist mainly for grammatical purposes and add little to the meaning of a sentence.
  5. We need to develop an algorithm that can incorporate “sequential information” in addition to understanding individual words.
  6. Given that any language has a very large number of words, the dimensionality of NLP tasks tends to be high.
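The following minimal Python sketch shows how problems 1, 2 and 4 are commonly tackled during preprocessing: lowercasing and tokenizing, expanding slang, mapping misspellings to the closest known word with the standard library's `difflib.get_close_matches`, and removing stop words. The tiny `VOCABULARY`, `SLANG` and `STOP_WORDS` collections are hypothetical stand-ins; real pipelines use much larger resources or dedicated libraries.

```python
import difflib
import re

# Hypothetical, hand-written resources for illustration only.
VOCABULARY = ["beautiful", "song", "this", "is", "want", "to", "watch", "it", "again"]
SLANG = {"wanna": "want to", "gotcha": "got you"}
STOP_WORDS = {"is", "as", "a", "an", "the", "this", "it", "to"}

def tokenize(text):
    # Lowercase and keep only runs of letters, so "Song!" and "song" match.
    return re.findall(r"[a-z]+", text.lower())

def normalize(tokens):
    out = []
    for tok in tokens:
        # Problem 2: expand slang into its standard form.
        if tok in SLANG:
            out.extend(SLANG[tok].split())
            continue
        # Problem 1: map an unknown (possibly misspelled) word to the closest known word.
        if tok not in VOCABULARY:
            match = difflib.get_close_matches(tok, VOCABULARY, n=1, cutoff=0.8)
            if match:
                tok = match[0]
        out.append(tok)
    return out

def remove_stop_words(tokens):
    # Problem 4: drop grammatical filler words that add little meaning.
    return [t for t in tokens if t not in STOP_WORDS]

comment = "This song is beotiful, wanna watch it again!"
tokens = remove_stop_words(normalize(tokenize(comment)))
print(tokens)  # ['song', 'beautiful', 'want', 'watch', 'again']
```

Note that a similarity cutoff is a trade-off: set too low, it would "correct" legitimate rare words into common ones.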

Applications of NLP

Here we simply list the various applications and fields in which natural language processing is used. We will take up each topic in detail in the posts that follow:

  1. Sentiment analysis in tweets and product reviews
  2. Fake news classifier
  3. Document classifier
  4. Language translation
  5. Processing voice commands
  6. Building responsive chatbots

In further posts, we will discuss text preprocessing techniques like stemming, lemmatization and removing duplicates, and then discuss a few algorithms that deal with NLP tasks.


CV: What is Computer Vision?

In this post, we discuss computer vision (CV) and its applications.

We probably view images of faces and trees differently, with different sentiments and reactions. But let’s recall that for a computer, or any computing device, an image is just a grid of pixel values. For a computer, the only difference between a tree and a face is a difference in the pixel values of the RGB channels.

Yet we live in a world where cell phones have face recognition features, Facebook detects faces and allows you to tag people, law enforcement agencies can recognize a particular person based on facial features alone, and so much more.

So how does a machine find intelligent and meaningful patterns in a set of numbers, namely pixel intensity values in RGB? Let’s have a look:

What is an image?

image ref: here

A grayscale image is just a 2D matrix of pixel values, while a colored image is made of three 2D matrices stacked together, one for each of the R, G and B channels.

Below you can see how the R, G and B channels together constitute an image:
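A minimal NumPy sketch of both representations, using tiny made-up pixel values. It assumes the common height × width × channels layout for color images; some frameworks store channels first instead.

```python
import numpy as np

# A tiny 2x3 grayscale image: one matrix, each entry a pixel intensity (0-255).
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)
print(gray.shape)  # (2, 3)

# A colored image stacks three such matrices: the red, green and blue channels.
red = np.array([[255, 0, 0], [255, 255, 0]], dtype=np.uint8)
green = np.array([[0, 255, 0], [255, 255, 0]], dtype=np.uint8)
blue = np.array([[0, 0, 255], [0, 255, 0]], dtype=np.uint8)
rgb = np.stack([red, green, blue], axis=-1)
print(rgb.shape)  # (2, 3, 3)

# Each pixel of the color image is an (R, G, B) triple.
print(rgb[0, 0])  # pure red: R=255, G=0, B=0
print(rgb[1, 1])  # white: all three channels at 255
```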

image ref: here

The task of CV

The objective of CV is to find patterns in these pixel values and produce the desired output. Depending on the problem statement, CV tasks fall into different categories. Let’s have a look at them:


Classification

Given an image containing a single object, classify the object as belonging to a particular class.

For example, building a classifier that takes an image of an animal as input and identifies which animal it is.
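Many image classifiers end by producing one raw score (logit) per class and converting the scores into probabilities with a softmax; the predicted animal is the class with the highest probability. The sketch below skips the model itself and assumes hypothetical class names and scores.

```python
import math

# Hypothetical raw scores (logits) a trained model might output for one image.
classes = ["cat", "dog", "horse"]
logits = [2.0, 1.0, 0.1]

def softmax(scores):
    # Subtract the max for numerical stability, then normalize the exponentials.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
# The prediction is the class with the highest probability.
predicted = classes[probs.index(max(probs))]
print([round(p, 3) for p in probs])
print(predicted)  # cat
```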

Classification along with Localization

Given an object in an image, classify it and also indicate where it is located in the image, typically with a bounding box. Note that this can be done for a single object as well as for multiple objects.

The image below clarifies the difference:

image ref: here
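In localization the model outputs a bounding box in addition to the class label. A standard way to measure how well a predicted box matches the true location is Intersection over Union (IoU): the area where the two boxes overlap divided by the total area they cover together. The sketch assumes boxes given as (x_min, y_min, x_max, y_max) pixel coordinates; the box values are made up.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if the boxes overlap at all.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes have zero intersection.
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = (50, 50, 150, 150)     # hypothetical box output by a model
ground_truth = (60, 60, 160, 160)  # hypothetical annotated box
print(round(iou(predicted, ground_truth), 3))  # 0.681
```

An IoU of 1.0 means a perfect match and 0.0 means no overlap.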

Object Detection

A simple way to understand object detection is to consider the problem faced by autopilot software.

The car tries to detect and classify the objects it sees, along with their locations and distances from itself. It then uses this information to make driving decisions.

Have a look at the image below, where the autopilot software detects and classifies humans, vehicles and lanes, drawing boxes around their locations. There can be multiple objects of the same class, for example multiple cars and multiple people.

image ref: here

Image Segmentation

Image segmentation refers to building models that provide classification results at the pixel level. It can be further divided into two categories:

  1. Instance segmentation: labels every pixel and also separates individual objects of the same class, so two cars get two different labels.
  2. Semantic segmentation: labels every pixel with its class, without distinguishing between objects of the same class.

The image below helps you visualize the differences between all the categories. Notice how segmentation demands a much finer, pixel-level accuracy.
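To make the distinction concrete, here is a small NumPy sketch with hand-made masks (no model involved): a segmentation output is simply an array with one label per pixel, with the same height and width as the image. In the semantic mask both cars share one class label; in the instance mask each car gets its own ID. The scene and label IDs are hypothetical.

```python
import numpy as np

# A hypothetical 4x6 image containing two cars on a road.
# Semantic segmentation: each pixel gets a class ID (0 = road, 1 = car);
# both cars share the same label.
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance segmentation: each individual object gets its own ID
# (0 = background, 1 = first car, 2 = second car).
instance = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 2, 2],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Semantic view: how many pixels belong to the "car" class in total.
car_pixels = int((semantic == 1).sum())
# Instance view: how many separate cars there are.
num_cars = len(np.unique(instance[instance > 0]))
print(car_pixels, num_cars)  # 8 2
```

Collapsing all instance IDs into one "car" label recovers the semantic mask, but the reverse is not possible, which is why instance segmentation is the harder task.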


Feel free to explore more resources.

Applications of data science

In this post, we list five applications of data science. Each application will help you see how diverse the uses of data science and AI have become.

E-commerce websites

One of the most common uses of AI in today’s world is in recommendation systems. E-commerce platforms like Amazon and Flipkart have a feature that shows you products similar to the one you are viewing or have viewed in the past. Such features are based on learning algorithms that use either your history or the products’ own data to suggest new, similar products.

Netflix’s recommendation system works in a similar way.
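One simple flavor of the “products’ own data” approach is content-based filtering: describe each product as a feature vector and recommend the products whose vectors point in the most similar direction, measured by cosine similarity. The sketch below uses a made-up catalogue and made-up features; it does not claim to reproduce what Amazon, Flipkart or Netflix actually run.

```python
import math

# Hypothetical product features: [price level, is_electronics, is_audio, is_wearable]
products = {
    "wireless earbuds": [0.3, 1, 1, 0],
    "over-ear headphones": [0.5, 1, 1, 0],
    "smartwatch": [0.6, 1, 0, 1],
    "running shoes": [0.4, 0, 0, 1],
}

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recommend(viewed, k=2):
    # Rank every other product by its similarity to the one being viewed.
    scores = [
        (name, cosine_similarity(products[viewed], features))
        for name, features in products.items()
        if name != viewed
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:k]]

print(recommend("wireless earbuds"))  # ['over-ear headphones', 'smartwatch']
```

Using your viewing history instead (collaborative filtering) would compare users rather than product features.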

AI IN E-commerce
image REF:

Medical and healthcare

Recently, many advances have been made in the medical domain, where intelligent software is used to predict the presence of diseases and help diagnose them. Intelligent software has been developed that can detect patterns in images such as CT scans and X-rays and give meaningful insights. One such example is using brain scans to predict the presence of tumors.

Many wearable products can also sense and record your heartbeat and give you warnings and insights about your physical condition.

image ref : here


Finance

Over the last few years, banking has largely shifted online, and with that the risk of fraud and cybersecurity issues has increased. Your bank details and personal data are recorded during transactions, so their security is of utmost importance. AI has helped finance companies build models that can detect and flag fraudulent transactions.
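Real fraud detection systems combine many signals and models. As a toy illustration of the underlying idea of flagging unusual behaviour, the sketch below marks a transaction as suspicious when its amount is far from a customer's typical spending, measured as a z-score. The transaction history and the threshold of 3 standard deviations are assumptions chosen for the example.

```python
import statistics

# Hypothetical past transaction amounts for one customer.
history = [42.0, 35.5, 50.0, 38.2, 45.9, 40.1, 47.3, 39.8]

def is_suspicious(amount, past, threshold=3.0):
    # Flag the amount if it lies more than `threshold` sample standard
    # deviations away from the customer's average spending (a z-score test).
    mean = statistics.mean(past)
    stdev = statistics.stdev(past)
    z = (amount - mean) / stdev
    return abs(z) > threshold

print(is_suspicious(44.0, history))   # False
print(is_suspicious(900.0, history))  # True
```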

Algorithmic trading and stock price prediction are other fields where AI and data science are used significantly.

AI in finance
image ref: here

Social media

Have you ever wondered how you get friend recommendations on social media platforms? The idea behind the feature is to predict “how likely you are to follow a certain person”. Platforms like Facebook and Instagram use their users’ data to build models that help them make such decisions. Graph-based algorithms are used to study the connections between users and build these solutions.
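A classic graph-based heuristic behind “people you may know” features is counting mutual friends (common neighbours): the more of your connections someone shares, the more likely you are to know them. The sketch below applies it to a small, made-up friendship graph; it illustrates the idea only and is not a description of how Facebook or Instagram actually rank suggestions.

```python
from collections import Counter

# Hypothetical undirected friendship graph, stored as adjacency sets.
graph = {
    "asha": {"ben", "chen", "dev"},
    "ben": {"asha", "chen", "eve"},
    "chen": {"asha", "ben", "eve"},
    "dev": {"asha", "fay"},
    "eve": {"ben", "chen"},
    "fay": {"dev"},
}

def suggest_friends(user, k=2):
    # For every user not yet connected to `user`, count how many of
    # `user`'s friends they are also connected to (mutual friends).
    counts = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            if candidate != user and candidate not in graph[user]:
                counts[candidate] += 1
    return counts.most_common(k)

print(suggest_friends("asha"))  # [('eve', 2), ('fay', 1)]
```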

graph data science
ref: giphy

Automobile industry (Autopilot)

Various automobile companies are developing autopilot technologies, Tesla being particularly famous. This intelligent software tries to learn and replicate the behaviour of a human driver. Tesla Autopilot provides features like lane changing, auto park, summoning the car in a parking lot and more. Various sensors collect data from the car’s immediate surroundings, and the software uses this data to make sensible, human-like decisions.

image ref: here

Above, we discussed five basic domains where artificial intelligence is used. There are many more real-life problems that are solved using intelligent, learning-based programs. Feel free to explore more!