NLP CBSE

What is NLP (Natural Language Processing)?

In this post, we discuss what NLP is and how it differs from CV and basic data science algorithms we have seen before.

To understand a basic and distinctive feature of text, let's consider an example. Suppose I ask you to recite the alphabet from A to Z as fast as possible.

Record the time you took. Now try to recite the alphabet again, but this time start from Z and move backward towards A. Can you do that at the same speed as before?

The same goes for a song or a poem.

So one distinctive feature of text is the presence of a sequence. When it comes to language, it’s not just the words but their order that makes a meaningful and complete sentence.
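The point about sequence can be checked with a few lines of Python. This is only a minimal sketch: the two sentences and the helper `bag_of_words` are made up for illustration. Both sentences contain exactly the same words, so a model that only counts words cannot tell them apart, even though their meanings differ.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count how often each word occurs, ignoring word order."""
    return Counter(sentence.lower().split())

a = "the dog bit the man"
b = "the man bit the dog"

same_words = bag_of_words(a) == bag_of_words(b)  # identical word counts
same_order = a.split() == b.split()              # but a different sequence

print(same_words, same_order)  # True False
```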


Problem Scoping

So what exactly is the outcome we desire from an NLP task? We want an AI model that understands not only the words but also their order, recognizes the sentiment associated with a sentence, responds to text, and performs translations.

Language translation poses some challenges of its own:

  1. There is more than one way to translate a given sentence into another language.
  2. The output length is not fixed and depends on the words being used.
  3. We need a loss function that can deal with such output conditions.

Problems with Text Data

Making models on text data requires a lot of cleaning and preprocessing.

Real-life text data comes from internet sources, surveys and other places where people of different educational backgrounds write “content”. Consider a case where the data is acquired from a YouTube comments section.

Let's list a few problems that we might face as a result:

  1. Spelling mistakes can ruin the data, as a computer won't consider “beotiful” and “beautiful” to be the same word.
  2. Slang words like “wanna” and “gotcha” would be treated as different words from their standard forms.
  3. Tasty, delicious, tastiest and yummy are four different words but are technically conveying the same information.
  4. A lot of words just increase the vocabulary size and won't help much in model building. Words like “is, as, a, an” and others are there for grammatical purposes only and add little when making sense of a sentence.
  5. We need to develop an algorithm that can incorporate the “sequential information” along with understanding the words.
  6. Given that there are so many words in any language, the dimensionality of NLP tasks tends to be high.
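Some of the problems listed above can be tackled with simple cleaning steps. The sketch below is an illustration, not a complete pipeline: the `STOP_WORDS` set, the `SLANG` table and the `preprocess` function are hypothetical and deliberately tiny. It lowercases a comment, strips punctuation, expands slang and drops stop words. Spelling mistakes and synonyms need more work than this.

```python
import re

# Hypothetical, deliberately tiny lookup tables; real projects use far larger ones.
STOP_WORDS = {"is", "as", "a", "an", "the", "to"}
SLANG = {"wanna": "want to", "gotcha": "got you"}

def preprocess(comment):
    """Lowercase, strip punctuation, expand slang, and drop stop words."""
    words = re.findall(r"[a-z']+", comment.lower())
    expanded = []
    for word in words:
        # Replace a slang word by its standard form, which may be several words.
        expanded.extend(SLANG.get(word, word).split())
    return [w for w in expanded if w not in STOP_WORDS]

tokens = preprocess("I wanna watch a beautiful movie!")
print(tokens)  # ['i', 'want', 'watch', 'beautiful', 'movie']
```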

Applications of NLP

Here we just list the various applications and fields in which natural language processing is used. We will take up each topic in detail in the posts that follow:

  1. Sentiment analysis in tweets and product reviews
  2. Fake news classifier
  3. Document classifier
  4. Language translation
  5. Processing voice commands
  6. Building responsive chatbots

In further posts we will discuss text preprocessing techniques like stemming, lemmatization and removing duplicates, and then discuss a few algorithms that deal with NLP tasks.

CV COMPUTER VISION

CV: What is Computer Vision?

In this post, we discuss computer vision (CV) and its applications.

We probably view images of faces and trees differently, with different sentiments and reactions. But let’s recall that for a computer or any computing device, an image is just a distribution of pixels. For a computer, the only difference between a tree and a face is a difference in the pixel values stored in the RGB channels.

Yet we live in a world where cell phones have face-identification features, Facebook marks faces and lets you tag people, crime departments can recognize a particular person based on facial features alone, and much more.

So how does a machine find intelligent and meaningful patterns in a set of numbers, namely pixel intensity values in RGB? Let’s have a look:

What is an image?


A grayscale image is just a 2D matrix of pixel values. A coloured image consists of three such 2D matrices, one for each of the red, green and blue (RGB) channels.
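To make the matrix view concrete, here is a small sketch using plain Python lists. The pixel values and the helper `split_channels` are invented for illustration; real code would normally use an image library. It builds a tiny grayscale image and a tiny colour image, then splits the colour image into its three channel matrices.

```python
# A 2x3 grayscale image: one intensity (0-255) per pixel.
gray = [
    [0, 128, 255],
    [64, 192, 32],
]

# A colour image of the same size: each pixel holds (red, green, blue) values.
rgb = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 0), (0, 0, 0), (255, 255, 255)],
]

def split_channels(image):
    """Return the red, green and blue channels as three separate 2D matrices."""
    return [[[pixel[c] for pixel in row] for row in image] for c in range(3)]

red, green, blue = split_channels(rgb)
height, width = len(gray), len(gray[0])

print(height, width)  # 2 3
print(red)            # [[255, 0, 0], [255, 0, 255]]
```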

Below you can see how the RGB channels together constitute an image:


The task of CV

The objective of CV is to find patterns in these pixel values and give the desired output. Depending on the problem statement, there are different categories of CV tasks. Let's have a look at them:

Classification:

Given an image containing a single object, assign the object to a particular class.

For example, building a classifier which takes an image of an animal as input and identifies which animal it is.

Classification along with Localization

Given an image containing a single object, assign the object to a particular class and also indicate where it is located in the image. Note that this can be done for single as well as multiple objects.

The image below clarifies the difference:


Object Detection

A simple way to understand object detection is to consider the problem statement of autopilot software.

The car tries to detect and classify the objects it sees, along with their specific location and distance from itself. It then uses this information to make driving decisions.

Have a look at this image, where the autopilot software is trying to detect and classify humans, vehicles and lanes, and is drawing boxes around their locations. There can be multiple objects of the same class, for example multiple cars and human beings.


Image Segmentation

Image segmentation refers to building models that provide classification results at the pixel level. Image segmentation can be further divided into two categories:

  1. Instance segmentation
  2. Semantic Segmentation

The image below helps you visualize the differences between all the categories. Notice how segmentation demands a greater level of pixel accuracy.
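Pixel-level classification means the output has one class label for every pixel of the input. The sketch below shows what such an output could look like. The mask values and the class ids in `CLASS_NAMES` are hypothetical, chosen only for illustration. It counts how many pixels were assigned to each class.

```python
from collections import Counter

# Hypothetical class ids for illustration: 0 = background, 1 = car, 2 = person.
CLASS_NAMES = {0: "background", 1: "car", 2: "person"}

# A 4x5 segmentation mask: one class id for every pixel of the image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 0],
]

# Count the number of pixels assigned to each class.
pixel_counts = Counter(CLASS_NAMES[label] for row in mask for label in row)

print(pixel_counts["car"], pixel_counts["person"], pixel_counts["background"])  # 4 2 14
```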


Feel free to explore more resources.

Applications of data science

Applications of data science/AI

In this post we list 5 applications of data science. Each application will help you visualize and understand how diversified data science/AI has become.

E-commerce websites

The most common use of AI that you can think of in today’s world is in recommendation systems. E-commerce platforms like Amazon and Flipkart have a feature which shows you products that are similar to the one you are viewing or may have viewed in the past. Such features are based on learning algorithms which use either your history or the product's own data to suggest new, similar products.

Netflix's recommendation system works along similar lines.
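One simple way to suggest similar products from product data is to describe each product by a few numbers and compare those descriptions. The sketch below uses cosine similarity for the comparison. The product names, feature values and the helper `most_similar` are all made up, and this is not a description of how any particular platform actually works.

```python
import math

# Hypothetical product features: (price level, electronics score, fashion score).
products = {
    "phone": (0.8, 1.0, 0.0),
    "tablet": (0.7, 0.9, 0.0),
    "t-shirt": (0.1, 0.0, 1.0),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(name):
    """Return the other product whose features are closest to those of `name`."""
    others = (p for p in products if p != name)
    return max(others, key=lambda p: cosine_similarity(products[name], products[p]))

print(most_similar("phone"))  # tablet
```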

Image ref (AI in e-commerce): https://www.slideshare.net/hava101/recommendations-play-flipkart-14115791

Medical and healthcare

Recently, many advancements have been made in the medical domain, where intelligent software is being used to predict the presence of diseases and to help diagnose them. Software has been developed which can find patterns in images like CT scans and X-rays and give meaningful insights. One such example is using brain scan results to predict the presence of a tumor or of possible tumors.

Also, many wearable products are being produced which can sense and record your heartbeat and give you warnings and insights about your physical condition.


Finance

Over the last few years, banking has largely shifted towards the online domain, and with that the chances of fraud and cyber security issues have increased. Your bank details and personal data are recorded during transactions, and their security is of the highest importance. AI has helped finance companies come up with models that can detect and flag fraudulent transactions.

Algorithmic trading and stock price prediction is yet another field where AI/data science is being used significantly.
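As a toy illustration of flagging suspicious transactions, the sketch below marks amounts that lie far from a customer's usual spending. This is only a simple statistical rule standing in for a trained model. The transaction amounts, the threshold and the function `flag_unusual` are assumptions made for this example.

```python
import statistics

def flag_unusual(amounts, threshold=2.0):
    """Flag amounts more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(amounts)
    spread = statistics.pstdev(amounts)
    if spread == 0:
        return []  # all amounts are identical, so nothing stands out
    return [a for a in amounts if abs(a - mean) / spread > threshold]

# Hypothetical transaction history for one customer.
history = [120, 80, 95, 110, 100, 105, 90, 5000]

print(flag_unusual(history))  # [5000]
```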


Social media

Ever wondered how you get friend recommendations on social media platforms? The idea behind the feature is to predict “what are the chances you might follow a certain person”. Platforms like Facebook and Instagram use the data of their users to build models that help them make such decisions. Graph-based algorithms are used to study the connections between users and build solutions.
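A very simple graph-based idea is to recommend people with whom you share the most mutual friends. The sketch below shows only that idea, with a made-up friendship graph and a hypothetical `recommend` function. Real platforms use far richer data and models.

```python
# Hypothetical friendship graph: each user maps to the set of their friends.
friends = {
    "asha": {"ben", "chen"},
    "ben": {"asha", "chen", "dev"},
    "chen": {"asha", "ben", "dev"},
    "dev": {"ben", "chen", "esha"},
    "esha": {"dev"},
}

def recommend(user):
    """Rank non-friends by how many friends they share with `user`."""
    scores = {}
    for friend in friends[user]:
        for candidate in friends[friend]:
            if candidate != user and candidate not in friends[user]:
                scores[candidate] = scores.get(candidate, 0) + 1
    # Most mutual friends first; ties are broken alphabetically.
    return sorted(scores, key=lambda c: (-scores[c], c))

print(recommend("asha"))  # ['dev']
print(recommend("esha"))  # ['ben', 'chen']
```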


Automobile industry (autopilot)

Various automobile companies are coming up with autopilot technologies, Tesla being particularly famous. The intelligent software tries to learn and replicate the behaviour of a human driver. Tesla Autopilot provides features like lane changing, auto park, summoning the car in a parking lot, and more. The idea here is to collect data from the immediate surroundings, using the various sensors on the car, and make sensible, human-like decisions.


Above we discussed 5 basic domains where artificial intelligence is used. There are many more real-life problems which are solved using intelligent, learning-based programs. Feel free to explore more!

What is data science / AI

What is data science, its purpose

Data science: definition, existence and purpose. It is not difficult to see how enormously the amount of data, its transfer rate and its storage demands have increased over the last decade. It's striking how, within a span of a few years, we have moved from CDs to pen drives and from pen drives to cloud storage services. The amount of data being generated and stored has led to the development of multiple new technologies.

How has data collection increased?

If we compare the online services that we use today to their offline counterparts, we can easily spot the differences that have led to such enormous amounts of data being generated. Let's compare them using something as simple as watching movies.

Ten years back, to watch a movie, all you had to do was read a newspaper, check the movie timings of the theater located nearby, drive up to that place and buy tickets using cash. The only people who could hear your reviews later were the people you talked to.

Now compare this to the present situation, where every step you take leaves a digital footprint, a form of data that is recorded and stored at various places. Suppose you look up a movie: Google records that search. You use BookMyShow to buy tickets: it records your details and the number of tickets you bought. Had it been a streaming service like Netflix, it would have recorded which genres and actors you are interested in, in order to give future recommendations.

Your bank or UPI app records the data of the transaction that took place. After watching the movie you can write reviews on multiple platforms (text data) or upload a review video on YouTube (video data). Using your search history, Google refines the news section on your device.

Using the data of all their customers, movie service providers calculate their profit, the market trends and much more. And you, in turn, want movie recommendations from Netflix that suit your taste! Look at the picture below to understand how data generation has increased.

Data science

Data science is the name given to the field dedicated to extracting meaningful patterns and mathematical relations from huge amounts of data.

Its purpose is to draw sensible and useful interpretations from otherwise seemingly meaningless data, interpretations which can be used to make business-scale predictions. Sometimes it's a piece of cake; sometimes it requires intuitive thinking and huge computing resources.

Nevertheless, it's important to realize that a lot of real-life problems are based on chances rather than clear-cut boundaries. Consider the problem of weather forecasting. It focuses on using the weather data of the past few days or weeks to predict the possible weather conditions for the next two days.

Also, whenever we deal with data on a huge scale, statistics always comes into play. Comparisons are made in terms of averages, for example “last year the average rainfall was x mm”. And any prediction will always be in terms of chances. Look at the following statements, which you may have come across:

  1. It's highly probable that it will rain tomorrow
  2. You may like this product (Amazon recommendations)
  3. You may like this song (Spotify recommendations)
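Statements like these rest on simple statistics. As a minimal sketch with made-up rainfall figures, the code below computes an average and a rough chance of rain, estimated here simply as the fraction of past days on which it rained. Real forecasts use much more data and far better models.

```python
# Hypothetical daily rainfall in mm for the past ten days.
rainfall_mm = [0.0, 12.5, 0.0, 3.0, 0.0, 0.0, 8.5, 0.0, 1.0, 0.0]

# Average rainfall per day over the period.
average_rainfall = sum(rainfall_mm) / len(rainfall_mm)

# Fraction of days on which it rained, used as a rough chance of rain.
rainy_days = sum(1 for mm in rainfall_mm if mm > 0)
chance_of_rain = rainy_days / len(rainfall_mm)

print(f"Average rainfall: {average_rainfall:.1f} mm")      # Average rainfall: 2.5 mm
print(f"Estimated chance of rain: {chance_of_rain:.0%}")   # Estimated chance of rain: 40%
```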

Data science and artificial intelligence models come up with the prediction that has the higher chance (probability) of being true. There are multiple domains in which AI has set foot. Be it the medical domain, where intelligent software is being trained on patients' data records to predict whether a new patient is likely to have the same disease, or the finance domain, which is using AI models to detect fraudulent transactions. The scope is ever increasing!