In this post, we discuss computer vision (CV) and its applications.
We probably view images of faces and trees differently, with different sentiments and reactions. But recall that, to a computer or any computing device, an image is just a distribution of pixels. For a computer, the only difference between a tree and a face is a difference in RGB values.
Yet without a doubt we live in a world where cell phones unlock by recognizing your face, Facebook detects faces and lets you tag people, law-enforcement agencies can identify a particular person based on facial features alone, and much more.
So how does a machine find intelligent and meaningful patterns in a set of mathematical values, the pixel intensities of the RGB channels? Let's have a look.
What is an image?
An image is just a 2D matrix of pixel values in the case of grayscale images, or three such 2D matrices, one per channel (R, G, and B), in the case of color images.
Below you can see how the three RGB channels together constitute an image:
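As a minimal sketch of this idea, the snippet below builds a tiny 2×2 "image" from three 2D channel matrices using NumPy. The channel values and the luma weights for the grayscale conversion are illustrative, not taken from the post.

```python
import numpy as np

# Three 2D channel matrices for a tiny 2x2 image (values 0-255).
red   = np.array([[255, 0], [0, 0]], dtype=np.uint8)
green = np.array([[0, 255], [0, 0]], dtype=np.uint8)
blue  = np.array([[0, 0], [255, 0]], dtype=np.uint8)

# Stacking the channels along a third axis gives the familiar H x W x 3 layout.
rgb = np.stack([red, green, blue], axis=-1)
print(rgb.shape)   # (2, 2, 3)

# A grayscale image is a single 2D matrix; here we use a common luma weighting.
gray = (0.299 * red + 0.587 * green + 0.114 * blue).astype(np.uint8)
print(gray.shape)  # (2, 2)
```

Real images work the same way, just with far more pixels: an RGB photo loaded into memory is an H × W × 3 array.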
The task of CV
The objective of CV is to find patterns in these pixel values and produce the desired output. Depending on the problem statement, CV tasks fall into different categories. Let's have a look at them:
Image Classification

Given a single object in an image, classify it as a particular class.

For example, building a classifier that takes an image of an animal as input and identifies which animal it is.
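Real classifiers are typically convolutional neural networks, but the core idea, mapping a pixel vector to a label, can be sketched with a toy nearest-centroid classifier. Everything here (the synthetic data, the "cat"/"dog" labels) is invented for illustration.

```python
import numpy as np

# Toy training set: 4x4 grayscale "images", flattened to 16-dim pixel vectors.
# Bright images are labeled "cat", dark ones "dog" (purely illustrative labels).
rng = np.random.default_rng(0)
bright = rng.integers(180, 256, size=(10, 16))
dark   = rng.integers(0, 76, size=(10, 16))
X = np.vstack([bright, dark]).astype(float)
y = np.array(["cat"] * 10 + ["dog"] * 10)

# Nearest-centroid classifier: represent each class by its mean pixel vector.
centroids = {label: X[y == label].mean(axis=0) for label in ("cat", "dog")}

def classify(image_vec):
    # Predict the class whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda c: np.linalg.norm(image_vec - centroids[c]))

print(classify(np.full(16, 220.0)))  # bright test image -> "cat"
print(classify(np.full(16, 30.0)))   # dark test image  -> "dog"
```

The input/output contract is the same as for a deep model: pixels in, a single class label out.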
Classification along with Localization
Given a single object in an image, classify it as a particular class and also indicate where the object is located in the image. Note that this can be done for a single object as well as for multiple objects.
The image below clarifies the difference:
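Localization is usually expressed as a bounding box, and the standard way to score how well a predicted box matches the ground truth is intersection over union (IoU). Here is a small self-contained sketch; the corner-coordinate box format is an assumption, though it is a common convention.

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) in pixel coordinates.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # perfect overlap -> 1.0
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # partial overlap -> 25/175
```

An IoU of 1.0 means the predicted box matches the ground truth exactly; 0.0 means the boxes do not overlap at all.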
Object Detection

A very simple example for understanding object detection is the problem statement of autopilot software.
The car tries to detect and classify the objects it sees, along with each object's location and distance from itself. It then uses this information to make driving decisions.
Have a look at this image, where the autopilot software is trying to detect and classify humans, vehicles, and lanes, drawing boxes around their locations. There can be multiple objects of the same class, for example several cars and several pedestrians.
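When a detector fires several overlapping boxes on the same object, a standard post-processing step is non-maximum suppression (NMS): keep the highest-scoring box and drop near-duplicates. The sketch below, with invented boxes and scores, shows the greedy version; it is not tied to any particular detector.

```python
def iou(a, b):
    # Boxes are (x1, y1, x2, y2); returns intersection over union.
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def nms(boxes, scores, iou_threshold=0.5):
    # Greedy non-maximum suppression: repeatedly keep the highest-scoring
    # box and discard remaining boxes that overlap it above the threshold.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[i], boxes[best]) < iou_threshold]
    return keep

# Two near-duplicate detections of one car plus one separate pedestrian box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 160)]
scores = [0.9, 0.8, 0.75]
print(nms(boxes, scores))  # [0, 2] -> the duplicate box 1 is suppressed
```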
Image Segmentation

Image segmentation refers to building models that provide classification results at the pixel level. Image segmentation can be further divided into two categories:
- Instance segmentation
- Semantic segmentation
The image below helps you visualize the differences between all the categories; notice how segmentation demands a greater level of pixel accuracy.
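The difference between the two segmentation flavors can be made concrete with tiny label masks. In the hypothetical example below, a semantic mask gives both people the same class id, while an instance mask gives each person its own id; the mask values are invented for illustration.

```python
import numpy as np

# Semantic segmentation: every pixel gets a class id (0 = background,
# 1 = person). The two people are indistinguishable: both use id 1.
semantic = np.array([
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
])

# Instance segmentation: pixels of the same class get distinct instance ids,
# so the two people are separable (ids 1 and 2 here).
instance = np.array([
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0],
])

print(np.unique(semantic))  # [0 1]    -> one "person" class
print(np.unique(instance))  # [0 1 2]  -> two separate person instances
```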
Feel free to explore more resources on these topics.