Data science: definition, existence and purpose
It is not difficult to see how enormously the amount of data, its transfer rate, and its storage demands have increased over the last decade. It is amusing to see how, within a span of a few years, we have moved from CDs to pen drives, and from pen drives to cloud storage services. The amount of data being generated and stored has led to the development of multiple new technologies.
How has data collection increased?
If we compare the online services we use today with their offline counterparts, we can easily spot the differences that have led to such enormous amounts of data being generated. Let's compare them using a very simple example: watching movies.
Ten years ago, to watch a movie, all you had to do was read a newspaper, check the show timings at a nearby theater, drive there, and buy tickets with cash. The only people who could hear your review afterwards were the people you talked to.
Now compare this with the present, where every step you take leaves a digital footprint: a form of data that is recorded and stored in various places. Suppose you look up a movie: Google records that search. You use BookMyShow to buy tickets: it records your details and the number of tickets you bought. Had it been a streaming service like Netflix, it would have recorded which genres and actors you are interested in, to make future recommendations.
Your bank or UPI app records the data of the transaction that took place. After watching the movie, you can write reviews on multiple platforms (text data) or upload a review video to YouTube (video data). Using your search history, Google refines the news section on your device.
Using the data of all their customers, movie service providers calculate profits, market trends, and much more. And you, in turn, want movie recommendations from Netflix that suit your taste! Look at the picture below to understand how data generation has increased.
Data science is the name given to the field dedicated to extracting meaningful patterns and mathematical relations from huge amounts of data.
Its purpose is to draw sensible and useful interpretations from otherwise seemingly meaningless data, which can then be used to make business-scale predictions. Sometimes it is a piece of cake; sometimes it requires intuitive thinking and huge computing resources.
Nevertheless, it is important to realize that many real-life problems are based on chances rather than clear-cut boundaries. Consider the problem of weather forecasting: it uses the weather data of the past few days or weeks to predict the likely weather conditions for the next two days.
Also, whenever we deal with data on a huge scale, statistics comes into play. Comparisons are made in terms of averages, for example, "last year the average rainfall was x mm". And any prediction is expressed in terms of chances. Look at the following statements, which you may have come across:
- It is highly probable that it will rain tomorrow
- You may like this product (Amazon recommendations)
- You may like this song (Spotify recommendations)
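To make the idea of averages and chances concrete, here is a minimal sketch in Python. The rainfall readings are made up for illustration, and the function names `average_rainfall` and `chance_of_rain`, as well as the `threshold_mm` parameter, are hypothetical. Real weather forecasting uses far richer models; this only shows how a "chance of rain" can be read off past data as a simple frequency.

```python
from statistics import mean

# Hypothetical daily rainfall readings in millimetres for the past two weeks.
# These numbers are invented purely for illustration.
rainfall_mm = [0.0, 2.5, 0.0, 0.0, 12.0, 8.4, 0.0, 1.2, 0.0, 0.0, 5.5, 3.1, 0.0, 4.0]


def average_rainfall(readings):
    """Return the mean rainfall over the given days."""
    return mean(readings)


def chance_of_rain(readings, threshold_mm=0.1):
    """Estimate the chance of rain as the fraction of past days that were rainy.

    A day counts as rainy if its reading is at least threshold_mm
    (an assumed cut-off, not a meteorological standard).
    """
    rainy_days = sum(1 for r in readings if r >= threshold_mm)
    return rainy_days / len(readings)


avg = average_rainfall(rainfall_mm)
p_rain = chance_of_rain(rainfall_mm)
print(f"Average rainfall: {avg:.2f} mm")
print(f"Estimated chance of rain tomorrow: {p_rain:.0%}")
```

The output is a probability rather than a yes or no answer, which is exactly the form the statements above take.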
Data science and artificial intelligence (AI) models come up with the prediction that has the higher chance, or probability, of being true. AI has set foot in multiple domains. In the medical domain, intelligent software is being trained on patient records to predict whether a new patient is likely to have the same disease. In the finance domain, AI models are being used to detect fraudulent transactions. The scope is ever increasing!