In this post we discuss the steps taken to solve a problem using data science and artificial intelligence, also known as the AI project cycle.
There are 5 steps in building an AI-powered solution. Have a look at the picture below; we discuss each stage one by one.
Understanding the problem statement and business constraints is very important before jumping into developing a solution. Business constraints help you understand the quality and terms of the desired solution. For example, compare two situations. Suppose you are building “Google Translate”. What are the business constraints?
- Your model needs to understand text data
- The translation should be as close as possible to the original meaning and grammatically correct. Slight errors are tolerable.
- The result should be displayed in milliseconds for better user experience.
Now compare it to an AI solution that predicts the presence of a certain tumor using CT scans. What do your business constraints look like now?
- Your model needs to find patterns in image data (pixels)
- The consequences of false predictions can be fatal, so errors are far less tolerable.
- The result can take an hour to be displayed: “take your time, just be accurate.”
See how different cases lead to different demands? The type of desired output changes too. In the first case you want a text output; in the second you want a classification, that is, which category a patient belongs to: healthy or needing diagnosis.
To analyse data, you first need to gather it from reliable sources. Real-life data can be messy and misleading. Manual entries are prone to errors: someone may mistype 30.0 as 3.00, make spelling mistakes, or label the data incorrectly.
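As a rough illustration of how a mistyped entry such as 3.00 instead of 30.0 might be caught, here is a minimal sketch that flags values sitting far from the median. The function name `flag_suspicious`, the sample readings, and the cutoff of 3.5 (a common rule of thumb for the modified z-score) are assumptions made for this example, not part of any particular library.

```python
from statistics import median

def flag_suspicious(values, threshold=3.5):
    """Return the indices of values that sit unusually far from the median.

    Uses the modified z-score: 0.6745 * |x - median| / MAD, where MAD is the
    median of the absolute deviations from the median.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values are identical; this rule cannot judge spread.
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical manual entries; the fourth one was probably meant to be 30.0.
readings = [30.0, 29.5, 31.2, 3.00, 30.4, 29.9]
suspicious = flag_suspicious(readings)
print(suspicious)  # index of the entry worth double-checking
```

A flagged value is only a candidate for review. Whether it is a typo or a genuine reading still has to be decided by someone who knows the data.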
Data can be collected from various sources like:
- Web pages
- Devices like cameras and sensors (e.g. in autopilots and weather prediction)
- Public surveys and records of purchases, transactions, registrations and more
Once you have gathered the data, you need to clean it (find missing values, remove useless data) and perform basic statistical analysis, such as drawing plots and comparing different features of the data set. The entire process is known as exploratory data analysis (EDA).
Exploratory data analysis helps you see which features are more important and what the overall trend of your data is. For example, suppose you have a data set with the features of a house (number of bedrooms, bathrooms, floor area, etc.) and its final market price.
You might expect the house price to vary roughly linearly with these features in smaller cities, whereas in metropolitan areas it may vary quadratically, depending on location. A sea-facing duplex near Marine Drive in Mumbai will be far more expensive than a duplex in a rural area.
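The exploratory steps described above can be sketched in a few lines of plain Python. The house records, the field names, and the helper functions `count_missing` and `pearson` are all made up for illustration. In practice a library such as pandas is commonly used for this kind of work, but the idea is the same: count what is missing, then check how strongly a feature moves with the price.

```python
from statistics import mean

# Hypothetical house records; None marks a missing value.
houses = [
    {"bedrooms": 2,    "floor_area": 800,  "price": 40.0},
    {"bedrooms": 3,    "floor_area": 1200, "price": 62.0},
    {"bedrooms": None, "floor_area": 1000, "price": 50.0},
    {"bedrooms": 4,    "floor_area": 1600, "price": 81.0},
    {"bedrooms": 3,    "floor_area": None, "price": 58.0},
]

def count_missing(rows):
    """Count how many records have no value for each feature."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (1 if value is None else 0)
    return counts

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

missing = count_missing(houses)

# Simplest possible cleaning step: keep only the complete records.
complete = [row for row in houses if None not in row.values()]

corr_area = pearson([row["floor_area"] for row in complete],
                    [row["price"] for row in complete])

print("missing values per feature:", missing)
print("complete records:", len(complete))
print("correlation of floor area with price:", round(corr_area, 3))
```

Dropping incomplete records is the bluntest way to clean data. With a real data set you would also consider filling in the gaps, since throwing away rows throws away information.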
Having cleaned the data and understood the basic trends, a data scientist then tries to formulate an approximate mathematical relation between the features and the final market price. Feature importances indicate how much weight each feature should carry in determining the price.
The ability to mathematically describe the relationship between parameters is the heart of every AI model.
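To make the idea of a "mathematical relation" concrete, here is a minimal sketch that fits a straight line, price = slope × floor area + intercept, using ordinary least squares. The numbers are invented, and a single feature with a linear relation is an assumption chosen to keep the example short. Real models usually combine many features and may need non-linear terms.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(area, slope, intercept):
    """Estimate the price of a house from its floor area."""
    return slope * area + intercept

# Hypothetical training data: floor area (sq ft) and market price (arbitrary units).
areas = [800, 1000, 1200, 1400, 1600]
prices = [41.0, 50.0, 61.0, 70.0, 81.0]

slope, intercept = fit_line(areas, prices)
print(f"price = {slope:.3f} * area + {intercept:.3f}")
print(f"estimated price for 1300 sq ft: {predict(1300, slope, intercept):.1f}")
```

Here the slope plays the role of the weight attached to floor area: it says how much the estimated price changes for each extra square foot.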
The next task is to test the trained AI model on new real-life data and see how it performs. Depending on the problem statement, different “loss functions” are used to measure how much error the model is making. The purpose of these functions is to provide a mathematical estimate of how far we are from making correct predictions.
Finally, if the model performs well on new, unseen data, the deployment stage (using it as a service in internet applications) begins.
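As an illustration, here are two widely used loss functions written in plain Python: mean squared error, typically used when predicting a number such as a house price, and log loss (binary cross-entropy), typically used when predicting a category such as healthy versus needing diagnosis. The sample values are made up, and the `eps` clipping parameter is an assumption added here to avoid taking the logarithm of zero.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy.

    y_true holds labels 0 or 1; p_pred holds predicted probabilities of class 1.
    """
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

# Regression example: true prices versus predicted prices.
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))

# Classification example: a confident correct model versus a confident wrong one.
print(log_loss([1, 0], [0.9, 0.1]))
print(log_loss([1, 0], [0.1, 0.9]))
```

In both cases a lower value means the predictions are closer to the truth. Log loss punishes confident wrong predictions heavily, which suits problems where a wrong answer is costly.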