This article summarises the time complexities of the most commonly used (and most frequently asked-about) machine learning models.

Time complexities are important to remember in relation to questions such as:

- When would you choose one model over another?
- Is a particular algorithm good for internet applications?
- Does an algorithm scale well to large datasets/databases?
- Is re-training a particular model costly in terms of resources?
- If the input distribution is changing constantly, which model is preferable?

Let's start our discussion of time complexities. I will assume you know the basic intuition behind all the algorithms, so this will just be a crisp, to-the-point summary!

Remember, you will often find different answers on the internet for the same algorithm because of the underlying data structure or code optimisation used, or because best/worst-case conditions are being stated. The best approach is to know one complexity and the corresponding procedure it assumes.

## LINEAR REGRESSION

Terminology:

Dataset = {X, Y}; dimensions of X = n*m, dimensions of Y = n*1

- Train-time complexity = O(m^2 * (n + m))
- Run-time complexity = O(m)
- Space complexity (during run time) = O(m)

Conclusion: small run-time and space complexity, so good for low-latency problems.

Source: https://levelup.gitconnected.com/train-test-complexity-and-space-complexity-of-linear-regression-26b604dcdfa3#:~:text=So%2C%20runtime%20complexity%20of%20Linear,features%2Fdimension%20of%20the%20data.
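As a quick illustration of where the O(m^2 * (n + m)) figure comes from, here is a minimal normal-equation sketch in NumPy; the data and true coefficients are made up for the example:

```python
import numpy as np

# Closed-form (normal-equation) linear regression: w = (X^T X)^{-1} X^T y.
# Forming X^T X costs O(n * m^2) and solving the resulting m x m system costs
# O(m^3), which together give the O(m^2 * (n + m)) training cost quoted above.
rng = np.random.default_rng(0)
n, m = 1000, 5
X = rng.normal(size=(n, m))
y = X @ np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + rng.normal(scale=0.01, size=n)

w = np.linalg.solve(X.T @ X, X.T @ y)  # training: O(m^2 * (n + m))
y_hat = X[0] @ w                       # prediction: one dot product, O(m)
```

The run-time cost is just the single dot product on the last line, which is why linear regression is quoted as O(m) at prediction time.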

## LOGISTIC REGRESSION

Terminology:

Dataset = {X, Y}; dimensions of X = n*m, dimensions of Y = n*1

- Train-time complexity = O(n*m)
- Run-time complexity = O(m)
- Space complexity (during run time) = O(m)

Conclusion: small run-time and space complexity, so good for low-latency problems.
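The O(n*m) training figure assumes gradient-based fitting: each pass touches every entry of X once. A minimal sketch (synthetic, linearly separable data invented for the example):

```python
import numpy as np

# One gradient-descent epoch for logistic regression reads every entry of X
# once (the X.T @ (...) product), i.e. O(n * m) per epoch; predicting a single
# point is a dot product plus a sigmoid, i.e. O(m).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, m = 500, 3
X = rng.normal(size=(n, m))
y = (X @ np.array([2.0, -1.0, 0.5]) > 0).astype(float)

w = np.zeros(m)
for _ in range(200):                        # each epoch: O(n * m)
    grad = X.T @ (sigmoid(X @ w) - y) / n
    w -= 0.5 * grad

pred = sigmoid(X[0] @ w) > 0.5              # run time: O(m)
```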

## NAIVE BAYES

Dataset = {X, Y}; dimensions of X = n*d, dimensions of Y = n*1; c = number of classes

- Train-time complexity = O(n*d*c)
- Run-time complexity = O(d*c)
- Space complexity (during run time) = O(d*c)

Conclusion: good for high-dimensional datasets. Used in spam detection and other simple, non-semantic NLP problems; often taken as a baseline model for comparison with more complex models.
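The O(d*c) run-time and space figures are easy to see in code: the fitted model is literally a c-by-d table of per-class feature statistics, and scoring one point reads that whole table. A Bernoulli naive Bayes sketch on invented binary data:

```python
import numpy as np

# Bernoulli naive Bayes sketch: training fills a (c x d) table of per-class
# feature counts, so the fitted model is exactly the O(d * c) object quoted
# above; scoring one point reads every table entry once, i.e. O(d * c).
rng = np.random.default_rng(0)
n, d, c = 200, 4, 2
X = rng.integers(0, 2, size=(n, d))
y = rng.integers(0, c, size=n)

counts = np.ones((c, d))                 # Laplace smoothing: start counts at 1
class_total = np.full(c, 2.0)            # and class totals at 2
for k in range(c):
    counts[k] += X[y == k].sum(axis=0)
    class_total[k] += (y == k).sum()
theta = counts / class_total[:, None]    # P(feature = 1 | class), shape (c, d)
prior = np.bincount(y, minlength=c) / n

def predict(x):
    # Log-posterior for each class: reads the whole (c x d) table, O(d * c).
    log_lik = (x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum(axis=1)
    return int(np.argmax(np.log(prior) + log_lik))
```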

## SVM

Dataset = {X, Y}; dimensions of X = n*d, dimensions of Y = n*1; S = number of support vectors

- Train-time complexity = O(n^2) (you will also see figures up to O(n^3), depending on the solver)
- Run-time complexity = O(S*d)
- Space complexity (during run time) = O(S)

Conclusion: high train-time complexity, but low latency and space complexity.
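The O(S*d) run-time figure follows from the form of the kernel SVM decision function: a sum over the S support vectors, each term needing one O(d) kernel evaluation. A sketch with made-up support vectors and dual coefficients (in practice these come out of training):

```python
import numpy as np

# Kernel SVM prediction sketch: f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
# over the S support vectors. Each kernel evaluation is O(d), so one query
# costs O(S * d) regardless of the original training-set size n.
rng = np.random.default_rng(0)
S, d = 10, 3
support_vectors = rng.normal(size=(S, d))  # assumed already found by training
alpha_y = rng.normal(size=S)               # alpha_i * y_i from the dual solution
b = 0.1

def rbf(a, x, gamma=0.5):
    return np.exp(-gamma * np.sum((a - x) ** 2))  # one O(d) kernel evaluation

def decision(x):
    return sum(a * rbf(sv, x) for a, sv in zip(alpha_y, support_vectors)) + b

label = 1 if decision(rng.normal(size=d)) >= 0 else -1
```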

## DECISION TREES + RF + GBDT

n = number of sample data points

d = the dimensionality

k = depth of the decision tree

p = number of nodes

Decision tree:

- Train-time complexity = O(n*log(n)*d)
- Run-time complexity = O(k)
- Space complexity (during run time) = O(p)

Random forest (m base learners):

- Train-time complexity = O(n*log(n)*d*m)
- Run-time complexity = O(k*m)
- Space complexity (during run time) = O(p*m)

GBDT (m successive models; b is the shrinkage factor, with m such coefficients). The extra coefficients are ultimately a constant addition and can be ignored asymptotically, but it is worth pointing out in interviews, since it is what separates GBDT from RF:

- Train-time complexity = O(n*log(n)*d*m)
- Run-time complexity = O(k*m)
- Space complexity (during run time) = O(p*m + b*m)
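The O(k) and O(k*m) run-time figures come from the fact that a tree prediction is one root-to-leaf walk, and an ensemble just repeats it. A toy sketch, where a node is a hypothetical (feature, threshold, left, right) tuple and a leaf is a plain float:

```python
# Prediction-cost sketch: answering a query with one decision tree is a single
# root-to-leaf walk, O(k) for depth k; a forest of m trees repeats that walk
# m times, giving the O(k * m) run time quoted above.

def predict_tree(node, x):
    while isinstance(node, tuple):          # at most k steps down the tree
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node                             # leaf value

def predict_forest(trees, x):
    votes = [predict_tree(t, x) for t in trees]  # m walks: O(k * m)
    return sum(votes) / len(votes)

# A depth-2 toy tree: split on feature 0, then on feature 1.
tree = (0, 0.5, (1, 0.5, 0.0, 1.0), 1.0)
forest = [tree, tree, tree]
out = predict_forest(forest, [0.2, 0.8])
```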

## K-NEAREST NEIGHBOURS (BRUTE FORCE)

K = number of nearest neighbours

d = dimensions

n = number of points

(no training phase)

- Run-time complexity = O(n*d) (the "+K*d" selection term is ignored asymptotically)
- Space complexity (during run time) = O(n*d)

Source: https://stats.stackexchange.com/questions/219655/k-nn-computational-complexity
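Brute-force k-NN really is just "distance to everything": one query computes n distances of dimension d each, and the entire dataset must stay in memory. A minimal sketch on invented data:

```python
import numpy as np

# Brute-force k-NN: there is no training; a query computes a distance to all
# n stored points (O(n * d) time) and the stored dataset itself is the model
# (O(n * d) space).
rng = np.random.default_rng(0)
n, d, k = 100, 3, 5
X = rng.normal(size=(n, d))

def knn(query, k):
    dists = np.linalg.norm(X - query, axis=1)  # O(n * d)
    return np.argsort(dists)[:k]               # selection term, ignored asymptotically

neighbours = knn(np.zeros(d), k)
```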

## kd-tree

K = number of nearest neighbours

d = dimensions

Building a balanced k-d tree over n points has a worst-case complexity of O(d*n*log(n)).

- Run-time complexity (best case) = O(2^d * K * log(n)) (log(n) to find cells "near" the query point; 2^d to search the cells around that neighbourhood)
- Run-time complexity (worst case) = O(K*n)
- Space complexity = O(n*d) (the tree stores all n points)

As you can see, the k-d tree is not suitable for very high dimensions.
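A minimal pure-Python k-d tree sketch makes both regimes visible: a query descends O(log n) levels, but the backtracking step can visit most of the tree in high dimensions, which is where the O(K*n) worst case comes from. All data here is invented for the example:

```python
import numpy as np

# Minimal k-d tree: building sorts along a cycling axis at each level (median
# splits), and a nearest-neighbour query descends O(log n) levels before
# backtracking into far subtrees that might still hold a closer point.

def build(points, depth=0):
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or np.sum((query - point) ** 2) < np.sum((query - best) ** 2):
        best = point
    near, far = (left, right) if query[axis] <= point[axis] else (right, left)
    best = nearest(near, query, best)
    # Backtrack into the far side only if the splitting plane is closer than
    # the best distance found so far -- in high d this prunes almost nothing.
    if (query[axis] - point[axis]) ** 2 < np.sum((query - best) ** 2):
        best = nearest(far, query, best)
    return best

rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 2))
tree = build(pts)
nn = nearest(tree, np.zeros(2))
```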

## CLUSTERING ALGORITHMS

| ALGORITHM | TRAIN COMPLEXITY | SPACE COMPLEXITY | REMARK |
| --- | --- | --- | --- |
| K-means | O(n*k*d*i), where n = points, d = dimensions, k = number of clusters, i = iterations | O(n*d + k*d), the "k*d" being for the centroids | asymptotically O(n*d) for small d and k |
| Hierarchical clustering | O(n^3) | O(n^2) | not suitable for low-latency or low-space problems |
| DBSCAN | O(n^2), but can be made O(n*log(n)) in lower dimensions using efficient data structures | O(n) {d << n} | better than hierarchical (in terms of complexity) |
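The K-means entry in the table is easiest to verify from one Lloyd iteration: the assignment step computes n*k distances of dimension d each, repeated for i iterations. A sketch on invented data:

```python
import numpy as np

# One Lloyd iteration of K-means: the assignment step computes n * k distances
# of dimension d each, i.e. O(n * k * d) per iteration (times i iterations
# overall); the state kept is the data, O(n*d), plus the centroids, O(k*d).
rng = np.random.default_rng(0)
n, d, k = 300, 2, 3
X = rng.normal(size=(n, d))
centroids = X[rng.choice(n, size=k, replace=False)]

for _ in range(10):  # i iterations
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)  # assignment: O(n * k * d)
    # Update step; keep the old centroid if a cluster happens to empty out.
    centroids = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                          else centroids[j] for j in range(k)])
```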

## SVD (SINGULAR VALUE DECOMPOSITION)

Now this is a tricky one: various methods are available, and you will generally find multiple answers to this question on the internet.

Below I state what Wikipedia states (if you give any other answer in an interview, just remember to understand the corresponding algorithm used).

For one m*n matrix (with m >= n), the time complexity is O(m*n^2). https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations
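However the decomposition is computed (NumPy delegates to LAPACK), the output shapes and the exact reconstruction are the same. A quick sketch on a random matrix:

```python
import numpy as np

# SVD sketch with NumPy: for an m x n matrix (m >= n), the cost is commonly
# quoted as O(m * n^2). The thin factors satisfy A = U @ diag(s) @ Vt exactly.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # U: 6x4, s: 4, Vt: 4x4
A_rebuilt = U @ np.diag(s) @ Vt
```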

## PCA, t-SNE

N = number of points, D = dimensionality

Train-time complexity (PCA) = O(N*D*min(N, D) + D^3)

EXPLANATION:

The complexity of **covariance matrix computation** is O(N*D^2); its eigenvalue decomposition is O(D^3).

Notice that after this we just select the first d (target dimension) eigenvectors.
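The two-term cost breakdown maps directly onto two lines of NumPy; a minimal covariance-based PCA sketch on invented data:

```python
import numpy as np

# PCA via the covariance matrix, matching the cost breakdown above: forming
# the D x D covariance is O(N * D^2) and its eigendecomposition is O(D^3);
# the projection then keeps only the first d eigenvectors.
rng = np.random.default_rng(0)
N, D, d = 200, 5, 2
X = rng.normal(size=(N, D))
Xc = X - X.mean(axis=0)

cov = Xc.T @ Xc / (N - 1)               # O(N * D^2)
eigvals, eigvecs = np.linalg.eigh(cov)  # O(D^3); eigh returns ascending order
top = eigvecs[:, ::-1][:, :d]           # first d principal directions
X_proj = Xc @ top                       # N x d embedding
```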

The main reason t-SNE is slower than PCA is that no analytical solution exists for the criterion being optimised; instead, a solution must be approximated through gradient descent iterations.

t-SNE has quadratic time and space complexity in the number of data points.

t-SNE requires O(**3N^2**) of memory.

https://www.geeksforgeeks.org/ml-t-distributed-stochastic-neighbor-embedding-t-sne-algorithm/