# TIME COMPLEXITIES OF ML ALGORITHMS

THIS ARTICLE SUMMARISES THE TIME COMPLEXITIES OF THE MOST COMMONLY USED AND ENQUIRED MACHINE LEARNING MODELS.

TIME COMPLEXITIES ARE IMPORTANT TO REMEMBER IN RELATION TO THE FOLLOWING QUESTIONS ASKED:

1. WHEN WOULD YOU CHOOSE ONE MODEL OVER THE OTHER.
2. IS A PARTICULAR ALGORITHM GOOD FOR INTERNET APPLICATIONS ?
3. WHETHER AN ALGORITHM SCALES GOOD ON LARGE DATSETS/DATABASES.
4. IS RE-TRAINING A PARTICULAR MODEL COSTLY IN TERMS OF RESOURCES.
5. IF INPUT DISTRIBUTION IS CHANGING CONSTANTLY WHICH MODEL IS PREFERABLE.

LETS START OUR DISCUSSION ON TIME COMPLEXITIES . I WILL ASSUME YOU KNOW THE BASIC INTUITION OF ALL THE ALGORITHMS AND HENCE THIS WILL JUST BE A CRISP TO THE POINT SUMMARY !

REMEMBER , MANY TIMES YOU MIGHT FIND DIFFERENT ANSWERS ON THE INTERNET FOR SAME ALGO BECAUSE OF THE UNDERLYING DATA STRUCTURE OR CODE OPTIMISATION USED. OR MAYBE BECAUSE THEY ARE STATING BEST /WORST CONDITIONS . BEST APPROACH IS TO KNOW ONE COMPLEXITY AND THE CORRESPONDING PROCEDURE USED .

## LINEAR REGRESSION

TERMINOLOGY :

DATASET = {X,Y} DIMENSIONS OF X= n*m , DIMENSIONS OF Y = n*1

train -time complexity = O((m^2)*(n+m))
run -time complexity= O(m)
space complexity(during run time)= O(m)

conclusion: small run time and space complexity! good for low latency problems .

source:https://levelup.gitconnected.com/train-test-complexity-and-space-complexity-of-linear-regression-26b604dcdfa3#:~:text=So%2C%20runtime%20complexity%20of%20Linear,features%2Fdimension%20of%20the%20data.

## LOGISTIC REGRESSION

TERMINOLOGY:

DATASET = {X,Y} DIMENSIONS OF X= n*m , DIMENSIONS OF Y = n*1

train -time complexity = O(n*m)
run -time complexity= O(m)
space complexity(during run time)= O(m)

conclusion: small run time and space complexity! good for low latency problems .

## NAIVE BAYES

DATASET = {X,Y} DIMENSIONS OF X= n*d , DIMENSIONS OF Y = n*1 CLASSES =C

train -time complexity = O(n*d*c)
run -time complexity= O(d*c)
space complexity(during run time)= O(d*c)

conclusion: good for high dimensional data set .used in spam detection and other simple non semantic NLP problems . considered as base model for comparision with  complex models.



## SVM

DATASET = {X,Y} DIMENSIONS OF X= n*d , DIMENSIONS OF Y = n*1 S=NUMBER OF SUPPORT VECTORS

train -time complexity = O(n^2)
run -time complexity= O(S*d)
space complexity(during run time)= O(S)

conclusion: high train time complexity .but low latency and space complexity.



## DECISION TREES +RF +GBDT

n=NUMBER OF SAMPLE DATA POINTS ,

d = THE DIMENSIONALITY

k =depth of decision tree

p = number of nodes

train -time complexity = O(n*log(n)*d)
run -time complexity= O(k)
space complexity(during run time)= O(p)

NOW IF WE ARE TAKING OF RANDOM FOREST (M BASE LEARNERS)

train -time complexity = O(n*log(n)*d*m)
run -time complexity= O(k*m)
space complexity(during run time)= O(p*m)

TALKING OF GBDT (m SUCCESIVE MODELS , b is the shrinkage factor(m such coefficients , it will ultimately be a constant addition in complexity and can be ignored asymptotically but its better to point out during interviews as it is what separates it from RF )
train -time complexity = O(n*log(n)*d*m)
run -time complexity= O(k*m)
space complexity(during run time)= O(p*m +b*m)



## K-NEAREST NEIGHBOURS(brute force)

K= number of nearest neighbours

d= dimensions

(no training phase)

run-time complexity = O(n*d) ( the "+kd" term is ignored asymptotically

space complexity(during run time)= O(n*d)

https://stats.stackexchange.com/questions/219655/k-nn-computational-complexity

## kd-tree

K= number of nearest neighbours

d= dimensions

An algorithm that builds a balanced k-d tree to sort points has a worst-case complexity of O(kn log n).

run-time complexity(best-case)=O(2^d *(k)*log n)
(log n to find cells “near” the query point
2d to search around cells in that neighborhood )

run-time complexity(worst-case)=O(k*n)
space-complexity= O(K·N) 

AS you can see not suitable for very large dimensions .

## SVD (SINGULAR VALUE DECOMPOSITION)

NOW THIS IS A TRICKY ONE, VARIOUS METHODS ARE AVAILABLE AND GENERALLY ON THE INTERNET YOU WILL FIND MULTIPLE ANSWERS TO THIS QUESTION :

BELOW I STATE WHAT WIKIPEDIA STATES , (IF YOU STATE ANY OTHER ANSWER IN INTERVIEW JUST BE SURE REMEMBER TO UNDERSTAND THE CORRESPONDING ALGO USED)

FOR One matrix, TIME COMPLXITY ( )

https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations

## PCA ,t-SNE

N=NUMBER OF POINTS , D=DIMEMTIONALITY

TRAIN-TIME COMPLEXITY (PCA) =(ND×min(N, D) +D³)

EXPLANATION:

{The complexity of Covariance matrix computation is O(D^2(n)). Its eigenvalue decomposition is O(D^3).}

NOTICE THAT AFTER THIS WE JUST SELECT FIRST d (TARGET DIIMENSION ) VECTORS

The main reason for t-SNE being slower than PCA is that no analytical solution exists for the criterion that is being optimised. Instead, a solution must be approximated through gradient descend iterations.

t-SNE has a quadratic time and space complexity in the number of data points.

T-SNE requires O( 3N^2) of memory.

https://www.geeksforgeeks.org/ml-t-distributed-stochastic-neighbor-embedding-t-sne-algorithm/