This article summarises the time complexities of the most commonly used (and most commonly asked-about) machine learning models.
Time complexities are worth remembering because they come up in questions like:
- When would you choose one model over the other?
- Is a particular algorithm good for internet (low-latency) applications?
- Does an algorithm scale well to large datasets/databases?
- Is re-training a particular model costly in terms of resources?
- If the input distribution is constantly changing, which model is preferable?
Let's start our discussion on time complexities. I will assume you know the basic intuition behind all the algorithms, so this will just be a crisp, to-the-point summary!
Remember, you will often find different answers on the internet for the same algorithm because of the underlying data structure or code optimisation used, or because they are stating best/worst-case conditions. The best approach is to know one complexity and the corresponding procedure it refers to.
LINEAR REGRESSION
TERMINOLOGY:
DATASET = {X, Y}, DIMENSIONS OF X = n*m, DIMENSIONS OF Y = n*1
Train-time complexity = O((m^2)*(n+m))
Run-time complexity = O(m)
Space complexity (during run time) = O(m)
Conclusion: small run-time and space complexity, so it is good for low-latency problems.
Source: https://levelup.gitconnected.com/train-test-complexity-and-space-complexity-of-linear-regression-26b604dcdfa3
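A minimal NumPy sketch of where these numbers come from, assuming training is done with the normal equation (forming the m*m matrix X^T X costs O(n*m^2) and solving the system costs O(m^3), i.e. O((m^2)*(n+m)) overall); prediction is just one dot product over m weights:

```python
import numpy as np

def train_linear_regression(X, y):
    # X: (n, m), y: (n,). Forming X^T X costs O(n*m^2); solving the
    # resulting m x m system costs O(m^3) -> O((m^2)*(n+m)) overall.
    XtX = X.T @ X
    Xty = X.T @ y
    return np.linalg.solve(XtX, Xty)   # assumes X^T X is invertible

def predict(w, x):
    # One dot product over m features: O(m) time, O(m) space for the weights.
    return w @ x
```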
LOGISTIC REGRESSION
TERMINOLOGY:
DATASET = {X, Y}, DIMENSIONS OF X = n*m, DIMENSIONS OF Y = n*1
Train-time complexity = O(n*m)
Run-time complexity = O(m)
Space complexity (during run time) = O(m)
Conclusion: small run-time and space complexity, so it is good for low-latency problems.
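Here the O(n*m) train time assumes gradient-descent-style training where each pass touches all n rows and all m features; a rough sketch (the epoch count is treated as a constant):

```python
import numpy as np

def train_logistic_regression(X, y, lr=0.1, epochs=100):
    # X: (n, m), y: (n,) with values in {0, 1}. Each epoch touches all n rows
    # and all m features once -> O(n*m) per epoch (epochs treated as constant).
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # O(n*m)
        w -= lr * (X.T @ (p - y)) / n        # O(n*m)
    return w

def predict_proba(w, x):
    # At inference only the m weights are needed: O(m) time and space.
    return 1.0 / (1.0 + np.exp(-(w @ x)))
```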
NAIVE BAYES
DATASET = {X, Y}, DIMENSIONS OF X = n*d, DIMENSIONS OF Y = n*1, CLASSES = c
Train-time complexity = O(n*d*c)
Run-time complexity = O(d*c)
Space complexity (during run time) = O(d*c)
Conclusion: good for high-dimensional data sets. Used in spam detection and other simple non-semantic NLP problems, and often treated as a baseline model for comparison with more complex models.
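A sketch of the prediction step, assuming the c log-priors and the (c, d) table of per-class, per-feature log-likelihoods were already estimated at train time; scoring one point loops over every (class, feature) pair, which is where O(d*c) comes from:

```python
import numpy as np

def naive_bayes_predict(x, log_prior, log_likelihood):
    # log_prior: (c,) class log-priors, log_likelihood: (c, d) table, both
    # built at train time. Scoring one point touches every (class, feature)
    # pair -> O(d*c) time; the stored tables are the O(d*c) run-time space.
    c, d = log_likelihood.shape
    best_class, best_score = -1, -np.inf
    for k in range(c):                    # c classes
        score = log_prior[k]
        for j in range(d):                # d features
            score += x[j] * log_likelihood[k, j]
        if score > best_score:
            best_class, best_score = k, score
    return best_class
```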
SVM
DATASET = {X, Y}, DIMENSIONS OF X = n*d, DIMENSIONS OF Y = n*1, S = NUMBER OF SUPPORT VECTORS
Train-time complexity = O(n^2)
Run-time complexity = O(S*d)
Space complexity (during run time) = O(S)
Conclusion: high train-time complexity, but low latency and low space complexity.
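A sketch of the run-time side, assuming a kernel SVM whose support vectors and dual coefficients came out of training; each prediction evaluates S kernel values, each of which is an O(d) operation, hence O(S*d):

```python
import numpy as np

def svm_decision(x, support_vectors, dual_coefs, b, gamma=1.0):
    # support_vectors: (S, d), dual_coefs: (S,). One prediction evaluates
    # S RBF kernel values, each an O(d) operation -> O(S*d) time; only the
    # support vectors have to be kept around at run time.
    diffs = support_vectors - x                       # (S, d)
    k = np.exp(-gamma * np.sum(diffs ** 2, axis=1))   # O(S*d)
    return float(dual_coefs @ k + b)
```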
DECISION TREES + RF + GBDT
n = NUMBER OF SAMPLE DATA POINTS
d = THE DIMENSIONALITY
k = DEPTH OF THE DECISION TREE
p = NUMBER OF NODES
m = NUMBER OF BASE LEARNERS (FOR RF / GBDT)
Decision tree:
Train-time complexity = O(n*log(n)*d)
Run-time complexity = O(k)
Space complexity (during run time) = O(p)
Random forest (m base learners):
Train-time complexity = O(n*log(n)*d*m)
Run-time complexity = O(k*m)
Space complexity (during run time) = O(p*m)
GBDT (m successive models, b = the shrinkage factor; there are m such coefficients, so it is ultimately a constant addition to the complexity and can be ignored asymptotically, but it is worth pointing out in interviews since it is what separates GBDT from RF):
Train-time complexity = O(n*log(n)*d*m)
Run-time complexity = O(k*m)
Space complexity (during run time) = O(p*m + b*m)
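The run-time bounds above come from walking each tree from root to leaf; a rough sketch with a hypothetical dict-based node layout (one tree of depth k costs O(k) comparisons, m trees cost O(k*m)):

```python
def predict_tree(node, x):
    # Walk from the root down to a leaf: at most k comparisons for depth k.
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

def predict_ensemble(trees, x):
    # m independent trees -> O(k*m) per prediction (averaged for RF,
    # summed with a shrinkage coefficient for GBDT).
    return sum(predict_tree(t, x) for t in trees) / len(trees)
```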
K-NEAREST NEIGHBOURS (BRUTE FORCE)
k = number of nearest neighbours
d = dimensions
(no training phase)
Run-time complexity = O(n*d) (the "+k*d" term for selecting the k neighbours is ignored asymptotically)
Space complexity (during run time) = O(n*d)
Source: https://stats.stackexchange.com/questions/219655/k-nn-computational-complexity
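A brute-force query in a nutshell: compute the distance from the query to all n stored points (each distance is O(d)), then pick the k closest; a small NumPy sketch:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    # X_train: (n, d) must be kept in memory -> O(n*d) space.
    # Squared distances to all n points, each over d features -> O(n*d) time
    # (selecting the k smallest is the extra "+kd"-style term that is dropped).
    dists = np.sum((X_train - x) ** 2, axis=1)
    nearest = np.argpartition(dists, k)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]      # majority vote among the k neighbours
```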
kd-tree
k = number of nearest neighbours
d = dimensions
Building a balanced k-d tree (by pre-sorting the points in each dimension) has a worst-case complexity of O(k*n*log(n)); note that the k here is the dimensionality of the points (our d), not the number of neighbours.
Run-time complexity (best case) = O(2^d * k * log(n)) (log(n) to find cells "near" the query point, 2^d to search the surrounding cells in that neighbourhood)
Run-time complexity (worst case) = O(k*n)
Space complexity = O(k*n)
As you can see, the k-d tree is not suitable for very high dimensions.
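In practice you would rarely hand-roll a k-d tree; a small sketch using SciPy's cKDTree (assuming SciPy is available) showing the build-once, query-many pattern the bounds above describe:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10_000, 3))        # n points in d = 3 dimensions

tree = cKDTree(points)                  # build once: roughly O(d * n * log n)
dist, idx = tree.query(points[0], k=5)  # query; fast in low d, degrades as d grows
print(idx)
```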
CLUSTERING ALGORITHMS
| NAME OF ALGO | TRAIN-TIME COMPLEXITY | SPACE COMPLEXITY | REMARK |
| --- | --- | --- | --- |
| K-MEANS | O(n*k*d*i), where n = points, d = dimensions, k = number of clusters, i = iterations | O(n*d + k*d) ("k*d" for the centroids) | asymptotically, for small d and k, this is O(n*d) |
| HIERARCHICAL CLUSTERING | O(n^3) | O(n^2) | not suitable for low-latency or low-space problems |
| DBSCAN | O(n^2), but can be made O(n*log(n)) in lower dimensions using efficient data structures | O(n) {d << n} | better than hierarchical (in terms of complexity) |
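To see where O(n*k*d*i) comes from, here is a sketch of a single k-means iteration: every iteration computes n*k distances over d dimensions, and the k centroids are the extra O(k*d) space on top of the O(n*d) data:

```python
import numpy as np

def kmeans_step(X, centroids):
    # X: (n, d), centroids: (k, d). The distance matrix below is the O(n*k*d)
    # work done once per iteration; i iterations -> O(n*k*d*i) training time.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)   # (n, k)
    labels = dists.argmin(axis=1)
    new_centroids = np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
        for j in range(len(centroids))
    ])
    return labels, new_centroids
```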
SVD (SINGULAR VALUE DECOMPOSITION)
Now this is a tricky one: various methods are available, and on the internet you will generally find multiple answers to this question.
Below I state what Wikipedia states (if you give a different answer in an interview, just be sure you understand the corresponding algorithm used).
For one m×n matrix (with m ≥ n), the time complexity of computing the SVD is roughly O(m*n^2 + n^3), i.e. O(m*n^2) when m ≥ n.
Source: https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations
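A quick NumPy illustration; the full SVD of an m×n matrix costs on the order of O(m*n^2 + n^3) flops (assuming m ≥ n), which is why truncated/randomised variants are preferred when only the top few singular vectors are needed:

```python
import numpy as np

A = np.random.default_rng(0).random((2000, 300))    # m x n matrix, m >= n

# Full SVD: on the order of O(m*n^2 + n^3) flops for an m x n matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)                   # (2000, 300) (300,) (300, 300)
```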
PCA, t-SNE
N = NUMBER OF POINTS, D = DIMENSIONALITY
Train-time complexity (PCA) = O(N*D*min(N, D) + D^3)
EXPLANATION:
The complexity of the covariance matrix computation is O(D^2 * N). Its eigenvalue decomposition is O(D^3).
Notice that after this we just select the first d (target dimension) eigenvectors.
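A minimal covariance-based PCA sketch matching the breakdown above (covariance matrix: O(D^2 * N); eigendecomposition: O(D^3); then keep the top d eigenvectors):

```python
import numpy as np

def pca(X, d):
    # X: (N, D). Covariance matrix: O(D^2 * N); eigendecomposition: O(D^3).
    Xc = X - X.mean(axis=0)
    cov = (Xc.T @ Xc) / (len(X) - 1)                 # (D, D)
    eigvals, eigvecs = np.linalg.eigh(cov)           # symmetric matrix -> eigh
    top = eigvecs[:, np.argsort(eigvals)[::-1][:d]]  # keep the top d components
    return Xc @ top                                  # project: O(N*D*d)
```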
The main reason t-SNE is slower than PCA is that no analytical solution exists for the criterion being optimised; instead, a solution must be approximated through gradient descent iterations.
t-SNE has a quadratic time and space complexity in the number of data points.
t-SNE requires O(3N^2) memory.
Source: https://www.geeksforgeeks.org/ml-t-distributed-stochastic-neighbor-embedding-t-sne-algorithm/
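The quadratic footprint comes from pairwise structures over all N points; a tiny sketch of just the pairwise squared-distance matrix that t-SNE's affinities are built on (the (N, N) array is the O(N^2) memory):

```python
import numpy as np

X = np.random.default_rng(0).random((1000, 50))   # N points, D dimensions

# Pairwise squared distances form an (N, N) matrix, i.e. O(N^2) memory,
# which is why exact t-SNE becomes impractical for large N.
sq_norms = (X ** 2).sum(axis=1)
D2 = sq_norms[:, None] + sq_norms[None, :] - 2 * (X @ X.T)
print(D2.shape)                                   # (1000, 1000)
```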