The math behind constrained optimisation, drawbacks and other important discussions
One of the most important mathematical concepts required to understand the workings of PCA is that of eigenvectors and eigenvalues. If you know the algorithm, you will be aware that in order to reduce a dataset of dimension d to a smaller dimension k, PCA computes the eigenvectors and eigenvalues of the covariance matrix of the dataset and selects the eigenvectors corresponding to the top k eigenvalues.
But why? Why do eigenvalues and eigenvectors get involved when reducing dimensions?
What are Eigenvectors/eigenvalues?
For any square matrix A, its eigenvectors v and eigenvalues lambda satisfy the relation:

A*v = lambda*v
Geometrically speaking, for a given eigenvector-eigenvalue pair, the matrix multiplication amounts to a mere scaling of the eigenvector by the factor lambda. Hence its direction is unchanged (a negative lambda only reverses it).
Eigenvalues and eigenvectors are only for square matrices. Eigenvectors are by definition nonzero. Eigenvalues may be equal to zero.
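This defining relation is easy to check numerically. Below is a minimal sketch using a small symmetric matrix as a hypothetical example (the matrix and values are illustrative, not from the post):

```python
import numpy as np

# A small symmetric matrix (hypothetical example).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is for symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# For each pair, A @ v equals lambda * v: the matrix only scales the vector.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)

print(eigenvalues)  # [1. 3.]
```

Note that the eigenvectors are the columns of the returned matrix, which is why the loop iterates over `eigenvectors.T`.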
Now let's look at another concept used in PCA.
Sometimes, when one needs to find the maximum or minimum of an expression that is subject to a certain constraint, you cannot simply take its first derivative and set it to zero. The constraint must be taken into account.
Such constrained optimisation problems are solved using the concept of LAGRANGE MULTIPLIERS.
Note: Lagrange multipliers for equality constraints can take any sign; it is only the multipliers of inequality constraints (in the KKT conditions) that are required to be nonnegative.
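To make the idea concrete, here is a small worked example (my own toy problem, not from the post): maximise f(x, y) = x + y subject to x^2 + y^2 = 1. Setting the partial derivatives of the Lagrangian L = x + y - lambda*(x^2 + y^2 - 1) to zero gives x = y = 1/(2*lambda), and the constraint then forces lambda = 1/sqrt(2) for the maximum (the minimum corresponds to a negative lambda), so the maximum value is sqrt(2). The sketch below verifies this numerically:

```python
import numpy as np

# Maximise f(x, y) = x + y subject to x^2 + y^2 = 1 (toy example).
# Lagrange conditions give x = y = 1/sqrt(2), maximum value sqrt(2).

# Verify by scanning the constraint circle directly.
theta = np.linspace(0.0, 2.0 * np.pi, 100_000)
values = np.cos(theta) + np.sin(theta)   # f evaluated on the unit circle
numeric_max = values.max()

analytic_max = np.sqrt(2.0)
assert abs(numeric_max - analytic_max) < 1e-6
```

The sign of the multiplier simply distinguishes the maximum from the minimum here, which is why the "always positive" intuition fails in general.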
The optimisation problem in PCA
Now let's see how the two concepts, eigenvectors/eigenvalues and Lagrange multipliers, come together to solve the PCA optimisation problem. Recall that PCA is all about finding the axes of maximum variance. Below is the optimisation problem of PCA, where S is the covariance matrix of the dataset, hence a d*d square matrix (where d is the original dimension of the dataset), and u is the direction along which the variance is to be maximised:

maximise u^T*S*u subject to u^T*u = 1
You can see that the above is just a constrained optimisation problem. Using a Lagrange multiplier lambda, form the Lagrangian L(u, lambda) = u^T*S*u - lambda*(u^T*u - 1); setting dL(u, lambda)/du and dL(u, lambda)/d(lambda) to zero gives a solution of the form
S*u=lambda*u (eigenvector-eigenvalue relation)
So now you see how the solutions u are just eigenvectors of S (and the lambdas the corresponding eigenvalues). Moreover, substituting back, the variance along u is u^T*S*u = lambda*u^T*u = lambda, so each eigenvalue equals the variance along its eigenvector. The eigenvalues in decreasing order therefore give directions of decreasing variance, hence to reduce d dimensions to k we select the eigenvectors of the top k eigenvalues.
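The whole recipe can be sketched in a few lines of numpy. This is a minimal illustration on randomly generated toy data (the dataset, seed, and variable names are my own assumptions, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 500 samples in d = 3 dimensions, with unequal spread per axis.
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.2])

# 1. Centre the data and form the covariance matrix S (d x d).
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)

# 2. Eigendecompose S; eigh returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(S)

# 3. Sort in descending order and keep the top k eigenvectors.
order = np.argsort(eigenvalues)[::-1]
k = 2
W = eigenvectors[:, order[:k]]   # d x k projection matrix
Z = Xc @ W                       # reduced dataset, n x k

# The variance along each principal axis equals its eigenvalue.
assert np.allclose(Z.var(axis=0, ddof=1), eigenvalues[order[:k]])
print(Z.shape)  # (500, 2)
```

The final assertion is exactly the lambda = u^T*S*u identity derived above: the variance captured by each principal component is its eigenvalue.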
This post dealt with the mathematics behind the optimisation problem of the PCA algorithm. Other important interview questions relate to its limitations, comparison to t-SNE, and more!
Till then, enjoy this!