PCA: The Math Behind the Constrained Optimisation, Drawbacks, and Other Important Discussions
One of the most important mathematical concepts required to understand the working of PCA is that of eigenvectors and eigenvalues. If you know the algorithm, you will be aware that in order to reduce a dataset of dimension d to a smaller dimension k, PCA works by calculating the eigenvectors/eigenvalues of the covariance matrix of the dataset and selecting the eigenvectors corresponding to the top k eigenvalues in descending order.
But why? Why do eigenvalues and eigenvectors get involved when reducing dimensions?
What are Eigenvectors/eigenvalues?
For any square matrix A, an eigenvector v and its corresponding eigenvalue lambda satisfy the following relation:
A*v = lambda*v
Geometrically speaking, one can see that, for a given eigenvector-eigenvalue pair, the matrix multiplication amounts to a mere scaling of the eigenvector by the factor lambda. Hence its direction is unchanged.
Eigenvalues and eigenvectors are defined only for square matrices. Eigenvectors are by definition nonzero, whereas eigenvalues may be equal to zero.
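As a quick sanity check, here is a minimal NumPy sketch (the matrix values are arbitrary, chosen only for illustration) that computes the eigenpairs of a small square matrix and verifies the relation above:

```python
import numpy as np

# An arbitrary 2x2 square matrix, chosen only for illustration
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# np.linalg.eig handles any square matrix; eigenvectors are the columns of v
eigenvalues, eigenvectors = np.linalg.eig(A)

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]   # i-th eigenvector (a column of the returned matrix)
    lam = eigenvalues[i]     # its corresponding eigenvalue
    # A @ v should equal lam * v: same direction, merely scaled by lam
    assert np.allclose(A @ v, lam * v)
    print(f"lambda = {lam:.2f}, eigenvector = {v}")
```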
Now let's look at another concept used in PCA.
Constrained Optimisation
Sometimes, when you need to find the maxima/minima of an expression that is subject to a certain constraint, you cannot simply take its first derivative and equate it to zero. You must take the given constraint into account.
Such constrained optimisation problems are solved using the concept of LAGRANGE MULTIPLIERS.

Note: Lagrange multipliers for an equality constraint (as in PCA) can be positive, negative, or zero; they are required to be nonnegative only for inequality constraints, as in the KKT conditions.
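To make this concrete, here is a minimal SymPy sketch of the method on a toy problem (the objective x + y and the unit-circle constraint are arbitrary choices for illustration). Note how the multiplier comes out negative at the minimum, illustrating the point about signs:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

# Maximise f(x, y) = x + y subject to g(x, y) = x**2 + y**2 - 1 = 0
L = x + y - lam * (x**2 + y**2 - 1)

# Stationary points: set every partial derivative of the Lagrangian to zero
stationary = sp.solve([sp.diff(L, x), sp.diff(L, y), sp.diff(L, lam)],
                      [x, y, lam], dict=True)
print(stationary)
# Two solutions: lam = +1/sqrt(2) at the maximum (x = y = 1/sqrt(2)),
# and lam = -1/sqrt(2) at the minimum, so the multiplier is not always positive
```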
The optimisation problem in PCA
Now let's see how the two concepts, eigenvectors/eigenvalues and Lagrange multipliers, come together to solve the PCA optimisation problem. Recall how PCA is all about finding axes of maximum variance. Below is the optimisation problem of PCA, where S is the covariance matrix of the dataset, hence a d*d square matrix (where d is the original dimension of the dataset), and u is the direction along which the variance has to be maximised.
maximize u^T*S*u subject to u^T*u = 1
You can see that the above is just a constrained optimisation problem. Using a Lagrange multiplier lambda, form the Lagrangian L(u, lambda) = u^T*S*u - lambda*(u^T*u - 1). Equating dL(u, lambda)/du and dL(u, lambda)/d(lambda) to zero, the first condition gives 2*S*u - 2*lambda*u = 0 and the second recovers the constraint u^T*u = 1, so the solution has the form
S*u = lambda*u (the eigenvector-eigenvalue relation)
So now you see how the solutions u are just the eigenvectors of S, and the lambdas the corresponding eigenvalues. Moreover, the variance along u is u^T*S*u = lambda*u^T*u = lambda, so eigenvalues in decreasing order correspond to decreasing variance. Hence, to reduce the original d dimensions to k smaller dimensions, we select the eigenvectors corresponding to the top k eigenvalues in descending order.
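Putting it all together, here is a minimal NumPy sketch of the whole pipeline (the random dataset, its size, and k = 2 are placeholder choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))         # toy dataset: 200 samples, d = 5 features
k = 2                                 # target number of dimensions

X_centered = X - X.mean(axis=0)       # PCA assumes mean-centered data
S = np.cov(X_centered, rowvar=False)  # the d x d covariance matrix

# eigh is appropriate because S is symmetric; it returns eigenvalues in
# ascending order, so we sort the indices to get descending order instead
eigenvalues, eigenvectors = np.linalg.eigh(S)
order = np.argsort(eigenvalues)[::-1]
top_k = eigenvectors[:, order[:k]]    # d x k matrix of the top-k eigenvectors

X_reduced = X_centered @ top_k        # project onto the k axes of max variance
print(X_reduced.shape)                # (200, 2)
```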
This post dealt with the mathematics behind the optimisation in the PCA algorithm. Other important interview questions relate to its limitations, its comparison with t-SNE, and more!
More posts to come. Till then, enjoy!