If you search for an online course on data science, ML, or AI, the content covered varies from a one-day workshop to a four-year B.Tech/B.E. specialization in AI. In this article I have tried to create a 3-4 week curriculum for beginners in data science/ML. The focus is on the theoretical aspects that are helpful for interviews, publishing, and understanding the mechanism behind algorithms.
The curriculum is spread over 3 weeks (week 3 can be extended to 4 as per the comfort level of the student). The weekly content is balanced in both breadth and depth, although how much time you spend on a single topic is largely up to you.
The field is vast and you may find a few topics missing, but covering everything would not do justice to a 3-week duration. The aim of this curriculum is to make you ready for at least 2-3 capstone projects and to introduce you to the field in the most detailed manner possible.
Week 1
- Need for automation, introduction to machine intelligence.
- What is a dataset? Balanced vs. imbalanced datasets, static vs. temporal data
- Types of variables/features in a dataset.
- Distributions, the need for identifying distributions.
- Types of distributions
- Training, cross-validation (CV), and test data; differences between train and test distributions
- Gaussian distribution, standard normal variate, Chebyshev's inequality
- Real life examples of various distributions (log-normal, power-law etc.)
- Mean, median, quantiles, variance, kurtosis, skewness (moments about the mean)
- PDF, CDF
- Central limit theorem
- Probability and hypothesis testing
- Comparing distributions, the Kolmogorov-Smirnov (KS) test (see the sketch below)
- Q-Q plots
- Transforming distributions
- Covariance, correlation, Pearson correlation coefficient, Spearman rank correlation coefficient, Box-Cox transform
- Correlation vs causation
- Matrix factorization, cosine similarity
Topics students need to cover on their own: confidence intervals. Code part: data preprocessing and EDA on the above topics.
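For the code part, here is a minimal sketch, assuming NumPy and SciPy with synthetic data standing in for a real train/test split, of comparing two samples with the two-sample KS test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=1000)  # N(0, 1)
test = rng.normal(loc=0.3, scale=1.2, size=1000)   # shifted and rescaled

# H0: both samples were drawn from the same distribution.
res = stats.ks_2samp(train, test)
print(f"KS statistic = {res.statistic:.3f}, p-value = {res.pvalue:.3g}")
if res.pvalue < 0.05:
    print("Reject H0: train and test distributions likely differ.")
```

For a visual companion, sort both samples and plot one against the other: that is exactly a Q-Q plot.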
- Supervised, unsupervised and reinforcement learning definitions
- Feature scaling, handling missing values
- Outliers, RANSAC
- Preprocessing categorical values, label encoding, one-hot encoding
- Regression vs classification
- Bias variance trade-off
- MSE, log-loss, performance metrics (accuracy, AUC-ROC, TPR, FPR), need for a cost function, differentiability requirements
- Basics of 3D geometry, hyperplanes, hypercubes, generalization to n dimensions
- What is a model? Interpretability of a model, business requirements
- Domain Knowledge
- Intro to logistic regression: the sigmoid function and its probability interpretation; the need for regularization and its formulation in logistic regression; types of regularization, feature sparsity under L1; hyper-parameter tuning (manual, grid search, random search); see the sketch after this list
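Since the last bullet bundles several of this week's ideas, here is a minimal sketch, assuming scikit-learn and its bundled breast-cancer dataset, of L1-regularized logistic regression tuned with grid search:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale features, then fit an L1-regularized logistic regression.
pipe = make_pipeline(StandardScaler(),
                     LogisticRegression(penalty="l1", solver="liblinear"))

# Grid search over the inverse regularization strength C (smaller C = stronger L1).
grid = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)

print("best C:", grid.best_params_)
print("test accuracy:", grid.score(X_test, y_test))

# L1 drives some coefficients exactly to zero, i.e. feature sparsity.
coef = grid.best_estimator_.named_steps["logisticregression"].coef_
print("features zeroed out by L1:", int((coef == 0).sum()), "of", coef.size)
```

Note how L1 zeroes out coefficients outright; L2 would only shrink them toward zero.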
Week 2
- Linear regression
- Assumptions of linear regression
- MAPE, R^2
- Distance metrics, KNN, problems with KNN, k-d trees, LSH (locality-sensitive hashing)
- Clustering algorithms, performance metrics for un-labelled data
- K-means, k-means++
- DBSCAN
- Reachability distance, LOF (Local outlier factor)
- Revisiting conditional probability
- Bayes' theorem, basics of NLP (stemming, stop words, bag-of-words, TF-IDF)
- Naive Bayes, its assumptions, log probabilities
- Laplace (additive) smoothing: handling unseen/rare values in Naive Bayes
- Naive Bayes for continuous variables (Gaussian Naive Bayes)
- Dimensionality reduction
- Curse of dimensionality
- PCA (see the sketch after this list)
- Eigenvectors, eigenvalues, linear transformations
- Lagrange multipliers
- Solving the PCA objective function
- SNE, t-SNE, KL divergence
- Limitations of t-SNE
- Intro to Decision Trees
- Entropy, Gini impurity, pruning of trees
- Splitting nodes for continuous variables
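A minimal sketch of PCA from scratch, assuming only NumPy and random toy data, to connect the eigenvalue/eigenvector bullets to the actual projection:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # toy data: 200 samples, 5 features
X = X - X.mean(axis=0)                  # center the data (required for PCA)

cov = (X.T @ X) / (X.shape[0] - 1)      # sample covariance matrix (5 x 5)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices, ascending

order = np.argsort(eigvals)[::-1]       # sort components by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
X_reduced = X @ eigvecs[:, :k]          # project onto top-k principal components
explained = eigvals[:k].sum() / eigvals.sum()
print(f"variance explained by {k} components: {explained:.1%}")
```

In practice sklearn.decomposition.PCA computes the same projection via SVD; doing it once by hand makes the Lagrange-multiplier derivation far less abstract.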
Deep Learning
Week 3
- Neuron structure, Neural networks
- Perceptron
- MLP, weight matrices, hidden layers
- Gradients, learning rate, saddle points, local and global minima
- Forward propagation and backpropagation
- GD vs SGD
- Activation functions, vanishing gradient problems
- Parameters vs Hyper-parameters of a network
- Weight initialization techniques
- Symmetric initialization
- Random initialization
- Math behind Xavier/Glorot initialization
- He weights initialization techniques
- Contour plots, batch normalization
- Optimizers
- Momentum, NAG, AdaGrad, AdaDelta, RMSProp, Adam
- Softmax for multi-class classification
- CNNs: feature extraction, different layers used in CNNs
- Channels, padding, strides
- Filters/kernels; max, min, and average pooling
- Transfer learning
- Residual networks
- Image segmentation (basics)
- Object detection (basics), brief discussion on GANs
- RNNs, sequential information
- Vanishing gradients
- Sharing weights (comparison with CNN)
- LSTMs, GRUs
- Gates in LSTMs
- Encoder-decoder models, context vector
- Bidirectional networks
- BLEU score
- Disadvantages of one-hot and bag-of-words models, space efficiency
- Semantic relation of words
- Representation of words as vectors: word embeddings
- Word2Vec model
- CBOW (continuous bag-of-words)
- Skip-Gram
- Embedding matrix
- Glove vectors
- Attention mechanism (NLP)
- Local vs global attention
- Transformer architecture, self-attention
- Query, key, and value matrices
- Multi-head and masked attention (see the sketch after this list)
- Intro to BERT (encoder-only stacks), GPT-2, GPT-3 (decoder-only stacks)
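To close the section, a minimal sketch of scaled dot-product self-attention with a causal (decoder-style) mask, in plain NumPy with random toy weights; real models learn the projection matrices and stack multiple heads:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8

X = rng.normal(size=(seq_len, d_model))   # token embeddings (toy, random)
W_q = rng.normal(size=(d_model, d_k))     # learned in practice; random here
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v       # query, key, value matrices

scores = (Q @ K.T) / np.sqrt(d_k)         # scaled dot-product scores
# Causal mask: each token may attend only to itself and earlier positions.
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[mask] = -np.inf

# Row-wise softmax turns scores into attention weights.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

output = weights @ V                      # attention output, shape (seq_len, d_k)
print(weights.round(2))                   # lower-triangular attention pattern
```

Multi-head attention simply runs several such projections in parallel and concatenates the outputs.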