Indian Statistical Institute - PGDSMA Course Outline (2023-24)
The course is intended to provide students with a comprehensive and rigorous training in basic theory and applications of Statistical Methods and Analytics, in addition to some exposure to Mathematics and Computational Techniques. The students are exposed to data handling using R and Python packages. The programme is so designed that on successful completion, the students will be able to take up jobs in industries and government departments where applications of statistics and data analytics are required.
The one-year programme consists of a total of 10 courses: five courses in the first semester, and four courses plus a one-course data analytics project, supervised by a faculty member, in the second semester. The project gives students hands-on experience with real-world, data-driven problems. The programme is offered at the ISI NEC campus in Tezpur and at the ISI Chennai campus; admission is through a national-level exam followed by an interview.
Semester - I
Probability Theory
Taught by Prof. Balakrishnan Ramakrishnan (Ex Professor & Head, ISI-NEC)
- Elementary concepts of Probability
- Conditional Probability, Independence, Bayes Theorem
- Random variable, probability distribution and properties; probability mass/density function, cumulative distribution function, expectation, variance, mean square error, moments.
- Bernoulli and Binomial, Poisson, Geometric and Negative Binomial, Hypergeometric, Uniform, Normal, Exponential, Gamma, Beta distributions.
- Chebyshev’s inequality, weak law of large numbers (statement and concept), central limit theorem (statement and concept).
- Distribution of a function of a random variable.
- Bivariate distribution for discrete and continuous random variables; joint, marginal and conditional distributions, moments, covariance, correlation coefficient. Bivariate Normal Distribution
- Independent random variables and their sums. Transformation of two random variables.
- Sampling distributions under the assumption of normality: chi-square, t, F.
- Reference Book: A First Course in Probability - Ross, S.
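As a small worked illustration of Bayes' theorem from the list above, the sketch below computes the probability of a condition given a positive diagnostic test. The prevalence, sensitivity and specificity values are hypothetical, and the function name `posterior_positive` is illustrative only.

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(condition | positive test) via Bayes' theorem."""
    # Law of total probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    # Bayes' theorem: P(D|+) = P(+|D)P(D) / P(+)
    return sensitivity * prior / p_positive


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 90% specificity
print(round(posterior_positive(0.01, 0.95, 0.90), 4))  # 0.0876
```

Even with a fairly accurate test, the low prior keeps the posterior probability below 9% in this example.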
Statistical Methods I (Online + Offline Classes)
Taught by Dr. Souvik Roy (Ex Associate Professor, ISI-Kolkata, Now Prof. & Head, ISI-NEC) & Two Teaching Assistants
- Introduction to data, statistical problems and related data analysis. Concept of population, sample and statistical inference through examples.
- Summarization of univariate data; graphical methods, measures of location, spread, skewness and kurtosis; outliers and robust measures, sample moments. Empirical cumulative distribution function.
- Statistical computations: data summary and graphical display of data, basic statistics. Plotting empirical cumulative distribution function.
- Data simulations from discrete and continuous probability distributions: Bernoulli and Binomial, Poisson, Geometric and Negative Binomial, Hypergeometric, Uniform, Normal, Exponential, Chi-Square, Gamma, Beta.
- Analysis of discrete and continuous data: fitting some standard discrete and continuous probability distributions. Goodness of fit: Pearson’s Chi-square test, Kolmogorov-Smirnov test. Graphical methods of verifying the fit: Q-Q and P-P plots. Shapiro-Wilk test for normality.
- Introduction to resampling (bootstrap) and cross-validation techniques.
- Introduction to the analysis of variance: one-way analysis, F test, Kruskal-Wallis nonparametric test; two-way analysis. Design of experiments: principles of designing an experiment. Introduction to CRD, RBD, balanced and unbalanced block designs, and crossover designs, with applications in industrial and clinical trials.
- Reference Book: Introduction to Probability and Statistics for Engineers and Scientists - Ross, S.
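A minimal sketch of the bootstrap mentioned above: a percentile confidence interval for the median, using only the Python standard library. The sample values, the number of resamples, the seed and the function name `bootstrap_percentile_ci` are assumptions for illustration, not course material.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.median, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    n = len(data)
    # Resample with replacement and recompute the statistic each time
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    k = round(alpha / 2 * n_boot)  # number of replicates trimmed from each tail
    return replicates[k], replicates[n_boot - 1 - k]


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.1, 2.5]  # hypothetical data
low, high = bootstrap_percentile_ci(sample)
print(statistics.median(sample), low, high)
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the simplest to state.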
Statistical Inference (Online + Offline Classes)
Taught by Dr. Sudheesh Kumar Kattumannil (Associate Professor, ISI-Chennai)
- Brief introduction to random variable and probability distributions. Random sample and the concept of statistical inference with examples.
- Point estimation: estimator and estimate. Desirable properties of an estimator: unbiasedness, smaller variance and mean squared error. Method of moments and maximum likelihood estimation. Asymptotic behaviour of MLE (statement on consistency and asymptotic normality).
- Interval estimation: confidence intervals and their basic properties. Construction of confidence intervals for the parameter of the uniform distribution, the parameter of the exponential distribution, and the mean of the normal distribution with known and unknown variance.
- Hypotheses and the concept of hypotheses testing. Null and alternative, simple and composite hypothesis, significance level, size, p-value and power. Introduction to likelihood ratio tests with examples.
- One-sample problem: test for randomness (run test). Test for the mean under the assumption of normality with known and unknown variance. Nonparametric tests for the median: sign test, Wilcoxon’s signed-rank test. One-sample test for proportion.
- Comparison of two samples. Two independent samples: graphical procedures, K-S test; comparing means under the assumption of normality (two-sample t test); nonparametric test for medians (Mann-Whitney-Wilcoxon); two-sample test for proportions. Two dependent samples: paired t test under the assumption of normality; nonparametric tests for two dependent samples.
- Reference Book: Introduction to Probability and Statistics for Engineers and Scientists - Ross, S.
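The exact sign test for a median, listed above, reduces to a binomial tail probability: under H0 the number of observations above m0 is Binomial(n, 1/2). A short sketch, assuming observations equal to m0 are discarded (a common convention) and using hypothetical data:

```python
from math import comb


def sign_test(data, m0):
    """Two-sided exact sign test of H0: median = m0.

    Returns (number of positive signs, effective n, p-value)."""
    diffs = [x - m0 for x in data if x != m0]  # ties with m0 are dropped
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    # Under H0, k ~ Binomial(n, 1/2); double the smaller tail for a two-sided test
    tail = min(k, n - k)
    p_value = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return k, n, min(p_value, 1.0)


data = [5.1, 6.2, 4.8, 7.0, 6.5, 5.9, 6.8, 7.3]  # hypothetical data
print(sign_test(data, m0=5.0))  # (7, 8, 0.0703125)
```

Seven of the eight observations exceed 5.0, giving p = 2 * (1 + 8) / 256; the test uses only the signs, which is why it needs no normality assumption.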
Vectors & Matrices (Module-Based) - Before Mid-Term
Taught by Dr. Mridu Prabal Goswami (Assistant Professor, ISI-NEC)
- Introduction to vectors and matrices.
- Vectors: Definition and examples, vector spaces and subspaces, basis of a vector space, linear dependence/independence
- Matrices: Definition and examples, matrix as a linear transformation, elementary matrices and elementary matrix operations, basic matrix operations including those of partitioned matrices, rank, nullity, trace, determinant and inverse of a matrix, idempotent matrices and their properties. Solution of systems of linear equations.
- Spectral theory: eigenvalues and eigenvectors of matrices, decomposition of matrices, quadratic forms and definiteness of a matrix (with applications in Statistics)
- Reference Book: Linear Algebra - Hoffman, K. and Kunze, R.
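To connect the spectral-theory topic to computation, here is a sketch of power iteration, one standard way (not necessarily the method used in the course) to approximate the dominant eigenvalue of a matrix. It assumes the matrix has a single eigenvalue of largest absolute value and that the starting vector is not orthogonal to its eigenvector; the example matrix is hypothetical.

```python
def power_iteration(A, iters=500, tol=1e-12):
    """Approximate the dominant eigenvalue and eigenvector of a square matrix A."""
    n = len(A)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    v = [1.0] * n  # assumed starting vector
    lam = 0.0
    for _ in range(iters):
        w = matvec(v)
        scale = max(abs(x) for x in w)
        v = [x / scale for x in w]  # rescale to avoid overflow
        # Rayleigh quotient v'Av / v'v as the current eigenvalue estimate
        new_lam = sum(a * b for a, b in zip(v, matvec(v))) / sum(x * x for x in v)
        if abs(new_lam - lam) < tol:
            return new_lam, v
        lam = new_lam
    return lam, v


A = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical symmetric matrix; eigenvalues (5 +/- sqrt(5)) / 2
lam, v = power_iteration(A)
print(lam)
```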
Regression Methods (Module-Based) - After Mid-Term
Taught by Dr. Kushal Banik Chowdhury (Assistant Professor, ISI-NEC)
- Introduction to Classical Linear Regression Model
- OLS method of estimation; fitted values, prediction of the response variable, tests of hypotheses.
- Residuals. Validation of assumptions using graphical techniques. Regression diagnostics.
- Use of dummy variables in regression
- Variable selection, multicollinearity. Model selection using AIC and BIC criteria.
- Concepts of robust and nonparametric regression
- Reference Book: Basic Econometrics - Gujarati, D. N.
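For the single-regressor case, the OLS estimates have a closed form: b1 = Sxy / Sxx and b0 = ybar - b1 * xbar. A minimal sketch (the function name `simple_ols` and the data are hypothetical) that also returns the residuals used in regression diagnostics:

```python
def simple_ols(x, y):
    """OLS fit of y = b0 + b1 * x; returns (b0, b1, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx  # slope
    b0 = y_bar - b1 * x_bar  # intercept: fitted line passes through (x_bar, y_bar)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.3, 7.9, 10.2]  # hypothetical data
b0, b1, res = simple_ols(x, y)
print(b0, b1)
```

With an intercept in the model, the residuals sum to zero and are orthogonal to x, which is a quick sanity check on any OLS fit.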
Programming in R (Module-Based) - After Mid-Term
Taught by Dr. Sanjit Maitra (Assistant Professor, ISI-NEC)
- Introduction to packages in R: overview of packages, data handling, input-output operations. Basic programming: data types, arrays, loops, etc.; functions and graphics.
- Iterative improvement of an initial solution using the bisection, sort-and-search, Regula Falsi and Newton-Raphson methods.
- Fixed point iterative schemes, significant digits, round-off errors, finite computational processes and computational errors. Order of convergence and degree of precision.
- Matrix computations - basic operations, finding determinant, inverse, eigen roots and eigen vectors of a matrix, matrix decomposition, solving system of equations.
- Computational aspects of constrained optimization
- Unconstrained optimization: Newton and Quasi-Newton methods.
- Experimentation designs: Obtaining global optimal solutions from local optimum solutions using iterative experimentation like Response Surface Methodology.
- Reference Book: An Introduction to R - Venables, W. N., Smith, D. M. and the R Core Team (CRAN manual)
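The root-finding methods listed above can be compared on a simple equation, f(x) = x^2 - 2 = 0. The module itself is taught in R; the sketch below uses Python only to keep all examples here in one language, and the function names, tolerances and test equation are assumptions. Bisection halves a sign-changing bracket (linear convergence), while Newton-Raphson uses the derivative (quadratic convergence near a simple root).

```python
def bisection(f, a, b, tol=1e-10, max_iter=200):
    """Root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        m = (a + b) / 2
        fm = f(m)
        if fm == 0 or (b - a) / 2 < tol:
            return m
        if fa * fm < 0:
            b = m  # root lies in [a, m]
        else:
            a, fa = m, fm  # root lies in [m, b]
    return (a + b) / 2


def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson iteration x <- x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


def f(x):
    return x * x - 2  # hypothetical test equation; root at sqrt(2)


def df(x):
    return 2 * x


print(bisection(f, 0.0, 2.0), newton(f, df, 1.0))
```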
Programming in Python (Module-Based) - Before Mid-Term
Taught by Dr. Koyel Mandal (Ex Research Assistant, ISI-NEC, Now SERB NPDF, ISI-K)
- Introduction to packages in Python: overview of packages, data handling, input-output operations. Basic programming: data types, arrays, loops, etc.; functions and graphics.
- Fixed point iterative schemes, significant digits, round-off errors, finite computational processes and computational errors. Order of convergence and degree of precision.
- Matrix computations - basic operations, finding determinant, inverse, eigen roots and eigen vectors of a matrix, matrix decomposition, solving system of equations.
- Reference Book: Dive into Python - Pilgrim, M.
Semester - II
Statistical Methods II (Online + Offline Classes)
Taught by Dr. Souvik Roy (Ex Associate Professor, ISI-Kolkata, Now Prof. & Head, ISI-NEC) & Two Teaching Assistants
- Multivariate Data Exploration/visualization, multivariate data handling, random vector, mean and variance-covariance matrix, introduction to multinomial and multivariate normal distributions.
- Applied Multivariate techniques: Principal components analysis and Factor Analysis.
- Introduction to discrete-time Markov chains with finite and countable state spaces. Introduction to Markov chain Monte Carlo (MCMC) methods and their applications in statistics. Introduction to hidden Markov models (HMM).
- Handling missing data: various methods of imputation, including hot-deck imputation and multiple imputation (MICE in R), and the EM algorithm.
- Advanced regression techniques: Ridge, Principal component regression, LASSO and Spline smoothing
- Multiple hypothesis testing.
- Introduction to MANOVA and Hotelling’s T².
- Reference Book:
Statistical Machine Learning
Taught by Dr. Sanjit Maitra (Assistant Professor, ISI-NEC)
- Introduction to bootstrap based machine learning. Assessment and model selection: confusion matrix and various criteria of evaluation, training and testing error rates.
- Pattern Recognition and Classification techniques
- Unsupervised learning: clustering procedures: hierarchical and non-hierarchical, k-means; association rules, ROCs.
- Supervised learning: Linear and quadratic discriminant analysis; Bayesian classifier, nearest neighbour classifier, Entropy based classifier.
- Tree based classification methods: predictive modeling using decision trees (CART), random forests.
- Support vector machine. Introduction to boosting and adaptive boosting algorithm.
- Introduction to Natural Language Processing (NLP), information retrieval and text analysis: stop words, TF-IDF measure, vector space models.
- Introduction to neural networks, Convolutional NN, Deep NN.
- Reference Book: An Introduction to Statistical Learning with Applications in R - James, G., Witten, D., Hastie, T. and Tibshirani, R.
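The confusion matrix and evaluation criteria mentioned under model assessment can be computed directly from true and predicted labels. A minimal sketch for a binary classifier, assuming labels coded as 1 (positive) and 0 (negative); the function name and the label vectors are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and common rates for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "confusion": [[tp, fn], [fp, tn]],  # rows: actual +/-, columns: predicted +/-
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "error_rate": (fp + fn) / len(pairs),
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical labels
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

Computing these on a held-out test set rather than the training data gives the test error rate that the course contrasts with the training error rate.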
Statistical Modeling (Online + Offline Classes)
Taught by Dr. Partha Sarathi Mukherjee (Associate Professor, ISI-Kolkata) & One Teaching Assistant
- Logistic regression: odds ratio, concordance-discordance measures, logistic regression as a classifier. Probit regression. Introduction to multilogit models.
- Modeling count data: Poisson regression, Poisson models for zero-inflated data.
- Introduction to and visualization of categorical data. Measures of association. Loglinear models; models for nominal and ordinal responses.
- Survival Data Modeling: time-to-event data and survival probabilities, notion of censoring, survival curve and other ways of representing a survival distribution, Kaplan-Meier and Nelson-Aalen estimates, log-rank test, Cox’s proportional hazards model. Parametric survival models based on the Exponential, Gamma and Weibull distributions.
- Bayesian Inference and Modeling: Prior and posterior distributions, Bayesian models, Bayesian regression, Hierarchical Bayes models
- Introduction to mixture models
- Reference Book: An Introduction to Categorical Data Analysis (2nd and 3rd editions) - Agresti, A.
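The Kaplan-Meier estimate listed above is a product over event times of (1 - d_i / n_i), where d_i is the number of events at time t_i and n_i the number still at risk just before t_i. A minimal sketch, assuming right-censored data and the usual convention that subjects censored at t_i are still at risk at t_i; the function name and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if right-censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing time t (events and censorings)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= removed  # those censored at t were still counted as at risk at t
    return curve


times = [2, 3, 3, 5, 6, 8]  # hypothetical follow-up times
events = [1, 1, 0, 1, 0, 1]  # 1 = event observed, 0 = censored
print(kaplan_meier(times, events))
```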
Time Series (Module-Based) - Before Mid-Term
Taught by Dr. Kushal Banik Chowdhury (Assistant Professor, ISI-NEC)
- Exploratory analysis and graphical display; trend, seasonal and cyclical components. Decomposition of time series into components, Smoothing.
- Stationary Time Series: Brief Introduction to AR, MA and ARMA models; Box-Jenkins correlogram analysis, ACF and PACF, introduction to periodogram, choice of AR and MA orders.
- Non-Stationary Time Series: introduction to ARIMA model; deterministic and stochastic trends; introduction to ARCH and GARCH models.
- Forecasting: basic tools, using exponential smoothing and Box-Jenkins method. Residual analysis.
- Reference Book: Introduction to Time Series Analysis and Forecasting - Montgomery, D.C., Jennings, C.L., Kulahci, M.
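Simple exponential smoothing, one of the forecasting tools listed above, updates a level l_t = alpha * y_t + (1 - alpha) * l_{t-1}, and the latest level serves as the forecast for future periods. A minimal sketch, assuming the level is initialised at the first observation (other initialisations are common); the series, alpha and function name are hypothetical. The one-step-ahead errors feed into residual analysis.

```python
def simple_exponential_smoothing(y, alpha):
    """Smoothed levels l_t = alpha * y_t + (1 - alpha) * l_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    levels = [y[0]]  # assumption: initialise the level at the first observation
    for obs in y[1:]:
        levels.append(alpha * obs + (1 - alpha) * levels[-1])
    return levels


series = [10.0, 12.0, 11.0, 13.0]  # hypothetical data
levels = simple_exponential_smoothing(series, alpha=0.5)
# One-step-ahead errors: y_t minus the level available at time t - 1
errors = [obs - prev for obs, prev in zip(series[1:], levels[:-1])]
print(levels[-1], errors)  # 12.0 [2.0, 0.0, 2.0]
```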
Statistical Finance (Module-Based) - After Mid-Term
Taught by Dr. Mridu Prabal Goswami (Assistant Professor, ISI-NEC)
- Introduction to stock prices, returns and log-returns. Distribution of returns, Assessing Normality using skewness, kurtosis and q-q plots.
- Market return and risk free rate. Capital Asset Pricing Model (CAPM). Estimating beta and testing for CAPM.
- Options. Arbitrage and risk-neutral measure. European and American options. Option pricing using Binomial model: 1 and 2 step. Black-Scholes model (statement only), interpretation of drift and volatility.
- Value at risk and expected shortfall. Quantile estimation. Estimation of tail-index.
- Markowitz portfolio theory. Resampling for assessing the estimation of the efficient portfolio.
- Reference Book: Statistics and Finance - Ruppert, D.
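The 1- and 2-step binomial option pricing listed above prices a derivative as its discounted expected payoff under the risk-neutral probability q = (1 + r - d) / (u - d). A minimal sketch for a European call, assuming a simple per-step risk-free rate r and up/down factors u and d; the parameter values and function name are hypothetical.

```python
from math import comb


def binomial_european_call(S0, K, u, d, r, n):
    """European call price on an n-step binomial tree.

    r is the simple risk-free rate per step; no arbitrage requires d < 1 + r < u."""
    if not d < 1 + r < u:
        raise ValueError("need d < 1 + r < u")
    q = (1 + r - d) / (u - d)  # risk-neutral probability of an up move
    expected_payoff = sum(
        comb(n, j) * q**j * (1 - q) ** (n - j) * max(S0 * u**j * d ** (n - j) - K, 0.0)
        for j in range(n + 1)
    )
    # Discount the risk-neutral expected payoff back to time 0
    return expected_payoff / (1 + r) ** n


# Hypothetical parameters: S0 = K = 100, u = 1.2, d = 0.8, r = 5% per step
one_step = binomial_european_call(100, 100, u=1.2, d=0.8, r=0.05, n=1)
two_step = binomial_european_call(100, 100, u=1.2, d=0.8, r=0.05, n=2)
print(round(one_step, 4), round(two_step, 4))
```

Summing over terminal nodes is valid for European options only; American options need backward induction with an early-exercise check at each node.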
Project
Our Project Guide was Dr. Mridu Prabal Goswami (Assistant Professor, ISI-NEC)