10 Modules • 10 Months • From 0 to 100

Skills & Tools

Use Python to mine datasets and predict patterns.

Production Standard

Build statistical models, such as regression and classification models, that turn raw data into usable information.

The Big Picture

Master the basics of machine learning and harness the power of data to forecast what’s next.


Learn the tools and techniques you need to make better decisions through data, and land a job in one of the most sought-after fields in tech.

One by One

Take the modules one at a time or all at once, and learn exactly what you need.

Learn by Doing

Get your hands dirty and learn through practice. Tackle each module’s capstone project like a data scientist!

Mark it as DONE

Participate in each module, complete its capstone project, and earn a certificate. Show it off on your resume and mark it as “I did it”!


Module 1: MATH & PROGRAMMING FUNDAMENTALS
Python & NumPy: Demonstrate introductory programming concepts, using Python and NumPy as tools to navigate data sources and collections

UNIX: Utilize UNIX commands to navigate file systems and modify files

git: Maintain a git repository in order to keep track of changes and iterations as your project evolves.

Descriptive Statistics: Define and apply the fundamentals of descriptive statistics

Intro to Plotting and Visualization: Practice using IPython notebooks and Tableau to plot and visualize data
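To make the Descriptive Statistics item concrete, here is a minimal sketch that summarizes a small sample. It uses Python's standard-library `statistics` module instead of NumPy so it runs without extra installs; the page-view numbers are hypothetical values invented for illustration.

```python
import statistics

# Hypothetical sample: daily page views for one week (illustrative values only)
page_views = [120, 135, 150, 110, 160, 300, 125]

summary = {
    "mean": statistics.mean(page_views),
    "median": statistics.median(page_views),
    "stdev": statistics.stdev(page_views),  # sample standard deviation (divides by n - 1)
    "min": min(page_views),
    "max": max(page_views),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

The single 300 outlier pulls the mean (about 157.14) above the median (135), which is exactly the kind of contrast descriptive statistics helps you spot before modeling.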

Module 2: EDA, PANDAS & SCIPY
Experiment Design: Plan an experimental study design with a well-thought-out problem statement and data framework

Pandas & Pivot Tables: Use Pandas to read, clean, parse, and plot data with operations such as boolean indexing, Series arithmetic, joins, and others

SciPy: Review statistical testing concepts (p-values, confidence intervals, correlation vs. causation) with SciPy, along with Python lambda functions

Linear Regression, statsmodels & scikit-learn: Use scikit-learn and statsmodels to run linear regression models and evaluate model fit

Bias-Variance Tradeoff: Understand the bias-variance tradeoff and use it to evaluate machine learning models

Gradient Descent: Look under the hood at the math and theory of how gradient descent optimizes a model's loss function

Regularization & Optimization: Learn to apply regularization and optimization when evaluating model fit

Web Scraping: Learn to scrape website data using popular scraping tools

Logistic Regression: Build, evaluate, and refine a logistic regression model for a given business case study

NLP: Get introduced to natural language processing through sentiment analysis of scraped website data.
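The Gradient Descent item can be illustrated with a minimal sketch: batch gradient descent fitting a one-feature linear regression, y ≈ w·x + b, by minimizing mean squared error. It uses plain Python lists so every update step is visible; the toy data, learning rate, and epoch count are assumptions chosen for illustration, not course material.

```python
# Hypothetical toy data generated exactly from y = 2x + 1 (no noise)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ≈ w*x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current predictions
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) with respect to w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Take a step against the gradient to reduce the loss
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = gradient_descent(xs, ys)
print(f"w ≈ {w:.3f}, b ≈ {b:.3f}")
```

The learning rate matters: on this data a step size much above roughly 0.15 makes the updates overshoot and diverge, while a very small one converges slowly. That trade-off is part of what "optimizing the loss function" involves in practice.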

Module 5: SQL, DATABASES, & CLASSIFICATION
SQL & Remote Databases: Get introduced to different types of databases, review SQL commands, and practice connecting to and pulling data from a remote AWS database

Feature Selection: Use feature selection to deepen knowledge of model evaluation

kNN & SVMs: Begin exploring classification models by applying the k-nearest neighbors (kNN) algorithm, and learn how support vector machines (SVMs) separate classes with a maximum-margin decision boundary
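The kNN algorithm from Module 5 fits in a few lines. This is a minimal from-scratch sketch, assuming Euclidean distance and a simple majority vote; the helper name `knn_predict` and the two-cluster training data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points (Euclidean distance)."""
    # Pair each training point's distance to the query with its label, nearest first
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Most frequent label among the k neighbors wins
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training data with two well-separated classes
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.8, 1.8)))  # → A
print(knn_predict(train_points, train_labels, (6.2, 6.8)))  # → B
```

kNN does no training beyond storing the data, so all the cost is paid at prediction time; choosing k and scaling features are the main tuning decisions.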

Module 6: APIS, TREES & ENSEMBLE METHODS
JSON & APIs: Learn to pull JSON data from APIs as another potential data source

Ensemble Models: Build and evaluate ensemble models using decision trees, random forests, bagging, and boosting

Module 7: PCA, CLUSTERING, K-MEANS & AWS
Clustering: Define clustering and compare its advantages and disadvantages with those of classification models

K-Means: Practice building and evaluating a k-means algorithm

PCA: Convert observations of possibly correlated variables into uncorrelated principal components to reduce dimensionality and improve predictive analysis.

PostgreSQL: Learn to build and maintain your own PostgreSQL database
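For the K-Means item in Module 7, here is a minimal sketch of the standard Lloyd's iteration in plain Python: assign each point to its nearest centroid, then move each centroid to the mean of its points, until nothing changes. Random initialization from the data, the stopping rule, and the sample points are assumptions for illustration; libraries such as scikit-learn add smarter initialization (k-means++) and multiple restarts.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iterations):
        # Assignment step: group each point with its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean (keep it if the cluster is empty)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups of 2-D points
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```

Because k-means only finds a local optimum, results on messier data can depend on the starting centroids, which is why evaluating a k-means model usually involves trying several initializations or values of k.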

Module 8: BAYESIAN INFERENCE & LDA
Bayesian Methods: Build a linear regression model with Bayesian methods

LDA: Discover topics in text data using latent Dirichlet allocation (LDA)

Naive Bayes: Learn how Naive Bayes simplifies supervised classification by assuming that features are conditionally independent given the class

Module 9: WORKING WITH TIME SERIES
Time Series & Autocorrelation: Analyze and visualize time series data using Pandas and Tableau

ARIMA Model: Use the ARIMA model to make predictions with time series data

Module 10: INTRO TO BIG DATA AND SPARK
Hadoop & MapReduce: Get introduced to the history and use of Hadoop as well as the advantages and disadvantages of using parallel or distributed systems to store, access, and analyze big data

Hive & Spark: Gain an introductory understanding of how Hive interacts with Hadoop and learn about Spark’s advantages through big data case studies
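The MapReduce model from Module 10 can be simulated on a single machine. The classic word-count example below runs the map, shuffle, and reduce steps in sequence with plain Python. It is a sketch of the programming model only: in Hadoop or Spark the map and reduce tasks would run in parallel across a cluster, and the function names and sample documents here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's list of values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input: each string stands in for a file split handled by one mapper
documents = ["big data needs big tools", "spark and hadoop process big data"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts["big"], word_counts["data"])  # → 3 2
```

Each mapper only sees its own document and each reducer only sees one word's values, which is what lets a distributed system split the work across many machines.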