BECOME A DATA SCIENTIST
Learn the tools and techniques you need to make better decisions through data, and land a job in one of the most sought-after fields in tech.
One by One
Take the modules one at a time or all at once. Learn exactly what you need to learn.
Learn by Doing
Get your hands dirty and learn by doing. Tackle each module’s capstone project like a data scientist!
Mark it as DONE
Participate in each module, complete the capstone project, and earn a certificate. Show it off on your resume and mark it as “I did it”!
Syllabus
Module 1: MATH & PROGRAMMING FUNDAMENTALS
UNIX: Utilize UNIX commands to navigate file systems and modify files
Git: Maintain a Git repository to keep track of changes and iterations as your project evolves
Descriptive Statistics: Define and apply the fundamentals of descriptive statistics
Intro to Plotting and Visualization: Practice using plot.ly, IPython Notebook, and Tableau to plot and visualize data
Module 2: EDA, PANDAS & SCIPY
Pandas & Pivot Tables: Use Pandas to read, clean, parse, and plot data using features such as boolean indexing, Series math, joins, and others
SciPy: Review statistical testing concepts (p-values, confidence intervals, correlation vs. causation) and lambda functions with SciPy
Module 3: LINEAR REGRESSIONS, SCIKIT-LEARN, GRADIENT DESCENT, & MODEL FIT
Bias-Variance Tradeoff: Understand the bias-variance tradeoff to evaluate machine learning models
Gradient Descent: Look under the hood at the math and theory of how gradient descent minimizes a model's loss function
Regularization & Optimization: Learn to apply regularization and optimization when evaluating model fit
Module 4: LOGISTIC REGRESSION, NLP, AND WEB SCRAPING
Logistic Regression: Build, evaluate, and refine a logistic regression model for a given business case study
NLP: Get introduced to natural language processing through sentiment analysis of scraped website data
Module 5: SQL, DATABASES, & CLASSIFICATION
Feature Selection: Use feature selection to deepen knowledge of model evaluation
kNN & SVMs: Explore classification models by applying the kNN algorithm, and learn how SVMs can simplify data analysis in supervised learning
Module 6: APIS, TREES & ENSEMBLE METHODS
Ensemble Models: Build and evaluate ensemble models, using decision trees, random forests, bagging, and boosting
Module 7: PCA, CLUSTERING, K-MEANS & AWS
K-Means: Practice building and evaluating a k-means algorithm
PCA: Convert a set of observations or variables into principal components to improve predictive analysis
PostgreSQL: Learn to build and maintain your own PostgreSQL database
Module 8: BAYESIAN INFERENCE & LDA
LDA: Refine data using latent Dirichlet allocation (LDA)
Naive Bayes: Learn how Naive Bayes can simplify data analysis in supervised learning
Module 9: WORKING WITH TIME SERIES
ARIMA Model: Use the ARIMA model to make predictions with time series data
Module 10: INTRO TO BIG DATA AND SPARK
Hive & Spark: Gain an introductory understanding of how Hive interacts with Hadoop and learn about Spark’s advantages through big data case studies