stats_measures/

Version: v0.0.0-...-b3f521c
Published: Apr 19, 2017 License: Apache-2.0

README

Summary Measures

Despite a huge focus on machine learning in the context of data science, solid statistical analysis (e.g., via summary and aggregation) needs to be part of every data science project and, in all honesty, can often provide more value (and provide it sooner) than sophisticated ML. Even in cases where more sophisticated ML is justified, data scientists must understand the statistics of their data to judge the validity of modeling techniques and to develop intuition about the data.

Notes

  • The goal of summary statistics is to communicate as much information as possible about a series of observations as simply as possible.
  • Summary statistics often include a measure of central tendency (e.g., a mean or median) and a measure of "spread" (e.g., variance or standard deviation).
  • ALL DATA SCIENCE PROJECTS MUST INCLUDE SUMMARY STATISTICS along with and before any more sophisticated modeling.
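
To make the two kinds of measure in the notes above concrete, here is a minimal sketch that computes a mean and median (central tendency) alongside a variance and standard deviation (spread). It uses only the Go standard library; the helper names (`mean`, `median`, `variance`, `stdDev`) and the observations are hypothetical, and the variance shown is the sample variance (dividing by n-1), which is one of two common conventions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// mean returns the arithmetic mean of xs (NaN for an empty slice).
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return math.NaN()
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// median returns the middle value of xs, averaging the two middle
// values when len(xs) is even. It sorts a copy, leaving xs untouched.
func median(xs []float64) float64 {
	if len(xs) == 0 {
		return math.NaN()
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid]
	}
	return (s[mid-1] + s[mid]) / 2
}

// variance returns the sample variance of xs (divides by n-1).
// It returns NaN when fewer than two observations are given.
func variance(xs []float64) float64 {
	if len(xs) < 2 {
		return math.NaN()
	}
	m := mean(xs)
	ss := 0.0
	for _, x := range xs {
		d := x - m
		ss += d * d
	}
	return ss / float64(len(xs)-1)
}

// stdDev returns the sample standard deviation of xs.
func stdDev(xs []float64) float64 {
	return math.Sqrt(variance(xs))
}

func main() {
	// Hypothetical observations, chosen only for illustration.
	obs := []float64{2, 4, 4, 4, 5, 5, 7, 9}
	fmt.Printf("mean=%.3f median=%.3f\n", mean(obs), median(obs))
	fmt.Printf("variance=%.3f stddev=%.3f\n", variance(obs), stdDev(obs))
}
```

Reporting the pair together matters: two series can share a mean yet differ widely in spread, so either number alone can mislead.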

Stat Trek
Khan Academy - Statistics
Bayesian Statistics
Elements of Statistical Learning

Code Review

github.com/gonum/stat docs
github.com/montanaflynn/stats docs
github.com/gonum/floats docs
Mean, Mode, Median
Min, Max, Range
Variance, Standard Deviation
Quantiles
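
The remaining measures named in the list above (mode, min, max, range, and quantiles) can be sketched in the same spirit. This is a standard-library-only illustration rather than the code behind the links, so the helper names (`mode`, `minMax`, `quantile`) are hypothetical. The quantile uses the nearest-rank definition, which is only one of several in common use; a statistics library may interpolate or use a different rule and return slightly different values.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// mode returns the most frequent value in xs and how often it occurs.
// Ties go to the smallest value so the result is deterministic.
// For an empty slice it returns (0, 0).
func mode(xs []float64) (val float64, count int) {
	counts := make(map[float64]int)
	for _, x := range xs {
		counts[x]++
	}
	for v, c := range counts {
		if c > count || (c == count && v < val) {
			val, count = v, c
		}
	}
	return val, count
}

// minMax returns the smallest and largest values in xs.
// It assumes xs is non-empty.
func minMax(xs []float64) (lo, hi float64) {
	lo, hi = xs[0], xs[0]
	for _, x := range xs[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	return lo, hi
}

// quantile returns the nearest-rank p-quantile of xs for 0 < p <= 1:
// the smallest observation with at least a fraction p of the data
// at or below it. It assumes xs is non-empty and sorts a copy.
func quantile(p float64, xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	idx := int(math.Ceil(p*float64(len(s)))) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(s) {
		idx = len(s) - 1
	}
	return s[idx]
}

func main() {
	// Hypothetical observations, chosen only for illustration.
	obs := []float64{2, 4, 4, 4, 5, 5, 7, 9}

	m, n := mode(obs)
	lo, hi := minMax(obs)
	fmt.Printf("mode=%.1f (seen %d times)\n", m, n)
	fmt.Printf("min=%.1f max=%.1f range=%.1f\n", lo, hi, hi-lo)
	for _, p := range []float64{0.25, 0.50, 0.75} {
		fmt.Printf("%.2f quantile=%.1f\n", p, quantile(p, obs))
	}
}
```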

Exercises

Exercise 1

Output central tendency and statistical dispersion (or "spread") measures together for all numeric features of the iris data set. Looking at these measures together gives a quick snapshot of "what the data looks like" numerically.
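
One possible shape for such a program is sketched below; it is not the linked answer. It assumes the data arrives as CSV with a header row, and it embeds only a six-row excerpt so the block runs on its own. The column names and the helpers `numericColumns` and `summarize` are hypothetical; a real solution would read the full file from disk.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// sample is a small excerpt in the usual iris layout (four measurements
// plus a species label). The header names are an assumption.
const sample = `sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
`

// numericColumns parses CSV text with a header row and returns, in header
// order, the names and values of every column whose cells all parse as floats.
func numericColumns(data string) ([]string, map[string][]float64, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, nil, err
	}
	if len(records) < 2 {
		return nil, nil, fmt.Errorf("need a header and at least one row")
	}
	var names []string
	cols := make(map[string][]float64)
	for j, name := range records[0] {
		vals := make([]float64, 0, len(records)-1)
		numeric := true
		for _, rec := range records[1:] {
			v, err := strconv.ParseFloat(rec[j], 64)
			if err != nil {
				numeric = false // e.g. the species label column
				break
			}
			vals = append(vals, v)
		}
		if numeric {
			names = append(names, name)
			cols[name] = vals
		}
	}
	return names, cols, nil
}

// summarize returns the mean and the sample standard deviation (n-1) of xs.
// The standard deviation is NaN when xs has fewer than two values.
func summarize(xs []float64) (mean, sd float64) {
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	ss := 0.0
	for _, x := range xs {
		ss += (x - mean) * (x - mean)
	}
	return mean, math.Sqrt(ss / float64(len(xs)-1))
}

func main() {
	names, cols, err := numericColumns(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, name := range names {
		m, sd := summarize(cols[name])
		fmt.Printf("%-13s mean=%.3f stddev=%.3f\n", name, m, sd)
	}
}
```

Skipping columns that fail to parse as numbers keeps the species label out of the summary without hard-coding column positions.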

Template | Answer


All material is licensed under the Apache License Version 2.0, January 2004.

Directories

Path Synopsis
Sample program to calculate means, modes, and medians.
Sample program to calculate means, modes, and medians.
Sample program to calculate standard deviation and variance.
Sample program to calculate quantiles.
exercises
exercise1
Sample program to calculate both central tendency and statistical dispersion measures for the iris dataset.
template1
Sample program to calculate both central tendency and statistical dispersion measures for the iris dataset.
