Descriptive Statistics & Statistical Analyses
📊 Mean/Median/Mode 📺
This video defines the mean, median, and mode. The mean (average) of a data set is found by adding all the values in the set and dividing by the number of values. The median is the middle value when the data set is ordered from least to greatest; when the set has an even number of values, it is the average of the two middle values. The mode is the value that occurs most frequently in the data set.
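As a minimal sketch of these three definitions, the following uses Python's standard `statistics` module. The data set `scores` is hypothetical and chosen only for illustration.

```python
from statistics import mean, median, multimode

# Hypothetical data set, already ordered from least to greatest
scores = [2, 3, 3, 5, 7, 10]

data_mean = mean(scores)        # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
data_median = median(scores)    # even count: average of 3 and 5 = 4.0
data_modes = multimode(scores)  # most frequent value(s): [3]

print(data_mean, data_median, data_modes)
```

`multimode` returns a list because a data set can have more than one mode when several values tie for the highest frequency.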
This video defines the margin of error as how far from the estimate the true value might be, in either direction. The confidence interval is the estimate ± the margin of error. It also applies these terms to a practical QR example: a runoff in an election.
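The relationship between an estimate, its margin of error, and the confidence interval can be sketched for a poll as follows. The poll numbers are hypothetical, and the formula used (the normal approximation for a sample proportion) is an assumption for illustration, not necessarily the method shown in the video.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.52       # hypothetical: 52% of sampled voters favor candidate A
n = 1000           # hypothetical sample size
confidence = 0.95  # desired confidence level

# Critical value for a two-sided interval (about 1.96 at 95% confidence)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Normal-approximation margin of error for a proportion (assumed formula)
margin_of_error = z * sqrt(p_hat * (1 - p_hat) / n)

# Confidence interval = estimate ± margin of error
lower = p_hat - margin_of_error
upper = p_hat + margin_of_error

# If the interval contains 50%, the poll cannot separate the two candidates
too_close_to_call = lower < 0.5 < upper

print(f"estimate {p_hat:.3f} ± {margin_of_error:.3f}")
print(f"interval: ({lower:.3f}, {upper:.3f}), too close to call: {too_close_to_call}")
```

With these numbers the margin of error is roughly three percentage points, so a 52% result does not rule out a true value below 50%.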
This video explains how to use the p-value to solve problems with hypothesis testing. When the p-value is less than alpha (the significance level), the null hypothesis is rejected; when the p-value is greater than or equal to alpha, the null hypothesis is not rejected. A simple way to remember this is: "If the p is low, the null must go!" It also discusses when to use a one-tailed test rather than a two-tailed test.
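The decision rule, and the way the choice of tails changes the p-value, can be illustrated with a short sketch. The test statistic `z = 1.8` is hypothetical, and a standard normal (z) test is assumed here purely for simplicity.

```python
from statistics import NormalDist

alpha = 0.05  # significance level
z = 1.8       # hypothetical z test statistic

std_normal = NormalDist()

# One-tailed (upper tail): probability of a z at least this large
p_one_tailed = 1 - std_normal.cdf(z)

# Two-tailed: probability of a z at least this extreme in either direction
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))


def decide(p_value, alpha):
    """Apply the rule: reject the null hypothesis only when p < alpha."""
    return "reject null" if p_value < alpha else "fail to reject null"


print(f"one-tailed p = {p_one_tailed:.4f}: {decide(p_one_tailed, alpha)}")
print(f"two-tailed p = {p_two_tailed:.4f}: {decide(p_two_tailed, alpha)}")
```

In this example the same statistic leads to rejection under a one-tailed test but not under a two-tailed test, which is why the type of test should be chosen before looking at the data.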
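This video explains significance testing, using a p value threshold of 0.05. It explicates the following statements, each of which is a common misinterpretation of the p value rather than a correct reading:
The p value is the probability that the null hypothesis is true.
(1 - p value) is the probability that the alternative hypothesis is true.
A statistically significant test result (p ≤ 0.05) means that the test hypothesis is false or should be rejected.
A p value greater than 0.05 means that no effect was observed.
The p value is in fact the probability, calculated assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.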
This video explains how to calculate the correlation coefficient, r, which measures the strength and direction of a linear relationship between two variables on a scatterplot.
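A minimal sketch of the calculation of r (the Pearson correlation coefficient) is shown here, computed directly from deviations about the means. The paired data `x` and `y` are hypothetical.

```python
from math import sqrt

# Hypothetical paired observations (points on a scatterplot)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sum of products of deviations, and sums of squared deviations
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

# r lies between -1 and 1; the sign gives direction, the size gives strength
r = s_xy / sqrt(s_xx * s_yy)

print(f"r = {r:.4f}")
```

Here r is about 0.77, indicating a fairly strong positive linear relationship in the hypothetical data.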
This video provides a comprehensive explanation of the chi-square distribution, which is used to examine the differences between categorical variables in the same population.
This video defines the chi-square statistic as the sum, over all categories, of the squared difference between the observed (O) and expected (E) values divided by the expected value: χ² = Σ (O − E)² / E. It also provides a numerical example applying the chi-square statistic to hypothesis testing.
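The chi-square statistic can be sketched in a few lines by summing the squared observed-minus-expected difference, divided by the expected value, over all categories. The counts are hypothetical. The p-value shortcut used here, exp(-χ²/2), is valid only for 2 degrees of freedom and is chosen so the example needs no external library.

```python
from math import exp

# Hypothetical counts for three categories
observed = [30, 14, 16]
expected = [20, 20, 20]  # null hypothesis: all categories equally likely

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom for a goodness-of-fit test: categories - 1
df = len(observed) - 1

# For df = 2 only, the upper-tail probability has the closed form exp(-x/2)
assert df == 2, "the closed-form p-value below applies only when df is 2"
p_value = exp(-chi_square / 2)

alpha = 0.05
reject_null = p_value < alpha

print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.4f}")
print("reject null" if reject_null else "fail to reject null")
```

For other degrees of freedom, a chi-square table or a statistics library would be needed to convert the statistic into a p-value.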
This video defines linear regression as a linear approach to modeling the relationship between a dependent variable (a scalar response) and one or more independent variables (explanatory variables). It also defines outliers, the F-statistic, the total sum of squares, the regression sum of squares, and the error sum of squares.
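The three sums of squares and the F-statistic can be sketched for the simplest case, a least-squares line with one independent variable. The data are hypothetical, and the variable names (`sst`, `ssr`, `sse`) are labels chosen for this example.

```python
# Hypothetical data: one independent variable x, one dependent variable y
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for the line y = intercept + slope * x
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

# Total variation in y about its mean
sst = sum((yi - mean_y) ** 2 for yi in y)
# Variation explained by the regression line
ssr = sum((pi - mean_y) ** 2 for pi in predicted)
# Variation left unexplained (squared residuals)
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))

# F-statistic with 1 predictor: explained mean square over residual mean square
f_stat = (ssr / 1) / (sse / (n - 2))

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, F = {f_stat:.2f}")
```

The total sum of squares equals the regression sum of squares plus the error sum of squares, which the printed values confirm for this data set.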
This link is a video tutorial which distinguishes between the nominal, ordinal, interval, and ratio scales of measurement. Nominal data is named data which can be separated into discrete categories which do not overlap (e.g. eye color). Ordinal data is data which is placed into some kind of order or scale (e.g. rating customer satisfaction on a scale from 1 to 10). Interval data is data which comes in the form of a numerical value where the difference between points is standardized and meaningful (e.g. temperature in degrees Celsius). Ratio data is much like interval data: it must be numerical values where the difference between points is standardized and meaningful, but it also must have a true zero and therefore no negative values (e.g. height).
This video explains how to read and construct box-and-whisker plots, which graphically depict groups of numerical data through their quartiles using a five-number summary (minimum, first quartile, median, third quartile, and maximum).
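The five-number summary behind a box plot can be sketched with the standard library. The data are hypothetical. Quartile conventions differ between textbooks and software, so the `method="inclusive"` setting used here is one assumption among several reasonable choices.

```python
from statistics import quantiles

# Hypothetical data set
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

# Cut points that split the ordered data into four equal parts
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Five-number summary: minimum, Q1, median, Q3, maximum
summary = (min(data), q1, q2, q3, max(data))

# The box spans Q1 to Q3; its width is the interquartile range
iqr = q3 - q1

print(summary)  # (1, 4.0, 7.0, 10.0, 15)
print(f"interquartile range = {iqr}")
```

In the plot, the box runs from the first to the third quartile with a line at the median, and the whiskers extend toward the minimum and maximum.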
This video explains the difference between a linear scale and a logarithmic scale. On a linear scale, equal distances along the axis always represent equal differences in value. A logarithmic scale is one in which the units on the axis are powers, or logarithms, of a base number, so equal distances represent multiplication by the same factor. Exponential growth curves are often displayed on a logarithmic scale, where they appear as straight lines.
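A short sketch can show why exponential growth suits a logarithmic scale: the step between successive values keeps growing on a linear scale but stays constant after taking logarithms. The doubling series below is hypothetical.

```python
from math import log10

# Hypothetical exponential growth: start at 3 and double each period
values = [3 * 2 ** t for t in range(6)]  # 3, 6, 12, 24, 48, 96

# Step sizes on a linear scale: each step is larger than the last
linear_steps = [b - a for a, b in zip(values, values[1:])]

# Step sizes on a base-10 logarithmic scale: every step equals log10(2)
log_steps = [log10(b) - log10(a) for a, b in zip(values, values[1:])]

print(linear_steps)  # [3, 6, 12, 24, 48]
print([round(s, 4) for s in log_steps])
```

Because every step on the logarithmic scale has the same size, the points fall on a straight line when plotted on a logarithmic axis.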