## Descriptive Statistics

### Population and Sample

In statistics, a population refers to the entire group of individuals or objects under study, while a sample is a subset of the population. For example, if we're studying the height of all students in a school, the population would be all students, while a sample might be 100 randomly selected students.

It's often impractical or impossible to study an entire population, which is why we use samples to make inferences about the population.

### Random Sampling

A random sample is one where each member of the population has an equal chance of being selected. This is crucial for ensuring that the sample is representative of the population.

Many people assume that any sample is a random sample, but this isn't true. For example, surveying only your friends about a political issue isn't a random sample of the population.

### Data Types

Data can be classified as discrete or continuous:

* Discrete data can only take specific values (usually integers). Example: number of students in a class.
* Continuous data can take any value within a range. Example: height of students.

### Reliability and Bias

The reliability of data sources and potential bias in sampling are critical considerations in statistics. Bias can occur when the sample doesn't accurately represent the population.

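
The effect of a biased sample can be sketched with a small simulation. Everything in it is an assumption made for illustration: the population is synthetic, the income figures are invented, and the names `ordinary` and `wealthy` are hypothetical groupings rather than real data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 9,000 incomes from ordinary neighborhoods
# and 1,000 from wealthy ones (all figures are made up).
ordinary = [random.gauss(40_000, 8_000) for _ in range(9_000)]
wealthy = [random.gauss(150_000, 30_000) for _ in range(1_000)]
population = ordinary + wealthy

# Simple random sample: every member has the same chance of selection.
random_sample = random.sample(population, 100)

# Biased sample: drawn only from the wealthy group.
biased_sample = random.sample(wealthy, 100)

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(f"population mean:    {population_mean:,.0f}")
print(f"random sample mean: {random_mean:,.0f}")
print(f"biased sample mean: {biased_mean:,.0f}")
```

With these made-up numbers the random sample's mean should land near the population mean, while the biased sample's mean is far above it.
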
If we're studying the average income of a city but only sample people from wealthy neighborhoods, our results will be biased and not representative of the entire city.

### Outliers

Outliers are data points that differ significantly from other observations. They can have a substantial impact on statistical analyses and should be carefully considered.

When encountering outliers, don't automatically discard them. Investigate why they exist and consider their impact on your analysis.

## Data Presentation

### Frequency Distributions

A frequency distribution shows how often each value occurs in a dataset. It can be presented in a table or graphically.

### Histograms

Histograms are bar graphs that display the frequency distribution of continuous data. The x-axis represents the data values, usually in intervals, and the y-axis shows the frequency.

### Cumulative Frequency Graphs

These graphs show the cumulative frequency up to each interval. They're useful for finding medians and percentiles.

### Box and Whisker Diagrams

Also known as box plots, these diagrams provide a visual summary of the distribution of data, showing the median, quartiles, and potential outliers.

## Measures of Central Tendency and Dispersion

### Central Tendency

* Mean: The sum of all values divided by the number of values.
* Median: The middle value when data is ordered (the mean of the two middle values when the number of values is even).
* Mode: The most frequent value.

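
As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The variable names are illustrative; `multimode` is used rather than `mode` so that every value tied for most frequent is reported.

```python
import statistics

data = [2, 3, 3, 4, 5, 5, 6, 7, 8]

mean = statistics.mean(data)        # sum of the values / number of values
median = statistics.median(data)    # middle value of the ordered data
modes = statistics.multimode(data)  # every value tied for most frequent

print(round(mean, 2), median, modes)  # 4.78 5 [3, 5]
```
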
For the dataset: 2, 3, 3, 4, 5, 5, 6, 7, 8

* Mean = (2 + 3 + 3 + 4 + 5 + 5 + 6 + 7 + 8) / 9 = 43 / 9 ≈ 4.78
* Median = 5 (the fifth of the nine ordered values)
* Mode = 3 and 5 (bimodal)

### Dispersion

* Range: The difference between the maximum and minimum values.
* Variance: The average squared deviation from the mean.
* Standard Deviation: The square root of the variance.

Standard deviation is often preferred over variance as it's in the same units as the original data.

## Linear Correlation and Regression

### Correlation

Correlation measures the strength and direction of the linear relationship between two variables. The correlation coefficient, r, ranges from -1 to 1.

* r = 1: Perfect positive correlation
* r = -1: Perfect negative correlation
* r = 0: No linear correlation

### Regression

Linear regression finds the best-fitting straight line through a set of points. The equation is typically in the form y = mx + c, where m is the slope and c is the y-intercept.

If we have data on students' study time in hours (x) and their test scores (y), we might find a regression line like:

y = 2x + 60

This suggests that for each additional hour of study, the predicted test score increases by 2 points, with a predicted base score of 60 for no study time.

## Probability

### Basic Concepts

* Trial: A single execution of an experiment.
* Outcome: A possible result of a trial.
* Probability: A measure of the likelihood of an event occurring, ranging from 0 to 1.

### Probability Calculations

Various diagrams can be used to calculate probabilities:

* Venn diagrams: For set operations and probabilities.
* Tree diagrams: For sequential events.
* Sample space diagrams: For listing all possible outcomes.

In a tree diagram for flipping a fair coin twice, each branch has probability 0.5. The probability of getting two heads is found by multiplying along the branches: 0.5 × 0.5 = 0.25.

### Conditional Probability

Conditional probability is the probability of an event occurring given that another event has already occurred. It's denoted as P(A|B), read as "the probability of A given B".

$P(A|B) = \frac{P(A \cap B)}{P(B)}$

This is defined only when P(B) > 0.

People often confuse P(A|B) with P(B|A). These are generally not the same!

## Discrete Random Variables and Probability Distributions

A discrete random variable is a variable that can only take specific values. Its probability distribution gives the probability for each possible value.

### Expected Value

The expected value (E(X)) of a discrete random variable X is the sum of each possible value multiplied by its probability:

$E(X) = \sum x_i P(X = x_i)$

For a fair six-sided die, E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5

### Binomial Distribution

The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success.

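
One way to make this model concrete is brute force: list every possible sequence of successes and failures, and add up the probabilities of the sequences with the required number of successes. The sketch below does this; the function name `binomial_probability` is a hypothetical helper introduced here, and the approach is only practical for small n because it visits all 2^n sequences.

```python
from itertools import product


def binomial_probability(n: int, k: int, p: float) -> float:
    """P(exactly k successes in n independent trials), by enumeration."""
    total = 0.0
    # Each outcome is a tuple of n flags: 1 = success, 0 = failure.
    for outcome in product((0, 1), repeat=n):
        successes = sum(outcome)
        if successes == k:
            # Independent trials: multiply the per-trial probabilities.
            total += p ** successes * (1 - p) ** (n - successes)
    return total


# Exactly 7 heads in 10 flips of a fair coin.
print(round(binomial_probability(10, 7, 0.5), 4))  # 0.1172
```

Of the 1024 equally likely sequences of 10 fair coin flips, 120 contain exactly 7 heads, which gives 120/1024 ≈ 0.1172.
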
If X ~ B(n, p), where n is the number of trials and p is the probability of success on each trial:

$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$

If you flip a fair coin 10 times, the probability of getting exactly 7 heads is:

$P(X = 7) = \binom{10}{7} (0.5)^7 (0.5)^3 \approx 0.1172$

## Normal Distribution

The normal distribution is a continuous probability distribution with a bell-shaped curve. It's characterized by its mean (μ) and standard deviation (σ).

### Standard Normal Distribution

The standard normal distribution has μ = 0 and σ = 1. We can convert any normal distribution to standard normal using the z-score:

$z = \frac{x - \mu}{\sigma}$

The z-score tells us how many standard deviations a value is from the mean.

### Normal Probability Calculations

To find probabilities for normal distributions, we typically:

1. Convert to standard normal (find z-score)
2. Use standard normal tables or technology to find the probability

If heights in a population are normally distributed with μ = 170 cm and σ = 10 cm, what's the probability of a person being taller than 185 cm?

1. Find z-score: $z = \frac{185 - 170}{10} = 1.5$
2. Look up P(Z > 1.5) in a standard normal table or use technology: P(Z > 1.5) ≈ 0.0668

So, about 6.68% of the population is taller than 185 cm.

## Advanced Topics (AHL)

### Bayes' Theorem

Bayes' theorem relates conditional probabilities:

$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$

This is particularly useful when we want to update probabilities based on new evidence.

### Continuous Random Variables and Probability Density Functions

For continuous random variables, we use probability density functions (PDFs) instead of probability mass functions. The probability of a specific value is always 0; we find probabilities for ranges of values by integrating the PDF.

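
The idea of integrating a PDF over a range can be sketched numerically. The example reuses the heights distribution (μ = 170, σ = 10) and Python's standard `statistics.NormalDist`. The helper `integrate_pdf` and its midpoint-rule approximation are assumptions chosen for illustration, not a recommended numerical method.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)


def integrate_pdf(dist, a, b, steps=10_000):
    """Approximate the integral of dist.pdf over [a, b] (midpoint rule)."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width


# P(160 <= X <= 180): area under the PDF between 160 and 180.
p_range = integrate_pdf(heights, 160, 180)
# The same probability from the cumulative distribution function.
p_exact = heights.cdf(180) - heights.cdf(160)
# A single value is a zero-width range, so its probability is 0.
p_point = integrate_pdf(heights, 185, 185)
# P(X > 185), the upper tail.
p_tail = 1 - heights.cdf(185)

print(round(p_range, 4), round(p_exact, 4))
print(p_point)
print(round(p_tail, 4))  # 0.0668
```

The numerical integral and the CDF difference should agree closely, and the tail probability matches the table lookup for z = 1.5.
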
### Properties of Continuous Distributions

For a continuous random variable X with PDF f(x):

* Mode: The value of x where f(x) is maximum
* Median: The value m where $P(X \leq m) = 0.5$
* Mean: $E(X) = \int_{-\infty}^{\infty} x f(x) \, dx$
* Variance: $Var(X) = E(X^2) - [E(X)]^2$
* Standard Deviation: $\sigma = \sqrt{Var(X)}$

### Linear Transformations

If Y = aX + b, where X is a random variable and a and b are constants:

* E(Y) = aE(X) + b
* Var(Y) = a²Var(X)

This is particularly useful when standardizing normal distributions! The z-score $z = \frac{x - \mu}{\sigma}$ is the case a = 1/σ and b = -μ/σ, which gives E(Z) = 0 and Var(Z) = 1.

Throughout your study of Statistics & Probability, remember that while calculations are important, interpreting results in context is crucial. Always consider what your statistical findings mean in the real world.