## Test your basic knowledge

# AP Statistics Vocab

**Instructions:**

- Answer 50 questions in 15 minutes.
- Match each statement with the correct term.
- Don't refresh. All questions and answers are randomly picked and ordered every time you load a test.

This is a study tool. The three wrong answers for each question are randomly chosen from the answers to other questions, so you may sometimes find the correct answer obvious, but taking the test repeatedly will reinforce your understanding.

**1. Displays the 5-number summary as a central box with whiskers that extend to the non-outlying data values**

**2. Any attempt to force a sample to resemble specified attributes of the population**

**3. Bias introduced to a sample when individuals can choose on their own whether to participate in the sample**

**4. The differences between data values and the corresponding values predicted by the regression model; ____ = observed value - predicted value**
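As an illustration of item 4, here is a minimal Python sketch that computes each residual as observed value minus predicted value. The fitted line y-hat = 2 + 3x and the data points are hypothetical, chosen only so the arithmetic is easy to follow.

```python
# Hypothetical fitted line: y-hat = b0 + b1 * x with b0 = 2 and b1 = 3 (illustration only)
def predict(x, b0=2.0, b1=3.0):
    """Predicted value from the hypothetical line."""
    return b0 + b1 * x

xs = [1, 2, 3, 4]             # hypothetical x-values
ys = [5.5, 7.0, 11.5, 13.0]   # hypothetical observed y-values

# residual = observed value - predicted value
residuals = [y - predict(x) for x, y in zip(xs, ys)]
print(residuals)  # [0.5, -1.0, 0.5, -1.0]
```

A positive residual means the line underestimated that point; a negative residual means it overestimated it.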

**5. Individuals on whom an experiment is performed**

**6. A sample drawn by selecting individuals systematically from a sampling frame**
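One common way to draw the kind of sample described in item 6 is to take every k-th individual from the sampling frame, starting at a random position within the first k. This Python sketch assumes the frame is a list and that k = len(frame) // n; the helper name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Hypothetical helper: every k-th element after a random start (requires 1 <= n <= len(frame))."""
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random starting point in [0, k)
    return frame[start::k][:n]   # trim in case len(frame) is not a multiple of n

frame = list(range(1, 101))      # hypothetical sampling frame of 100 IDs
print(systematic_sample(frame, 10, random.Random(42)))
```

Only the starting point is random; after that, every selection is determined by the interval k.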

**7. Any systematic failure of a sampling method to represent its population; common errors are voluntary response, undercoverage, nonresponse ____, and response ____**

**8. Extreme values that don't appear to belong with the rest of the data**

**9. Shows the relationship between two quantitative variables measured on the same cases**

**10. Value found by subtracting the mean and dividing by the standard deviation**
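Item 10's calculation is short enough to show directly. This Python sketch uses the standard library's `statistics.mean` and `statistics.stdev`; note that `stdev` is the sample standard deviation, which is the usual choice for data but is an assumption here.

```python
from statistics import mean, stdev

def standardize(data):
    """Subtract the mean, then divide by the (sample) standard deviation."""
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]

print(standardize([10, 20, 30]))  # [-1.0, 0.0, 1.0]
```

Standardized values have no units, which is what lets values measured on different scales be compared.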

**11. In a normal model, about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean**
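The percentages in item 11 can be checked numerically. For a standard normal variable Z, the share of values within k standard deviations of the mean is erf(k/√2); this sketch uses Python's `math.erf`.

```python
from math import erf, sqrt

def share_within(k):
    """P(-k < Z < k) for a standard normal Z."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```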

**12. Places in order the effects that many re-expressions have on the data**
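Item 12 describes an ordered list of re-expressions. Below is a minimal Python sketch of one common version of that list (squares, raw data, square roots, logarithms, negative reciprocal square roots, negative reciprocals). The exact rungs and the sign convention are assumptions, since textbooks present the list slightly differently.

```python
import math

# One common version of the list, ordered by power from 2 down to -1.
# The "0" rung is conventionally the logarithm; the negative signs keep the
# re-expressed values in the same order as the original data.
# Assumes positive data values (square roots and logs need them).
LADDER = [
    (2, lambda y: y ** 2),
    (1, lambda y: y),
    (0.5, math.sqrt),
    (0, math.log10),
    (-0.5, lambda y: -1 / math.sqrt(y)),
    (-1, lambda y: -1 / y),
]

for power, f in LADDER:
    print(power, f(100))
```

Moving down the list pulls in large values more and more strongly, which is why it is useful for straightening right-skewed data.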

**13. An arrangement of data in which each row represents a case and each column represents a variable**

**14. Sampling schemes that combine several sampling methods**

**15. When doing this, consider their shape, center, and spread**

**16. Although linear models provide an easy way to predict values of y for a given value of x, it is unsafe to predict for values of x far from the ones used to find the linear model equation; such predictions should not be trusted**
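One simple safeguard for the problem item 16 describes is to flag any prediction whose x lies outside the range of x-values used to fit the line. This Python sketch does that with the standard `warnings` module; the helper name and the range-based cutoff are assumptions, since "far from" has no single numerical definition.

```python
import warnings

def predict_with_check(x, b0, b1, x_min, x_max):
    """Hypothetical helper: y-hat = b0 + b1*x, warning when x is outside [x_min, x_max]."""
    if not x_min <= x <= x_max:
        warnings.warn(f"x = {x} is outside the fitted range [{x_min}, {x_max}]; "
                      "this prediction is an extrapolation")
    return b0 + b1 * x

print(predict_with_check(3, 1.0, 2.0, x_min=1, x_max=4))    # inside the range
print(predict_with_check(40, 1.0, 2.0, x_min=1, x_max=4))   # triggers a warning
```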

**17. Displays data that change over time**

**18. When either those who could influence the results or those who evaluate the results, but not both, are blinded**

**19. Numerically valued attribute of a model**

**20. The most basic situation in a simulation in which something happens at random**

**21. Displays counts and, sometimes, percentages of individuals falling into named categories on two or more variables; categorizes the individuals on all variables at once to reveal possible patterns in one variable that may be contingent on the category of the other**

**22. A distribution that's roughly flat**

**23. An observational study in which subjects are followed to observe future outcomes**

**24. Adding a constant to each data value adds the same constant to the mean, the median, and the quartiles, but does not change the standard deviation or IQR**
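Item 24's claim can be checked on any small data set. This Python sketch uses the standard `statistics` module on hypothetical data; the quartiles come from `statistics.quantiles` with its default method, which is one of several quartile conventions, and the usual conventions all shift by the same constant as the data.

```python
from statistics import mean, median, stdev, quantiles

def summary(data):
    """Center and spread measures; quartiles use quantiles()' default method."""
    q1, _, q3 = quantiles(data, n=4)
    return {"mean": mean(data), "median": median(data),
            "sd": stdev(data), "iqr": q3 - q1}

data = [3, 7, 8, 12, 15]                 # hypothetical data
before = summary(data)
after = summary([x + 10 for x in data])  # add the constant 10 to every value
print(before)
print(after)
```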

**25. Gives a value in 'y-units per x-unit'; changes of one unit in x are associated with changes of b1 units in predicted values of y**

**26. When averages are taken across different groups, they can appear to contradict the overall averages**
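Item 26 is easiest to see with numbers. The counts below are hypothetical, constructed so that group A has the higher success rate within each category but the lower rate overall, because most of A's cases fall in the harder category.

```python
# Hypothetical (successes, trials) for each group within each category
results = {
    "A": {"easy": (9, 10), "hard": (30, 90)},
    "B": {"easy": (80, 90), "hard": (3, 10)},
}

def rate(pair):
    """Success rate for one (successes, trials) pair."""
    successes, trials = pair
    return successes / trials

def overall_rate(categories):
    """Success rate after pooling all categories together."""
    total_s = sum(s for s, _ in categories.values())
    total_t = sum(t for _, t in categories.values())
    return total_s / total_t

for group, cats in results.items():
    per_cat = {c: round(rate(p), 3) for c, p in cats.items()}
    print(group, per_cat, "overall:", round(overall_rate(cats), 3))
# A {'easy': 0.9, 'hard': 0.333} overall: 0.39
# B {'easy': 0.889, 'hard': 0.3} overall: 0.83
```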

**27. A value that attempts the impossible by summarizing the entire distribution with a single number, a 'typical' value**

**28. The middle value with half of the data above and half below it**

**29. This, b0, gives a starting value in y-units; it's the y-hat value when x is 0**
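Items 25 and 29 describe the two coefficients of a fitted line y-hat = b0 + b1·x. This Python sketch computes them with the usual least-squares formulas, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄, on hypothetical data; the function name is an assumption, and on Python 3.10+ `statistics.linear_regression` returns the same slope and intercept.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope: y-units per x-unit
    b0 = y_bar - b1 * x_bar     # intercept: y-hat when x = 0
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])   # hypothetical points lying on y = 1 + 2x

def y_hat(x):
    """Predicted value from the fitted line."""
    return b0 + b1 * x

print(b0, b1)               # 1.0 2.0
print(y_hat(0))             # equals b0
print(y_hat(6) - y_hat(5))  # a one-unit change in x changes y-hat by b1
```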

**30. This of sample size n is one in which each set of n elements in the population has an equal chance of selection**
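Python's `random.sample` draws without replacement and is designed so that every subset of the requested size is equally likely, which matches the definition in item 30. The roster below is hypothetical.

```python
import random

population = [f"student_{i}" for i in range(1, 31)]   # hypothetical roster of 30 students
rng = random.Random(2024)                             # seeded only so the result is reproducible
sample = rng.sample(population, k=5)                  # random sample of size n = 5
print(sample)
```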

**31. A scatterplot shows an association that is this if there is little scatter around the underlying relationship**

**32. The experimental units assigned to a baseline treatment level, typically either the default treatment, which is well understood, or a null, placebo treatment**

**33. If data consist of two or more groups that have been thrown together, it is usually better to fit a different linear model to each group than to try to fit a single model to all of the data**

**34. Variables are said to be this if the conditional distribution of one variable is the same for each category of the other**

**35. A positive ____ or association means that, in general, as one variable increases, so does the other; when increases in one variable generally correspond to decreases in the other, the association is negative**

**36. Data points whose x-values are far from the mean of x are said to exert ____ on a linear model; with high enough ____, residuals can appear to be deceptively small**

**37. The best defense against bias, in which each individual is given a fair, random chance of selection**

**38. Gives the possible values of the variable and the frequency or relative frequency of each value**

**39. Consists of the minimum and maximum, the quartiles Q1 and Q3, and the median**
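Item 39's summary, along with the difference in item 43, can be computed in a few lines of Python. Quartile rules differ between textbooks and software; this sketch assumes the common "median of each half" rule (the middle value is left out of both halves when n is odd), so results may differ slightly from other tools. The data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """(min, Q1, median, Q3, max) using the median-of-halves quartile rule; needs at least 2 values."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values below the median
    upper = xs[half + n % 2:]    # values above the median (skips the middle value when n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

summary = five_number_summary([2, 4, 4, 5, 7, 8, 9, 11, 12])   # hypothetical data
q1, q3 = summary[1], summary[3]
print(summary)          # (2, 4.0, 7, 10.0, 12)
print("IQR:", q3 - q1)  # 6.0
```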

**40. Distributions with two modes**

**41. Manipulates factor levels to create treatments, randomly assigns subjects to these treatment levels, and then compares the responses of the subject groups across treatment levels**

**42. Found by substituting the x-value in the regression equation; they're the values on the fitted line**

**43. The difference between the first and third quartiles**

**44. A treatment known to have no effect, administered so that all groups experience the same conditions**

**45. Summarized with the mean or the median**

**46. A quantity or amount adopted as a standard of measurement, such as dollars, hours, or grams**

**47. In a retrospective or prospective study, subjects who are similar in ways not under study may be ____ and then compared with each other on the variables of interest**

**48. The tendency of many human subjects (often 20% or more of experiment subjects) to show a response even when administered a placebo**

**49. The difference between the lowest and highest values in a data set**

**50. A variable in which the numbers act as numerical values; always has units**