The AP Stats TOP 10 FAQ

(ok, so it grew to 13, but saying Top 13 just doesn't sound right!)

These questions show up on the list with great frequency!  So here they are:

  1. Why divide by n-1?

  2. How do I explain r^2?

  3. Do I have to teach log and power re-expressions?

  4. What's the difference between confounding and lurking variables?

  5. What's the difference between independence and mutually exclusive?

  6. How much probability do I need to teach?

  7. What's the deal with adding variances?  Var(2X) =?? Var(X) + Var(X)

  8. I'm running out of time!  What can I do (especially with inference for slope)?

  9. How much work do students need to show on the exam?  aka, Do students need to show the formulas to get credit on the exam? aka, what about that pesky t*?

  10. What's a good text?

  11. What's a good review book?

  12. Why do I use the pooled-p for a 2-proportion z-test?

  13. How do you solve Sally and Betty?  2002 #39 MC?

 

ANSWERS:

  1. Why n-1?

• Here is the shortest possible (and still honest) answer, given by Dan Teague:

Unfortunately, there is no good/easy answer to any question in AP
Statistics that contains the phrase "why divide by (n-1)". The answer
is both always beyond the scope of the course or the student's
mathematical ability and almost always unenlightening. Moreover, with
any reasonable set of data, it makes no whit of difference if we divide
by n or n-1. Even so, (n-1) is the "right thing to do".

 

• NCSSM has an activity about why n-1 that is here.
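
For anyone who wants to see "the right thing to do" in numbers, here is a minimal sketch (not taken from the list posts). It assumes a made-up population, the six faces of a die, and looks at every possible sample of size 2 drawn with replacement. Averaged over all of those samples, dividing by n-1 hits the population variance exactly, while dividing by n comes up short.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population for illustration: the faces of one die.
population = [1, 2, 3, 4, 5, 6]

def sum_sq_dev(values):
    """Sum of squared deviations from the mean, as an exact fraction."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values)

# True population variance (divide by N, since this is the whole population).
pop_var = sum_sq_dev(population) / len(population)

# Every equally likely ordered sample of size n, drawn with replacement.
n = 2
samples = list(product(population, repeat=n))

# Average the two candidate "sample variances" over all possible samples.
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print("population variance:  ", pop_var)
print("average dividing by n:  ", avg_div_n)
print("average dividing by n-1:", avg_div_n_minus_1)
```

With n this small the gap is large; try a bigger n and the two averages get much closer, which fits Dan's remark about reasonable data sets.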

  2. How do you explain r^2?

Here are the two examples that most helped me explain r^2 to my students.  They have been posted on the list many times over the years and I have lost track of the original author!

• Height explains weight.  Not totally, but roughly.  Suppose r^2 is 75% for a dataset of heights and weights.  We know that other things affect weight in addition to height, including genetics, diet and exercise.  So we say that 75% of the variation in weight can be explained by the variation in height, but that 25% of that variation is due to other factors.

• Suppose you are buying a pizza that is $7 plus $1.50 for each topping.  Clearly Price = 7 + 1.50(# of toppings).  Clearly r and r^2 are 1 and 100%.  Does this mean that the number of toppings 100% determines my cost?  No, clearly the $7 base price has a lot to do with the price!  However, the variation in my price is 100% explained by the variation in the number of toppings I choose.
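
The pizza example is easy to check by computer. The sketch below is only an illustration: the function name r_squared and the choice of 0 through 5 toppings are assumptions, not part of the original example. It computes r^2 from the usual sums of squares.

```python
def r_squared(xs, ys):
    """Square of the correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Assumed topping counts; the price follows Price = 7 + 1.50(# of toppings).
toppings = [0, 1, 2, 3, 4, 5]
prices = [7 + 1.50 * t for t in toppings]

pizza_r2 = r_squared(toppings, prices)
print(pizza_r2)
```

Changing the $7 base price to any other constant leaves r^2 untouched, which is the point of the example: r^2 talks about variation, not about the total.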

Al Coons has an activity regarding this topic that is archived at this location.

Dan Teague gives a nice explanation of the math involved for r^2 on this archived post.

r^2 was discussed on the list on this date.

  3. Do I have to teach log transformations?

Yes!  

Why? (especially when my calculator can do it for me and has all these fancy commands!  Can't I have my students use those buttons?)

It's on the course description!  And here's why:

The idea of transforming data to achieve linearity is powerful and important, and it is this idea we are teaching.  Re-expressing data and working with it in its transformed, linear state is crucial, as is understanding how to back-transform to make an appropriate prediction.
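
Here is a rough sketch of the re-express, fit, back-transform cycle. The data are invented for illustration (they follow y = 3 * 2^x exactly), and the helper names least_squares and predict are assumptions. The steps are: take log10 of y, fit a line to (x, log y), then undo the log to get a prediction in the original units.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Made-up exponential data: y = 3 * 2^x.
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]

# Step 1: re-express y on the log scale, where the pattern is linear.
log_ys = [math.log10(y) for y in ys]

# Step 2: fit a line to the transformed data.
intercept, slope = least_squares(xs, log_ys)

# Step 3: back-transform a prediction to the original units.
def predict(x):
    return 10 ** (intercept + slope * x)

print(predict(6))  # close to 3 * 2^6 = 192
```

The line lives on the log scale; the prediction students report should be back in the original units.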

Dave Bock discusses transformations for linearity on this archived post.

  4. What is the difference between confounding and lurking variables?

Paul Velleman gives a great post about confounding and lurking variables here.

The list had a great discussion about confounding and lurking variables on Nov. 12th and 13th.  Click here and also go back one day to see the full discussion.

Josh Zucker discusses the issue of extraneous variables and gives a list of links.

  5. Independence vs. mutually exclusive:

If two events are independent, knowing the outcome of one does not change the probability of the other.  For example, whether or not it rains and whether a coin lands heads or tails.

If two events are mutually exclusive, then if one happens the other cannot happen.  For example, in picking one M&M from a bag, I can find the probability of drawing green or red.  But if I draw green, I cannot draw red.

As we saw on the '02 MC #23, it is useful to notice that if two events are mutually exclusive, they affect each other quite powerfully:  if one of them happens, the other CANNOT occur.  Thus they are dependent.
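
A quick check of both definitions, using one roll of a fair die as an assumed example (it is not the M&M or rain example above, and the helper names are made up). Independence is tested with P(A and B) = P(A)P(B); mutually exclusive means the events share no outcomes.

```python
from fractions import Fraction

# Sample space: one roll of a fair die (assumed example).
outcomes = set(range(1, 7))

def prob(event):
    """Probability of an event, given equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

def independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

def mutually_exclusive(a, b):
    """True when the two events cannot happen together."""
    return len(a & b) == 0

even = {2, 4, 6}
at_most_two = {1, 2}
rolled_one = {1}
rolled_two = {2}

# Independent but not mutually exclusive.
print(independent(even, at_most_two), mutually_exclusive(even, at_most_two))

# Mutually exclusive, and therefore dependent.
print(independent(rolled_one, rolled_two), mutually_exclusive(rolled_one, rolled_two))
```

The second pair shows the point made above: mutually exclusive events with nonzero probabilities cannot be independent.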

Independence vs. mutually exclusive has been discussed on the list.

  6. How much probability do I teach?

Floyd Bullard has submitted an awesome post about probability that can be read in the archives.  I would strongly encourage rookies to read this post BEFORE teaching probability!

  7. Why doesn't X + X = 2X?

Here's a great explanation from Dave Bock:

For a short answer, try a thought experiment.

Let X represent the outcome when you roll a die. Then 2X represents
rolling one die and doubling the result. The possible outcomes are {2,
4, 6, 8, 10, 12}; they are equiprobable.

On the other hand, X+X represents rolling two dice (or one twice). Now
the possible outcomes are {2, 3, 4, 5, 6, 7, ..., 12}. Some are far
less likely than others. Clearly this is a very different situation.

You can actually calculate both variances, but first just think about
the distributions. It should be pretty obvious that X+X is unimodal and
symmetric, peaking around 7 with very low tails while 2X is uniform
across the same range. The two means are the same, but X+X has a
smaller variance than 2X.

When confronting these situations, students must learn to ask
themselves how many random values they are working with. One random
value multiplied by a constant behaves much differently from summing
several different random values.

I urge students to recognize that a random variable in Statistics is
not the same animal as a variable in algebra. In algebra what we call a
"variable" is really just an unspecified constant. With that
understanding, no matter what number I use for X I'll always substitute
that same value every time I see an X, so it must be true that X+X+X =
3X.

Then I put my Statistics hat on, declare them "random variables", and
pick up a die. I substitute the results of the first roll for the first
X, roll again for the value of the second X, etc. It's pretty clear now
that this X+X+X = 3X equation that seems so obvious in algebra is false
for random variables in Statistics. (One time the four values I
randomly rolled actually worked! The kids thought that was hilarious.
Their laughter at my bad luck clearly showed they understood the
issue.)
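
Dave's die thought experiment can be worked out exactly. This is a minimal sketch (the helper name variance is an assumption) that lists every equally likely outcome for X, for 2X, and for the sum of two independent rolls, then computes each variance as a fraction.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def variance(values):
    """Variance of a list of equally likely values, as an exact fraction."""
    values = list(values)
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n

# One die.
var_x = variance(faces)

# One die, result doubled: Var(2X) = 4 Var(X).
var_2x = variance(2 * x for x in faces)

# Two independent rolls added: Var(X + X) = Var(X) + Var(X).
var_x_plus_x = variance(a + b for a, b in product(faces, repeat=2))

print(var_x, var_2x, var_x_plus_x)
```

Doubling one roll multiplies the variance by 4, while adding two independent rolls only doubles it.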

Read a list discussion thread about adding random variables.

Pete Flannigan-Hyde has written an article for AP Central about adding random variables.

  8. I'm behind!  Help!

• Note that once you introduce inference, you can teach the last part of the year very quickly!  Especially inference for slope, which is on the AP test.

• For inference for slope, focusing on interpreting the computer output can save time.

• Not getting into all the nitty-gritty details about homogeneity and independence can save time.

• Following the pacing guide that comes with the textbooks can help avoid this problem to begin with, but if you're reading this, it may be too late!  :o)

• Starting cumulative review while finishing inference can eliminate the need for lots of days of review.

• Reviewing regression while teaching inference for slope is a natural and helpful step for preparing for the exam.

  9. How much work to show?

The short answer is that most list contributors recommend that students show formulas, first with just the variables and then with the numbers plugged in.  This shows that the student understands what is going on, and it eases the concern that students would lose points if they accidentally entered something into their calculator incorrectly.

A note about the t* for t-intervals.  If a student uses technology for certain procedures (e.g., 1-sample with n = 167 or any 2-sample interval), the t* will not be on the table.  It is OK to leave the formula with all the numbers plugged in and the t* just stays as a variable.  OR a student can use a conservative approach that uses a t* that is on the table, but then they need to calculate their interval by hand so their answer matches the df they used.  

If students and/or teacher really want to find the t*, they can use the inverse t function.  If students have an 83, they need a t-inverse program.  This program is legal (because it just matches the 84) and can be made by typing this simple little program:

Prompt N
Prompt A
solve(tcdf(X,1E99,N-1)-A,X,1)→K
Disp K
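
For teachers who want to check a t* away from the calculator, here is a rough Python sketch of the same idea as the program above: find the X whose upper-tail area under the t curve with N-1 degrees of freedom equals A. Everything in it is an assumption made for illustration, not an official routine. It integrates the t density with Simpson's rule and searches by bisection between 0 and 100, so it expects a tail area below 0.5 and a t* under 100.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_star(n, tail_area):
    """Critical value with the given upper-tail area and df = n - 1."""
    df = n - 1
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection: halve the search interval each pass
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 95% interval with n = 11: tail area 0.025, df = 10, t* is about 2.228.
print(round(t_star(11, 0.025), 3))

# The n = 167 case mentioned above (df = 166, not on the table).
print(round(t_star(167, 0.025), 3))
```

As in the calculator program, the first input is the sample size and the second is the area in one tail, so a 95% interval uses 0.025.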

A few other points about this:

• For hypothesis tests and confidence intervals, the AP rubrics have (thus far!) required name OR formula.  So students can get full credit without the formula.

• Numerous multiple-choice problems on the '02 exam require formula understanding:

    • #8: 1-sample t-interval

    • #11: Chi-square expected counts

    • #21: Confidence interval for slope

    • #32: Binomial and geometric formulas

    • #38: Binomial and 1-prop z formulas

• TI-talk is discouraged.  Statements like:  normalcdf (1.2, 9999) are just not good communication.  While showing a total by-hand formula is not required, good communication is.  For example, on a binomial problem, students could write:

    • Binomial

    • n = 6

    • p = 0.87

    • P(x = 4) = ----- (from calculator)

• It has been frequently recommended on this list that students show z-score calculations and don't use technology to shortcut that step!
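
For the binomial write-up above (n = 6, p = 0.87), the value the calculator supplies comes straight from the binomial formula. A short sketch, with binomial_pmf as an assumed helper name:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# The example from the write-up: n = 6, p = 0.87, P(x = 4).
prob_four = binomial_pmf(4, 6, 0.87)
print(round(prob_four, 4))  # about 0.1452
```

Students who write out the formula with numbers plugged in are showing the same calculation.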

  10. Textbook?  Just click here!

 

  11. Review Book?  Just click here!

  12. Why do we pool for a 2-proportion z-test?

Charles Peltier has written an article for AP Central about pooling.

In short, we are assuming in our null hypothesis that p1 = p2.  So then the question arises:  which p do I use to compute the standard deviation of the difference in sample proportions?  The best solution is to form the pooled-p.  The pooled-p is the weighted average of the two sample proportions (total successes divided by total sample size), so it takes a compromise position between them.  This is the best way we can honor the assumption that p1 = p2 when calculating the standard deviation.
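
A small sketch of the pooled calculation. The counts are made up for illustration (45 successes out of 100 in one group, 30 out of 100 in the other), and the function name is an assumption.

```python
import math

def pooled_two_prop_z(x1, n1, x2, n2):
    """Return (pooled p, z statistic) for a 2-proportion z-test."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    # Pooled p: total successes over total sample size, which is the
    # weighted average of the two sample proportions.
    pooled = (x1 + x2) / (n1 + n2)
    # Standard deviation of the difference, using the pooled p for both groups.
    sd = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return pooled, (p_hat1 - p_hat2) / sd

# Hypothetical counts, chosen only to illustrate the arithmetic.
pooled_p, z = pooled_two_prop_z(45, 100, 30, 100)
print(pooled_p, round(z, 3))
```

With equal sample sizes the pooled-p lands halfway between the two sample proportions; with unequal sizes it sits closer to the larger sample's proportion.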

  13. Betty and Sally!  Arggh!  :o)   2002 MC #39

At first this problem seems impossible!  How could a two-tailed test reject what a one-tailed test failed to reject!?!?  Answer:  if the one-tailed test shaded the wrong way!  Only z = -1.98 is a sufficient value to reject in a two-tailed test.  And if the one-tailed test shades to the right ("greater than") of z = -1.98, then the one-tailed test fails to reject!  Pretty tricky!
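
The P-values make the trick easy to see. This sketch uses z = -1.98 from the explanation above; the 0.05 significance level is an assumption for illustration and is not taken from the exam problem.

```python
from statistics import NormalDist

z = -1.98
alpha = 0.05  # assumed significance level, for illustration only

std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test: shade both tails beyond |z|.
p_two_tailed = 2 * std_normal.cdf(-abs(z))

# One-tailed test shaded "greater than": area to the right of z.
p_upper_tail = 1 - std_normal.cdf(z)

print(round(p_two_tailed, 4), p_two_tailed < alpha)   # two-tailed test rejects
print(round(p_upper_tail, 4), p_upper_tail < alpha)   # one-tailed test does not
```

Shading to the right of a negative z covers almost the whole curve, so that one-tailed P-value is nowhere near small.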

 
