Chapter 2
When entering data into Data Desk (or into Excel for import into Data
Desk), enter only digits and decimal points. Data Desk does not
understand commas, dollar signs, or any other non-numeric symbols. Do
not include these in your data.
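If your raw numbers already contain symbols, one way to clean them before entering them is sketched below in Python. This is only an illustration, not part of Data Desk: the function name clean_number and the sample values are made up for this example.

```python
import re

def clean_number(raw: str) -> float:
    """Keep only digits and the decimal point (plus a leading minus sign)."""
    text = raw.strip()
    negative = text.startswith("-")
    # Remove commas, dollar signs, and every other non-numeric symbol.
    digits = re.sub(r"[^0-9.]", "", text)
    value = float(digits)
    return -value if negative else value

# Hypothetical raw entries, as they might appear in a spreadsheet.
raw_values = ["$1,250.50", "3,400", "72", "-15.5"]
cleaned = [clean_number(v) for v in raw_values]
print(cleaned)  # [1250.5, 3400.0, 72.0, -15.5]
```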
Chapter 4
To describe data, identify the pattern (shape) and the exceptions
(extreme observations that MAY be outliers).
· When trying to ascertain the shape of a distribution, look at
the histogram first. Click on the "play" button in the upper
right-hand corner of the graph box, select Plot Scale, and change the
bar width to find the unimodal graph that best displays the symmetry
or skewness.
· When trying to determine the shape of a distribution, it is
important that the histogram is unimodal. If you conclude that
the shape is bimodal (or has any number of peaks other than one),
you must be prepared to explain what the different peaks
represent. For example, data on the heights of American
adults would be bimodal. But WHY? The two peaks
represent males and females, and these groups can be kept separate.
But if you cannot explain what the different peaks represent, then
adjust the bar width until there is only one peak.
· When describing the shape of a distribution, you must supply a
picture (sketch or imported Data Desk graph) to support your
conclusion. If you refer to a histogram, your reader MUST see it.
· The process for determining outliers is not agreed on by all
statisticians; there are multiple ways to do this. We will all use the
same procedure that Data Desk uses. In Lesson 5, you will learn about
a graph called a boxplot. These plots display outliers in a way that
makes them easy to detect and identify. Until you get to Lesson 5 and
have a mathematical method to identify outliers, do not call any
extreme observation an outlier - it may or may not be an outlier by an
appropriate mathematical definition. Call extreme observations
"potential outliers".
You will want to change the default settings for the statistics that
are generated by Calc > Summaries. First, select a variable and then
select Calc > Summaries > Reports. Click on the HyperView button (it
looks like the "play" button on your tape or CD player) and click
on Select Summary Statistics. Remove the Midrange, and add the Lower
Percentile, the Upper Percentile, the Min, and the Max. Make sure
the InterQuartile Range is selected. Then you must click on SET
DEFAULTS (if you just select OK, it won't change the defaults).
To delete one (or more) observations from a data set, put the cursor
at the right-hand end of the number you want to delete and BACKSPACE
until it is gone. The Delete key will NOT work!!! If
you are deleting an observation so you can see the change that occurs
to specific statistics or to a graph, calculate the stats or plot the
graph first. After you have deleted the correct observation(s),
there will be a red exclamation point in the top left-hand corner of
the stat box and/or the graph box (the HyperView button). Click
on the red exclamation point and select the first option - Update the
Window. If you need to delete more data and examine further
changes, the red exclamation point will continue to appear after each
change in the data column (the variable's observations). When
you close the data file (or exit Data Desk), CONSIDER CAREFULLY IF YOU
WANT TO SAVE THESE CHANGES.
When calculating statistics for variables that have been designated
as Y and X, you have to choose the correct option from
Calc > Summaries:
- If you have one quantitative variable and one categorical variable,
select Reports by Groups
- If you have more than one quantitative variable, select Reports
Multiple
When graphing side-by-side dotplots or side-by-side
boxplots, you can select Y and X in two different situations.
- If you have one quantitative variable and one categorical variable,
ALWAYS select the quantitative variable first (Y). Hold the
shift key down to select the categorical variable as X.
- If you have more than one quantitative variable, whichever variable
you select first (as Y) will appear first. You may designate
multiple variables as X.
Chapter 5
· You will not be responsible for calculating the standard deviation
by hand. Use Data Desk for this. Access (or input) the data, go to
Calc > Summaries > Reports.
· To describe a distribution, discuss the shape, center, and
spread. It is necessary to make overall conjectures/conclusions about
what the graphs and statistics tell you about the scenario. Put
comments in the context of the situation. Do not list, "the mean is
15, median is 14.5, range is 4, etc." Read the problem and try
to make overall statements that reflect that you have thought about
the problem (as opposed to just copying down a bunch of numbers that
you don't know how to interpret).
· The concept of variation in data is very important. If all our data
were the same, then describing situations would be quite easy. Since
this is not the case, we accept (and appreciate) the variation in life
and immediately move towards trying to understand/describe it.
· Boxplots are an easy way to identify outliers.
· It only makes sense to graph side-by-side boxplots for information
about one variable. For example, if you want to compare teachers’
salaries in the U.S., you can use regions of the country and graph
boxplots for North, South, East, and West. On the other hand, you
could not graph height vs. weight using this technique and make fair
comparisons. If you want to look at the relationship between two
variables, use a scatterplot.
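To make the idea of a "mathematical method to identify outliers" concrete, here is a minimal Python sketch of the common 1.5 x IQR boxplot rule. It is an assumption that this matches Data Desk's procedure exactly (software packages also differ in how they compute quartiles), so treat it as an illustration only. The data values and function names are made up.

```python
import statistics

def iqr_fences(values):
    """Return the (lower, upper) fences under the 1.5 x IQR convention."""
    # The "inclusive" quartile method is one of several conventions;
    # Data Desk may compute quartiles slightly differently (assumption).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(values):
    """List the observations that fall outside the fences."""
    low, high = iqr_fences(values)
    return [v for v in values if v < low or v > high]

# Hypothetical data set with one extreme observation.
data = [12, 14, 14, 15, 15, 16, 17, 18, 40]
print(iqr_fences(data))     # (9.5, 21.5)
print(flag_outliers(data))  # [40]
```

Under this rule the value 40 is more than 1.5 IQRs above the upper quartile, so it would be marked as an outlier rather than just a "potential outlier".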
To get rid of the shaded areas in the middle of boxplots, click on
the "play" button at the top left-hand corner of the boxplot
graphing box and select Boxplot Options from the pull-down menu.
When the next box opens, change the selected radio button to
Do NOT display 95% C.I. and then check the Set Default box.
When graphing side-by-side dotplots or side-by-side boxplots, you can
select Y and X in two different situations.
- If you have one quantitative variable and one categorical variable,
ALWAYS select the quantitative variable first (Y). Hold the
shift key down to select the categorical variable as X.
- If you have more than one quantitative variable, whichever variable
you select first (as Y) will appear first. You may designate
multiple variables as X.
Chapter 6
To begin any question relating to normal distributions, always sketch
a normal curve first. On this sketch, place the mean (in the middle)
and the value(s) you are concerned with. Shade in the area of interest
(either above or below the single value of interest or the area in
between two values of interest). Use the normal tables or the Data
Desk tool ZArea to find the area under the curve corresponding to the
values of interest. If you are given the area and asked where this
value lies, then indicate on the curve the approximate size of this
area (this is the Inverse Normal procedure).
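If you want to check an answer from the tables, the same areas can be computed with Python's standard library. This is a sketch, not a substitute for ZArea or the sketch of the curve. The mean of 100 and standard deviation of 15 are made-up numbers for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Area below a single value (shade to the left of 85).
area_below = dist.cdf(85)

# Area above a single value (shade to the right of 130).
area_above = 1 - dist.cdf(130)

# Area between two values (shade between 85 and 115).
area_between = dist.cdf(115) - dist.cdf(85)

# Inverse Normal: the value with 90% of the area below it.
cutoff = dist.inv_cdf(0.90)

print(round(area_below, 4), round(area_above, 4),
      round(area_between, 4), round(cutoff, 2))
```

Here 85 is one standard deviation below the mean and 130 is two above, so the first three areas are roughly 0.16, 0.02, and 0.68.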
Chapter 7
· Use the scatterplot to determine if the relationship is
approximately linear. If the two variables do not have a linear
relationship, then we do not continue with diagnostic tools that
depend on linearity (correlation AND regression both are valid only
for linear relationships). There are statistical techniques to deal
with non-linear situations (transformations), but we will not pursue
those methods in this course.
· Scatterplots do not provide a measure of strength. Use correlation
to determine the strength of the (linear) relationship.
· It is important to graph a scatterplot before calculating the
correlation.
· In Data Desk, after graphing the scatterplot, you can easily access
the correlation by clicking on the HyperView button (the "play"
button) and selecting Correlation.
· As stated in the general instructions above, you will not be
responsible for calculating the correlation coefficient by hand (or
with a hand-held calculator). Statisticians use software so we will
also.
· Know how to find the correlation from the regression output that
Data Desk provides.
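You will not compute the correlation by hand, but it can help to see what the software is doing. The Python sketch below applies the usual formula for the correlation coefficient to a small made-up data set. The function name and the data are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 4))  # 0.7746
```

Remember that r only measures the strength of a linear relationship, so look at the scatterplot first.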
Chapter 8
· You will not be responsible for calculating the linear regression
equation by hand (or with a hand-held calculator). Statisticians use
software so we will also.
· Familiarize yourself with the regression output provided by Data
Desk.
· After you have determined the two variables have a linear
relationship, you can access the Regression information by clicking
on the HyperView button (the "play" button) and selecting Regression.
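For reference, the slope and intercept that appear in regression output come from the least-squares formulas. The Python sketch below shows the calculation on a small made-up data set. It is an illustration of the formulas, not a reproduction of Data Desk's output, and the function name and data are hypothetical.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired observations (X = explanatory, Y = response).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = least_squares(x, y)
print(f"predicted y = {intercept:.1f} + {slope:.1f} x")  # predicted y = 2.2 + 0.6 x
```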
Chapter 9:
Activity 9-1 (i) p. 181. You are asked to
find the correlation between Year Founded and Tuition for public
4-year colleges only.
Select Tuition as Y and Founded as X. Create a scatterplot AND then
open the correlation box.
From the HyperView menu (the "play" button), go to Selector >
Use HotSelector command, then on the same HyperView menu, Turn on
Automatic Update. Now, go to the correlation box and from that
HyperView menu, Turn on Automatic Update. Then, in the
correlation box, click on the words "No Selector" and choose
the Use HotSelector command from the pop-up menu.
In the horizontal box below that contains the variables, select Type
and plot a bar chart. Then, from the tool palette, select the knife
and point to the category that you've been asked about (public
4-year). The correlation box will update and give you the
statistic.
Chapter 10:
Activity 10-2 (m): Select the dependent and independent variables as
Y and X. Graph a scatterplot. From the HyperView button (the
"play" button), select Turn on Automatic Update. Now,
we're going to put Denver's airfare on sale. Make sure the point
corresponding to Denver is highlighted. Go to the price of $258
and, using the backspace key, change that price to $200. What did
the line do? Change it again to $150 and again to $100.
What did the line do every time? Now find Orlando's point on the
scatterplot. Change its airfare from $179 to $129 and then to
$79. Did the line move more from changing Denver or Orlando?
Do you think the location of the point has anything to do with which
one changed the most?
Activity 10-4 (a)-(d): Repeat the directions from Activity 9-1 (i),
but this time add a regression box and select Turn on Automatic
Update.
Chapter 12: You have to do a couple of simulations that
require instructions. You'll use a Java
applet for the first two and then Data Desk for the last one.
12-5 (a): Access the applet. FIRST - uncheck the box for Gender
- all we want to look at is Years of Service and Party (Democrat or
Republican). Set the sample size to 10, Number of Samples = 1,
then click on Draw Samples. Use the results from this one sample of
size 10 to fill in the mean number of years and the proportion
Democrat. Repeat this process until the table on p. 251 is filled
in.
12-6 (a) STILL USING THE
APPLET -- To understand the simulation, select sample size = 5 but
only take 1 sample. Look at your results. Click on Draw Samples
several times -- until you see what the sampling process is doing.
Then, unclick Animate, change the number of samples to 100, click on
reset, then Draw Samples. The computer will generate 100 samples, each
of size 5 and construct a histogram of the sample proportions.
12-6 (b) This step must be done using Data Desk. Follow carefully.
From the toolbar, choose Manip > Generate Random Numbers. In this
box, Generate 100 variables with 5 cases (this is taking 100 samples
each of size 5). Under Distribution, click in the radio button beside
Binomial Experiments. For #Bernoulli trials/experiments, type in
10000 with no comma (this is now your population of size 10,000) and
set the probability (success) to 0.45 -- 45% Democrat. Click on
OK. Now, at the bottom of your screen you have a long,
horizontal box. In the top right-hand corner of this box,
highlight the piece of paper with the corner folded down (this selects
all 100 samples). Go to the toolbar and select Calc >
Summaries > As Variables. In the new box, select the Mean as
Y and graph a histogram. NOTE: the scale here is not
percents. The values that created the histogram look like 4415
or 4603 or other values that would be generated from a population of
10,000. So, you have to look at this graph and realize that if
we could easily convert these integers to percents, the graph would
NOT change (because you would be graphing 44.15% and 46.03%, etc.).
The question is, did the approximate shape change by sampling from a
MUCH larger population?
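The sampling process in 12-6 (a) (100 samples, each of size 5, with a histogram of the sample proportions) can also be sketched in Python. This is not the applet or Data Desk. It assumes each person sampled is independently a Democrat with probability 0.45, and the function name and seed are made up for the example.

```python
import random

def sample_proportions(p, sample_size, num_samples, seed=0):
    """Draw num_samples samples and return each sample's proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(sample_size) if rng.random() < p)
        proportions.append(successes / sample_size)
    return proportions

# 100 samples of size 5 with a 45% chance of "Democrat" (assumed model).
props = sample_proportions(p=0.45, sample_size=5, num_samples=100)

# Text histogram of the sample proportions.
counts = {}
for value in props:
    counts[value] = counts.get(value, 0) + 1
for value in sorted(counts):
    print(f"{value:.1f} {'*' * counts[value]}")
```

Change sample_size and rerun to see how the spread of the sample proportions depends on the size of each sample.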
Chapter 14-16
We will not stress
probability problems. It is important that you understand a couple of
major ideas from these Lessons: the idea of chance, randomness,
long-term behavior and patterns, sampling variability, and sampling
distributions.
Chapter 19
· Confidence Intervals are the first inferential topic in this
course. Here, we want to estimate a population parameter (in this
course we'll be estimating a proportion, and later on a mean, in
Ch 23). Since the population information is unavailable, we take a
sample and estimate what we think is going on in the larger
population. It is important to remember that whenever we take a
sample, we have sampling variability. So while we hope our sample
accurately reflects the population behavior, it is safer to allow some
leeway (otherwise known as the margin of error).
· You will be responsible for calculating confidence intervals for
estimating one proportion.
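As a check on your hand calculations, the usual large-sample confidence interval for one proportion (sample proportion plus or minus z* times the standard error) can be sketched in Python. The counts below (45 successes out of 100) are made up, and the function name is hypothetical. Your textbook's conditions for using this interval still apply.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample z-interval for a single population proportion."""
    p_hat = successes / n
    # Critical value z* that leaves (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin_of_error, p_hat + margin_of_error

# Hypothetical sample: 45 successes in 100 observations.
low, high = proportion_interval(successes=45, n=100)
print(f"{low:.3f} to {high:.3f}")
```

For these numbers the margin of error is a little under 0.10, so the interval runs from about 0.35 to about 0.55.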
Chapter 20
· You will need to know how to substitute into the test statistic
for the z-test for individual proportions.
· You will be responsible for finding p-values from the tables in
your book. If you have the data, you can rely on Data Desk to provide
p-values. But if you only have sample statistics, you will need to
calculate the test statistic and then determine the p-value from the
z table or the Data Desk tool.
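When you only have sample statistics, the substitution into the one-proportion z test statistic and the lookup of the p-value can be sketched in Python as follows. The sample numbers (56 successes out of 100, null value 0.5) and the function name are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """z test statistic for H0: p = p0, using p0 in the standard error."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Hypothetical sample: 56 successes in 100, testing H0: p = 0.5.
z = one_proportion_z(successes=56, n=100, p0=0.5)

# p-values from the standard normal curve (same areas as the z table).
area_below = NormalDist().cdf(z)
p_upper = 1 - area_below                         # Ha: p > 0.5
p_two_sided = 2 * min(area_below, 1 - area_below)  # Ha: p != 0.5

print(round(z, 2), round(p_upper, 4), round(p_two_sided, 4))
```

Which p-value you report depends on the alternative hypothesis stated in the problem.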
Chapter 21-22
· You will be responsible for finding p-values from the tables in
your book. If you have the data, you can rely on Data Desk to provide
p-values. But if you only have sample statistics, you will need to
calculate the test statistic and then determine the p-value from the
z or t table.
· You will not be responsible for calculating p-values, simply for
obtaining them from the tables or computer.
· You will need to know how to substitute into the test statistic
for the z-test for one proportion.
· You will be responsible for inference procedures (confidence
intervals and hypothesis testing) for one proportion and two
proportions.
Chapter 23-25
· When trying to decide between using a t-interval or a z-interval,
some textbooks suggest two different rules. However, most
statisticians use a z-interval only when the population standard
deviation, sigma, is known. If sigma is unknown, they use a
t-interval. Please use the criterion that statisticians use (do not
rely on the sample size).
· You will need to know how to substitute into the test statistics
for the z-test for individual proportions and the t-test for
individual means.
· You will be responsible for finding p-values from the tables in
your book. If you have the data, you can rely on Data Desk to provide
p-values. But if you only have sample statistics, you will need to
calculate the test statistic and then determine the p-value from the
z or t table.
· You may need to determine the p-value when you have a t test
statistic (it doesn't matter if it's one-sample, two-sample, or
paired). We'll do this by an example. Look at the t-table in your
book. Suppose your degrees of freedom are 26 and the calculated test
statistic is -1.986. Find the row on the table that corresponds to
df = 26. Look across the row and find where 1.986 (notice it's not
negative now; the curves are symmetric so we can use the positive
value) falls: 1.706 < t < 2.056. Now look at the two values 1.706 and
2.056. From these two values read UP, go all the way to the top of the
table, and read the probabilities, 0.05 and 0.025. The p-value is in
between these two (but we ALWAYS write the smaller number on the left
so we could write 0.025 < p < 0.05). This isn't an exact value, but in
terms of interpreting the p-value, this will suffice.
· You will not need to know how to calculate the test statistics for
two-sample tests. We will use Data Desk to take care of these more
complicated formulas.
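The t-table lookup in the df = 26 example above can be sketched in Python. The row below holds the standard one-tail critical values for df = 26 as they appear in most t-tables. It is an assumption that your book's table uses these same column headings, so check them against your copy. The function name is made up.

```python
# One row of a t-table for df = 26: (one-tail probability, critical value).
# Assumed to match the column headings of the textbook's table.
T_ROW_DF26 = [(0.10, 1.315), (0.05, 1.706), (0.025, 2.056),
              (0.01, 2.479), (0.005, 2.779)]

def bracket_p_value(t_stat, row):
    """Return (low, high) such that low < one-tail p-value < high."""
    t = abs(t_stat)        # the curve is symmetric, so use the positive value
    low, high = 0.0, 0.5   # a one-tail area for |t| is never more than 0.5
    for tail_prob, critical in row:
        if t >= critical:
            high = tail_prob   # t is beyond this column, so p is smaller
        else:
            low = tail_prob    # t falls short of this column, so p is larger
            break
    return low, high

print(bracket_p_value(-1.986, T_ROW_DF26))  # (0.025, 0.05)
```

The result matches the hand lookup: 0.025 < p < 0.05. If the test statistic is larger than every value in the row, the function returns a lower bound of 0.0, which you would report as p < 0.005.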