Mary Hudachek-Buswell, formerly Mary H. Stephens

Math 1231 Department Page

To download any file, right-click on the hyperlink and select Save Target As...

Chapter 2
When entering data into Data Desk (or into Excel for import into Data Desk), enter only digits and decimal points. Data Desk does not understand commas, dollar signs, or any other non-numeric symbols, so do not include them in your data.
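If your raw values already contain symbols, you can strip them before pasting the column into Data Desk. The following is a minimal sketch in Python, not part of the course software; the function name `clean_numeric` and the sample entries are hypothetical.

```python
def clean_numeric(raw: str) -> float:
    """Keep only digits, the decimal point, and a minus sign, then convert."""
    allowed = set("0123456789.-")
    kept = "".join(ch for ch in raw if ch in allowed)
    return float(kept)


# Hypothetical entries as they might appear in a spreadsheet.
raw_values = ["$1,250.50", "3,000", "42", "-7.5"]
cleaned = [clean_numeric(value) for value in raw_values]
print(cleaned)
```

The cleaned list contains plain numbers only, which is the form the note above asks for.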

Chapter 4
To describe data, identify the pattern (shape) and the exceptions (extreme observations that MAY be outliers).
·  When trying to ascertain the shape of a distribution, look at the histogram first. Click on the "play" button in the upper right-hand corner of the graph box, select Plot Scale, and change the bar width to find the unimodal graph that best displays the symmetry or skewness.
·  When trying to determine the shape of a distribution, it is important that the histogram be unimodal. If you conclude that the shape is bimodal (or has any number of peaks other than one), you must be prepared to explain what the different peaks represent. For example, data on the heights of American adults would be bimodal. Why? The two peaks represent males and females, and these groups can be kept separate. But if you cannot explain what the different peaks represent, then play with the bar width until there is only one peak.
·  When describing the shape of a distribution, you must supply a picture (sketch or imported Data Desk graph) to support your conclusion. If you refer to a histogram, your reader MUST see it.
·  Statisticians do not all agree on the process for determining outliers; there are multiple ways to do this. We will all use the same procedure that Data Desk uses. In Lesson 5, you will learn about a graph called a boxplot. These plots display outliers in a way that makes them easy to detect and identify. Until you get to Lesson 5 and have a mathematical method for identifying outliers, do not call any extreme observation an outlier - it may or may not be an outlier by an appropriate mathematical definition. Call extreme observations "potential outliers".
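To see why the bar width matters, the sketch below counts the same data with two different bar widths. It is written in Python rather than Data Desk, and the data values and the function name `histogram_counts` are made up for illustration.

```python
from collections import Counter


def histogram_counts(data, bar_width, start=0):
    """Return {left edge of bar: number of observations in that bar}."""
    counts = Counter()
    for x in data:
        bar_index = (x - start) // bar_width
        counts[start + bar_index * bar_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical quiz scores (not from the textbook).
scores = [0, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 10, 10]

for width in (1, 2):
    print(f"bar width = {width}")
    for edge, count in histogram_counts(scores, width).items():
        print(f"  {edge:>2} | {'*' * count}")
```

With a bar width of 1 the counts bounce up and down and suggest several small peaks; with a bar width of 2 the same data show a single peak.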

 

You will want to change the default settings for the statistics that are generated by the Calc > Summaries output. First, select a variable and then select Calc > Summaries > Reports. Click on the HyperView button (it looks like the "play" button on a tape or CD player) and click on Select Summary Statistics. Remove the midrange and add the Lower Percentile, the Upper Percentile, the Min, and the Max. Make sure the InterQuartile Range is selected. Then you must click on SET DEFAULTS (if you just select OK, it won't change the defaults).

To delete one (or more) observations from a data set, put the cursor at the right-hand end of the number you want to delete and BACKSPACE until it is gone.  The Delete key will NOT work!!!   If you are deleting an observation so you can see the change that occurs to specific statistics or to a graph, calculate the stats or plot the graph first.  After you have deleted the correct observation(s),  there will be a red exclamation point in the top left-hand corner of the stat box and/or the graph box (the hyperview button).  Click on the red exclamation point and select the first option - Update the Window.  If you need to delete more data and examine further changes, the red exclamation point will continue to appear after each change in the data column (the variable's observations).  When you close the data file (or exit Data Desk), CONSIDER CAREFULLY IF YOU WANT TO SAVE THESE CHANGES.

 

When calculating statistics for variables that have been designated as Y and X, you have to choose the correct option from Calc > Summaries:

  • If you have one quantitative variable and one categorical variable, select Reports by Groups

  • If you have more than one quantitative variable, select Reports Multiple
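For the first case (one quantitative variable, one categorical variable), the idea behind a by-groups report is simply to split the quantitative values by category and summarize each group separately. Here is a rough Python sketch of that idea; the records are hypothetical and the output layout is not meant to match Data Desk's report.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (category, value) pairs: region and salary in thousands of dollars.
records = [
    ("North", 52.0),
    ("South", 47.5),
    ("North", 55.0),
    ("South", 49.5),
    ("North", 58.0),
]

# Split the quantitative values by the categorical variable.
groups = defaultdict(list)
for region, salary in records:
    groups[region].append(salary)

# Summarize each group on its own.
summary = {
    region: {"n": len(values), "mean": mean(values), "median": median(values)}
    for region, values in groups.items()
}

for region, stats in summary.items():
    print(region, stats)
```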

When graphing side-by-side dotplots or side-by-side boxplots, you can select Y and X in two different situations.

  • If you have one quantitative variable and one categorical variable, ALWAYS select the quantitative variable first (Y).  Hold the shift key down to select the categorical variable as X.

  • If you have more than one quantitative variable, whichever variable you select first (as Y) will appear first.  You may designate multiple variables as X.

Chapter 5
·  You will not be responsible for calculating the standard deviation by hand. Use Data Desk for this. Access (or input) the data, then go to Calc > Summaries > Reports.
·  To describe a distribution, discuss the shape, center, and spread. It is necessary to make overall conjectures/conclusions about what the graphs and statistics tell you about the scenario. Put comments in the context of the situation. Do not list, "the mean is 15, median is 14.5, range is 4, etc." Read the problem and try to make overall statements that reflect that you have thought about the problem (as opposed to just copying down a bunch of numbers that you don't know how to interpret).
·  The concept of variation in data is very important. If all our data were the same, then describing situations would be quite easy. Since this is not the case, we accept (and appreciate) the variation in life and immediately move towards trying to understand/describe it.
·  Boxplots are an easy way to identify outliers.
·  It only makes sense to graph side-by-side boxplots for information about one variable. For example, if you want to compare teachers' salaries in the U.S., you can use regions of the country and graph boxplots for North, South, East, and West. On the other hand, you would not graph height vs. weight using this technique and be able to make fair comparisons. If you want to look at the relationship between two variables, use a scatterplot.
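As a rough illustration of the statistics behind a boxplot, the Python sketch below computes the center, spread, and quartiles for a small made-up data set and flags values beyond the common 1.5 × IQR fences. That fence rule is an assumption here (it is the usual boxplot convention); these notes do not spell out Data Desk's exact rule, and software packages compute quartiles in slightly different ways, so Data Desk's numbers may differ a little.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical data with one extreme observation.
data = [10, 12, 13, 14, 14, 15, 15, 16, 17, 30]

center_mean = mean(data)
center_median = median(data)
spread_sd = stdev(data)

# Quartiles by one common interpolation method (assumption: not necessarily Data Desk's).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed boxplot convention: fences at 1.5 IQRs beyond the quartiles.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print("mean:", center_mean, "median:", center_median, "sd:", round(spread_sd, 2))
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", low_fence, high_fence, "outliers:", outliers)
```

Notice that the single extreme value pulls the mean above the median, which is the kind of observation to put in context rather than just listing the numbers.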

To get rid of the shaded areas in the middle of boxplots, click on the "play" button at the top left-hand corner of the boxplot graphing box and select Boxplot Options from the pull-down menu.  When the next box opens, change the selected radio button to Do NOT display 95% C.I. and then check the Set Default box.


Chapter 6
To begin any question relating to normal distributions, always sketch a normal curve first. On this sketch, mark the mean (in the middle) and the value(s) you are concerned with. Shade the area of interest (above or below a single value, or between two values). Use the normal tables or the Data Desk tool ZArea to find the area under the curve corresponding to the values of interest. If you are given the area and asked to find the value that bounds it, first indicate the approximate size of this area on the curve (this is the Inverse Normal procedure).
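If you want to check a table lookup, both directions (value to area, and area to value) can be sketched with Python's standard library. The mean of 100 and standard deviation of 15 below are hypothetical, and this is only a cross-check, not a replacement for the sketch of the curve.

```python
from statistics import NormalDist

# Hypothetical normal model: mean 100, standard deviation 15.
model = NormalDist(mu=100, sigma=15)

# Area below a single value (shade to the left of 115).
area_below = model.cdf(115)

# Area above a single value (shade to the right of 115).
area_above = 1 - model.cdf(115)

# Area between two values (shade between 85 and 115).
area_between = model.cdf(115) - model.cdf(85)

# Inverse Normal: the value with 90% of the area below it.
cutoff_90 = model.inv_cdf(0.90)

print(round(area_below, 4), round(area_above, 4), round(area_between, 4))
print(round(cutoff_90, 2))
```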

Chapter
·  Use the scatterplot to determine if the relationship is approximately linear. If the two variables do not have a linear relationship, then we do not continue with diagnostic tools that depend on linearity (correlation AND regression are both valid only for linear relationships). There are statistical techniques for dealing with non-linear situations (transformations), but we will not pursue those methods in this course.
·  Scatterplots do not provide a measure of strength. Use correlation to determine the strength of the (linear) relationship.
·  It is important to graph a scatterplot before calculating the correlation.
·  In Data Desk, after graphing the scatterplot, you can easily access the correlation by clicking on the HyperView button (the "play" button) and selecting Correlation.
·  As stated in the general instructions above, you will not be responsible for calculating the correlation coefficient by hand (or with a hand-held calculator). Statisticians use software, so we will also.
·  Know how to find the correlation from the regression output that Data Desk provides.
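You will not compute the correlation by hand, but it can help to see what the software is doing. The following Python sketch computes the usual Pearson correlation coefficient for a small hypothetical data set; the function name `correlation` and the data are illustrative only.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = correlation(x_values, y_values)
print(round(r, 4))
```

The result is always between -1 and 1, and it only measures the strength of a linear relationship, which is why the scatterplot comes first.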

Chapter 8
·  You will not be responsible for calculating the linear regression equation by hand (or with a hand-held calculator). Statisticians use software so we will also.
·  Familiarize yourself with the regression output provided by Data Desk.
·  After you have determined the two variables have a linear relationship, you can access the Regression information by clicking on the hyperview button (the "play" button) and selecting Regression.
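For reference, here is a minimal Python sketch of what a least-squares regression computes: the slope, the intercept, and R squared. The data are hypothetical. The last step assumes the regression output reports R squared; in that case the correlation is the square root of R squared, with the same sign as the slope.

```python
from math import copysign, sqrt


def least_squares(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical paired observations.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

slope, intercept, r_squared = least_squares(x_values, y_values)

# Recover the correlation from R squared, taking the sign from the slope.
r_from_output = copysign(sqrt(r_squared), slope)

print(f"predicted y = {intercept:.2f} + {slope:.2f} x")
print(f"R squared = {r_squared:.1%}, r = {r_from_output:.4f}")
```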

Chapter 9:
Activity 9-1 (i) p. 181.  You are asked to find the correlation between Year Founded and Tuition for public 4-year colleges only.  

Select Tuition as Y and Founded as X.  Create a scatterplot AND then open the correlation box.

From the HyperView menu (the "play" button), go to Selector > Use HotSelector command, then on the same HyperView menu, Turn on Automatic Update.  Now, go to the correlation box and from that Hyperview menu, Turn on Automatic Update.  Then, in the correlation box, click on the words "No Selector" and choose the Use HotSelector command from the pop-up menu.

In the horizontal box below that contains the variables, select Type and plot a bar chart.  Then, from the tool palette, select the knife and point to the category that you've been asked about (public 4-year).  The correlation box will update and give you the statistic.

 

Chapter 10:  

Activity 10-2 (m):  Select the dependent and independent variables as Y and X.  Graph a scatterplot.  From the HyperView button (the "play" button), select Turn on Automatic Update.  Now, we're going to put Denver's airfare on sale.  Make sure the point corresponding to Denver is highlighted.  Go to the price of $258 and, using the backspace key, change that price to $200.  What did the line do?  Change it again to $150 and again to $100.  What did the line do every time?  Now find Orlando's point on the scatterplot.  Change its airfare from $179 to $129 and then to $79.  Did the line move more from changing Denver or Orlando?  Do you think the location of the point has anything to do with which one changed the most?
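The same experiment can be imitated outside Data Desk. The Python sketch below uses made-up distances and fares (not the textbook's airfare data, and not the real Denver or Orlando values) and lowers one fare at a time by the same amount to see which change moves the slope more.

```python
def slope_of_fit(xs, ys):
    """Slope of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical distances (miles) and airfares (dollars).
distances = [100, 300, 500, 700, 1500]
fares = [90, 130, 170, 210, 370]

base_slope = slope_of_fit(distances, fares)

# Put the farthest destination on sale: lower its fare by $100.
far_sale = fares[:-1] + [fares[-1] - 100]
slope_far = slope_of_fit(distances, far_sale)

# Put a destination near the middle of the distances on sale by the same $100.
mid_sale = fares[:3] + [fares[3] - 100] + fares[4:]
slope_mid = slope_of_fit(distances, mid_sale)

print("original slope:       ", round(base_slope, 4))
print("far point on sale:    ", round(slope_far, 4))
print("middle point on sale: ", round(slope_mid, 4))
```

In this made-up example, the point whose x-value is far from the mean of the x-values changes the slope much more than the point near the middle, even though both fares dropped by the same amount.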

 

Activity 10-4 (a)-(d):  Repeat the directions from Activity 9-1 (i),  but this time add a regression box and select Turn on Automatic Update.

 

Chapter 12:  You have to do a couple of simulations that require instructions.  You'll use a Java applet for the first two and then Data Desk for the last one.

 

12-5 (a):  Access the applet.  FIRST - uncheck the box for Gender - all we want to look at is Years of Service and Party (Democrat or Republican).  Set the sample size to 10, Number of Samples = 1, then click on Draw Samples. Use the results from this one sample of size 10 to fill in the mean number of years and the proportion Democrat.  Repeat this process until the table on p.251 is filled in.

12-6 (a) STILL USING THE APPLET -- To understand the simulation, select sample size = 5 but only take 1 sample. Look at your results. Click on Draw Samples several times -- until you see what the sampling process is doing. Then, unclick Animate, change the number of samples to 100, click on reset, then Draw Samples. The computer will generate 100 samples, each of size 5 and construct a histogram of the sample proportions.

12-6 (b) This step must be done using Data Desk.  Follow carefully.  From the toolbar, choose Manip > Generate Random Numbers.  In this box, Generate 100 variables with 5 cases (this is taking 100 samples, each of size 5).  Under Distribution, click the radio button beside Binomial Experiments.  For #Bernoulli trials/experiments, type in 10,000 (this is now your population of size 10,000) and set the probability of success to 0.45 -- 45% Democrat.   Click on OK.  Now, at the bottom of your screen you have a long, horizontal box.  In the top right-hand corner of this box, highlight the piece of paper with the corner folded down (this selects all 100 samples).  Go to the toolbar and select Calc > Summaries > As Variables.  In the new box, select the Mean as Y and graph a histogram.  NOTE:  the scale here is not percents.  The values that created the histogram look like 4415 or 4603 or other values that would be generated from a population of 10,000.  So, you have to look at this graph and realize that if we could easily convert these integers to percents, the graph would NOT change (because you would be graphing 44.15% and 46.03%, etc.).  The question is, did the approximate shape change by sampling from a MUCH larger population?
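If the applet is unavailable, the idea of the 12-6 (a) simulation (100 samples, each of size 5, with 45% Democrat) can be sketched in Python. This is only an imitation of the sampling process under the assumption that each draw is independent with probability 0.45; the seed is arbitrary, and the printout is a crude text histogram rather than the applet's display.

```python
import random
from collections import Counter

random.seed(1)  # arbitrary seed so the run can be repeated

P_DEMOCRAT = 0.45   # proportion used in the activity
SAMPLE_SIZE = 5
NUM_SAMPLES = 100


def sample_proportion(n, p):
    """Draw one sample of size n and return the proportion of successes."""
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes / n


proportions = [sample_proportion(SAMPLE_SIZE, P_DEMOCRAT) for _ in range(NUM_SAMPLES)]

# Text histogram of the 100 sample proportions.
tally = Counter(proportions)
for value in sorted(tally):
    print(f"{value:.1f} | {'*' * tally[value]}")

print("mean of the sample proportions:", sum(proportions) / NUM_SAMPLES)
```

Each sample proportion can only be 0, 0.2, 0.4, 0.6, 0.8, or 1 because the sample size is 5; the histogram shows how these values vary from sample to sample.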

Chapter 14-16
We will not stress probability problems. It is important that you understand a couple of major ideas from these Lessons: the idea of chance, randomness, long-term behavior and patterns, sampling variability, and sampling distributions.

Chapter 19
·  Confidence intervals are the first inferential topic in this course. Here, we want to estimate a population parameter (in this course we'll be estimating a proportion and, later on in Ch 23, a mean). Since the population information is unavailable, we take a sample and estimate what we think is going on in the larger population. It is important to remember that whenever we take a sample, we have sampling variability. So while we hope our sample accurately reflects the population behavior, it is safer to allow some leeway (otherwise known as the margin of error).
·  You will be responsible for calculating confidence intervals for estimating one proportion.
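As a worked illustration, the Python sketch below computes a confidence interval for one proportion using the usual formula: sample proportion plus or minus z* times the square root of p-hat(1 - p-hat)/n. It assumes your text uses this standard interval; the counts (45 successes out of 100) and the 95% level are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample: 45 successes in 100 observations.
successes = 45
n = 100
confidence = 0.95

p_hat = successes / n

# Critical value z* for the chosen confidence level (about 1.96 for 95%).
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z_star * standard_error

lower = p_hat - margin_of_error
upper = p_hat + margin_of_error

print(f"p-hat = {p_hat:.3f}, margin of error = {margin_of_error:.4f}")
print(f"{confidence:.0%} confidence interval: ({lower:.4f}, {upper:.4f})")
```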

Chapter 20
·  You will need to know how to substitute into the test statistic for the z-test for one proportion.
·  You will be responsible for finding p-values from the tables in your book. If you have the data, you can rely on Data Desk to provide p-values. But if you only have sample statistics, you will need to calculate the test statistic and then determine the p-value from the z table or the Data Desk tool.
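When you only have sample statistics, the substitution looks like the Python sketch below. It assumes the standard one-proportion z statistic, (p-hat - p0) divided by the square root of p0(1 - p0)/n, and a two-sided alternative; the numbers are hypothetical, and the normal-curve calculation stands in for the z table.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical test: is the true proportion different from 0.50?
p0 = 0.50        # hypothesized proportion
successes = 45   # observed successes
n = 100          # sample size

p_hat = successes / n

# Test statistic: how many standard deviations p-hat lies from p0.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-sided p-value: area in both tails beyond |z|.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

For a one-sided alternative, you would use the area in just one tail instead of doubling it.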

Chapter 21-22
·  You will be responsible for finding p-values from the tables in your book. If you have the data, you can rely on Data Desk to provide p-values. But if you only have sample statistics, you will need to calculate the test statistic and then determine the p-value from the z or t table.
·  You will not be responsible for calculating p-values, simply for obtaining them from the tables or computer.
·  You will need to know how to substitute into the test statistic for the z-test for one proportion.
·  You will be responsible for inference procedures (confidence intervals and hypothesis testing) for one proportion and two proportions.

Chapter 23-25
·  When trying to decide between using a t-interval or a z-interval, some textbooks suggest two different rules. However, most statisticians use a z-interval only when the population standard deviation, sigma, is known. If sigma is unknown, they use a t-interval. Please use the criteria that statisticians use (do not rely on the sample size).
·  You will need to know how to substitute into the test statistics for the z-test for one proportion and the t-test for one mean.
·  You will be responsible for finding p-values from the tables in your book. If you have the data, you can rely on Data Desk to provide p-values. But if you only have sample statistics, you will need to calculate the test statistic and then determine the p-value from the z or t table.
·  You may need to determine the p-value when you have a t test statistic (it doesn't matter if it's one-sample, two-sample, or paired). We'll do this by an example. Look at the t-table in your book. Suppose your degrees of freedom are 26 and the calculated test statistic is -1.986. Find the row on the table that corresponds to df = 26. Look across the row and find where 1.986 falls (notice it's not negative now; the curves are symmetric, so we can use the positive value): 1.706 < t < 2.056. Now look at the two values 1.706 and 2.056. From these two values, read UP all the way to the top of the table and read the probabilities, 0.05 and 0.025. The p-value is in between these two (but we ALWAYS write the smaller number on the left, so we write 0.025 < p < 0.05). This isn't an exact value, but in terms of interpreting the p-value, this will suffice.
·  You will not need to know how to calculate the test statistics for two-sample tests. We will use Data Desk to take care of these more complicated formulas.
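The table lookup in the df = 26 example can be written out as a short Python sketch. The row below holds the standard one-tail critical values for 26 degrees of freedom; the function name `bracket_p_value` is made up for illustration, and the sketch assumes the probabilities at the top of your table are one-tail areas (for a two-sided test, both bounds would be doubled).

```python
# One-tail probabilities and critical values for df = 26 (standard t-table row).
T_ROW_DF26 = [
    (0.10, 1.315),
    (0.05, 1.706),
    (0.025, 2.056),
    (0.01, 2.479),
    (0.005, 2.779),
]


def bracket_p_value(t_stat, row):
    """Return (low, high) so that low < p < high for a one-tail p-value."""
    t = abs(t_stat)        # the curve is symmetric, so use the positive value
    low, high = 0.0, 0.5   # bounds if t falls off either end of the row
    for tail_prob, critical in row:
        if critical <= t:
            high = tail_prob   # t is beyond this critical value
        else:
            low = tail_prob    # first critical value larger than t
            break
    return low, high


low, high = bracket_p_value(-1.986, T_ROW_DF26)
print(f"{low} < p < {high}")
```

This reproduces the bracket from the example: the test statistic falls between 1.706 and 2.056, so the p-value falls between 0.025 and 0.05.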