(B) the Middle 50% of the Training Times of Which Person Had the Least Spread? Explain How You Know.

Statistics has become the universal linguistic communication of the sciences, and data analysis tin pb to powerful results. As scientists, researchers, and managers working in the natural resource sector, we all rely on statistical analysis to help u.s.a. reply the questions that arise in the populations we manage. For case:

Has there been a significant change in the hateful sawtimber volume in the red pine stands?
Has there been an increase in the number of invasive species found in the Great Lakes?
What proportion of white tail deer in New Hampshire have weights below the limit considered healthy?
Did fertilizer A, B, or C have an event on the corn yield?

These are typical questions that crave statistical analysis for the answers. In order to answer these questions, a good random sample must exist collected from the population of interests. We then use descriptive statistics to organize and summarize our sample data. The next step is inferential statistics, which allows us to utilize our sample statistics and extend the results to the population, while measuring the reliability of the outcome. But earlier we begin exploring different types of statistical methods, a brief review of descriptive statistics is needed.

Statistics is the science of collecting, organizing, summarizing, analyzing, and interpreting data.

Good statistics come up from good samples, and are used to describe conclusions or respond questions almost a population. We utilise sample statistics to estimate population parameters (the truth). Then let's brainstorm there…

Figure 1. Using sample statistics to estimate population parameters.

Department 1

Descriptive Statistics

A population is the grouping to exist studied, and population data is a collection of all elements in the population. For example:

All the fish in Long Lake.
All the lakes in the Adirondack Park.
All the grizzly bears in Yellowstone National Park.

A sample is a subset of data drawn from the population of interest. For example:

100 fish randomly sampled from Long Lake.
25 lakes randomly selected from the Adirondack Park.
60 grizzly bears with a dwelling range in Yellowstone National Park.

Populations are characterized by descriptive measures called parameters. Inferences about parameters are based on sample statistics. For case, the population mean (µ) is estimated by the sample mean ( x̄ ). The population variance (σ2) is estimated by the sample variance (s2).

Variables are the characteristics we are interested in. For example:

The length of fish in Long Lake.
The pH of lakes in the Adirondack Park.
The weight of grizzly bears in Yellowstone National Park.

Variables are divided into two major groups: qualitative and quantitative. Qualitative variables accept values that are attributes or categories. Mathematical operations cannot be applied to qualitative variables. Examples of qualitative variables are gender, race, and petal color. Quantitative variables have values that are typically numeric, such every bit measurements. Mathematical operations tin can be applied to these information. Examples of quantitative variables are age, height, and length.

Quantitative variables tin can be broken down further into two more categories: discrete and continuous variables. Discrete variables have a finite or countable number of possible values. Call back of discrete variables as "hens." Hens tin can lay 1 egg, or 2 eggs, or 13 eggs… There are a limited, definable number of values that the variable could take on.

Continuous variables take an space number of possible values. Recall of continuous variables as "cows." Cows tin give 4.6713245 gallons of milk, or 7.0918754 gallons of milk, or 13.272698 gallons of milk … In that location are an nearly space number of values that a continuous variable could take on.

Examples

Is the variable qualitative or quantitative?

Species	Weight	Diameter	Zip Code
(qualitative	quantitative,	quantitative,	qualitative)

Descriptive Measures

Descriptive measures of populations are called parameters and are typically written using Greek messages. The population mean is μ (mu). The population variance is σ ii (sigma squared) and population standard departure is σ (sigma).

Descriptive measures of samples are called statistics and are typically written using Roman messages. The sample mean is (x-bar). The sample variance is south 2 and the sample standard departure is due south. Sample statistics are used to estimate unknown population parameters.

In this department, nosotros volition examine descriptive statistics in terms of measures of centre and measures of dispersion. These descriptive statistics help us to place the middle and spread of the data.

Measures of Middle

Mean

The arithmetic mean of a variable, often called the average, is computed by calculation upwards all the values and dividing past the total number of values.

The population hateful is represented by the Greek letter μ (mu). The sample mean is represented by x̄ (x-bar). The sample mean is usually the all-time, unbiased estimate of the population mean. However, the hateful is influenced by farthermost values (outliers) and may non be the best measure of eye with strongly skewed information. The post-obit equations compute the population mean and sample mean.

where 10 i is an chemical element in the data set, North is the number of elements in the population, and n is the number of elements in the sample data gear up.

Case 2

Notice the hateful for the following sample data prepare: vi.iv, 5.ii, vii.9, 3.4

Median

The median of a variable is the center value of the data set when the data are sorted in order from to the lowest degree to greatest. It splits the data into two equal halves with 50% of the data below the median and 50% above the median. The median is resistant to the influence of outliers, and may be a amend measure out of middle with strongly skewed data.

The calculation of the median depends on the number of observations in the data set.

To calculate the median with an odd number of values (northward is odd), first sort the data from smallest to largest.

Example three

23, 27, 29, 31, 35, 39, 40, 42, 44, 47, 51

The median is 39. Information technology is the middle value that separates the lower fifty% of the information from the upper l% of the data.

To calculate the median with an fifty-fifty number of values (n is even), first sort the information from smallest to largest and take the boilerplate of the two middle values.

Example iv

23, 27, 29, 31, 35, 39, 40, 42, 44, 47

Mode

The style is the most frequently occurring value and is commonly used with qualitative information as the values are categorical. Chiselled data cannot be added, subtracted, multiplied or divided, so the mean and median cannot be computed. The mode is less commonly used with quantitative data as a measure of centre. Sometimes each value occurs only once and the mode will not be meaningful.

Understanding the relationship betwixt the mean and median is important. It gives us insight into the distribution of the variable. For example, if the distribution is skewed right (positively skewed), the mean will increase to business relationship for the few larger observations that pull the distribution to the correct. The median will be less affected by these extreme large values, so in this state of affairs, the mean volition be larger than the median. In a symmetric distribution, the mean, median, and mode will all be similar in value. If the distribution is skewed left (negatively skewed), the mean will decrease to account for the few smaller observations that pull the distribution to the left. Once more, the median volition be less affected by these extreme small-scale observations, and in this situation, the hateful will be less than the median.

Figure 2. Analogy of skewed and symmetric distributions.

Measures of Dispersion

Measures of center look at the average or eye values of a data set. Measures of dispersion look at the spread or variation of the data. Variation refers to the amount that the values vary among themselves. Values in a information set that are relatively close to each other have lower measures of variation. Values that are spread farther autonomously have higher measures of variation.

Examine the ii histograms below. Both groups have the aforementioned mean weight, but the values of Grouping A are more spread out compared to the values in Group B. Both groups have an average weight of 267 lb. but the weights of Group A are more variable.

Figure iii. Histograms of Group A and Grouping B.

This section volition examine v measures of dispersion: range, variance, standard difference, standard error, and coefficient of variation.

Range

The range of a variable is the largest value minus the smallest value. It is the simplest measure and uses only these two values in a quantitative information set.

Case v

Find the range for the given information set.

12, 29, 32, 34, 38, 49, 57

Range = 57 – 12 = 45

Variance

The variance uses the departure between each value and its arithmetic hateful. The differences are squared to bargain with positive and negative differences. The sample variance (sii) is an unbiased estimator of the population variance (σ 2), with n-1 degrees of freedom.

Degrees of freedom: In general, the degrees of liberty for an guess is equal to the number of values minus the number of parameters estimated en route to the estimate in question.

The sample variance is unbiased due to the difference in the denominator. If we used "n" in the denominator instead of "n – one", we would consistently underestimate the true population variance. To right this bias, the denominator is modified to "n – 1".

Population variance Sample variance

σ 2 = s2 =

Example 6

Compute the variance of the sample information: 3, 5, 7. The sample mean is 5.

Standard Deviation

The standard divergence is the square root of the variance (both population and sample). While the sample variance is the positive, unbiased estimator for the population variance, the units for the variance are squared. The standard difference is a common method for numerically describing the distribution of a variable. The population standard deviation is σ (sigma) and sample standard difference is southward.

Population standard difference Sample standard deviation

Example 7

Compute the standard deviation of the sample data: three, 5, 7 with a sample mean of 5.

Standard Error of the Means

Usually, we employ the sample mean x̄ to gauge the population mean μ . For case, if we desire to estimate the heights of eighty-year-sometime cherry-red trees, we can proceed every bit follows:

Randomly select 100 trees
Compute the sample mean of the 100 heights
Use that as our estimate

We want to utilise this sample hateful to estimate the true simply unknown population mean. Simply our sample of 100 trees is only 1 of many possible samples (of the aforementioned size) that could have been randomly selected. Imagine if we take a series of different random samples from the aforementioned population and all the same size:

Sample 1—nosotros compute sample hateful x̄
Sample two—we compute sample mean x̄
Sample 3—we compute sample mean x̄
Etc.

Each time nosotros sample, we may get a dissimilar result as we are using a different subset of information to compute the sample mean. This shows us that the sample mean is a random variable!

The sample mean ( x̄ ) is a random variable with its own probability distribution called the sampling distribution of the sample mean. The distribution of the sample mean volition have a hateful equal to µ and a standard difference equal to .

The standard fault is the standard deviation of all possible sample means.

In reality, we would only take one sample, but nosotros need to understand and quantify the sample to sample variability that occurs in the sampling process.

The standard error is the standard deviation of the sample means and tin can exist expressed in different means.

Notation: due south ii is the sample variance and s is the sample standard divergence

Instance viii

Depict the distribution of the sample hateful.

A population of fish has weights that are normally distributed with µ = eight lb. and s = 2.6 lb. If you take a sample of size north=half-dozen, the sample mean will have a normal distribution with a hateful of eight and a standard departure (standard error) of = 1.061 lb.

If you increase the sample size to 10, the sample mean will be usually distributed with a mean of 8 lb. and a standard difference (standard error) of = 0.822 lb.

Notice how the standard error decreases as the sample size increases.

The Central Limit Theorem (CLT) states that the sampling distribution of the sample means volition approach a normal distribution equally the sample size increases. If we practise non have a normal distribution, or know nothing virtually our distribution of our random variable, the CLT tells us that the distribution of the x̄ 'southward will get normal as n increases. How large does due north have to be? A general dominion of pollex tells us that n ≥ thirty.

The Cardinal Limit Theorem tells u.s.a. that regardless of the shape of our population, the sampling distribution of the sample mean volition exist normal equally the sample size increases.

Coefficient of Variation

To compare standard deviations between dissimilar populations or samples is difficult because the standard deviation depends on units of measure. The coefficient of variation expresses the standard deviation as a pct of the sample or population mean. It is a unitless measure.

Population information Sample data

CV = CV =

Example 9

Fisheries biologists were studying the length and weight of Pacific salmon. They took a random sample and computed the mean and standard deviation for length and weight (given below). While the standard deviations are like, the differences in units between lengths and weights get in hard to compare the variability. Computing the coefficient of variation for each variable allows the biologists to determine which variable has the greater standard deviation.

	Sample mean	Sample standard divergence
Length	63 cm	nineteen.97 cm
Weight	37.half dozen kg	nineteen.39 kg

There is greater variability in Pacific salmon weight compared to length.

Variability

Variability is described in many different ways. Standard deviation measures point to indicate variability within a sample, i.eastward., variation among individual sampling units. Coefficient of variation besides measures point to point variability but on a relative basis (relative to the hateful), and is not influenced by measurement units. Standard error measures the sample to sample variability, i.due east. variation among repeated samples in the sampling process. Typically, we only take one sample and standard error allows us to quantify the uncertainty in our sampling procedure.

Basic Statistics Example using Excel and Minitab Software

Consider the following tally from 11 sample plots on Heiburg Wood, where Xi is the number of downed logs per acre. Compute bones statistics for the sample plots.

Table i. Sample information on number of downed logs per acre from Heiburg Woods.

(ane) Sample hateful:

(ii) Median = 35

(three) Variance:

(iv) Standard deviation:

(5) Range: 55 – 5 = 50

(6) Coefficient of variation:

(7) Standard error of the mean:

Software Solutions

Minitab

Open Minitab and enter data in the spreadsheet. Select STAT>Descriptive stats and check all statistics required.

008_1.tif 008_2.tif

Descriptive Statistics: Data

Variable	N	North*	Mean	SE Mean	StDev	Variance	CoefVar	Minimum	Q1
Data	eleven	0	32.27	4.83	16.03	256.82	49.66	five.00	20.00

Variable	Median	Q3	Maximum	IQR
Data	35.00	45.00	55.00	25.00

Excel

Open up upwardly Excel and enter the data in the first column of the spreadsheet. Select Data>Data Assay>Descriptive Statistics. For the Input Range, select data in cavalcade A. Cheque "Labels in Kickoff Row" and "Summary Statistics". Also check "Output Range" and select location for output.

009_2.tif

009_1.tif

Data
Mean	32.27273
Standard Error	iv.831884
Median	35
Style	25
Standard Divergence	16.02555
Sample Variance	256.8182
Kurtosis	-0.73643
Skewness	-0.05982
Range	50
Minimum	5
Maximum	55
Sum	355
Count	11

Graphical Representation

Data arrangement and summarization tin can be done graphically, besides as numerically. Tables and graphs let for a quick overview of the information collected and support the presentation of the data used in the projection. While there are a multitude of bachelor graphics, this chapter will focus on a specific few commonly used tools.

Pie Charts

Pie charts are a skillful visual tool allowing the reader to quickly see the relationship between categories. Information technology is of import to conspicuously label each category, and adding the frequency or relative frequency is oftentimes helpful. However, too many categories tin be disruptive. Exist careful of putting too much information in a pie chart. The commencement pie chart gives a clear idea of the representation of fish types relative to the whole sample. The second pie chart is more difficult to interpret, with likewise many categories. It is important to select the best graphic when presenting the information to the reader.

Figure 4. Comparison of pie charts.

Bar Charts and Histograms

Bar charts graphically describe the distribution of a qualitative variable (fish type) while histograms describe the distribution of a quantitative variable discrete or continuous variables (bear weight).

Figure 5. Comparison of a bar chart for qualitative data and a histogram for quantitative data.

In both cases, the confined' equal width and the y-axis are conspicuously defined. With qualitative data, each category is represented by a specific bar. With continuous data, lower and upper class limits must be defined with equal class widths. There should be no gaps between classes and each observation should fall into one, and merely one, class.

Boxplots

Boxplots use the 5-number summary (minimum and maximum values with the three quartiles) to illustrate the center, spread, and distribution of your data. When paired with histograms, they requite an excellent description, both numerically and graphically, of the data.

With symmetric information, the distribution is bong-shaped and somewhat symmetric. In the boxplot, we run into that Q1 and Q3 are approximately equidistant from the median, as are the minimum and maximum values. Besides, both whiskers (lines extending from the boxes) are approximately equal in length.

012_2.tif

012_1.tif

Figure 6. A histogram and boxplot of a normal distribution.

With skewed left distributions, we see that the histogram looks "pulled" to the left. In the boxplot, Q1 is farther abroad from the median equally are the minimum values, and the left whisker is longer than the right whisker.

013_2.tif

Figure vii. A histogram and boxplot of a skewed left distribution.

013_1.tif

With skewed right distributions, we see that the histogram looks "pulled" to the correct. In the boxplot, Q3 is farther away from the median, as is the maximum value, and the right whisker is longer than the left whisker.

014_2.tif

014_1.tif

Figure 8. A histogram and boxplot of a skewed right distribution.

Section 2 Probability Distribution

Once we have organized and summarized your sample data, the next step is to identify the underlying distribution of our random variable. Calculating probabilities for continuous random variables are complicated by the fact that at that place are an infinite number of possible values that our random variable can accept on, so the probability of observing a particular value for a random variable is zero. Therefore, to observe the probabilities associated with a continuous random variable, we employ a probability density part (PDF).

A PDF is an equation used to find probabilities for continuous random variables. The PDF must satisfy the post-obit 2 rules:

The expanse nether the curve must equal ane (over all possible values of the random variable).
The probabilities must exist equal to or greater than zero for all possible values of the random variable.

The surface area under the bend of the probability density function over some interval represents the probability of observing those values of the random variable in that interval.

The Normal Distribution

Many continuous random variables have a bell-shaped or somewhat symmetric distribution. This is a normal distribution. In other words, the probability distribution of its relative frequency histogram follows a normal curve. The curve is bell-shaped, symmetric about the mean, and defined past µ and σ (the mean and standard departure).

Effigy 9. A normal distribution.

At that place are normal curves for every combination of µ and σ. The mean (µ) shifts the curve to the left or right. The standard deviation (σ) alters the spread of the curve. The get-go pair of curves have different means just the aforementioned standard difference. The 2d pair of curves share the same mean (µ) just have dissimilar standard deviations. The pink curve has a smaller standard deviation. Information technology is narrower and taller, and the probability is spread over a smaller range of values. The blue curve has a larger standard divergence. The bend is flatter and the tails are thicker. The probability is spread over a larger range of values.

07_fig05a

07_fig05b

Figure 10. A comparison of normal curves.

Backdrop of the normal curve:

The mean is the middle of this distribution and the highest indicate.
The curve is symmetric about the mean. (The expanse to the left of the hateful equals the expanse to the right of the mean.)
The total area nether the curve is equal to one.
As x increases and decreases, the bend goes to zip but never touches.
The PDF of a normal bend is .
A normal curve can be used to gauge probabilities.
A normal curve tin can exist used to estimate proportions of a population that have sure x-values.

The Standard Normal Distribution

There are millions of possible combinations of means and standard deviations for continuous random variables. Finding probabilities associated with these variables would require us to integrate the PDF over the range of values nosotros are interested in. To avoid this, nosotros can rely on the standard normal distribution. The standard normal distribution is a special normal distribution with a µ = 0 and σ = 1. Nosotros tin can use the Z-score to standardize any normal random variable, converting the x-values to Z-scores, thus assuasive us to use probabilities from the standard normal table. And then how do we discover surface area under the curve associated with a Z-score?

Standard Normal Tabular array

The standard normal table gives probabilities associated with specific Z-scores.
The table we use is cumulative from the left.
The negative side is for all Z-scores less than zero (all values less than the mean).
The positive side is for all Z-scores greater than zero (all values greater than the hateful).
Not all standard normal tables work the same fashion.

Case 10

What is the area associated with the Z-score 1.62?

Figure 11. The standard normal table and associated area for z = 1.62.

Reading the Standard Normal Table

Read downward the Z-cavalcade to get the first part of the Z-score (1.6).
Read beyond the top row to become the 2nd decimal identify in the Z-score (0.02).
The intersection of this row and column gives the area under the curve to the left of the Z-score.

Finding Z-scores for a Given Surface area

What if nosotros have an area and nosotros want to observe the Z-score associated with that area?
Instead of Z-score → area, we desire area → Z-score.
We tin can use the standard normal table to discover the expanse in the torso of values and read backwards to discover the associated Z-score.
Using the table, search the probabilities to find an area that is closest to the probability yous are interested in.

Example 11

To find a Z-score for which the area to the right is 5%:

Since the table is cumulative from the left, you lot must utilize the complement of 5%.

1.000 – 0.05 = 0.9500

Figure 12. The upper five% of the area under a normal bend.

Find the Z-score for the area of 0.9500.
Look at the probabilities and find a value as close to 0.9500 equally possible.

Effigy 13. The standard normal table.

The Z-score for the 95th percentile is 1.64.

Area in betwixt Two Z-scores

Example 12

To find Z-scores that limit the middle 95%:

The middle 95% has ii.v% on the correct and two.5% on the left.
Use the symmetry of the curve.

Figure 14. The middle 95% of the area under a normal curve.

Wait at your standard normal table. Since the tabular array is cumulative from the left, information technology is easier to find the expanse to the left starting time.
Find the expanse of 0.025 on the negative side of the table.
The Z-score for the area to the left is -1.96.
Since the curve is symmetric, the Z-score for the area to the correct is 1.96.

Common Z-scores

There are many ordinarily used Z-scores:

Z.05 = i.645 and the expanse between -1.645 and 1.645 is 90%
Z.025 = 1.96 and the area between -1.96 and 1.96 is 95%
Z.005 = two.575 and the area between -2.575 and 2.575 is 99%

Applications of the Normal Distribution

Typically, our commonly distributed information do not have μ = 0 and σ = 1, but we can relate whatsoever normal distribution to the standard normal distributions using the Z-score. We can transform values of x to values of z.

For example, if a ordinarily distributed random variable has a μ = 6 and σ = ii, then a value of x = seven corresponds to a Z-score of 0.five.

This tells yous that vii is one-half a standard deviation above its hateful. We can use this relationship to find probabilities for any normal random variable.

07_fig33

Figure 15. A normal and standard normal bend.

To notice the expanse for values of X, a normal random variable, draw a movie of the area of involvement, convert the x-values to Z-scores using the Z-score and and then utilise the standard normal tabular array to discover areas to the left, to the correct, or in between.

Example 13

Developed deer population weights are unremarkably distributed with µ = 110 lb. and σ = 29.7 lb. As a biologist yous determine that a weight less than 82 lb. is unhealthy and y'all want to know what proportion of your population is unhealthy.

P(10<82)

Figure sixteen. The expanse under a normal curve for P(x<82).

Convert 82 to a Z-score

The 10 value of 82 is 0.94 standard deviations beneath the hateful.

Figure 17. Expanse under a standard normal curve for P(z<-0.94).

Go to the standard normal tabular array (negative side) and notice the area associated with a Z-score of -0.94.

This is an "area to the left" problem then you lot can read directly from the table to get the probability.

P(x<82) = 0.1736

Approximately 17.36% of the population of adult deer is underweight, OR ane deer called at random will have a 17.36% hazard of weighing less than 82 lb.

Example fourteen

Statistics from the Midwest Regional Climate Heart indicate that Jones City, which has a large wild animals refuge, gets an boilerplate of 36.7 in. of rain each year with a standard deviation of 5.1 in. The corporeality of pelting is usually distributed. During what per centum of the years does Jones City get more than than 40 in. of rain?

P(ten > 40)

Figure 18. Surface area under a normal bend for P(x>twoscore).

P(x>40) = (1-0.7422) = 0.2578

For approximately 25.78% of the years, Jones City will become more than twoscore in. of rain.

Assessing Normality

If the distribution is unknown and the sample size is not greater than 30 (Key Limit Theorem), we have to assess the supposition of normality. Our primary method is the normal probability plot. This plot graphs the observed information, ranked in ascending order, against the "expected" Z-score of that rank. If the sample data were taken from a ordinarily distributed random variable, and so the plot would be approximately linear.

Examine the post-obit probability plot. The center line is the relationship we would expect to see if the data were drawn from a perfectly normal distribution. Discover how the observed information (red dots) loosely follow this linear human relationship. Minitab also computes an Anderson-Darling test to assess normality. The null hypothesis for this test is that the sample information have been fatigued from a unremarkably distributed population. A p-value greater than 0.05 supports the assumption of normality.

Figure nineteen. A normal probability plot generated using Minitab 16.

Compare the histogram and the normal probability plot in this next case. The histogram indicates a skewed right distribution.

Figure 20. Histogram and normal probability plot for skewed right data.

The observed data exercise not follow a linear pattern and the p-value for the A-D exam is less than 0.005 indicating a non-normal population distribution.

Normality cannot exist assumed. You must always verify this assumption. Call back, the probabilities we are finding come up from the standard NORMAL table. If our data are Not commonly distributed, then these probabilities Practise Not APPLY.

Exercise you know if the population is normally distributed?
Practice you take a big plenty sample size (n≥30)? Remember the Central Limit Theorem?
Did y'all construct a normal probability plot?

(B) the Middle 50% of the Training Times of Which Person Had the Least Spread? Explain How You Know.

Source: https://courses.lumenlearning.com/suny-natural-resources-biometrics/chapter/chapter-1-descriptive-statistics-and-the-normal-distribution/