3.1.2.2. Descriptive statistics

[MG1:Chp2, p7-p18]

Summaries of sample data (statistics) are denoted by Roman letters (e.g. x-bar for the sample mean, s for the sample standard deviation)

Summaries of population data (parameters) are denoted by Greek letters (e.g. mu for the mean, sigma squared for the variance)

Central tendency = The central value around which observations cluster

Degree of dispersion = The spread of the observations about a central location

Measures of central tendency

Mode = The most common value
Median = The middle value
(Arithmetic) Mean = The average value
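
As a rough illustration of the three measures above, the sketch below uses Python's standard `statistics` module. The `observations` list is a made-up sample chosen for illustration, not data from [MG1].

```python
import statistics

# Hypothetical sample (illustrative values only, not from the source text)
observations = [2, 3, 3, 4, 5, 7, 11]

mode = statistics.mode(observations)      # most common value
median = statistics.median(observations)  # middle value once sorted
mean = statistics.mean(observations)      # sum of values divided by their count

print("mode:", mode, "median:", median, "mean:", mean)
```

Here the mean (5) sits above the median (4) because the single large value (11) pulls it upwards.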

Degree of dispersion

Range = Difference between the maximum and minimum value
Percentile = Rank observations and divide them into 100 equal parts
* Median = 50th percentile
* Interquartile range = 25th to 75th percentile
Sample variance = Sum of squares divided by degrees of freedom
* Sum of squares = sum of the squared differences between each observation and the mean
* Degrees of freedom = number of observations minus 1
Population variance = Sum of squares divided by number of observations
Standard deviation = Square root of variance
Coefficient of variation (CV) = SD / mean x 100%

NB:

Degrees of freedom (n - 1) are used when calculating the variance of a sample
* Because, once the sample mean is fixed, every observation is free to vary except the last one, which must take a defined value for the observations to give that mean
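
The definitions of dispersion above can be worked through step by step. This is a minimal sketch on a hypothetical sample (the values in `observations` are invented for illustration), computing the sum of squares by hand so the difference between dividing by n - 1 and by n is visible.

```python
import math

# Hypothetical sample (illustrative values only)
observations = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(observations)
mean = sum(observations) / n

value_range = max(observations) - min(observations)

# Sum of squared differences between each observation and the mean
sum_of_squares = sum((x - mean) ** 2 for x in observations)

sample_variance = sum_of_squares / (n - 1)  # divide by degrees of freedom
population_variance = sum_of_squares / n    # divide by number of observations

sample_sd = math.sqrt(sample_variance)
cv_percent = sample_sd / mean * 100         # coefficient of variation

print(f"range={value_range}, SS={sum_of_squares}")
print(f"sample variance={sample_variance:.3f}, population variance={population_variance:.3f}")
print(f"sample SD={sample_sd:.3f}, CV={cv_percent:.1f}%")
```

The standard library offers `statistics.variance` (n - 1 denominator) and `statistics.pvariance` (n denominator) for the same two quantities.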

Sources of variability

Biological variability
Measurement imprecision
--> Resulting in random error
Mistakes or biases in measurement
--> Systematic error

Standard error (SE)

[MG1:p9]

Standard error (SE)
= Also known as the standard error of the mean (SEM)
SE = SD / square root of n
SE is NOT meant to be used to describe the variability of sample data
SE is a measure of precision (how well the sample mean estimates the population mean, a population parameter)
* Used to calculate confidence intervals
* Often derived from one sample
* Reliability of sample mean in predicting population mean [Chris Flynn]
SE is the standard deviation of the sampling distribution of the mean (i.e. of the means of repeated samples)
Increasing sample size is one way of reducing SE
* But because SE falls with the square root of n, the sample must be 4 times larger to halve the SE
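
The square-root relationship can be checked numerically. In this small sketch the helper `standard_error` and the SD of 10 are illustrative assumptions, not values from [MG1].

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

sd = 10.0  # hypothetical sample standard deviation

se_n25 = standard_error(sd, 25)    # 10 / 5
se_n100 = standard_error(sd, 100)  # 10 / 10, sample is 4 times larger

print(se_n25, se_n100)
```

Quadrupling n from 25 to 100 halves the SE from 2.0 to 1.0.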

Confidence interval

Derived from SE
95% confidence interval of the mean = sample mean +/- (1.96 x SE)
99% confidence interval = sample mean +/- (2.58 x SE)
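Definition of 95% CI
= The range within which the true population mean is expected to lie, with 95% confidence
* Strictly, if sampling were repeated many times, about 95% of the intervals calculated this way would contain the true population mean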
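
A minimal sketch of the interval calculation above, assuming the multipliers 1.96 and 2.58 quoted in these notes. The function name `confidence_interval` and the example figures (mean 120, SD 10, n = 100) are hypothetical.

```python
import math

def confidence_interval(sample_mean, sd, n, multiplier=1.96):
    """Return (lower, upper) as sample mean +/- multiplier x SE."""
    se = sd / math.sqrt(n)
    margin = multiplier * se
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 120, SD 10, n = 100, so SE = 1.0
ci95 = confidence_interval(120.0, 10.0, 100)                   # uses 1.96
ci99 = confidence_interval(120.0, 10.0, 100, multiplier=2.58)  # wider interval

print("95% CI:", ci95)
print("99% CI:", ci99)
```

The 99% interval is wider than the 95% interval because more confidence requires a larger margin around the same sample mean.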

NB:

In a normal distribution, 95% of the observations lie within 1.96 standard deviations of the mean

Frequency distributions

Kurtosis describes how peaked the distribution is
* Excess kurtosis of a normal distribution = 0 (kurtosis itself = 3)
Median is a better measure of central tendency in a skewed distribution
* With skew to the right (positive skew), the median will usually be smaller than the mean
Bimodal distribution = Distribution with two peaks
--> Suggests that the sample is not homogeneous and may represent two different populations
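
The effect of right skew on the mean and median can be seen in a small made-up sample. The list `right_skewed` is a hypothetical illustration with one large value forming the right-hand tail.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one is large
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]

mean = statistics.mean(right_skewed)
median = statistics.median(right_skewed)

print("mean:", mean, "median:", median)
```

The extreme value drags the mean (4.75) above the median (3.0), which is why the median describes the centre of a skewed distribution better.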

Normal distribution

Sometimes referred to as a Gaussian distribution
Two parameters define the curve, mu (the mean), and sigma (the standard deviation)
Mode = median = mean
Formula at [MG1:p13]

NB:

Mean +/- 1 SD includes 68% of total area
Mean +/- 1.96 SD includes 95% of total area
Mean +/- 2 SD includes 95.4% of total area
Mean +/- 3 SD includes 99.7% of total area
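
These areas can be reproduced from the cumulative distribution function of the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `area_within` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Fraction of the total area lying within mean +/- k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"mean +/- {k} SD: {area_within(k):.4f}")
```

Because the areas depend only on the number of standard deviations from the mean, the same fractions apply to any normal distribution, whatever its mu and sigma.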

Z distribution

In a STANDARD normal distribution

Mean = 0
Standard deviation = 1
aka the z distribution
A z transformation converts any normal distribution curve (with different mean and SD) to a standard normal distribution curve (mean = 0, SD = 1)
* z = (x - mu)/SD
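
A short sketch of the z transformation, assuming a hypothetical normal variable with mean 100 and SD 10 (the function name `z_score` and all values are illustrative).

```python
def z_score(x, mu, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sd

mu, sd = 100, 10              # hypothetical population mean and SD
values = [90, 100, 110, 120]  # hypothetical observations

z_values = [z_score(x, mu, sd) for x in values]
print(z_values)
```

An observation of 120 becomes z = 2.0, i.e. two standard deviations above the mean, regardless of the original units.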

Central limit theorem

[MG1:p14]

As the sample size increases (n>100)
--> The sampling distribution of the mean will approximate a normal distribution curve
* Even if the distribution of the variable itself is not normal
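
The theorem can be illustrated with a small simulation. This sketch draws repeated samples from an exponential distribution (strongly right-skewed, with mean 1 and SD 1), which is an arbitrary choice made for illustration. The sample sizes, the number of repeats and the seed are likewise assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(sample_size, n_samples=2000):
    """Means of repeated samples drawn from a right-skewed (exponential) variable."""
    means = []
    for _ in range(n_samples):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

means_small = sample_means(2)    # sampling distribution is still clearly skewed
means_large = sample_means(100)  # sampling distribution is close to normal

for label, means in (("n=2", means_small), ("n=100", means_large)):
    gap = statistics.mean(means) - statistics.median(means)
    print(f"{label}: mean - median = {gap:.3f}, SD of means = {statistics.stdev(means):.3f}")
```

With n = 100 the gap between mean and median of the sample means shrinks towards zero (a sign of symmetry), and their SD approaches SD / square root of n = 0.1, which is the standard error.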

Binomial distribution

[MG1:p14-p15]

Formula at [MG1:p15]

A binomial distribution exists if a population contains items which belong to one of two mutually exclusive categories
* e.g. gender, complication

Conditions include:

Fixed number of observations (trials)
Only two outcomes are possible
Trials are independent
Constant probability for occurrence of each event
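
Under these conditions the probability of exactly k events in n trials follows the standard binomial probability mass function. The sketch below is a generic textbook form and is not claimed to match the layout of the formula at [MG1:p15]; the function name `binomial_pmf` and the example values are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent trials, each with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 trials, event probability 0.5
p_five = binomial_pmf(5, 10, 0.5)
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(p_five, total)
```

The probabilities over all possible values of k (0 to n) sum to 1, since the two outcomes are mutually exclusive and exhaustive.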

Poisson distribution

A binomial distribution approximates a Poisson distribution when
* The number of observations is very large, AND
* The probability of an event is small (<0.05)
A single parameter (lambda) is both the mean and the variance

Conditions:

Events occur randomly
Events occur independently
Events occur uniformly (same probability) and singly

The example used in [MG1:p15] calculates the probability of more than one late-night admission
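
A minimal sketch of a Poisson calculation of this kind, alongside the binomial it approximates. The figures (lambda = 2, n = 1000, p = 0.002) are invented for illustration and are not the numbers used in [MG1:p15].

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return lam**k * exp(-lam) / factorial(k)

def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

lam = 2.0  # hypothetical mean number of events per period

# P(more than one event) = 1 - P(0) - P(1)
p_more_than_one = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(f"P(more than one event) = {p_more_than_one:.3f}")

# Large n and small p with n x p = lam: the two distributions nearly agree
n, p = 1000, 0.002
for k in range(4):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

The binomial and Poisson columns differ only in the fourth decimal place here, which is the sense in which one approximates the other for large n and small p.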

Incidence and prevalence

Incidence = the number of individuals who develop a condition (i.e. new cases) in a given time period
--> An estimate of the probability of developing a disease in a specified time period
Prevalence = the number of individuals with a condition at a point in time (i.e. total cases, pre-existing and new)
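
The distinction can be shown with simple arithmetic. This sketch expresses both measures as proportions of a hypothetical population, which is an assumption beyond the counts defined above; it also assumes nobody recovers, dies or leaves during the period. All numbers are invented.

```python
# Hypothetical population followed for one year
population = 10_000
existing_cases_at_start = 200  # already have the condition
new_cases_during_year = 98     # develop the condition during the year

# Only people without the condition at the start can become new cases
population_at_risk = population - existing_cases_at_start

incidence = new_cases_during_year / population_at_risk
prevalence_at_end = (existing_cases_at_start + new_cases_during_year) / population

print(f"incidence over the year = {incidence:.3f}")
print(f"prevalence at end of year = {prevalence_at_end:.4f}")
```

Incidence counts only new cases among those at risk, while prevalence counts everyone with the condition at the chosen time point.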

Presentation of data

[MG1:p17]

For a normal distribution, mean and standard deviation are the best statistics to describe data
* But mean can be affected by extreme values
A bimodal distribution is best described with mode
Ordinal data should be described with mode or median

Box and whisker plot

Used to depict median, interquartile range and range
Middle line = median
Box = 25th to 75th percentiles
Whiskers = minimum and maximum, or 5th and 95th percentiles
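
The values drawn in a box and whisker plot can be computed directly. This sketch uses `statistics.quantiles` (Python 3.8 or later) with `method="inclusive"`; quartile conventions differ between software packages, so other tools may give slightly different box edges. The `observations` list is hypothetical.

```python
import statistics

# Hypothetical sample (illustrative values only)
observations = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Cut points dividing the data into 4 equal parts: 25th, 50th, 75th percentiles
q1, median, q3 = statistics.quantiles(observations, n=4, method="inclusive")

iqr = q3 - q1                    # height of the box
whisker_low = min(observations)  # whiskers drawn at minimum and maximum
whisker_high = max(observations)

print("box:", q1, "to", q3, "| middle line:", median)
print("whiskers:", whisker_low, "to", whisker_high, "| IQR:", iqr)
```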
