% Rapport package team
% Normality Tests
% 2011-04-26 20:25 CET
## Description
An overview of several normality tests and diagnostic plots that can be used to screen for departures from normality.
### Introduction
In statistics, _normality_ refers to the assumption that a random variable follows a _normal_ (_Gaussian_) distribution. Because of its bell-like shape, the distribution is also known as the _"bell curve"_. The probability density function of the _normal distribution_ is:
$$f(x) = \frac{1}{\sqrt{2\pi{}\sigma{}^2}} e^{-\frac{(x-\mu{})^2}{2\sigma{}^2}}$$
The _normal distribution_ belongs to the _location-scale family_ of distributions, as it's defined by two parameters:
- $\mu$ - _mean_ or _expectation_ (location parameter)
- $\sigma^2$ - _variance_ (the square of the scale parameter $\sigma$)
[![](plots/NormalityTest-1.png)](plots/NormalityTest-1-hires.png)
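As a quick check of the density formula above, the following sketch writes it out directly in R and compares it with the built-in `dnorm`. The helper name `normal_pdf` and the parameter values are illustrative assumptions, not part of the report.

```r
# Normal density written out term by term from the formula above.
# `normal_pdf` is a hypothetical helper name used only for this illustration.
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-4, 4, by = 0.5)

# dnorm() is parameterised by the standard deviation (sigma), not the variance.
max_diff <- max(abs(normal_pdf(x, mu = 1, sigma = 2) - dnorm(x, mean = 1, sd = 2)))
print(max_diff)
```

The difference should be zero up to floating-point error, which confirms that `dnorm` expects $\sigma$ rather than $\sigma^2$ as its `sd` argument.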
### Normality Tests
#### Overview
Various hypothesis tests can be applied to check whether the distribution of a given random variable violates the normality assumption. These procedures test the null hypothesis (H~0~) that the variable's distribution is _normal_. Only a few such tests are covered here: those available in the `stats` package (which comes bundled with the default R installation) and in the `nortest` package, which is [available](http://cran.r-project.org/web/packages/nortest/index.html) on CRAN.
- The **Shapiro-Wilk test** is a powerful normality test appropriate for small samples. In R, it's implemented in the `shapiro.test` function from the `stats` package.
- The **Lilliefors test** is a modification of the _Kolmogorov-Smirnov test_, appropriate for testing normality when the parameters of the normal distribution ($\mu$, $\sigma^2$) are not known and must be estimated from the sample. The `lillie.test` function is located in the `nortest` package.
- The **Anderson-Darling test** is one of the most powerful normality tests, as it detects most departures from normality. You can find the `ad.test` function in the `nortest` package.
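The three tests listed above can be called roughly as follows. This is a minimal sketch on simulated data: the right-skewed sample `x` is a stand-in chosen for illustration (it is not the report's data), and the `nortest` calls run only if that package happens to be installed.

```r
set.seed(42)

# Hypothetical data: a right-skewed sample standing in for the variable under study.
x <- rexp(200, rate = 0.5)

# Shapiro-Wilk test from the stats package (needs between 3 and 5000 observations).
sw <- shapiro.test(x)
print(sw)

# Lilliefors and Anderson-Darling tests from nortest, guarded in case it is missing.
if (requireNamespace("nortest", quietly = TRUE)) {
  lf <- nortest::lillie.test(x)
  ad <- nortest::ad.test(x)
  print(lf)
  print(ad)
}
```

Each call returns an object of class `htest`, so the test statistic and p-value are available as `$statistic` and `$p.value`.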
#### Results
We will use the _Shapiro-Wilk_, _Lilliefors_ and _Anderson-Darling_ tests to screen for departures from normality in the response variable. Here you can see the results of the applied tests (_p-values_ less than 0.05 indicate significant discrepancies):
-------------------------------------------------
Method                      Statistic   p-value
--------------------------- ----------- ---------
Lilliefors                  0.168       3e-52
(Kolmogorov-Smirnov)
normality test

Anderson-Darling normality  18.75       7.261e-44
test

Shapiro-Wilk normality test 0.9001      1.618e-20
-------------------------------------------------
Based on these test statistics, we can draw the following conclusions:
- based on the _Lilliefors test_, the distribution of _Internet usage in leisure time (hours per day)_ is not normal
- the _Anderson-Darling test_ confirms the violation of the normality assumption
- according to the _Shapiro-Wilk test_, the distribution of _Internet usage in leisure time (hours per day)_ is not normal
All three tests reject the null hypothesis at the 0.05 level, so they agree that the variable departs from normality.
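The decision rule behind these conclusions can be sketched in a few lines of R. The statistics and p-values below are copied from the table above; the data frame layout and the column name `reject_normality` are assumptions made for illustration.

```r
# Test results as reported in the table above.
results <- data.frame(
  method    = c("Lilliefors", "Anderson-Darling", "Shapiro-Wilk"),
  statistic = c(0.168, 18.75, 0.9001),
  p_value   = c(3e-52, 7.261e-44, 1.618e-20)
)

# Reject the null hypothesis of normality when the p-value falls below alpha.
alpha <- 0.05
results$reject_normality <- results$p_value < alpha

print(results)
```

Since every p-value is far below 0.05, the flag is `TRUE` in each row.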
### Diagnostic Plots
Various plots can help you judge whether a distribution is normal. Only a few of the most commonly used ones are shown here: the _histogram_, the _Q-Q plot_ and the _kernel density plot_.
#### Histogram
The _histogram_ was first introduced by _Karl Pearson_ and it's probably the most popular plot for depicting the probability distribution of a random variable. However, its appearance depends on the number of bins, so it can sometimes be misleading. If the variable is normally distributed, the bars should form a "bell-like" shape.
[![](plots/NormalityTest-2.png)](plots/NormalityTest-2-hires.png)
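To see how much the picture depends on the binning, the sketch below bins one sample coarsely and finely, then draws a density-scaled histogram with a fitted normal curve. The simulated variable `usage` and its parameters are hypothetical and are not the data behind the plot above.

```r
set.seed(1)

# Hypothetical sample; replace with the variable under study.
usage <- rnorm(500, mean = 3, sd = 1)

# The same data binned coarsely and finely (breaks is only a suggestion to hist()).
h_coarse <- hist(usage, breaks = 5, plot = FALSE)
h_fine   <- hist(usage, breaks = 50, plot = FALSE)
print(c(coarse = length(h_coarse$counts), fine = length(h_fine$counts)))

# Density-scaled histogram with the fitted normal curve overlaid for reference.
# The mean and sd are computed first because curve() reuses the name x internally.
m <- mean(usage)
s <- sd(usage)
hist(usage, freq = FALSE, main = "Histogram with fitted normal curve")
curve(dnorm(x, mean = m, sd = s), add = TRUE, lty = 2)
```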
#### Q-Q Plot
"Q" in _Q-Q plot_ stands for _quantile_, as this plot compares an empirical and a theoretical distribution (in this case, the _normal_ distribution) by plotting their quantiles against each other. For normally distributed data, the plotted dots should approximate a straight line (the `x = y` line when the data are standardized).
[![](plots/NormalityTest-3.png)](plots/NormalityTest-3-hires.png)
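A Q-Q plot of this kind can be drawn with `qqnorm` from the `stats` package. The sketch below uses a simulated sample (an assumption for illustration) and standardizes it first, so that the reference line is exactly `x = y`.

```r
set.seed(2)

# Hypothetical sample; replace with the variable under study.
x <- rnorm(300, mean = 10, sd = 2)

# Standardise so that sample and theoretical quantiles share the same scale.
z <- (x - mean(x)) / sd(x)

# qqnorm() draws the plot and returns the plotted coordinates invisibly.
qq <- qqnorm(z, main = "Normal Q-Q plot (standardised data)")
abline(0, 1)  # the x = y reference line

# Correlation between theoretical and sample quantiles; close to 1 for normal data.
r <- cor(qq$x, qq$y)
print(r)
```

For raw, unstandardized data, `qqline(x)` adds a reference line through the first and third quartiles instead.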
#### Kernel Density Plot
A _kernel density plot_ shows a smoothed estimate of the variable's _probability density function_. As such, it provides good insight into the shape of the distribution. For normal distributions, it should resemble the well-known "bell shape".
[![](plots/NormalityTest-4.png)](plots/NormalityTest-4-hires.png)
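A kernel density plot can be produced with `density` from the `stats` package. The following sketch uses a simulated sample (an assumption, not the report's data), overlays a fitted normal curve for comparison, and checks that the estimated curve integrates to roughly one.

```r
set.seed(3)

# Hypothetical sample; replace with the variable under study.
x <- rnorm(400, mean = 5, sd = 1.5)

# Kernel density estimate with the defaults (Gaussian kernel, automatic bandwidth).
d <- density(x)
plot(d, main = "Kernel density estimate")

# Fitted normal curve for visual comparison (parameters computed before curve()).
m <- mean(x)
s <- sd(x)
curve(dnorm(x, mean = m, sd = s), add = TRUE, lty = 2)

# Trapezoidal approximation of the area under the estimated density.
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)
```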
-------
This report was generated with [R](http://www.r-project.org/) (3.0.1) and [rapport](https://rapporter.github.io/rapport/) (0.51) in _2.401_ sec on x86_64-unknown-linux-gnu platform.
![](images/logo.png)