Statistical Analysis Methods

Page 4

Back to the BioSPP Home Page

 

Statistical analysis and testing based on distances

Let Δ = {δ1, ..., δn} denote the set of nearest neighbour distances defined on page 3. Δ can be summarized using standard statistical methods.
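As a rough illustration of what "standard statistical methods" might look like in practice, the sketch below computes a few common numerical summaries of Δ using only the Python standard library. The function name summarize_distances and the example values are hypothetical and chosen purely for illustration; they are not measured data.

```python
import statistics


def summarize_distances(distances):
    """Return basic descriptive statistics for a set of nearest neighbour distances."""
    if len(distances) < 2:
        raise ValueError("need at least two distances")
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(distances, n=4)
    return {
        "n": len(distances),
        "mean": statistics.fmean(distances),
        "sd": statistics.stdev(distances),  # sample standard deviation
        "min": min(distances),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(distances),
    }


# Hypothetical distances, for illustration only (not measured data).
example_distances = [0.8, 1.1, 0.9, 1.4, 0.7, 1.0, 1.2, 0.95]
summary = summarize_distances(example_distances)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The quartiles use the default "exclusive" interpolation of statistics.quantiles; other conventions give slightly different values for small samples.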

Graphically, we typically use

A single set of distances is useful for descriptive purposes, and can also be used to test specific modelling hypotheses (for example, that the observed foci are randomly and/or uniformly distributed within the nucleus). For comparative purposes, however, it is useful to have multiple sets of distances, obtained either from multiple images or from distances between different object types within the same image.

To compare two sets of distances, Δ1 and Δ2 say, we can use similar graphical methods, or more formal statistical hypothesis testing approaches, such as

We will investigate the Kolmogorov-Smirnov test in detail on page 5.
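As a preview only, the two-sample Kolmogorov-Smirnov statistic is the largest absolute difference between the empirical cumulative distribution functions of Δ1 and Δ2. The following minimal sketch computes that statistic in plain Python; the function name ks_two_sample_statistic and the sample values are hypothetical, and no p-value is calculated here.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions that change only at data
    # points, so it is enough to evaluate the gap at every observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # proportion of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # proportion of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical distance sets, for illustration only.
delta_1 = [1.0, 2.0, 3.0, 4.0]
delta_2 = [3.0, 4.0, 5.0, 6.0]
print(ks_two_sample_statistic(delta_1, delta_2))
```

A value near 0 indicates that the two sets of distances have similar distributions, and a value near 1 indicates that they barely overlap.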

The general principle of statistical testing is to assess whether the data provide sufficient evidence to reject one explicitly stated (null) hypothesis in favour of another hypothesis of scientific interest to the researcher. The two hypotheses can be interpreted as two competing models for the phenomenon being studied.

The comparative merits of the competing models are assessed by proposing a data-dependent quantity or summary (the test statistic) and comparing its numerical value with its predicted behaviour under the explicitly stated null hypothesis. If the observed value agrees with the behaviour predicted under the null model, then it can be concluded that this model is satisfactory, in the sense that the data give no reason to reject it. If, however, the observed behaviour is at odds with the predicted behaviour, then the null model is rejected as being unsuitable as the data generation model.

For example, the set of distances is a form of summary statistic, which can be summarized further by a single number, the mean nearest neighbour distance, δ, defined by

δ = (δ1 + ... + δn)/n.

A specific hypothesis about how the foci might be distributed in the nucleus is that they are uniformly and randomly distributed within the spatial region encompassed by the nuclear lamina. Under this assumption, we can predict how δ should behave, that is, how large or small δ should be for a data set of a given size. Furthermore, we can quantify precisely how likely it is that δ takes different possible values; formally, we can compute the null probability distribution of the test statistic δ. To complete the hypothesis test, we make a final, probabilistic assessment of how likely the observed value of δ is in light of the proposed null model; if it is extremely unlikely, then we reject the null model.
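One way to make this procedure concrete is to approximate the null distribution of the mean nearest neighbour distance by simulation rather than by exact calculation. The sketch below does this under strong simplifying assumptions that are not part of the description above: the nucleus is modelled as a two-dimensional disc of known radius, the foci are treated as points, and the test is two-sided. All function names, the radius and the observed value are hypothetical.

```python
import math
import random


def sample_uniform_in_disc(n, radius, rng):
    """Draw n points uniformly in a disc by rejection from the bounding square."""
    points = []
    while len(points) < n:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            points.append((x, y))
    return points


def mean_nearest_neighbour_distance(points):
    """Average, over all points, of the distance to the closest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def monte_carlo_p_value(observed_mean, n, radius, n_sim=999, seed=0):
    """Two-sided Monte Carlo p-value for the observed mean under the uniform null."""
    rng = random.Random(seed)
    simulated = [
        mean_nearest_neighbour_distance(sample_uniform_in_disc(n, radius, rng))
        for _ in range(n_sim)
    ]
    lower = sum(s <= observed_mean for s in simulated)
    upper = sum(s >= observed_mean for s in simulated)
    # The +1 counts the observed value itself as one draw from the null.
    p = 2 * (min(lower, upper) + 1) / (n_sim + 1)
    return min(1.0, p)


# Hypothetical observed value for 20 foci in a disc of radius 1 (assumed units).
p_value = monte_carlo_p_value(observed_mean=0.05, n=20, radius=1.0, n_sim=199)
print(p_value)
```

A small p-value indicates that the observed mean would be unusual if the foci were uniformly distributed, either because they are more clustered (small mean) or more regularly spaced (large mean) than the null model predicts.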

The specifics of how such a method is implemented in practice are well known to statisticians and scientists. What constitutes an "extremely unlikely" event is quantified according to well established rules, typically by fixing a significance level before the data are examined.
