Statistical Analysis Methods


 

The Kolmogorov-Smirnov Test

 

The Kolmogorov-Smirnov test is a significance test for comparing a sample with a specified probability distribution, or two samples with each other.  It is built from the following components:

 

The Cumulative Distribution Function (CDF): Any probability distribution can be specified via its cumulative distribution function, typically denoted by F and defined for a random quantity X by

$$F(x) = P(X \le x).$$

That is, for any number x, F(x) is the probability that the random quantity takes a value less than or equal to x.  For example, if X records the shortest inter-object distance between objects in an image, then F(2.54) is the probability that the shortest distance is less than or equal to 2.54 units.  Because F(x) is a probability, the function F must take values between 0 and 1, and it must be non-decreasing (that is, F(a) ≤ F(b) if a < b).
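As a small illustration of these properties, the sketch below evaluates a CDF at 2.54 and checks the two constraints on a grid of points. The exponential model and its rate of 0.5 are assumptions chosen only for the example; the text does not say which distribution the inter-object distances follow.

```python
import math

def exponential_cdf(x, rate=1.0):
    """CDF of an exponential distribution: F(x) = 1 - exp(-rate * x) for x >= 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-rate * x)

# Probability that the shortest distance is at most 2.54 units,
# under the assumed (hypothetical) exponential model with rate 0.5.
p = exponential_cdf(2.54, rate=0.5)

# Check the two properties on a grid of points from -1.0 to 10.0.
grid = [i * 0.1 for i in range(-10, 101)]
values = [exponential_cdf(x, rate=0.5) for x in grid]
in_unit_interval = all(0.0 <= v <= 1.0 for v in values)
non_decreasing = all(a <= b for a, b in zip(values, values[1:]))
```

Any other valid CDF could be substituted for `exponential_cdf`; both checks should still pass.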

 

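The Empirical Distribution Function (EDF): The EDF is a data-based estimate of the CDF.  For a sample x1, ..., xn, the EDF is typically denoted by $\hat{F}_n$ and defined by

$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le x),$$

where I is the indicator function:  I(A)=1 if the event A occurs, and I(A)=0 otherwise (so that I(xi ≤ x) = 1 if xi ≤ x).  In words, the EDF at x is the proportion of observations less than or equal to x.  Like F, it is a function that takes values between 0 and 1 and is non-decreasing.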
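The EDF can be computed directly from its definition: count the observations that do not exceed x and divide by the sample size. The sketch below does this in plain Python; the function name `edf` and the sample values are hypothetical, chosen only for illustration.

```python
def edf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    n = len(sample)
    # Each term of the sum is the indicator I(xi <= x).
    return sum(1 for xi in sample if xi <= x) / n

# Hypothetical shortest inter-object distances from five images.
distances = [1.2, 2.54, 0.7, 3.1, 2.0]

at_254 = edf(distances, 2.54)   # 4 of the 5 values are <= 2.54
below_all = edf(distances, 0.5)  # no value is <= 0.5
above_all = edf(distances, 5.0)  # every value is <= 5.0
```

Here `at_254` is 0.8, while `below_all` and `above_all` are 0.0 and 1.0: the EDF is a step function that rises from 0 to 1 in jumps at the observed values.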

 

So, the hypothetical data distribution is described by F, whereas the sample is described by its EDF.

The Kolmogorov-Smirnov test compares either the EDF of a sample with a hypothesized F (in a one-sample test), or the two EDFs arising from two data samples (in a two-sample test).
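This page does not say how the comparison is measured. The standard Kolmogorov-Smirnov statistic is the largest vertical gap between the two functions being compared, and the sketch below computes that gap for both cases. The function names and the uniform CDF used in the example are assumptions for illustration, and the step of turning the gap into a significance level is not shown.

```python
from bisect import bisect_right

def ks_one_sample(sample, cdf):
    """Largest gap between the sample's EDF and a hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The EDF jumps from (i - 1)/n to i/n at the i-th ordered value,
        # so the gap is checked just before and just after the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_two_sample(sample_a, sample_b):
    """Largest gap between the EDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both EDFs only change at observed values, so checking those suffices.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # EDF of sample_a at x
        fb = bisect_right(b, x) / len(b)  # EDF of sample_b at x
        d = max(d, abs(fa - fb))
    return d

def uniform_cdf(x):
    """Hypothetical model: uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

d_one = ks_one_sample([0.5], uniform_cdf)
d_same = ks_two_sample([1, 2, 3], [1, 2, 3])
d_apart = ks_two_sample([1, 2, 3], [4, 5, 6])
```

Identical samples give a gap of 0, and samples that do not overlap at all give the maximum possible gap of 1. A small gap is consistent with the sample following F (or with the two samples sharing a distribution), and a large gap counts as evidence against it.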

 

 

 

 
