123 lines
4.7 KiB
Plaintext
123 lines
4.7 KiB
Plaintext
[section:kolmogorov_smirnov_dist Kolmogorov-Smirnov Distribution]
|
||
|
||
``#include <boost/math/distributions/kolmogorov_smirnov.hpp>``
|
||
|
||
namespace boost{ namespace math{
|
||
|
||
template <class RealType = double,
|
||
class ``__Policy`` = ``__policy_class`` >
|
||
class kolmogorov_smirnov_distribution;
|
||
|
||
typedef kolmogorov_smirnov_distribution<> kolmogorov_smirnov;
|
||
|
||
template <class RealType, class ``__Policy``>
|
||
class kolmogorov_smirnov_distribution
|
||
{
|
||
public:
|
||
typedef RealType value_type;
|
||
typedef Policy policy_type;
|
||
|
||
// Constructor:
|
||
kolmogorov_smirnov_distribution(RealType n);
|
||
|
||
// Accessor to parameter:
|
||
RealType number_of_observations()const;
|
||
|
||
}} // namespaces
|
||
|
||
The Kolmogorov-Smirnov test in statistics compares two empirical distributions,
|
||
or an empirical distribution against any theoretical distribution.[footnote
|
||
[@https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test Wikipedia:
|
||
Kolmogorov-Smirnov test]] It makes use of a specific distribution which is
|
||
informally known in the literature as the Kolmogorv-Smirnov distribution,
|
||
implemented here.
|
||
|
||
Formally, if /n/ observations are taken from a theoretical distribution /G(x)/,
|
||
and if /G/[sub /n/](/x/) represents the empirical CDF of those /n/ observations,
|
||
then the test statistic
|
||
|
||
[equation kolmogorov_smirnov_test_statistic] [/ D_n = \sup_x|G(x)-\hat{G}(x)| ]
|
||
|
||
will be distributed according to a Kolmogorov-Smirnov distribution parameterized by /n/.
|
||
|
||
The exact form of a Kolmogorov-Smirnov distribution is the subject of a large,
|
||
decades-old literature.[footnote
|
||
[@https://www.jstatsoft.org/article/view/v039i11 Simard, R. and L'Ecuyer, P.
|
||
(2011) "Computing the Two-Sided Kolmogorov-Smirnov Distribution". Journal of
|
||
Statistical Software, vol. 39, no. 11.]] In the interest of simplicity, Boost
|
||
implements the first-order, limiting form of this distribution (the same form
|
||
originally identified by Kolmogorov[footnote Kolmogorov A (1933). "Sulla
|
||
determinazione empirica di una legge di distribuzione". G. Ist. Ital. Attuari.
|
||
4: 83–91.]), namely
|
||
|
||
[equation kolmogorov_smirnov_distribution]
|
||
[/ \lim_{n \to \infty} F_n(x/\sqrt{n})=1+2\sum_{k=1}^\infty (-1)^k e^{-2k^2x^2} ]
|
||
|
||
Note that while the exact distribution only has support over \[0, 1\], this
|
||
limiting form has positive mass above unity, particularly for small /n/. The
|
||
following graph illustrations how the distribution changes for different values
|
||
of /n/:
|
||
|
||
[graph kolmogorov_smirnov_pdf]
|
||
|
||
[h4 Member Functions]
|
||
|
||
kolmogorov_smirnov_distribution(RealType n);
|
||
|
||
Constructs a Kolmogorov-Smirnov distribution with /n/ observations.
|
||
|
||
Requires n > 0, otherwise calls __domain_error.
|
||
|
||
RealType number_of_observations()const;
|
||
|
||
Returns the parameter /n/ from which this object was constructed.
|
||
|
||
[h4 Non-member Accessors]
|
||
|
||
All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions]
|
||
that are generic to all distributions are supported: __usual_accessors.
|
||
|
||
The domain of the random variable is \[0, +[infin]\].
|
||
|
||
[h4 Accuracy]
|
||
|
||
The CDF of the Kolmogorov-Smirnov distribution is implemented in terms of the
|
||
fourth [link math_toolkit.jacobi_theta.jacobi_theta_overview Jacobi Theta
|
||
function]; please refer to the accuracy ULP plots for that function.
|
||
|
||
The PDF is implemented separately, and the following ULP plot illustrates its
|
||
accuracy:
|
||
|
||
[graph kolmogorov_smirnov_pdf_ulp]
|
||
|
||
Because PDF values are simply scaled out and up by the square root of /n/, the
|
||
above plot is representative for all values of /n/. Note that for present
|
||
purposes, "accuracy" refers to deviations from the limiting approximation,
|
||
rather than deviations from the exact distribution.
|
||
|
||
[h4 Implementation]
|
||
|
||
In the following table, /n/ is the number of observations, /x/ is the random variable,
|
||
[pi] is Archimedes' constant, and [zeta](3) is Apéry's constant.
|
||
|
||
[table
|
||
[[Function][Implementation Notes]]
|
||
[[cdf][Using the relation: cdf = __jacobi_theta4tau(0, 2*x*x/[pi])]]
|
||
[[pdf][Using a manual derivative of the CDF]]
|
||
[[cdf complement][
|
||
When x*x*n == 0: 1
|
||
|
||
When 2*x*x*n <= [pi]: 1 - __jacobi_theta4tau(0, 2*x*x*n/[pi])
|
||
|
||
When 2*x*x*n > [pi]: -__jacobi_theta4m1tau(0, 2*x*x*n/[pi])]]
|
||
[[quantile][Using a Newton-Raphson iteration]]
|
||
[[quantile from the complement][Using a Newton-Raphson iteration]]
|
||
[[mode][Using a run-time PDF maximizer]]
|
||
[[mean][sqrt([pi]/2) * ln(2) / sqrt(n)]]
|
||
[[variance][([pi][super 2]/12 - [pi]/2*ln[super 2](2))/n]]
|
||
[[skewness][(9/16*sqrt([pi]/2)*[zeta](3)/n[super 3/2] - 3 * mean * variance - mean[super 2] * variance) / (variance[super 3/2])]]
|
||
[[kurtosis][(7/720*[pi][super 4]/n[super 2] - 4 * mean * skewness * variance[super 3/2] - 6 * mean[super 2] * variance - mean[super 4]) / (variance[super 2])]]
|
||
]
|
||
|
||
[endsect] [/section:kolmogorov_smirnov_dist Kolmogorov-Smirnov]
|