The STATISTICS library

The STATISTICS library provides basic statistical functions, along with implementations optimized for vectors of <double-float> values.

Note

Currently, the statistical functions are only available for limited vectors of <double-float> values. This is expected to change in the future.

The STATISTICS module

Types

<double-float-vector> Type

A <vector> that only contains <double-float> values.

Equivalent:

limited(<vector>, of: <double-float>)

Discussion:

This type is used for implementations of statistical functions which are specialized for <double-float> values.

See also:

<double-float?-vector> Type

A <vector> that contains values that are either <double-float> or #f.

Equivalent:

limited(<vector>, of: false-or(<double-float>))

Discussion:

This type is used for implementations of statistical functions that may need to handle missing data. Keeping it separate from <double-float-vector> confines the overhead of handling missing values to the functions that need it.

Note

Implementations of the statistical functions which handle missing data have not yet been provided.

See also:

<numeric-sequence> Type

Equivalent:

type-union(<double-float-vector>, <double-float?-vector>)

See also:

Coercion Functions

double-float-vector Function

Utility function for converting a sequence that contains only <double-float> values to a <double-float-vector> for use with the optimized implementations of the basic statistical functions.

Signature:

double-float-vector (seq) => (vec)

Parameters:

  • seq – An instance of <sequence>.

Values:

  • vec – An instance of <double-float-vector>.

Example:

let dv = double-float-vector(#[1.0d0, 2.0d0, 3.0d0]);

Extrema

maximum Open Generic function

Returns the maximum value from a numeric sequence.

Signature:

maximum (sample) => (maximum)

Parameters:
Values:
Example:

Assuming that dv contains the values #[1.0d0, -1.0d0, 2.0d0]:

? maximum(dv)
=> 2.0d0

See also:

maximum(<double-float-vector>) Sealed Method

A specialized implementation of maximum for <double-float>.

Parameters:
Values:

maximum/trimmed Open Generic function

Returns the maximum value from a numeric sequence that is below (or optionally equal to) an upper limit.

Signature:

maximum/trimmed (sample upper-limit #key inclusive?) => (maximum)

Parameters:
Values:
Discussion:

If inclusive? is true (the default), then values equal to the upper-limit are included when calculating the maximum value.

Example:

Assuming that dv contains the values #[1.0d0, 2.0d0, 3.0d0, 4.0d0]:

? maximum/trimmed(dv, 3.0d0, inclusive?: #t)
=> 3.0d0

? maximum/trimmed(dv, 3.0d0, inclusive?: #f)
=> 2.0d0

See also:

maximum/trimmed(<double-float-vector>, <double-float>) Sealed Method

A specialized implementation of maximum/trimmed for <double-float>.

Parameters:
Values:

minimum Open Generic function

Returns the minimum value from a numeric sequence.

Signature:

minimum (sample) => (minimum)

Parameters:
Values:
Example:

Assuming that dv contains the values #[1.0d0, -1.0d0, 2.0d0]:

? minimum(dv)
=> -1.0d0

See also:

minimum(<double-float-vector>) Sealed Method

A specialized implementation of minimum for <double-float>.

Parameters:
Values:

minimum/trimmed Open Generic function

Returns the minimum value from a numeric sequence that is above (or optionally equal to) a lower limit.

Signature:

minimum/trimmed (sample lower-limit #key inclusive?) => (minimum)

Parameters:
Values:
Discussion:

If inclusive? is true (the default), then values equal to the lower-limit are included when calculating the minimum value.

Example:

Assuming that dv contains the values #[1.0d0, 2.0d0, 3.0d0, 4.0d0]:

? minimum/trimmed(dv, 2.0d0, inclusive?: #t)
=> 2.0d0

? minimum/trimmed(dv, 2.0d0, inclusive?: #f)
=> 3.0d0

See also:

minimum/trimmed(<double-float-vector>, <double-float>) Sealed Method

A specialized implementation of minimum/trimmed for <double-float>.

Parameters:
Values:

minimum+maximum Open Generic function

Returns both the minimum and maximum values within a numeric sequence.

Signature:

minimum+maximum (sample) => (minimum maximum)

Parameters:
Values:
Example:

Assuming that dv contains the values #[1.0d0, -1.0d0, 2.0d0]:

? minimum+maximum(dv)
=> values(-1.0d0, 2.0d0)

See also:

minimum+maximum(<double-float-vector>) Sealed Method

A specialized implementation of minimum+maximum for <double-float>.

Parameters:
Values:

Means

mean/arithmetic Open Generic function

Returns the arithmetic mean of a numeric sequence.

Signature:

mean/arithmetic (sample) => (mean)

Parameters:
Values:
Discussion:

Commonly known as just ‘mean’ or ‘average’, the arithmetic mean is the sum of the values of the sequence, divided by the number of values in the sequence. It is distinct from other ways of calculating a mean such as those provided by mean/geometric and mean/harmonic.

A simple (and slightly faster) naive implementation of the arithmetic mean is subject to numerical inaccuracy. This implementation instead follows the method presented by Knuth in The Art of Computer Programming, Volume 2, 3rd edition, page 232.

Equivalent:

The arithmetic mean is given by:

\[\frac{1}{n} \sum_{i=1}^{n} x_{i}\]

Our implementation is computed as follows:

\[\begin{split}&m_{1} = x_{1} \\ &m_{k} = m_{k-1} + \frac{x_{k} - m_{k-1}}{k}\end{split}\]

Example:

Assuming that dv contains the values #[1.0d0, 2.0d0, 8.0d0, 9.0d0]:

? mean/arithmetic(dv)
=> 5.0d0
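
As a worked illustration of the recurrence given under Equivalent, applied by hand to the values #[1.0d0, 2.0d0, 8.0d0, 9.0d0] (exact fractions are used here for readability; the library itself works in <double-float> arithmetic):

\[\begin{split}&m_{1} = 1 \\ &m_{2} = 1 + \frac{2 - 1}{2} = \frac{3}{2} \\ &m_{3} = \frac{3}{2} + \frac{8 - \frac{3}{2}}{3} = \frac{11}{3} \\ &m_{4} = \frac{11}{3} + \frac{9 - \frac{11}{3}}{4} = 5\end{split}\]

Each step adds only the correction \((x_{k} - m_{k-1}) / k\) to the running mean, so the full sum of the values is never formed.
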
See also:

mean/arithmetic(<double-float-vector>) Sealed Method

A specialized implementation of mean/arithmetic for <double-float>.

Parameters:
Values:

mean/fast Open Generic function

Returns the arithmetic mean of a numeric sequence.

Signature:

mean/fast (sample) => (mean)

Parameters:
Values:
Discussion:

This differs from mean/arithmetic by using a naive algorithm that is slightly faster, but subject to numerical inaccuracy. You should only use this function if you’re aware of the risks.
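
One way the inaccuracy can show up is overflow of the intermediate sum. As an illustrative case (assuming IEEE 754 double-precision arithmetic, where the largest finite value is roughly \(1.8 \times 10^{308}\)), take two values \(x_{1} = x_{2} = 10^{308}\). The naive sum leaves the finite range even though the mean itself is representable, whereas the running update described for mean/arithmetic never forms that sum:

\[\begin{split}&\text{naive: } x_{1} + x_{2} = 2 \times 10^{308} > 1.8 \times 10^{308} \\ &\text{running: } m_{2} = m_{1} + \frac{x_{2} - m_{1}}{2} = 10^{308} + \frac{0}{2} = 10^{308}\end{split}\]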

Equivalent:

\[\frac{1}{n} \sum_{i=1}^{n} x_{i}\]

Example:

Assuming that dv contains the values #[1.0d0, 2.0d0, 8.0d0, 9.0d0]:

? mean/fast(dv)
=> 5.0d0

See also:

mean/fast(<double-float-vector>) Sealed Method

A specialized implementation of mean/fast for <double-float>.

Parameters:
Values:

mean/geometric Open Generic function

Returns the geometric mean of a numeric sequence.

Signature:

mean/geometric (sample) => (mean)

Parameters:
Values:
Discussion:

For greater numerical accuracy, our implementation is based on the exponentiation of the arithmetic mean of the natural logarithm of each value in sample.

Equivalent:

The geometric mean is given by:

\[\left(\prod_{i=1}^{n} x_{i}\right)^{1/n}\]

Our implementation is computed as follows:

\[\exp\left[\frac{1}{n} \sum_{i=1}^{n} \ln x_{i}\right]\]

Example:

Assuming that dv contains the values #[2.0d0, 4.0d0, 8.0d0]:

? mean/geometric(dv)
=> 4.0d0
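
Worked by hand through the logarithmic form given under Equivalent, the values #[2.0d0, 4.0d0, 8.0d0] give the following (an illustration in exact arithmetic; a floating-point result may differ in the last digits):

\[\exp\left[\frac{\ln 2 + \ln 4 + \ln 8}{3}\right] = \exp\left[\frac{\ln 64}{3}\right] = \exp\left[\ln 4\right] = 4\]

Summing logarithms keeps the intermediate value small, whereas the direct product \(2 \times 4 \times 8 = 64\) grows with every factor and can leave the <double-float> range for long sequences.
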
See also:

mean/geometric(<double-float-vector>) Sealed Method

A specialized implementation of mean/geometric for <double-float>.

Parameters:
Values:

mean/harmonic Open Generic function

Returns the harmonic mean of a numeric sequence.

Signature:

mean/harmonic (sample) => (mean)

Parameters:
Values:
Discussion:

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the values of the sequence.

Equivalent:

The harmonic mean is given by:

\[\frac{n}{\sum_{i=1}^{n} \frac{1}{x_{i}}}\]
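
Example:

As a hand-worked illustration (the values are chosen for this sketch and are not taken from the library's own examples), for the values #[1.0d0, 2.0d0, 4.0d0]:

\[\frac{3}{\frac{1}{1} + \frac{1}{2} + \frac{1}{4}} = \frac{3}{1.75} = \frac{12}{7} \approx 1.714\]

The arithmetic mean of the same values is \(7/3 \approx 2.333\); the harmonic mean is pulled towards the smaller values.
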
See also:

mean/harmonic(<double-float-vector>) Sealed Method

A specialized implementation of mean/harmonic for <double-float>.

Parameters:
Values:

Scaling

scale Open Generic function

Signature:

scale (sample lower-bound upper-bound) => (res)

Parameters:
Values:

scale(<double-float-vector>, <double-float>, <double-float>) Sealed Method

A specialized implementation of scale for <double-float>.

Parameters:
Values:

Variance and Deviation

standard-deviation/population Open Generic function

Signature:

standard-deviation/population (sample) => (standard-deviation)

Parameters:
Values:
  • standard-deviation – An instance of <number>.

See also:

standard-deviation/population(<double-float-vector>) Sealed Method

A specialized implementation of standard-deviation/population for <double-float>.

Parameters:
Values:

standard-deviation/sample Open Generic function

Signature:

standard-deviation/sample (sample) => (standard-deviation)

Parameters:
Values:
  • standard-deviation – An instance of <number>.

Discussion:

The standard deviation of a sample, as opposed to a complete population, is calculated by dividing by sample.size - 1 rather than by sample.size. This adjustment is known as Bessel’s correction.
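
Written out under the usual textbook definitions (assumed here, since the formulas are not stated for these functions), with \(n\) the number of values and \(\bar{x}\) their arithmetic mean, the population and sample forms differ only in the divisor:

\[\begin{split}&\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}} \\ &s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}\end{split}\]

For the illustrative values #[2.0d0, 4.0d0, 4.0d0, 4.0d0, 5.0d0, 5.0d0, 7.0d0, 9.0d0], \(\bar{x} = 5\) and the squared deviations sum to 32, so \(\sigma = \sqrt{32 / 8} = 2\) while \(s = \sqrt{32 / 7} \approx 2.138\).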

See also:

standard-deviation/sample(<double-float-vector>) Sealed Method

A specialized implementation of standard-deviation/sample for <double-float>.

Parameters:
Values:

variance/population Open Generic function

Signature:

variance/population (sample) => (variance)

Parameters:
Values:
See also:

variance/population(<double-float-vector>) Sealed Method

A specialized implementation of variance/population for <double-float>.

Parameters:
Values:

variance/sample Open Generic function

Signature:

variance/sample (sample) => (variance)

Parameters:
Values:
See also:

variance/sample(<double-float-vector>) Sealed Method

A specialized implementation of variance/sample for <double-float>.

Parameters:
Values:

standard-scores Open Generic function

Signature:

standard-scores (population) => (scores)

Parameters:
Values:
Equivalent:

The standard score of a value in a sequence is given by:

\[z = \frac{x - \mu}{\sigma}\]

Where:

  • μ is the mean of the population

  • σ is the standard deviation of the population
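
As a hand-worked illustration (the values are chosen for this sketch), take the population #[2.0d0, 4.0d0, 4.0d0, 4.0d0, 5.0d0, 5.0d0, 7.0d0, 9.0d0], for which \(\mu = 5\) and \(\sigma = 2\). The standard scores of its smallest and largest values are:

\[\begin{split}&x = 2: \quad z = \frac{2 - 5}{2} = -1.5 \\ &x = 9: \quad z = \frac{9 - 5}{2} = 2\end{split}\]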

See also:

standard-scores(<double-float-vector>) Sealed Method

A specialized implementation of standard-scores for <double-float>.

Parameters:
Values: