Patterns in static

Apophenia

Macros | Functions
apop_stats.c File Reference

Macros

#define Check_vw
 

Functions

long double apop_vector_sum (const gsl_vector *in)
 
double apop_vector_skew (const gsl_vector *in)
 
double apop_vector_kurtosis (const gsl_vector *in)
 
double apop_vector_skew_pop (gsl_vector const *v, gsl_vector const *weights)
 
double apop_vector_kurtosis_pop (gsl_vector const *v, gsl_vector const *weights)
 
double apop_vector_var_m (const gsl_vector *in, const double mean)
 
double apop_vector_correlation (const gsl_vector *ina, const gsl_vector *inb)
 
double apop_vector_distance (const gsl_vector *ina, const gsl_vector *inb, const char metric, const double norm)
 
void apop_vector_normalize (gsl_vector *in, gsl_vector **out, const char normalization_type)
 
void apop_matrix_normalize (gsl_matrix *data, const char row_or_col, const char normalization)
 
long double apop_matrix_sum (const gsl_matrix *m)
 
double apop_matrix_mean (const gsl_matrix *data)
 
void apop_matrix_mean_and_var (const gsl_matrix *data, double *mean, double *var)
 
apop_dataapop_data_summarize (apop_data *indata)
 
double apop_vector_mean (gsl_vector const *v, gsl_vector const *weights)
 
double apop_vector_var (gsl_vector const *v, gsl_vector const *weights)
 
double apop_vector_cov (const gsl_vector *v1, const gsl_vector *v2, const gsl_vector *weights)
 
apop_dataapop_data_covariance (const apop_data *in)
 
apop_dataapop_data_correlation (const apop_data *in)
 
double apop_kl_divergence (apop_model *from, apop_model *to, int draw_ct, gsl_rng *rng)
 
long double apop_multivariate_gamma (double a, int p)
 
long double apop_multivariate_lngamma (double a, int p)
 
int apop_matrix_is_positive_semidefinite (gsl_matrix *m, char semi)
 
void vfabs (double *x)
 
double apop_matrix_to_positive_semidefinite (gsl_matrix *m)
 

Detailed Description

Basic moments and some distributions.

Macro Definition Documentation

#define Check_vw
Value:
Apop_stopif(!v, return GSL_NAN, 0, "data vector is NULL. Returning NaN.\n"); \
Apop_stopif(!v->size, return GSL_NAN, 0, "data vector has size 0. Returning NaN.\n"); \
Apop_stopif(weights && weights->size != v->size, return GSL_NAN, 0, "data vector has size %zu; weighting vector has size %zu. Returning NaN.\n", v->size, weights->size);
#define Apop_stopif(test, onfail, level,...)
Definition: apop.h:1004

Function Documentation

apop_data* apop_data_correlation ( const apop_data in)

Returns the matrix of correlation coefficients $(\sigma^2_{xy}/(\sigma_x\sigma_y))$ relating each column with each other.

Parameters
inA data matrix: rows are observations, columns are variables. If you give me a weights vector, I'll use it.
Returns
Returns the variance/covariance matrix relating each column with each other. This function allocates the matrix for you.
Exceptions
out->error='a'Allocation error.
apop_data* apop_data_covariance ( const apop_data in)

Returns the sample variance/covariance matrix relating each column of the matrix to each other column.

Parameters
inAn apop_data set. If the weights vector is set, I'll take it into account.
  • This is the sample covariance—dividing by $n-1$, not $n$.
Returns
Returns a apop_data set the variance/covariance matrix relating each column with each other.
Exceptions
out->error='a'Allocation error.
double apop_kl_divergence ( apop_model from,
apop_model to,
int  draw_ct,
gsl_rng *  rng 
)

Kullback-Leibler divergence.

This measure of the divergence of one distribution from another has the form $ D(p,q) = \sum_i \ln(p_i/q_i) p_i $. Notice that it is not a distance, because there is an asymmetry between $p$ and $q$, so one can expect that $D(p, q) \neq D(q, p)$.

Parameters
fromthe $p$ in the above formula. (No default; must not be NULL)
tothe $q$ in the above formula. (No default; must not be NULL)
draw_ctIf I do the calculation via random draws, how many? (Default = 1e5)
rngA gsl_rng. If NULL or number of threads is greater than 1, I'll take care of the RNG; see apop_rng_get_thread. (Default = NULL)

This function can take empirical histogram-type models (apop_pmf) or continuous models like apop_loess or apop_normal.

If there is a PMF (I'll try from first, under the presumption that you are measuring the divergence of data from an observed data distribution), then I'll step through it for the points in the summation.

  • If you have two empirical distributions, that they must be synced: if $p_i>0$ but $q_i=0$, then the function returns GSL_NEGINF. If apop_opts.verbose >=1 I print a message as well.

If neither distribution is a PMF, then I'll take draw_ct random draws from to and evaluate at those points.

  • Set apop_opts.verbose = 3 for observation-by-observation info.
int apop_matrix_is_positive_semidefinite ( gsl_matrix *  m,
char  semi 
)

Test whether the input matrix is positive semidefinite.

A covariance matrix will always be PSD, so this function can tell you whether your matrix is a valid covariance matrix.

Consider the 1x1 matrix in the upper left of the input, then the 2x2 matrix in the upper left, on up to the full matrix. If the matrix is PSD, then each of these has a positive determinant. This function thus calculates $N$ determinants for an $N$x $N$ matrix.

Parameters
mThe matrix to test. If NULL, I will return zero—not PSD.
semiIf anything but 's', check for positive definite, not semidefinite. (default 's')

See also apop_matrix_to_positive_semidefinite, which will change the input to something PSD.

void apop_matrix_normalize ( gsl_matrix *  data,
const char  row_or_col,
const char  normalization 
)

Normalize each row or column in the given matrix, one by one.

Basically just a convenience fn to iterate through the columns or rows and run apop_vector_normalize for you.

Parameters
dataThe data set to normalize.
row_or_colEither 'r' or 'c'.
normalizationsee apop_vector_normalize.
double apop_matrix_to_positive_semidefinite ( gsl_matrix *  m)

First, this function passes tests, but is under development.

It takes in a matrix and converts it to the `closest' positive semidefinite matrix.

Parameters
mOn input, any matrix; on output, a positive semidefinite matrix.
Returns
the distance between the original and new matrices.

Adapted from the R Matrix package's nearPD, which is Copyright (2007) Jens Oehlschlägel [and is GPL].

long double apop_multivariate_gamma ( double  a,
int  p 
)

The multivariate generalization of the Gamma distribution.

\[ \Gamma_p(a)= \pi^{p(p-1)/4}\prod_{j=1}^p \Gamma\left[ a+(1-j)/2\right]. \]

Because $\Gamma(x)$ is undefined for $x\in\{0, -1, -2, ...\}$, this function returns NAN when $a+(1-j)/2$ takes on one of those values.

See also apop_multivariate_lngamma, which is more numerically stable in most cases.

long double apop_multivariate_lngamma ( double  a,
int  p 
)

The log of the multivariate generalization of the Gamma; see also apop_multivariate_gamma.

double apop_vector_cov ( const gsl_vector *  v1,
const gsl_vector *  v2,
const gsl_vector *  weights 
)

Find the sample covariance of a pair of vectors, with an optional weighting. This only makes sense if the weightings are identical, so the function takes only one weighting vector for both.

Parameters
v1,v2The data vectors
weightsThe weight vector. Default: equal weights for all elements.
Returns
The sample covariance
double apop_vector_mean ( gsl_vector const *  v,
gsl_vector const *  weights 
)

Find the mean, weighted or unweighted.

Parameters
vThe data vector
weightsThe weight vector. Default: assume equal weights.
Returns
The weighted mean
void apop_vector_normalize ( gsl_vector *  in,
gsl_vector **  out,
const char  normalization_type 
)

This function will normalize a vector, either such that it has mean zero and variance one, or ranges between zero and one, or sums to one.

Parameters
inA gsl_vector which you have already allocated and filled. NULL input gives NULL output. (No default)
outIf normalizing in place, NULL. If not, the address of a gsl_vector. Do not allocate. (default = NULL.)
normalization_type'p': normalized vector will sum to one. E.g., start with a set of observations in bins, end with the percentage of observations in each bin. (the default)
'r': normalized vector will range between zero and one. Replace each X with (X-min) / (max - min).
's': normalized vector will have mean zero and variance one. Replace each X with $(X-\mu) / \sigma$, where $\sigma$ is the sample standard deviation.
'm': normalize to mean zero: Replace each X with $(X-\mu)$

Example

1 #include <apop.h>
2 
3 int main(void){
4 gsl_vector *in, *out;
5 
6 in = gsl_vector_calloc(3);
7 gsl_vector_set(in, 1, 1);
8 gsl_vector_set(in, 2, 2);
9 
10 printf("The original vector:\n");
11 apop_vector_show(in);
12 
13 apop_vector_normalize(in, &out, 's');
14 printf("Standardized with mean zero and variance one:\n");
15 apop_vector_show(out);
16 
17 apop_vector_normalize(in, &out, 'r');
18 printf("Normalized range with max one and min zero:\n");
19 apop_vector_show(out);
20 
21 apop_vector_normalize(in, NULL, 'p');
22 printf("Normalized into percentages:\n");
23 apop_vector_show(in);
24 }
double apop_vector_var ( gsl_vector const *  v,
gsl_vector const *  weights 
)

Find the sample variance of a vector, weighted or unweighted.

  • This uses (n-1) in the denominator of the sum; i.e., it corrects for the bias introduced by using $\bar x$ instead of $\mu$.
  • At the moment, there is no var_pop function. Just multiply this by (n-1)/n if you need that.
Parameters
vThe data vector
weightsThe weight vector. If NULL, assume equal weights.
Returns
The weighted sample variance.
  • Apophenia tries to be smart about reading the weights. If weights sum to one, then the system uses w->size as the number of elements, and returns the usual sum over $n-1$. If weights > 1, then the system uses the total weights as $n$. Thus, you can use the weights as standard weightings or to represent elements that appear repeatedly.

Autogenerated by doxygen on Sun Oct 26 2014 (Debian 0.999b+ds3-2).