Patterns in static

Apophenia

apop_hist.c File Reference

Functions

apop_model * apop_model_to_pmf (apop_model *model, apop_data *binspec, long int draws, int bin_count, gsl_rng *rng)
 
apop_data * apop_histograms_test_goodness_of_fit (apop_model *observed, apop_model *expected)
 
apop_data * apop_test_kolmogorov (apop_model *m1, apop_model *m2)
 
apop_data * apop_data_to_bins (apop_data *indata, apop_data *binspec, int bin_count, char close_top_bin)
 

Function Documentation

apop_data* apop_data_to_bins ( apop_data *  indata,
apop_data *  binspec,
int  bin_count,
char  close_top_bin 
)

Create a histogram from data by putting data into bins of fixed width.

Parameters
indata  The input data that will be binned. This is copied and the copy will be modified.
close_top_bin  Normally, a bin covers the range from the point equal to its minimum to points strictly less than the minimum plus the width. If 'y', then the top bin also includes points equal to its upper bound. This solves the problem of displaying histograms where the top bin is just one point.
binspec  This is an apop_data set with the same number of columns as indata. If you want a fixed size for the bins, then the first row of the bin spec is the bin width for each column. This allows you to specify a width for each dimension, or specify the same size for all with something like:
    Apop_row(indata, 0, firstrow);
    apop_data *binspec = apop_data_copy(firstrow);
    gsl_matrix_set_all(binspec->matrix, 10); //bins of size 10 for all dimensions
    apop_data_to_bins(indata, binspec);
The presumption is that the first bin starts at zero in all cases. You can add a second row to the spec to give the offset for each dimension. Default: NULL. With no binspec and no binlist, I use a grid with offset equal to the minimum of each column and a bin size such that it takes $\sqrt{N}$ bins to cover the range up to the maximum element.
bin_count  If you don't provide a bin spec, I'll provide this many evenly-sized bins. Default: $\sqrt{N}$.
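The half-open bin rule and the effect of close_top_bin can be pictured with a small standalone sketch. This is not Apophenia's implementation: bin_index and bin_index_closed_top are hypothetical helpers written only for illustration, and they assume a one-dimensional grid given by an offset, a width, and a bin count.

```c
#include <assert.h>

/* Hypothetical helper, not an Apophenia function: index of the fixed-width
   bin holding x, where bin k covers [offset + k*width, offset + (k+1)*width). */
long bin_index(double x, double offset, double width){
    assert(width > 0);
    double q = (x - offset) / width;
    long k = (long) q;        // truncates toward zero
    if (q < k) k--;           // correct to a floor for points below the offset
    return k;
}

/* Same rule with the top bin closed: a point exactly on the grid's upper
   bound, offset + bin_count*width, is counted in the last bin. */
long bin_index_closed_top(double x, double offset, double width, long bin_count){
    long k = bin_index(x, offset, width);
    if (k == bin_count && x <= offset + bin_count*width) k = bin_count - 1;
    return k;
}
```

On a three-bin grid starting at 0 with width 10, the point 30 sits exactly on the upper bound: the half-open rule places it alone at index 3, while the closed-top variant folds it into the last bin at index 2. That is presumably the one-point top bin the close_top_bin option is meant to avoid when the default grid runs exactly from the column minimum to the column maximum.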
Returns
A pointer to a binned apop_data set. If you didn't give me a binspec, then I attach one to the output set as a page named <binspec>, so you can snap a second data set to the same grid using
    apop_data *first_binned = apop_data_to_bins(first_set, NULL);
    apop_data *second_binned = apop_data_to_bins(second_set, apop_data_get_page(first_binned, "<binspec>"));

The text segment, if any, is not binned. I use apop_data_pmf_compress as the final step in the binning, and that does respect the text segment.

Here is a sample program highlighting the difference between apop_data_to_bins and apop_data_pmf_compress.

#define _GNU_SOURCE
#include <apop.h>

#ifdef Testing
#define printdata(dataset) ;
#else
#define printdata(dataset) \
    printf("\n-----------\n\n"); \
    apop_data_print(dataset);
#endif

int main(){
    //six observations: a numeric vector plus one column of text
    apop_data *d = apop_text_alloc(apop_data_alloc(6), 6, 1);
    apop_data_fill(d, 1, 2, 3, 3, 1, 2);
    apop_text_fill(d, "A", "A", "A", "A", "A", "B");
    asprintf(&d->names->title, "Original data set");
    printdata(d);

    //binned, where bin ends are equidistant but not necessarily in the data
    apop_data *binned = apop_data_to_bins(d, NULL);
    asprintf(&binned->names->title, "Post binning");
    printdata(binned);
    assert(apop_sum(binned->weights)==6);
    assert(fabs(//equal distance between bins
            (apop_data_get(binned, 1, -1) - apop_data_get(binned, 0, -1))
          - (apop_data_get(binned, 2, -1) - apop_data_get(binned, 1, -1))) < 1e-5);

    //compressed, where the data is as in the original, but weights
    //are redone to accommodate repeated observations.
    apop_data_pmf_compress(d);
    asprintf(&d->names->title, "Post compression");
    printdata(d);
    assert(apop_sum(d->weights)==6);

    //the row 1A appears twice in six observations, so its PMF probability is 2/6
    apop_model *d_as_pmf = apop_estimate(d, apop_pmf);
    apop_data *firstrow = Apop_r(d, 0); //1A
    assert(fabs(apop_p(firstrow, d_as_pmf) - 2./6) < 1e-5);
}
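
The compression half of the sample can be pictured without the library. The following is a minimal sketch of the idea, assuming each observation is one number plus one text label with a unit weight; obs_row and compress_rows are hypothetical names for illustration only and are not how apop_data_pmf_compress is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical row type: one numeric value, one text label, one weight. */
typedef struct { double value; const char *label; double weight; } obs_row;

/* Collapse duplicate rows in place and return the number of distinct rows.
   A surviving row's weight is the sum of the weights of the rows it absorbed;
   the values themselves are left exactly as they were observed. */
size_t compress_rows(obs_row *rows, size_t n){
    assert(rows != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++){
        size_t j;
        for (j = 0; j < kept; j++)
            if (rows[j].value == rows[i].value && !strcmp(rows[j].label, rows[i].label))
                break;
        if (j < kept) rows[j].weight += rows[i].weight; // duplicate: fold into the earlier row
        else rows[kept++] = rows[i];                    // first sighting: keep the row
    }
    return kept;
}
```

Run on the sample's six observations (1A, 2A, 3A, 3A, 1A, 2B), this leaves four distinct rows whose weights still sum to six, with 1A carrying a weight of 2. Note that 2A and 2B stay separate because the text label is part of the comparison, in line with the statement above that compression respects the text segment.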

Autogenerated by doxygen on Sun Oct 26 2014 (Debian 0.999b+ds3-2).