Patterns in static

Apophenia

Functions
apop_sort.c File Reference

Functions

apop_dataapop_data_sort (apop_data *data, apop_data *sort_order, char asc, char inplace, double *col_order)
 

Detailed Description

Copyright (c) 2013 by Ben Klemens. Licensed under the modified GNU GPL v2; see COPYING and COPYING2.

Function Documentation

apop_data* apop_data_sort ( apop_data data,
apop_data sort_order,
char  asc,
char  inplace,
double *  col_order 
)

Sort an apop_data set on an arbitrary sequence of columns.

The sort_order set is a one-row data set that should look like the data set being sorted. The easiest way to generate it is to use Apop_r to pull one row of the table, then copy and fill it. For each column you want used in the sort, assign a ranking giving whether the column should be sorted first, second, .... Columns you don't want used in the sorting should be set to NAN. Ties are broken by the earlier element in the default order (see below).

E.g., to sort by the last column of a five-column matrix first, then the next-to-last column, then the next-to-next-to-last, then by the first text column, then by the second text column:

1 apop_data *sort_order = apop_data_copy(Apop_r(data, 0));
2 sort_order->vector = NULL; //so it will be skipped.
3 Apop_data_fill(sort_order, NAN, NAN, 3, 2, 1);
4 apop_text_add(sort_order, 0, 0, "4");
5 apop_text_add(sort_order, 0, 1, "5");
6 apop_data_sort(data, sort_order);

I use only comparisons, not the actual numeric values, so you can use any sequence of numbers: (1, 2, 3) and (-1.32, 0, 27) work identically.

  • Strings are sorted case-insensitively, using strcasecmp. [exercise for the reader: modify the source to use Glib's locale-correct string sorting.]
  • The setup generates a lexicographic sort using the columns you specify. If you would like a different sort order, such as Euclidian distance to the origin, you can generate a new column expressing your preferred metric, and then sorting on that. See the example below.
Parameters
dataThe data set to be sorted. If NULL, this function is a no-op that returns NULL.
sort_orderA apop_data set describing the order in which columns are used for sorting, as above. If NULL, then sort by the vector, then each matrix column, then text, then weights, then row names.
inplaceIf 'n', make a copy, else sort in place. (default: 'y').
ascIf 'a', ascending; if 'd', descending. This is applied to all columns; column-by-column application is to do. (default: 'a').
col_orderFor internal use only. In your call, it should be NULL; the Designated initializers syntax will takes care of it for you.
Returns
A pointer to the sorted data set. If inplace=='y' (the default), then this is the same as the input set.

A few examples:

#ifdef Datadir
#define DATADIR Datadir
#else
#define DATADIR "."
#endif
#include <apop.h>
#include <unistd.h>
#ifdef Testing
#include "sort_tests.c" //For Apophenia's test suite, some tedious checks that the sorts worked
#endif
//get_distance is for the sort-by-Euclidian distance example below.
double get_distance(gsl_vector *v) {return apop_vector_distance(v);}
int main(){
apop_text_to_db( DATADIR "/" "amash_vote_analysis.csv" , .tabname="amash_vote_analysis");
apop_data *d = apop_query_to_mixed_data("mntmtm", "select 1,id,party,contribs/1000.0,vote,ideology from amash_vote_analysis ");
//use the default order of columns for sorting
apop_data *sorted = apop_data_sort(d, .inplace='n');
#ifndef Testing
apop_data_print(sorted);
#else
check_sorting1(sorted);
#endif
//set up a specific column order
perm->vector = NULL;
apop_data_fill(perm, 5, 3, 4);
apop_text_add(perm, 0, 0, "2");
apop_text_add(perm, 0, 1, "1");
apop_data_sort(d, perm);
#ifndef Testing
#else
check_sorting2(d);
#endif
//sort a list of names
apop_data_add_names(blank, 'r', "C", "E", "A");
assert(*blank->names->row[0] == 'A');
assert(*blank->names->row[1] == 'C');
assert(*blank->names->row[2] == 'E');
//take each row of the matrix as a vector; store the Euclidian distance to the origin in the vector;
//sort in descending order.
apop_data *rowvectors = apop_text_to_data( DATADIR "/" "test_data" );
apop_map(rowvectors, .fn_v=get_distance, .part='r', .inplace='y');
apop_data *arow = apop_data_copy(Apop_r(rowvectors, 0));
arow->matrix=NULL; //sort only by the distance vector
apop_data_sort(rowvectors, arow, .asc='d');
#ifndef Testing
apop_data_show(rowvectors);
#else
double prev = INFINITY;
for (int i=0; i< rowvectors->vector->size; i++){
double this = apop_data_get(rowvectors, i, -1);
assert(this < prev);
prev = this;
}
#endif
}

Autogenerated by doxygen on Sun Oct 26 2014 (Debian 0.999b+ds3-2).