Learning C

Modeling with Data has a full tutorial for C, oriented toward users of standard stats packages. More nuts-and-bolts tutorials are available in abundance. Some people find pointers especially difficult; fortunately, there's a claymation cartoon which clarifies everything.

Header aggregation

There is only one header. Put

#include <apop.h>

at the top of your file, and you're done. Everything declared in that file starts with apop_ or Apop_. It also includes assert.h, math.h, signal.h, and string.h.

Linking

You will need to link to the Apophenia library, which involves adding the -lapophenia flag to your compiler. Apophenia depends on SQLite3 and the GNU Scientific Library (which depends on a BLAS), so you will probably need something like:

gcc sample.c -lapophenia -lsqlite3 -lgsl -lgslcblas -o run_me -g -Wall -O3

Your best bet is to encapsulate this mess in a Makefile. Even if you are using an IDE and its command-line management tools, see the Makefile page for notes on useful flags.

Standards compliance

To the best of our abilities, Apophenia complies with the C standard (ISO/IEC 9899:2011). As well as relying on the GSL and SQLite, it uses some POSIX function calls, such as strcasecmp and popen.

Designated initializers

Errors, logging, debugging and stopping

The error element

The apop_data set and the apop_model both include an element named error. It is normally 0, indicating no (known) error.

For example, apop_data_copy detects allocation errors and some circular links (when Data->more == Data) and fails in those cases. You could thus use the function with a form like

apop_data *d = apop_text_to_data("indata");
apop_data *cp = apop_data_copy(d);
if (cp->error) {printf("Couldn't copy the input data; failing.\n"); return 1;}

There is sometimes (but not always) benefit to handling specific error codes, which are listed in the documentation of those functions that set the error element. E.g.,

apop_data *d = apop_text_to_data("indata");
apop_data *cp = apop_data_copy(d);
if (cp->error == 'a') {printf("Couldn't allocate space for the copy; failing.\n"); return 1;}
if (cp->error == 'c') {printf("Circular link in the data set; failing.\n"); return 2;}

The end of Appendix O of Modeling with Data offers some GDB macros which can make dealing with Apophenia from the GDB command line much more pleasant. As discussed below, it also helps to set apop_opts.stop_on_warning='v' or 'w' when running under the debugger.

Verbosity level and logging

The global variable apop_opts.verbose determines how many notifications and warnings get printed by Apophenia's warning mechanism:

-1: turn off logging, print nothing (ill-advised)
0: notify only of failures and clear danger
1: warn of technically correct but odd situations that might indicate, e.g., numeric instability
2: debugging-type information; print queries
3: give me everything, such as the state of the data at each iteration of a loop.

These levels are of course subjective, but should give you some idea of where to place the verbosity level. The default is 1.

The messages are printed to the FILE* handle at apop_opts.log_file. If this is NULL (as it is at startup), it is set to stderr, which is the typical behavior for a console program. Use

apop_opts.log_file = fopen("mylog", "w");

to write to the mylog file instead of stderr.
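fopen returns NULL when the file cannot be opened, so it may be worth guarding the assignment. The following is a minimal sketch in standard C; the helper name open_log_or_stderr is hypothetical and is not part of Apophenia.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of Apophenia: open a log file for writing,
   falling back to stderr if the file cannot be opened. */
FILE *open_log_or_stderr(const char *path) {
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    return f ? f : stderr;   /* never hand back a NULL handle */
}
```

With this in place, one could write apop_opts.log_file = open_log_or_stderr("mylog"); and the log would still reach stderr if mylog is not writable.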

As well as the error and warning messages, some functions can also print diagnostics, using the Apop_notify macro. For example, apop_query and friends will print the query sent to the database engine iff apop_opts.verbose >=2 (which is useful when building complex queries). The diagnostics attempt to follow the same verbosity scale as the warning messages.

Stopping

By default, warnings and errors never halt processing. It is up to the calling function to decide whether to stop.

When running the program under a debugger, this is an annoyance: we want to stop as soon as a problem turns up.

The global variable apop_opts.stop_on_warning changes when the system halts:

'n': never halt. If you were using Apophenia to support a user-friendly GUI, for example, you would use this mode.
'\0' (the default): halt on severe errors, continue on all warnings.
'v': If the verbosity level of the warning is such that the warning would print to screen, then halt; if the warning message would be filtered out by your verbosity level, continue.
'w': Halt on all errors or warnings, including those below your verbosity threshold.

See the documentation for individual functions for details on how each reports errors to the caller and the level at which warnings are posted.
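To make the interaction between the verbosity level and apop_opts.stop_on_warning concrete, here is a small model of the rules listed above. It is an illustration only, not Apophenia's source code. The function name would_halt and its parameters are hypothetical, and it assumes that a message is printed exactly when its level is at or below the current verbosity.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only; this is not Apophenia's source code.
   stop_on_warning: the mode character ('n', '\0', 'v', or 'w').
   verbose:         the current verbosity threshold (-1 to 3).
   msg_level:       the verbosity level at which the message is posted.
   is_error:        true for a severe error, false for a warning. */
bool would_halt(char stop_on_warning, int verbose, int msg_level, bool is_error) {
    assert(msg_level >= 0);
    switch (stop_on_warning) {
        case 'n': return false;                /* never halt */
        case 'w': return true;                 /* halt on every error or warning */
        case 'v': return msg_level <= verbose; /* halt only if the message would print */
        default:  return is_error;             /* '\0': halt on severe errors only */
    }
}
```

Under this model, a level-2 diagnostic with verbosity 1 is filtered out and does not halt in 'v' mode, but it does halt in 'w' mode.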

Legible output

The output routines handle four sinks for your output. There is a global variable that you can use for small projects where all data will go to the same place.

apop_opts.output_type = 'f'; //named file
apop_opts.output_type = 'p'; //a pipe or already-opened file
apop_opts.output_type = 'd'; //the database

You can also set the output type, the name of the output file or table, and other options via arguments to individual calls to output functions. See apop_prep_output for the list of options.

C makes minimal distinction between pipes and files, so you can set a pipe or file as output and send all output there until further notice:

apop_opts.output_type = 'p';
apop_opts.output_pipe = popen("gnuplot", "w");
apop_plot_lattice(...); //see https://github.com/b-k/Apophenia/wiki/gnuplot_snippets
pclose(apop_opts.output_pipe); //streams opened with popen are closed with pclose
apop_opts.output_pipe = fopen("newfile", "w");
apop_data_print(set1);
fprintf(apop_opts.output_pipe, "\nNow set 2:\n");
apop_data_print(set2);

Continuing the example, you can always override the global data with a specific request:

apop_vector_print(v, "vectorfile"); //put vectors in a separate file
apop_matrix_print(m, "matrix_table", .output_type = 'd'); //write to the db
apop_matrix_print(m, .output_pipe = stdout);  //now show the same matrix on screen

I will first look to the input file name, then the input pipe, then the global output_pipe, in that order, to determine to where I should write. Some combinations (like output type = 'd' and only a pipe) don't make sense, and I'll try to warn you about those.

What if you have too much output and would like to use a pager, like less or more? In C and POSIX terminology, you're asking to pipe your output to a paging program. Here is the form:

FILE *lesspipe = popen("less", "w");
assert(lesspipe);
apop_data_print(your_data_set, .output_pipe=lesspipe);
pclose(lesspipe);

popen will search your usual program path for less, so you don't have to give a full path.
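The popen/pclose pairing can be wrapped in a small function that also checks for failure. This is a sketch using only POSIX calls; the helper name write_to_command is hypothetical, and in real use the write step would be a call such as apop_data_print with .output_pipe set to the open pipe.

```c
#define _POSIX_C_SOURCE 200809L  /* expose popen/pclose under strict -std=c11 */
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: send text to a shell command through a pipe.
   Returns the status reported by pclose, or -1 if the pipe could not
   be opened or the write failed. */
int write_to_command(const char *command, const char *text) {
    assert(command != NULL && text != NULL);
    FILE *pipe = popen(command, "w");
    if (!pipe) return -1;
    int write_failed = (fputs(text, pipe) == EOF);
    int status = pclose(pipe);   /* pclose, not fclose, for a popen'd stream */
    return write_failed ? -1 : status;
}
```

Note that popen typically succeeds even when the command does not exist, because the shell itself starts; the failure then shows up as a nonzero status from pclose.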

About SQL, the syntax for querying databases

For a reference, your best bet is the Structured Query Language reference for SQLite. For a tutorial, there is an abundance of options online. Here is a nice blog entry about complementarities between SQL and matrix manipulation packages.

Apophenia currently supports two database engines: SQLite and mySQL/mariaDB. SQLite is the default, because it is simpler and generally more easygoing than mySQL, and supports in-memory databases.

The global apop_opts.db_engine is initially NUL ('\0'), indicating no preference for a database engine. You can explicitly set it:

apop_opts.db_engine='s'; //use SQLite

apop_opts.db_engine='m'; //use mySQL/mariaDB

If apop_opts.db_engine is still NUL on your first database operation, then I will check for an environment variable APOP_DB_ENGINE, and set apop_opts.db_engine='m' if it is found and matches (case insensitive) mariadb or mysql.

export APOP_DB_ENGINE=mariadb
apop_text_to_db indata mtab db_for_maria
unset APOP_DB_ENGINE
apop_text_to_db indata stab db_for_sqlite.db
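The environment check described above amounts to a case-insensitive string comparison. Here is a sketch of that logic using getenv and the POSIX strcasecmp; it is not Apophenia's source, and the function name engine_from_env is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strcasecmp/setenv under strict -std=c11 */
#include <assert.h>
#include <stdlib.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Illustrative sketch, not Apophenia's source. Returns 'm' if the named
   environment variable is set to "mariadb" or "mysql" (in any case),
   and '\0' (no preference) otherwise. */
char engine_from_env(const char *varname) {
    assert(varname != NULL);
    const char *value = getenv(varname);
    if (value && (!strcasecmp(value, "mariadb") || !strcasecmp(value, "mysql")))
        return 'm';
    return '\0';
}
```

Called as engine_from_env("APOP_DB_ENGINE"), this gives 'm' after the first export in the shell example and '\0' after the unset.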

Write apop_data sets to the database using apop_data_print, with .output_type='d'.

Column names are inserted if there are any. If there are, all dots are converted to underscores. Otherwise, the columns will be named c1, c2, c3, &c.
If apop_opts.db_name_column is not blank (the default is "row_name"), then a so-named column is created, and the row names are placed there.
If there are weights, they will be the last column of the table, and the column will be named weights.
If the table does not exist, create it. Use apop_data_print (data, "tabname", .output_type='d', .output_append='w') to overwrite an existing table or with .output_append='a' to append. Appending is the default. Or, call apop_table_exists ("tabname", 'd') to ensure that the table is removed ahead of time.

If your data set has zero data (i.e., is just a list of column names or is entirely blank), apop_data_print returns without creating anything in the database.
Especially if you are using a pre-2007 version of SQLite, there may be a speed gain to wrapping the call to this function in a begin/commit pair:

apop_query("begin;");
apop_data_print(dataset, .output_name="dbtab", .output_type='d');
apop_query("commit;");
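The column-naming rules given earlier in this section (dots become underscores; unnamed columns become c1, c2, c3, ...) can be sketched as follows. This is an illustration of the rule and not the library's code. The function db_column_name is hypothetical, and it applies the fallback per column, whereas the rule above is stated for a data set with no column names at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch, not Apophenia's source: write into out the name a
   column would take in the database. A non-empty name is copied with every
   dot replaced by an underscore; otherwise the name is "c" followed by the
   1-based column index. */
void db_column_name(const char *name, int index, char *out, size_t out_len) {
    assert(out != NULL && out_len > 0);
    if (name && name[0] != '\0') {
        snprintf(out, out_len, "%s", name);               /* bounded copy */
        for (char *p = out; (p = strchr(p, '.')) != NULL; )
            *p = '_';                                     /* dots become underscores */
    } else {
        snprintf(out, out_len, "c%d", index);
    }
}
```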

Finally, Apophenia provides a few nonstandard SQL functions to facilitate math via database; see Database moments (plus pow()!).

Threading

Apophenia uses OpenMP for threading. You generally do not need to know how OpenMP works to use Apophenia, and many points of work will thread without your doing anything.

All functions strive to be thread-safe. Part of how this is achieved is that static variables are marked as thread-local or atomic, as per the C standard. There still exist compilers that can't implement thread-local or atomic variables, in which case your safest bet is to set OMP's thread count to one as below (or get a new compiler).

Some functions modify their inputs. It is up to you to use those functions in a thread-safe manner. For example, apop_matrix_realloc handles states and global variables correctly in a threaded environment, but if you have two threads resizing the same gsl_matrix at the same time, you're going to have problems.

There are few compilers that don't support OpenMP. When compiling on such a system all work will be single-threaded.

Set the maximum number of threads to N with the environment variable

export OMP_NUM_THREADS=N

or the C function

#include <omp.h>

omp_set_num_threads(N);

Use one of these methods with N=1 if you want a single-threaded program. You can return later to using all available threads via omp_set_num_threads(omp_get_num_procs()).
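If the same source must also build with a compiler that lacks OpenMP, the call can be guarded with the standard _OPENMP macro, which compilers define when OpenMP is enabled. This is a sketch; the helper name limit_threads is hypothetical.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: cap the OpenMP thread count at n and report the
   resulting maximum. Without OpenMP support the program is
   single-threaded, so the answer is always 1. */
int limit_threads(int n) {
    assert(n >= 1);
#ifdef _OPENMP
    omp_set_num_threads(n);
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Calling limit_threads(1) before handing a thread-unsafe function to a threaded routine gives single-threaded behavior whether or not OpenMP is available.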

apop_map and friends distribute their for loop over the input apop_data set across multiple threads. Therefore, be careful to send thread-unsafe functions to it only after calling omp_set_num_threads(1).

There are a few functions, like apop_model_draws, that rely on apop_map, and therefore also thread by default.

The function apop_rng_get_thread retrieves a statically-stored RNG specific to a given thread. Therefore, if you use that function in the place of a gsl_rng, you can parallelize functions that make random draws.

apop_rng_get_thread allocates its store of RNGs using apop_opts.rng_seed, incrementing that seed by one after each allocation. You thus probably have threads with seeds 479901, 479902, 479903, .... [If you have a better way to do it, please feel free to modify the code to implement your improvement and submit a pull request on GitHub.]

See this tutorial on C threading if you would like to know more, or are unsure about whether your functions are thread-safe or not.