Learning C
Modeling with Data has a full tutorial for C, oriented at users of standard stats packages. More nuts-and-bolts tutorials are in abundance. Some people find pointers to be especially difficult; fortunately, there's a claymation cartoon which clarifies everything.
Header aggregation
There is only one header. Put #include <apop.h> at the top of your file, and you're done. Everything declared in that file starts with apop_ or Apop_. It also includes assert.h, math.h, signal.h, and string.h.
Linking
You will need to link to the Apophenia library, which involves adding the -lapophenia flag to your compiler. Apophenia depends on SQLite3 and the GNU Scientific Library (which depends on a BLAS), so you will probably need something like -lapophenia -lgsl -lgslcblas -lsqlite3 on your compile line; the BLAS flag in particular may differ on your system.
Your best bet is to encapsulate this mess in a Makefile. Even if you are using an IDE and its command-line management tools, see the Makefile page for notes on useful flags.
Standards compliance
To the best of our abilities, Apophenia complies with the C standard (ISO/IEC 9899:2011). As well as relying on the GSL and SQLite, it uses some POSIX function calls, such as strcasecmp and popen.
The error element
The apop_data set and the apop_model both include an element named error. It is normally 0, indicating no (known) error.
For example, apop_data_copy detects allocation errors and some circular links (when Data->more == Data) and fails in those cases. You could thus check the error element of the returned copy right after the call, and stop or recover if it is nonzero.
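There is sometimes (but not always) benefit to handling specific error codes, which are listed in the documentation of those functions that set the error element. E.g., a caller of apop_data_copy might respond to an allocation failure differently than to a circular link.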
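The check-the-error-element pattern can be sketched without Apophenia at all. Everything here is a hypothetical stand-in: toy_data is not apop_data, toy_copy is not apop_data_copy, and the codes 'a' and 'c' are illustrative choices; the real codes are listed in each function's documentation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct with an error element; not apop_data. */
typedef struct toy_data {
    double value;
    struct toy_data *more;
    char error;               /* 0 means no (known) error */
} toy_data;

/* Copy one element. On trouble, hand back a struct whose error element is
   set instead of halting. 'a' = allocation failure, 'c' = circular link;
   both codes are assumptions made for this sketch. */
toy_data *toy_copy(const toy_data *in) {
    static toy_data alloc_failed = {.error = 'a'};
    toy_data *out = malloc(sizeof *out);
    if (!out) return &alloc_failed;
    *out = *in;
    out->more = NULL;
    out->error = (in->more == in) ? 'c' : 0;
    return out;
}

/* The caller decides what each code means for the program. */
int toy_copy_status(const toy_data *in) {
    toy_data *cp = toy_copy(in);
    if (cp->error == 'a') {   /* static sentinel: nothing to free */
        fprintf(stderr, "Couldn't allocate space for the copy.\n");
        return 1;
    }
    int status = 0;
    if (cp->error == 'c') {
        fprintf(stderr, "Circular link in the input.\n");
        status = 2;
    }
    free(cp);
    return status;
}
```

A caller that does not care which error occurred can test the element for any nonzero value and stop there.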
The end of Appendix O of Modeling with Data offers some GDB macros which can make dealing with Apophenia from the GDB command line much more pleasant. As discussed below, it also helps to set apop_opts.stop_on_warning='v' or 'w' when running under the debugger.
The global variable apop_opts.verbose determines how many notifications and warnings get printed by Apophenia's warning mechanism:
-1: turn off logging, print nothing (ill-advised)
0: notify only of failures and clear danger
1: warn of technically correct but odd situations that might indicate, e.g., numeric instability
2: debugging-type information; print queries
3: give me everything, such as the state of the data at each iteration of a loop.
These levels are of course subjective, but should give you some idea of where to place the verbosity level. The default is 1.
The messages are printed to the FILE* handle at apop_opts.log_file. If this is blank (which happens at startup), then it is set to stderr, which is the typical behavior for a console program. Use apop_opts.log_file = fopen("mylog", "w"); to write to the mylog file instead of stderr.
As well as the error and warning messages, some functions can also print diagnostics, using the Apop_notify macro. For example, apop_query and friends will print the query sent to the database engine iff apop_opts.verbose >= 2 (which is useful when building complex queries). The diagnostics attempt to follow the same verbosity scale as the warning messages.
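As a rough illustration of how a verbosity threshold and a log handle interact, here is a minimal self-contained sketch. The names toy_verbose, toy_log_file, and toy_notify are hypothetical stand-ins, not Apophenia's variables or its Apop_notify macro, and the real mechanism may differ in detail.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the two settings discussed above. */
static int toy_verbose = 1;         /* the default level, per the text */
static FILE *toy_log_file = NULL;   /* unset at startup */

/* Print msg if its level is within the current verbosity.
   Message levels are assumed to run from 0 (failures) up to 3.
   Returns 1 if the message was printed, 0 if it was filtered out. */
int toy_notify(int level, const char *msg) {
    if (level > toy_verbose) return 0;          /* too chatty for this setting */
    if (!toy_log_file) toy_log_file = stderr;   /* fall back to stderr on first use */
    fprintf(toy_log_file, "%s\n", msg);
    return 1;
}
```

With toy_verbose at 1, a level-2 message such as a query string is dropped, while a level-0 failure is printed; setting toy_verbose to -1 silences everything.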
By default, warnings and errors never halt processing. It is up to the calling function to decide whether to stop.
When running the program under a debugger, this is an annoyance: we want to stop as soon as a problem turns up.
The global variable apop_opts.stop_on_warning changes when the system halts:
'n': never halt. If you were using Apophenia to support a user-friendly GUI, for example, you would use this mode.
'\0' (the default): halt on severe errors, continue on all warnings.
'v': if the verbosity level of the warning is such that the warning would print to screen, then halt; if the warning message would be filtered out by your verbosity level, continue.
'w': halt on all errors or warnings, including those below your verbosity threshold.
See the documentation for individual functions for details on how each reports errors to the caller and the level at which warnings are posted.
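The four stop_on_warning modes amount to a small decision rule, which can be written out as follows. This is one reading of the list above, not Apophenia's implementation; toy_should_halt and its parameters are hypothetical names.

```c
/* Decide whether to halt, following the four modes described in the text.
   mode:      'n', '\0', 'v', or 'w'
   msg_level: the verbosity level at which the message is posted
   verbosity: the current verbosity setting
   severe:    nonzero if the message reports a severe error */
int toy_should_halt(char mode, int msg_level, int verbosity, int severe) {
    switch (mode) {
        case 'n': return 0;                        /* never halt */
        case 'v': return msg_level <= verbosity;   /* halt only if it would print */
        case 'w': return 1;                        /* halt on anything */
        default:  return severe != 0;              /* '\0': severe errors only */
    }
}
```

Under a debugger, 'v' is the gentler choice: a level-2 message with verbosity at 1 would not print, so it would not stop the run either.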
The output routines handle four sinks for your output. There is a global variable that you can use for small projects where all data will go to the same place.
You can also set the output type, the name of the output file or table, and other options via arguments to individual calls to output functions. See apop_prep_output for the list of options.
C makes minimal distinction between pipes and files, so you can set a pipe or file as output and send all output there until further notice:
Continuing the example, you can always override the global data with a specific request:
I will first look to the input file name, then the input pipe, then the global output_pipe, in that order, to determine where I should write. Some combinations (like output type = 'd' and only a pipe) don't make sense, and I'll try to warn you about those.
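The lookup order just described can be sketched as a small selection function. The enum and toy_pick_sink are hypothetical names for this illustration; what happens when nothing at all is set is not covered by the text above, so the sketch only reports that case.

```c
#include <stdio.h>

/* Hypothetical labels for where the output would go. */
typedef enum { SINK_FILE, SINK_PIPE, SINK_GLOBAL_PIPE, SINK_NONE } toy_sink;

/* Apply the stated precedence: a file name passed to the call, then a pipe
   passed to the call, then the global pipe. */
toy_sink toy_pick_sink(const char *file_name, FILE *pipe, FILE *global_pipe) {
    if (file_name)   return SINK_FILE;
    if (pipe)        return SINK_PIPE;
    if (global_pipe) return SINK_GLOBAL_PIPE;
    return SINK_NONE;   /* nothing set; behavior here is left unspecified */
}
```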
What if you have too much output and would like to use a pager, like less or more? In C and POSIX terminology, you're asking to pipe your output to a paging program. Here is the form:
popen will search your usual program path for less, so you don't have to give a full path.
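A minimal sketch of the piping step using only POSIX calls follows. The helper toy_pipe_text is a hypothetical name; in an Apophenia program, the open pipe would instead be handed to an output function as its pipe argument.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen and pclose */
#include <stdio.h>

/* Send text to the standard input of cmd through a pipe.
   Returns the status reported by pclose, or -1 if the pipe could not be opened. */
int toy_pipe_text(const char *cmd, const char *text) {
    FILE *p = popen(cmd, "w");   /* the shell searches the usual path for cmd */
    if (!p) return -1;
    fputs(text, p);
    return pclose(p);            /* waits for the command to finish */
}
```

Calling toy_pipe_text("less", long_text) would page the text; pclose does not return until the pager exits, so the program resumes only after you quit it.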
For a reference, your best bet is the Structured Query Language reference for SQLite. For a tutorial, there is an abundance online. Here is a nice blog entry about complementarities between SQL and matrix manipulation packages.
Apophenia currently supports two database engines: SQLite and mySQL/mariaDB. SQLite is the default, because it is simpler and generally more easygoing than mySQL, and supports in-memory databases.
The global apop_opts.db_engine is initially '\0', indicating no preference for a database engine. You can explicitly set it, e.g., apop_opts.db_engine='m' for mySQL/mariaDB.
If apop_opts.db_engine is still '\0' on your first database operation, then I will check for an environment variable APOP_DB_ENGINE, and set apop_opts.db_engine='m' if it is found and matches (case insensitive) mariadb or mysql.
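The environment check can be sketched with getenv and the POSIX strcasecmp. This is an illustration of the rule as described, not the library's own code; toy_engine_from_env is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>    /* getenv */
#include <strings.h>   /* strcasecmp (POSIX) */

/* Return 'm' if APOP_DB_ENGINE is set to mariadb or mysql (in any case),
   and '\0', meaning no preference, otherwise. */
char toy_engine_from_env(void) {
    const char *e = getenv("APOP_DB_ENGINE");
    if (e && (!strcasecmp(e, "mariadb") || !strcasecmp(e, "mysql")))
        return 'm';
    return '\0';
}
```

From a shell, exporting APOP_DB_ENGINE=mysql before running the program would select the second engine without a recompile.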
Write apop_data sets to the database using apop_data_print, with .output_type='d'.
[...] c1, c2, c3, &c.
[...] "row_name"), then a so-named column is created, and the row names are placed there.
[...] weights.
[...] apop_data_print(data, "tabname", .output_type='d', .output_append='w') to overwrite an existing table or with .output_append='a' to append. Appending is the default. Or, call apop_table_exists("tabname", 'd') to ensure that the table is removed ahead of time.
Finally, Apophenia provides a few nonstandard SQL functions to facilitate math via database; see Database moments (plus pow()!).
Apophenia uses OpenMP for threading. You generally do not need to know how OpenMP works to use Apophenia, and many points of work will thread without your doing anything.
[...] gsl_matrix at the same time, you're going to have problems.
Set the maximum number of threads to N with the environment variable OMP_NUM_THREADS or the C function omp_set_num_threads(N). Use one of these methods with N=1 if you want a single-threaded program. You can return later to using all available threads via omp_set_num_threads(omp_get_num_procs()).
[...] for loop over the input apop_data set across multiple threads. Therefore, be careful to send thread-unsafe functions to it only after calling omp_set_num_threads(1).
[...] gsl_rng, you can parallelize functions that make random draws.
[...] apop_opts.rng_seed, then incrementing that seed by one. You thus probably have threads with seeds 479901, 479902, 479903, .... [If you have a better way to do it, please feel free to modify the code to implement your improvement and submit a pull request on Github.]
See this tutorial on C threading if you would like to know more, or are unsure about whether your functions are thread-safe or not.
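Setting the thread count from C uses the standard OpenMP calls. The wrapper below, toy_set_threads, is a hypothetical helper; it is guarded so that it still compiles when OpenMP support is not switched on (with GCC, the switch is -fopenmp).

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Ask for at most n threads and report how many a later parallel region
   may use. Without OpenMP compiled in, the answer is always 1. */
int toy_set_threads(int n) {
#ifdef _OPENMP
    omp_set_num_threads(n);
    return omp_get_max_threads();
#else
    (void) n;   /* no threading available */
    return 1;
#endif
}
```

Calling toy_set_threads(1) before handing a thread-unsafe function to a threaded routine gives single-threaded behavior for the calls that follow.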