Modeling with Data has a full tutorial for C, aimed at users of standard stats packages. More nuts-and-bolts tutorials are available in abundance. Some people find pointers to be especially difficult; fortunately, there's a claymation cartoon which clarifies everything.
There is only one header. Put
at the top of your file, and you're done. Everything declared in that file starts with
Apop_. It also includes
You will need to link to the Apophenia library, which involves adding the
-lapophenia flag to your compiler. Apophenia depends on SQLite3 and the GNU Scientific Library (which depends on a BLAS), so you will probably need something like:
Your best bet is to encapsulate this mess in a Makefile. Even if you are using an IDE and its command-line management tools, see the Makefile page for notes on useful flags.
To the best of our abilities, Apophenia complies with the C standard (ISO/IEC 9899:2011). As well as relying on the GSL and SQLite, it uses some POSIX function calls, such as
For example, apop_data_copy detects allocation errors and some circular links (when
Data->more == Data) and fails in those cases. You could thus use the function with a form like
There is sometimes (but not always) benefit to handling specific error codes, which are listed in the documentation of those functions that set the
error element. E.g.,
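A minimal sketch of both patterns, written against a hypothetical demo_data struct rather than apop_data so that it compiles without the library. The names demo_copy and demo_copy_status and the one-letter codes 'a' and 'c' are assumptions for illustration; consult the documentation of the real function for the codes it actually sets.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a data set with a chained page and a status
   code. This is NOT Apophenia's apop_data definition. */
typedef struct demo_data {
    double value;
    struct demo_data *more; /* optional next page */
    char error;             /* 0 on success, else a one-letter code */
} demo_data;

/* Free a chain of pages. Only safe on chains without cycles. */
static void demo_free(demo_data *d) {
    while (d) {
        demo_data *next = d->more;
        free(d);
        d = next;
    }
}

/* Copy a chain. Returns NULL if there is nothing to copy or the first
   allocation fails. Otherwise the returned struct has error == 'c' if a
   page links to itself, or 'a' if a later page could not be allocated.
   Like the text says of the real function, only some circular links
   (here, direct self-links) are detected. */
demo_data *demo_copy(const demo_data *in) {
    if (!in) return NULL;
    demo_data *out = calloc(1, sizeof *out);
    if (!out) return NULL;
    if (in->more == in) {
        out->error = 'c';
        return out;
    }
    out->value = in->value;
    if (in->more) {
        out->more = demo_copy(in->more);
        if (!out->more) out->error = 'a';
        else if (out->more->error) out->error = out->more->error;
    }
    return out;
}

/* Caller-side pattern: first check for NULL, then for a set error element.
   Returns -1 if no struct came back, else the error code (0 means success). */
int demo_copy_status(const demo_data *in) {
    demo_data *cp = demo_copy(in);
    if (!cp) return -1;
    int status = cp->error;
    demo_free(cp);
    return status;
}
```

A caller that only needs to know whether the copy worked can test for a nonzero status; one that wants to react differently to a circular link than to an allocation failure can compare the status against the specific codes.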
The end of Appendix O of Modeling with Data offers some GDB macros which can make dealing with Apophenia from the GDB command line much more pleasant. As discussed below, it also helps to set
apop_opts.stop_on_warning to
'w' when running under the debugger.
The global variable
apop_opts.verbose determines how many notifications and warnings get printed by Apophenia's warning mechanism:
-1: turn off logging, print nothing (ill-advised)
0: notify only of failures and clear danger
1: warn of technically correct but odd situations that might indicate, e.g., numeric instability
2: debugging-type information; print queries
3: give me everything, such as the state of the data at each iteration of a loop.
These levels are of course subjective, but should give you some idea of where to place the verbosity level. The default is 1.
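The scale above amounts to a threshold test, which can be sketched as follows. The variable demo_verbose is a hypothetical stand-in for apop_opts.verbose, and demo_would_print is an illustrative name, not a library function; the library's own macros may differ in detail.

```c
#include <stdbool.h>

/* Hypothetical stand-in for apop_opts.verbose; the text gives 1 as the default. */
static int demo_verbose = 1;

/* A message tagged with msg_level (0 through 3 on the scale above) is
   printed when the verbosity setting is at least that level. A setting
   of -1 is below every level, so nothing prints. */
bool demo_would_print(int msg_level) {
    if (demo_verbose < 0) return false;
    return demo_verbose >= msg_level;
}
```

Under the default setting, this sketch prints level-0 and level-1 messages and filters out the level-2 and level-3 diagnostics.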
The messages are printed to the
FILE* handle at
apop_opts.log_file. If this is blank (which happens at startup), then this is set to
stderr. This is the typical behavior for a console program. Use
to write to the
mylog file instead of
As well as the error and warning messages, some functions can also print diagnostics, using the Apop_notify macro. For example, apop_query and friends will print the query sent to the database engine iff
apop_opts.verbose >=2 (which is useful when building complex queries). The diagnostics attempt to follow the same verbosity scale as the warning messages.
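Here is a rough sketch of the logging behavior just described: a log handle that falls back to stderr when it is still blank, plus a verbosity check before anything is written. The names demo_log_file, demo_verbose, and demo_notify are hypothetical stand-ins, not the library's own symbols.

```c
#include <stdarg.h>
#include <stdio.h>

static FILE *demo_log_file = NULL; /* stand-in for apop_opts.log_file; blank at startup */
static int demo_verbose = 1;       /* stand-in for apop_opts.verbose */

/* Print a printf-style message if the verbosity setting admits msg_level.
   Returns the number of characters written, or 0 if the message was
   filtered out. */
int demo_notify(int msg_level, const char *fmt, ...) {
    if (demo_verbose < msg_level) return 0;
    if (!demo_log_file) demo_log_file = stderr; /* blank handle: use stderr */
    va_list ap;
    va_start(ap, fmt);
    int written = vfprintf(demo_log_file, fmt, ap);
    va_end(ap);
    fflush(demo_log_file);
    return written;
}
```

Setting demo_log_file = fopen("mylog", "w") before the first message would redirect everything to the mylog file; a query diagnostic sent at level 2 would appear there only once demo_verbose is 2 or higher.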
By default, warnings and errors never halt processing. It is up to the calling function to decide whether to stop.
When running the program under a debugger, this is an annoyance: we want to stop as soon as a problem turns up.
The global variable
apop_opts.stop_on_warning changes when the system halts:
'n': never halt. If you were using Apophenia to support a user-friendly GUI, for example, you would use this mode.
'\0' (the default): halt on severe errors, continue on all warnings.
'v': If the verbosity level of the warning is such that the warning would print to screen, then halt; if the warning message would be filtered out by your verbosity level, continue.
'w': Halt on all errors or warnings, including those below your verbosity threshold.
See the documentation for individual functions for details on how each reports errors to the caller and the level at which warnings are posted.
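The four modes can be summarized as a single decision function. This is a sketch under stated assumptions, not the library's implementation: demo_should_halt is a hypothetical name, and it assumes that a message at level 0 on the verbosity scale counts as a severe error while higher levels are warnings.

```c
#include <stdbool.h>

/* Decide whether to halt, given the stop_on_warning mode, the level of
   the message being posted, and the current verbosity setting.
   Assumption: msg_level == 0 marks a severe error. */
bool demo_should_halt(char stop_on_warning, int msg_level, int verbose) {
    switch (stop_on_warning) {
        case 'n': return false;                /* never halt */
        case 'w': return true;                 /* halt on every error or warning */
        case 'v': return verbose >= msg_level; /* halt only if the message would print */
        default:  return msg_level == 0;       /* '\0': halt on severe errors only */
    }
}
```

For a debugging session, mode 'w' makes every posted message a stopping point, so a breakpoint or core dump lands at the first sign of trouble.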
The output routines handle four sinks for your output. There is a global variable that you can use for small projects where all data will go to the same place.
You can also set the output type, the name of the output file or table, and other options via arguments to individual calls to output functions. See apop_prep_output for the list of options.
C makes minimal distinction between pipes and files, so you can set a pipe or file as output and send all output there until further notice:
Continuing the example, you can always override the global data with a specific request:
I will first look to the input file name, then the input pipe, then the global
output_pipe, in that order, to determine where I should write. Some combinations (like output type =
'd' and only a pipe) don't make sense, and I'll try to warn you about those.
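The lookup order can be sketched as a small function. Everything here is hypothetical: demo_pick_sink and the demo_sink enum are illustrative names, an empty file name is treated as unset, and the final fallback to stdout is an assumption about what happens when nothing is specified.

```c
#include <stdio.h>

/* Which destination wins, in the order described above. */
typedef enum {
    SINK_FILE,        /* a file name given in the call */
    SINK_CALL_PIPE,   /* a pipe given in the call */
    SINK_GLOBAL_PIPE, /* the global output pipe */
    SINK_STDOUT       /* assumed fallback when nothing is set */
} demo_sink;

demo_sink demo_pick_sink(const char *file_name, FILE *call_pipe, FILE *global_pipe) {
    if (file_name && file_name[0]) return SINK_FILE;
    if (call_pipe) return SINK_CALL_PIPE;
    if (global_pipe) return SINK_GLOBAL_PIPE;
    return SINK_STDOUT;
}
```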
What if you have too much output and would like to use a pager, like
less or more? In C and POSIX terminology, you're asking to pipe your output to a paging program. Here is the form:
popen will search your usual program path for
less, so you don't have to give a full path.
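As a library-free sketch of the same idea, the function below opens a pipe to a pager with the POSIX popen call, writes some lines to it, and closes it. The name demo_page_lines is hypothetical; in an Apophenia program you would hand the opened pipe to the output routines rather than calling fprintf yourself.

```c
#define _POSIX_C_SOURCE 200809L /* exposes popen and pclose under strict -std flags */
#include <stdio.h>

/* Send n lines to the program named by pager_cmd. Returns the status
   reported by pclose, or -1 if the pipe could not be opened. */
int demo_page_lines(const char *pager_cmd, const char *const *lines, size_t n) {
    FILE *pager = popen(pager_cmd, "w"); /* searched for on the usual program path */
    if (!pager) return -1;
    for (size_t i = 0; i < n; i++)
        fprintf(pager, "%s\n", lines[i]);
    return pclose(pager); /* waits for the pager to exit */
}
```

Calling demo_page_lines("less", lines, n) from a terminal session pages the output; any other command that reads standard input works the same way.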
For a reference, your best bet is the Structured Query Language reference for SQLite. For a tutorial, there is an abundance of tutorials online. Here is a nice blog entry about complementarities between SQL and matrix manipulation packages.
Apophenia currently supports two database engines: SQLite and MySQL/MariaDB. SQLite is the default, because it is simpler and generally more easygoing than MySQL, and supports in-memory databases.
apop_opts.db_engine is initially
NULL, indicating no preference for a database engine. You can explicitly set it:
If apop_opts.db_engine is still
NULL on your first database operation, then I will check for an environment variable
APOP_DB_ENGINE, and set
apop_opts.db_engine='m' if it is found and matches (case insensitive)
"row_name"), then a so-named column is created, and the row names are placed there.
(data, "tabname", .output_type='d', .output_append='w') to overwrite an existing table or with
.output_append='a' to append. Appending is the default. Or, call apop_table_exists
("tabname", 'd') to ensure that the table is removed ahead of time.
Finally, Apophenia provides a few nonstandard SQL functions to facilitate math via database; see Database moments (plus pow()!).
Apophenia uses OpenMP for threading. You generally do not need to know how OpenMP works to use Apophenia, and many points of work will thread without your doing anything.
gsl_matrix at the same time, you're going to have problems.
N with the environment variable
or the C function
Use one of these methods with
N=1 if you want a single-threaded program. You can return later to using all available threads via
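In standard OpenMP, the thread count is controlled by the OMP_NUM_THREADS environment variable (for example, export OMP_NUM_THREADS=1 in a POSIX shell) or by the omp_set_num_threads function. The sketch below assumes those are the controls meant here. The wrappers demo_limit_threads and demo_use_all_threads are hypothetical names, and the stub branch exists only so that the example still builds when the compiler is run without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stubs for builds without OpenMP: always one thread. */
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_max_threads(void) { return 1; }
static int omp_get_num_procs(void) { return 1; }
#endif

/* Cap the number of threads used by later parallel regions at n and
   report the resulting maximum. Use n = 1 for a single-threaded run. */
int demo_limit_threads(int n) {
    omp_set_num_threads(n);
    return omp_get_max_threads();
}

/* Return to one thread per available processor. */
int demo_use_all_threads(void) {
    omp_set_num_threads(omp_get_num_procs());
    return omp_get_max_threads();
}
```

A typical use is to call demo_limit_threads(1) before handing a thread-unsafe function to a routine that would otherwise run it in parallel, then call demo_use_all_threads() afterward.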
for loop over the input apop_data set across multiple threads. Therefore, be careful to send thread-unsafe functions to it only after calling
gsl_rng, you can parallelize functions that make random draws.
apop_opts.rng_seed, then incrementing that seed by one. You thus probably have threads with seeds 479901, 479902, 479903, .... [If you have a better way to do it, please feel free to modify the code to implement your improvement and submit a pull request on GitHub.]
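The seed-then-increment scheme can be sketched without the GSL. Here demo_rng_seed is a hypothetical stand-in for apop_opts.rng_seed, started at 479901 to match the example seeds above, and the function names are illustrative only.

```c
/* Hypothetical stand-in for apop_opts.rng_seed. */
static unsigned long demo_rng_seed = 479901;

/* Hand out the current seed, then increment the global by one.
   Not thread-safe: call it serially, before the parallel region starts. */
unsigned long demo_next_seed(void) {
    return demo_rng_seed++;
}

/* Fill seeds[0 .. thread_ct-1] with one distinct seed per thread. */
void demo_seed_threads(unsigned long *seeds, int thread_ct) {
    for (int i = 0; i < thread_ct; i++)
        seeds[i] = demo_next_seed();
}
```

Each thread would then initialize its own generator from its entry in seeds, so no two threads share generator state.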
See this tutorial on C threading if you would like to know more, or are unsure about whether your functions are thread-safe or not.