This reference section describes the assumptions made by apop_text_to_db and apop_text_to_data.
Each row of the file will be converted to one record in the database or one row in the matrix. Values on one row are separated by delimiters. Fixed-width input is also OK; see below.
By default, the delimiters are set to "|,\t", meaning that a pipe, comma, or tab will delimit separate entries. To change the default, use an argument to apop_text_to_db or apop_text_to_data like .delimiters=" \t" or .delimiters="|".
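Each character in the delimiter string acts as a separator on its own. The helper below makes that "any of these characters" rule concrete. It is hypothetical: count_fields is not part of Apophenia, and for brevity it ignores quoting and backslash escapes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not Apophenia's API): count the fields on one line,
   treating every character in delims as a separator. Quoting and escapes
   are ignored for brevity. Two separators in a row yield an empty field,
   so "1,,3" counts as three fields. */
int count_fields(const char *line, const char *delims) {
    assert(line != NULL && delims != NULL);
    int fields = 1;
    for (const char *p = line; *p != '\0'; p++)
        if (strchr(delims, *p) != NULL)   /* any listed character splits */
            fields++;
    return fields;
}
```

With the default set, count_fields("a|b,c\td", "|,\t") counts four fields. Switching to " \t" makes a space a separator and a pipe ordinary text.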
The input text file must be in UTF-8 or traditional ASCII encoding. Delimiters must be ASCII characters. If your data is in another encoding, try the POSIX-standard iconv program to filter the data to UTF-8.
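The iconv program also has a POSIX library counterpart, iconv(3). The following is a minimal sketch that assumes the source data is Latin-1 (ISO-8859-1). Encoding names accepted by iconv_open vary by platform, and on some systems you must link with -liconv. latin1_to_utf8 is a hypothetical helper name.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert a Latin-1 string to UTF-8 with iconv(3).
   Returns 0 on success, or -1 if the conversion is unsupported, the input
   is invalid, or out is too small. */
int latin1_to_utf8(const char *in, char *out, size_t outsize) {
    assert(in != NULL && out != NULL && outsize > 0);
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1) return -1;
    char *inp = (char *)in;          /* iconv does not write through inbuf */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;    /* reserve one byte for the terminator */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1) return -1;
    *outp = '\0';
    return 0;
}
```

For whole files, running the iconv program as a filter is usually simpler. The library call is useful when the conversion has to happen inside your own program.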
\li The character after a backslash is read as a normal character, even if it is a delimiter, #, or ".
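The backslash rule, together with the quoting and # comment rules listed next, can be illustrated with a small field reader. This is a minimal sketch and not Apophenia's own parser. read_field is a hypothetical name, and white-space trimming is left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical field reader: copy one field starting at *src into out.
   - A backslash makes the next character literal (even a delimiter, # or ").
   - Double-quoted text is copied verbatim and the quote marks are dropped.
   - An unquoted, unescaped # starts a comment.
   On return, *src points at the delimiter or # that ended the field, or at
   the end of the string. Returns the field length, or -1 if out is too small. */
int read_field(const char **src, const char *delims, char *out, size_t outsize) {
    assert(src != NULL && *src != NULL && delims != NULL && out != NULL && outsize > 0);
    const char *p = *src;
    size_t n = 0;
    int in_quotes = 0;
    while (*p != '\0') {
        char c = *p;
        if (c == '\\' && p[1] != '\0') {          /* escaped: take next char as-is */
            c = p[1];
            p += 2;
        } else if (c == '"') {                    /* toggle quoting, drop the mark */
            in_quotes = !in_quotes;
            p++;
            continue;
        } else if (!in_quotes && (c == '#' || strchr(delims, c) != NULL)) {
            break;                                /* end of field or start of comment */
        } else {
            p++;
        }
        if (n + 1 >= outsize) return -1;
        out[n++] = c;
    }
    out[n] = '\0';
    *src = p;
    return (int)n;
}
```

A caller would loop over a line as follows. If *src stops at a delimiter, it skips that delimiter and reads the next field. If *src stops at # or at the end of the string, the rest of the line is done.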
\li If a field contains several such special characters, surround it by double quotes ("). The surrounding marks are stripped and the text is read verbatim.
\li Everything after a # is taken to be a comment and ignored.
\li If you are reading into the gsl_matrix element of an apop_data set, all text fields are taken as zeros. You will be warned of such substitutions unless you set apop_opts.verbose=0 beforehand. For mixed text/numeric data, try using apop_text_to_db and then apop_query_to_mixed_data.
\li Two delimiters in a row, as in "23, 843, , 2, 41", are taken to mean a missing value, which is read as NaN (or, for SQL, NULL). If this rule doesn't work for your situation, you can explicitly mark each missing data point in the file before reading it, e.g., by replacing every empty field with NaN.
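One way to mark missing data explicitly is a pre-processing filter that writes a marker into every empty field. The sketch below is hypothetical (mark_missing is not an Apophenia function). It assumes a single delimiter character and no quoting, and it should be called only on non-blank lines.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pre-processing filter: copy line into out, writing marker
   into every empty field (two delimiters in a row, or a delimiter at either
   end of the line). Quoting and escapes are ignored. Returns 0 on success,
   or -1 if out is too small. */
int mark_missing(const char *line, char delim, const char *marker,
                 char *out, size_t outsize) {
    assert(line != NULL && marker != NULL && out != NULL && outsize > 0);
    assert(delim != '\0');
    size_t n = 0, mlen = strlen(marker);
    const char *field = line;
    for (;;) {
        const char *end = strchr(field, delim);
        size_t flen = end ? (size_t)(end - field) : strlen(field);
        const char *text = flen ? field : marker;   /* empty field gets marker */
        size_t tlen = flen ? flen : mlen;
        if (n + tlen + 2 > outsize) return -1;      /* text + delimiter + NUL */
        memcpy(out + n, text, tlen);
        n += tlen;
        if (!end) break;                            /* last field copied */
        out[n++] = delim;
        field = end + 1;
    }
    out[n] = '\0';
    return 0;
}
```

Unlike a plain search-and-replace of ",," with ",NaN,", this filter splits on every delimiter. That means three delimiters in a row get two markers, and empty fields at the start or end of the line are also filled.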
If your file marks missing data with a specific string, you will need to set apop_opts.nan_string to text that matches it, e.g., "N/A", "NaN", or ".".
SQLite stores these NaN-type values internally as NULL. That means functions like apop_query_to_data will convert both your nan_string text and NULL to NaN.
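The missing-value and number conventions can be summarized per cell. The function below is a sketch under stated assumptions and not the library's code. parse_cell is hypothetical, and it assumes that an empty cell, or one equal to the nan_string text, becomes NaN. All other input goes through strtod, which, like atof, accepts INFINITY, -INFINITY, and NaN.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cell reader: an empty cell, or one equal to nan_string,
   becomes NaN. Anything else goes through strtod, which accepts INFINITY,
   -INFINITY and NaN. Text with no leading number yields 0, in line with
   the text-as-zero rule for the gsl_matrix element. */
double parse_cell(const char *cell, const char *nan_string) {
    assert(cell != NULL);
    if (*cell == '\0' || (nan_string != NULL && strcmp(cell, nan_string) == 0))
        return NAN;
    return strtod(cell, NULL);
}
```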
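\li Floating-point numbers follow the conventions of C's atof() function: INFINITY, -INFINITY, and NaN work as expected.
\li If there are both row names and column names, the input will not be perfectly square. The top row should have no placeholder entry (such as "row names") above the column of row names. That is, for a 100x100 data set with row and column names, there are 100 names in the top row, and 101 entries in each subsequent row (name plus 100 data points).
\li White space before or after a field is ignored, so 1, 2,3, 4 , 5, " six ",7 is equivalent to 1,2,3,4,5," six ",7.
\li NUL characters ('\0') are treated as white space, so if your fields have NULs as padding, you should have no problem. A NUL inside a string terminates the string, as it always does in C.
\li Fixed-width input is supported, but you must provide a list of field ends. E.g., given a file whose first row is NUMLEOL and .field_ends=(int[]){3, 5, 7}, we have three columns, named NUM, LE, and OL. The names can be read from the first row by setting .has_row_names='y'.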
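In the fixed-width example, each field end marks the column just past the end of a field. Fields therefore span columns [0,3), [3,5), and [5,7). The following minimal sketch splits one row under that reading and trims surrounding white space and NUL padding as described in the rules above. split_fixed_width and FIELD_MAX are hypothetical names. The row length is passed explicitly because a NUL pad byte would otherwise end the C string early.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define FIELD_MAX 32   /* hypothetical maximum field width, including NUL */

/* NUL padding counts as white space, per the rules above. */
static int is_pad(char c) {
    return c == '\0' || isspace((unsigned char)c);
}

/* Hypothetical helper: split a fixed-width row of length len into n fields,
   where field i spans [field_ends[i-1], field_ends[i]) and field 0 starts at
   column 0. Each field is trimmed and copied into out[i]. Returns 0 on
   success, or -1 if the row is too short or a field is too wide. */
int split_fixed_width(const char *row, size_t len, const int *field_ends,
                      int n, char out[][FIELD_MAX]) {
    assert(row != NULL && field_ends != NULL && out != NULL);
    size_t start = 0;
    for (int i = 0; i < n; i++) {
        size_t end = (size_t)field_ends[i];
        if (end > len || end < start) return -1;   /* bad or out-of-range end */
        size_t a = start, b = end;
        while (a < b && is_pad(row[a])) a++;       /* trim leading padding */
        while (b > a && is_pad(row[b - 1])) b--;   /* trim trailing padding */
        if (b - a >= FIELD_MAX) return -1;
        memcpy(out[i], row + a, b - a);
        out[i][b - a] = '\0';
        start = end;
    }
    return 0;
}
```

Applied to the header row NUMLEOL with ends {3, 5, 7}, this yields NUM, LE, and OL. A row padded as "12\0A BB" yields 12, A, and BB.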