Data Fields
double	ci_level

apop_data *	data

struct loess_struct	lo_s

int	want_predict_ci

Detailed Description

The loess code is based on FORTRAN code from 1988 that was overhauled in 1992 and linked into Apophenia in 2009. The structure that does all the work is therefore a loess_struct, which you should treat as opaque.

The useful settings from that struct re-appear in the apop_loess_settings struct so you can set them directly, and then the settings init function will copy your preferences into the working struct.

The documentation for the elements is cut/pasted/modified from Cleveland, Grosse, and Shyu.

Field Documentation

double apop_loess_settings::ci_level

When running a prediction, the level at which to calculate the confidence interval. Default: 0.95.

struct loess_struct apop_loess_settings::lo_s

.data: Mandatory. Your input data set.

.lo_s.model.span: smoothing parameter, the fraction of the observations used in each local fit. Default is 0.75.

.lo_s.model.degree: overall degree of locally-fitted polynomial. 1 is locally-linear fitting and 2 is locally-quadratic fitting. Default is 2.

.lo_s.normalize: Should numeric predictors be normalized? If 'y' - the default - the standard normalization is used. If 'n', no normalization is carried out.

.lo_s.model.parametric: for two or more numeric predictors, this argument specifies those variables that should be conditionally parametric. The argument should be a logical vector of length p, with elements in the same order as the predictors in x. Default is a vector of 0's of length p.

.lo_s.model.drop_square: for cases with degree = 2, and with two or more numeric predictors, this argument specifies those numeric predictors whose squares should be dropped from the set of fitting variables. The method of specification is the same as for parametric. Default is a vector of 0's of length p.

.lo_s.model.family: the assumed distribution of the errors. The values may be "gaussian" or "symmetric". The first value is the default. If the second value is specified, a robust fitting procedure is used.

.lo_s.control.surface: determines whether the fitted surface is computed "directly" at all points or whether an "interpolation" method is used. The default, interpolation, is what most users should use unless special circumstances warrant otherwise.

.lo_s.control.statistics: determines whether the statistical quantities are computed "exactly" or approximately, where "approximate" is the default. The former should only be used for testing the approximation in statistical development and is not meant for routine usage because computation time can be horrendous.

.lo_s.control.cell: if interpolation is used to compute the surface, this argument specifies the maximum cell size of the k-d tree. Let k = floor(n*cell*span), where n is the number of observations. A cell is divided further if the number of observations within it is greater than or equal to k. Default is 0.2.

.lo_s.control.trace_hat: Options are "approximate", "exact", and "wait.to.decide". When lo_s.control.surface is "approximate", determines the computational method used to compute the trace of the hat matrix, which is used in the computation of the statistical quantities. If "exact", an exact computation is done; normally this goes quite fast on the fastest machines until n, the number of observations, is 1000 or more, but on very slow machines things can slow down at n = 300. If "wait.to.decide" is selected, then a default is chosen in loess(); the default is "exact" for n < 500 and "approximate" otherwise. If surface is "exact", an exact computation is always done for the trace. Setting trace_hat to "approximate" for a large data set will substantially reduce the computation time.

.lo_s.model.iterations: if family is "symmetric", the number of iterations of the robust fitting method. Default is 0 when family is "gaussian" and 4 when family is "symmetric".

That's all you can set. Here are some output parameters:

fitted_values: fitted values of the local regression model.

fitted_residuals: residuals of the local regression fit.

enp: equivalent number of parameters.

s: estimate of the scale of the residuals.

one_delta: a statistical parameter used in the computation of standard errors.

two_delta: a statistical parameter used in the computation of standard errors.

pseudovalues: adjusted values of the response when robust estimation is used.

trace_hat: trace of the operator hat matrix.

diagonal: diagonal of the operator hat matrix.

robust: robustness weights for robust fitting.

divisor: normalization divisor for numeric predictors.

int apop_loess_settings::want_predict_ci

If 'y' (the default), calculate the confidence bands for predicted values.
