The apop_pmf model wraps an apop_data set so it can be read as an empirical model, with a likelihood function (equal to the associated weight for observed values and zero for unobserved values), a random number generator (which simply makes weighted random draws from the data), and so on. Setting it up is a model estimation from data like any other, done via apop_estimate(your_data, apop_pmf).
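To make the likelihood and the weighted draw concrete, here is a minimal standalone sketch in plain C. It does not call the library: the toy_pmf struct and the toy_pmf_* functions are hypothetical names invented for this illustration, and the sketch assumes a single column of values whose weights sum to one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a one-column data set with per-row weights. */
typedef struct {
    const double *values;   /* observed values, one per row */
    const double *weights;  /* assumed to sum to 1; NULL means equal weights */
    size_t n;               /* number of rows */
} toy_pmf;

/* Weight of row i, falling back to 1/n when no weights are given. */
static double toy_pmf_weight(const toy_pmf *p, size_t i) {
    return p->weights ? p->weights[i] : 1.0 / (double)p->n;
}

/* Likelihood of x: total weight of the rows equal to x,
   and zero if x was never observed. */
double toy_pmf_likelihood(const toy_pmf *p, double x) {
    double total = 0.0;
    for (size_t i = 0; i < p->n; i++)
        if (p->values[i] == x) total += toy_pmf_weight(p, i);
    return total;
}

/* Weighted draw: u is a uniform variate in [0, 1) supplied by the caller;
   walk the cumulative weights until they exceed u. */
double toy_pmf_draw(const toy_pmf *p, double u) {
    assert(p->n > 0 && u >= 0.0 && u < 1.0);
    double cumulative = 0.0;
    for (size_t i = 0; i < p->n; i++) {
        cumulative += toy_pmf_weight(p, i);
        if (u < cumulative) return p->values[i];
    }
    return p->values[p->n - 1];  /* guard against rounding in the running sum */
}
```

Passing the uniform variate in as an argument keeps the sketch deterministic; in practice it would come from whatever random number generator you already use.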
You have the option of cleaning up the data before turning it into a PMF. For example...
These are largely optional.
It is the weights vector that holds the density represented by each row; the rest of the row represents the coordinates of that density. If the input data set has no weights segment, then I assume that all rows have equal weight.
For a PMF model, the parameters are NULL, and the data itself is used for calculation. Therefore, modifying the data post-estimation can break some internal settings set during estimation. If you modify the data, throw away any existing PMFs (via apop_model_free) and re-estimate a new one.
Using apop_data_pmf_compress puts the data into one bin for each unique value in the data set. You may instead want bins of fixed width, in the style of a histogram, which you can get via apop_data_to_bins. It requires a bin specification. If you send a NULL binspec, then the offset is zero and the bin size is big enough to ensure that there are bins from minimum to maximum. The binspec will be added as a page to the data set, named "<binspec>". See the apop_data_to_bins documentation on how to write a custom bin spec.
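The idea of fixed-width binning from an offset and a bin size can be sketched in plain C as follows. This is an illustration of the concept only, not the library's implementation: toy_binspec, toy_bin_index, and toy_bin_counts are hypothetical names, and the half-open bin convention is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bin specification: bin k covers the half-open interval
   [offset + k*width, offset + (k+1)*width). */
typedef struct {
    double offset;
    double width;
} toy_binspec;

/* Index of the bin containing x; negative when x lies below the offset. */
long toy_bin_index(toy_binspec spec, double x) {
    assert(spec.width > 0.0);
    double q = (x - spec.offset) / spec.width;
    long k = (long)q;            /* the cast truncates toward zero */
    if ((double)k > q) k--;      /* so step down for negative, non-integer q */
    return k;
}

/* Count the observations falling in each of n_bins bins.
   Values outside the covered range are skipped.
   Returns the number of observations that were counted. */
size_t toy_bin_counts(toy_binspec spec, const double *data, size_t n,
                      double *counts, size_t n_bins) {
    for (size_t k = 0; k < n_bins; k++) counts[k] = 0.0;
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        long k = toy_bin_index(spec, data[i]);
        if (k >= 0 && (size_t)k < n_bins) {
            counts[k] += 1.0;
            used++;
        }
    }
    return used;
}
```

Applying one toy_binspec value to two different data sets yields count vectors whose bins line up one for one, which is the property a shared binspec is meant to guarantee.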
There are a few ways of testing the claim that one distribution equals another, typically an empirical PMF versus a smooth theoretical distribution. In each case, you will need two distributions based on the same binspec.
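For example, if you do not have a prior binspec in mind, then you can use the one generated by the first call to the histogram binning function to make sure that the second data set is in sync. The first call stores its binspec as the "<binspec>" page of the binned data set, and that page can then be sent as the binspec for the second call.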
You can use apop_test_kolmogorov or apop_histograms_test_goodness_of_fit to generate the appropriate statistics from the pairs of bins.
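For intuition about what statistics can be computed from pairs of bins, here is a standalone sketch of two common ones: Pearson's chi-squared statistic and a Kolmogorov-Smirnov-style maximum gap between cumulative distributions. The function names are hypothetical, and whether the library functions compute exactly these quantities is not claimed here; both inputs are assumed to be binned with the same binspec.

```c
#include <assert.h>
#include <stddef.h>

/* Pearson chi-squared statistic: the sum over bins of
   (observed - expected)^2 / expected.
   Bins with a non-positive expected count are skipped,
   which is a choice made by this sketch. */
double toy_chi_squared(const double *observed, const double *expected,
                       size_t n_bins) {
    assert(observed && expected);
    double stat = 0.0;
    for (size_t k = 0; k < n_bins; k++) {
        if (expected[k] <= 0.0) continue;
        double diff = observed[k] - expected[k];
        stat += diff * diff / expected[k];
    }
    return stat;
}

/* Largest absolute gap between the two cumulative distributions.
   Both weight vectors are assumed to be normalized to sum to 1. */
double toy_ks_statistic(const double *p, const double *q, size_t n_bins) {
    assert(p && q);
    double cum_p = 0.0, cum_q = 0.0, max_gap = 0.0;
    for (size_t k = 0; k < n_bins; k++) {
        cum_p += p[k];
        cum_q += q[k];
        double gap = cum_p - cum_q;
        if (gap < 0.0) gap = -gap;
        if (gap > max_gap) max_gap = gap;
    }
    return max_gap;
}
```

Turning either statistic into a p-value requires the corresponding reference distribution, which this sketch leaves out.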
Kernel density estimation will produce a smoothed PDF. See apop_kernel_density for details. Or, use apop_vector_moving_average for a simpler smoothing method.
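As a rough picture of what moving-average smoothing does to a vector of weights, consider the plain C sketch below. toy_moving_average is a hypothetical name, and the windowing convention (trailing window, output shorter than the input) is an assumption of this sketch rather than a description of apop_vector_moving_average.

```c
#include <assert.h>
#include <stddef.h>

/* Simple moving average with a fixed window.
   out[i] is the mean of in[i] through in[i + window - 1], so the output
   holds n - window + 1 elements. Returns the number of elements written. */
size_t toy_moving_average(const double *in, size_t n, size_t window,
                          double *out) {
    assert(window > 0 && window <= n);
    double sum = 0.0;
    for (size_t i = 0; i < window; i++) sum += in[i];
    out[0] = sum / (double)window;
    for (size_t i = window; i < n; i++) {
        sum += in[i] - in[i - window];   /* slide the window one step right */
        out[i - window + 1] = sum / (double)window;
    }
    return n - window + 1;
}
```

Smoothed weights generally no longer sum to one, so a renormalization step may be needed before treating them as densities again.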
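In summary: cleaning up the data before calling apop_estimate(your_data, apop_pmf) is optional, and the resulting densities (your_pmf->data->weights) can be smoothed via moving average.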