The apop_pmf model wraps an apop_data set so it can be read as an empirical model, with a likelihood function (equal to the associated weight for observed values and zero for unobserved values), a random number generator (which simply makes weighted random draws from the data), and so on. Setting it up is a model estimation from data like any other, done via apop_estimate(your_data, apop_pmf).
You have the option of cleaning up the data before turning it into a PMF. For example...
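One such cleanup, merging duplicate rows and then normalizing the weights, can be sketched in plain C without the library. The type toy_pmf_data and the functions toy_compress and toy_normalize are hypothetical names invented for this illustration; they are not Apophenia's API and make no claim about how the library implements these steps. The sketch assumes a single column of values and an explicit weights array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a one-column data set with a weights vector. */
typedef struct {
    double *values;   /* one coordinate per row */
    double *weights;  /* density represented by each row */
    size_t  n;        /* number of rows */
} toy_pmf_data;

/* Merge rows with identical values, summing their weights.
   Rows are compacted in place; returns the new row count. */
size_t toy_compress(toy_pmf_data *d) {
    size_t kept = 0;
    for (size_t i = 0; i < d->n; i++) {
        size_t j = 0;
        while (j < kept && d->values[j] != d->values[i]) j++;
        if (j < kept) {
            d->weights[j] += d->weights[i];   /* duplicate: fold its weight in */
        } else {
            d->values[kept]  = d->values[i];  /* first sighting: keep the row */
            d->weights[kept] = d->weights[i];
            kept++;
        }
    }
    d->n = kept;
    return kept;
}

/* Scale the weights so that they sum to one. */
void toy_normalize(toy_pmf_data *d) {
    double total = 0;
    for (size_t i = 0; i < d->n; i++) total += d->weights[i];
    assert(total > 0);
    for (size_t i = 0; i < d->n; i++) d->weights[i] /= total;
}
```

Calling toy_compress and then toy_normalize on six equally weighted rows holding the values 2, 3, 2, 5, 3, 2 leaves three rows whose weights are proportional to 3, 2, and 1.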
These are largely optional.
It is the weights vector that holds the density represented by each row; the rest of the row represents the coordinates of that density. If the input data set has no weights segment, then I assume that all rows have equal weight.
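To make the role of the weights concrete, here is a minimal library-free sketch of an empirical PMF over one column of values. The names row_weight, toy_pmf_p, and toy_pmf_draw are hypothetical and exist only for this illustration; the sketch assumes that a NULL weights array stands for equal weights and that the caller supplies the uniform variate for a draw. It shows the idea described above, not Apophenia's internals.

```c
#include <assert.h>
#include <stddef.h>

/* Weight of row i; a NULL weights array means every row counts equally. */
static double row_weight(const double *weights, size_t i) {
    return weights ? weights[i] : 1.0;
}

/* Probability of x under the empirical PMF: the normalized weight of the
   rows equal to x, or zero if x was never observed. */
double toy_pmf_p(const double *values, const double *weights, size_t n, double x) {
    assert(n > 0);
    double total = 0, hit = 0;
    for (size_t i = 0; i < n; i++) {
        double w = row_weight(weights, i);
        total += w;
        if (values[i] == x) hit += w;
    }
    assert(total > 0);
    return hit / total;
}

/* Weighted draw by inverse CDF: u is a uniform variate in [0, 1) supplied
   by the caller. Returns the value of the selected row. */
double toy_pmf_draw(const double *values, const double *weights, size_t n, double u) {
    assert(n > 0 && u >= 0 && u < 1);
    double total = 0;
    for (size_t i = 0; i < n; i++) total += row_weight(weights, i);
    double target = u * total, running = 0;
    for (size_t i = 0; i < n; i++) {
        running += row_weight(weights, i);
        if (target < running) return values[i];
    }
    return values[n - 1];  /* guard against floating-point round-off */
}
```

With values 1, 2, 4 and weights 1, 2, 1, the value 2 carries half of the total weight, an unobserved value such as 3 has probability zero, and a draw lands on 2 whenever u falls in [0.25, 0.75).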
For a PMF model, the parameters are NULL, and the data itself is used for calculation. Therefore, modifying the data post-estimation can break some internal settings set during estimation. If you modify the data, throw away any existing PMFs (via apop_model_free) and re-estimate a new one.
Using apop_data_pmf_compress puts the data into one bin for each unique value in the data set. You may instead want bins of fixed width, in the style of a histogram, which you can get via apop_data_to_bins. It requires a bin specification. If you send a NULL binspec, then the offset is zero and the bin size is big enough to ensure that there are bins from minimum to maximum. The binspec will be added as a page to the data set, named "<binspec>". See the apop_data_to_bins documentation on how to write a custom bin spec.
There are a few ways of testing the claim that one distribution equals another, typically an empirical PMF versus a smooth theoretical distribution. In both cases, you will need two distributions based on the same binspec.
For example, if you do not have a prior binspec in mind, then you can use the one generated by the first call to the histogram binning function to make sure that the second data set is in sync:
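The idea of keeping two data sets in sync can be sketched without the library: derive one bin specification from the first data set, then apply that same specification to the second, so that bin k means the same interval in both. In Apophenia terms this presumably corresponds to a first apop_data_to_bins call with a NULL binspec and a second call that reuses the "<binspec>" page of the first set. Everything below (toy_binspec, toy_bin_index, toy_binspec_from_data, toy_bin_counts) is a hypothetical name for illustration, and the rule used to pick the bin width is an assumption, not the library's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bin specification: bin k covers
   [offset + k*width, offset + (k+1)*width). */
typedef struct {
    double offset;
    double width;
} toy_binspec;

/* Index of the bin holding x under the given spec (may be negative). */
long toy_bin_index(toy_binspec spec, double x) {
    assert(spec.width > 0);
    double q = (x - spec.offset) / spec.width;
    long k = (long)q;        /* truncates toward zero */
    if (q < (double)k) k--;  /* turn truncation into floor for negative q */
    return k;
}

/* Build a spec from a data set: zero offset, and a width chosen so that
   about nbins bins span minimum to maximum. The choice of nbins is left
   to the caller; it is an assumption of this sketch. */
toy_binspec toy_binspec_from_data(const double *x, size_t n, size_t nbins) {
    assert(n > 0 && nbins > 0);
    double lo = x[0], hi = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    toy_binspec spec = { 0.0, (hi > lo) ? (hi - lo) / nbins : 1.0 };
    return spec;
}

/* Count the observations in bins first_bin .. first_bin + len - 1;
   observations outside that range are ignored. */
void toy_bin_counts(toy_binspec spec, const double *x, size_t n,
                    long first_bin, size_t len, double *counts) {
    for (size_t k = 0; k < len; k++) counts[k] = 0;
    for (size_t i = 0; i < n; i++) {
        long k = toy_bin_index(spec, x[i]) - first_bin;
        if (k >= 0 && (size_t)k < len) counts[k] += 1;
    }
}
```

Because both calls to toy_bin_counts receive the same spec, the two count arrays line up bin by bin and can be handed to whichever comparison statistic you prefer.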
apop_data_pmf_compress(): merge together redundant rows in a data set before calling apop_estimate(your_data, apop_pmf); optional.
apop_vector_moving_average(): smooth a vector (e.g., your_pmf->data->weights) via moving average.