The package xxl.core.math.statistics addresses several issues arising in statistical applications. We focus on methods of statistical inference rather than
descriptive statistics. Generally, statistical inference distinguishes between parametric and
nonparametric methods. In the parametric case, the unknown
distribution is assumed to belong to a class of distributions
determined by unknown parameters θ ∈ ℝ^k (e.g., θ = (μ, σ²) for the
class of normal distributions). These parameters can in turn be
estimated from a given sample with appropriate estimators.
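As a minimal illustration of parametric estimation, the following Java sketch assumes the class of normal distributions, θ = (μ, σ²), and computes the maximum-likelihood estimates of both parameters from a given sample. The class `NormalMle` is a hypothetical name chosen for this example and is not part of the XXL API.

```java
// Hypothetical helper, not part of the XXL API: maximum-likelihood estimation of
// theta = (mu, sigma^2) for a normal distribution from a given sample.
final class NormalMle {
    final double mean;      // ML estimate of mu
    final double variance;  // ML estimate of sigma^2 (divides by n, hence biased)

    private NormalMle(double mean, double variance) {
        this.mean = mean;
        this.variance = variance;
    }

    static NormalMle fit(double[] sample) {
        if (sample.length == 0) throw new IllegalArgumentException("empty sample");
        double sum = 0.0;
        for (double x : sample) sum += x;
        double mean = sum / sample.length;
        double squares = 0.0;
        for (double x : sample) squares += (x - mean) * (x - mean);
        return new NormalMle(mean, squares / sample.length);
    }

    public static void main(String[] args) {
        NormalMle theta = fit(new double[] {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0});
        System.out.println(theta.mean + " " + theta.variance); // prints "5.0 4.0"
    }
}
```

The ML variance estimate divides by n; dividing by n - 1 instead yields the unbiased sample variance.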
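In nonparametric statistics, the unknown quantity cannot be described by a
parameter θ ∈ ℝ^k; e.g., the pdf or cdf itself is unknown. Methods of nonparametric
statistics are driven directly by the structure in the data and,
since they rely on fewer assumptions about the underlying distribution, allow a more flexible data analysis.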
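For the nonparametric case, a kernel density estimator is a common way to estimate an unknown pdf directly from the data: f_h(x) = (1 / (n h)) Σ_i K((x − X_i) / h). The sketch below assumes a Gaussian kernel K and a bandwidth h chosen by the caller; `GaussianKde` is a hypothetical name and not the package's actual kernel-estimator class.

```java
// Hypothetical sketch, not the XXL kernel-estimator API: a Gaussian kernel density
// estimate f_h(x) = (1 / (n h)) * sum_i K((x - X_i) / h), K = standard normal pdf.
final class GaussianKde {
    private final double[] sample;
    private final double bandwidth; // smoothing parameter h > 0, chosen by the caller

    GaussianKde(double[] sample, double bandwidth) {
        if (sample.length == 0) throw new IllegalArgumentException("empty sample");
        if (!(bandwidth > 0.0)) throw new IllegalArgumentException("bandwidth must be > 0");
        this.sample = sample.clone();
        this.bandwidth = bandwidth;
    }

    // Standard normal density, used as the kernel K.
    private static double kernel(double u) {
        return Math.exp(-0.5 * u * u) / Math.sqrt(2.0 * Math.PI);
    }

    // Estimated pdf at x: average of kernels centred at the sample points, scaled by 1/h.
    double density(double x) {
        double sum = 0.0;
        for (double xi : sample) sum += kernel((x - xi) / bandwidth);
        return sum / (sample.length * bandwidth);
    }

    public static void main(String[] args) {
        GaussianKde kde = new GaussianKde(new double[] {-1.0, 0.0, 1.0}, 0.5);
        for (double x = -2.0; x <= 2.0; x += 1.0) {
            System.out.printf("f(%.1f) = %.4f%n", x, kde.density(x));
        }
    }
}
```

No parametric form is assumed here; the shape of the estimate comes entirely from the sample, while the bandwidth h controls how strongly it is smoothed.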
Picking up the idea of online aggregation as discussed in [HHW97] (J. Hellerstein, P. Haas, and H. Wang. Online Aggregation. 1997), we provide
running aggregates over a data stream that are themselves parametric or nonparametric estimators. These estimators are based on the data
seen so far and are maintained incrementally. Our framework allows users to include their own estimators and also provides a rich set of ready-to-use
parametric and kernel-based estimators. In general, two methods are available for building complex
statistical estimators: a reservoir-based method that maintains an i.i.d. sample of the data stream at runtime, and an iterative compression method
that, in each step, compresses the current estimate by replacing it with its cubic Bézier-spline interpolant.
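The following sketch illustrates running estimators over a data stream under stated assumptions: the interface `RunningEstimator` and both implementations are hypothetical and do not reflect the actual XXL API. `RunningMeanVariance` maintains a parametric estimate incrementally with Welford's online algorithm, and `Reservoir` uses Vitter's Algorithm R to keep a uniform random sample of fixed size k (drawn without replacement), on which a kernel estimator could then be evaluated. The Bézier-spline compression method is not shown.

```java
import java.util.Arrays;
import java.util.SplittableRandom;

// Entry point: feeds one stream to both hypothetical running estimators below.
final class StreamingEstimatorsDemo {
    public static void main(String[] args) {
        RunningMeanVariance moments = new RunningMeanVariance();
        Reservoir reservoir = new Reservoir(5, new SplittableRandom(42L));
        for (int i = 1; i <= 1000; i++) {
            moments.update(i);
            reservoir.update(i);
        }
        double[] mv = moments.currentEstimate();
        System.out.println("mean = " + mv[0] + ", variance = " + mv[1]);
        System.out.println("sample = " + Arrays.toString(reservoir.currentEstimate()));
    }
}

// Running mean and unbiased sample variance via Welford's online algorithm.
final class RunningMeanVariance implements RunningEstimator<double[]> {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean

    @Override
    public void update(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    // Returns {mean, variance}; the variance is NaN until two elements were seen.
    @Override
    public double[] currentEstimate() {
        return new double[] {mean, n > 1 ? m2 / (n - 1) : Double.NaN};
    }
}

// Vitter's Algorithm R: uniform random sample of fixed size k over a stream.
final class Reservoir implements RunningEstimator<double[]> {
    private final double[] sample;
    private final SplittableRandom random;
    private long seen = 0; // number of stream elements processed so far

    Reservoir(int k, SplittableRandom random) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.sample = new double[k];
        this.random = random;
    }

    @Override
    public void update(double x) {
        seen++;
        if (seen <= sample.length) {
            sample[(int) (seen - 1)] = x;   // the first k elements fill the reservoir
        } else {
            long j = random.nextLong(seen); // uniform in [0, seen)
            if (j < sample.length) {
                sample[(int) j] = x;        // happens with probability k / seen
            }
        }
    }

    // Copy of the current sample; shorter than k while fewer than k elements were seen.
    @Override
    public double[] currentEstimate() {
        return Arrays.copyOf(sample, (int) Math.min(seen, sample.length));
    }
}

// Hypothetical interface, not the XXL API: updated one element at a time,
// queryable for its current estimate at any point of the stream.
interface RunningEstimator<R> {
    void update(double x);
    R currentEstimate();
}
```

Both estimators use constant memory per update, so an estimate over the data seen so far is available at any time, in the spirit of online aggregation.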