Nonparametric function estimation refers to methods that strive to approximate a target function locally, i.e., using data from a ``small'' neighborhood of the point of estimate. These methods rely on ``weak'' assumptions, such as continuity of the target function and its differentiability to some order in the neighborhood, rather than on an a priori assumption about the global form (e.g., linear or quadratic) of the entire target function. Traditionally, parametric assumptions (e.g., hydraulic conductivity is log normally distributed, floods follow a log Pearson III (LP3) distribution, annual stream flow is either log normal or gamma distributed, daily rainfall amounts are exponentially distributed, and the variograms of spatial hydrologic data follow a power law) have dominated statistical hydrologic estimation. Applications of nonparametric methods to some classical problems of stochastic hydrology (frequency analysis, classification, spatial surface fitting, trend analysis, time series forecasting and simulation) are reviewed.
As hydrologic data bases grow, computational and graphical visualization abilities undergo quantum leaps, and concerns of process heterogeneity and non-stationarity come to the fore, questions about the validity of such parametric assumptions arise. The zealotry often associated with the advocacy of particular models (e.g., floods are LP3) and parameter estimation procedures has served to mask the basic question faced by statistical hydrologists, namely, how best to estimate a function that summarizes structural relationships implicit in the data. Usually, the hydrologist has little physical or theoretical guidance as to the specific form of the target function. The traditional exercises amount to choosing among a small set of prescribed curves to fit the data at hand. What is one to do when the naked eye discerns structure in the data, and yet none of the usual candidates fit well? How does one choose between two models that fit equally well in terms of a global measure (e.g., likelihood, or sum of squared residuals), are parsimonious, and yet differ markedly in the details of the fit? How should one interpret the estimated confidence bands so as to reflect not just parameter estimation variance, but also uncertainty in model choice? Given a finite amount of data, are there situations that allow us to sidestep the model choice issue altogether and permit a useful interpretation of the data? Questions like these invariably steer an investigator into the realm of nonparametric function estimation or ``smoothing.''
A variety of ``smoothers'' are available in the statistical literature. They differ in their estimation efficiency, in their computational demands, in their applicability, and in their mathematical form. However, they share the goal of approximating (with asymptotically vanishing error) an arbitrary, unknown function of the data, and the notion that each estimate be local (i.e., influenced only by nearby data). Smoothers are interpretable as weighted moving averages (kernel estimators) of some function of the data. Localization is achieved by weights that vanish with distance from the point of estimate.
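The weighted-moving-average interpretation can be sketched in a few lines. This is a minimal illustration, assuming the Nadaraya–Watson form of the kernel regression estimator with an Epanechnikov kernel and a fixed bandwidth h; the kernel choice, the bandwidth, and the function names `epanechnikov` and `kernel_smooth` are assumptions made here for illustration, not a method prescribed in the text. Because the Epanechnikov kernel has compact support, observations farther than h from the point of estimate receive exactly zero weight, which is the localization described above.

```python
import math
from typing import Sequence


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: positive for |u| < 1, zero outside (compact support)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_smooth(x0: float, xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    """Nadaraya-Watson estimate of E[y | x = x0] with a fixed bandwidth h (assumed).

    The estimate is a weighted moving average of ys: each weight decays with
    the distance of xs[i] from x0 and is exactly zero beyond h.
    Returns NaN when no observation lies within h of x0.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    weights = [epanechnikov((x0 - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        return math.nan  # no data in the neighborhood: no local estimate
    return sum(w * yi for w, yi in zip(weights, ys)) / total


if __name__ == "__main__":
    xs = [float(i) for i in range(11)]
    ys = [2.0 * x + 1.0 for x in xs]
    # Only the points x = 3..7 fall inside the window of half-width 2.5 around x = 5.
    print(kernel_smooth(5.0, xs, ys, h=2.5))
```

The bandwidth h sets the size of the ``small'' neighborhood: a small h tracks the data closely at the cost of higher variance, while a large h averages over more data and risks bias by smoothing out real structure.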
Nonparametric function estimation has been a very active research area in statistics in the last 10 years. Some relevant monographs are Devroye and Györfi [1985], Silverman [1986], Müller [1988], Eubank [1988], Härdle [1989; 1990], Györfi et al. [1989], Hastie and Tibshirani [1990], Thompson [1990], Wahba [1990], Tong [1990], Scott [1992], and Cleveland [1993b].
The basic ideas of smoothing are illustrated in the next section. A historical overview of hydrologic applications (to frequency analysis, to time series analysis and to spatial analysis) follows. There has not been a prior review of this topic. Consequently, this review reaches beyond the reporting period and aims to be retrospective rather than expository.