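The basic idea of a smoother is illustrated in Figure 1. The kernel density estimator employed is

$$\hat{f}(\mathbf{u}) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} \frac{1}{h_j} K_j\!\left(\frac{u_j - u_{ij}}{h_j}\right) \qquad (1)$$

where $n$ is the sample size, $h_j$ is a bandwidth along the $j$th coordinate of a $d$-dimensional vector $\mathbf{u}$, $u_{ij}$ is the $j$th coordinate of the $i$th observation, and $K_j(\cdot)$ is a kernel or weight function along $u_j$.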
For the example, a rectangular kernel, i.e., $K(v) = 1$ for $|v| \le 1$ and 0 otherwise, is used. The bandwidth may vary by coordinate and by location. K(.) is chosen to be a valid density function (e.g., uniform, Normal).
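A minimal sketch of such a density estimate may help fix ideas. It assumes the product-kernel form, in which each coordinate contributes a one-dimensional kernel scaled by its own bandwidth. The function names are illustrative, and the rectangular kernel is given height 1/2 on [-1, 1] (an assumption made here so that it integrates to one, as a valid density must).

```python
def rectangular_kernel(v):
    """Uniform density on [-1, 1]; height 1/2 is assumed so it integrates to one."""
    return 0.5 if abs(v) <= 1.0 else 0.0


def product_kernel_density(u, data, bandwidths, kernel=rectangular_kernel):
    """Product-kernel density estimate at the point u.

    u          -- sequence of d coordinates (the point of estimate)
    data       -- list of n observations, each a sequence of d coordinates
    bandwidths -- sequence of d positive bandwidths, one per coordinate
    """
    n = len(data)
    total = 0.0
    for obs in data:
        # Each observation contributes the product of its per-coordinate kernels.
        contribution = 1.0
        for u_j, x_ij, h_j in zip(u, obs, bandwidths):
            contribution *= kernel((u_j - x_ij) / h_j) / h_j
        total += contribution
    return total / n


# One-dimensional example: three observations, bandwidth 1.
sample = [[0.0], [1.0], [2.0]]
density_at_one = product_kernel_density([1.0], sample, [1.0])
```

With these three observations and h = 1, all three points fall within the window centered at u = 1, so each contributes 1/2 and the estimate is 0.5. A point far from the data, such as u = 5, receives an estimate of zero.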
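The kernel regression estimator used in Figure 1 follows from

$$\hat{y}(x) = \frac{\dfrac{1}{nh} \sum_{i=1}^{n} K\!\left(\dfrac{x - x_i}{h}\right) y_i}{\dfrac{1}{nh} \sum_{i=1}^{n} K\!\left(\dfrac{x - x_i}{h}\right)} \qquad (2)$$
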
It is readily seen that this is a convolution estimator, representing a weighted moving average. Eubank [1988] shows that linear regression can be expressed in the form (2), where K(.)/nh corresponds to the "influence matrix." Since the linear regression weights do not decay with distance (the end points of the data are the most influential everywhere, and an error at small x can influence the fit at large x), linear regression is not nonparametric.
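The contrast between local and global weights can be illustrated with a short sketch. Both estimators below are written as weighted averages of the observed responses. The kernel weights assume a rectangular kernel and a local constant fit, and the linear regression weights use the standard least-squares identity w_i(x) = 1/n + (x - x̄)(x_i - x̄)/S_xx. The function names and the small data set are hypothetical.

```python
def kernel_weights(x, xs, h):
    """Rectangular-kernel weights at x, normalized to sum to one."""
    raw = [1.0 if abs((x - xi) / h) <= 1.0 else 0.0 for xi in xs]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no observations within the bandwidth of x")
    return [r / total for r in raw]


def linear_regression_weights(x, xs):
    """Weights w_i(x) for which the least-squares line at x equals sum(w_i * y_i)."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    return [1.0 / n + (x - x_bar) * (xi - x_bar) / s_xx for xi in xs]


def weighted_average(weights, ys):
    """Apply a set of weights to the responses."""
    return sum(w * y for w, y in zip(weights, ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]

local = kernel_weights(0.0, xs, 1.0)          # zero beyond the window
global_ = linear_regression_weights(0.0, xs)  # nonzero even at x = 4
```

At x = 0 the kernel weights vanish for every observation farther than h = 1, whereas the regression weight attached to the most distant observation (x = 4) is -0.2, comparable in magnitude to the nearby weights. An error in that distant response therefore shifts the fitted value at x = 0.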
Other nonparametric estimators can be expressed in the forms (1) and (2). Nearest neighbor estimators are obtained by using a uniform kernel, with h(x) taken as the distance from x to its kth nearest neighbor. Local polynomial estimators, like LOESS [Cleveland, 1979; 1988a; 1988b], are defined by fitting a polynomial (with an associated weight function) to the k nearest neighbors of the point of estimate. Fourier or orthogonal series correspond to the use of a special kernel, as do splines.
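As a sketch of the nearest neighbor case, the bandwidth at each point of estimate can be set to the distance to the kth nearest observation, after which a uniform kernel reduces the estimate to a plain average over the window. The function names are illustrative, and tied distances are assumed to be kept in the window, so the average may involve more than k points.

```python
def knn_bandwidth(x, xs, k):
    """Distance from x to its k-th nearest observation (k counts from 1)."""
    distances = sorted(abs(x - xi) for xi in xs)
    return distances[k - 1]


def knn_regression(x, xs, ys, k):
    """Uniform-kernel regression with a locally varying bandwidth h(x)."""
    h = knn_bandwidth(x, xs, k)
    # With a uniform kernel every point inside the window gets equal weight.
    neighbors = [y for xi, y in zip(xs, ys) if abs(x - xi) <= h]
    return sum(neighbors) / len(neighbors)


xs = [0.0, 1.0, 2.0, 5.0, 9.0]
ys = [1.0, 2.0, 3.0, 10.0, 20.0]
estimate = knn_regression(1.2, xs, ys, k=3)
```

Here the bandwidth adapts to the local density of the data: near x = 1.2 the three nearest observations lie within a distance of 1.2, whereas near x = 8 the window must widen to 3 to capture even two observations.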
A theoretical consideration is to design a scheme that works well in terms of pointwise bias and error variance. In Figure 1, the bias in the high curvature area could be reduced by reducing h, or by fitting a local quadratic rather than the local constant shown. Of course, to fit the local quadratic reliably one would need more points in the neighborhood than for fitting a local constant. This suggests that there is an interplay between the choice of h and K(.) in terms of local bias and variance. One can fix, say, K(.) and focus on optimizing h. This has been the theme of much of the statistical literature. However, for regression, local constant fitting is biased even when the underlying relationship is linear. A bridge to the parametric literature is provided as one varies the order of the local polynomial and the neighborhood size up to the full sample size.
The Generalized Cross Validation (GCV) criterion [Craven and Wahba, 1979] can be used to choose an appropriate regression parameterization (order, form and bandwidth). Estimates of the Integrated Square Error or related functions are useful in the density context [Scott, 1992]. Some practitioners are disturbed to find that nonparametric estimators may be locally biased (if the order of approximation is low, and the truncated terms in the local Taylor series expansion of the underlying function are large). However, while a parametric estimator may be unbiased for the fitted model, the resulting estimate may have unquantifiable bias for the underlying function.
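To make the bandwidth choice concrete, the following sketch scores candidate bandwidths with the usual GCV form for a linear smoother, GCV(h) = (RSS/n) / (1 - tr(H)/n)^2, where H is the influence matrix. It assumes a local constant fit with a rectangular kernel, for which the ith diagonal entry of H is one over the number of points in the window at x_i. The function names, data, and candidate bandwidths are hypothetical.

```python
def gcv_score(xs, ys, h):
    """GCV score of a rectangular-kernel local constant smoother with bandwidth h."""
    n = len(xs)
    rss = 0.0
    trace = 0.0
    for xi, yi in zip(xs, ys):
        # The window always contains the point itself, so it is never empty.
        in_window = [yj for xj, yj in zip(xs, ys) if abs(xi - xj) <= h]
        fitted = sum(in_window) / len(in_window)
        rss += (yi - fitted) ** 2
        trace += 1.0 / len(in_window)  # diagonal entry of the influence matrix
    denominator = (1.0 - trace / n) ** 2
    if denominator == 0.0:
        # The smoother interpolates the data; treat this as the worst case.
        return float("inf")
    return (rss / n) / denominator


def select_bandwidth(xs, ys, candidates):
    """Return the candidate bandwidth with the smallest GCV score."""
    return min(candidates, key=lambda h: gcv_score(xs, ys, h))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 0.0, 1.0]
best_h = select_bandwidth(xs, ys, [0.5, 1.0, 3.0])
```

The denominator penalizes small bandwidths: when each window holds only its own point, the fit interpolates the data and the score is set to infinity. The same scoring idea extends to comparing local polynomial orders, provided the trace of the corresponding influence matrix is computed.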
With this background, let us review some hydrologic applications.