For trend analysis and investigating bivariate dependence, locally weighted estimation or LOESS [Cleveland, 1979; Cleveland and Devlin, 1988; Cleveland et al., 1988; 1990; Cleveland, 1988a; 1988b; 1993a; 1993b; and Cleveland and McRae, 1989] has emerged as the nonparametric method of choice. In most applications, "default" settings for LOESS have been used, and no optimization (sensitivity analysis) of the order of the local polynomial or of the number of neighbors used is reported. Example applications are:
Helsel and Hirsch [1992], and Hirsch et al. [1991; 1993]
formalize procedures for using LOESS for trend analysis of hydrologic and
environmental data, and for removing systematic variations in the environmental
variable of interest due to a covariate (e.g., total phosphorus concentration
variability in the time series due to streamflow discharge variability). They
also suggest the use of LOESS to investigate structure in residuals of a
parametric fit; to graphically summarize and compare salient trends in time
series of variables that may have some common variation; and to investigate
symmetry in the conditional density f(y|x) by separately smoothing the
positive and negative residuals of an original LOESS smooth, and thereby
estimating the lower and upper conditional quartiles. The LOESS smooths shown
by these authors are compared with simple parametric alternatives. The visual
superiority and adaptability of the LOESS smooths are striking.
Bradley and Potter [1992] smooth peak discharge vs 3-day flow
volume for regulated and unregulated conditions, en route to developing a Peak
to Volume FFA that may be useful for examining the impact of flow regulation on
floods.
Baier and Cohn [1993] smooth atmospheric concentrations of
selected constituents vs precipitation, to remove the effect of precipitation
variability on acid deposition trends.
Lall and Bosworth [1993] look for relationships between
precipitation, evaporation, net precipitation and annual inflow into the Great
Salt Lake.
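As a rough illustration of the two LOESS settings mentioned above (the order of the local polynomial and the number of neighbors), the following Python sketch fits a tricube-weighted local polynomial to the k nearest neighbors of each evaluation point and scores each setting by leave-one-out prediction error. The names `loess` and `loo_rmse`, the choice of leave-one-out error as the sensitivity criterion, and the synthetic data are assumptions made here for illustration; they are not taken from the cited studies, and the robustness iterations of Cleveland [1979] are omitted.

```python
import numpy as np

def loess(x, y, x_eval, k, degree=1):
    """Local polynomial fit of given degree using the k nearest neighbors
    of each evaluation point, with tricube weights (no robustness steps)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    fitted = np.empty(x_eval.shape)
    for i, x0 in enumerate(x_eval):
        dist = np.abs(x - x0)
        idx = np.argsort(dist)[:k]          # indices of the k nearest neighbors
        h = dist[idx].max()                 # local bandwidth = distance to k-th neighbor
        if h == 0.0:                        # all neighbors coincide with x0
            h = 1.0
        w = (1.0 - (dist[idx] / h) ** 3) ** 3   # tricube weights
        # Weighted least squares on x centered at x0; the intercept is the fit at x0.
        X = np.vander(x[idx] - x0, degree + 1, increasing=True)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = beta[0]
    return fitted

def loo_rmse(x, y, k, degree):
    """Leave-one-out root mean square prediction error for one (k, degree) setting."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    errs = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        pred = loess(x[keep], y[keep], x[i], k, degree)[0]
        errs.append(y[i] - pred)
    return float(np.sqrt(np.mean(np.square(errs))))

if __name__ == "__main__":
    # Synthetic data only; not hydrologic observations.
    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(0.0, 10.0, 80))
    y = np.sin(x) + 0.3 * rng.standard_normal(80)
    for degree in (1, 2):
        for k in (10, 20, 40):
            print(degree, k, round(loo_rmse(x, y, k, degree), 3))
```

Comparing the printed errors across settings is one simple way to report the kind of sensitivity analysis that the text notes is usually missing; k should comfortably exceed degree + 1 so that each local fit is well determined.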
Applications of kernel methods for bivariate association are:
Adamowski and Feluch [1991] rediscover the Nadaraya-Watson
[Nadaraya, 1964] kernel regression estimator by developing the conditional
expectation
$E[y \mid x] = \int y \, f(x,y) \, dy \, / \int f(x,y) \, dy$
from a bivariate kernel density estimator f(x,y).
They use it to regress ground water depth (y) on nearby streamflow (x) in the
Castor River watershed. The raw scatter plot shows no evidence of any
relationship between y and x for low x (where most of the data is). The
bandwidth is chosen by LSCV of the bivariate density f(x,y), which may be far
from optimal for regression. As with frequency analysis, it may be better to
choose the bandwidth optimizing the target function (regression) instead.
Nevertheless, their split sample results are quite respectable for the fitting
and validating subsamples. The kernel regression results have a much higher
$R^2$ than polynomial or power regression. A cautionary note
regarding the interpretation of such statistics is in order. It is easy to
believe that since a single parameter (e.g., h or k) is being used, the degrees
of freedom are (n-1). However, as $h \to 0$, the estimate is based on 1 data point
with 0 degrees of freedom, and will have an $R^2 = 1$. The GCV
score, which unlike the $R^2$ accounts for the effective degrees of
freedom, should be used. Similar comments apply to all nonparametric
regressors.
Sangoyomi and Lall [1993] used k.d.e. to investigate the number
of modes in the p.d.f. of several hydro-climatic time series in the Great Salt
Lake basin. The intention was to identify distinct regimes in long term climate,
estimate transition probabilities between them, and improve predictability of
the Great Salt Lake volume variations. They found that transition to a lake
volume increase/decrease in a summer/winter was a precursor to a multi-year
rise/decline of the lake.
Lall and Bosworth [1993] develop a multivariate kernel density
estimator that employs a set partitioning strategy to define local bandwidth
matrices proportional to subset covariance, and explore multivariate dependence
between precipitation, evaporation, net precipitation and annual inflow into the
Great Salt Lake. An interesting interplay between precipitation and evaporation
in generating inflow is seen. Serial dependence issues are not properly dealt
with. The sensitivity of k.d.e. to bandwidth variation is examined, but optimal
bandwidth selection is not attempted.
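The cautionary note on goodness-of-fit statistics in the Adamowski and Feluch item above can be illustrated with a small Python sketch. It builds the smoother matrix S of a Nadaraya-Watson regression, takes the effective degrees of freedom as tr(S), and compares the coefficient of determination with a GCV score of the commonly used form (RSS/n) / (1 - tr(S)/n)^2. The Gaussian kernel, the function names `nw_smoother_matrix` and `r2_edf_gcv`, and the synthetic data are assumptions made here for illustration, not the setup of the cited study.

```python
import numpy as np

def nw_smoother_matrix(x, h):
    """Matrix S such that the Nadaraya-Watson fit at the data points is S @ y
    (Gaussian kernel with bandwidth h)."""
    x = np.asarray(x, dtype=float)
    u = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * u ** 2)
    return K / K.sum(axis=1, keepdims=True)   # each row sums to 1

def r2_edf_gcv(x, y, h):
    """Return (R^2, effective degrees of freedom, GCV score) for bandwidth h."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    S = nw_smoother_matrix(x, h)
    rss = float(np.sum((y - S @ y) ** 2))
    r2 = 1.0 - rss / float(np.sum((y - y.mean()) ** 2))
    edf = float(np.trace(S))                  # effective number of parameters
    denom = (1.0 - edf / n) ** 2
    # When the fit interpolates the data, tr(S) = n and GCV is undefined;
    # report it as infinite so such bandwidths are never selected.
    gcv = (rss / n) / denom if denom > 0.0 else np.inf
    return r2, edf, gcv

if __name__ == "__main__":
    # Synthetic data only; not hydrologic observations.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 40)
    y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(40)
    for h in (0.5, 0.1, 0.05, 0.01, 1e-4):
        r2, edf, gcv = r2_edf_gcv(x, y, h)
        print(h, round(r2, 4), round(edf, 2), gcv)
```

In this sketch the coefficient of determination climbs toward 1 as h shrinks while tr(S) climbs toward n, which is why a score that penalizes the effective degrees of freedom is the more informative guide to bandwidth choice.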