
Forecasting and Simulation

Despite their virtual dominance of the research literature, linear autoregressive moving average (ARMA) models for hydrologic time series forecasting and simulation have gained limited acceptance with practitioners. Simple resampling schemes, such as the index sequential method [Kendall and Dracup, 1991], may be preferred. The ARMA framework has been successful for annual and perhaps monthly flows, largely because the "structure" and predictability of flows are lost at such lags. ARMA models are hard to justify when daily flows are of interest. They cannot easily model the persistence in such flows while also responding to the sudden rise of the hydrograph after a storm and its subsequent gradual decay. Recognition of such factors motivated the nonparametric, Markovian thinking described in Yakowitz [1973, 1979a, 1979b, 1985a, 1985b]. Much of the subsequent nonparametric time series literature draws on the concepts developed in these papers.

In these papers, Yakowitz considers a finite-order, continuous parameter Markov chain as an appropriate model for hydrologic time series. He observes that discretization of the state space can quickly lead to an unmanageable number of parameters (the curse of dimensionality) or to poor approximation of the transition functions, while ARMA approximations to such a process call for restrictive distributional and structural assumptions. The problem is cast in a general setting with a variety of measures of interest (e.g., conditional probability of threshold crossings, or one-step conditional distribution functions or expectations), and a predictor space that can include a d-tuple of past stream flows and other auxiliary variables. The requisite transition functions are evaluated through empirical conditional distribution functions, together with transition intensity functions, conditional p.d.f.'s and regressions that are evaluated using nearest neighbor (NN) or kernel methods. Strategies for the simulation of daily flow sequences, one-step-ahead prediction and the conditional probability of flooding (flow crossing a threshold) are exemplified with river flows and shown to be superior to ARMA models. Seasonality is accommodated by including the calendar date as one of the predictors. Nonparametric Bayesian procedures for incorporating prior or regional information (including parametric p.d.f.'s for extremes) are indicated. Yakowitz indicates that this continuous parameter Markov chain approach can reproduce any possible Hurst coefficient. He relates these ideas to hydrologic decision problems, argues that the loss functions associated with hydrologic decisions (e.g., declare a flood warning or not) are usually highly asymmetric, and notes that the classical ARMA or Kalman filtering framework is suited for optimal prediction only under squared error loss, and only for linear operations on the observables. The nonparametric framework allows attention to be focused directly on calculating these loss functions and evaluating the consequences.
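
As an illustration only, the conditional probability of a threshold crossing and its use in an asymmetric-loss decision might be sketched as follows. This is not Yakowitz's own estimator: it assumes a single predictor (the current flow), a Gaussian kernel, and a fixed bandwidth, and the function names `exceedance_probability` and `issue_warning` are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def exceedance_probability(flows, current_flow, threshold, bandwidth):
    """Kernel-weighted estimate of P(next flow > threshold | current flow).

    Each historical pair (flow, successor) is weighted by how close the
    historical flow is to the current flow.
    """
    num = 0.0
    den = 0.0
    for prev, nxt in zip(flows[:-1], flows[1:]):
        w = gaussian_kernel((current_flow - prev) / bandwidth)
        den += w
        if nxt > threshold:
            num += w
    if den == 0.0:
        raise ValueError("no kernel weight near current_flow; increase bandwidth")
    return num / den


def issue_warning(p_flood, cost_missed_flood, cost_false_alarm):
    """Warn when the expected loss of staying silent exceeds that of warning."""
    return p_flood * cost_missed_flood > (1.0 - p_flood) * cost_false_alarm
```

With a missed flood costed at ten times a false alarm, a warning would be issued in this sketch once the estimated crossing probability exceeds 1/11, which shows how an asymmetric loss moves the decision away from the conditional mean.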

Tong [1990] provides motivation for nonlinear time series analysis methodology and for nonparametric modeling and visualization of time series. He uses a daily river flow example to illustrate that such data, with sudden jumps, time irreversibility, asymmetric joint distributions, persistence, many high-level crossings, and state-dependent correlation between lagged flows, do not support the assumptions inherent in classical linear ARMA modeling.

Yakowitz [1987, 1993], Yakowitz and Karlsson [1987], and Karlsson and Yakowitz [1987a, 1987b] motivate and provide a theoretical basis for nearest neighbor (NN) regression for prediction of time series, and specifically for rainfall-runoff modeling. The practical idea is simple. Given a "feature vector" of, say, a sequence of past flows and past and current rainfall amounts, determine the conditional expectation of, say, the next flow. This conditional expectation is evaluated by identifying the successor flows to the k historical nearest neighbors of the current feature vector, and averaging them. Importance weights may be assigned to each component of the feature vector and optimized by cross validation as part of the estimation process. Using data from an Ohio basin, they compared the one-step NN predictions of daily flow on several days with storms to those of a unit hydrograph model and an ARMAX model, and found that the NN model was superior. Galeati [1990] shows that this simple NN predictor provides lower mean square error predictions of daily mean inflow to an Italian reservoir than an autoregressive model with exogenous inputs that was coupled to physically based, calibrated rainfall-runoff and snow cover evolution models.
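
A minimal sketch of the NN predictor described above is given here, under stated assumptions: the feature vector holds the last d flows plus the current rainfall, distance is a weighted Euclidean norm, and the importance weights are supplied rather than optimized by cross validation. The names `build_pairs` and `knn_forecast` are hypothetical and do not come from the cited papers.

```python
import math


def build_pairs(flows, rain, d):
    """Pair each feature vector (last d flows + current rain) with the next flow."""
    features, successors = [], []
    for t in range(d - 1, len(flows) - 1):
        features.append(list(flows[t - d + 1:t + 1]) + [rain[t]])
        successors.append(flows[t + 1])
    return features, successors


def knn_forecast(features, successors, query, k, weights=None):
    """Average the successors of the k historical vectors closest to query."""
    if k < 1 or k > len(features):
        raise ValueError("k must be between 1 and the number of historical vectors")
    if weights is None:
        weights = [1.0] * len(query)  # equal importance for every component

    def distance(vec):
        return math.sqrt(sum(w * (a - b) ** 2
                             for w, a, b in zip(weights, vec, query)))

    order = sorted(range(len(features)), key=lambda i: distance(features[i]))
    nearest = order[:k]
    return sum(successors[i] for i in nearest) / k
```

In use, `build_pairs` would be run once on the historical record, and `knn_forecast` called with the most recent feature vector as `query`; k and the weights would then be tuned by leave-one-out prediction error.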

Smith [1991] and Smith et al. [1992] present some interesting applications of Yakowitz's ideas that expose the flexibility of nonparametric methods for seeking relationships between arbitrary functions of possibly linked data sets. For example, they seek to predict directly (1) accumulated daily flow over a future 1 to 4 month period, (2) the minimum daily flow over the future period, (3) the time when future flow may drop below a threshold, or (4) the total time during the future period when the daily flow is below a threshold. As predictors, they consider measures of antecedent conditions, the Southern Oscillation Index, and basin hydrologic and climatic variables. Kernel methods and empirical conditional distribution functions are used to develop such predictions. Relative importance of predictors is assessed, and the state and seasonal dependence of the predictions is graphically demonstrated. This work shows that the nonparametric framework allows one to work directly with the statistics relevant for reservoir operation, rather than worrying about successfully estimating them from a linear model designed to reproduce a serial correlation structure.

Kember et al. [1993] connect the NN predictor to state space reconstruction methods used to reconstruct nonlinear dynamics [Farmer and Sidorowich, 1987] from time series. They consider a weighted neighborhood, with weights decreasing exponentially with distance, and the L-step-ahead forecast regressed on a vector of past flows that may be lagged at a rate different from the sampling rate. Predictive error criteria are used for choosing the model order, the lag time and the decay rate of the exponential weighting scheme. Performance is found to be superior to multiplicative, seasonal ARIMA models for a 70 year record of daily streamflow.
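
The ingredients named above (delay vectors, exponentially decaying weights, an L-step-ahead target) can be sketched as follows. This is a simplified, locally constant version: it takes a weighted average of historical successors rather than fitting the regression used by Kember et al., and the names `delay_vector`, `weighted_forecast`, and the parameters `order`, `lag`, `horizon`, and `decay` are assumptions made for illustration.

```python
import math


def delay_vector(series, t, order, lag):
    """State vector (x[t], x[t - lag], ..., x[t - (order - 1) * lag])."""
    return [series[t - j * lag] for j in range(order)]


def weighted_forecast(series, order, lag, horizon, decay):
    """Forecast series[t_now + horizon] from exponentially weighted neighbors."""
    t_now = len(series) - 1
    query = delay_vector(series, t_now, order, lag)
    first = (order - 1) * lag  # earliest time with a complete delay vector
    num = 0.0
    den = 0.0
    for t in range(first, t_now - horizon + 1):
        vec = delay_vector(series, t, order, lag)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, query)))
        w = math.exp(-dist / decay)  # weight decays exponentially with distance
        num += w * series[t + horizon]
        den += w
    if den == 0.0:
        raise ValueError("all weights underflowed; increase decay")
    return num / den
```

The three tuning quantities (`order`, `lag`, `decay`) would be chosen by minimizing out-of-sample prediction error over a grid, in the spirit of the predictive error criteria mentioned above.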

Lall et al. [1994b] are motivated similarly, but use Multivariate Adaptive Regression Splines (MARS), due to Friedman [1991], to recover the map of the dynamical system. This is a higher order function approximation scheme than NN regression. Parameters including model order, delay, and spline parameters (number of knots, knot locations, linear or cubic splines) are chosen using generalized cross validation (GCV). The time series analyzed is the 1848-1992 biweekly volume record of the Great Salt Lake. Blind predictions up to 4 years ahead using only prior data are attempted at various points in time. Compared to those from the best fit AR model, these predictions are dramatically superior as the forecast horizon increases, and they predict the unprecedented and dramatic 4 year rise and fall of the Great Salt Lake in the 1980s.

The strategy used for simulation in the following work is to develop a k.d.e. for the target univariate, multivariate or conditional p.d.f., and to then sample from this k.d.e. This is tantamount to a smoothed bootstrap [Silverman, 1986] or smoothed conditional bootstrap. Markovian interpretations of such procedures as suggested by Yakowitz apply.
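
For the univariate case, sampling from a k.d.e. reduces to a two-step recipe: pick an observation at random, then perturb it with a draw from the kernel. The sketch below assumes a Gaussian kernel with a fixed, user-supplied bandwidth; `smoothed_bootstrap` is a hypothetical name.

```python
import random


def smoothed_bootstrap(data, bandwidth, n, rng=None):
    """Draw n values from a Gaussian k.d.e. built on data.

    Resampling an observation selects one kernel of the mixture; adding
    bandwidth * N(0, 1) noise then samples from that kernel.
    """
    rng = rng or random.Random()
    return [rng.choice(data) + bandwidth * rng.gauss(0.0, 1.0)
            for _ in range(n)]
```

Setting `bandwidth` to zero recovers the ordinary bootstrap, which makes the relationship between the two resampling schemes explicit.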

Rajagopalan et al. [1993, 1994] and Lall et al. [1993b] develop a seasonal nonparametric renewal model (NPR) for simulating daily precipitation, where successive dry and wet spell lengths may be dependent or independent. All requisite p.d.f.'s (for log transformed precipitation amount, and wet/dry spell length) are estimated by kernel methods. Monte Carlo results with real data show that spell characteristics as well as other statistics are well reproduced. The development of a new k.d.e. [Balaji and Lall, 1994, to appear] appropriate for discrete data complements this work.
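
The alternating renewal structure can be illustrated with a heavily simplified sketch. It is not the NPR model itself: it assumes independent spells, ignores seasonality, resamples observed spell lengths directly instead of using a discrete kernel estimator, and draws wet-day amounts by a Gaussian smoothed bootstrap in log space. The names `spell_lengths` and `simulate_precip` are hypothetical.

```python
import math
import random


def spell_lengths(wet_flags):
    """Split a daily 0/1 wet indicator into dry and wet spell lengths."""
    dry, wet = [], []
    state, run = wet_flags[0], 0
    for flag in wet_flags:
        if flag == state:
            run += 1
        else:
            (wet if state else dry).append(run)
            state, run = flag, 1
    (wet if state else dry).append(run)  # close the final (possibly cut) spell
    return dry, wet


def simulate_precip(dry, wet, amounts, bandwidth, n_days, rng=None):
    """Alternate resampled dry and wet spells; fill wet days with amounts."""
    rng = rng or random.Random()
    log_amounts = [math.log(a) for a in amounts]  # amounts must be positive
    series = []
    is_wet = False  # assumption: every simulated trace starts with a dry spell
    while len(series) < n_days:
        for _ in range(rng.choice(wet if is_wet else dry)):
            if is_wet:
                z = rng.choice(log_amounts) + bandwidth * rng.gauss(0.0, 1.0)
                series.append(math.exp(z))
            else:
                series.append(0.0)
        is_wet = not is_wet
    return series[:n_days]
```

Working in log space keeps simulated amounts positive, which is one practical reason for the log transform mentioned above.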

Tarboton et al. [1993] develop a multivariate k.d.e. with local bandwidths proportional to local covariance based on k nearest neighbors (similar in spirit to Lall and Bosworth [1993]), as well as the requisite conditional k.d.e.'s for simulation of streamflow time series. Simulation proceeds sequentially using the appropriate estimated conditional p.d.f.'s. Annual and monthly applications to Colorado River basin flow preserve desired statistics. This model is extended by Balaji et al. [1994] to consider a multivariate vector of daily weather variables (solar radiation, maximum temperature, minimum temperature, average wind speed and average dew point temperature) and to integrate it with the NPR daily precipitation model described above. Monte Carlo results with Western U.S. weather data demonstrate the ability to reproduce not just the usual moments but also quartiles.
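
Sequential simulation from a conditional k.d.e. can be sketched for the lag-1 case. The version below swaps in a fixed-bandwidth Gaussian product kernel for the locally adaptive, covariance-based bandwidths of Tarboton et al., so it illustrates only the sampling logic; `simulate_flows` and its parameters are hypothetical names.

```python
import math
import random


def simulate_flows(history, bandwidth, n_steps, start, rng=None):
    """Simulate a lag-1 trace by sampling from the conditional k.d.e. f(next | current).

    For a product Gaussian kernel, f(next | current) is a mixture of kernels
    centered on the historical successors, with mixture weights given by the
    kernel evaluated at the distance between current and each predecessor.
    """
    rng = rng or random.Random()
    prev_vals = history[:-1]
    next_vals = history[1:]
    trace = []
    x = start
    for _ in range(n_steps):
        d2 = [(x - p) ** 2 for p in prev_vals]
        d2_min = min(d2)
        # Subtracting d2_min rescales all weights equally and avoids underflow.
        weights = [math.exp(-0.5 * (d - d2_min) / bandwidth ** 2) for d in d2]
        i = rng.choices(range(len(prev_vals)), weights=weights)[0]
        x = next_vals[i] + bandwidth * rng.gauss(0.0, 1.0)
        trace.append(x)
    return trace
```

Nothing in this sketch keeps simulated flows positive; in practice a transform or a boundary-corrected kernel would be needed for that.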

Tarboton [1994] visually evaluates Colorado River annual streamflows (some based on tree rings) simulated by SPIGOT [Grygier and Stedinger, 1990], using plots of k.d.e.'s of the marginal p.d.f. of the recorded and simulated traces.






U.S. National Report to IUGG, 1991-1994
Rev. Geophys. Vol. 33 Suppl., © 1995 American Geophysical Union