Special Focus: Advances in Data Acquisition, Management, Analysis and Display [SF]

SF11A MCC:3010 Monday 0800h

Hydroinformatics

Presiding: A White, University of Illinois at Urbana-Champaign; P Kumar, University of Illinois at Urbana-Champaign

SF11A-01 INVITED 08:00h

CUAHSI Hydrologic Information System

* Maidment, D R (maidment@mail.utexas.edu) , University of Texas at Austin, Center for Research in Water Resources, Austin, TX 78712
Helly, J , San Diego Supercomputer Center, University of California, San Diego, CA 92093
Kumar, P , University of Illinois at Urbana-Champaign, Dept of Civil Engineering, Urbana, IL 61801
Piasecki, M , Drexel University, Dept of Civil Engineering, Philadelphia, PA 19104
Hooper, R , CUAHSI, 2000 Florida Avenue, NW, Washington, DC 20009

The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) is developing a Hydrologic Information System (HIS) to advance hydrologic science and education in US academic institutions. A central mission of this system is to support CUAHSI's parallel Hydrologic Observatory and Hydrologic Synthesis Center projects. The HIS consists of three parts: a Hydrologic Digital Library for storage of any kind of digital hydrologic information, a Digital Watershed for geospatial description of the water environment, and a system for computation of hydrologic fluxes and flows through this environment. The main design features of the HIS are presented and illustrated through application to the Neuse basin in North Carolina.

http://cuahsi.sdsc.edu/HIS/

SF11A-02 08:15h

Developing the CUAHSI Metadata Profile

* Piasecki, M (Michael.Piasecki@drexel.edu) , Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104 United States
Bermudez, L (leb27@drexel.edu) , Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104 United States
Islam, S (asi22@drexel.edu) , Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104 United States
Beran, B (Bora.Beran@drexel.edu) , Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104 United States

The Hydrologic Information System (HIS) of the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) has as one of its goals improving access to large-volume, high-quality, heterogeneous hydrologic data sets. This will be attained in part by adopting a community metadata profile to achieve consistent descriptions that facilitate data discovery. However, common standards are quite general in nature and typically lack domain-specific vocabularies, which complicates their adoption by specific communities. We will demonstrate the problems encountered in adopting ISO standards to create a CUAHSI metadata profile. The final schema is expressed in a simple metadata format, the Metadata Template File (MTF), to leverage metadata annotation and viewer tools already developed by the San Diego Supercomputer Center. The steps performed to create an MTF starting from ISO 19115:2003 are the following: 1) creation of ontologies using the Web Ontology Language (OWL) for ISO 19115:2003 and related ISO/TC 211 documents; 2) conceptualization in OWL of related hydrologic vocabularies such as NASA's Global Change Master Directory and units from the Hydrologic Handbook; 3) definition of the CUAHSI profile by importing and extending the previous ontologies; 4) explicit creation of the CUAHSI core set; 5) export of the core set to MTF; 6) definition of metadata blocks for arbitrary digital objects (e.g. time series vs static-spatial data) using ISO's methodology for feature cataloguing; and 7) export of metadata blocks to MTF.

SF11A-03 08:30h

The Modelshed GeoData Model

Ruddell, B L (bruddell@uiuc.edu) , University of Illinois, Department of Civil and Environmental Engineering 205 North Mathews Avenue, Urbana, IL 61801 United States
* Kumar, P (kumar1@uiuc.edu) , University of Illinois, Department of Civil and Environmental Engineering 205 North Mathews Avenue, Urbana, IL 61801 United States

In recent years there has been an explosion in the availability of earth science data from satellites, remote sensors and numerical computations. This avalanche of data brings unprecedented opportunities for the study of environmental processes, but has led to new difficulties in organizing and communicating data for study. In many earth science studies, a majority of time and investment is now spent wrestling data into a useful form. Advanced Geographic Information Systems (GIS) and spatial geoprocessing methods ease this burden by streamlining data conversion and visualization. GIS-enabled databases store and organize environmental data in relational structures. Geodata models are applied to organize and describe spatial data in ways useful for particular applications. The ArcHydro data model for water resources has been successfully established as a standard for the modeling and communication of hydrologic datasets, and is being adopted by many branches of government, industry, and the academy. However, it is still difficult to process large gridded datasets from numerical simulations and remote sensors, and to meaningfully relate that data to other objects in an ArcHydro-modeled database. The Modelshed geodata model is presented as a generalized GIS data model for the organization and modeling of diverse geospatial data. It represents point, line, area, and volumetric database objects in three dimensions, and stores timeseries and flux data in association with all model features. It provides data structures to facilitate the geospatial analysis of time-indexed raster datasets and the integration of raster data with the vector structures of the data model. The study of relationships within this data model is simple and powerful, based on queries of indexed data tables in a relational database. 
Example applications are explored using the unique analysis capabilities of the Modelshed geodata model, including the construction of customized software tools, the visualization of Modelshed data in a GIS, and a climate study using a prototype database of the Illinois River Basin.
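
As a rough illustration of the query style described above (relationships studied through queries of indexed tables in a relational database), the following Python sketch uses the standard-library sqlite3 module. The two-table layout (`feature`, `timeseries`), the column names, and the sample values are all hypothetical and invented for illustration; they are not the Modelshed schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical feature/time-series layout."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            kind       TEXT NOT NULL      -- 'point', 'line', 'area', or 'volume'
        );
        CREATE TABLE timeseries (
            feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
            variable   TEXT NOT NULL,
            t          TEXT NOT NULL,     -- ISO 8601 date
            value      REAL NOT NULL
        );
        -- Index so that lookups by feature, variable, and time stay fast.
        CREATE INDEX ts_lookup ON timeseries (feature_id, variable, t);
    """)
    # Made-up sample rows, for illustration only.
    con.executemany(
        "INSERT INTO feature VALUES (?, ?, ?)",
        [(1, "subbasin_A", "area"), (2, "gauge_B", "point")],
    )
    con.executemany(
        "INSERT INTO timeseries VALUES (?, ?, ?, ?)",
        [
            (1, "precip_mm", "2004-06-01", 4.0),
            (1, "precip_mm", "2004-06-02", 6.0),
            (2, "precip_mm", "2004-06-01", 3.0),
        ],
    )
    return con


def mean_by_feature(con, variable):
    """Average one variable over time for every feature that has data for it."""
    rows = con.execute(
        """
        SELECT f.name, AVG(ts.value)
        FROM feature AS f
        JOIN timeseries AS ts ON ts.feature_id = f.feature_id
        WHERE ts.variable = ?
        GROUP BY f.feature_id
        ORDER BY f.name
        """,
        (variable,),
    )
    return dict(rows.fetchall())
```

Calling `mean_by_feature(build_demo_db(), "precip_mm")` joins each feature to its time series and returns one time-averaged value per feature name.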

SF11A-04 08:45h

Jointly Retrieving Surface Soil Moisture from Active and Passive Microwave Observations Using Cubist Data-Mining

* Zhan, X (xzhan@hsb.gsfc.nasa.gov) , UMBC-GEST/NASA-GSFC, Code 974.1 NASA-GSFC Hydrological Sciences Branch, Greenbelt, MD 20771 United States
Houser, P R (Paul.R.Houser@nasa.gov) , NASA-GSFC Hydrological Sciences Branch, Code 974 NASA-GSFC Hydrological Sciences Branch, Greenbelt, MD 20771 United States

With the successful launches of NASA's Earth Observing Satellites (e.g. Terra & Aqua) and several environmental satellites (e.g. NPOESS, NPP, SMOS and HYDROS) planned for launch in the near future, huge amounts of satellite remote sensing data are being collected every day. Maximizing the use of this wealth of data sets is a pressing issue for the Earth system science community. Data mining extracts patterns from large system data sets. These patterns provide insight into system characteristics, enabling outcome prediction for future situations and thereby aiding decision-making. The Cubist data-mining algorithm is a powerful tool for generating rule-based models that balance the need for accurate prediction against the requirements of intelligibility. Cubist models generally give better results than those produced by simple techniques such as multivariate linear regression, and are generally easier to understand than neural networks. NASA's Hydrosphere States (HYDROS) mission, an Earth System Science Pathfinder, will use both an L-band coarse-resolution microwave radiometer and a fine-resolution radar to make the first spaceborne observations of global soil water availability. These new observations will enable new scientific investigations of atmospheric predictability and global change processes. To assess the potential accuracies of retrieving land surface soil moisture from the radiometer and radar observations, the HYDROS science team has created an Observing System Simulation Experiment (OSSE) that includes a complete land surface geophysical properties data set (soil moisture, surface temperature, vegetation temperature, etc.), the associated atmospheric variables, and the simulated HYDROS radar and radiometer observations for the Red-Arkansas river basin. We have applied the Cubist data-mining algorithm to this OSSE data set to evaluate its soil moisture retrieval skill using the active and passive microwave observations simultaneously.
The resulting simple rules and models provide insights into how soil moisture is related to land-surface geophysical and meteorological variables. The potential to use this data mining tool for analyzing other NASA satellite observations will also be discussed.

SF11A-05 09:00h

Data Mining to Improve Management and Reduce Costs Associated With Environmental Remediation

* Minsker, B S (minsker@uiuc.edu) , The University of Illinois at Urbana-Champaign, Department of Civil and Environmental Engineering, 3230 Newmark Lab, MC-250, 205 N. Mathews Ave., Urbana, IL 61801 United States
Farrell, D M (d.m.farrell@gmail.com) , The University of Illinois at Urbana-Champaign, Department of Civil and Environmental Engineering, 3230 Newmark Lab, MC-250, 205 N. Mathews Ave., Urbana, IL 61801 United States

In this study, data from 105 soil and groundwater remediation projects at BP gas stations were mined for lessons to reduce cost and improve management of remediation sites. A data mining tool called D2K was used to train decision tree, stepwise linear regression, and instance-based weighting models that relate hydrogeologic, sociopolitical, temporal, and remedial factors in the site closure reports to remediation cost. The most important factors influencing cost were found to be the amount of soil excavated and the number of wells installed, suggesting that better management of excavation and well placement could result in significant cost savings. The best model for predicting cost classes (low, medium, and high cost) was the decision tree, which had a prediction accuracy of approximately 73%. The misclassification of approximately 27% of the sites in even the best model suggests that remediation costs at service stations are influenced by other site-specific factors that may be difficult to accurately predict in advance.

SF11A-06 09:15h

A Data-Driven Approach for Upscaling Solute Transport Models

* Hill, D J (djhill1@uiuc.edu) , Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, 2516 Hydrosystems Laboratory, MC 250 205 N. Mathews Ave, Urbana, IL 61801

The goal of this study is to use genetic programming (GP), a domain-independent machine learning tool for generating models, to search for an upscaled hydrologic model. The development of upscaled models of hydrologic processes has long been a concern of researchers, because computational limitations prevent the use of high-resolution models capable of resolving all of the spatial variability of model domains. In particular, researchers have struggled for decades to develop upscaled numerical models for solute transport in porous media, where the scale of variability can be on the order of a few meters in the horizontal direction but only ten to twenty centimeters in the vertical direction. A wide variety of methods have been employed to develop upscaled solute transport models, including stochastic analysis, spatial filtering, and homogenization. However, these methods all rely upon various simplifying assumptions (e.g. small conductivity variance, a grid scale significantly larger than the largest scale of heterogeneity). Moreover, these methods usually make additional assumptions about the physics of the sub-grid processes. This study examines the use of GP to search for an upscaled model of transport of a solute pulse by horizontal flow in a perfectly stratified aquifer. GP was chosen because it creates mathematical models of input data from which information about the underlying physical processes can be extracted. This type of transport system was selected as the first application of the proposed upscaling method because it has been extensively studied in the literature, and thus allows a direct comparison that demonstrates the efficacy of the data-driven upscaling method. It has been suggested that if the upscaled model domain of this type of system is a depth-averaged representation of the aquifer, the plume evolution can be modeled in a Lagrangian coordinate system as a Fickian dispersive process with a time-dependent dispersion coefficient.
GP was provided with depth-averaged solute flux data as well as other depth-averaged plume characteristics (e.g. local and non-local concentration gradients) calculated from a high-resolution numerical model of the system. GP performed a symbolic regression on these data, and the resulting models were analyzed for quality of fit as well as physical meaning. This analysis resulted in an upscaled model that represents the transport of solute by unresolved sub-grid velocity variations entirely in terms of vertically averaged parameters. The strong evidence of a sub-grid advective component of the upscaled solute transport found by GP was surprising, and this result has led to the creation of new upscaled models that incorporate a sub-grid advection term for modeling the solute transport. For the perfectly stratified aquifer, these new upscaled transport models capture features of the solute plume, relevant to environmental and hydrological problems, that the Fickian model alone cannot predict. This result has far-reaching implications for management models, as the new upscaled solute transport models can make higher-quality predictions of the solute distribution without any significant additional computational expense. It is suspected that, with further work, the methodology presented here may be applicable to other upscaling problems, such as those encountered in turbulent flow or atmospheric modeling.
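
For reference, the depth-averaged Fickian description mentioned in this abstract is commonly written as below. The notation is assumed here for illustration, since the abstract defines no symbols: $\bar{c}$ is the depth-averaged concentration, $\bar{u}$ the depth-averaged horizontal velocity, $\xi$ the coordinate moving with the mean flow, and $D(t)$ the time-dependent dispersion coefficient.

```latex
\frac{\partial \bar{c}}{\partial t}
  = D(t)\,\frac{\partial^{2} \bar{c}}{\partial \xi^{2}},
\qquad
\xi = x - \bar{u}\,t
```

The functional form of the sub-grid advection term found by GP is not stated in the abstract, so it is not shown here.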

SF11A-07 09:30h

An Efficient Data Assimilation Technique Using the Karhunen-Loeve Kalman Filter

* Lu, Z (zhiming@lanl.gov) , Hydrology, Geochemistry, and Geology Group, Los Alamos National Laboratory, MS T003, Los Alamos, NM 87545 United States
Zhang, D (donzhang@ou.edu) , Mewbourne School of Petroleum and Geological Engineering, University of Oklahoma, 100 East Boyd, SEC T301, Norman, OK 73019 United States

The Kalman filter has been widely applied to assimilate new measurements and continuously update estimates of state variables. The standard Kalman filtering scheme requires computing and storing the covariance matrix of state variables, which is computationally expensive for large-scale problems with millions of grid nodes. In the ensemble Kalman filter, this problem is alleviated by sampling a limited number of realizations and computing the required subset of the covariance matrix at each update. However, the quality of the covariance approximated from the limited ensemble depends on the number of realizations used. In this study, we propose a new Kalman filtering scheme based on Karhunen-Loeve or other orthogonal polynomial decompositions of the state vector. We consider flow in heterogeneous reservoirs with spatial variability in permeability. The pressure is measured at some locations at various time intervals. The aim is to characterize the permeability field and to predict the mean pressure and its uncertainty at a future time. In our scheme, the covariance of the permeability is approximated by a small set of eigenvalues and eigenfunctions using the Karhunen-Loeve (KL) decomposition, and this covariance can be reconstructed from the KL decomposition whenever needed. In each update, the forward problem is solved using the KL-based moment method, giving a set of functions from which the mean and covariance of the state variables can be constructed when needed. The statistics of both the permeability field and the pressure field are then updated with the measurements available at that time, using the auto-covariance of the pressure and the cross-covariance between the pressure and permeability from the forward problem. We illustrate our algorithm on a synthetic heterogeneous reservoir and compare the results with those from existing methods.
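
The two building blocks named in this abstract, a truncated KL representation of a covariance and a Kalman measurement update, can be sketched generically as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' KL-based moment method: it uses a discrete eigen-decomposition of a covariance matrix and the textbook linear Kalman update, and the function names (`kl_truncate`, `kl_reconstruct`, `kalman_update`) are hypothetical.

```python
import numpy as np


def kl_truncate(cov, n_modes):
    """Return the n_modes largest eigenvalues and matching eigenvectors of cov."""
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_modes]    # indices of the largest modes
    return eigvals[order], eigvecs[:, order]


def kl_reconstruct(eigvals, eigvecs):
    """Rebuild the (approximate) covariance from the retained KL modes."""
    return (eigvecs * eigvals) @ eigvecs.T         # sum_k lambda_k phi_k phi_k^T


def kalman_update(mean, cov, H, obs, obs_var):
    """Textbook linear Kalman update for obs = H @ state + noise (variance obs_var)."""
    S = H @ cov @ H.T + obs_var * np.eye(H.shape[0])   # innovation covariance
    K = np.linalg.solve(S, H @ cov).T                  # gain: cov H^T S^{-1}
    new_mean = mean + K @ (obs - H @ mean)
    new_cov = cov - K @ H @ cov
    return new_mean, new_cov


if __name__ == "__main__":
    # Assumed example: exponential covariance on a 1-D grid of 50 nodes.
    x = np.linspace(0.0, 1.0, 50)
    cov = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)

    lam, phi = kl_truncate(cov, 10)        # store 10 eigenpairs, not a 50 x 50 matrix
    cov_kl = kl_reconstruct(lam, phi)      # rebuild the covariance only when needed

    H = np.zeros((2, 50))                  # two point measurements at nodes 10 and 40
    H[0, 10] = 1.0
    H[1, 40] = 1.0
    mean, post_cov = kalman_update(np.zeros(50), cov_kl, H, np.array([1.0, -0.5]), 0.01)
    print(mean[10], post_cov[10, 10])
```

When the number of retained modes is much smaller than the number of grid nodes, only the eigenpairs need to be stored between updates, which is the storage saving the abstract points to.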