SF33B-01 INVITED 13:40h
NOAA's Scientific Data Stewardship Program
The NOAA mission is to understand and predict changes in the Earth's environment and conserve and manage coastal and marine resources to meet the Nation's economic, social and environmental needs. NOAA has responsibility for long-term archiving of the United States environmental data and has recently integrated several data management functions into a concept called Scientific Data Stewardship. Scientific Data Stewardship is a new paradigm in data management consisting of an integrated suite of functions to preserve and exploit the full scientific value of NOAA's, and the world's, environmental data. These functions include careful monitoring of observing system performance for long-term applications, the generation of authoritative long-term climate records from multiple observing platforms, and the proper archival of and timely access to data and metadata. NOAA has developed a conceptual framework to implement the functions of scientific data stewardship. This framework has five objectives: 1) develop real-time monitoring of all satellite observing systems for climate applications, 2) process large volumes of satellite data extending up to decades in length to account for systematic errors and to eliminate artifacts in the raw data (the resulting products are referred to as fundamental climate data records, FCDRs), 3) generate retrieved geophysical parameters from the FCDRs (referred to as thematic climate data records, TCDRs), including combining observations from all sources, 4) conduct monitoring and research by analyzing data sets to uncover climate trends and to provide evaluation and feedback for steps 2) and 3), and 5) provide archives of metadata, FCDRs, and TCDRs, and facilitate distribution of these data to the user community. The term "climate data record" and related terms, such as climate data set, have been used for some time, but the climate community has yet to settle on a consensus definition.
A recent United States National Academy of Sciences report recommends using the following definition: a climate data record (CDR) is a time series of measurements of sufficient length, consistency, and continuity to determine climate variability and change.
SF33B-02 13:55h
Registration and Fusion of Multiple Source Remotely Sensed Image Data
Earth and space science often involves the comparison, fusion, and integration of multiple types of remotely sensed data at various temporal, radiometric, and spatial resolutions. Results of this integration may be utilized for global change analysis, global coverage of an area at multiple resolutions, map updating or validation of new instruments, as well as integration of data provided by multiple instruments carried on multiple platforms, e.g. in spacecraft constellations or fleets of planetary rovers. Our focus is on developing methods to perform fast, accurate and automatic image registration and fusion. General methods for automatic image registration are being reviewed and evaluated. Various choices for feature extraction, feature matching and similarity measurements are being compared, including wavelet-based algorithms, mutual information and statistically robust techniques. Our work also involves studies related to image fusion and investigates dimension reduction and co-kriging for application-dependent fusion. All methods are being tested using several multi-sensor datasets acquired at EOS Core Sites, including data from the IKONOS, Landsat-7/ETM+, EO1/ALI and Hyperion, MODIS, and SeaWiFS instruments. Issues related to the coregistration of data from the same platform (e.g., AIRS and MODIS from Aqua) or from several platforms of the A-train (e.g., MLS, HIRDLS, OMI from Aura with AIRS and MODIS from Terra and Aqua) will also be considered.
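As an illustration of one of the similarity measurements named above, the following is a minimal sketch of mutual-information-based registration. It is not the authors' implementation: the function names, the histogram bin count, and the restriction to integer translations found by exhaustive search are all assumptions made for the example.

```python
import numpy as np


def mutual_information(a, b, bins=32):
    """Mutual information (in nats) between two equally shaped images,
    estimated from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                # joint probability
    px = pxy.sum(axis=1, keepdims=True)      # marginal of a (column vector)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of b (row vector)
    nz = pxy > 0                             # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def best_shift(reference, moving, max_shift=3):
    """Hypothetical helper: try every integer (dy, dx) translation within
    +/- max_shift pixels and keep the one that maximizes mutual information.
    np.roll wraps around the edges, which is a simplification."""
    best, best_mi = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            mi = mutual_information(reference, shifted)
            if mi > best_mi:
                best, best_mi = (dy, dx), mi
    return best, best_mi


# Synthetic check: displace a random image and recover the displacement.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moving = np.roll(reference, (2, -1), axis=(0, 1))
shift, score = best_shift(reference, moving)
```

Because mutual information depends only on the statistical relationship between intensities, not on their values being equal, it is often a reasonable choice when the two images come from different sensors.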
SF33B-03 14:10h
Linked Environments for Atmospheric Discovery (LEAD): A Cyberinfrastructure for Mesoscale Meteorology Research and Education
A new National Science Foundation Large Information Technology Research (ITR) grant - known as Linked Environments for Atmospheric Discovery (LEAD) - has been funded to facilitate the identification, access, preparation, assimilation, prediction, management, analysis, mining, and visualization of a broad array of meteorological data and model output, independent of format and physical location. A transforming element of LEAD is dynamic workflow orchestration and data management, which will allow use of analysis tools, forecast models, and data repositories as dynamically adaptive, on-demand systems that can a) change configuration rapidly and automatically in response to weather; b) continually be steered by new data; c) respond to decision-driven inputs from users; d) initiate other processes automatically; and e) steer remote observing technologies to optimize data collection for the problem at hand. Having been in operation for slightly more than a year, LEAD has created a technology roadmap and architecture for developing its capabilities and placing them within the academic and research environment. Further, much of the LEAD infrastructure being developed for the WRF model, particularly workflow orchestration, will play a significant role in the nascent WRF Developmental Test Bed Center located at NCAR. This paper updates the status of LEAD (e.g., the topics noted above), its ties with other community activities (e.g., CONDUIT, THREDDS, MADIS, NOMADS), and the manner in which LEAD technologies will be made available for general use. Each component LEAD application is being created as a standards-based Web service that can be run in stand-alone configuration or chained together to build an end-to-end environment for on-demand, real time NWP. 
We describe in this paper the concepts, implementation plans, and expected impacts of LEAD, the underpinning of which will be a series of interconnected, heterogeneous virtual IT "Grid environments" designed to provide a complete framework for mesoscale meteorology research and education. A set of Integrated Grid and Web Services Testbeds will maintain a rolling archive of several months of recent data, provide tools for operating on them, and serve as an infrastructure (i.e., a mini Grid) for developing distributed Web services capabilities. Education Testbeds will integrate education and outreach throughout the entire LEAD program, and will help shape LEAD research into applications that are congruent with pedagogic requirements, national standards, and evaluation metrics. Ultimately, the LEAD environments will enable researchers, educators, and students to run atmospheric models and other tools in much more realistic, real time settings than is now possible, with emphasis on the use of locally or otherwise uniquely available data.
http://lead.ou.edu
SF33B-04 14:25h
Grid Technology as a Cyberinfrastructure for Delivering High-End Services to the Earth and Space Science Community
Grid technology consists of middleware that permits distributed computations, data and sensors to be seamlessly integrated into a secure, single-sign-on processing environment. Grid technology allows resources that exist in enterprises that are under different administrative control to be securely integrated into a single processing environment. The grid community has adopted commercial web services technology as a means for implementing persistent, re-usable grid services that sit on top of the basic distributed processing environment that grids provide. These grid services can then form building blocks for even more complex grid services. The emerging Semantic grid work seeks to associate sufficient semantic information with each grid service such that applications will be able to automatically select, compose and, if necessary, substitute available equivalent services in order to assemble collections of services that are most appropriate for a particular application. Grid technology has been used to provide limited support to various Earth and space science applications. Looking to the future, this emerging grid service technology can provide a cyberinfrastructure for both the Earth and space science communities. Groups within these communities could transform those applications that have community-wide applicability into persistent grid services that are made widely available to their respective communities. In concert with grid-enabled data archives, users could easily create complex workflows that extract desired data from one or more archives and process it through an appropriate set of widely distributed grid services discovered using semantic grid technology. As required, high-end computational resources could be drawn from available grid resource pools.
Using grid technology, this confluence of data, services and computational resources could easily be harnessed to transform data from many different sources into a desired product that is delivered to a user's workstation or to a web portal through which it could be accessed by its intended audience.
SF33B-05 INVITED 14:40h
The Data System Integrator: The OPeNDAP Data Connector
The notion of an end-to-end data system has become blurred in today's rapidly evolving environment of distributed data system elements. In this presentation, we investigate the data system integrator as the defining element of an end-to-end data system. The data system integrator provides data discovery and delivery within the user's analysis environment. The data system integrator may reside at a remote site - a web-based data portal - or it may reside on the user's computer. The OPeNDAP Data Connector (ODC), an example of the latter, will be used to demonstrate issues related to the construction of a data system integrator. The ODC is a standalone Java program that provides access to several different data discovery services as well as to the OPeNDAP-accessible data listed by these services. It can also interface to analysis applications. In addition to the different access protocols required to query remote directories and to access data from server sites, the ODC also understands several different inventory mechanisms such as GrADS Data Server (GDS) data set lists and THREDDS (Thematic Realtime Environmental Distributed Data Services) catalogs. The ODC is available under the download tab on the OPeNDAP web site: http://opendap.org.
SF33B-06 14:55h
The NOAA Operational Model Archive and Distribution System: A Status Report
The NOAA Operational Model Archive and Distribution System (NOMADS) at the National Climatic Data Center (NCDC) has been serving model and other data since August 2003. During the past year, approximately 13,000 unique users obtained up to one-half million individual model data elements per month. This represents the first US national archive for weather models and data. While NOMADS serves many Grid Analysis and Display System (GrADS) users, other users are obtaining these data through the format-neutral data transport mechanism known as the Data Access Protocol (DAP), being developed by the non-profit organization OPeNDAP (formerly the Distributed Oceanographic Data System, or DODS). Traditional Web and ftp services are also available; however, nearly three quarters of the users are using the sub-setting and host-side data manipulation capabilities inherent in the DAP distributed data paradigm. While the current success of NOMADS, at both NCDC and NCEP, can be attributed to the sub-setting capabilities and the relatively advanced level of its current users, some of the known limitations in the NOMADS distributed data services model have yet to be seen. Of concern at NCDC is the unknown server loading from multiple high-volume requests for the soon-to-be-available NCEP North American Regional Reanalysis (NARR). The NCDC NOMADS is NCEP's primary distribution point for this new data set, and it is expected that this 25-year, 12 km regional reanalysis will have many uses, crossing several science disciplines. Many users will continue to opt to subset via the NOMADS interfaces; however, many institutions require the entire period of record, which is approximately 5 terabytes in volume. System architecture efforts are underway to expand NCDC NOMADS serving capabilities to accommodate the NARR. It is hoped that NOMADS can provide input to the DAP developers regarding the level of robustness in an operational setting.
The NOMADS system is one of the first operational systems of its kind and has filled a gap in the geosciences community for retrospective model data access. However, users continue to request all the data, even when only a subset is actually required. It is clear that many users are becoming more adept at distributed data access tools and the philosophy surrounding them, due in part to the large number of data providers adopting distributed data services and the DAP, and to the number of desktop clients now supporting the DAP libraries. However, as technology changes and other advances in distributed computing emerge, the challenge facing the geoscience community will be to forge agreements for interoperability using standards-based metadata and transport mechanisms.
http://www.ncdc.noaa.gov/oa/model/model-resources.html
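To make the sub-setting idea concrete, here is a minimal sketch of how a client might build a DAP 2 request that asks the server for only part of an array. The `[start:stride:stop]` hyperslab notation (inclusive stop) is the DAP 2 constraint-expression form; the server address `http://example.org/dods/narr` and the variable name `tmp2m` are hypothetical placeholders, not actual NOMADS endpoints, and the helper names are assumptions for this example.

```python
from math import prod


def dap_hyperslab(start, stop, stride=1):
    """One array dimension in DAP 2 notation: [start:stride:stop], stop inclusive."""
    return f"[{start}:{stride}:{stop}]"


def dap_subset_url(base_url, variable, slabs, suffix="ascii"):
    """Build a request URL for a subset of one variable.
    slabs is a list of (start, stop) index pairs, one per array dimension."""
    constraint = variable + "".join(dap_hyperslab(s, e) for s, e in slabs)
    return f"{base_url}.{suffix}?{constraint}"


def subset_size(slabs):
    """Number of array elements the request above would return."""
    return prod(stop - start + 1 for start, stop in slabs)


# Hypothetical request: 8 time steps of a 10 x 10 grid-point window.
slabs = [(0, 7), (10, 19), (20, 29)]
url = dap_subset_url("http://example.org/dods/narr", "tmp2m", slabs)
count = subset_size(slabs)
```

A request of this kind transfers a few hundred values rather than the full period of record, which is the behavior the server-loading concerns above depend on users adopting.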
SF33B-07 15:10h
Managing global satellite data: The GHRSST-PP
The GODAE (Global Ocean Data Assimilation Experiment) High Resolution Sea Surface Temperature Pilot Project (GHRSST-PP) is an international effort to produce high quality enhanced Level 2 SST products (known as L2P) from a number of satellite infrared and microwave sources on both polar orbiting and geostationary platforms. Ultimately these data will be merged by the project into a daily 10 km global cloud-free product. The large volumes of satellite information produced by the GHRSST-PP, as well as their timeliness, will require coordination among data providers (for each individual satellite sensor) and users, methods of quality control and archiving, and tools for data discovery and distribution. The JPL PO.DAAC has developed an infrastructure to meet the requirements of this project, including its stringent realtime nature (data available within 4 hours of satellite downlink). This infrastructure includes dedicated software and hardware to ingest, monitor and track the data and metadata generated from global L2P providers, including coordinating data delivery and "hand shaking," staging the data in a 30 day rolling store, constructing custom subsetted regional diagnostic products, and delivering data products to external users via subscription and other methods. The PO.DAAC has also constructed a metadata repository whereby metadata for each individual L2P product is ingested into a database that is externally accessible through a web-based search and query front end. This metadata repository essentially functions as the data discovery mechanism for all GHRSST-PP products, both L2P and merged, that may be accessible from a number of global sources (including from JPL). A separate database has been developed for satellite to in situ SST matchup information important for satellite SST validation and algorithm development for the GHRSST-PP science team.
We will describe the results of an "end-to-end" test with a provider of L2P data from MODIS and AVHRR that demonstrated the complete cycle of data production, ingest and acknowledgment, server population, metadata population, data discovery and custom product generation.
http://www.ghrsst-pp.org
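The two time constraints mentioned above, availability within 4 hours of downlink and a 30 day rolling store, can be sketched as follows. This is only an illustration of the bookkeeping involved, not the PO.DAAC software; the function names, the dictionary-based store, and the granule names are assumptions.

```python
from datetime import datetime, timedelta

MAX_LATENCY = timedelta(hours=4)   # data available within 4 hours of downlink
RETENTION = timedelta(days=30)     # length of the rolling store


def is_timely(downlink_time, available_time, limit=MAX_LATENCY):
    """True if a granule became available within the latency limit."""
    return available_time - downlink_time <= limit


def prune_rolling_store(store, now, retention=RETENTION):
    """Return a copy of the store without granules older than the retention
    period. store maps a granule name to its ingest time."""
    return {name: t for name, t in store.items() if now - t <= retention}


# Hypothetical granules and times, for illustration only.
now = datetime(2004, 12, 15)
store = {
    "granule_old": datetime(2004, 11, 1),   # 44 days old, dropped
    "granule_new": datetime(2004, 12, 1),   # 14 days old, kept
}
kept = prune_rolling_store(store, now)
on_time = is_timely(datetime(2004, 12, 15, 0, 0), datetime(2004, 12, 15, 3, 30))
late = is_timely(datetime(2004, 12, 15, 0, 0), datetime(2004, 12, 15, 5, 0))
```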
SF33B-08 15:25h
Distributive Online Processing, Visualization and Analysis System for Gridded Remote Sensing Data
The ability to use data stored in the current Earth Observing System (EOS) archives for studying regional or global phenomena is highly dependent on having a detailed understanding of the data's internal structure and physical implementation. Gaining this understanding and applying it to data reduction is a time-consuming task that must be undertaken before the core investigation can begin. This is an especially difficult challenge when science objectives require users to deal with large multi-sensor data sets that are usually of different formats, structures, and resolutions. The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) has taken a major step towards meeting this challenge by developing an infrastructure with a Web interface that allows users to perform interactive analysis online without downloading any data: the GES-DISC Interactive Online Visualization and Analysis Infrastructure, or "Giovanni." Giovanni provides interactive, online analysis tools for data users to facilitate their research. Several instances of this interface have been created to serve TRMM users, aerosol scientists, and ocean color and agriculture applications users. The first generation of these tools supports gridded data only. The user selects geophysical parameters, area of interest, and time period, and the system generates an output on screen in a matter of seconds. The currently available output options are: area plots, averaged or accumulated over any available data period for any rectangular area; time plots, i.e. time series averaged over any rectangular area; Hovmoller plots, i.e. image views of any longitude-time and latitude-time cross sections; ASCII output for all plot types; and image animation for area plots. Correlation plots, GIS-compatible outputs, and other options are anticipated in the future. This allows users to focus on data content (i.e., science parameters) and eliminates the need for expensive learning, development and processing tasks that are redundantly incurred by an archive's user community. The current implementation utilizes the GrADS-DODS Server (GDS), a stable, secure data server that provides subsetting and analysis services across the Internet for any GrADS-readable dataset. The subsetting capability allows users to retrieve a specified temporal and/or spatial subdomain from a large dataset, eliminating the need to download everything simply to access a small relevant portion of a dataset. The analysis capability allows users to retrieve the results of an operation applied to one or more datasets on the server. In our case, we use this approach to read pre-processed binary files and/or to read and extract the needed parts from HDF or HDF-EOS files. These subsets then serve as inputs into GrADS analysis scripts. Giovanni can be used in a wide variety of Earth science applications, such as the study and monitoring of climate and weather events, and modeling. It can be easily configured for new applications.
http://disc.gsfc.nasa.gov
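The gridded output types described in the abstract above (area plot, time plot, Hovmoller plot) each reduce a time-latitude-longitude array along different axes. The following is a minimal NumPy sketch of those reductions, not the Giovanni or GrADS code: the array layout `data[time, lat, lon]`, the function names, and the cosine-of-latitude weighting in the time series are assumptions made for the example.

```python
import numpy as np


def area_plot(data):
    """Time-averaged map: mean over the time axis, shape (lat, lon)."""
    return data.mean(axis=0)


def time_plot(data, lats):
    """Area-averaged time series, shape (time,).
    Rows are weighted by cos(latitude) to approximate grid-cell area."""
    weights = np.cos(np.deg2rad(lats))
    zonal_mean = data.mean(axis=2)                 # (time, lat)
    return np.average(zonal_mean, axis=1, weights=weights)


def hovmoller_lon_time(data):
    """Longitude-time cross section: mean over latitude, shape (time, lon)."""
    return data.mean(axis=1)


def hovmoller_lat_time(data):
    """Latitude-time cross section: mean over longitude, shape (time, lat)."""
    return data.mean(axis=2)


# Synthetic field: 4 time steps on a 3 x 5 grid, for illustration only.
lats = np.array([-10.0, 0.0, 10.0])
data = np.arange(4 * 3 * 5, dtype=float).reshape(4, 3, 5)
mean_map = area_plot(data)
series = time_plot(data, lats)
lon_time = hovmoller_lon_time(data)
lat_time = hovmoller_lat_time(data)
```

Selecting a rectangular area before these reductions amounts to slicing the latitude and longitude axes, which is the step a subsetting server can perform before any data leave the archive.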