Discovering and Accessing Data from the Federation of Earth Science Information Partners
Rob Raskin, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA; Howard Burrows, Autonomous Undersea Systems Institute, Lee, NH; Helen Conover, University of Alabama in Huntsville, Huntsville, AL; James Gallagher, University of Rhode Island, Narragansett Bay, RI; Gene Major, Science Systems and Applications, Inc., Greenbelt, MD; and Tim Rhyne, Oak Ridge National Laboratory, Oak Ridge, TN
For additional information, contact Rob Raskin, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, USA; E-mail: raskin@seastar.jpl.nasa.gov
Copyright 2002 American Geophysical Union
In 1997, NASA funded a federation of cooperating but autonomous Earth Science Information Partners (ESIPs), charged to work together as an experiment in self-governance and interoperability (http://www.esipfed.org). Twenty-four ESIPs, selected through an open, competitive process, joined with NASA to form an initial working, autonomous federation. This group voted to include the eight existing Distributed Active Archive Centers (DAACs) and several other data centers, and it now comprises the 42 partners listed in Table 1, plus NASA. Collectively, the members span multiple Earth science disciplines and fall into three identifiable categories: Type 1, data archive centers; Type 2, Earth science research centers; and Type 3, applications centers. One of the original requirements for all ESIPs was to provide some level of interoperability, so that the data products and services of the heterogeneous partners could be discovered and accessed in a uniform way.
Table 1. ESIP Federation partners, by type

Type 1 ESIPs: Archive Centers
Alaska SAR (Synthetic Aperture Radar) Facility
Global Hydrology Resource Center
Goddard Space Flight Center Distributed Active Archive Center
Land Processes Distributed Active Archive Center
Langley Research Center Distributed Active Archive Center
National Climate Data Center
National Snow and Ice Data Center
Oak Ridge National Laboratory Distributed Active Archive Center
Physical Oceanography Distributed Active Archive Center
Socioeconomic Data and Applications Center

Type 2 ESIPs: Research Centers
Distributed Oceanographic Data System
Earth Science Partners’ Private Network
Earth System Science Workbench
EOS-Webster (WEB based System for Terrestrial Ecosystem Research)
Evolution of Snow Pack in the Southwestern US (SnowSIP)
Global Land Cover Facility
GPS Environmental & Earth Science Information System (GENESIS)
Great Plains Regional Earth Science Applications Center
IBM Watson Research Center/Johns Hopkins School of Public Health
Numerical Terradynamic Simulation Group
Ocean Earth Science Information Partner
Passive Microwave Earth Science Information Partner
Seasonal to Interannual Earth Science Information Partner
Southwest Regional Earth Science Applications Center
Tropical Rain Forest Information Center (TRFIC)
Unidata

Type 3 ESIPs: Applications Centers
Bay Area Shared Information Consortium (BASIC)
California Land Science Information Partnership
Earth Data Analysis Center
Environmental Legal Information Systems
Mid-Atlantic Regional Earth Sciences Application Center
Museums Teaching Planet Earth
Northeast Applications of Useable Technology in Land planning for Urban Sprawl (NAUTILUS)
Planet Earth Science
Reading Information Technology Incorporated
Southern California Wildfire Hazard Center
Scientific Fishery Systems, Inc.
StormCenter.com
TERC
Terra SIP
Terrain Data
Upper Midwest Aerospace Consortium

See http://www.esipfed.org for complete descriptions and Web sites of Federation members.
The challenge of interoperability is a familiar one from the early 1990s, when the EOSDIS Core System (ECS) was being developed. The solution at that time was ultimately prescribed by NASA: put all data into the Hierarchical Data Format - Earth Observing System (HDF-EOS) and use a common set of core metadata descriptions. While that was an acceptable technical solution, the scientific community expressed significant resistance because of the complexity of the format. The federation emerged, in part, as an alternative to that centralized approach. It would instead explore a bottom-up solution.
Over 2,000 data products are currently available from federation members. Locating and accessing the data that meet a user's needs can be a formidable task, given the myriad ways that data providers store and distribute their products. As the ESIPs developed the federation governance structure, a Standing Committee on Interoperability was formed to define and develop the interoperability layer, which is now known as the Federation Interactive Network for Discovery (FIND). The initial focus was to provide catalog interoperability, allowing data from all ESIPs to be searched from a single user interface. That phase is largely complete. Current efforts are concentrating on the more difficult task of data access interoperability, so that data from multiple sites can be accessed uniformly. No single data access protocol has been selected; instead, "clusters" of ESIPs emerged to explore various interoperable data access technologies. These technologies will allow software applications to access data at multiple federation sites using a common interface.
Catalog Interoperability
The federation partners use NASA's Global Change Master Directory (GCMD) to maintain a comprehensive catalog of information about their data holdings. GCMD provides a federation-specific Web portal to search these holdings. Mercury, a complementary system, uses the GCMD's federation catalog and combines it with other information, including harvested ESIP Web pages, Earth Observing System Data and Information System (EOSDIS) Data Gateway (EDG) data references, and other information from the ESIPs. In addition to locating data, these systems provide information about tools and services to help use the data. Most of the Type 1 ESIPs, and several of the others, also provide data inventory listings to the EOS Data Gateway, which is a full-featured search and order system. FIND uses Mercury as its default search mechanism for simple searches (see ESIP Federation Search Tool: http://www.esipfed.org/find; see results of an advanced Mercury search for "rain forest": http://mercury.ornl.gov/esip/rainforestlink.html). Advanced searches can be carried out using either Mercury or GCMD (see results of a GCMD search by topic for "Aerosol Particle Properties": http://gcmd.nasa.gov/servlets/md/frame_page_master.py?...).
NASA's Global Change Master Directory
NASA's GCMD is a resource for discovering and locating Earth science data sets that are relevant to global change research. The GCMD contains descriptions of over 10,000 Earth science data sets from around the world, of which 20% are federation data products. Wherever possible, a GCMD search produces links from the data descriptions to the associated data.
At the core of the GCMD are well-structured metadata records designed for accurate search and retrieval and interoperability across multiple computer systems. The GCMD metadata is based on the Directory Interchange Format (DIF) for metadata, which was developed with the end user of Earth science data in mind. Central to the accurate retrieval of information was the development of controlled vocabularies to index the metadata records, and the development of software that checks the validity of those vocabularies when new metadata records are processed.
Controlled vocabularies have been developed for keyword searches by Earth science parameter, source/platform, sensor/instrument, geographic location, project/field campaign, or data center. Free-text searches (without controlled vocabularies) are available for summary, title, references, use and access constraints, and data quality. Free-text searches can be combined with geospatial and temporal constraints to further refine queries. In addition to data set descriptions, the GCMD provides descriptions of tools and services. Over 50 tools and services from federation partners are now listed and searchable. Metadata records are represented internally in the GCMD as XML documents. (XML is an element-oriented markup language.) A Document Type Definition (DTD) structure has been developed, allowing the use of tools such as XML parsers and style sheet processors, along with Java-based technologies to create an environment of shared metadata. GCMD provides additional syntactic and semantic validation of metadata documents, which allows greater consistency and quality of information.
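The vocabulary check described above can be illustrated with a minimal sketch. It assumes a simplified XML record and a two-entry keyword list; the element names, the sample record, and the function name are hypothetical and are not taken from the DIF DTD or the GCMD software.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary; the real keyword lists are far larger.
CONTROLLED_PARAMETERS = {
    "ATMOSPHERE > AEROSOLS > AEROSOL PARTICLE PROPERTIES",
    "OCEANS > OCEAN TEMPERATURE > SEA SURFACE TEMPERATURE",
}

# Simplified, illustrative record; element names are assumptions, not the DIF DTD.
SAMPLE_RECORD = """
<DIF>
  <Entry_Title>Example aerosol data set</Entry_Title>
  <Parameter>ATMOSPHERE &gt; AEROSOLS &gt; AEROSOL PARTICLE PROPERTIES</Parameter>
  <Parameter>ATMOSPHERE &gt; AEROSOLS &gt; DUST</Parameter>
</DIF>
"""


def invalid_keywords(record_xml, vocabulary):
    """Return the parameter keywords in a record that are not in the vocabulary."""
    root = ET.fromstring(record_xml.strip())
    keywords = [(el.text or "").strip() for el in root.iter("Parameter")]
    return [kw for kw in keywords if kw not in vocabulary]


if __name__ == "__main__":
    for keyword in invalid_keywords(SAMPLE_RECORD, CONTROLLED_PARAMETERS):
        print("Not in controlled vocabulary:", keyword)
```

A record that yields an empty list would pass this check; a production validator would presumably also verify the record against the DTD before indexing it.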
Mercury
The Mercury portion of FIND expands a user's ability to search for data and information. Mercury powers the "Federation Wide Search" feature on the federation's Web site, which simultaneously searches the federation's data holdings, the main federation Web site, and all of its partner Web sites. Mercury's Web search screen replicates this capability with additional options. In addition, Mercury offers an advanced data search page that provides free-text, fielded, spatial, and temporal searches.
The Mercury FIND's data descriptions incorporate full GCMD directory entries and additional metadata provided directly to Mercury by the ESIPs and other sources. By using these additional sources of information, the Mercury FIND can provide multiple mechanisms for data access or ordering, including links to order via the EOSDIS Data Gateway. In some cases, user keywords, spatial, and temporal criteria are passed from the Mercury FIND to an ESIP's order system, thus saving the user from having to re-enter information. For many data sets, additional documentation is cross-referenced and included in the Mercury FIND's free-text search capability.
The Mercury FIND is evolving and improving. Work will begin soon to allow interactive visual search and access to the federation's OpenGIS Consortium (OGC)-compliant maps and data. Another enhancement will make Mercury a resource for using data that are described using the Earth Science Markup Language (ESML). ESML, which was developed by a federation partner--the University of Alabama in Huntsville's Information Technology and Systems Center--includes both structural and semantic information needed to effect a practical run-time interpretation of a data set.
Mercury began in 1997 as a new concept in scientific data management for NASA's Large Scale Biosphere-Atmosphere Experiment in Amazonia (LBA), an intensive field campaign in the Amazon Basin. Mercury can harvest metadata from multiple sources and uses XML intensively. Mercury currently serves projects for NASA, the U.S. Geological Survey, the Department of Energy, and the Environmental Protection Agency.
Data Interoperability
Data interoperability is a more ambitious objective than catalog interoperability. The goal is to enable the access and use of federation data from multiple sites via common protocols, tools, or interfaces. This must be achieved despite widely different formats, data structures, and spatio-temporal referencing schemes employed by the individual ESIPs.
Two solutions have gained widespread acceptance within the federation and are described below. Each enables the user to subset a data set by space, time, and parameter, and delivers the product via HTTP to the user's browser or application program. The user need not know or be concerned with the local storage format, and the subsetting takes place prior to the data transfer, so only the needed portion of the file is sent. A more complete technical description of data interoperability options can be found in a Federation White Paper [Nittel, 2001].
Distributed Oceanographic Data System
The Distributed Oceanographic Data System (DODS) is a framework of free software consisting of servers and clients that simplifies using the Internet to share scientific data. DODS servers read data stored in most standard scientific data formats and distribute them to DODS clients. DODS clients include visualization packages such as MATLAB, IDL, and Ferret, as well as netCDF-based applications and Web browsers. A DODS server accepts DODS URL requests, extracts the requested subsets, then converts the data into a form that clients can understand. In addition to providing client and server software, the DODS project makes software development tools available. Libraries are available for C++, Java, and netCDF, among others.
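A DODS URL request of the kind described above can be sketched as follows. The sketch assumes the usual DODS convention of appending a response suffix and a constraint expression with one [start:stride:stop] index range per array dimension; the server address, file name, variable name, and function name are hypothetical.

```python
# Response types requested by suffix: attributes, structure, or binary data.
RESPONSE_SUFFIXES = {"das", "dds", "dods"}


def dods_url(dataset_url, variable, slices, response="dods"):
    """Build a DODS request URL that subsets one array variable.

    slices holds one (start, stride, stop) index triple per dimension,
    so the server extracts the subset before any data are transferred.
    """
    if response not in RESPONSE_SUFFIXES:
        raise ValueError(f"unknown response type: {response}")
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


if __name__ == "__main__":
    # Hypothetical server and data set: first time step, a small
    # latitude/longitude window, every second longitude point.
    url = dods_url(
        "http://example.org/cgi-bin/nph-dods/sst.nc",
        "sst",
        [(0, 1, 0), (10, 1, 20), (30, 2, 50)],
    )
    print(url)
```

Because the subset is expressed in array indices, a client would normally read the data set's structure and coordinate variables first in order to translate a geographic region into index ranges.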
A Web-based access tool named Live Access Server (LAS) (see Figure 1) was developed by NOAA's Pacific Marine Environmental Laboratory (PMEL) to enable DODS data to be accessible from a standard Web browser. LAS provides a scientist's view of the data; a LAS user can request visualizations along various axes and planes of the data, and download multi-dimensional subsets in a choice of formats. A user can "fuse"--compare by differencing or co-plotting--variables that may be defined on different coordinate grids, stored in unlike file formats, and located at distributed DODS sites. To support collaborations, groups of LAS servers located at distributed sites can be linked together to appear as a virtual server with a single user interface.

Fig. 1. DODS data access is used by LAS to compare a model-derived, sea-surface temperature anomaly field from Columbia University's Lamont-Doherty Observatory with observations from the National Oceanic and Atmospheric Administration's Pacific Marine Environmental Laboratory.
The DODS project began in late 1993 at the University of Rhode Island's Graduate School of Oceanography. Currently, nearly 300 data sets are available through DODS servers. Continuing work on the project is taking place at many locations, and participation from interested parties is welcomed. Clients, servers, and development tools available for DODS are listed on the DODS Web site.
Web Mapping Server (WMS)/Web Coverage Server (WCS)/Web Feature Server (WFS)
Another set of data interoperability protocols has been developed by the OpenGIS Consortium (OGC). Included are standards for WMS, which delivers image representations of data to users; WCS, which delivers the actual data; and WFS, which delivers ancillary information about an object on a map.
A typical WMS call takes the form of a URL with a set of query parameters. Parameters include geographic region (BBOX), spatial reference system, width, height, format, and optionally, time and elevation.
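As an illustration, a WMS map request can be assembled as in the sketch below. It assumes the parameter names of the WMS 1.1.1 GetMap request; the server address, layer name, and function name are hypothetical.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layers, bbox, width, height, srs="EPSG:4326",
               image_format="image/png", time=None, elevation=None):
    """Build a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy)."""
    params = [
        ("SERVICE", "WMS"),
        ("VERSION", "1.1.1"),
        ("REQUEST", "GetMap"),
        ("LAYERS", ",".join(layers)),
        ("STYLES", ""),  # empty value requests each layer's default style
        ("SRS", srs),
        ("BBOX", ",".join(str(v) for v in bbox)),
        ("WIDTH", str(width)),
        ("HEIGHT", str(height)),
        ("FORMAT", image_format),
    ]
    # TIME and ELEVATION are optional dimensions.
    if time is not None:
        params.append(("TIME", time))
    if elevation is not None:
        params.append(("ELEVATION", str(elevation)))
    # Keep commas, colons, and slashes readable in the query string.
    return base_url + "?" + urlencode(params, safe=",:/")


if __name__ == "__main__":
    # Hypothetical server and layer name.
    print(getmap_url("http://example.org/wms", ["sst"],
                     (-180, -90, 180, 90), 720, 360, time="2002-01-01"))
```

Issuing the same request to a second server with identical BBOX, WIDTH, and HEIGHT values yields an image of the same extent and size, which is what makes layered display straightforward.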
By calling other data servers using the same BBOX, WIDTH, and HEIGHT, maps can be overlaid atop one another. WMS viewers have been developed to overlay features such as coastlines, political boundaries, instrument sites, wind vectors, or contours atop raster images. A viewer will soon be integrated into the FIND search environment to enable users to perform visual, interactive searches of WMS-compliant data within the federation without leaving the FIND environment. WCS calls are similar, but width and height are replaced by format-specific parameters (for example, SKIP for HDF-EOS). NASA is playing an active role in the evolution of WCS standards.
WFS is used to define the geographic extent of "features" such as cyclones, forest fires, or hurricanes. This technology is being explored by several ESIPs. One possible application is to use data mining techniques to identify these transient features and then use the WFS interface to search and retrieve associated feature information.
WMS/WCS/WFS metadata are stored in XML files known as capability files. These descriptors provide the means for data providers to specify what data products are available, in what formats, and in what geographic coordinate systems, etc. Catalogs of known servers have been developed; nightly harvesting of the capability files can be invoked to keep these catalogs current.
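The sketch below shows how a harvester might summarize one capability file. It assumes a heavily trimmed document in the style of a WMS 1.1.1 capabilities response; the layer names and the function name are hypothetical, and retrieval of the file from a server is not shown.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative capability file; real files carry much more detail.
SAMPLE_CAPABILITIES = """<WMT_MS_Capabilities version="1.1.1">
  <Capability>
    <Request>
      <GetMap>
        <Format>image/png</Format>
        <Format>image/gif</Format>
      </GetMap>
    </Request>
    <Layer>
      <Title>Example server</Title>
      <SRS>EPSG:4326</SRS>
      <Layer><Name>sst</Name><Title>Sea surface temperature</Title></Layer>
      <Layer><Name>coastlines</Name><Title>Coastlines</Title></Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""


def summarize_capabilities(xml_text):
    """List the map formats, named layers, and reference systems offered."""
    root = ET.fromstring(xml_text)
    formats = [f.text for f in root.findall("./Capability/Request/GetMap/Format")]
    layers = {}
    for layer in root.iter("Layer"):
        name = layer.findtext("Name")  # only named layers can be requested
        if name:
            layers[name] = layer.findtext("Title", default="")
    srs = sorted({s.text for s in root.iter("SRS")})
    return {"formats": formats, "layers": layers, "srs": srs}


if __name__ == "__main__":
    summary = summarize_capabilities(SAMPLE_CAPABILITIES)
    for name, title in summary["layers"].items():
        print(name, "-", title)
```

A catalog of servers could store one such summary per server and refresh it each time the capability files are harvested.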
Further Interoperability Work
Catalog and data access interoperability provide the basis for uniform discovery and access to federation data. However, current trends suggest that these two levels of interoperability may not keep up with future demands for automated extraction of scientific knowledge in an era when terabytes of data are received each day. The process of extracting value and knowledge from data sets involves understanding what the data mean, a concept known as semantic interoperability. Classical examples include distinguishing cloud contamination from "good" measurements and removal of instrument ambiguities using assimilation techniques. Such processing is essential to recognizing global warming trends. A related concept, service interoperability, allows users to access distributed software and hardware capacity.
In one such case, "code shipping" allows transfer of software to the data center, avoiding the need to send huge volumes of data over the network. In other cases, federation partners are designing a framework of standards to allow a chain of requests for processing that may involve several stages of distributed processing at remote sites. Emerging Internet standards such as Web Services Description Language (WSDL), Simple Object Access Protocol (SOAP), and Universal Description, Discovery, and Integration (UDDI) can provide tools for discovering and invoking services without the need for human intervention. Such practices are actively being developed within the federation.
The ESIP Federation is one of the world's leading providers of Earth science data and services. It is also very active in developing new methods of delivering data and providing services. These products and services, many of which are free, are a valuable resource to the scientific community. The federation recently established an associated foundation to look after its sustainability. The long-term vision is for a broader base of funding support to complement its current NASA funding.
Reference
Nittel, S., Interoperable data services for earth science data, Federation white paper, http://www.esipfed.org/knowledge_center/interoperable_data_access.pdf, 2001.
URLs
ESIP Federation: http://www.esipfed.org
Search Federation holdings: http://www.esipfed.org/find
GCMD home: http://gcmd.nasa.gov
GCMD ESIP portal: http://gcmd.nasa.gov/Data/portals/esip/
GCMD services: http://gcmd.nasa.gov/services/
Mercury advanced data search: http://mercury.ornl.gov/esip
Mercury Web and data search: http://mercury.ornl.gov/esip/freetext.html
Mercury Project Web site: http://mercury.ornl.gov
EOS Data Gateway: http://eos.nasa.gov/imswelcome
DODS: http://unidata.ucar.edu/packages/dods
Live Access Server: http://www.ferret.noaa.gov/Ferret/LAS
Open GIS Consortium: http://www.opengis.org/
WMS Specifications: http://www.opengis.org/techno/specs/01-068r3.pdf
WCS Specifications: http://www.opengis.org/techno/discussions/01-018.pdf
UDDI: http://www.uddi.org
WSDL: http://www.w3.org/TR/wsdl