Earth and Space Science Informatics [IN]

IN52A
 MC:3014  Friday  1020h

Strategies for Improved Marine and Synergistic Data Access and Interoperability II


Presiding:  C Chandler, Woods Hole Oceanographic Institution; K Baker, Scripps Institution of Oceanography

IN52A-01 INVITED

Seeking the Path to Metadata Nirvana

* Graybeal, J graybeal@mbari.org, Monterey Bay Aquarium Research Institute, 7700 Sandholdt Road, Moss Landing, CA 95039, United States

Scientists have always found reusing other scientists' data challenging. Computers did not fundamentally change the problem, but enabled more and larger instances of it. In fact, by removing human mediation and time delays from the data sharing process, computers emphasize the contextual information that must be exchanged in order to exchange and reuse data. This requirement for contextual information has two faces: "interoperability" when talking about systems, and "the metadata problem" when talking about data. As much as any single organization, the Marine Metadata Interoperability (MMI) project has been tagged with the mission "Solve the metadata problem." Of course, if that goal is achieved, then sustained, interoperable data systems for interdisciplinary observing networks can be easily built -- pesky metadata differences, like which protocol to use for data exchange, or what the data actually measures, will be a thing of the past. Alas, as you might imagine, there will always be complexities and incompatibilities that are not addressed, and data systems that are not interoperable, even within a science discipline. So should we throw up our hands and surrender to the inevitable? Not at all. Rather, we try to minimize metadata problems as much as we can. In this we increasingly progress, despite natural forces that pull in the other direction. Computer systems let us work with more complexity, build community knowledge and collaborations, and preserve and publish our progress and (dis-)agreements. Funding organizations, science communities, and technologists see the importance of interoperable systems and metadata, and direct resources toward them. With the new approaches and resources, projects like IPY and MMI can simultaneously define, display, and promote effective strategies for sustainable, interoperable data systems. This presentation will outline the role metadata plays in durable interoperable data systems, for better or worse.
It will describe times when "just choosing a standard" can work, and when it probably won't work. And it will point out signs that suggest a metadata storm is coming to your community project, and how you might avoid it. From these lessons we will seek a path to producing interoperable, interdisciplinary, metadata-enlightened environment observing systems.

http://marinemetadata.org/2008agumetadatanirvana

IN52A-02 INVITED

25 Years of Controlled Vocabularies in Oceanographic Data Management

* Lowry, R K rkl@bodc.ac.uk, British Oceanographic Data Centre, 6 Brownlow Street, Liverpool, L3 5DA, United Kingdom

In the 1980s data managers in IOC realised that data exchange required a common terminology for concepts such as parameters, instruments and platforms. They rose to the challenge by developing a set of 7 controlled vocabularies that were published in print as part of the GF3 standard in 1987. Unfortunately, because this was based on print, the vocabularies couldn't be maintained and were little used. However, in the 1990s the pan-European SeaSearch project developed them into a usable digital vocabulary library. Whilst this was a significant step forward, vocabulary content governance was delegated to individuals, technical governance procedures were far from watertight and the terms had no definitions. Consequently, usage problems, especially local copy evolution and term misunderstandings, persisted. When SeaDataNet started in 2006 there was a determination to use technology to solve these problems. Now, list server content governance harnesses domain expertise, relational database technology provides robust, scalable, versioned storage and the NERC DataGrid SOAP and pseudo-RESTful Web Service APIs serve lists and mappings. The SeaDataNet vocabulary technology currently addresses two use cases: semantic cross-walking and metadata field content verification. Basic metadata cross-walks do not transfer fields populated from different vocabularies, thereby losing information, because machine-accessible translations are unavailable. In SeaDataNet we have assembled mappings between lists of interest, particularly parameter vocabularies, into an RDF triple store. The resulting ontology is served as RDF documents through API method calls and term URLs. Metadata content verification is implemented by embedding URNs into documents defined by Schematron-extended schemas that are automatically built by a service monitoring the vocabulary server for content changes.
These schemas allow generic XML editors, such as Oxygen, to validate document semantic content against the latest vocabulary version. The NERC DataGrid/SeaDataNet Vocabulary Server is fully operational delivering over 100 lists containing over 120,000 terms linked by nearly 80,000 mappings. It is receiving approximately 400 catalogue requests and 3000 list accesses per month accompanied by over 200,000 hits from robots mining the semantic content.

http://www.bodc.ac.uk/products/web_services/vocab/
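
The two use cases named in the abstract above, semantic cross-walking and metadata field content verification, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the term identifiers, the relation names, and the in-memory list standing in for an RDF triple store are hypothetical and are not taken from the NERC DataGrid/SeaDataNet Vocabulary Server.

```python
# Hypothetical mappings between two vocabularies, held as
# (subject term, relation, object term) triples. A real deployment
# would query a triple store; a plain list stands in for it here.
MAPPINGS = [
    ("listA:TEMP", "sameAs", "listB:sea_water_temperature"),
    ("listA:PSAL", "sameAs", "listB:sea_water_salinity"),
    ("listA:CPHL", "narrowerThan", "listB:pigment_concentration"),
]

# Hypothetical target vocabulary used for field verification.
VOCABULARY_B = {
    "listB:sea_water_temperature",
    "listB:sea_water_salinity",
    "listB:pigment_concentration",
}


def crosswalk(term, relation="sameAs"):
    """Return every term that `term` maps to under `relation`."""
    return [obj for subj, rel, obj in MAPPINGS if subj == term and rel == relation]


def verify_field(value, vocabulary):
    """Return True only if a metadata field value is a known vocabulary term."""
    return value in vocabulary


def translate_record(record):
    """Translate each field of a record; fields with no exact match become None."""
    translated = {}
    for field, term in record.items():
        targets = crosswalk(term)
        translated[field] = targets[0] if targets else None
    return translated
```

In this sketch a field without an exact mapping is translated to None, which makes the information loss visible instead of silently dropping the field.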

IN52A-03

Use of Ontologies in Support of GEOSS Interoperability

* Browdy, S F steveb@omstech.com, OMS Tech, Inc., 13506 Summerport Village Parkway Suite 345, Windermere, FL 34786, United States

GEOSS is an international effort to provide global data sharing for societal benefit via interoperability. This is accomplished through registered services and standards, along with associated metadata. The Standards Registry is a component of the core architecture of GEOSS, and there is an effort underway to develop an ontology of standards for Earth observations in support of interoperability. This presentation will report on the tools being used in developing the ontology, any associated mappings, the progress that has been made, and steps to take for the future.

IN52A-04

Integrating Distributed Physical and Biological Marine data using OGC Web Services

* Gemmell, A L alg@mail.nerc-essc.ac.uk, Environmental Systems Science Centre, University of Reading, UK, Harry Pitt Building, 3 Earley Gate, Whiteknights, Reading, RG6 6AL, United Kingdom
Blower, J D jdb@mail.nerc-essc.ac.uk, Environmental Systems Science Centre, University of Reading, UK, Harry Pitt Building, 3 Earley Gate, Whiteknights, Reading, RG6 6AL, United Kingdom
Haines, K kh@mail.nerc-essc.ac.uk, Environmental Systems Science Centre, University of Reading, UK, Harry Pitt Building, 3 Earley Gate, Whiteknights, Reading, RG6 6AL, United Kingdom
Price, M martin.price@metoffice.gov.uk, UK Met Office, FitzRoy Road, Exeter, EX1 3PB, United Kingdom
Millard, K k.millard@hrwallingford.co.uk, HR Wallingford, Howbery Park, Wallingford, OX10 8BA, United Kingdom
Harpham, Q q.harpham@hrwallingford.co.uk, HR Wallingford, Howbery Park, Wallingford, OX10 8BA, United Kingdom

Earth scientists use highly diverse sources of data, including in-situ measurements, remotely-sensed information and the results of numerical simulations. The ability to access, visualize, combine and compare these datasets is at the core of scientific investigation, but such tasks have hitherto been very difficult or impossible due to a fundamental lack of harmonization of data products. As a result, much valuable data remains underused. We present a web portal that visualizes and compares physical and biological marine data from both numerical models and in-situ observations. The model data are obtained via an Open Geospatial Consortium (OGC)-compatible Web Map Service (WMS), and the observed data are obtained via an OGC Web Feature Service (WFS). The physical model WMS, the biological model WMS and the WFS are located at three different institutes. This ability to display in-situ point observations alongside model data facilitates much valuable work on model validation. As models become increasingly complex, and sources of observed data become more numerous, it is important to be able to access and compare this growing amount of data efficiently, to ensure cross-checking and consistency between models and observations. The web portal is being applied in a large European operational oceanography project (ECOOP), where it is used to provide support to ecosystem modellers, and specifically to aid detection of potentially harmful algal blooms in coastal areas. The development of this system has been enabled by the conceptual framework of the Climate Science Modelling Language (CSML), which provides a common view onto all these datasets, independent of their storage format or physical location. CSML is based upon emerging international standards, enabling interoperability with other standards-based infrastructures. By creating a reusable Java library that embodies the CSML concepts we are able to apply these techniques to a number of other projects.
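
As a rough illustration of how a portal of this kind might address the two OGC services, the Python sketch below builds a WMS 1.3.0 GetMap URL (for a model layer) and a WFS 1.1.0 GetFeature URL (for in-situ observations). The endpoints, layer name and feature type name are hypothetical; only the standard OGC key-value parameters are assumed, and nothing here is taken from the ECOOP portal itself.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width=512, height=512, time=None):
    """Build a WMS 1.3.0 GetMap URL for one layer.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 uses
    longitude/latitude axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "true",  # lets the image be overlaid on other layers
    }
    if time is not None:
        params["TIME"] = time  # optional time dimension, e.g. a model time step
    return endpoint + "?" + urlencode(params)


def wfs_getfeature_url(endpoint, type_name, max_features=100):
    """Build a WFS 1.1.0 GetFeature URL for one feature type."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "MAXFEATURES": max_features,
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoints and names, for illustration only.
    print(wms_getmap_url("http://example.org/wms", "sea_water_temperature",
                         (-10.0, 45.0, 5.0, 60.0), time="2008-12-19T00:00:00Z"))
    print(wfs_getfeature_url("http://example.org/wfs", "obs:chlorophyll_samples"))
```

Because each request is just a URL, a client can fetch the model image from one institute and the observation features from another, then overlay the two.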

IN52A-05

Moving Toward Climate Data Integration: The Observing System Monitoring Center

* O'Brien, K kevin.m.obrien@noaa.gov, University of Washington/JISAO, Box 355672, Seattle, WA 98195, United States
Hankin, S steven.c.hankin@noaa.gov, NOAA/PMEL, 7600 Sand Point Way NE, Seattle, WA 98115, United States
Hankin, S steven.c.hankin@noaa.gov, University of Washington/JISAO, Box 355672, Seattle, WA 98195, United States
Habermann, T Ted.Habermann@noaa.gov, NOAA/NGDC, 325 Broadway, Boulder, CO 80305, United States
Kern, K Kevin.Kern@noaa.gov, NOAA/NDBC, 1007 Balch Blvd, Stennis Space Center, MS 39529, United States
Schweitzer, R Roland.Schweitzer@noaa.gov, Weathertop Consulting, LLC, 2802 Cimarron CT., College Station, TX 77845, United States
Little, M Michelle.Little@noaa.gov, Planning Systems Incorporated, 115 Christian Lane, Slidell, LA 70458, United States
Snowden, D Derrick.Snowden@noaa.gov, NOAA/OCO, 1100 Wayne Avenue, Silver Spring, MD 20910, United States
Cartwright, J John.C.Cartwright@noaa.gov, NOAA/NGDC, 325 Broadway, Boulder, CO 80305, United States
LaRocque, J John.Larocque@noaa.gov, NOAA/NGDC, 325 Broadway, Boulder, CO 80305, United States
Li, J Jing.Y.Li@noaa.gov, Macroostaff Consulting, 11711 S.E. 8th, Bellevue, WA 98005, United States
Malczyk, J Jeremy.Malczyk@noaa.gov, University of Washington/JISAO, Box 355672, Seattle, WA 98195, United States
Manke, A Ansley.B.Manke@noaa.gov, NOAA/PMEL, 7600 Sand Point Way NE, Seattle, WA 98115, United States

Understanding climate variability requires the development, maintenance and evaluation of a sustained global climate observing system. The purpose of the Observing System Monitoring Center (OSMC), which is being funded by the National Oceanic and Atmospheric Administration's (NOAA) Office of Climate Observation (OCO), is to provide a tool that will assist managers and scientists with monitoring the performance of the global in-situ ocean observing system, identifying and correcting problems, and evaluating the adequacy of the observations in support of ocean/climate state estimation, forecasting and research. Initially, the sole source of ocean in-situ data being added to the OSMC database was from the subset of data which is distributed daily via the Global Telecommunications System (GTS). However, it has become clear that in order to maintain a complete record of the observations going into the climate data record, it is necessary to include observations that are collected but not distributed via the GTS system. The challenges of integrating such data into the OSMC parallel the challenges of integrating climate data for general use and discovery by those who would like to utilize the observations. In this presentation, we will be talking about our approach to integrating climate observations through the OSMC. The areas of integration under the OSMC include realtime data input from GTS for the management of in-situ platforms, integration of climate platform archives from Data Access Centers (DACs), and integration of climate products and data. The OSMC hopes to significantly advance climate services by making the data available through web services such as OPeNDAP and the Sensor Observation Service (SOS).

IN52A-06

Extensible Database Designs for Marine Observations

Snowden, D Derrick.Snowden@noaa.gov, NOAA Office of Climate Observations, Suite 1202, Silver Spring, MD 20910-5603, United States
* Habermann, T ted.habermann@noaa.gov, NOAA National Geophysical Data Center, E/GC1, 325 Broadway, Boulder, CO 80305-3328, United States
Cartwright, J C john.c.cartwright@noaa.gov, NOAA National Geophysical Data Center, E/GC1, 325 Broadway, Boulder, CO 80305-3328, United States
LaRocque, J John.Larocque@noaa.gov, NOAA National Geophysical Data Center, E/GC1, 325 Broadway, Boulder, CO 80305-3328, United States
Kern, K Kevin.Kern@noaa.gov, NOAA National Data Buoy Center, 1007 Balch Blvd., Stennis Space Center, MS 39529, United States
Little, M Michelle.Little@noaa.gov, NOAA National Data Buoy Center, 1007 Balch Blvd., Stennis Space Center, MS 39529, United States
O'Brien, K M Kevin.M.O'Brien@noaa.gov, NOAA Pacific Marine Environmental Laboratory, NOAA /R/PMEL 7600 Sand Point Way NE, Seattle, WA 98115, United States
Hankin, S Steven.C.Hankin@noaa.gov, NOAA Pacific Marine Environmental Laboratory, NOAA /R/PMEL 7600 Sand Point Way NE, Seattle, WA 98115, United States

The NOAA Observing System Monitoring Center (OSMC) has been created to assist NOAA and others with managing in-situ ocean observing system resources. The OSMC will ultimately collect and store observations and observation metadata for all in-situ ocean observing platform programs that NOAA contributes to. These data are stored in a spatial relational database management system. The number and diversity of the platforms and parameters included in OSMC is expected to grow with time. An extensible database design is required to accommodate that expansion without requiring changes to the design for each new observing system. This has been accomplished using a design that includes platforms, instruments, locations, observed parameters and observed values as separate entities. This design can support a variety of text reports as well as several different interactive mapping interfaces. We will describe the design and show how it supports integrated access to a wide variety of observations as well as addition of new observing systems to the OSMC.
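
The design idea described above, with platforms, instruments, locations, observed parameters and observed values as separate entities, can be sketched with Python's built-in sqlite3 module. This is an illustrative assumption only: the table and column names are hypothetical, SQLite is not a spatial database (longitude and latitude are plain columns here), and the actual OSMC schema is not given in the abstract.

```python
import sqlite3

# Hypothetical schema: each kind of entity has its own table, so a new
# observing system adds rows, not tables or columns.
SCHEMA = """
CREATE TABLE platform   (id INTEGER PRIMARY KEY, name TEXT UNIQUE, kind TEXT);
CREATE TABLE instrument (id INTEGER PRIMARY KEY, name TEXT,
                         platform_id INTEGER REFERENCES platform(id));
CREATE TABLE parameter  (id INTEGER PRIMARY KEY, name TEXT UNIQUE, units TEXT);
CREATE TABLE location   (id INTEGER PRIMARY KEY, lon REAL, lat REAL, depth_m REAL);
CREATE TABLE observation (
    id INTEGER PRIMARY KEY,
    instrument_id INTEGER REFERENCES instrument(id),
    parameter_id  INTEGER REFERENCES parameter(id),
    location_id   INTEGER REFERENCES location(id),
    observed_at   TEXT,
    value         REAL
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def get_or_create(db, table, name, **extra):
    """Return the id of the row called `name`, inserting it if missing.

    `table` and the `extra` column names are trusted constants from this
    module, never user input, so interpolating them into SQL is acceptable.
    """
    row = db.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    columns = ["name", *extra]
    marks = ", ".join("?" for _ in columns)
    cur = db.execute(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({marks})",
        (name, *extra.values()),
    )
    return cur.lastrowid


def record(db, platform, kind, instrument, parameter, units,
           lon, lat, depth_m, observed_at, value):
    """Store one observed value, creating any entities it refers to."""
    platform_id = get_or_create(db, "platform", platform, kind=kind)
    instrument_id = get_or_create(db, "instrument", instrument, platform_id=platform_id)
    parameter_id = get_or_create(db, "parameter", parameter, units=units)
    location_id = db.execute(
        "INSERT INTO location (lon, lat, depth_m) VALUES (?, ?, ?)",
        (lon, lat, depth_m),
    ).lastrowid
    db.execute(
        "INSERT INTO observation (instrument_id, parameter_id, location_id,"
        " observed_at, value) VALUES (?, ?, ?, ?, ?)",
        (instrument_id, parameter_id, location_id, observed_at, value),
    )


def values_for(db, parameter):
    """Return (platform name, time, value) rows for one parameter."""
    return db.execute(
        "SELECT p.name, o.observed_at, o.value FROM observation o"
        " JOIN parameter q ON q.id = o.parameter_id"
        " JOIN instrument i ON i.id = o.instrument_id"
        " JOIN platform p ON p.id = i.platform_id"
        " WHERE q.name = ? ORDER BY o.observed_at",
        (parameter,),
    ).fetchall()
```

Recording a salinity value from a profiling float after a temperature value from a moored buoy needs no schema change in this sketch; the new platform type and parameter simply become new rows.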

IN52A-07

Modernized Techniques for Dealing with Quality Data and Derived Products

* Neiswender, C cneiswender@ucsd.edu, Scripps Institution of Oceanography, 8635 Discovery Way, RH315, La Jolla, CA 92037, United States
Miller, S P spmiller@ucsd.edu, Scripps Institution of Oceanography, 8635 Discovery Way, RH315, La Jolla, CA 92037, United States
Clark, D dclark@ucsd.edu, Scripps Institution of Oceanography, 8635 Discovery Way, RH315, La Jolla, CA 92037, United States

"I just want a picture of the ocean floor in this area" is expressed all too often by researchers, educators, and students in the marine geosciences. As more sophisticated systems are developed to handle data collection and processing, the demand for quality data and standardized products continues to grow. Data management is an invisible bridge between science and researchers/educators. The SIOExplorer digital library presents more than 50 years of ocean-going research. Prior to publication, all data is checked for quality using standardized criteria developed for each data stream. Despite the evolution of data formats and processing systems, SIOExplorer continues to present derived products in well-established formats. Standardized products are published for each cruise, and include a cruise report, MGD77 merged data, multi-beam flipbook, and underway profiles. Creation of these products is made possible by processing scripts, which continue to change with ever-evolving data formats. We continue to explore the potential of database-enabled creation of standardized products, such as the metadata-rich MGD77 header file. Database-enabled, automated processing produces standards-compliant metadata for each data and derived product. Metadata facilitates discovery and interpretation of published products. This descriptive information is stored both in an ASCII file and in a searchable digital library database. SIOExplorer's underlying technology allows focused search and retrieval of data and products. For example, users can initiate a search of only multi-beam data, which includes data-specific parameters. This customization is made possible with a synthesis of database, XML, and PHP technology. The combination of standardized products and digital library technology puts quality data and derived products in the hands of scientists. Interoperable systems enable distribution of these published resources using technology such as web services.
By developing modernized strategies to deal with data, Scripps Institution of Oceanography is able to produce and distribute well-formed, quality-tested derived products, which aid research, understanding, and education.

http://gdc.ucsd.edu/

IN52A-08

Quality Assurance System for Earth Science Data and Information

* Koziana, J V james.v.koziana@saic.com, Science Applications International Corp (SAIC), One Enterprise Parkway Suite 310, Hampton, VA 23666, United States
Olson, J john.o.olson@saic.com, Science Applications International Corp (SAIC), One Enterprise Parkway Suite 310, Hampton, VA 23666, United States
Lu, W weiwei.w.lu@saic.com, Science Applications International Corp (SAIC), One Enterprise Parkway Suite 310, Hampton, VA 23666, United States
Anselmo, T M troy.m.anselmo@saic.com, Science Applications International Corp (SAIC), One Enterprise Parkway Suite 310, Hampton, VA 23666, United States
Ramsayer, D B douglas.b.ramsayer@saic.com, Science Applications International Corp (SAIC), One Enterprise Parkway Suite 310, Hampton, VA 23666, United States

The US Integrated Ocean Observing System (IOOS) vision for the observing systems will bring a wide variety of real-time data from a distributed sensor network. Data quality assurance is the foundation that allows the earth science data and information to be used to create environmental data records and climate data records. We have developed a scalable, modular, automated data quality assurance system that can be used by a single data provider or a large data center to characterize the relative quality of various data sets for various collections, platforms and standards. Furthermore, the system is easily configurable to work with different observations and model outputs and requires only minimal code changes to accommodate the addition of quality assurance algorithms and data products. A major component of the system is an Algorithm Library that was implemented using a modular architecture that encapsulates algorithms into decoupled, reusable modules while providing the mechanism for assembling them into a working system. We are continuing to build and enhance the library from simple rate and limit checks to more sophisticated quality assurance methods, ranging from same-sensor inter-comparisons to comparisons with models. This paper will present an overview of the architecture of the quality assurance system and the application of the Algorithm Library.
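
One way to picture an algorithm library of decoupled, configurable checks is the Python sketch below. It is an assumption for illustration, not the authors' system: the registry, the function names, and the thresholds are hypothetical, and only the two simplest checks mentioned in the abstract (limit and rate checks) are shown.

```python
# Hypothetical registry: maps a check name to a factory that builds the check.
ALGORITHM_LIBRARY = {}


def register(name):
    """Decorator that adds a check factory to the library under `name`."""
    def decorator(factory):
        ALGORITHM_LIBRARY[name] = factory
        return factory
    return decorator


@register("limit")
def limit_check(low, high):
    """Pass values that lie within [low, high]."""
    def check(values):
        return [low <= v <= high for v in values]
    return check


@register("rate")
def rate_check(max_step):
    """Pass values that differ from the previous sample by at most max_step."""
    def check(values):
        if not values:
            return []
        flags = [True]  # the first sample has no predecessor to compare with
        for previous, current in zip(values, values[1:]):
            flags.append(abs(current - previous) <= max_step)
        return flags
    return check


def run_checks(values, config):
    """Assemble checks from `config` and return one pass/fail flag per value.

    `config` is a list of (check name, keyword arguments) pairs, so adding
    or tuning a check is a configuration change, not a code change.
    """
    checks = [ALGORITHM_LIBRARY[name](**kwargs) for name, kwargs in config]
    if not checks:
        return [True] * len(values)
    results = [check(values) for check in checks]
    return [all(flags) for flags in zip(*results)]


if __name__ == "__main__":
    # Hypothetical sea temperature series (degrees C) with one spike
    # and one out-of-range value.
    series = [10.0, 10.5, 25.0, 11.0, -5.0]
    config = [("limit", {"low": 0.0, "high": 40.0}), ("rate", {"max_step": 5.0})]
    print(run_checks(series, config))
```

Note that this simple rate check also flags the sample that follows a spike, since the jump back is as large as the jump out; a production check would likely handle that case differently.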

IN52A-09

ERDDAP - An Easier Way for Diverse Clients to Access Scientific Data From Diverse Sources

Mendelssohn, R roy.mendelssohn@noaa.gov, NOAA Environmental Research Division, 1352 Lighthouse Ave, Pacific Grove, CA 93950, United States
* Simons, R A bob.simons@noaa.gov, University of Hawaii at Manoa, JIMAR, 1000 Pope Road Marine Sciences Bldg, Rm 312, Honolulu, HI 96822, United States
* Simons, R A bob.simons@noaa.gov, NOAA Environmental Research Division, 1352 Lighthouse Ave, Pacific Grove, CA 93950, United States

ERDDAP is a new open-source, web-based service that aggregates data from other web services: OPeNDAP grid servers (THREDDS), OPeNDAP sequence servers (Dapper), NOS SOAP service, SOS (IOOS, OOStethys), microWFS, DiGIR (OBIS, BMDE). Regardless of the data source, ERDDAP makes all datasets available to clients via standard (and enhanced) DAP requests and makes some datasets accessible via WMS. A client's request also specifies the desired format for the results, e.g., .asc, .csv, .das, .dds, .dods, htmlTable, XHTML, .mat, netCDF, .kml, .png, or .pdf (formats more directly useful to clients). ERDDAP interprets a client request, requests the data from the data source (in the appropriate way), reformats the data source's response, and sends the result to the client. Thus ERDDAP makes data from diverse sources available to diverse clients via standardized interfaces. Clients don't have to install libraries to get data from ERDDAP because ERDDAP is RESTful and resource-oriented: a URL completely defines a data request and the URL can be used in any application that can send a URL and receive a file. This also makes it easy to use ERDDAP in mashups with other web services. ERDDAP could be extended to support other protocols. ERDDAP's hub and spoke architecture simplifies adding support for new types of data sources and new types of clients. ERDDAP includes metadata management support, catalog services, and services to make graphs and maps.

http://coastwatch.pfel.noaa.gov/erddap
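
Since the abstract stresses that a URL completely defines a data request, a short Python sketch of building such a URL may help. It assumes the tabular request pattern `base/tabledap/datasetID.fileType?variables&constraints` as it is commonly documented for ERDDAP; the abstract does not spell this pattern out, and the dataset ID and variable names below are hypothetical.

```python
from urllib.parse import quote


def tabledap_url(base, dataset_id, file_type, variables, constraints=()):
    """Build a tabular data request URL.

    file_type includes the leading dot, e.g. ".csv".
    constraints are strings such as "time>=2008-01-01T00:00:00Z";
    characters such as ">" and ":" are percent-encoded, "=" is kept.
    """
    query = ",".join(variables)
    for constraint in constraints:
        query += "&" + quote(constraint, safe="=")
    return f"{base}/tabledap/{dataset_id}{file_type}?{query}"


if __name__ == "__main__":
    # The base URL is the one given with the abstract; the dataset ID and
    # variable names are made up for illustration.
    print(tabledap_url(
        "http://coastwatch.pfel.noaa.gov/erddap",
        "exampleBuoys",
        ".csv",
        ["time", "latitude", "longitude", "sst"],
        ["time>=2008-01-01T00:00:00Z"],
    ))
```

Switching the response format is then a matter of changing the file type suffix, which is why any tool that can fetch a URL can act as a client.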