IN52A-01 INVITED
Seeking the Path to Metadata Nirvana
Scientists have always found reusing other scientists' data challenging. Computers did not fundamentally
change the problem, but enabled more and larger instances of it. In fact, by removing human mediation and
time delays from the data sharing process, computers emphasize the contextual information that must be
exchanged in order to exchange and reuse data. This requirement for contextual information has two faces:
"interoperability" when talking about systems, and "the metadata problem" when talking about data.
As much as any single organization, the Marine Metadata Interoperability (MMI) project has been tagged with
the mission "Solve the metadata problem." Of course, if that goal is achieved, then sustained, interoperable
data systems for interdisciplinary observing networks can be easily built -- pesky metadata differences, like
which protocol to use for data exchange, or what the data actually measures, will be a thing of the past. Alas,
as you might imagine, there will always be complexities and incompatibilities that are not addressed, and
data systems that are not interoperable, even within a science discipline.
So should we throw up our hands and surrender to the inevitable? Not at all. Rather, we try to minimize
metadata problems as much as we can. In this we increasingly progress, despite natural forces that pull in
the other direction. Computer systems let us work with more complexity, build community knowledge and
collaborations, and preserve and publish our progress and (dis-)agreements. Funding organizations,
science communities, and technologists see the importance of interoperable systems and metadata, and direct
resources toward them. With the new approaches and resources, projects like IPY and MMI can
simultaneously define, display, and promote effective strategies for sustainable, interoperable data systems.
This presentation will outline the role metadata plays in durable interoperable data systems, for better or
worse. It will describe times when "just choosing a standard" can work, and when it probably won't work. And
it will point out signs that suggest a metadata storm is coming to your community project, and how you might
avoid it.
From these lessons we will seek a path to producing interoperable, interdisciplinary, metadata-enlightened
environmental observing systems.
http://marinemetadata.org/2008agumetadatanirvana
IN52A-02 INVITED
25 Years of Controlled Vocabularies in Oceanographic Data Management
In the 1980s data managers in IOC realised that data exchange required a common terminology for concepts
such as parameters, instruments and platforms. They rose to the challenge by developing a set of 7
controlled vocabularies that were published in print as part of the GF3 standard in 1987. Unfortunately,
because this was based on print, the vocabularies couldn't be maintained and were little used. However, in
the 1990s the pan-European SeaSearch project developed them into a usable digital vocabulary library.
Whilst this was a significant step forward, vocabulary content governance was delegated to individuals,
technical governance procedures were far from watertight and the terms had no definitions. Consequently,
usage problems, especially local copy evolution and term misunderstandings, persisted. When SeaDataNet
started in 2006 there was a determination to use technology to solve these problems. Now, list server
content governance harnesses domain expertise, relational database technology provides robust, scalable,
versioned storage and the NERC DataGrid SOAP and pseudo-RESTful Web Service APIs serve lists and
mappings.
The SeaDataNet vocabulary technology currently addresses two use cases: semantic cross-walking and
metadata field content verification. Basic metadata cross-walks do not transfer fields populated from different
vocabularies, thereby losing information, because machine-accessible translations are unavailable. In
SeaDataNet we have assembled mappings between lists of interest, particularly parameter vocabularies, into
an RDF triple store. The resulting ontology is served as RDF documents through API method calls and term
URLs.
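
The cross-walking use case can be illustrated with a minimal Python sketch. It is not the SeaDataNet implementation: the term identifiers, the relation labels, and the `translate` and `crosswalk` helpers are all hypothetical, and a plain list of tuples stands in for the RDF triple store.

```python
# Hypothetical term identifiers and mappings, invented for illustration only.
# Each triple is (subject term, relation, object term), as in a triple store.
MAPPINGS = [
    ("listA:TEMP", "exactMatch", "listB:sea_water_temperature"),
    ("listA:PSAL", "exactMatch", "listB:sea_water_salinity"),
    ("listA:CHLA", "broadMatch", "listB:pigment_concentration"),
]


def translate(term, target_list, mappings=MAPPINGS):
    """Return (relation, target term) pairs mapping `term` into `target_list`."""
    prefix = target_list + ":"
    return [(rel, obj) for subj, rel, obj in mappings
            if subj == term and obj.startswith(prefix)]


def crosswalk(record, target_list):
    """Translate each field value; fields with no mapping become None."""
    out = {}
    for field, term in record.items():
        matches = translate(term, target_list)
        out[field] = matches[0][1] if matches else None
    return out
```

A field whose term has no mapping comes back as `None`, which is the information loss the abstract attributes to cross-walks that lack machine-accessible translations.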
Metadata content verification is implemented by embedding URNs into documents defined by Schematron-
extended schemas that are automatically built by a service monitoring the vocabulary server for content
changes. These schemas allow generic XML editors, such as Oxygen, to validate document semantic
content against the latest vocabulary version.
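
As a rough illustration of content verification, the sketch below checks that every URN embedded in an XML document belongs to the current vocabulary snapshot. It is a plain-Python stand-in, not Schematron. The URN syntax, element and attribute names, and the `VOCAB` structure are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary snapshot: a list version plus the URNs it contains.
VOCAB = {
    "version": 3,
    "terms": {"urn:example:params:TEMP", "urn:example:params:PSAL"},
}

# Hypothetical metadata document; the second URN is not in the vocabulary.
DOC = """<record>
  <parameter urn="urn:example:params:TEMP"/>
  <parameter urn="urn:example:params:SALT"/>
</record>"""


def invalid_urns(xml_text, vocab=VOCAB, tag="parameter", attr="urn"):
    """Return the URNs in the document that are absent from the vocabulary."""
    root = ET.fromstring(xml_text)
    return [el.get(attr) for el in root.iter(tag)
            if el.get(attr) not in vocab["terms"]]
```

Rebuilding the `VOCAB` snapshot whenever the server content changes would play the role the abstract assigns to the automatically regenerated schemas.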
The NERC DataGrid/SeaDataNet Vocabulary Server is fully operational delivering over 100 lists containing
over 120,000 terms linked by nearly 80,000 mappings. It is receiving approximately 400 catalogue requests
and 3000 list accesses per month accompanied by over 200,000 hits from robots mining the semantic
content.
http://www.bodc.ac.uk/products/web_services/vocab/
IN52A-03
Use of Ontologies in Support of GEOSS Interoperability
GEOSS is an international effort to provide global data sharing for societal benefit via interoperability. This is accomplished through registered services and standards, along with associated metadata. The Standards Registry is a component of the core architecture of GEOSS, and there is an effort underway to develop an ontology of standards for Earth observations in support of interoperability. This presentation will report on the tools being used in developing the ontology, any associated mappings, the progress that has been made, and steps to take for the future.
IN52A-04
Integrating Distributed Physical and Biological Marine data using OGC Web Services
Earth scientists use highly diverse sources of data, including in-situ measurements, remotely-sensed information and the results of numerical simulations. The ability to access, visualize, combine and compare these datasets is at the core of scientific investigation, but such tasks have hitherto been very difficult or impossible due to a fundamental lack of harmonization of data products. As a result, much valuable data remains underused. We present a web portal that visualizes and compares physical and biological marine data from both numerical models and in-situ observations. The model data are obtained via an Open Geospatial Consortium (OGC)-compatible Web Map Service (WMS), and the observed data are obtained via an OGC Web Feature Service (WFS). The physical model WMS, the biological model WMS and the WFS are located at three different institutes. This ability to display in-situ point observations alongside model data facilitates much valuable work on model validation. As models become increasingly complex, and sources of observed data become more numerous, it is important to be able to access and compare this growing amount of data efficiently, to ensure cross-checking and consistency between models and observations. The web portal is being applied in a large European operational oceanography project (ECOOP), where it is used to provide support to ecosystem modellers, and specifically to aid detection of potentially harmful algal blooms in coastal areas. The development of this system has been enabled by the conceptual framework of the Climate Science Modelling Language (CSML), which provides a common view onto all these datasets, independent of their storage format or physical location. CSML is based upon emerging international standards, enabling interoperability with other standards-based infrastructures. By creating a reusable Java library that embodies the CSML concepts we are able to apply these techniques to a number of other projects.
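
A minimal sketch of how a portal client might address the two kinds of service follows. It builds key-value request URLs using the standard WMS 1.1.1 GetMap and WFS 1.0.0 GetFeature parameter names. The host names, layer names, and feature type names used with it are hypothetical, and the abstract does not state which protocol versions the portal uses.

```python
from urllib.parse import urlencode


def _bbox(bbox):
    """Format (minx, miny, maxx, maxy) as the comma-separated BBOX value."""
    return ",".join(str(v) for v in bbox)


def wms_getmap_url(base, layer, bbox, width=512, height=512):
    """Build a WMS 1.1.1 GetMap URL for one model layer rendered as PNG."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": _bbox(bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base + "?" + urlencode(params)


def wfs_getfeature_url(base, type_name, bbox):
    """Build a WFS 1.0.0 GetFeature URL for in-situ observations in a box."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.0.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "BBOX": _bbox(bbox),
    }
    return base + "?" + urlencode(params)
```

Using the same bounding box for both calls is what lets point observations be drawn over the model image for the same region.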
IN52A-05
Moving Toward Climate Data Integration: The Observing System Monitoring Center
Understanding climate variability requires the development, maintenance and evaluation of a sustained global climate observing system. The purpose of the Observing System Monitoring Center (OSMC), which is being funded by the National Oceanic and Atmospheric Administration's (NOAA) Office of Climate Observation (OCO), is to provide a tool that will assist managers and scientists with monitoring the performance of the global in-situ ocean observing system, identifying and correcting problems, and evaluating the adequacy of the observations in support of ocean/climate state estimation, forecasting and research. Initially, the sole source of ocean in-situ data added to the OSMC database was the subset of data distributed daily via the Global Telecommunications System (GTS). However, it has become clear that in order to maintain a complete record of the observations going into the climate data record, it is necessary to include observations that are collected but not distributed via the GTS. The challenges of integrating such data into the OSMC parallel the challenges of integrating climate data for general use and discovery by those who would like to utilize the observations. In this presentation, we describe our approach to integrating climate observations through the OSMC. The areas of integration under the OSMC include real-time data input from the GTS for the management of in-situ platforms, integration of climate platform archives from Data Access Centers (DACs), and integration of climate products and data. The OSMC hopes to significantly advance climate services by making the data available through web services such as OPeNDAP and the Sensor Observation Service (SOS).
IN52A-06
Extensible Database Designs for Marine Observations
The NOAA Observing System Monitoring Center (OSMC) has been created to assist NOAA and others with managing in-situ ocean observing system resources. The OSMC will ultimately collect and store observations and observation metadata for all in-situ ocean observing platform programs that NOAA contributes to. These data are stored in a spatial relational database management system. The number and diversity of the platforms and parameters included in OSMC is expected to grow with time. An extensible database design is required to accommodate that expansion without requiring changes to the design for each new observing system. This has been accomplished using a design that includes platforms, instruments, locations, observed parameters and observed values as separate entities. This design can support a variety of text reports as well as several different interactive mapping interfaces. We will describe the design and show how it supports integrated access to a wide variety of observations as well as addition of new observing systems to the OSMC.
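
The entity separation described above can be sketched as follows. This is an assumed, simplified schema, not the OSMC design: table and column names are hypothetical, and an in-memory SQLite database stands in for the spatial relational database, so no spatial types are used.

```python
import sqlite3

# Hypothetical schema: one table per entity named in the abstract.
SCHEMA = """
CREATE TABLE platform   (id INTEGER PRIMARY KEY, name TEXT NOT NULL, program TEXT);
CREATE TABLE instrument (id INTEGER PRIMARY KEY,
                         platform_id INTEGER NOT NULL REFERENCES platform(id),
                         name TEXT NOT NULL);
CREATE TABLE location   (id INTEGER PRIMARY KEY, lat REAL, lon REAL, depth_m REAL);
CREATE TABLE parameter  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL, units TEXT);
CREATE TABLE observation (
    id INTEGER PRIMARY KEY,
    instrument_id INTEGER NOT NULL REFERENCES instrument(id),
    location_id   INTEGER NOT NULL REFERENCES location(id),
    parameter_id  INTEGER NOT NULL REFERENCES parameter(id),
    obs_time TEXT NOT NULL,
    value REAL
);
"""


def make_db():
    """Create an in-memory database holding the sketch schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def add_parameter(con, name, units):
    """Register a new observed quantity as a row, not a new column or table."""
    cur = con.execute(
        "INSERT INTO parameter (name, units) VALUES (?, ?)", (name, units))
    return cur.lastrowid


def values_for(con, parameter_name):
    """Return (platform, time, value) rows for one parameter on any platform."""
    rows = con.execute(
        """SELECT p.name, o.obs_time, o.value
           FROM observation o
           JOIN instrument i ON i.id = o.instrument_id
           JOIN platform p ON p.id = i.platform_id
           JOIN parameter q ON q.id = o.parameter_id
           WHERE q.name = ? ORDER BY o.obs_time""", (parameter_name,))
    return rows.fetchall()
```

Because parameters and platforms are rows rather than columns, a new observing system adds data without altering the schema, and one query such as `values_for` serves every platform type.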
IN52A-07
Modernized Techniques for Dealing with Quality Data and Derived Products
"I just want a picture of the ocean floor in this area" is expressed all too often by researchers, educators, and
students in the marine geosciences. As more sophisticated systems are developed to handle data collection
and processing, the demand for quality data and standardized products continues to grow. Data
management is an invisible bridge between science and researchers/educators.
The SIOExplorer digital library presents more than 50 years of ocean-going research. Prior to publication, all
data is checked for quality using standardized criteria developed for each data stream. Despite the
evolution of data formats and processing systems, SIOExplorer continues to present derived products in well-
established formats. Standardized products are published for each cruise, and include a cruise report,
MGD77 merged data, multi-beam flipbook, and underway profiles. Creation of these products is made
possible by processing scripts, which continue to change with ever-evolving data formats. We continue to
explore the potential of database-enabled creation of standardized products, such as the metadata-rich
MGD77 header file.
Database-enabled, automated processing produces standards-compliant metadata for each data and
derived product. Metadata facilitates discovery and interpretation of published products. This descriptive
information is stored both in an ASCII file and in a searchable digital library database. SIOExplorer's underlying
technology allows focused search and retrieval of data and products. For example, users can initiate a
search of only multi-beam data, which includes data-specific parameters. This customization is made possible
with a synthesis of database, XML, and PHP technology.
The combination of standardized products and digital library technology puts quality data and derived
products in the hands of scientists. Interoperable systems enable distribution of these published resources
using technology such as web services. By developing modernized strategies to deal with data, Scripps
Institution of Oceanography is able to produce and distribute well-formed, quality-tested derived
products, which aid research, understanding, and education.
http://gdc.ucsd.edu/
IN52A-08
Quality Assurance System for Earth Science Data and Information
The US Integrated Ocean Observing System (IOOS) vision for the observing systems will bring a wide variety of real-time data from a distributed sensor network. Data quality assurance is the foundation that allows the earth science data and information to be used to create environmental data records and climate data records. We have developed a scalable, modular automated data quality assurance system that can be used by a single data provider or a large data center to characterize the relative quality of various data sets for various collections, platforms and standards. Furthermore, the system is easily configurable to work with different observations and model outputs and requires only minimal code changes to accommodate the addition of quality assurance algorithms and data products. A major component of the system is an Algorithm Library that was implemented using a modular architecture that encapsulates algorithms into decoupled, reusable modules while providing the mechanism for assembling them into a working system. We are continuing to build and enhance the library from simple rate and limit checks to more sophisticated quality assurance methods, ranging from same-sensor inter-comparisons to comparisons with models. This paper will present an overview of the architecture of the quality assurance system and the application of the Algorithm Library.
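
The idea of decoupled, reusable checks assembled by configuration can be sketched as below. This is a hypothetical illustration, not the system described: the registry, the check signatures, and the configuration format are assumptions, and only the simple limit and rate checks named in the abstract are shown.

```python
from typing import Callable, Dict, List, Sequence

# A check maps a series of values to one pass/fail flag per value.
Check = Callable[[Sequence[float]], List[bool]]

# Hypothetical registry: algorithm name -> factory that builds a check.
ALGORITHMS: Dict[str, Callable[..., Check]] = {}


def register(name):
    """Decorator that adds a check factory to the library under `name`."""
    def wrap(factory):
        ALGORITHMS[name] = factory
        return factory
    return wrap


@register("limit")
def limit_check(low, high):
    """Pass values that lie inside the closed range [low, high]."""
    return lambda values: [low <= v <= high for v in values]


@register("rate")
def rate_check(max_step):
    """Pass values that differ from the previous sample by at most max_step."""
    def check(values):
        flags = [True]  # the first sample has no predecessor to compare with
        flags += [abs(b - a) <= max_step for a, b in zip(values, values[1:])]
        return flags[:len(values)]
    return check


def run_pipeline(values, config):
    """Apply every configured check; a value passes only if all checks pass."""
    checks = [ALGORITHMS[name](**kwargs) for name, kwargs in config]
    if not checks:
        return [True] * len(values)
    results = [check(values) for check in checks]
    return [all(flags) for flags in zip(*results)]
```

Adding a new algorithm means registering one more factory; `run_pipeline` and existing configurations stay unchanged, which is the sense in which only minimal code changes are needed.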
IN52A-09
ERDDAP - An Easier Way for Diverse Clients to Access Scientific Data From Diverse Sources
ERDDAP is a new open-source, web-based service that aggregates data from other web services: OPeNDAP
grid servers (THREDDS), OPeNDAP sequence servers (Dapper), NOS SOAP service, SOS (IOOS,
OOStethys), microWFS, DiGIR (OBIS, BMDE). Regardless of the data source, ERDDAP makes all datasets
available to clients via standard (and enhanced) DAP requests and makes some datasets accessible via
WMS. A client's request also specifies the desired format for the results, e.g., .asc, .csv, .das, .dds, .dods,
htmlTable, XHTML, .mat, netCDF, .kml, .png, or .pdf (formats more directly useful to clients). ERDDAP
interprets a client request, requests the data from the data source (in the appropriate way), reformats the
data source's response, and sends the result to the client. Thus ERDDAP makes data from diverse sources
available to diverse clients via standardized interfaces. Clients don't have to install libraries to get data from
ERDDAP because ERDDAP is RESTful and resource-oriented: a URL completely defines a data request and
the URL can be used in any application that can send a URL and receive a file. This also makes it easy to
use ERDDAP in mashups with other web services. ERDDAP could be extended to support other protocols.
ERDDAP's hub and spoke architecture simplifies adding support for new types of data sources and new
types of clients. ERDDAP includes metadata management support, catalog services, and services to make
graphs and maps.
http://coastwatch.pfel.noaa.gov/erddap
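
The claim that a URL completely defines a data request can be sketched with a small URL builder. The pattern below follows the tabledap form given in ERDDAP's documentation (dataset ID, file type extension, then a query of variables and constraints). The base URL, dataset ID, and variable names in the usage line are hypothetical.

```python
from urllib.parse import quote


def tabledap_url(base, dataset_id, file_type, variables, constraints=()):
    """Build a tabular data request URL.

    file_type selects the response format and includes the dot, e.g. ".csv".
    Each constraint is a string such as "time>=2008-01-01"; characters like
    ">" are percent-encoded so the URL is safe to send from any client.
    """
    query = ",".join(variables)
    for constraint in constraints:
        query += "&" + quote(constraint, safe="=")
    return f"{base}/tabledap/{dataset_id}{file_type}?{query}"


# Hypothetical server, dataset, and variable names.
url = tabledap_url("http://example.org/erddap", "exampleBuoys", ".csv",
                   ["time", "sea_temp"], ["time>=2008-01-01"])
```

Changing only the `file_type` argument requests the same data in a different format, so any tool that can fetch a URL can act as a client.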