IN31D-01 INVITED
Before you make the data interoperable you have to make the people interoperable
In February 2006 a deceptively simple concept was put forward. Could we use the International Year of Planet Earth 2008 as a stimulus to begin the creation of a digital geological map of the planet at a target scale of 1:1 million? Could we design and initiate a project that uniquely mobilises geological surveys around the world to act as the drivers and sustainable data providers of this global dataset? Further, could we synergistically use this geoscientist-friendly vehicle of creating a tangible geological map to accelerate progress of an emerging global geoscience data model and interchange standard? Finally, could we use the project to transfer know-how to developing countries and reduce the length and expense of their learning curve, while at the same time producing geoscience maps and data that could attract interest and investment? These aspirations, plus the chance to generate a global digital geological dataset to assist in the understanding of global environmental problems and the opportunity to raise the profile of geoscience as part of IYPE, seemed more than enough reasons to take the proposal to the next stage. In March 2007, in Brighton, UK, 81 delegates from 43 countries gathered together to consider the creation of this global interoperable geological map dataset. The participants unanimously agreed the Brighton "Accord" and kicked off "OneGeology", an initiative that now has the support of more than 85 nations. Brighton was never designed to be a scientific or technical meeting: it was overtly about people and their interaction. Would these delegates, with their diverse cultural and technical backgrounds, be prepared to work together to achieve something which, while technically challenging, was not complex in the context of leading-edge geoscience informatics? Could we scale up what is a simple informatics model at national level, to deliver global coverage and access?
The major challenges for OneGeology (and the deployment of interoperability) are rarely scientific or technical; they were and are the significantly more difficult logistical and "geopolitical - cultural" issues. OneGeology has grown and progressed rapidly to be an international project. It has not only achieved its first-phase scientific and technical goals in launching its web map portal with map data from 30 nations at the International Geological Congress in August 2008, but has also attracted substantial scientific, public and media interest around the world. OneGeology is, in every sense, a child of its time - an agile Internet paradigm - a project whose informatics interoperability goals are in reality the total project ethos. The project has been allowed to grow and extend just as fast and as wide as its actors agree to take it, for the most part free from the territoriality and bureaucracy that all too often inhibit such initiatives. It is beyond doubt that a conventionally run (and thus constrained) OneGeology would not have achieved its goals. The OneGeology team has taken enormous strides in a very short space of time and the achievements are considerable. But some new challenges now arise. How will we sustain the project? Where do we take it next? Can OneGeology continue its "liberal" modus operandi? How should we fund and provide continuity for a growing and thus more demanding infrastructure and user base? Should we expand the portal to include map data from academia, commerce and the public (and, if so, how do we maintain authentication)? How fast do we increase the sophistication of the informatics and the resolution and diversity of the data? The presentation will describe OneGeology, its current status and the technical and cultural issues involved in trying to move forward interoperability on a global scale.
IN31D-02
Geosciences Information Network (GIN): A Distributed, Interoperable Data Network for the Geosciences
A coalition of the state geological surveys (AASG), the U.S. Geological Survey (USGS), and other partners will receive NSF funding over the next 3 years under the INTEROP solicitation to start building a distributed, interoperable data network that will make thousands of databases from the geological surveys and their partners available, searchable, and interoperable. This Geosciences Information Network (GIN) will focus on both spatial and analytical geologic data collected across the country over the past 150 years. Key components of the proposed network include: 1) catalog systems for data discovery; 2) service definitions that define interfaces for searching catalogs and accessing resources; 3) shared interchange formats to encode information for transmission; 4) data providers that publish information using standardized services defined by the network; and 5) client applications enabled to utilize information resources provided by the network. The GIN will integrate and utilize catalog resources that currently exist or are in development. We are working closely with the USGS National Geologic Map Database and its existing map catalog; with the USGS National Geological and Geophysical Data Preservation project, which is developing a metadata catalog for geoscience information resource discovery; and with the GEON catalog. Existing and emerging extensible mark-up languages such as GeoSciML, ChemML, and Open Geospatial Consortium sensor, observation and measurement MLs will provide the necessary interchange formats. Client application development will be fostered by collaboration with industry partners such as ESRI, whose Geology Data Model for ArcGIS software is being designed to be compatible with GIN. The GIN project will focus on development of the remaining aspects of the system, including service definitions, technical assistance to data providers to implement the services and bring content online, and system integration.
The Geosciences Information Network project will be managed by the Arizona Geological Survey on behalf of the Association of American State Geologists (AASG) in partnership with the USGS. Other collaborations include the OneGeology-Europe (www.onegeology.org) consortium of 27 nations that is building a similar network under the EU INSPIRE initiative, GEON (www.geongrid.org), and Earthchem (www.earthchem.org). OneGeology-Europe and GIN have agreed to integrate their networks and work towards the goal of developing global standards among geological surveys.
IN31D-03
GeosciNET: Building a Global Geoinformatics Partnership
GeosciNET is a collaboration of several existing geoinformatics efforts organized to provide a more effective data system for geoscience projects. Current members are: CoreWall (www.corewall.org), Geoinformatics for Geochemistry (GfG; www.geoinfogeochem.org), System for Earth Sample Registration (SESAR; www.geosamples.org), GeoStrat SYS (www.geostratsys.org; formerly PaleoStrat, www.paleostrat.org), and the International Continental Drilling Program (ICDP; www.icdp-online.org). GeosciNET's basic goal is to advance coordination, complementarity, and interoperability, and minimize duplication of efforts among the involved partner systems in order to streamline the development and operation of geoinformatics efforts. We believe that by advancing the development and data holdings of its member groups, the overall value of each site will be significantly enhanced and better meet the needs of the users. With the existing membership, GeosciNET can offer a comprehensive, integrated system for data acquisition, dissemination, archiving, visualization, integration, and analysis. The system will enable a single researcher or a group of collaborators to keep track of, visualize, and digitally archive any type of sample- or stratigraphic-based data produced from drill holes, dredges, measured stratigraphic sections, the field, or the laboratory. The challenge is to build a linked system that provides users with a library of research data as well as tools to input, discover, access, integrate, manipulate, analyze, and model interdisciplinary data, all without corrupting the original data and while ensuring that the data are attributed to the originator at all times. Science runs on data, but despite the importance of data (legacy or otherwise), there are currently few convenient mechanisms that enable users to easily input their data into databases.
While some efforts such as GfG databases, PetDB and SedDB have worked hard to compile such data, only users' active participation can capture the major part of critical legacy data, and ensure that new data enter the digital stream as they are generated. GeosciNET wants to lower the barriers so users can take advantage of geoinformatics resources and embrace its promise as the platform for doing the science of the future. Once these benefits are understood by the user community, the obstacles that currently exist in building a larger geoinformatics system will start to erode. User participation requires the proper tools, such as translators that can recognize tags and parse the data accordingly, and incentives such as tools for visualization, synthesis and analysis, and digital collaboration space. A major focus for GeosciNET is to support individual researchers and projects that do not have their own dedicated data management and education and outreach programs. One of the greatest challenges for geoinformatics lies in being perceived as a friendly resource by its users where they can easily link their observations and analyses and integrate them with other data. GeosciNET will be experimenting with mechanisms to accomplish these goals.
IN31D-04
Full and Open Access to Data in the Global Earth Observing System of Systems (GEOSS): Implementing the GEOSS Data Sharing Principles
Full and open access to data from remote sensing platforms and other sources can facilitate not only
scientific research but also the more widespread and effective use of scientific data for the benefit of society.
The Global Earth Observing System of Systems (GEOSS) is a major international initiative of the Group on
Earth Observations (GEO) to develop "coordinated, comprehensive and sustained Earth observations and
information." In 2005, GEO adopted the GEOSS Data Sharing Principles, which call for the "full and open
exchange of data, metadata, and products shared within GEOSS, recognizing relevant international
instruments and national policies and legislation." These Principles also note that "All shared data, metadata,
and products will be made available with minimum time delay and at minimum cost" and that "All shared data,
metadata, and products being free of charge or no more than cost of reproduction will be encouraged for
research and education." GEOSS Task DA-06-01, aimed at developing a set of recommended
implementation guidelines for the Principles, was established in 2006 under the leadership of CODATA, the
Committee on Data for Science and Technology of the International Council for Science (ICSU). An
international team of authors has developed a draft White Paper on the GEOSS Data Sharing Principles and
a proposed set of implementation guidelines. These have been carefully reviewed by independent reviewers,
various GEO Committees, and GEO National Members and Participating Organizations. It is expected that
the proposed implementation guidelines will be discussed at the GEO-V Plenary in Budapest in November
2008.
The current version of the proposed implementation guidelines recognizes the importance of good faith,
voluntary adherence to the Principles by GEO National Members and Participating Organizations. It
underscores the value of reuse and re-dissemination of GEOSS data with minimum restrictions, not only
within GEOSS itself but on the part of GEOSS users. Consistency with relevant international instruments and
applicable policies and legislation is essential, and therefore clarification and coordination of applicable
policies and procedures are needed. Pricing of GEOSS data, metadata, and products should be based on
the premise that the data and information within GEOSS are a public good for public-interest use in the nine
societal benefit areas. Time delays for data access from both operational and research systems should be
kept to a minimum, reflecting the norms of the relevant scientific communities or data processing centers.
The proposed guidelines also emphasize the need to better define research and education uses and to
develop and collect usage metrics and indicators. The draft White Paper provides a more detailed review of
past and current data policies related to space-based and spatial data, assesses the implications of the Data
Sharing Principles for selected case studies, and discusses a number of other important implementation
issues. Successful implementation of the GEOSS Data Sharing Principles is likely to be a critical element in
the future effectiveness and value of GEOSS.
http://www.codata.org/GEOSS/index.html
IN31D-05
GeoSciML 2: Enabling Enhanced Geologic Information Interoperability
Interchange and mark-up languages such as the Geography Markup Language (GML) provide standard structures for transferring geospatial information within cyber-based infrastructures. In 2006 the CGI-IUGS Interoperability Working Group (IWG) released GeoSciML 1 as an application of GML for basic geologic information. Since then, further testing and use case analysis have resulted in enhancements to the design of GeoSciML, which have increased both the depth and breadth of the representation of geologic units, earth materials, structures, and associated vocabularies. After careful testing of these enhancements in a recent implementation testbed, the IWG is officially releasing GeoSciML 2. The release includes a GeoSciML schema representation in UML and XML formats, text descriptions of schema components, and example data files. This paper will describe GeoSciML 2, focussing on significant changes and practical implementations, including its utilization in geospatial standard technologies such as those being deployed by emerging geoscience information networks. As a result of these advancements GeoSciML 2 is poised to become a key vehicle for the delivery of basic geologic information within such networks.
IN31D-06
QuakeML: Recent Development and First Applications of the Community-Created Seismological Data Exchange Standard
QuakeML is an XML-based exchange format for seismological data which is being developed using a community-driven approach. It covers basic event description, including picks, arrivals, amplitudes, magnitudes, origins, focal mechanisms, and moment tensors. Contributions have been made from ETH, GFZ, USC, SCEC, USGS, IRIS DMC, EMSC, ORFEUS, GNS, ZAMG, BRGM, and ISTI. The current release (Version 1.1, Proposed Recommendation) reflects the results of a public Request for Comments process which has been documented online at http://quakeml.org/RFC_BED_1.0. QuakeML has recently been adopted as a distribution format for earthquake catalogs by GNS Science, New Zealand, and the European-Mediterranean Seismological Centre (EMSC). These institutions provide prototype QuakeML web services. Furthermore, integration of the QuakeML data model in the CSEP (Collaboratory for the Study of Earthquake Predictability, http://www.cseptesting.org) testing center software developed by SCEC is under way. QuakePy is a Python-based seismicity analysis toolkit which is based on the QuakeML data model. Recently, QuakePy has been used to implement the PMC method for calculating network recording completeness (Schorlemmer and Woessner 2008, in press). Completeness results for seismic networks in Southern California and Japan can be retrieved through the CompletenessWeb (http://completenessweb.org). Future QuakeML development will include an extension for macroseismic information. Furthermore, development on seismic inventory information, resource identifiers, and resource metadata is under way. Online resources: http://www.quakeml.org, http://www.quakepy.org
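As a rough illustration of how an XML-based event description of this kind might be consumed, the sketch below parses a small, hand-written document with Python's standard library. The element names, the nesting, and all values are simplified assumptions made up for illustration; they are not taken from the QuakeML schema, and a real QuakeML document is namespaced and considerably richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified event document with made-up values.
# This is NOT schema-valid QuakeML; it only mimics the general idea.
SAMPLE = """
<eventParameters>
  <event id="ev1">
    <origin>
      <time>2000-01-01T00:00:00Z</time>
      <latitude>10.0</latitude>
      <longitude>20.0</longitude>
      <depth>5000</depth>
    </origin>
    <magnitude>
      <mag>4.5</mag>
      <type>ML</type>
    </magnitude>
  </event>
</eventParameters>
"""


def parse_events(xml_text):
    """Return a list of dicts, one per <event> element."""
    root = ET.fromstring(xml_text.strip())
    events = []
    for ev in root.findall("event"):
        origin = ev.find("origin")
        magnitude = ev.find("magnitude")
        events.append({
            "id": ev.get("id"),
            "time": origin.findtext("time"),
            "latitude": float(origin.findtext("latitude")),
            "longitude": float(origin.findtext("longitude")),
            "depth_m": float(origin.findtext("depth")),
            "magnitude": float(magnitude.findtext("mag")),
            "magnitude_type": magnitude.findtext("type"),
        })
    return events


events = parse_events(SAMPLE)
```

Keeping the parser's output as plain dictionaries is only one possible design; a toolkit built on the data model would more likely expose typed objects.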
IN31D-07
Design Drivers of Water Data Services
The CUAHSI Hydrologic Information System (HIS) is being developed as a geographically distributed network
of hydrologic data sources and functions that are integrated using web services so that they function as a
connected whole. The core of the HIS service-oriented architecture is a collection of water web services,
which provide uniform access to multiple repositories of observation data. These services use SOAP
protocols communicating WaterML (Water Markup Language). When a client makes a data or metadata
request using a CUAHSI HIS web service, the request is made in a standard manner, following the CUAHSI
HIS web service signatures, regardless of how the underlying data source may be organized. Also,
regardless of the format in which the data are returned by the source, the web services respond to requests
by returning the data in a standard format of WaterML.
The goal of WaterML design has been to capture semantics of hydrologic observations discovery and
retrieval and express the point observations information model as an XML schema. To a large extent, it
follows the representation of the information model as adopted by the CUAHSI Observations Data Model
(ODM) relational design. Another driver of WaterML design is the set of specifications and metadata adopted by USGS
NWIS, EPA STORET, and other federal agencies, as it seeks to provide a common foundation for exchanging
both agency data and data collected in multiple academic projects. Another WaterML design principle was to
create, in version 1 of HIS in particular, a fairly rigid and simple XML schema which is easy to generate and
parse, thus creating the least barrier for adoption by hydrologists.
WaterML includes a series of elements that reflect common notions used in describing hydrologic
observations, such as site, variable, source, observation series, seriesCatalog, and data values. Each of the
three main request methods in the water web services - GetSiteInfo, GetVariableInfo, and GetValues – has a
corresponding response element in WaterML: SitesResponse, VariableResponse, and
TimeSeriesResponse.
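The correspondence between request methods and response elements, and the idea of a uniform interface over differently organized sources, can be sketched as a small facade. This is a minimal sketch, not the CUAHSI HIS implementation: the method and response element names come from the text above, while the `WaterService` class, the two source classes, the sample values, and the dictionary payloads are hypothetical stand-ins for SOAP calls and WaterML XML.

```python
# Request method -> response element, as named in the text above.
RESPONSE_ELEMENT = {
    "GetSiteInfo": "SitesResponse",
    "GetVariableInfo": "VariableResponse",
    "GetValues": "TimeSeriesResponse",
}


class CsvSource:
    """Hypothetical repository that stores values as 'time,value' lines."""

    def __init__(self, text):
        self.text = text

    def values(self):
        for line in self.text.strip().splitlines():
            time, value = line.split(",")
            yield time, float(value)


class RecordSource:
    """Hypothetical repository that stores values as dict records."""

    def __init__(self, records):
        self.records = records

    def values(self):
        for rec in self.records:
            yield rec["t"], float(rec["v"])


class WaterService:
    """Uniform facade: the same call works whatever the source looks like."""

    def __init__(self, source):
        self.source = source

    def get_values(self):
        # Wrap the normalized values in the response element for GetValues.
        return {
            "element": RESPONSE_ELEMENT["GetValues"],
            "values": list(self.source.values()),
        }


# Two sources with different native layouts (made-up sample values).
csv_service = WaterService(CsvSource("2008-01-01,1.5\n2008-01-02,2.0"))
rec_service = WaterService(RecordSource([{"t": "2008-01-01", "v": "1.5"},
                                         {"t": "2008-01-02", "v": "2.0"}]))
```

In this toy version both services return identical payloads, which is the property the text describes: the client sees one response shape regardless of how the underlying repository is organized.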
The WaterML specification is being adopted by federal agencies. The experimental USGS NWIS Daily Values
web service returns a WaterML-compliant TimeSeriesResponse. The National Climatic Data Center is also
prototyping WaterML for data delivery, and has developed a REST-based service that generates WaterML-
compliant output for the NCDC ASOS network. Such agency-supported web services coming online provide a
much more efficient way to deliver agency data than the web site scraper services that the CUAHSI
HIS project initially developed.
The CUAHSI water data web services will continue to serve as the main communication mechanism within
CUAHSI HIS, connecting a variety of data sources with a growing set of web service clients being developed
in both academia and the commercial sector. The driving forces for the development of web services
continue to be:
- Application experience and needs of the growing number of CUAHSI HIS users, who experiment with
additional data types, analysis modes, data browsing and searching strategies, and provide feedback to
WaterML developers;
- Data description requirements posed by various federal and state agencies;
- Harmonization with standards being adopted or developed in neighboring communities, in particular the
relevant standards being explored within the Open Geospatial Consortium.
CUAHSI WaterML is a standard output schema for CUAHSI HIS water web services. Its formal specification is
available as an OGC discussion paper at www.opengeospatial.org/standards/dp/
IN31D-08
Surviving the Transition from FGDC to ISO Metadata Standards
The NOAA Metadata Manager and Repository (NMMR) has served a well-established group of data managers at NOAA's National Data Centers for over a decade. It provides a web interface for managing FGDC-compliant metadata and publishing that metadata to several large data discovery systems (GeoSpatial One-Stop, NASA's Global Change Master Directory, the Comprehensive Large Array-data Stewardship System, and FirstGov). The Data Centers are now faced with migrating these metadata to new international metadata standards (ISO 19115, 19115-2, …). We would like to accomplish this migration while minimizing disruption to the current users and supporting significant new capabilities of the ISO standards. Our current approach involves relational ISO views on top of the existing XML database to convert FGDC content into ISO without changing the data manager interface. These views are the foundation for ISO-compliant XML metadata access via REST-like web services. Additionally, new database tables provide information required by ISO that is not included in the FGDC standard. This approach allows us to support the new standard without disrupting the current system.
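The idea of exposing existing FGDC content through an ISO-shaped view, with ISO-only information supplied from a separate table, can be illustrated in miniature. The sketch below is a toy under stated assumptions, not the NMMR design: it uses an in-memory SQLite database, and every table, column, and view name, as well as the choice of a language field as the ISO-only item, is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Existing FGDC-style content (hypothetical, simplified columns).
    CREATE TABLE fgdc_record (id INTEGER PRIMARY KEY, title TEXT, abstract TEXT);

    -- New table for information assumed to be absent from the FGDC record.
    CREATE TABLE iso_extra (id INTEGER PRIMARY KEY, language TEXT);

    -- View exposing ISO-style column names without altering fgdc_record.
    CREATE VIEW iso_view AS
        SELECT f.id,
               f.title    AS citation_title,
               f.abstract AS identification_abstract,
               x.language AS dataset_language
        FROM fgdc_record AS f
        LEFT JOIN iso_extra AS x ON x.id = f.id;
""")

# Made-up sample record; the FGDC table is written exactly as before.
conn.execute("INSERT INTO fgdc_record VALUES (1, 'Example dataset', 'Made-up abstract')")
conn.execute("INSERT INTO iso_extra VALUES (1, 'eng')")

# A consumer of the ISO view never touches the FGDC table directly.
row = conn.execute(
    "SELECT citation_title, identification_abstract, dataset_language "
    "FROM iso_view WHERE id = 1"
).fetchone()
```

Because the view is read-only and the original table is untouched, existing editing workflows would keep writing to `fgdc_record` while new services read from `iso_view`; that separation is the point of the sketch.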
IN31D-09
NASA's Earth Science Data Systems Standards Process
NASA's Standards Process Group (SPG) facilitates the approval of proposed
standards that have proven implementation and operational benefit for use in NASA's
Earth science data systems. After some initial experience in approving proposed standards, the SPG has
tailored its Standards Process, removing redundant reviews to shorten the review process.
We have found that the candidate submissions that self-defined communities propose to the SPG for
endorsement are of four types: (1) a NASA community-developed standard used within at least one
self-defined community, where the proposed standard has not been approved or adopted by an external
standards organization and where new implementations are expected to be developed from scratch, using
the proposed standard as the implementation specification; (2) a standard already approved by an external
standards organization that is being proposed for use by the NASA Earth science community; (3) a de facto
standard already widely used; or (4) a Technical Note.
We will discuss real examples of the different types of candidate standards that have been proposed and
endorsed (e.g., OPeNDAP's Data Access Protocol, Open Geospatial
Consortium's Web Map Server, and the Hierarchical Data Format). We will discuss a
potential de facto standard (NASA's Global Change Master Directory (GCMD)
Directory Interchange Format (DIF)) that is currently being reviewed.
This past year, the SPG has modified its Standards Process to provide a comprehensive but not redundant
review of the submitted RFC. The end result of the process tailoring is that the reviews will be completed
faster. At each RFC submission, the SPG will decide which reviews will be performed. These reviews are
conducted simultaneously and can include these three types: (1) a Technical review to assess the technical
specification and associated implementations; (2) an Operational Readiness review to evaluate whether the
proposed standard works in a NASA environment with NASA Earth science data and the expected volume of users; (3)
a Usefulness review to determine whether the candidate standard is useful or helpful or fits the purpose for the
users. Some submissions, particularly the de facto standards or standards already approved by other
standards organizations, will not need all three types of reviews.
As an internal advisory group, the SPG has a NASA agency centered focus. At the same time, there is
growing awareness that interagency and international standards are extremely relevant to addressing the
regional and global science and decision support applications. The Global Earth Observing System of
Systems (GEOSS) Architecture and Data Management (AMD) Standards Interoperability Forum (SIF) is
designed to encourage the use of standards in contributed components. It is clear that some of the
standards endorsed by the NASA SPG could be important contributions to the GEOSS. The GEOSS
recognized standards can also be reviewed as 'de facto'
standards by the SPG. NASA stakeholders are often also NOAA stakeholders. Members of the NASA SPG
have been working with members of the NOAA standards endorsement process to provide mutual benefit.
We will also discuss the role of NASA SPG participation in these and other cross-agency and
international standards initiatives.
http://www.esdswg.org/spg
IN31D-10 INVITED
Arctic Observing Network (AON): Enhancing Observing, Data Archiving and Data Discovery Capabilities as Arctic Environmental System Change Continues
The National Science Foundation (NSF) and the National Oceanic and Atmospheric Administration, under the auspices of the U.S. Inter-Agency Arctic Research Policy Committee, are leading the development of the Arctic Observing Network (AON) as part of the implementation of the Study of Environmental Arctic Change (SEARCH) and as a legacy of International Polar Year (IPY). As the Observing Change component of SEARCH, AON complements the Understanding Change and Responding to Change components. AON addresses the need to enhance observing capabilities in a data-sparse region where environmental system changes are among the most rapid on Earth. AON data will contribute to research into understanding the causes and consequences of Arctic environmental system change and its global connections, and to improving predictive skill. AON is also a contribution to the development of a multi-nation, pan-Arctic observing network that is being discussed at the IPY 'Sustaining Arctic Observing Networks' (SAON) workshops. Enhancing Arctic observing capabilities faces many challenges, including coordination and integration of disparate observing elements and data systems that operate according to diverse policies and practices. There is wide agreement that data systems that provide archiving and discovery services are essential and integral to AON. In recognition of this, NSF is supporting the development of CADIS (Cooperative Arctic Data and Information Service) as an AON portal for data discovery, a repository for data storage, and a platform for data analysis. NSF is also supporting ELOKA (Exchange for Local Observations and Knowledge in the Arctic), a pilot project for a data management and networking service for community-based observing that keeps control of data in the hands of data providers while still allowing for broad searches and sharing of information.
CADIS and ELOKA represent the application of cyberinfrastructure to meet AON data system needs that might also contribute to the virtual, physical and social coordination and integration that will be required to create a functioning network that will realise Arctic and global value-added services and societal benefits. Enhanced Arctic observing and data systems also need a data policy. The AON data policy is clear and the same as the SEARCH data policy: data must be fully, freely and openly available as quickly as possible after collection and quality control. That is, with few exceptions, AON data are community data and not subject to embargo. Successful implementation of this policy will require something of a cultural shift among many scientists and northern residents. Sustained, inter-operable data systems that make it easy to deposit and discover data have a role to play in achieving that cultural shift.
IN31D-11
Semantics-Based Interoperability Framework for the Geosciences
Interoperability between heterogeneous data, tools and services is required to transform data to knowledge. To meet geoscience-oriented societal challenges such as forcing of climate change induced by volcanic eruptions, we suggest the need to develop semantic interoperability for data, services, and processes. Because such scientific endeavors require integration of multiple databases associated with global enterprises, implicit semantic-based integration is impossible. Instead, explicit semantics are needed to facilitate interoperability and integration. Although different types of integration models are available (syntactic or semantic), we suggest that semantic interoperability is likely to be the most successful pathway. Clearly, the geoscience community would benefit from utilization of existing XML-based data models, such as GeoSciML, WaterML, etc., to rapidly advance semantic interoperability and integration. We recognize that such integration will require a "meanings-based search, reasoning and information brokering", which will be facilitated through inter-ontology relationships (ontologies defined for each discipline). We suggest that markup languages (MLs) and ontologies can be seen as "data integration facilitators", working at different abstraction levels. Therefore, we propose to use an ontology-based data registration and discovery approach to complement mark-up languages through semantic data enrichment. Ontologies allow the use of formal and descriptive logic statements, which permit expressive query capabilities for data integration through reasoning. We have developed domain ontologies (EPONT) to capture the concept behind data. EPONT ontologies are associated with existing ontologies such as SUMO, DOLCE and SWEET. Although significant efforts have gone into developing data (object) ontologies, we advance the idea of developing semantic frameworks for additional ontologies that deal with processes and services.
This evolutionary step will facilitate the integrative capabilities of scientists as we examine the relationships between data and external factors such as processes that may influence our understanding of "why" certain events happen. We emphasize the need to go from analysis of data to concepts related to scientific principles of thermodynamics, kinetics, heat flow, mass transfer, etc. Towards meeting these objectives, we report on a pair of related service engines, DIA (Discovery, Integration and Analysis) and SEDRE (Semantically-Enabled Data Registration Engine), that utilize ontologies for semantic interoperability and integration.
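To make the contrast between keyword matching and meaning-based discovery concrete, here is a minimal sketch of ontology-backed data registration and search. It is not DIA, SEDRE, or EPONT: the tiny concept hierarchy, the dataset names, and both functions are invented for illustration, and a real system would use an ontology language and a reasoner rather than a Python dictionary.

```python
# Hypothetical "is-a" relations: child concept -> parent concept.
SUBCLASS_OF = {
    "Basalt": "VolcanicRock",
    "Rhyolite": "VolcanicRock",
    "VolcanicRock": "IgneousRock",
    "Granite": "IgneousRock",
}

# Hypothetical datasets, each registered against one ontology concept.
REGISTRY = {
    "dataset_a": "Basalt",
    "dataset_b": "Granite",
    "dataset_c": "Sandstone",
}


def ancestors(concept):
    """Return the concept plus every concept it is transitively a kind of."""
    chain = [concept]
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def discover(query_concept):
    """Find datasets whose registered concept is the query or a subclass of it."""
    return sorted(name for name, concept in REGISTRY.items()
                  if query_concept in ancestors(concept))


igneous_hits = discover("IgneousRock")
volcanic_hits = discover("VolcanicRock")
```

A plain keyword search for "IgneousRock" would miss a dataset registered only as "Basalt"; following the subclass chain is the simplest form of the reasoning step that explicit semantics make possible.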