IN22A-01 INVITED
Next Generation Virtual Observatories
Virtual Observatories (VOs) are now being established in a variety of geoscience disciplines beyond their origins in Astronomy and Solar Physics. Implementations range from hydrology and environmental sciences to solid earth sciences. Among the goals of VOs is to provide search/query, access to and use of distributed, heterogeneous data resources. With many of these goals being met and usage increasing, new demands and requirements are arising. In particular there are two of immediate and pressing interest. The first is use of VOs by non-specialists, especially for information products that go beyond the usual data or data products sought for scientific research. The second area is citation and attribution of artifacts that are being generated by VOs. In some sense VOs are re-publishing (re-packaging, or generating new synthetic) data and information products. At present only a few VOs address this need and it is clear that a comprehensive solution that includes publishers is required. Our work in VOs and related semantic data framework and integration areas has led to a view of the next generation of virtual observatories which addresses the two above-mentioned needs as well as others that are emerging. Both of the needs highlight a semantic gap, i.e. that the meaning and use for a user or users beyond the original design intention is very often difficult or impossible to bridge. For example, VOs created for experts with complex, arcane or jargon vocabularies are not accessible to the non-specialist and further, information products the non-specialist may use are not created or considered for creation. In the second case, use of a (possibly virtual) data or information product (e.g. an image or map) as an intellectual artifact that can be accessed as part of the scientific publication and review procedure also introduces terminology gaps, as well as services that VOs may need to provide.
Our supposition is that formalized methods in semantics and semantic web technologies are ideal to meet and solve both of these semantic gaps. In this presentation we highlight both of the emerging needs, and current and emerging semantic web solutions that will enable the next generation of virtual observatories. Our work is funded under NSF/OCI and NASA/ACCESS/ESTO projects to the High Altitude Observatory at the National Center for Atmospheric Research (NCAR) and McGuinness Associates Consulting.
IN22A-02
Cyberinfrastructure for the NSF Ocean Observatories Initiative
The Ocean Observatories Initiative (OOI) is an environmental observatory covering a diversity of oceanic
environments, ranging from the coastal to the deep ocean. The physical infrastructure comprises a
combination of seafloor cables, buoys and autonomous vehicles. It is currently in the final design phase, with
construction planned to begin in mid-2010 and deployment phased over five years. The Consortium for
Ocean Leadership manages this Major Research Equipment and Facilities Construction program with
subcontracts to Scripps Institution of Oceanography, University of Washington and Woods Hole
Oceanographic Institution. High-level requirements for the cyberinfrastructure (CI) include the delivery of near-real-time data with
minimal latencies, open data, data analysis and data assimilation into models, and subsequent interactive
modification of the network (including autonomous vehicles) by the cyberinfrastructure. Network connections
include a heterogeneous combination of fiber optics, acoustic modems, and Iridium satellite telemetry. The
cyberinfrastructure design loosely couples services that exist throughout the network and share common
software and middleware as necessary. In this sense, the system appears to be identical at all scales, so it is
self-similar or fractal by design. The system gives the OOI's Education and Public Engagement program
near-real-time access to data and developed knowledge, gives the marine operators access to the physical
infrastructure, and serves the larger community including scientists, the public, schools and decision makers.
Social networking is employed to facilitate the virtual organization that builds, operates and maintains the OOI
as well as to provide a variety of interfaces to the data and knowledge generated by the program. We are
working closely with NOAA to exchange near-real-time data through interfaces to their Data Interchange
Facility (DIF) program within the Integrated Ocean Observing System (IOOS). Efficiencies have been
emphasized through the use of university and commercial computing clouds.
http://ooici.ucsd.edu/spaces
IN22A-03
Data Relationships: Towards a Conceptual Model of Scientific Data Catalogs
As the amount of data, types of processing and storage formats increase, the total number of record permutations increases dramatically. The result is an overwhelming number of records, which makes identifying the best data object to answer a user's needs more difficult. The issue is further complicated as each archive's data catalog may be designed around different concepts: anything from individual files to be served, to series of similarly generated and processed data, to something entirely different. Catalogs may not only be flat tables, but may be structured as multiple tables with each table being a different data series, or a normalized structure of the individual data files. Merging federated search results from archives with different catalog designs can create situations where the data object of interest is difficult to find due to an overwhelming number of seemingly similar or entirely unwanted records. We present a reference model for discussing data catalogs and the complex relationships between similar data objects. We show how the model can be used to improve scientists' ability to quickly identify the best data object for their purposes and discuss technical issues required to use this model in a federated system.
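The merging problem described above can be pictured as grouping records that describe the same underlying observation and then choosing one representative per group. The following Python sketch is illustrative only and is not the reference model itself: the `Record` fields, the grouping key (instrument plus observation start time), and the preference for a higher processing level are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    archive: str      # hypothetical: archive that returned the record
    instrument: str   # hypothetical: instrument that made the observation
    start_time: str   # ISO 8601 start time of the observation
    level: int        # hypothetical processing level (higher = more processed)
    fmt: str          # storage format of the data object


def group_related(records):
    """Group records that share an instrument and observation start time."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec.instrument, rec.start_time)].append(rec)
    return dict(groups)


def best_per_group(records, preferred_fmt="netcdf"):
    """Pick one record per group: highest level first, then preferred format."""
    best = {}
    for key, recs in group_related(records).items():
        best[key] = max(recs, key=lambda r: (r.level, r.fmt == preferred_fmt))
    return best
```

A federated client could present only the output of `best_per_group` and keep the remaining group members available on request, which reduces the number of near-duplicate rows a user has to scan.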
IN22A-04
Report From the Cryospheric Cyberinfrastructure: Discovery, Access, and Delivery of Data for IPY (DADDI)
The Discovery, Access, and Delivery of Data for IPY (DADDI) project seeks to improve the availability of
Arctic coastal data, and has the long term goal of developing a system that can be extended to support
access to the spectrum of International Polar Year (IPY) data. Previously, we reported on the process of
defining user needs for DADDI, especially those requirements related to data discovery and access(1). Here
we discuss the implementation of the DADDI system and the components that provide the means to
contribute, preserve, discover and access data relevant to all disciplines within the cryospheric domain.
Our previously reported use case development for the DADDI project described a set of criteria which were
particularly salient for the users of systems supported by a geoscience cyberinfrastructure. These included
the ability to easily control the boundaries of scientific parameter dimensions when searching for,
manipulating and obtaining data; relevant, ranked, and filterable search and browse results; and access to
data quality indicators and references, including access to human experts in the use of the selected data.
Several of those user priorities have been successfully incorporated into the current DADDI environment,
and in particular into the Mercury search system used to provide DADDI's metadata harvesting, indexing,
query, and search results presentation functions. We will discuss and demonstrate the current system and
its capabilities, including a review of the metadata and related standards used to support the existing
features. We will also review the capabilities yet to be implemented, and the infrastructure changes or
additions that will be necessary for DADDI to more fully participate in the cryospheric cyberinfrastructure.
(1) The Virtual Observatory in Action: Recurring Themes in Polar Science Use Cases. 2007 Virtual
Observatories in Geosciences Conference (http://www.egy.org/VOiG/Home.html;
http://www.hao.ucar.edu/projects/vsto/voig/index.php/Session_II:Recurring_Themes_in_Polar_Sciences).
http://www.nsidc.org/daddi/
IN22A-05
Knowledge Provenance in Semantic Wikis
Collaborative online environments with a technical Wiki infrastructure are becoming more widespread. One of the strengths of a Wiki environment is that it is relatively easy for numerous users to contribute original content and modify existing content (potentially originally generated by others). As more users begin to depend on informational content that is evolved by Wiki communities, it becomes more important to track the provenance of the information. Semantic Wikis expand upon traditional Wiki environments by adding computationally understandable encodings of some of the terms and relationships in Wikis. We have developed an environment that extends a semantic Wiki with provenance markup. Provenance of original contributions as well as modifications is encoded using the provenance markup component of the Proof Markup Language. The Wiki environment provides the provenance markup automatically, so users are not required to make specific encodings of author, contribution date, and modification trail. Further, our Wiki environment includes a search component that understands the provenance primitives and thus can be used to provide a provenance-aware search facility. We will describe the knowledge provenance infrastructure of our Semantic Wiki and show how it is being used as the foundation of our group web site as well as a number of project web sites.
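As a rough illustration of automatic provenance capture and provenance-aware search, the Python sketch below records author, timestamp, and parent revision every time a page is saved, and lets a search be restricted by contributor. It does not use the Proof Markup Language vocabulary; the class names, fields, and the `search` function are hypothetical and stand in for whatever the actual Wiki environment implements.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Revision:
    text: str
    author: str
    timestamp: datetime
    parent: Optional[int]  # index of the revision this one modifies, or None


@dataclass
class WikiPage:
    title: str
    revisions: List[Revision] = field(default_factory=list)

    def save(self, text, author, when=None):
        """Store new text; provenance is recorded without extra user effort."""
        parent = len(self.revisions) - 1 if self.revisions else None
        when = when or datetime.now(timezone.utc)
        self.revisions.append(Revision(text, author, when, parent))

    def trail(self):
        """Return the modification trail as (author, timestamp), oldest first."""
        return [(r.author, r.timestamp) for r in self.revisions]


def search(pages, term, author=None):
    """Titles of pages whose current text contains `term`.

    If `author` is given, keep only pages that author contributed to.
    """
    hits = []
    for page in pages:
        if not page.revisions:
            continue
        if term.lower() not in page.revisions[-1].text.lower():
            continue
        if author is not None and all(r.author != author for r in page.revisions):
            continue
        hits.append(page.title)
    return hits
```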
IN22A-06
DataSpaces: Using Community Workspaces to Enable Rich Air Quality Metadata
Currently, metadata for air quality datasets is variable, distributed and normally created by the provider for the user. However, a single dataset can be used for many applications that the provider may or may not anticipate and the data may go through many value-adding processes before it reaches the "end user". Additional metadata can be created at any step along the usage chain and at this time there is no mechanism for collecting this metadata. Consequently, users don't know how a dataset has been used or what additional processing has occurred beyond the originator. One method to harvest and share metadata from all members of the usage chain is through community workspaces, DataSpaces. DataSpaces are virtual spaces for contributing and archiving metadata, discussing the dataset and harvesting distributed resources in order to capture the critical community knowledge about the dataset. A DataSpace for a given dataset has two parts: structured, semantically rich metadata and flexible, community-contributed metadata. The structured dataset description includes standard dataset metadata, data lineage, and data quality information such as provider, parameters, platform and time period. The additional value of the DataSpaces comes from the context provided by the dataset community: users, mediators and providers. This may be through links to other mediator or user-provided metadata, publications that reference the dataset or web applications and tools using the dataset. DataSpaces also provides a place where a dataset community can connect through discussion and announcements about the dataset. As DataSpaces evolves and is used more by the community, additional functionality will emerge. Currently, there are still many issues with the implementation of DataSpaces including how to link the DataSpace to the dataset as it moves along the usage chain and how material in DataSpaces can be reused in other metadata.
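The two-part structure of a DataSpace can be sketched as a small data model. The Python below is a minimal sketch under stated assumptions: the class and field names, the `kind` tag, and the validation of roles are hypothetical, chosen only to mirror the structured description and the community contributions (from users, mediators and providers) described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ROLES = {"user", "mediator", "provider"}  # the dataset community named above


@dataclass
class Contribution:
    contributor: str
    role: str     # one of ROLES
    kind: str     # hypothetical tag, e.g. "publication", "tool", "note"
    content: str


@dataclass
class DataSpace:
    dataset_id: str
    # Structured part: e.g. provider, parameters, platform, time period.
    structured: Dict[str, str]
    # Flexible part: metadata contributed along the usage chain.
    community: List[Contribution] = field(default_factory=list)

    def contribute(self, contributor, role, kind, content):
        """Append community metadata, rejecting roles outside the community."""
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.community.append(Contribution(contributor, role, kind, content))

    def by_kind(self, kind):
        """Return all community contributions with the given tag."""
        return [c for c in self.community if c.kind == kind]
```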
IN22A-07
The Model Interoperability Experiment in the Gulf of Maine: A Success Story Made Possible by NetCDF, CF-1.0, NcML, NetCDF-Java, THREDDS, OPeNDAP and MATLAB
The Gulf of Maine Ocean Data Partnership Modeling Committee has been developing a Model
Interoperability Experiment in the Gulf of Maine built around the Climate and Forecast (CF-1.0) metadata
standard. The goal is to allow scientists to issue common Matlab commands to retrieve geospatially
referenced data, regardless of model type. Our starting point was output from six different models: the
ROMS, ECOM, POM and FVCOM ocean circulation models, the WRF meteorological model and the
WaveWatch III ocean wave model. Although the models all had different grid conventions and were served at
different institutions, each group produced NetCDF files, used Matlab for visualization and analysis, and had
a standard HTTP 1.1 web server. Only one group used CF-conventions, however, and as a result each
group had their own set of analysis and visualization routines to perform nearly identical tasks. The system
was designed to achieve interoperability with a minimum of effort on the part of the data providers and data
users. To supply data, participants need only place their existing NetCDF files on their own web sites. The
data is accessed using the "byte range request" feature of HTTP, utilized in NetCDF-Java. The CF
standardization is achieved using a layer of XML (NcML) which also provides virtual aggregation of data. The
THREDDS Data Server allows for central cataloging of the dataset, access via the OPeNDAP web service,
and for rectilinear grids, access via the OGC Web Coverage Service (WCS) and the NetCDF Subset
Services as well. The OPeNDAP + CF standard data can be accessed with our NetCDF-Java based "CF
Toolkit for MATLAB". This toolkit works on any MATLAB system without compiling, delivering geospatially
referenced model output from all six models using common functions. To extend the capabilities of
CF clients such as the one we have developed, we need to expand the CF conventions to specify
additional common features of model output, including staggered grids, masked regions, velocity component
relationships and unstructured grid connectivity information. We also need to develop CF toolkits for other
common languages such as Python and IDL.
http://www.gomodp.org/modeling-committee
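The "byte range request" access mentioned in this abstract relies on the standard HTTP `Range` header, which lets a client read part of a remote file without downloading all of it. The Python sketch below only builds such a request and does not send it; the URL is a placeholder, and the helper names are assumptions for illustration, not part of NetCDF-Java or the CF Toolkit.

```python
import urllib.request


def byte_range_header(offset, length):
    """Return the Range header value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return f"bytes={offset}-{offset + length - 1}"


def build_range_request(url, offset, length):
    """Build (but do not send) a GET request for part of a remote file."""
    headers = {"Range": byte_range_header(offset, length)}
    return urllib.request.Request(url, headers=headers)


# Placeholder URL, used only to show the shape of the request.
request = build_range_request("http://example.org/model_output.nc", 0, 4)
```

A server that honours the header replies with status 206 (Partial Content) and only the requested bytes, which is why an ordinary web server is enough on the data provider's side.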
IN22A-08
First Applications of DoD Iridium RUDICS in the NSF Polar Programs
We will present the first deployment and application of the new
Iridium RUDICS service to remote instrumentation projects within the
National Science Foundation's polar programs. The rise of automated
observing networks has increased the demand for real-time connectivity
to remote instruments, not only for immediate access to data, but also to
interrogate health and status. Communicating with field sites in the polar
regions is complicated by the remoteness from existing infrastructure,
low temperatures and limited connection options. Sites located above
78° latitude are not able to see geostationary satellites,
leaving the Iridium constellation as the only one that provides a direct
connection. Some others, such as Orbcomm, only provide a store-and-forward
service. Iridium is often used as a dial up modem to establish a PPP
connection to the Internet with data files transferred via FTP. On
low-bandwidth, high-latency networks like Iridium (2400 bps with ping
times of seconds), this approach is time consuming and inefficient. The
dial up time alone takes upwards of a minute, and standard TCP/IP and FTP
protocols are hampered by the long latencies. Minimizing transmission
time is important for reducing battery usage and connection costs.
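A back-of-the-envelope estimate shows why connection setup matters so much on this link. The Python sketch below assumes an ideal channel at the stated 2400 bps and ignores all protocol overhead and retransmissions, so it gives a lower bound rather than a measured figure; the function names are illustrative.

```python
def session_seconds(payload_bytes, setup_seconds, bits_per_second=2400):
    """Estimate total connection time for one transfer on an ideal channel."""
    return setup_seconds + payload_bytes * 8 / bits_per_second


def setup_fraction(payload_bytes, setup_seconds, bits_per_second=2400):
    """Fraction of the session spent on setup rather than on payload."""
    total = session_seconds(payload_bytes, setup_seconds, bits_per_second)
    return setup_seconds / total
```

For a 3000-byte file and a 60-second dial-up, the payload itself needs 10 seconds, so setup accounts for roughly 86% of the session under these assumptions.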
The new Iridium RUDICS service can be used for more efficient
transfers. RUDICS is an acronym for "Router-based Unstructured Digital
Inter-working Connectivity Solution" and provides a direct connection
between an instrument in the field and a server on the Internet. After
dialing into the Iridium gateway, a socket connection is opened to a
registered port on a user's server. Bytes sent to or from the modem
appear at the server's socket. The connection time is reduced to about
10 seconds because the modem training and PPP negotiation stages are
eliminated. The remote device does not need to have a full TCP/IP stack,
allowing smaller instruments such as data loggers to directly handle the
data transmission. Alternative protocols can be deployed that better
exploit the characteristics of the Iridium channel. In addition, the
setup naturally scales to handle hundreds of remote devices, an important
aspect for larger sensor networks.
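Because bytes from the modem simply appear at the server's socket, any application-level structure has to be supplied by the project. The Python sketch below shows one simple possibility, a length-prefixed frame; this framing is an assumption made for illustration and is not part of RUDICS or of the deployments described here. A local `socket.socketpair()` stands in for the gateway-to-server link so the example runs without any Iridium hardware.

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes from the socket, or raise if the link drops."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_frame(sock, payload):
    """Send one frame: 2-byte big-endian length, then the payload."""
    sock.sendall(struct.pack(">H", len(payload)) + payload)


def recv_frame(sock):
    """Receive one frame written by send_frame."""
    (length,) = struct.unpack(">H", recv_exact(sock, 2))
    return recv_exact(sock, length)


def demo():
    """Round-trip one hypothetical sensor reading over a local socket pair."""
    field_end, server_end = socket.socketpair()
    try:
        send_frame(field_end, b"T=-31.5C,V=12.4")
        return recv_frame(server_end)
    finally:
        field_end.close()
        server_end.close()
```

A two-byte header keeps per-message overhead small, which matters at 2400 bps, and a data logger without a TCP/IP stack only needs to write the same bytes to its modem.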
As part of the NSF's Arctic Research Support and Logistics Services, we
have deployed RUDICS systems with three different research projects. These
are the first NSF RUDICS deployments for projects using the Department
of Defense Iridium gateway, which allows for unlimited connection time
at a flat monthly rate for US government users. The first project is
O-Buoy, an IPY-OASIS project for self-contained, autonomous observations
of atmospheric chemical species in the polar marine boundary layer. The
second project is a collection of low-power instrument towers on Alaska's
North Slope at Imnavait Creek, part of the Arctic Observation Network
(AON). Lastly, the autonomous instrument platform at Ivotuk, Alaska,
uses RUDICS to provide telemetry about the renewable energy systems. A
set of real-time web displays allows researchers for each project to
monitor their remote sites and access real-time data.
http://transport.sri.com/rudics