IN51A-1137 INVITED
Cyberinfrastructure at IRIS: Challenges and Solutions Providing Integrated Data Access to EarthScope and Other Earth Science Data
While mature methods of accessing seismic data from the IRIS DMC have existed for decades, the demands for improved interdisciplinary data integration call for new approaches. Talented software teams at the IRIS DMC, UNAVCO, and the ICDP in Germany have been developing web services for all EarthScope data, including data from USArray, PBO, and SAFOD. These web services are based upon SOAP and WSDL. The EarthScope Data Portal was the first external system to access data holdings from the IRIS DMC using Web Services. EarthScope will also draw more heavily upon products to aid in cross-disciplinary data reuse. A Product Management System called SPADE allows archive of and access to heterogeneous data products, presented as XML documents, at the IRIS DMC. Searchable metadata are extracted from the XML and enable powerful searches for products from EarthScope and other data sources. IRIS is teaming with the External Research Group at Microsoft Research to leverage a powerful Scientific Workflow Engine (Trident) that interacts with the web services developed at centers such as IRIS to enable access to data services as well as computational services. We believe that this approach will allow web-based control of workflows and the invocation of computational services that transform data. This capability will greatly improve access to data across scientific disciplines. This presentation will review some of the traditional access tools as well as many of the newer approaches that use web services and scientific workflows to improve interdisciplinary data access.
IN51A-1138
EarthScope Data Access Services at the IRIS Data Management Center
To meet the data management and access challenges of EarthScope, the IRIS Data Management Center is building a broad range of new and leveraged data discovery and access services. This collection of SOAP-based and REST-style web services supports both the EarthScope Data Portal and the IRIS DMC's operational needs. The services provide access to station metadata, waveform inventory and data, and data products from the SPADE product archive. The EarthScope Data Portal provides a single point of access to all data and products from three EarthScope component data centers: IRIS (USArray), UNAVCO (PBO), and ICDP (SAFOD). The Portal allows users to search for EarthScope stations and data matching specific search constraints. Selected data and data products can be added to a data cart for final packaging and download to the user's machine. Defining a single common service interface for all of the EarthScope components was one of the primary challenges of the Portal's development. This poster presents the design and implementation of the IRIS data access web services as they apply to the EarthScope Portal as well as to a standalone service framework for the IRIS DMC.
IN51A-1139
Network of Research Infrastructures for European Seismology (NERIES)
NERIES (Network of Research Infrastructures for European Seismology) is an Integrated Infrastructure
Initiative (I3) project within the Sixth Framework Programme of the European Commission (EC). The project
consortium consists of 25 participants from 13 different European countries. It is currently the largest earth
science project ever funded by the EC.
The goal of NERIES is to integrate European seismological observatories and research institutes into one
integrated cyber-infrastructure for seismological data serving the research community, civil protection
authorities and the general public. The EC provides funds for the networking and research. The participants
provide the necessary hardware investments, mostly through national resources.
NERIES consists of 13 subprojects (networking and research activities) and 5 facilities providing access
through grants (Transnational Access). The project is coordinated by ORFEUS in close cooperation with the
EMSC.
The individual subprojects address different issues, such as extending the Virtual European Broadband
Seismic Network (VEBSN) from 140 to about 500 stations; implementing the core European Integrated
Waveform Data Archive (EIDA), consisting of ODC-KNMI, GFZ, INGV and IPGP, and a distributed archive of
historical data; providing access to data gathered by acceleration networks within Europe and its
surroundings; and deploying Ocean Bottom Seismometers in coordination with relevant ocean bottom projects
like ESONET. To facilitate access to these diverse and distributed data, NERIES invests a significant portion of
its resources in implementing a portal, for which a beta release is planned for the autumn of
2008.
The main goal of the research activities is to produce products and tools facilitating data interpretation and analysis.
These tools include a European reference (velocity) model, real-time hazard tools, shakemaps and
lossmaps, site response determination software and tools, and automatic tools to manage and exploit the
increasing quantity of data available in real-time.
NERIES also offers grants to individual researchers or groups to work at facilities such as the Swiss national
seismological network (SED/ETHZ, Switzerland), the CEA/DASE facilities in France, the data scanning
facilities at INGV (SISMOS), the array facilities of NORSAR (Norway) and the new Conrad Facility in Austria.
http://www.neries-eu.org
IN51A-1140
Supporting EarthScope Cyber-Infrastructure with a Modern GPS Science Data System
Building on NASA's investment in the measurement of crustal deformation from continuous GPS, we are
developing and implementing a Science Data System (SDS) that will provide mature, long-term Earth Science
Data Records (ESDRs). This effort supports NASA's Earth Surface and Interiors (ESI) focus area and
provides NASA's component to the EarthScope PBO. This multi-year development is sponsored by NASA's
Making Earth System data records for Use in Research Environments (MEaSUREs) program.
The SDS integrates the generation of ESDRs with data analysis and exploration, product generation, and
modeling tools based on daily GPS data that include GPS networks in western North America and a
component of NASA's Global GPS Network (GGN) for terrestrial reference frame definition. The system is
expandable to multiple regional and global networks. The SDS builds upon mature data production,
exploration, and analysis algorithms developed under NASA's REASoN, ACCESS, and SENH programs.
This SDS provides access to positions, time series, velocity fields, and strain measurements derived from
continuous GPS data obtained at tracking stations in both the Plate Boundary Observatory and other
regional Western North America GPS networks, dating back to 1995. The SDS leverages the IT and Web
Services developments carried out under the SCIGN/REASoN and ACCESS projects, which have streamlined
access to data products for researchers and modelers, and which have created a prototype of an on-the-fly
interactive research environment through a modern data portal, GPS Explorer.
This IT system has been designed using modern IT tools and principles in order to be extensible to any
geographic location, scale, natural hazard, and combination of geophysical sensor and related data. We
have built upon open GIS standards, particularly those of the OGC, and have used the principles of Web
Service-based Service Oriented Architectures to provide scalability and extensibility to new services and
capabilities.
http://reason.scign.org
IN51A-1141
SCEC Earthquake System Science Using High Performance Computing
The SCEC Community Modeling Environment (SCEC/CME) collaboration performs basic scientific research
using high performance computing with the goal of developing a predictive understanding of earthquake
processes and seismic hazards in California. SCEC/CME research areas include dynamic rupture modeling,
wave propagation modeling, probabilistic seismic hazard analysis (PSHA), and full 3D tomography.
SCEC/CME computational capabilities are organized around the development and application of robust,
reusable, well-validated simulation systems we call computational platforms. The SCEC earthquake system
science research program includes a wide range of numerical modeling efforts and we continue to extend our
numerical modeling codes to include more realistic physics and to run at higher and higher resolution.
During this year, the SCEC/USGS OpenSHA PSHA computational platform was used to calculate PSHA
hazard curves and hazard maps using the new UCERF2.0 ERF and new 2008 attenuation relationships.
Three SCEC/CME modeling groups ran 1Hz ShakeOut simulations using different codes and computer
systems and carefully compared the results. The DynaShake Platform was used to calculate several dynamic
rupture-based source descriptions equivalent in magnitude and final surface slip to the ShakeOut 1.2
kinematic source description. A SCEC/CME modeler produced 10Hz synthetic seismograms for the ShakeOut
1.2 scenario rupture by combining 1Hz deterministic simulation results with 10Hz stochastic seismograms.
SCEC/CME modelers ran an ensemble of seven ShakeOut-D simulations to investigate the variability of
ground motions produced by dynamic rupture-based source descriptions. The CyberShake Platform was
used to calculate more than 15 new probabilistic seismic hazard analysis (PSHA) hazard curves using full 3D
waveform modeling and the new UCERF2.0 ERF. The SCEC/CME group has also produced significant
computer science results this year. Large-scale SCEC/CME high performance codes were run on NSF
TeraGrid sites including simulations that use the full PSC Big Ben supercomputer (4096 cores) and
simulations that ran on more than 10K cores at TACC Ranger. The SCEC/CME group used scientific
workflow tools and grid-computing to run more than 1.5 million jobs at NCSA for the CyberShake project.
Visualizations produced by a SCEC/CME researcher of the 10Hz ShakeOut 1.2 scenario simulation data were
used by USGS in ShakeOut publications and public outreach efforts. OpenSHA was ported onto an NSF
supercomputer and was used to produce very high resolution PSHA hazard maps that contained more than
1.6 million hazard curves.
http://www.scec.org/petasha
IN51A-1142
HIS Central and the Hydrologic Metadata Catalog
The CUAHSI Hydrologic Information System project maintains a comprehensive workflow for publishing
hydrologic observations data and registering them to the common Hydrologic Metadata Catalog. Once the
data are loaded into a database instance conformant with the CUAHSI HIS Observations Data Model (ODM),
the user configures the ODM web service template to point to the new database. After this, the hydrologic data
become available via the standard CUAHSI HIS web service interface, which includes both data discovery
(GetSites, GetVariables, GetSiteInfo, GetVariableInfo) and data retrieval (GetValues) methods. The
observations data then can be further exposed via the global semantics-based search engine called
Hydroseek. To register the published observations networks to the global search engine, users can now use
the HIS Central application (new in HIS 1.1). With this online application, the WaterML-compliant web services
can be submitted to the online catalog of data services, along with network metadata and a desired network
symbology. Registering services to the HIS Central application triggers a harvester which uses the services to
retrieve additional network metadata from the underlying ODM (information about stations, variables, and
periods of record). The next step in the HIS Central application is mapping variable names from the newly
registered network to the terms used in the global search ontology. Once these steps are completed, the
new observations network is added to the map and becomes available for searching and querying.
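The registration steps described above (submit a service, harvest its metadata, map variable names to ontology terms, then search) can be sketched as follows. This is a minimal in-memory illustration only: Network, Catalog, fake_harvest, the example URL, and the ontology entries are all hypothetical names chosen for this sketch and are not part of the CUAHSI HIS or HIS Central interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Hypothetical record for one registered observations network."""
    name: str
    service_url: str
    sites: int = 0
    variables: list = field(default_factory=list)
    concept_map: dict = field(default_factory=dict)


class Catalog:
    """Hypothetical stand-in for a metadata catalog; not a real HIS API."""

    def __init__(self, ontology):
        # ontology: lower-cased local variable name -> shared search term
        self.ontology = ontology
        self.networks = {}

    def register(self, network, harvest):
        # Step 1: the service is submitted (the caller passes it in).
        # Step 2: a harvester pulls extra metadata through the service.
        metadata = harvest(network.service_url)
        network.sites = metadata["sites"]
        network.variables = list(metadata["variables"])
        # Step 3: map each local variable name to an ontology term.
        for name in network.variables:
            term = self.ontology.get(name.lower())
            if term is not None:
                network.concept_map[name] = term
        # Step 4: the network becomes available for searching.
        self.networks[network.name] = network

    def search(self, term):
        """Return names of networks with a variable mapped to the term."""
        return sorted(n.name for n in self.networks.values()
                      if term in n.concept_map.values())


def fake_harvest(url):
    """Assumed harvester result; a real one would call the web service."""
    return {"sites": 3, "variables": ["Discharge", "WaterTemp"]}
```

For example, registering Network("demo", "http://example.org/service") in a Catalog whose ontology maps "discharge" to "streamflow" makes the network discoverable through search("streamflow"), even though the network itself never used that term.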
The number of observations networks registered to the Hydrologic Metadata Catalog at SDSC is constantly
growing. At the time of submission, the catalog contains 51 registered networks, with an estimated 1.7 million
stations.
http://hiscentral.cuahsi.org/
IN51A-1143
Model Fusion: A Fast, Practical Alternative Towards Joint Inversion of Multiple Datasets
There are many sources of data for Earth models: first-arrival passive seismic data (from actual earthquakes), first-arrival active seismic data (from seismic experiments), gravity data, surface waves, etc. At present, each of these datasets is processed separately, resulting in several different Earth models that have specific coverage areas, different resolutions and varying degrees of accuracy. These models often provide complementary geophysical information on Earth structure (P and S wave velocity structure); combining the information derived from each requires a joint inversion approach. Designing such joint inversion techniques presents a significant theoretical and practical challenge. While such joint inversion methods are being developed, as a first step, we propose a practical solution: to fuse the Earth models derived from different datasets. Because these Earth models have different areas of coverage, model fusion is especially important: some of the models may provide better accuracy and/or spatial resolution in various spatial regions and/or depths. In our case, different measurements have not only different accuracy, but also different spatial resolution. To fuse these models, we must account for three different types of approximate equalities: (1) each high-resolution value is approximately equal to the actual value in the corresponding (smaller size) cell, with the accuracy corresponding to the accuracy of the higher-resolution Earth model; (2) each lower-resolution value is approximately equal to the average of values of all the smaller cells within the corresponding larger size cell, with the accuracy corresponding to the accuracy of the lower-resolution Earth model; and (3) each lower-resolution value is approximately equal to the value within each of the constituent smaller size cells, with the accuracy corresponding to the (empirical) standard deviation of the smaller-cell values within the larger cell.
We can then use the least squares approach to combine these approximate equalities, and we can find the desired combined values by minimizing the resulting sum of weighted squared differences. Our preliminary proof-of-concept experiments with simplified datasets show that this method indeed leads to a fused model that effectively combines accuracy and resolution of different Earth models.
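The three kinds of approximate equalities and their least-squares combination can be illustrated with a small one-dimensional sketch. It is not the authors' implementation: the function names, the 1-D cell layout, and the default_spread fallback (used when a large cell holds fewer than two distinct high-resolution values, so no empirical standard deviation exists) are assumptions made for illustration. Each equality is weighted by 1/sigma^2 and the weighted sum of squared differences is minimized by solving the normal equations.

```python
from statistics import pstdev


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some cell has no constraint")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fuse(n_cells, high, sigma_h, low, sigma_l, default_spread=1.0):
    """Fuse two models on n_cells small cells.

    high: {cell index: value} from the higher-resolution model.
    low:  list of (cell indices, value) from the lower-resolution model.
    Assumes every cell is covered by at least one of the two models.
    """
    ata = [[0.0] * n_cells for _ in range(n_cells)]
    atb = [0.0] * n_cells

    def add_equation(coeffs, target, sigma):
        # Accumulate one weighted equation sum(c_i * x_i) ~ target.
        w = 1.0 / sigma ** 2
        for i, ci in coeffs.items():
            atb[i] += w * ci * target
            for j, cj in coeffs.items():
                ata[i][j] += w * ci * cj

    for cell, value in high.items():
        add_equation({cell: 1.0}, value, sigma_h)           # type (1)
    for cells, value in low:
        share = 1.0 / len(cells)
        add_equation({c: share for c in cells}, value, sigma_l)  # type (2)
        known = [high[c] for c in cells if c in high]
        spread = pstdev(known) if len(known) > 1 else 0.0
        if spread == 0.0:
            spread = default_spread                          # assumed fallback
        for c in cells:
            add_equation({c: 1.0}, value, spread)            # type (3)
    return solve_linear(ata, atb)
```

Calling fuse(4, {0: 6.0, 1: 6.2}, 0.1, [([0, 1], 6.0), ([2, 3], 7.0)], 0.3) keeps the first two cells close to their high-resolution values while filling the last two, which the high-resolution model does not cover, from the low-resolution value.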
IN51A-1144
Linking the EarthScope Data Virtual Catalog to the GEON Portal
The EarthScope Data Portal provides a unified, single point of access to EarthScope data and products from
USArray, Plate Boundary Observatory (PBO), and San Andreas Fault Observatory at Depth (SAFOD)
experiments. The portal features basic search and data access capabilities to allow users to discover and
access EarthScope data using spatial, temporal, and other metadata-based (data type, station specific)
search conditions. The portal search module is the user interface implementation of the EarthScope Data
Search Web Service. This Web Service acts as a virtual catalog that in turn invokes Web services developed
by IRIS (Incorporated Research Institutions for Seismology), UNAVCO (University NAVSTAR Consortium),
and GFZ (German Research Center for Geosciences) to search for EarthScope data in the archives at each
of these locations. These Web Services provide information about all resources (data) that match the
specified search conditions.
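The virtual catalog pattern described above, in which one search service forwards a request to several data center services and merges what comes back, can be sketched as follows. The Query fields, make_backend, and virtual_catalog_search are hypothetical names for this illustration, and the in-memory record lists stand in for the real IRIS, UNAVCO, and GFZ web services, whose actual interfaces are not described in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Assumed search constraints: a bounding box plus a data type."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    data_type: str


def make_backend(center, records):
    """Build a stand-in search service over an in-memory record list."""
    def search(q):
        return [dict(r, center=center) for r in records
                if r["data_type"] == q.data_type
                and q.min_lat <= r["lat"] <= q.max_lat
                and q.min_lon <= r["lon"] <= q.max_lon]
    return search


def virtual_catalog_search(query, backends):
    """Forward one query to every backend and merge the results.

    A failing backend is reported rather than aborting the whole search.
    """
    results, failures = [], []
    for name, search in backends.items():
        try:
            results.extend(search(query))
        except Exception as exc:
            failures.append((name, str(exc)))
    return results, failures
```

Because the caller sees only the merged list, a portal such as GEONsearch could treat the virtual catalog as one more searchable source without knowing how many archives sit behind it.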
In this presentation we will describe how the EarthScope Data Search Web service can be integrated into the
GEONsearch application in the GEON Portal (see http://portal.geongrid.org). Thus, a search request issued
at the GEON Portal will also search the EarthScope virtual catalog thereby providing users seamless access
to data in GEON as well as EarthScope via a common user interface.
http://es-portal.geongrid.org
IN51A-1145
The GEON Integrated Data Viewer (IDV) and IRIS DMC Services Illustrate CyberInfrastructure Support for Seismic Data Visualization and Interpretation
UNAVCO and the IRIS DMC are data service partners for seismic visualization, particularly for hypocentral
data and tomography. UNAVCO provides the GEON Integrated Data Viewer (IDV), an extension of the
Unidata IDV, a free, interactive, research-level, software display and analysis tool for data in 3D (latitude,
longitude, depth) and 4D (with time), located on or inside the Earth.
The GEON IDV is designed to meet the challenge of investigating complex, multi-variate, time-varying, three-
dimensional geoscience data in the context of new remote and shared data sources. The GEON IDV
supports data access from data sources using HTTP and FTP servers, OPeNDAP servers, THREDDS
catalogs, RSS feeds, and WMS (web map) servers.
The IRIS DMC (Data Management Center) has developed web services providing earthquake
hypocentral data and seismic tomography model grids. These services can be called by the GEON IDV to
access data at IRIS without copying files.
The IRIS Earthquake Browser (IEB) is a web-based query tool for hypocentral data. The IEB combines the
DMC's large database of more than 1,900,000 earthquakes with the Google Maps web interface. With the
IEB you can quickly find earthquakes in any region of the globe and then import this information into the
GEON Integrated Data Viewer where the hypocenters may be visualized. You can select earthquakes by
region, time, depth, and magnitude. The IEB gives the IDV a URL to the selected data. The IDV then
shows the data as maps or 3D displays, with interactive control of vertical scale, area, and map projection, and with
symbol size and color controlled by magnitude or depth. The IDV can show progressive time animation of, for
example, aftershocks filling a source region. The IRIS Tomoserver converts seismic tomography model output grids to
NetCDF for use in the IDV. The Tomoserver accepts a tomographic model file as input from a user and
provides an equivalent NetCDF file as output. The service supports NA04, S3D, A1D and CUB input file
formats, contributed by their respective creators. The NetCDF file is saved to a location that can be
referenced with a URL on an IRIS server. The URL for the NetCDF file is provided to the user. The user can
download the data from IRIS, or copy the URL into IDV directly for interpretation, and the IDV will access the
data at IRIS. The Tomoserver conversion software was developed by Instrumental Software Technologies,
Inc.
Use cases with the GEON IDV and IRIS DMC data services will be shown.
http://geon.unavco.org/ http://www.iris.edu/dms/dmc/
IN51A-1146 INVITED
A Cyberinfrastructure Platform for Distribution of GeoEarthScope LiDAR Topography Data
The recently completed GeoEarthScope airborne LiDAR (Light Detection And Ranging) topography
acquisition will provide unprecedented data adjacent to active faults throughout the plate boundary region of
western North America. Totaling more than 5000 square kilometers, these community-oriented data offer a
high-resolution representation of fault zone topography and should be a revolutionary resource for
researchers studying earthquake hazards, active faulting, landscape processes, and ground deformation.
Since spring of 2007, the NSF-funded GeoEarthScope LiDAR project has acquired data for the San Andreas
fault system in northern California, faults in southern California, the Yakima Fold and Thrust Belt in
Washington, Yellowstone National Park, the Tetons, the Wasatch Front, and Alaska. These data will be made
available via the OpenTopography Portal (www.opentopography.org), a domain-specific component of the
GEON project, as they are processed and delivered by the National Center for Airborne Laser Mapping.
The OpenTopography Portal (OpenToPo) provides access to a variety of GeoEarthScope LiDAR data
products and uses several cyberinfrastructure components developed by the GEON project. These products
range from simple Google Earth visualizations of LiDAR hillshades to standard digital elevation model (DEM)
products as well as LiDAR point cloud data. The wide spectrum of LiDAR users has variable scientific
applications, computing resources and technical experience and thus require a data distribution system that
provides various levels of access to the data.
Standard DEM products in OpenToPo are accessed via a Google Map and/or Google Earth-based interface
that allows users to browse and download the data products. For users who wish to explore the full potential
of the LiDAR data, we provide access to the raw LiDAR point data and a suite of DEM generation tools to
enable users to create custom DEMs to best fit their science applications. Storage and management of these
multi-billion point LiDAR datasets is done via a partitioned spatial database that is deployed across a multi-node
cluster. The innovative database architecture allows for high performance as well as high scalability.
Once a subset of data is defined in the Google Map interface, users are able to define their processing
parameters and submit jobs to run on OpenToPo computing resources.
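To make the idea of generating a custom DEM from raw point data concrete, the sketch below bins points into square cells and averages the elevations in each cell. The abstract does not say which gridding algorithms the OpenToPo tools use, so the mean-binning method, the function name grid_dem, and the cell_size parameter are assumptions chosen for simplicity.

```python
import math


def grid_dem(points, cell_size):
    """Average point elevations into square cells.

    points: iterable of (x, y, z) tuples in a projected coordinate system.
    cell_size: cell edge length, in the same units as x and y.
    Returns {(column, row): mean elevation} for every non-empty cell.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    sums = {}
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + z, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}
```

The same cell key could also serve as a partitioning key, so that points falling in different tiles are stored and processed on different nodes; that use is likewise an illustration rather than a description of the OpenToPo database.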
By using cyberinfrastructure-based resources to provide access to the large volumes of GeoEarthScope
LiDAR topography, OpenToPo democratizes access to these exciting but often challenging datasets.
http://www.geongrid.org
IN51A-1147
Integrating Diverse Geophysical and Geological Data to Construct Multi-Dimensional Earth Models: The Open Earth Framework
Currently, many large geoscientific efforts (e.g., EarthScope, Continental Dynamics, and GeoSwath) have emphasized that a crucial need in advancing our understanding of the structure and evolution of the continents is high-resolution, 3-D models of lithospheric structure. In addition, the geoscience community recognizes that our ultimate goal is the addition of the dimension of time to make the problem 4-D. Adding the dimension of time is a complex problem that is strongly dependent on the integration of a variety of geological data into our analyses (e.g., geochronology, paleontology, stratigraphy, pressure-time histories, structural geology, paleogeography, etc.). The geoscience community also recognizes that solutions to the scientific and societal questions that they seek to answer require innovative integration of many types of data so that many physical properties (x, y, z, P-wave velocity, S-wave velocity, density, electrical conductivity, etc.) are measured and included in 3-D models. The problem is, therefore, truly multidimensional in nature. We are developing an Open Earth Framework (OEF) as an open data model for integration of such multidimensional Earth Sciences data. In our work and interactions with the community on building and visualizing complex earth models, several issues have emerged on which there is consensus. First of all, integration efforts should work from the surface down because we have the most data there (e.g., geologic maps, remote sensing data such as LIDAR and ASTER, digital elevation models, gravity and magnetic measurements, etc.) and because the complex conditions near surface always have a potential to mask deeper features. 
Secondly, since we cannot expect uniform coverage of a variety of high-resolution data in anything but special circumstances, a data integration effort should first establish a regional context using lower resolution (and usually wide coverage) data and then proceed to modeling the data sets with the highest spatial resolution. Finally, formal quantitative integration would logically begin with employing accepted relationships between physical properties (e.g., there are widely used empirical relationships between Vp and density) and then proceed to producing integrated models that facilitate the search for anomalies. Our workshops and community interactions have shown that both raster (voxels) and vector (surfaces) 3D data structures would be involved if we are to produce integrated models that have all of the properties that the community desires. These interactions also quickly revealed a consensus that building such models can only be achieved through a highly integrated approach that takes advantage of all of the geological and geophysical constraints available. Conceptually, the modeling would begin with a voxel-based approach of building a highly integrated 3-D model at Time=0 by deriving physical properties such as Vp, Vs, density, magnetic properties, electrical properties, anisotropy, attenuation (Q), temperature, etc. for volume elements that could take on several forms. Then, interfaces that represent features such as the Moho, major faults, crystalline basement surface beneath sedimentary basins, magmatic bodies, etc. would be inserted into the model in order to properly characterize the region geologically.
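As one illustration of an empirical relationship between Vp and density, the sketch below uses a Gardner-type power law, density = a * Vp^b, with the commonly quoted coefficients a = 0.31 and b = 0.25 for Vp in m/s and density in g/cm^3. The abstract does not state which relationship the authors use; the choice of this relation, the function name, and the default coefficients are assumptions, and the relation is an approximation that is usually applied to sedimentary rocks.

```python
def gardner_density(vp_m_per_s, a=0.31, b=0.25):
    """Estimate bulk density (g/cm^3) from P-wave velocity (m/s).

    Uses a Gardner-type power law, density = a * Vp**b. The default
    coefficients are the commonly quoted ones and are an assumption here.
    """
    if vp_m_per_s <= 0:
        raise ValueError("Vp must be positive")
    return a * vp_m_per_s ** b


def fill_density(vp_by_voxel):
    """Derive a density value for every voxel that has a Vp value."""
    return {voxel: gardner_density(vp) for voxel, vp in vp_by_voxel.items()}
```

For a voxel with Vp of 6000 m/s this gives a density of about 2.73 g/cm^3, which could seed a density model where only seismic velocities have been measured.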
IN51A-1148 INVITED
EarthChem and SESAR: Data Resources and Interoperability for EarthScope Cyberinfrastructure
Data management within the EarthScope Cyberinfrastructure needs to pursue two goals in order to advance
and maximize the broad scientific application and impact of the large volumes of observational data acquired
by EarthScope facilities: (a) to provide access to all data acquired by EarthScope facilities, and to promote
their use by broad audiences, and (b) to facilitate discovery of, access to, and integration of multi-disciplinary
data sets that complement EarthScope data in support of EarthScope science. EarthChem and SESAR, the
System for Earth Sample Registration, are two projects within the Geoinformatics for Geochemistry program
that offer resources for EarthScope CI. EarthChem operates a data portal that currently provides access to
>13 million analytical values for >600,000 samples, more than half of which are from North America,
including data from the USGS and all data from the NAVDAT database, a web-accessible repository for age,
chemical and isotopic data from Mesozoic and younger igneous rocks in western North America. The new
EarthChem GEOCHRON database will house data collected in association with GeoEarthScope, storing and
serving geochronological data submitted by participating facilities. The EarthChem Deep Lithosphere Dataset
is a compilation of petrological data for mantle xenoliths, initiated in collaboration with GeoFrame to
complement geophysical endeavors within EarthScope science. The EarthChem Geochemical Resource
Library provides a home for geochemical and petrological data products and data sets. Part of the digital
data in EarthScope CI refers to physical samples such as drill cores, igneous rocks, or water and gas samples
collected, for example, by SAFOD or by EarthScope science projects, and is acquired through lab-based
analysis. Management of sample-based data requires the use of globally unique identifiers for samples, so that
distributed data for individual samples generated in different labs and published in different papers can be
unambiguously linked and integrated. SESAR operates a registry for Earth samples that assigns and
administers the International GeoSample Number (IGSN) as a globally unique identifier for samples.
Registration of EarthScope samples with SESAR and use of the IGSN will ensure their unique identification in
publications and data systems, thus facilitating interoperability among sample-based data relevant to
EarthScope CI and globally. It will also make these samples visible to global audiences via the SESAR Global
Sample Catalog.
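The benefit of a globally unique sample identifier can be shown with a short sketch that groups records from different labs by a shared identifier field. The record layout, the "igsn" key name, and the function link_by_igsn are hypothetical and chosen for illustration; they do not describe the SESAR data model, and any identifier values passed in would be made up for the example.

```python
from collections import defaultdict


def link_by_igsn(*sources):
    """Group records from several sources by their sample identifier.

    Each source is a list of dicts that carry an "igsn" key (an assumed
    field name). Records sharing an identifier describe the same sample.
    """
    merged = defaultdict(list)
    for source in sources:
        for record in source:
            merged[record["igsn"]].append(record)
    return dict(merged)
```

Without a shared identifier, the same join would have to rely on sample names, which different labs and publications may spell or abbreviate differently.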
http://www.geoinfogeochem.org