SF31B-01 INVITED 08:00h
An Electronic Library of Lithospheric Imagery from deeP Seismic Exploration (ELLIPSE)
The seismic reflection method, developed primarily by the oil exploration industry, is arguably geophysics' most versatile and powerful tool for mapping the earth's interior. It has been increasingly applied over the past 25 years to a wide variety of studies of the earth's crust and upper mantle by a diverse mixture of individual, institutional, national and international programs. The results from these efforts have revolutionized, and continue to advance, our understanding of the Earth's lithosphere. Yet until now there has been no central guide to the results of those efforts or to their continuing status. ELLIPSE is an effort to develop a comprehensive, global, dynamic catalog of deep seismic reflection surveys past and present. This catalog, in the form of a GIS database with a user-friendly cartographic interface, serves as an internet portal to a metalibrary of deep reflection data that is derived from the physical libraries that are now globally dispersed and heterogeneously maintained. Core components of this metalibrary, and a proposed template for organizing comparable physical collections worldwide, are the COCORP (US), INDEPTH (Tibet) and URSEIS (Russia) datasets currently archived at Cornell. These datasets are not only of considerable continuing scientific value in their own right, they also typify the range of data characteristics of the larger global collection. ELLIPSE provides not only the final processed data, but also access to the "raw" data and the metadata needed to reprocess them, field photos and published interpretations. This multidimensional electronic publication will aid not only the serious researcher who may want to reprocess the original datasets, but also the non-specialist who finds the final product useful in synthesis with other data, the classroom instructor who needs an easily comprehensible form of the results for the general student, or the earth science student who simply wants to browse the earth's interior for inspiration.
http://www.lithosurvey.info
SF31B-02 INVITED 08:15h
Cyberinfrastructure for the Unified Study of Earth Structure and Earthquake Sources in Complex Geologic Environments
The Southern California Earthquake Center (SCEC) is developing a Community Modeling Environment (CME) to facilitate the computational pathways of physics-based seismic hazard analysis (Maechling et al., this meeting). Major goals are to facilitate the forward modeling of seismic wavefields in complex geologic environments, including the strong ground motions that cause earthquake damage, and the inversion of observed waveform data for improved models of Earth structure and fault rupture. Here we report on a unified approach to these coupled inverse problems that is based on the ability to generate and manipulate wavefields in densely gridded 3D Earth models. A main element of this approach is a database of receiver Green tensors (RGT) for the seismic stations, which comprises all of the spatial-temporal displacement fields produced by the three orthogonal unit impulsive point forces acting at each of the station locations. Once the RGT database is established, synthetic seismograms for any earthquake can be simply calculated by extracting a small, source-centered volume of the RGT from the database and applying the reciprocity principle. The partial derivatives needed for point- and finite-source inversions can be generated in the same way. Moreover, the RGT database can be employed in full-wave tomographic inversions launched from a 3D starting model, because the sensitivity (Fréchet) kernels for travel-time and amplitude anomalies observed at seismic stations in the database can be computed by convolving the earthquake-induced displacement field with the station RGTs. We illustrate all elements of this unified analysis with an RGT database for 33 stations of the California Integrated Seismic Network in and around the Los Angeles Basin, which we computed for the 3D SCEC Community Velocity Model (SCEC CVM3.0) using a fourth-order staggered-grid finite-difference code.
For a spatial grid spacing of 200 m and a time resolution of 10 ms, the calculations took ~19,000 node-hours on the Linux cluster at USC's High-Performance Computing Center. The 33-station database with a volume of ~23.5 TB was archived in the SCEC digital library at the San Diego Supercomputer Center using the Storage Resource Broker (SRB). From a laptop, anyone with access to this SRB collection can compute synthetic seismograms for an arbitrary source in the CVM in a matter of minutes. Efficient approaches have been implemented to use this RGT database in the inversions of waveforms for centroid and finite moment tensors and tomographic inversions to improve the CVM. Our experience with these large problems suggests areas where the cyberinfrastructure currently available for geoscience computation needs to be improved.
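As a rough illustration of the reciprocity step described above, the following Python sketch computes three-component station seismograms from RGT time series sampled at a single source location. It is a simplified toy under stated assumptions, not the authors' code: the source is a point force rather than a moment tensor (a moment-tensor source would need spatial derivatives of the RGT), and the names `convolve`, `synthetic_from_rgt`, and `rgt_at_source` are hypothetical.

```python
from typing import List, Sequence


def convolve(a: Sequence[float], b: Sequence[float], dt: float) -> List[float]:
    """Discrete approximation of the convolution integral, scaled by the time step dt."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj * dt
    return out


def synthetic_from_rgt(
    rgt_at_source: Sequence[Sequence[Sequence[float]]],
    force: Sequence[float],
    stf: Sequence[float],
    dt: float,
) -> List[List[float]]:
    """Station seismograms for a point force, using source-receiver reciprocity.

    rgt_at_source[i][j] is the time series of displacement component j at the
    source location caused by a unit impulsive force in direction i at the
    station. By reciprocity it equals displacement component i at the station
    caused by a unit impulse in direction j at the source.

    Assumption: the source is a point force with direction/amplitude `force`
    and source time function `stf` (hypothetical simplification).
    """
    n_out = len(rgt_at_source[0][0]) + len(stf) - 1
    seismograms = []
    for i in range(3):  # station component
        trace = [0.0] * n_out
        for j in range(3):  # force component at the source
            contribution = convolve(rgt_at_source[i][j], stf, dt)
            for k, value in enumerate(contribution):
                trace[k] += force[j] * value
        seismograms.append(trace)
    return seismograms
```

In a real workflow the `rgt_at_source` array would be read from the small source-centered subvolume of the archived database; here it is simply passed in as nested lists.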
SF31B-03 INVITED 08:30h
An XML-SEED Format for the Exchange of Synthetic Seismograms
We have demonstrated that we can calculate global theoretical seismograms for realistic 3D Earth models based upon the combination of a precise numerical technique (the spectral-element method) and a sufficiently fast supercomputer (the Earth Simulator) [Tsuboi et al., 2003]. It has now become possible to routinely calculate synthetic seismograms for earthquakes greater than a certain magnitude. We have started to create a synthetic seismogram database by using model S20RTS of the mantle (Ritsema et al., 1999), model CRUST2.0 of the crust (Bassin et al., 2000), and topography and bathymetry model ETOPO5. The calculations are performed on 1944 processors, which require 243 out of 640 nodes of the Earth Simulator. Starting with events in 2003, we select earthquakes with magnitudes greater than 6.5 from the Harvard CMT catalog and calculate theoretical seismograms for the stations in the Global Seismographic Network. To distribute this synthetic seismogram database to the seismological community, the data format of the seismograms becomes an issue because the SEED format, which is currently used for the exchange of broadband seismic data, does not have an entry for the metadata that characterize the theoretical seismograms. To overcome this problem, we extend the SEED format by using the eXtensible Markup Language (XML) and propose this format as a possible standard for the exchange of theoretical seismograms. The representation of SEED volumes in XML has already been explored by the IFREE/JAMSTEC data center. We believe that the use of XML to extend the SEED format to accommodate the exchange of theoretical seismograms is promising because XML is easy to modify and is used as a basic protocol of network tools, such as Web services. We adopt this representation of SEED volumes in XML and extend it to include metadata entries for the theoretical seismogram database.
The entries that we have added to this format uniquely characterize the simulation that was performed to create the synthetics. Currently, each data sample is represented by a character, but it should be possible to include binary formatted data, such as mini-SEED. We will soon distribute these theoretical seismograms through mirrored IFREE/JAMSTEC and Caltech web interfaces.
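To make the idea of simulation metadata carried alongside a trace more concrete, here is a minimal Python sketch that builds such an XML fragment with the standard library. Every element and attribute name (`SyntheticSeismogram`, `Simulation`, `MantleModel`, `Trace`, and so on) is a hypothetical placeholder chosen for illustration; the abstract does not give the actual XML-SEED schema. Only the model names and processor count come from the text above.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def build_synthetic_metadata(samples: Sequence[float]) -> ET.Element:
    """Build an illustrative XML record for one synthetic trace.

    All tag names are hypothetical; they are not the real XML-SEED schema.
    """
    root = ET.Element("SyntheticSeismogram")

    # Metadata describing the simulation that produced the synthetics.
    sim = ET.SubElement(root, "Simulation")
    ET.SubElement(sim, "Method").text = "spectral-element"
    ET.SubElement(sim, "MantleModel").text = "S20RTS"
    ET.SubElement(sim, "CrustModel").text = "CRUST2.0"
    ET.SubElement(sim, "Topography").text = "ETOPO5"
    ET.SubElement(sim, "Processors").text = "1944"

    # Samples stored as character data (whitespace-separated text),
    # an assumed encoding for this sketch.
    trace = ET.SubElement(root, "Trace", {"encoding": "text"})
    trace.text = " ".join(repr(float(s)) for s in samples)
    return root


record = build_synthetic_metadata([0.0, 0.25, -0.5])
xml_text = ET.tostring(record, encoding="unicode")
```

A binary payload such as mini-SEED could replace the text samples, for example as a base64-encoded element, but that choice is also an assumption of this sketch.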
SF31B-04 INVITED 08:45h
Computational Infrastructure for Geodynamics (CIG)
Solid earth geophysicists have a long tradition of writing scientific software to address a wide range of problems. In particular, computer simulations came into wide use in geophysics during the decade after the plate tectonic revolution. Solution schemes and numerical algorithms that were developed in other areas of science, most notably engineering, fluid mechanics, and physics, were adapted with considerable success to geophysics. This software has largely been the product of individual efforts, and although this approach has proven successful, it is now starting to show its limitations as we try to share codes and algorithms or recombine codes in novel ways to produce new science. With funding from the NSF, the US community has embarked on a Computational Infrastructure for Geodynamics (CIG) that will develop, support, and disseminate community-accessible software for the greater geodynamics community from model developers to end-users. The software is being developed for problems involving mantle and core dynamics, crustal and earthquake dynamics, magma migration, seismology, and other related topics. With a high level of community participation, CIG is leveraging state-of-the-art scientific computing into a suite of open-source tools and codes. The infrastructure that we are now starting to develop will consist of: (a) a coordinated effort to develop reusable, well-documented and open-source geodynamics software; (b) the basic building blocks - an infrastructure layer - of software by which state-of-the-art modeling codes can be quickly assembled; (c) extension of existing software frameworks to interlink multiple codes and data through a superstructure layer; (d) strategic partnerships with the larger world of computational science and geoinformatics; and (e) specialized training and workshops for both the geodynamics and broader Earth science communities.
The CIG initiative has already started to leverage and develop long-term strategic partnerships with open source development efforts within the larger thrusts of scientific computing and geoinformatics. These strategic partnerships are essential as the frontier has moved into multi-scale and multi-physics problems in which many investigators now want to use simulation software for data interpretation, data assimilation, and hypothesis testing.
http://geodynamics.org
SF31B-05 INVITED 09:00h
The SCEC TeraShake Earthquake Simulation
The southern portion of the San Andreas fault, between Cajon Creek and Bombay Beach, has not seen a major event since 1690, and has therefore accumulated a slip deficit of 5-6 m. The potential for this portion of the fault to rupture in a single M7.7 event is a major component of seismic hazard in southern California and northern Mexico. TeraShake is a large-scale finite-difference (fourth-order) simulation of such an event based on Olsen's Anelastic Wave Propagation Model (AWM) code, and conducted in the context of the Southern California Earthquake Center Community Modeling Environment (CME). The fault geometry is taken from the 2002 USGS National Hazard Maps. The kinematic slip function is transported and scaled from published inversions for the 2002 Denali (M7.9) earthquake. The three-dimensional crustal structure is the SCEC Community Velocity Model. The 600 km x 300 km x 80 km simulation domain extends from the Ventura Basin and Tehachapi region in the north to Mexicali and Tijuana in the south. It includes all major population centers in southern California, and is modeled at 200 m resolution using a rectangular, 1.8 giganode, 3000 x 1500 x 400 mesh. The simulated duration is 200 seconds, with a temporal resolution of 0.01 seconds and a maximum frequency of 0.5 Hz, for a total of 20,000 time steps. The simulation is planned to run at the San Diego Supercomputer Center (SDSC) on 240 processors of the IBM Power4 DataStar machine. Validation runs conducted at one sixteenth (4D) resolution have shown that this is the optimal configuration in the trade-off between computational and I/O demands. The full run will consume about 18,000 CPU-hours. Each time step produces a 21.6 GByte mesh snapshot of the complete ground-motion velocity vector field. A 4D wavefield containing 2,000 time steps, amounting to 43 Tbytes of data, will be stored at SDSC. Surface data will be archived for every time step for synthetic seismogram engineering analysis, totaling 1 Tbyte.
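The mesh and storage figures quoted above can be reproduced with a few lines of arithmetic. The Python sketch below does so under two assumptions that are not stated in the abstract: each mesh node stores three velocity components, and each value is a 4-byte single-precision float. Sizes use decimal units (1 GByte = 10^9 bytes). The function name `terashake_sizes` is hypothetical.

```python
from typing import Dict


def terashake_sizes() -> Dict[str, int]:
    """Recompute the mesh and output sizes quoted for the TeraShake run."""
    # Domain extent and grid spacing, in metres (from the abstract).
    extent_m = (600_000, 300_000, 80_000)
    spacing_m = 200
    nx, ny, nz = (length // spacing_m for length in extent_m)

    components = 3       # assumption: three velocity components per node
    bytes_per_value = 4  # assumption: single-precision floats

    nodes = nx * ny * nz
    snapshot_bytes = nodes * components * bytes_per_value

    duration_s, dt_s = 200, 0.01
    total_steps = round(duration_s / dt_s)

    saved_steps = 2_000  # volume snapshots kept, from the abstract
    volume_bytes = snapshot_bytes * saved_steps

    # Surface layer only, archived at every time step.
    surface_bytes = nx * ny * components * bytes_per_value * total_steps

    return {
        "nx": nx, "ny": ny, "nz": nz,
        "nodes": nodes,
        "snapshot_bytes": snapshot_bytes,
        "total_steps": total_steps,
        "volume_bytes": volume_bytes,
        "surface_bytes": surface_bytes,
    }


sizes = terashake_sizes()
```

Under these assumptions the snapshot, 4D volume, and surface totals come out at 21.6 GByte, 43.2 Tbyte, and about 1.08 Tbyte, consistent with the rounded values in the text.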
The data will be registered with the SCEC Digital Library supported by the SDSC Storage Resource Broker (SRB). Data collections will be annotated with simulation metadata, which will allow data discovery operations on metadata-based queries. The binary output will be described using HDF5 headers. Each file will be fingerprinted with MD5 checksums to preserve and validate data integrity. Data access, management and data product derivation will be provided through a set of SRB APIs, including Java, C, web service and data grid workflow interfaces. High resolution visualizations of the wave propagation phenomena will be produced from diverse camera views. The surface data will be analyzed online by remote web clients plotting synthetic seismograms. Data mining operations, spectral analysis and data subsetting are planned as future work. The TeraShake simulation project has provided some insights about the cyberinfrastructure needed to advance computational geoscience, which we will discuss.
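The MD5 fingerprinting mentioned above might look like the following Python sketch, which hashes a file in fixed-size chunks so that multi-gigabyte snapshots never need to fit in memory. The helper names `md5_fingerprint` and `verify_fingerprint` and the 1 MiB chunk size are illustrative assumptions; the project's actual tooling is not described in the abstract.

```python
import hashlib
import os
import tempfile


def md5_fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fingerprint(path: str, expected_hex: str) -> bool:
    """Check a file against a previously recorded MD5 digest."""
    return md5_fingerprint(path) == expected_hex.lower()


def self_check() -> bool:
    """Round-trip demo on a small temporary file (hypothetical helper)."""
    payload = b"wavefield snapshot placeholder"
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        recorded = md5_fingerprint(path)
        return (recorded == hashlib.md5(payload).hexdigest()
                and verify_fingerprint(path, recorded))
    finally:
        os.remove(path)
```

The recorded digest would typically be stored with the collection metadata and recomputed after each transfer to detect corruption.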
http://epicenter.usc.edu/cmeportal/index.html
SF31B-06 09:15h
Integrating a Community Modeling Environment in the Geosciences Cyberinfrastructures
The last few decades have seen explosive growth of multidisciplinary data; continuation of such growth is ensured by the EarthScope initiative and other ongoing and future studies. Current efforts of cyberinfrastructure-building are focused on integrating the heterogeneous databases. However, even with easy access to the vast databases, it remains a major challenge for scientists to integrate multidisciplinary, multiscale data with proper computer models to facilitate scientific discovery. Compared with the growth of observational data and the advance of computer hardware, which is fueled by the availability of cheap Linux PC clusters, software for the geosciences has fallen behind and become a major bottleneck on the path of scientific exploration. We propose a community modeling environment that can be fully integrated with database-oriented cyberinfrastructures. This is achievable because many geological processes can be described by similar, even identical, mathematical equations for which various numerical solvers are available; thus the coding of the major part of many numerical models may be automated. We will show a prototype of such a system in a study of crustal deformation in the western US. After the model geometry, rheological structure, and boundary conditions are prescribed, the governing equations for the crustal deformation are solved in a 3D finite element model. The core computer codes are generated automatically for multiple-processor Linux clusters. The system can be integrated in distributed geoscience cyberinfrastructures such as the GEON grid (http://www.geongrid.org), and the models can interact with geophysical databases on the GEON grid for input parameters and model constraints. Whereas the prototype of the code-generating system is still preliminary, it has the potential of fundamentally changing how scientists develop and use computer models in the future.
SF31B-07 09:30h
The International Solid Earth Research Virtual Observatory
We describe the architecture and initial implementation of the International Solid Earth Research Virtual Observatory (iSERVO). This has been prototyped within the USA as SERVOGrid and expansion is planned to Australia, China, Japan and other countries. We base our design on a globally scalable distributed "cyber-infrastructure" or Grid built around a Web Services-based approach consistent with the extended Web Service Interoperability approach. The Solid Earth Science Working Group of NASA has identified several challenges for Earth Science research. In order to investigate these, we need to couple numerical simulation codes and data mining tools to observational data sets. These observational data are now available on-line in internet-accessible forms, and their quantity is expected to grow explosively over the next decade. We architect iSERVO as a loosely federated Grid of Grids with each country involved supporting a national Solid Earth Research Grid. The national Grid Operations, possibly with dedicated control centers, are linked together to support iSERVO where an International Grid control center may eventually be necessary. We address the difficult multi-administrative domain security and ownership issues by exposing capabilities as services for which the risk of abuse is minimized. We support large scale simulations within a single domain using service-hosted tools (mesh generation, data repository and sensor access, GIS, visualization). Simulations typically involve sequential or parallel machines in a single domain supported by cross-continent services. We use Web Services to implement a Service Oriented Architecture (SOA) using WSDL for service description and SOAP for message formats. These are augmented by UDDI, WS-Security, WS-Notification/Eventing and WS-ReliableMessaging in the WS-I+ approach. Support for the latter two capabilities will be available over the next 6 months from the NaradaBrokering messaging system.
We augment these specifications with the powerful portlet architecture using WSRP and JSR168 supported by such portal containers as uPortal, WebSphere, and Apache JetSpeed2. The latter portal aggregates component user interfaces for each iSERVO service allowing flexible customization of the user interface. We exploit the portlets produced by the NSF NMI (Middleware initiative) OGCE activity. iSERVO also uses specifications from the Open Geographical Information Systems (GIS) Consortium (OGC) that defines a number of standards for modeling earth surface feature data and services for interacting with this data. The data models are expressed in the XML-based Geography Markup Language (GML), and the OGC service framework is being adapted to use the Web Service model. The SERVO prototype includes a GIS Grid that currently includes the core WMS and WFS (Map and Feature) services. We will follow the best practice in the Grid and Web Service field and will adapt our technology as appropriate. For example, we expect to support services built on WS-RF when it is finalized and to make use of the database interfaces OGSA-DAI and its WS-I+ versions. Finally, we review advances in Web Service scripting (such as HPSearch) and workflow systems (such as GCF) and their applications to iSERVO.
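For readers unfamiliar with the OGC Web Map Service mentioned above, the following Python sketch assembles a WMS 1.1.1 GetMap request URL from the standard key-value parameters. The endpoint `http://example.org/wms` and the layer name `faults` are placeholders, not actual iSERVO resources, and the function name `wms_getmap_url` is hypothetical. Note that WMS 1.3.0 uses `CRS` instead of `SRS` and may change axis order, which this sketch does not handle.

```python
from typing import Sequence
from urllib.parse import parse_qs, urlencode, urlsplit


def wms_getmap_url(
    endpoint: str,
    layers: Sequence[str],
    bbox: Sequence[float],
    width: int,
    height: int,
    srs: str = "EPSG:4326",
    image_format: str = "image/png",
) -> str:
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (min_x, min_y, max_x, max_y) in the units of `srs`.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty string requests the default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint and layer name, for illustration only.
url = wms_getmap_url(
    "http://example.org/wms",
    layers=["faults"],
    bbox=(-125.0, 32.0, -114.0, 42.0),
    width=800,
    height=600,
)
```

A client would issue an HTTP GET on this URL and receive a rendered map image; a WFS GetFeature request follows the same key-value pattern but returns GML features instead.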
SF31B-08 09:45h
The Role of Grid Computing in the Geosciences: Developing a 3D Seismic Waveform Propagation Tool for Seismologists and EarthScope Research
Advances in the area of information technology (IT) have started to have a significant impact on how geoscientists conduct their daily research activities. Integrated and coordinated resource sharing in the areas of Grid computing, web/grid services, semantic data integration, information management and ontologies, along with national computational grids such as TeraGrid, now provide tremendous opportunities for geoscientists to conduct novel and efficient research in many areas of the geosciences. One of the national scale projects in this area is the GEON Cyberinfrastructure for the Geosciences Project funded by the NSF. As part of GEON's grid computing environment we have started developing a grid-enabled application (SYNSEIS - SYNthetic SEISmogram generation tool) to help seismologists as well as any other researchers calculate synthetic 3D regional seismic waveforms using a well-tested finite-difference code, E3D, developed by the Lawrence Livermore National Laboratory. SYNSEIS is built as a grid application and accesses distributed data centers and large computational clusters, minimizing the requirements needed to conduct such advanced calculations. With SYNSEIS users only need to have access to the Internet and a browser. The entire system is web-based and is accessible from the GEONgrid portal web page (www.geongrid.org). It is built using a service-based architecture and each sub-component in the system is also exposed as a web service, allowing multiple use scenarios for each component if other researchers choose to re-use some of the resources. It provides an interactive user interface with mapping tools and event/station/waveform extraction tools that allow users to seamlessly access IRIS Data Management Center's archives. Though the system currently accesses one 3D crustal model across the US, when more models become available they will be incorporated into the system.
Users are able to interactively set their study region, retrieve seismic event and station locations, extract waveforms on the fly for any selected event-station pair, and compute a synthetic seismogram using built-in tools. As high-performance compute engines, SYNSEIS uses national-scale TeraGrid supercomputer centers, hiding all complexities and difficulties related to account management, CPU allocation, and software installation. The system is designed to be used in day-to-day activities of researchers, especially those of EarthScope scientists who will be accessing data from hundreds of stations every day and need to process the data in a timely fashion.
http://www.geongrid.org