IN33B-1177
Storing, Browsing, Querying, and Sharing Data: the THREDDS Data Repository (TDR)
The Unidata Internet Data Distribution (IDD) network delivers gigabytes of data per day in near real time to sites across the U.S. and beyond. The THREDDS Data Server (TDS) supports public browsing of metadata and data access via OPeNDAP enabled URLs for datasets such as these. With such large quantities of data, sites generally employ a simple data management policy, keeping the data for a relatively short term on the order of hours to perhaps a week or two. In order to save interesting data in longer term storage and make it available for sharing, a user must move the data herself. In this case the user is responsible for determining where space is available, executing the data movement, generating any desired metadata, and setting access control to enable sharing. This task sequence is generally based on execution of a sequence of low level operating system specific commands with significant user involvement. The LEAD (Linked Environments for Atmospheric Discovery) project is building a cyberinfrastructure to support research and education in mesoscale meteorology. LEAD orchestrations require large, robust, and reliable storage with speedy access to stage data and store both intermediate and final results. These requirements suggest storage solutions that involve distributed storage, replication, and interfacing to archival storage systems such as mass storage systems and tape or removable disks. LEAD requirements also include metadata generation and access in order to support querying. In support of both THREDDS and LEAD requirements, Unidata is designing and prototyping the THREDDS Data Repository (TDR), a framework for a modular data repository to support distributed data storage and retrieval using a variety of back end storage media and interchangeable software components. 
The TDR interface will provide high level abstractions for long term storage, controlled, fast and reliable access, and data movement capabilities via a variety of technologies such as OPeNDAP and GridFTP. The modular structure will allow substitution of software components so that both simple and complex storage media can be integrated into the repository. It will also allow integration of different varieties of supporting software. For example, if replication is desired, replica management could be handled via a simple hash table or a complex solution such as the Replica Location Service (RLS). In order to ensure that metadata is available for all the data in the repository, the TDR will also generate THREDDS metadata when necessary. Users will be able to establish levels of access control to their metadata and data. Coupled with a THREDDS Data Server, both browsing via THREDDS catalogs and querying capabilities will be supported. This presentation will describe the motivating factors, current status, and future plans of the TDR. References: IDD: http://www.unidata.ucar.edu/content/software/idd/index.html THREDDS: http://www.unidata.ucar.edu/content/projects/THREDDS/tech/server/ServerStatus.html LEAD: http://lead.ou.edu/ RLS: http://www.isi.edu/~annc/papers/chervenakRLSjournal05.pdf
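The "simple hash table" option for replica management mentioned in the abstract can be illustrated with a short sketch. This is not TDR code: the class name `ReplicaCatalog`, its methods, and the example locations are hypothetical, and the sketch assumes that a replica is identified by a logical dataset name mapped to one or more storage locations.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Hypothetical in-memory replica catalog: logical name -> storage locations."""

    def __init__(self):
        # Each logical dataset name maps to an ordered list of replica locations.
        self._replicas = defaultdict(list)

    def register(self, logical_name, location):
        """Record that a copy of `logical_name` is stored at `location`."""
        if location not in self._replicas[logical_name]:
            self._replicas[logical_name].append(location)

    def unregister(self, logical_name, location):
        """Forget one replica; drop the entry when no copies remain."""
        locations = self._replicas.get(logical_name, [])
        if location in locations:
            locations.remove(location)
        if not locations:
            self._replicas.pop(logical_name, None)

    def lookup(self, logical_name):
        """Return a copy of the list of known locations (empty if none)."""
        return list(self._replicas.get(logical_name, []))


# Usage with made-up names: two copies of one dataset on different hosts.
catalog = ReplicaCatalog()
catalog.register("radar/example.nc", "gridftp://storage-a.example.org/radar/example.nc")
catalog.register("radar/example.nc", "gridftp://storage-b.example.org/radar/example.nc")
print(catalog.lookup("radar/example.nc"))
```

A dictionary keeps lookups cheap and needs no external service, which is presumably the appeal of the simple option; a dedicated service such as RLS would be the choice when the catalog itself must be distributed or persistent.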
IN33B-1178
The Gulf of Maine Ocean Data Partnership: Building a Region-wide Information System from the Bottom Up
The Gulf of Maine Ocean Data Partnership promotes and coordinates the sharing, linking, electronic dissemination, and use of data on the Gulf of Maine. The Partnership was formed through a Memorandum of Agreement in 2004 as a region-wide effort to assist individual institutions in publishing valuable data and information on the physical, biological, chemical and geologic conditions in the Gulf of Maine. The Partnership has grown to include twenty-one member organizations from government agencies, intergovernmental organizations, academic, research, and other nongovernmental and nonprofit entities. The Partnership provides guidance and technical assistance to members, a forum for exchanging ideas and experiences, and in the future, a web-based data portal to be hosted by the Gulf of Maine Ocean Observing System (GoMOOS). Four project areas have been designated as priorities: Data Assurance, Data Discovery, Data Accessibility, and Data Interoperability. Recommendations and protocols are in development for each project area. These are based on accepted protocols where they exist, and on best practices where they do not. Accomplishments to date include a comprehensive inventory of the data management and technology capabilities of participants and the completion of a pilot project on metadata issues. A hands-on Metadata Training Workshop was held in the fall to familiarize participants with preparing compatible metadata for publishing their data collections on the web. The Partnership seeks to advance a truly integrated ocean observing system in the Gulf of Maine. Designed and implemented as a grass-roots organization, it will be an important building block of the nascent Northeast Regional Association that will become a component of the US Coastal Integrated Ocean Observing System (IOOS) and a member of the National Federation of Regional Associations.
IN33B-1179
Use of an Enhanced NetCDF Data Model and Interface for Scientific Data Access and Sharing
NetCDF is already widely used for creating, accessing, distributing, archiving, and sharing data in the geosciences. Independently developed HDF software implements another popular data model, data access libraries, and format for scientific data. Recently, we have developed software that implements an enhanced data model for netCDF using the HDF5 storage layer. The new software provides compatibility with existing netCDF programs and data but also supports the use of more powerful data modeling abstractions that may be used to capture more of the meaning in data. We discuss how the new features are intended to be used, and make recommendations for both data providers and developers who may be considering the use of netCDF for future archives or applications.
IN33B-1180
The LASP Interactive Solar IRradiance Datacenter (LISIRD)
LASP has created an online resource for combined solar irradiance datasets from the SORCE, TIMED, UARS, and SME missions. The LASP Interactive Solar IRradiance Datacenter (LISIRD) not only provides unified access to the individual datasets, but also combines them for ease of use by scientists, educators, and the general public. In particular, LISIRD makes available composite spectra and time series. The TIMED SEE, SORCE SOLSTICE, and SORCE SIM instruments produce spectra that together cover the solar spectrum from 1 to 2700 nm. Through the LISIRD interface, the user can get data that bridge the various missions in both wavelength and time. LISIRD also hosts data products of interest to the space weather community, whose needs differ slightly from those of the atmospheric modelers who are the typical users of irradiance data. For space weather applications, high time cadence and near real-time data delivery are key. For these users, we make our observations available shortly after spacecraft contact, and append the observations to a single data file which they can retrieve using anonymous ftp every few hours. The third component of LISIRD is the set of Solar Physical Radiation Model (SPRM) results of Fontenla et al. It provides a model of current solar activity, the synthetic spectral irradiance, and tools that permit one to model the solar activity source of the spectral irradiance variations.
IN33B-1181
The Dawn Science Database: A Data System for a Cost-Capped Mission
Dawn is a NASA Discovery mission that will orbit the asteroids Vesta and Ceres, following its launch in the summer of 2006. Despite the fact that the Dawn Science Team is distributed across the globe, it must live within the cost-cap constraints imposed on all Discovery missions. One of the many challenges of working within this environment is the development of effective data management, analysis, and archive systems. Even though all NASA missions face these same problems, NASA neither provides an off-the-shelf multi-mission solution nor does it provide a software toolkit to assist new missions with the development of mission specific solutions. The Dawn Science Center approach to this lack of institutional support has been to look to other NASA projects, primarily the Planetary Data System, for tools that can be easily adapted to meet the Dawn requirements. All NASA planetary missions must archive their data sets with the PDS. Building the Dawn internal data management system on the tools and standards of the PDS will facilitate the data archive process and reduce the overall cost to the mission. DITDOS 3 (Distributed Inventory Tracking and Data Ordering System), which is available from the Planetary Plasma Interactions (PPI) Node of the PDS, is a tool that provides a data management solution with web interface capabilities. DITDOS 3 allows developers to generate a searchable file system database containing metadata extracted from PDS labels. Users extract files (data) from the system by constructing queries against the database. Similarly, users upload data into the system by providing the necessary metadata to populate the database and construct PDS labels. Dawn is adapting the underlying DITDOS 3 database to support its mission specific needs. However, not all of the functions required by a science center can be supported with existing PDS technology. 
For example, while PDS labels are well suited to describing archival data, they have not been designed to handle many of the files missions need to manage (sequence products, timelines, schedules, etc.). Dawn is addressing these metadata deficiencies through the use of a local data dictionary. In this paper, we describe the Dawn Science Database (DSDb) architecture and functionality, the DITDOS 3 adaptations required to support the DSDb, and the Dawn specific enhancements to the PDS data dictionary.
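The idea of a searchable database populated with metadata extracted from PDS labels can be sketched in a few lines. This is an illustration only and does not reflect how DITDOS 3 or the DSDb is implemented: the function names, the table layout, and the sample labels are invented, and the parser assumes simple single-line `KEYWORD = VALUE` statements terminated by a bare `END` line. Real PDS labels also contain nested OBJECT blocks and multi-line values, which this sketch ignores.

```python
import sqlite3


def parse_label(text):
    """Parse single-line `KEYWORD = VALUE` statements from label text."""
    metadata = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "END":  # a bare END statement terminates the label
            break
        if "=" not in line:
            continue
        keyword, value = line.split("=", 1)
        metadata[keyword.strip()] = value.strip().strip('"')
    return metadata


def build_index(labels):
    """Load {file_name: label_text} into an in-memory keyword/value table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE metadata (file TEXT, keyword TEXT, value TEXT)")
    for file_name, text in labels.items():
        rows = [(file_name, k, v) for k, v in parse_label(text).items()]
        db.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    return db


def find_files(db, keyword, value):
    """Return the files whose label has `keyword` equal to `value`."""
    cursor = db.execute(
        "SELECT file FROM metadata WHERE keyword = ? AND value = ? ORDER BY file",
        (keyword, value),
    )
    return [row[0] for row in cursor]


# Made-up labels for two hypothetical files.
EXAMPLE_LABELS = {
    "image_0001.img": 'PDS_VERSION_ID = PDS3\nTARGET_NAME = "VESTA"\nEND\n',
    "image_0002.img": 'PDS_VERSION_ID = PDS3\nTARGET_NAME = "CERES"\nEND\n',
}

index = build_index(EXAMPLE_LABELS)
print(find_files(index, "TARGET_NAME", "VESTA"))
```

Storing one row per keyword keeps the table independent of any particular label vocabulary, so locally defined keywords (such as those in a mission data dictionary) could be indexed the same way.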
IN33B-1182
Enabling Data Sharing with the Shore Side Data System (SSDS): Lessons Learned and Future Development
At the Monterey Bay Aquarium Research Institute (MBARI) we have experience in building data systems to meet scientific needs in oceanography. Many of these experiences involve system communication and data interoperability. There are many issues associated with 'systems of systems'. The problem is more tractable within the walls of an institute, but the difficulties grow rapidly when interoperating with external data systems. One MBARI project, the Shore Side Data System (SSDS), has a service-oriented architecture that helped immensely in connecting systems. The SSDS has served as a portion of the data management system for the Center for Integrated Marine Technologies (CIMT) Wind to Whales Program. Software connectivity was greatly enhanced by the service-oriented architecture of SSDS, but we still faced issues surrounding the common vocabularies and ontologies that are necessary for true system interoperability. With the help of the Marine Metadata Initiative (MMI), some promising technologies are being developed to help SSDS bridge the gap between itself and other data systems. This presentation will describe the SSDS and the portions of its architecture that have assisted in connecting some production systems. It will also detail where we are currently experiencing limitations and how projects like MMI are enabling interoperability with external data systems.
IN33B-1183
Facilitating Interdisciplinary Geosciences and Societal Impacts Research and Education via Dynamically Adaptive, Interoperable Data and Forecasting Systems
The problems of monitoring, predicting, and responding to coastal inundation and inland flooding situations are inherently multidisciplinary. Predicting precipitation and streamflow requires expertise in meteorology and hydrology. Oceanography also enters the picture in cases where the severe storm occurs in a coastal area. Appropriate responses to such natural hazards require integration of the infrastructure and demographics data systems associated with the societal impacts community. Building and disseminating a system that will address this problem in a comprehensive and coherent manner can only be done by a team with a broad range of technological and scientific expertise and community connections. Efforts are underway to develop interoperable data systems among the atmospheric science, hydrology, coastal oceans, and societal impacts communities, so they may conveniently and rapidly share data among their systems in cases where hazardous events threaten infrastructure and human health. The basic approach is to build on a dynamically adaptive data access and high resolution, local forecasting system being developed for the LEAD (Linked Environments for Atmospheric Discovery) project. At present, the LEAD technology is confined to local weather forecasts automatically steered by algorithms analyzing data from national forecasts. Efforts are underway, however, to develop an expanded team that would include expertise in coupling atmospheric forecast models with hydrological and storm surge forecast models and, in turn, to coordinate those data systems with those of the GIS (Geographic Information System) community, which contain most of the demographic and infrastructure information related to societal impacts. The paper will provide an update on the status of these efforts and a demonstration of how such a dynamically adaptive forecasting system focused high resolution local forecast model runs on Hurricane Katrina.
IN33B-1184
The THREDDS Data Server and NetCDF Common Data Model
The THREDDS Data Server (TDS) is an open-source, pure Java web application that runs inside a Tomcat web server and provides metadata and data access to scientific datasets. The TDS integrates THREDDS Catalog services, an OPeNDAP data server, an experimental OGC Web Coverage Server, and other web services, using the NetCDF Common Data Model to read and serve scientific data. The Common Data Model (CDM) is an abstract data model that the NetCDF (Unidata), HDF5 (NCSA) and OPeNDAP (University of Rhode Island) developers are working towards. The CDM also adds "Georeferencing Coordinate Systems" and specialized "Scientific Data Type" layers, which provide the semantics needed to convert datasets to other protocols and formats such as those required by GIS systems. This talk will give an overview of TDS and CDM functionality, and report on present status and future plans.
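To make the OPeNDAP access path concrete, the sketch below builds request URLs in the DAP2 style, where a response type suffix (`.dds`, `.das`, `.dods`, or `.ascii`) is appended to the dataset URL and an optional constraint expression selects a variable and an index range written as `[start:stride:stop]`. The helper names, the server address, and the variable name are hypothetical, and nothing here is taken from the TDS code base.

```python
from urllib.parse import quote

# Response suffixes commonly used with DAP2-style OPeNDAP servers.
RESPONSES = ("dds", "das", "dods", "ascii")


def hyperslab(start, stop, stride=1):
    """Format one array index range as [start:stride:stop]."""
    return f"[{start}:{stride}:{stop}]"


def opendap_url(dataset_url, response, variable=None, slabs=()):
    """Build a request URL for one response type and an optional subset.

    `slabs` is a sequence of (start, stop) or (start, stop, stride) tuples,
    one per array dimension of `variable`.
    """
    if response not in RESPONSES:
        raise ValueError(f"unknown response type: {response}")
    url = f"{dataset_url}.{response}"
    if variable:
        constraint = variable + "".join(hyperslab(*s) for s in slabs)
        # Keep the bracket and colon syntax readable; escape anything unusual.
        url += "?" + quote(constraint, safe="[]:,")
    return url


# Hypothetical dataset and variable names, for illustration only.
base = "http://server.example.org/thredds/dodsC/test/example.nc"
print(opendap_url(base, "dds"))
print(opendap_url(base, "ascii", "temperature", [(0, 9), (0, 0)]))
```

Requesting the `.dds` response first is a common way to learn a dataset's variables and dimensions before asking for a subset of the data.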
IN33B-1185
The USGODAE Monterey Data Server
The USGODAE Monterey Data Server (http://www.usgodae.org/) has been established at the Fleet Numerical Meteorology and Oceanography Center (FNMOC) as an explicit U.S. contribution to GODAE. The server is operated with oversight and funding from the Office of Naval Research (ONR). Support of the GODAE Monterey Data Server is accomplished by a cooperative effort between FNMOC and NOAA's Pacific Marine Environmental Laboratory (PMEL) in the on-going development of the GODAE server and the support of a collaborative network of GODAE assimilation groups. This server hosts near real-time in-situ oceanographic data available from the Global Telecommunications System (GTS) and other FTP sites, atmospheric forcing fields suitable for driving ocean models, and unique GODAE data sets, including demonstration ocean model products. It supports GODAE participants, as well as the broader oceanographic research community, and is becoming a significant node in the international GODAE program. GODAE is envisioned as a global system of observations, communications, modeling and assimilation, which will deliver regular, comprehensive information on the state of the oceans in a way that will promote and engender wide utility and availability of this resource for maximum benefit to society. It aims to make ocean monitoring and prediction a routine activity in a manner similar to weather forecasting. GODAE will contribute to an information system for the global ocean that will serve interests from climate and climate change to ship routing and fisheries. The USGODAE Server is developed and operated as a prototypical node for this global information system. Presenting data with a consistent interface and ensuring its availability in the maximum number of standard formats is one of the primary challenges in hosting the many diverse formats and broad range of data used by the GODAE community. To this end, all USGODAE data sets are available in their original format via HTTP and FTP. 
In addition, USGODAE data are served using Local Data Manager (LDM), THREDDS cataloging, OPeNDAP, and GODAE Live Access Server (LAS) from PMEL. Every effort is made to serve USGODAE data through the standards specified by the National Virtual Ocean Data System (NVODS) and the Integrated Ocean Observing System Data Management and Communications (IOOS/DMAC) specifications. USGODAE serves FNMOC GRIB files from the Navy Operational Global Atmospheric Prediction System (NOGAPS) and the Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS) as OPeNDAP data sets using the GrADS Data Server (GDS). The server also provides several FNMOC custom IEEE binary format high resolution ocean analysis products and model outputs through GDS. These data sets are also made available through LAS. The Server functions as one of two Argo Global Data Assembly Centers (GDACs), hosting the complete collection of quality-controlled Argo temperature/salinity profiling float data. The Argo collection includes all available Delayed-Mode (scientific quality controlled and corrected) data. USGODAE Argo data are served through OPeNDAP and LAS, which provide complete integration of the Argo data set into NVODS and the IOOS/DMAC. By providing researchers flexible, easy access to data through standard Internet and oceanographic interfaces, the USGODAE Monterey Data Server has become an invaluable resource for oceanographic research. Also, by promoting the community data serving projects, USGODAE strengthens the community and helps to advance the data serving standards.
IN33B-1186
MODster: Namespaces and Redirection for Earth Science Data
MODster is a distributed, decentralized inventory server for Earth science data granules (standard units of data content and structure). MODster connects data granule users (people who know which specific granule they want, but who don't know who has it or how to get it) with data granule providers (people or institutions that keep granules accessible online).
* If you're a provider, you can tell MODster which granules you have and where they live (i.e., their URLs).
* If you're a user, you can ask MODster for a granule, and it will transparently redirect your request to whoever has it.
The key to making this work is a standard granule namespace. A granule namespace is a naming convention that associates particular names with particular granules, regardless of where those granules live. Different Earth science data products have their own granule namespaces. For example, in the MODIS granule namespace, the granule name "MOD43A2.A1998365.h5.v8.001.1999001090020.hdf" always refers to version 1 of the 5th horizontal and 8th vertical tile of the Level 3 16-day Bi-directional Reflectance Distribution Function product, acquired by the MODIS Terra sensor on 31 December 1998 and generated on 01 January 1999 at 9:00:20 AM. A MODster URL is simply a standard way of referring to a data product namespace and one of its granules. MODster URLs have the general form "http://server/namespace/granule" where "granule" is a granule name that conforms to a granule namespace, "namespace" is a MODster namespace, which is the name of a granule namespace whose conventions are known to MODster, and "server" is a MODster server, which is an HTTP server that can redirect namespace/granule requests to granule providers.
A MODster URL with no granule component gets a description of the MODster namespace, its authority (the persons or institutions responsible for documenting and maintaining the naming convention), and also any services for that MODster namespace that the MODster server supports. Our current MODster implementation allows granule providers to explicitly register their granules, and can also crawl provider sites looking for granules whose names match specific rules or regular expressions.
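The URL form "http://server/namespace/granule" and the register-then-redirect behavior described in the abstract can be sketched as follows. This is a minimal illustration and not the MODster implementation: the names `split_modster_url` and `GranuleRegistry`, the host names, and the provider URL are all hypothetical, and the sketch only computes the redirect target rather than serving HTTP.

```python
from urllib.parse import urlsplit


def split_modster_url(url):
    """Split 'http://server/namespace/granule' into its three components.

    The granule component is None when the URL names only a namespace.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if not parts.netloc or not 1 <= len(segments) <= 2:
        raise ValueError(f"not a namespace/granule URL: {url}")
    namespace = segments[0]
    granule = segments[1] if len(segments) == 2 else None
    return parts.netloc, namespace, granule


class GranuleRegistry:
    """Hypothetical inventory: (namespace, granule name) -> provider URL."""

    def __init__(self):
        self._locations = {}

    def register(self, namespace, granule, provider_url):
        """Called on behalf of a provider to announce where a granule lives."""
        self._locations[(namespace, granule)] = provider_url

    def resolve(self, url):
        """Return the provider URL to redirect to, or None if unknown.

        A namespace-only URL has no redirect target; a real server would
        answer it with a description of the namespace instead.
        """
        _, namespace, granule = split_modster_url(url)
        if granule is None:
            return None
        return self._locations.get((namespace, granule))


# Usage with the granule name quoted in the abstract and made-up hosts.
name = "MOD43A2.A1998365.h5.v8.001.1999001090020.hdf"
registry = GranuleRegistry()
registry.register("MODIS", name, "http://provider.example.org/data/" + name)
print(registry.resolve("http://modster.example.org/MODIS/" + name))
```

Because users address granules by namespace and name only, a provider can move its files and re-register them without breaking any URL that users already hold.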
IN33B-1187
Leveraging Industry Standards for GeoSpatial Portal Development
Mainstream IT data sharing techniques are advancing rapidly through standards such as the World Wide Web Consortium (W3C) Extensible Markup Language (XML), Simple Object Access Protocol (SOAP) based web services, and the Java Community Process (JCP) driven portlet technology (JSR 168). These advances, together with the wide adoption of Open Geospatial Consortium (OGC) GIS web service specifications (WMS, WFS, WCS, WMC, CS-W, etc.), are intersecting within commercial GIS technologies. For example, the next generation GIS Portal technology for the U.S. Government's Geospatial One-Stop has been developed to help establish an industrial strength geospatial portal that can be used as the primary U.S. Government coordinating portal for geospatial related activities. In addition to these technologies providing common, highly interoperable portals, heavier desktop and server applications are further integrating technologies that will enable the scientific communities to link into these mainstream information portals. As an example, we will discuss the incorporation of the open source scripting language Python into the commercial GIS platform, both on the desktop and on the server. Users have already developed Python code that gives the GIS user access to large repositories of scientific multidimensional data via the OPeNDAP protocol, so that the data can be incorporated into GIS analysis and workflows. Additional development in support of NetCDF and, in the future, other scientific data formats will expand the use of such formats within the GIS community. This presentation will provide an overview and demonstrations of these technologies and how they are relevant to the Earth and Space Science Informatics Community.
IN33B-1188
The Unidata Local Data Manager (LDM) at age 11
Since its initial release in 1994, the Unidata Local Data Manager (LDM) has been adopted by numerous entities and agencies to distribute and process data in near real-time via the Internet. Users of the LDM include universities engaged in geoscience research and education in the US, Canada, Costa Rica, Brazil, Argentina, and Italy. Non-educational entities include NOAA/NWS, NASA, USGS, the US Army Corps of Engineers, the US Navy, and governmental entities in Brazil, South Korea, Vietnam, and Taiwan. The LDM is the top user of Internet2 in the "Advanced Applications" category. This paper describes the current LDM release. Its structure, behavior, and performance are presented as well as current usage, lessons learned, and future development plans.