Safeguarding Data

AGU seeks your input on a new draft of a position statement addressing safeguarding data to address global challenges for our future. Your feedback will be considered in revisions of this statement. Submit your comments by 30 April.

Safeguarding data to address global challenges for our future

Achieving accessibility, transparency and reproducibility in research

Humanity urgently needs to address serious and complex global challenges including sustainability, ecology, biodiversity, water and food security, and climate. Trustworthy and reproducible science requires that the data behind an assertion, including their provenance, be accessible to evaluate and build upon. International collaboration and the open sharing of data are essential for addressing these challenges and promoting new scientific advances. While great strides have already been made regarding data sharing platforms and data and metadata standards, work remains to facilitate the careful collection, use, and stewardship of data, and to reward those contributions, across the Earth and space sciences.

This statement takes the broadest interpretation of the term “data” according to the Beijing Declaration on Research Data: data can be collected, generated or compiled by humans or machines and include (but are not limited to) metadata, samples, methods, software and algorithms. Infrastructure includes hardware, intangible assets such as software, and human capital.

Empowering scholarship

To achieve an ecosystem where data are consistently collected, well described, shared, preserved and reused, there needs to be significant, continuous evolution in (i) equitable access to data and infrastructure, (ii) research culture and the scientific reward system, and (iii) standards for describing and documenting data for multidisciplinary research.

Ensuring equitable access to trusted and reusable data

Researchers have a responsibility to collect, document and share data in an ethical manner that is as open and transparent as possible. Persistently funded open research infrastructure must facilitate ease of deposit and long-term reuse, both to support the management of data through the research process and to ensure the data are FAIR (Findable, Accessible, Interoperable, and Reusable) for people and machines. Given the challenges associated with data related to national security, intellectual property concerns and cultural sensitivities, this statement takes the position of ‘as open as possible, as closed as necessary’, in alignment with the UNESCO Recommendation on Open Science. Future data ecosystems could implement federated infrastructure to ensure equitable access without compromising national security, privacy, and other concerns. Acknowledging the context and provenance of data is highly important. It should be done with respect for nature and people (according to the CARE Principles for Indigenous Data Governance), and it builds the foundation for trustworthy artificial intelligence (AI).

Changing research culture to prize data contributions

Researcher behaviour is strongly influenced by the cultures of institutions, publishers, and funders. Thus, it is critical that these entities recognize and reward contributions by people serving different roles across the life cycle of data. Robust data stewardship also requires cooperation between individual researchers, scientific facility teams, disciplinary communities, and repository personnel. In addition, researcher training should incorporate a curriculum on best practices for data stewardship, and disciplinary communities and domain repositories should prioritise and reward efforts to develop, share, and adopt these best practices. Dedicated data managers/curators are integral to building synergies among these many players and need to be appropriately funded and recognized.

Implementing metadata standards across disciplines

In our interdisciplinary and diverse research community, data allow researchers from different domains to communicate concretely about specific measurements and analyses. However, which salient features of data are captured, and how they are described, often differ by domain, creating challenges to collaboration and data reuse. Additionally, describing and accounting for the specific contexts (e.g. terminologies, uncertainty, sampling biases) in data is especially important for responsible use of automated data-driven analysis and artificial intelligence. It is thus paramount that our community continuously improve data documentation and adopt standards for transparent, machine-readable and understandable metadata and data properties.

Taking collective responsibility

Robust stewardship, preservation and sharing of data require individual actions by all players in the global research ecosystem: researchers, institutions, funders, publishers, and research infrastructure stewards. Infrastructure supporting the complete data life cycle should be globally distributed across public, private, commercial and not-for-profit entities. Limiting infrastructure within national boundaries restricts the development of science addressing global challenges. All entities forming this ecosystem have a responsibility to ensure their infrastructure not only facilitates equitable and sustained ease of use for both data depositors and users, but is also interoperable and focused on collective reuse of data.

In summary:

  • Researchers should engage with research infrastructure and repositories to ensure their data are FAIR and data are developed/used responsibly
  • Research infrastructure can be key partners for researchers in making their data FAIR and should ensure metadata capture is robust, standardized, and comprehensive (including widespread usage of persistent identifiers for researchers, samples, datasets, and software, for instance)
  • Publishers and journals should require that data (as defined broadly at the beginning of this statement) be archived in federated and recognized (domain) repositories and be available on publication of the work, while mitigating concerns of privacy, national security and data sovereignty
  • Institutions should recognise, reward and incentivize the work associated with responsible data sharing and stewardship
  • Funders should recognise data as a primary scholarly output and include the requirement for responsible and FAIR data sharing in all funding agreements, with consequences for non-compliance

Position Statement Guidelines
Before submitting feedback, please review our guidelines for writing comments on position statement drafts.

Join the Conversation

Please enter constructive feedback on this position statement. Your comments will be reviewed and added to the public comment section below.

Public Comments
26 April 2024
Valuing data contributions needs to extend to the generation of data as well. Often, there is little support for or recognition of the basic research that generates the data. Likewise, the responsibility needs to extend to the quality of data. Databases need to have a way to easily submit corrections and flag anomalies, and there needs to be adequate curation to deal with those problems.
2 April 2024
Thanks for the work. For nearly 30 years, AGU has framed its leading data position statement as “Earth science data are a world heritage.” This concept was introduced in the very first data position statement in 1997 (which stated that “Earth and space data are a national, and in many cases, an international resource”; https://www.codata.info/data_access/policies.html#AGU), and the next position statement included the “World Heritage” framing. This has been an impactful statement and has been used (I can personally attest) in discussing and arguing for open science across federal agencies, National Academies, internationally and with other societies and publishers: AGU was providing a leading example of the need for open data, for science and the public (both). This framing provides that ALL data should be open and available and curated using leading practices. This draft, unfortunately imho, reframes the need for open Earth science data primarily in terms of grand challenges and trust. These are important but definitely narrow the larger justifications.

The need for high quality open Earth science data extends well beyond these issues. The Earth (and solar system, etc.) are noisy and complex; understanding them and their various processes and history requires data across space and time. Collectively, individual observations, data, and models integrated in aggregate have led to our understanding of evolution (the fossil and tectonic records), Earth’s magnetic field and history, the interior structure of the Earth, its geochemical history and on and on. Many of these data are not directly related to “Grand challenges and sustainability” as usually understood, in that the primary need for quality open data is to advance science and build this understanding (in addition to trust in the conclusions). Collectively many of these data and integrated knowledge have provided huge (HUGE) economic and other benefits, related to health and medicine, weather prediction, energy, mineral resources, hazard mitigation, navigation (GPS), and more (see https://eos.org/editors-vox/earth-and-space-science-for-the-benefit-of-humanity for many examples). Going forward, diverse high-quality data of all sorts will be needed for growing AI/ML and other applications, some of which are connected to grand challenges directly, but not all (e.g., see https://doi.org/10.17226/26532, doi: 10.22541/essoar.168132856.66485758/v1, and doi: 10.1038/d41586-023-03316-8 for discussion).

Would thus suggest keeping the broader framing and also (rather than instead) emphasizing the importance of diverse ESS data for grand challenges, sustainability, and many other societal needs (really already resulting in $trillion benefits). Indeed, this is an opportunity to be more explicit about the diverse benefits (understanding the Earth, integration with other data, advancing science, benefiting humanity, and trust), and also to call more specifically for addressing the largest risk: lack of support for quality data curation (infrastructure and culture).

thanks for listening.