Data & Software for Authors

What is Needed?

AGU requires that the underlying data needed to understand, evaluate, and build upon the reported research be available at the time of peer review and publication. Additionally, authors should make available software that has a significant impact on the research. This entails:

  1. Depositing the data and software in a trusted repository, as appropriate, and preferably with a DOI
  2. Including an Availability Statement as a separate paragraph in the Open Research section explaining to the reader where and how to access the data and software
  3. Including citation(s) to the deposited data and software in the References section.

Most of your questions regarding data and software should be answered by the resources below. If you still have questions, contact [email protected].

What Data Needs to be Available?

Primary and processed data used for your research should be preserved and made available. Generally, the underlying data are considered to be the types of data usually preserved in domain repositories for each discipline. These may include raw data, but are usually the processed or refined data that support and lead to the described results and allow other readers to assess your conclusions and build on your work.

In your paper, cite these data, as well as any data you used from other sources, and include information about access to the data in the Availability Statement. For model or simulation data, follow journal-specific guidance on prioritizing preserved output; in general, availability of software is most important.

Very large data (greater than 1 terabyte or TB) can be a challenge to preserve, as there are often fees and additional resources required. One option to consider is your institution, as institutions often offer solutions for data preservation and compliance. Again, refer to the journal-specific guidance for more information or email [email protected].

Repository Selection

The data that support the research reported in your paper must be deposited in a trusted repository. When identifying the most appropriate repositories for your data, first refer to the journal-specific data & software guidance below. We recommend a repository that specializes in the data for your scientific domain, as this will maximize the probability that the deposited data will be findable, accessible, interoperable, and reusable (FAIR). Otherwise, look to your institutional repository, your computing center, or a general repository. Please note that, to be in compliance, the repository you select must offer a landing page in English so that it is accessible to the wider community.

Note: Starting March 2021, AGU authors funded by the U.S. NSF will have their data publication fees waived when using the Dryad repository. Learn more about the AGU-Dryad partnership.

Availability Statement

An Availability Statement, located in the Open Research section of the paper, contains information about your data, software, and other research objects (e.g., notebooks) and how readers can access them. The Statement should include:

  1. A brief description of the type(s) of data or software
  2. Repository Name(s) where they are deposited
  3. Version (of software)
  4. DOI, Persistent Identifier Link to Data or Software (and Identifier)
  5. Link to publicly accessible development platform (in the case of Software, e.g. GitHub)
  6. Access Conditions (e.g. if Registration is Required)
  7. Licensing/Permissions (e.g. Creative Commons Attribution)
  8. In-text citation in References (optional)

When developing the Availability Statement, consider how best to direct the reader/reviewer to your data (or software). For instance, do not simply provide a web link to the homepage of the repository. Directly link to the data (or software) or provide information/guidance necessary to get to the data (or software) efficiently. 

Check to see if the repository or data/software source has an “Acknowledgements” or “How to Cite” page to follow when putting together your Availability Statement and citation in the References section. 

Do not share data via an FTP location. It is not sufficient to include data in the supplementary information of your paper or to write that data will be available upon request. For data that are not yet available at submission, authors should describe in the Availability Statement where the data will be shared, and can share information within the supplementary information for peer review purposes only.
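The cautions above can be turned into a quick self-check before submission. The sketch below is only an illustration: the function name `availability_issues` and its three heuristics are assumptions made for this example, not an AGU tool, and passing the check does not imply compliance.

```python
import re


def availability_issues(statement: str) -> list:
    """Return heuristic warnings for a draft Availability Statement.

    Hypothetical helper: the patterns below are rough illustrations of the
    cautions in this guidance, not an official validation.
    """
    issues = []
    # Data should not be shared via an FTP location.
    if re.search(r"\bftp://", statement, re.IGNORECASE):
        issues.append("links to an FTP location")
    # "Available upon request" is not sufficient.
    if re.search(r"\b(up)?on\s+(reasonable\s+)?request\b", statement, re.IGNORECASE):
        issues.append("offers data only upon request")
    # Readers need a direct web link such as a DOI URL.
    if not re.search(r"https?://\S+", statement):
        issues.append("has no direct link or persistent identifier")
    return issues


draft = "Data are available upon request from ftp://example.org/data"
for issue in availability_issues(draft):
    print("Warning: statement", issue)
```

A statement that links directly to a deposited dataset through its DOI produces no warnings from this sketch, although an editor may still ask for more detail.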

Availability Statement Templates:

  • The [type of data] data used for [brief context, description] in the study are available at [repository, source name] via [DOI, persistent identifier link] with [license, access conditions] [optional in-text citation in References]
  • [Version number] of the [software name] used for [brief context, description of what the software was used for] is preserved at [DOI, persistent identifier link], available via [license type, access conditions] and developed openly at [software development platform link].* [optional in-text citation in References]

* For notebooks (e.g. Jupyter), also include a link to where the notebook can be run or executed via a zero-install environment in the cloud (e.g. Binder). See AGU’s Guidance for Authors - Jupyter Notebooks (or most recent version via GitHub) for more information.
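As a rough illustration of how the data template above might be filled in, the following sketch assembles a statement from its bracketed fields. The class name, field names, and all example values (including the repository name and DOI) are placeholders invented for this example, not real deposits.

```python
from dataclasses import dataclass


@dataclass
class DataAvailability:
    """Fields of the data template; all names here are illustrative."""
    data_type: str        # [type of data]
    context: str          # [brief context, description]
    repository: str       # [repository, source name]
    identifier_link: str  # [DOI, persistent identifier link]
    access: str           # [license, access conditions]
    citation: str = ""    # [optional in-text citation in References]

    def statement(self) -> str:
        text = (
            f"The {self.data_type} data used for {self.context} in the study "
            f"are available at {self.repository} via {self.identifier_link} "
            f"with {self.access}"
        )
        if self.citation:
            text += f" ({self.citation})"
        return text + "."


# Placeholder values only; substitute the details of your own deposit.
example = DataAvailability(
    data_type="ocean temperature",
    context="the model evaluation",
    repository="Example Repository",
    identifier_link="https://doi.org/10.0000/example",
    access="a Creative Commons Attribution license",
    citation="Author, 2021",
)
print(example.statement())
```

Keeping each field separate makes it easy to see at a glance whether any element of the template is still missing.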

The Methodology section of your paper should also describe how your data/software pertains to your research.

Data & Software Citation

Please include in your References/Bibliography section a formal citation for the data/software described in the Availability Statement. Doing so will provide citation credit for the data/software. Additionally, please cite data and software created by others that you used in your research, to ensure proper credit for that work. If the data or software is described in a separate data or software paper, please include both that paper and the deposited data or software as separate citations. Citations should include:

  1. Author(s) or project name(s)
  2. Title / Software name
  3. Repository name / Publication venue
  4. Data release (version)
  5. Date / Software published
  6. DOI, persistent identifier, URL
  7. Major software release version (optional)
  8. Date when data was accessed, when using dynamic datasets (optional).

For more information on citations, reference the Journal Specific Guidance.

Data Citation Examples:

  • Fiechter, J., & Cheresh, J. (2020). Physical and biogeochemical drivers of alongshore pH and oxygen variability in the California Current System (Version 5). Dryad. https://doi.org/10.7291/D1D96Q
  • Edmunds, P. J., Didden, C., & Frank, K. (2021). Mean percentage cover of corals and Porites astreoides at each site by year at St. John, VI from 1992 to 2019 (Version 1). Biological and Chemical Oceanography Data Management Office (BCO-DMO). https://doi.org/10.26008/1912/BCO-DMO.843284.1
  • Alwarda, R., & Smith, I. (2021). Elevation data for Reflectors within the CO2 Deposit in Planum Australe, Mars. Zenodo. https://doi.org/10.5281/ZENODO.4639669
  • Gries, C., Downs, R. R., O’Brien, M., Parr, C., Duerr, R., Koskela, R., et al. (2019). Return on Investment Metrics for Data Repositories in Earth and Environmental Sciences [Data set]. Environmental Data Initiative. https://doi.org/10.6073/PASTA/D49BEC63F51603512EFA7E0FD2717203
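The examples above share a common pattern: authors, year, title, optional version, repository, and identifier. A minimal sketch of that pattern follows; the function name and parameters are assumptions made for illustration, and the journal-specific guidance remains the authority on formatting.

```python
def format_data_citation(authors, year, title, repository, identifier, version=None):
    """Assemble a data citation in the pattern of the examples above.

    Illustrative helper only; it does not cover every citation element
    (for example, access dates for dynamic datasets).
    """
    citation = f"{authors} ({year}). {title}"
    if version is not None:
        citation += f" (Version {version})"
    return f"{citation}. {repository}. {identifier}"


# Reproduces the first example in the list above.
print(format_data_citation(
    authors="Fiechter, J., & Cheresh, J.",
    year=2020,
    title=("Physical and biogeochemical drivers of alongshore pH and oxygen "
           "variability in the California Current System"),
    repository="Dryad",
    identifier="https://doi.org/10.7291/D1D96Q",
    version=5,
))
```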

Software Citation Examples:

For more information on citation examples, reference the Journal Specific Guidance. Note: See Enhanced Software Citation Support now available.

Citation Formatter

Need help with formatting your data and software citations? Try the DOI Citation Formatter and select the Formatting Style “american-geophysical-union” and “en-US” for Language and Country. Note: The Formatter resolves and negotiates DOIs from DataCite, Crossref, and mEDRA from the full list of DOI registration agencies.
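Formatted citations can also be requested programmatically through DOI content negotiation, which uses the `text/x-bibliography` media type with `style` and `locale` parameters. The sketch below only builds such a request and does not send it; the function name is hypothetical, and whether a particular style is returned for a given DOI depends on the registration agency, so treat this as an assumption to verify against the Formatter's own output.

```python
from urllib.request import Request


def citation_request(doi, style="american-geophysical-union", locale="en-US"):
    """Build (but do not send) a content-negotiation request for a DOI.

    Hypothetical helper; the style and locale defaults mirror the Formatter
    settings named above.
    """
    accept = f"text/x-bibliography; style={style}; locale={locale}"
    return Request(f"https://doi.org/{doi}", headers={"Accept": accept})


# DOI of the guidance document cited later on this page.
req = citation_request("10.5281/zenodo.5124741")
print(req.full_url)
print(req.get_header("Accept"))
```

Sending the request (for example with `urllib.request.urlopen(req)`) requires network access and should return the formatted citation as plain text when the style is supported.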

Models & Simulations

For research involving models and simulations, refer to the community guidelines regarding what specifically must be made available and cited.  Otherwise, in all cases, the model and configuration information must be made available.

When the primary data for the research comes from models & simulations, follow these guidelines: 

  1. Citation of the model (most important). 

    1. BEST OPTION (model in repository): Cite the model using a repository that registers the version used for the paper with a persistent identifier (e.g., Digital Object Identifier) and metadata that describes the model using community standards. If a published paper has the complete description, please cite that also. Your citation should accurately capture the authors/creators of the model.

    2. GOOD OPTION (model described in paper): Cite the publication where the model is described with information about the version used for this paper.

  2. Description of the model.

    1. Include a description of the model in the text of the paper that is adequate to support reproducibility. If a publication describes the model thoroughly, cite that paper.

  3. Information about the configuration/parameters used to run the model.

    1. Include this information in the paper text and provide any script/workflow used. The script/workflow should be preserved in a repository and cited. Any forcing datasets used should be described and cited.

  4. Data that Supports the Summary Results, Tables and Figures.

    1. BEST OPTION: Cite a package in an appropriate repository that includes scripts/workflows, provenance information, and summary files that support the research, figures and tables, consistent with archives maintained for transparency and traceability by assessments such as the IPCC.

    2. GOOD OPTION: Cite files (e.g., scripts, descriptive detail) in an appropriate repository that support evaluating the research and provide the details behind the tables and figures. 

    3. ACCEPTABLE OPTION: Provide the necessary information for transparency and traceability of the analysis using your community standards or guidance. 

  5. Model Output Data (optional)

    1. If certain model output data are instrumental to evaluating the research, then deposit these in a trusted repository. There are currently limited resources for preserving files of very large size. It may be necessary to select representative output from one or a few model runs, as recommended by a specific community.

If the model or software is not available because of the sensitivity of the research or proprietary concerns, then provide as much information as possible to support evaluation of the research and responsibility. Acceptance in such cases is at the discretion of the editors. Papers where the primary results depend on proprietary scripts that are not available will usually not be allowed.

Data and Software Sharing Guidance for Authors Submitting to AGU journals

AGU editors, staff, and community members have developed Data and Software Guidance for Authors Submitting to AGU Journals. Sections in the guidance are available below for quick reference:


Fox, Peter, Erdmann, Chris, Stall, Shelley, Griffies, Stephen M., Beal, Lisa M., Pinardi, Nadia, Hanson, Brooks, Friedrichs, Marjorie A. M., Feakins, Sarah, Bracco, Annalisa, Pirenne, Benoît, & Legg, Sonya. (2021). Data and Software Sharing Guidance for Authors Submitting to AGU Journals. Zenodo. https://doi.org/10.5281/zenodo.5124741


AGU editors, staff, and community members have also provided a list of Domain-Discipline Repositories Useful to AGU Journals. The following journals are in the process of moving to the list: Solid Earth, AGU Advances, Earth's Future, Geophysical Research Letters (GRL), Global Biogeochemical Cycles, Paleoceanography and Paleoclimatology, Radio Science, Space Physics, Space Weather.

International Geo Sample Numbers

AGU recommends the use of IGSNs (International Geo Sample Numbers) for citing samples reported in research papers. The IGSN provides a unique identifier that allows samples to be linked across publications and searched through a central metadata repository. We strongly encourage authors to register samples with an IGSN Allocating Agent, obtain IGSNs, and use them throughout their manuscript, tables, and archived data sets. We recognize IGSNs during our production process and will provide links in the manuscript and tables to the registered sample descriptions. IGSNs can be reserved before field seasons or assigned afterwards. For more information, see http://www.igsn.org.

Contact & Resources

If you have questions about how to comply with AGU data and software requirements for your manuscript, please contact us at [email protected].

For resources and further reading on the topics covered in this guidance, visit the Data Leadership page.