Improper data practices erode the quality of global ecological databases and impede the progress of ecological research

Glob Chang Biol. 2024 Jan;30(1):e17116. doi: 10.1111/gcb.17116.

Abstract

The scientific community has entered an era of big data. However, with big data comes big responsibilities, and best practices for how data are contributed to databases have not kept pace with the collection, aggregation, and analysis of big data. Here, we rigorously assess the quality of data for specific leaf area (SLA) available within the largest and most frequently used global plant trait database, the TRY Plant Trait Database, exploring how much of the data were applicable (i.e., original, representative, logical, and comparable) and traceable (i.e., published, cited, and consistent). Over three-quarters of the SLA data in TRY lacked either applicability or traceability, leaving only 22.9% of the original data usable compared with the 64.9% typically deemed usable by standard data cleaning protocols. The remaining usable data differed markedly from the original for many species, which led to altered interpretation of ecological analyses. Though the data we consider here make up only 4.5% of SLA data within TRY, similar issues of applicability and traceability likely apply to SLA data for other species as well as other commonly measured, uploaded, and downloaded plant traits. We end with suggested steps forward for global ecological databases, including suggestions for both uploaders to and curators of databases, with the hope that, through addressing the issues raised here, we can increase data quality and integrity within the ecological community.
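
The screening described above treats a record as usable only if it meets all four applicability criteria and all three traceability criteria. A minimal sketch of that logic follows. It is not the authors' protocol or any TRY tooling: the `SlaRecord` class, its field names, and the four example records are hypothetical and made up purely for illustration.

```python
from dataclasses import dataclass

# Criteria named in the abstract; the flag names are assumptions for this sketch.
APPLICABLE = ("original", "representative", "logical", "comparable")
TRACEABLE = ("published", "cited", "consistent")


@dataclass
class SlaRecord:
    """Hypothetical SLA record with one boolean flag per criterion."""
    species: str
    sla: float  # made-up value, not taken from TRY
    original: bool = True
    representative: bool = True
    logical: bool = True
    comparable: bool = True
    published: bool = True
    cited: bool = True
    consistent: bool = True


def is_usable(record: SlaRecord) -> bool:
    """A record is usable only if every applicability and traceability flag holds."""
    return all(getattr(record, name) for name in APPLICABLE + TRACEABLE)


def usable_fraction(records: list[SlaRecord]) -> float:
    """Share of records that pass all seven criteria (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(is_usable(r) for r in records) / len(records)


# Toy records, invented for illustration only.
records = [
    SlaRecord("Species A", 12.3),
    SlaRecord("Species B", 25.1, original=False),  # e.g. a duplicated entry
    SlaRecord("Species C", 18.7, cited=False),     # e.g. no source citation
    SlaRecord("Species D", 9.4),
]

print(f"usable: {usable_fraction(records):.1%}")
```

Requiring every flag to hold, rather than scoring records on a sliding scale, mirrors the abstract's framing that data lacking either applicability or traceability are excluded.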

Keywords: data equity; data management; ecological analysis; functional trait; global change.

Publication types

  • Review

MeSH terms

  • Big Data
  • Databases, Factual
  • Phenotype
  • Plant Leaves*
  • Plants*