
Data life cycle: Preserving

What is data preservation?

Data preservation consists of a series of activities necessary to ensure the safety, integrity, and accessibility of data for as long as necessary, sometimes for decades. It is more than data storage and backup: data can be stored and backed up without being preserved. Preservation prevents data from becoming unavailable or unusable over time through steps such as the following.

  • Ensure data safety and integrity.
  • Change the file format (format migration) and update software to make sure that they do not become outdated or obsolete.
  • Replace hardware and other storage media (such as paper or magnetic tape) to avoid degradation.
  • Ensure that data is organised and described with appropriate metadata and documentation to be always understandable and reusable.
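Integrity, the first step in the list above, is commonly checked with checksums: a fingerprint of each file is recorded at deposit time and compared again later. The following is a minimal sketch in Python, assuming the data sits in an ordinary folder. The function names (`sha256_of`, `build_manifest`, `changed_files`) are illustrative only and are not part of EUR Yoda Vault or any other specific tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are now missing or have different content."""
    current = build_manifest(root)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A manifest built when the data is archived can be saved next to the data (for example as a JSON file) and passed to `changed_files` at a later date. An empty list suggests that the recorded files are still present and unchanged.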

What options do I have for data preservation as an ERIM member?

At EUR we recommend that researchers use EUR Yoda Vault to archive and preserve their data for verification purposes.

For any questions related to EUR Yoda Vault or data preservation, please contact the ERIM Research Data Steward via data@erim.eur.nl.

Why is data preservation important?

There are several important reasons to preserve research data.

  • Guarantee that your data can be verified and reproduced for several years after the end of the project.
  • Allow the reuse of the data in the future for different purposes, such as teaching or further research.
  • Meet the requirements of funders, publishers, institutions, and organisations, which may mandate a specific preservation period for certain data.
  • Preserve data that have significant value for an organisation, a nation, the environment, or society as a whole.

What should be considered for preserving data?

Not all data should be preserved. Preservation takes considerable effort and cost, so it should be applied to an appropriate selection of data. Common criteria for selecting the data to preserve for a certain amount of time are:

  • data required to be preserved by funder, publisher, or institution policies (usually for at least 5 or 10 years after the end of the project);
  • data that must be preserved for legal or ethical reasons (e.g. clinical trial data);
  • data that are unique or cannot be easily regenerated (e.g. raw data, analysis workflows);
  • data that will probably be reused in the future;
  • data of great value for society (scientifically, historically, or culturally).

When preparing data for preservation, several requirements need to be fulfilled.

  • Do not include data that are temporary or mutable.
  • Provide clear, self-explanatory documentation.
  • Include information about provenance.
  • Include sufficient licensing information.
  • Ensure that data is well organised.
  • Ensure that a consistent naming scheme is used.
  • Use standard, open file formats instead of proprietary ones.
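
Two of the requirements above, a consistent naming scheme and open file formats, can be checked automatically before depositing. The following is a rough sketch in Python under stated assumptions: the naming rule (lowercase letters, digits, hyphens, and underscores) and the short table of proprietary formats with open alternatives are examples chosen for illustration, not an official EUR or ERIM policy, and `check_package` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Assumed naming scheme (illustrative): lowercase letters, digits,
# hyphens and underscores, followed by a single extension.
NAME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9_-]*\.[a-z0-9]+$")

# Illustrative and not exhaustive: proprietary extension -> open alternative.
OPEN_ALTERNATIVES = {
    ".xls": "CSV",
    ".sav": "CSV",
    ".dta": "CSV",
    ".doc": "plain text or PDF/A",
}


def check_package(root: Path) -> list[str]:
    """Return human-readable warnings for files under root."""
    problems = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        # Flag names that break the assumed naming scheme.
        if not NAME_PATTERN.match(path.name):
            problems.append(f"{rel}: name does not follow the naming scheme")
        # Flag formats for which an open alternative is listed.
        alternative = OPEN_ALTERNATIVES.get(path.suffix.lower())
        if alternative:
            problems.append(f"{rel}: consider an open format such as {alternative}")
    return problems
```

Calling `check_package(Path("my_dataset"))` returns a list of warnings. An empty list means only that no file broke these two example rules. Documentation, provenance, and licensing still need to be reviewed by hand.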