
Principles of Finding and Citing Data (#11)

This newsletter guides users through the essential principles of finding and citing data. It offers practical strategies for locating datasets, considerations for evaluating them, and solutions to common challenges, and it emphasizes reusing existing data ethically and transparently to enhance research impact and reproducibility.


Introduction

Finding and citing data are essential practices for fostering transparency, reproducibility, and ethical research. Whether reusing existing datasets or sharing newly generated ones, these principles help ensure that researchers build on solid foundations while respecting intellectual property and privacy constraints. This guide outlines why you should look for existing data, strategies for finding it, considerations to keep in mind, and challenges you may encounter in the process.

Why Look for Existing Data?

Using existing datasets can provide numerous benefits:

  • Save resources: Reuse data for new studies, reducing time, effort, and costs associated with new data collection.

  • Validate research: Replicate and verify the findings of prior studies.

  • Compare findings: Analyze and compare results across different studies.

  • Reinterpret data: Reassess or reinterpret earlier data using new methodologies or perspectives.

  • Expand studies: Extend previous research across time, geography, or populations by combining multiple datasets.

  • Enhance modeling: Test or develop computational models with robust datasets.

How to Find Existing Data

When searching for data, clearly define your needs and employ effective strategies:

  • Clarify your data needs:

    • What type of data answers your research question (e.g., statistical summaries, raw datasets)?

    • What is the required geographic scope (e.g., regional, national, or global)?

    • Do you need aggregated or individual-level data?

  • Identify sources:

    • Look for agencies or organizations likely to collect relevant data (e.g., government bodies, industry, research institutions).

    • Explore domain-specific or generalist repositories.

  • Check literature:

    • Review publications in your field to locate authors or studies sharing data.

    • Examine the methods, data, and references sections for potential leads.

Tools for Data Discovery

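Several tools and platforms are available to help you find the datasets you need. One example is re3data, the Registry of Research Data Repositories, where you can search for repositories by subject (Link: https://www.re3data.org/).

Tools like this make it easier to locate relevant data, saving you time and effort.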

Considerations for Finding Data

When evaluating a dataset, consider the following:

  • Data format: Will you need to manipulate it, or is it provided as summaries?

  • Timeliness: How current does the data need to be for your research?

  • Provenance: Understand how the data was collected or generated.

  • Completeness: Are all variables, units, and guides to the data available?
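
The four considerations above can be turned into a simple checklist. The following is only a minimal sketch: the metadata keys (`format`, `last_updated_year`, `provenance`, `documentation`) and the function name `evaluate_dataset` are hypothetical, chosen for illustration, and do not correspond to any particular repository's metadata schema.

```python
from typing import Dict, List

# Documentation items named under "Completeness" (hypothetical labels).
REQUIRED_DOCUMENTATION = ("variables", "units", "guide")


def evaluate_dataset(meta: Dict, max_age_years: int, current_year: int) -> List[str]:
    """Return a list of open concerns for a dataset description."""
    concerns = []

    # Data format: do we know whether the data is raw or summarized?
    if not meta.get("format"):
        concerns.append("format: unknown whether data is raw or summarized")

    # Timeliness: is the data recent enough for the research question?
    last_updated = meta.get("last_updated_year")
    if last_updated is None:
        concerns.append("timeliness: no publication or update year recorded")
    elif current_year - last_updated > max_age_years:
        concerns.append(
            f"timeliness: last updated {last_updated}, older than {max_age_years} years"
        )

    # Provenance: is the collection or generation method described?
    if not meta.get("provenance"):
        concerns.append("provenance: collection method not documented")

    # Completeness: are variables, units, and a guide to the data available?
    available = meta.get("documentation", ())
    missing = [item for item in REQUIRED_DOCUMENTATION if item not in available]
    if missing:
        concerns.append("completeness: missing " + ", ".join(missing))

    return concerns


# Example with made-up values.
example = {
    "format": "raw CSV",
    "last_updated_year": 2015,
    "provenance": "",
    "documentation": ["variables", "units"],
}
for concern in evaluate_dataset(example, max_age_years=5, current_year=2024):
    print(concern)
```

An empty result means none of the four checks raised a concern. It does not replace reading the dataset's documentation.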

Best Practices for Data Citation

Citing datasets properly is crucial for giving credit to data creators and enabling others to locate the data. Include these key elements in your citation:

  • Author(s) or organization: Who created or published the dataset.

  • Dataset title: The official name of the dataset.

  • Year of publication: When the dataset was published or updated.

  • Repository or publisher: Where the dataset is hosted.

  • Persistent Identifier (e.g., DOI): A unique, permanent identifier that links to the dataset.

Following these practices ensures transparency and contributes to the integrity of scholarly communication.
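
As an illustration, the five elements listed above can be assembled into a citation string. This is a sketch under assumptions: the ordering "Author (Year). Title. Publisher. Identifier" is one common pattern, not a prescribed style, and the class name, field names, and example values below are all hypothetical. Check the citation style required by your journal or repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    authors: List[str]   # author(s) or organization
    title: str           # official dataset title
    year: int            # year of publication or update
    publisher: str       # repository or publisher hosting the dataset
    identifier: str      # persistent identifier, e.g. a DOI

    def format(self) -> str:
        """Assemble the elements into a single citation line."""
        authors = "; ".join(self.authors)
        pid = self.identifier
        # A bare DOI (starting with "10.") is turned into a resolvable link.
        if pid.startswith("10."):
            pid = "https://doi.org/" + pid
        return f"{authors} ({self.year}). {self.title}. {self.publisher}. {pid}"


# Example with made-up values (the DOI is a placeholder, not a real dataset).
citation = DatasetCitation(
    authors=["Doe, J.", "Roe, R."],
    title="Example Survey Data",
    year=2024,
    publisher="Example Repository",
    identifier="10.1234/example",
)
print(citation.format())
```

Keeping the elements as separate fields makes it easy to spot a missing one, such as a dataset without a persistent identifier, before the citation is written.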

Challenges with Finding Data

Researchers often face challenges related to data access and usability, including:

  • Limited applicability: Some studies do not involve data collection, so there may be no dataset to find.

  • Data sharing policies: Data may be:

    • Openly available in repositories or public sources.

    • Accessible only under a user agreement.

    • Restricted due to privacy or ethical considerations.

    • Shared as supplementary materials.

    • Available only through third-party vendors.

    • Unavailable due to high sensitivity or lack of participant consent.

  • Non-digital data: Some data may only be available in physical archives.

Conclusion

Understanding the principles of finding and citing data empowers researchers to leverage existing resources effectively while adhering to ethical standards. By carefully considering your data needs, exploring appropriate sources, and acknowledging challenges, you can enhance the rigor and impact of your research. Always prioritize proper citation and transparency to contribute meaningfully to the broader research community.


Disclaimer

I hope this was an interesting read. If you have comments, remarks, or suggestions about other RDM-related topics for the next newsletters, please let me know by sending me an email at dukkart@itc.rwth-aachen.de.

Image designed by freepik
