Conditions of Use

All users of Add Health restricted-use data agree to the following conditions:

  • The data files will be used solely for statistical analyses.
  • No attempt will be made to identify specific individuals, families, households, schools, institutions, or geographic locations not provided by Add Health.
  • No list of sensitive data at the individual or family level will be published or otherwise distributed.
  • When presenting or publishing results:
    • In no table should all cases in any row or column be found in a single cell.
    • In no case should the total for a row or column of a cross-tabulation be fewer than ten (10).
    • In no case should a cell frequency of a cross-tabulation be fewer than ten (10) cases.
    • In no case should a quantity figure be based on fewer than ten (10) cases.
    • Released data files should never permit disclosure when used in combination with other known data.
  • Each written report or other publication based on analysis of Add Health restricted-use data will include the acknowledgements of funding that can be found on the Add Health website at https://addhealth.cpc.unc.edu/about/#acknowledgement.
  • All journal articles based on analysis of Add Health restricted-use data will receive a PubMed Central reference number (PMCID).
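
The numeric table rules under "When presenting or publishing results" can be checked mechanically before a table is shared. The following is a minimal sketch only, not an official Add Health tool: the function name `check_crosstab`, the constant `MIN_CASES`, and the list-of-lists table layout are illustrative assumptions. It also assumes a conservative reading in which a zero cell counts as "fewer than ten," which the conditions above do not spell out. It covers cross-tabulation counts only and does not assess the rule on disclosure in combination with other data.

```python
MIN_CASES = 10  # threshold stated in the conditions above


def check_crosstab(table):
    """Return a list of problems found in a cross-tabulation of counts.

    `table` is a list of rows; each row is a list of non-negative integer
    cell frequencies. (Hypothetical layout chosen for this sketch.)
    """
    problems = []
    columns = [list(col) for col in zip(*table)]  # transpose rows into columns

    for kind, lines in (("row", table), ("column", columns)):
        for i, counts in enumerate(lines):
            total = sum(counts)
            # Row or column total must not be fewer than ten.
            if total < MIN_CASES:
                problems.append(f"{kind} {i}: total {total} is below {MIN_CASES}")
            # All cases in a row or column must not sit in a single cell.
            if total > 0 and max(counts) == total:
                problems.append(f"{kind} {i}: all cases fall in a single cell")

    # Assumption: every cell below ten is flagged, including zero cells.
    for r, row in enumerate(table):
        for c, count in enumerate(row):
            if count < MIN_CASES:
                problems.append(
                    f"cell ({r}, {c}): frequency {count} is below {MIN_CASES}"
                )
    return problems


if __name__ == "__main__":
    for problem in check_crosstab([[25, 0], [12, 14]]):
        print(problem)
```

An empty list means none of the three count-based checks fired; it does not by itself establish that a table is safe to release.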

In addition, all users of Add Health data, both public-use and restricted-use, agree to abide by the following AI and LLM Use Policy.

AI and LLM Use Policy

Large language models (LLMs) and other AI tools (e.g., ChatGPT, Claude AI, Microsoft Copilot, Google Gemini) may not be used to manage, process, or analyze data distributed by Add Health. This policy applies to both public-use and restricted-use data.

Under Add Health Data Use Agreements, researchers are forbidden from distributing data or other materials we supply (apart from codebooks and metadata, described below) to other members, organizations, or individuals. Using LLMs or other AI tools with Add Health data is therefore a violation of all existing data use agreements.

For purposes of this policy, LLMs are classified into three categories:

  • Type 1: LLMs that retain user-provided data for any purpose, including training the LLM (e.g., GPT, Llama)
  • Type 2: LLMs that are licensed by an institution and have conditions of use that do not permit the retention of user-provided data (e.g., Microsoft 365 Copilot with Enterprise data protection for the University of North Carolina at Chapel Hill, University of Michigan’s Maizey)
  • Type 3: Type 2 LLMs that are isolated within a secure network with no access to the Internet

Data Use by LLM Type

LLM Type | Data Use | Reason
Type 1 | None | Type 1 LLMs ingest and make use of the data. This counts as redistributing the data to the company operating the LLM so it is not permitted.
Type 2 | None | Type 2 LLMs do not retain or make use of the data, so this does not count as redistribution. However, they are not isolated from broader networks or the Internet and heighten external data merge risks, so they would not comply with data security plans for Restricted-Use data and are not permitted.
Type 3 | None | Type 3 LLMs do not retain or make use of the data, and they are also isolated on individual machines or within secure networks; however, LLMs are not available within the UNC SRW, so this is not an option for Restricted-Use data. At present, we are also not accepting requests for this kind of data use for Home Institution Hosting Agreements.

Study-Level Metadata and Documentation

It is permissible to use LLMs and AI tools with our public-facing documentation, codebooks, and study-level metadata, including group or population estimates. However, use of individual-level data with these tools is not permissible.

Acknowledgement

Thanks to the University of Michigan’s Health and Retirement Study and ICPSR as well as Sebastian Karcher at Syracuse University for originating this taxonomy of LLMs and/or policy.

Add Health