DOJ Regulations Regarding Bulk U.S. Sensitive Data

On April 8, 2025, the Data Security Program (DSP) issued by the U.S. Department of Justice (DOJ) took effect, establishing national security safeguards around the handling of certain categories of sensitive U.S. personal data. These regulations, developed under Executive Order 14117, place specific restrictions on how large datasets containing sensitive information about U.S. persons may be accessed, stored, or transferred when entities or individuals from or associated with foreign countries of concern are involved.

The purpose of the DSP is to reduce the risk that large-scale datasets could be exploited for surveillance, targeting, or other activities that could compromise individual privacy or national security. As a result, universities and research institutions must carefully assess when research datasets fall within the scope of these regulations and ensure they are managed in compliance with federal requirements. 

The sections below provide guidance to help faculty and administrators understand the regulations, identify when they may apply, and determine the appropriate next steps.

What is considered “Bulk Sensitive Personal Data” under the DSP?

“Bulk Sensitive Personal Data” (Bulk SPD) refers to large datasets that contain detailed or sensitive information about U.S. persons. These datasets may present national security risks if accessed, stored, processed, or transferred in ways that allow foreign governments of concern, or entities they control, to obtain or analyze them.

Bulk SPD includes several specific categories of sensitive personal information. A dataset is considered bulk when it meets or exceeds the numerical thresholds defined in the DSP.

Which countries are considered “foreign countries of concern”?

The DSP restricts certain activities involving China (including Hong Kong and Macau), Russia, Iran, North Korea, Cuba, and Venezuela. These restrictions apply to entities owned or controlled by these governments, including universities, research institutes, and private companies.

What categories of Sensitive Personal Data may become Bulk Sensitive Personal Data?

Bulk thresholds are counted across a 12-month period, meaning that multiple smaller datasets may together constitute a bulk dataset. Any dataset that can reasonably be linked back to U.S. persons should be reviewed if it approaches the bulk thresholds.

Categories include:

  • Personal Identifiers: Includes data that can identify an individual directly or indirectly. Examples include names linked to Social Security numbers, driver’s license numbers, passport numbers, or combinations of identifiers that could reasonably be used to identify a person when aggregated.
  • Health and Medical Information: Includes electronic health records, medical histories, diagnostic codes, treatment data, lab results, and other individually associated health details.
  • Human Genomic Data: Encompasses whole genome sequences, exome data, and other genetic information tied to human subjects. Genomic data has a lower bulk threshold than other categories due to its highly sensitive and uniquely identifying nature.
  • Other Human “Omics” Data: Includes proteomic, metabolomic, transcriptomic, epigenomic, or similar biological datasets that provide detailed information about biological traits or processes.
  • Biometric Identifiers: Includes fingerprints, facial recognition data, iris scans, voiceprints, gait patterns, or other physiological traits used for identification.
  • Precise Geolocation Data: Refers to location information tied to specific devices or individuals, typically at GPS accuracy. This includes location history from mobile devices, vehicle telematics, or wearable technology.
  • Financial Information: Covers bank account details, credit or debit card numbers, transaction patterns, income information, or other personal financial data.

Are de-identified datasets excluded from the definition of Bulk Sensitive Personal Data?

No. Even if a dataset is “de-identified,” it may still meet the regulatory definition of Bulk SPD if it meets or exceeds the bulk thresholds.

Are there specific thresholds for when sensitive personal data becomes “bulk”?

Yes. The DOJ defines numerical thresholds for different categories of personal data. When these thresholds are exceeded, the dataset is treated as “bulk,” even if coded or de-identified. Because thresholds vary and may change, investigators should consult the Office of Research Protections if they believe their work may involve large datasets of sensitive personal or genomic data.

What are the Bulk SPD Thresholds?

Bulk Sensitive Personal Data Parameters (DOJ Thresholds)

Datasets meeting or exceeding these thresholds within a 12-month period may be regulated as “Bulk Sensitive Personal Data.”

Category of Data | Definition/Description | Bulk Threshold (U.S. Persons or Devices)
Human Genomic Data | Whole genome sequences, exome data, or other human genetic information of U.S. persons. Highly sensitive and uniquely identifying. | 100 individuals
Other Human “Omics” Data | Proteomic, metabolomic, epigenomic, transcriptomic, or similar biological datasets that provide detailed molecular-level information. | 1,000 individuals
Biometric Identifiers | Fingerprints, facial recognition data, voiceprints, DNA markers, iris scans, gait patterns, or other physiological traits used for identification. | 1,000 individuals
Precise Geolocation Data | GPS-level or otherwise granular geolocation information tied to individuals or devices, including mobile devices, vehicles, or wearable technology. | 1,000 devices or individuals
Personal Health Data | Medical information, electronic health records, diagnostic or treatment data, or other protected health information. | 10,000 individuals
Personal Financial Data | Bank account details, credit or debit card numbers, income data, financial transaction information, and other identifiable financial data. | 10,000 individuals
Covered Personal Identifiers (CPIs) | Direct identifiers (e.g., name with SSN, passport number, driver’s license number) or combinations of identifiers that reasonably identify a person. | 100,000 individuals
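
For research teams that track data transfers programmatically, the thresholds above and the 12-month aggregation rule can be sketched as a simple screening check. This is a minimal illustration, not a compliance determination: the category keys, the record shape (a transfer date paired with a set of subject IDs), the function name, and the choice to count distinct individuals over a 365-day window are all assumptions made for the example. Any result near a threshold should still be referred to the Office of Research Protections.

```python
from datetime import date, timedelta

# Thresholds copied from the table above (U.S. persons or devices).
# The dictionary keys are hypothetical labels chosen for this sketch.
BULK_THRESHOLDS = {
    "human_genomic": 100,
    "human_omics": 1_000,
    "biometric": 1_000,
    "precise_geolocation": 1_000,
    "personal_health": 10_000,
    "personal_financial": 10_000,
    "covered_personal_identifiers": 100_000,
}


def meets_bulk_threshold(category, transfers, as_of, window_days=365):
    """Return True if the distinct subjects transferred in the window
    meet or exceed the bulk threshold for the given category.

    transfers: iterable of (transfer_date, subject_ids) pairs, where
    subject_ids is a set of identifiers (an assumed record shape).
    """
    window_start = as_of - timedelta(days=window_days)
    subjects = set()
    for transfer_date, subject_ids in transfers:
        # Only transfers inside the look-back window are counted.
        if window_start < transfer_date <= as_of:
            subjects |= set(subject_ids)
    return len(subjects) >= BULK_THRESHOLDS[category]


# Two genomic transfers of 60 people each, overlapping by 10 people:
# neither reaches 100 alone, but together they cover 110 individuals.
transfers = [
    (date(2025, 5, 1), {f"p{i}" for i in range(60)}),
    (date(2025, 9, 15), {f"p{i}" for i in range(50, 110)}),
]
print(meets_bulk_threshold("human_genomic", transfers, as_of=date(2025, 10, 1)))  # prints True
```

The example shows why smaller datasets need to be tracked together: each transfer on its own falls below the genomic threshold, but the combined set within the window does not.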

Why does dataset size matter?

A dataset becomes “bulk” not only because of the type of information it contains, but also because the volume of records magnifies the potential harm.

Larger datasets:

  • Increase the likelihood of profiling or predictive modeling of U.S. persons.
  • Enable pattern analysis that reveals sensitive or protected information even when identifiers are removed.
  • Facilitate re-identification, because data points may be cross-matched with other publicly available or illicitly obtained datasets.
  • Elevate national security risk, particularly when aggregated health or genomic datasets could be misused for military, intelligence, or discriminatory purposes by foreign governments.

How might the DSP affect research activities at Pitt?

The DSP may affect any project that collects, analyzes, stores, or shares Bulk SPD or genomic data. Activities that involve foreign collaborators, external cloud services, data-processing tools, or service providers linked to a country of concern may require additional review. 

Do these rules apply to all foreign collaborations?

No. The DSP applies specifically to transfers of Bulk SPD to foreign countries of concern and entities they control. Collaborations with researchers or institutions in other countries are not restricted under this rule unless data storage, processing, or access involves companies or infrastructure tied to a foreign country of concern.

What research activities may be prohibited or require authorization?

The DSP prohibits transferring Bulk SPD or genomic data to covered entities affiliated with countries of concern, contracting with cloud or data-processing services under their control, and transferring large U.S. datasets to these countries. Some activities may be permissible with appropriate safeguards or approvals, but prior institutional review is essential.

How does this relate to HIPAA, FERPA, or NIH data policies?

The DSP augments existing federal requirements rather than replacing them. Research involving human subjects, medical information, student records, or genomic data must still comply with HIPAA, FERPA, NIH policies, and Pitt’s internal standards. The DOJ rule adds further scrutiny related specifically to foreign access and national security risks.

What should faculty do if their research uses or may use Bulk Sensitive Personal Data?

Faculty should contact the Office of Research Protections (ORP) early in the planning stage of any project that may involve the transfer of Bulk SPD or genomic data to a foreign country of concern or to a covered entity affiliated with one. ORP can help determine whether the project falls under the DSP and identify any required controls, approvals, or documentation. Researchers should also seek guidance when engaging foreign collaborators or using external platforms or vendors.

Who should I contact if I have questions?

Questions regarding how the DSP applies to research at Pitt should be directed to the Office of Research Protections (ORP). ORP will coordinate with other relevant university offices to ensure appropriate compliance.

Additional Information 

This guidance provides an overview of the DSP. More information can also be found on the following websites: