What are the two methods of HIPAA de-identification?

HIPAA provides two de-identification methods under 45 CFR 164.514. Safe Harbor requires removing 18 specific identifiers and having no actual knowledge the remaining data could identify someone. Expert Determination requires a qualified statistical expert to certify the risk of re-identification is very small.

What are the 18 HIPAA Safe Harbor identifiers?

The 18 identifiers include names, geographic data smaller than a state, dates (except year) related to an individual, phone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers, device identifiers, web URLs, IP addresses, biometric identifiers, full-face photos, and any other unique identifying number.

Is de-identified data still protected by HIPAA?

No. Properly de-identified data under either the Safe Harbor or Expert Determination method is no longer considered protected health information (PHI) and is not subject to HIPAA Privacy Rule restrictions. However, if de-identified data is re-identified, it becomes PHI again and all HIPAA protections apply.

HIPAA De-Identification: Safe Harbor Method & Requirements

Practical guidance for healthcare teams and business associates

HIPAA De-Identification Requirements

Many groups think data is de-identified after they remove the obvious items. But this does not meet HIPAA standards. Deleting the patient's name alone is not enough. Removing just the medical record number is also not enough.

Even removing a whole list of obvious items may still fall short if the remaining data can be tied back to the person. HIPAA recognizes two de-identification paths, and both demand more care than most teams expect.

This is one of the easiest areas to get wrong while still believing you are being careful.

If your team uses data for any of the following, weak de-identification steps create false comfort:

Vendor troubleshooting
Analytics
Marketing
Model training

They give the feel of "anonymous" data that is not truly anonymous.

Why De-Identification Matters

Data that is truly de-identified is no longer PHI under HIPAA. That changes how you can use the data and lowers privacy risk, but only if the data really meets the standard. Groups run into trouble for one of three reasons:

They remove too little
They overestimate how anonymous the data is
They use de-identified data practices inconsistently across teams and vendors

The result is risk that could have been avoided, especially once data leaves its original setting. In practice, weak de-identification often goes hand in hand with weak minimum necessary controls and broad vendor access.

The Two HIPAA Methods

HIPAA allows two ways to de-identify data:

Safe Harbor
Expert Determination

They are different methods. Teams should stop treating them as the same thing.

Method 1: Safe Harbor

Safe Harbor is the more checklist-based path. Under this method, the organization removes the listed identifiers of the individual and of the individual's relatives, employers, and household members. It must also have no actual knowledge that the remaining information could be used to identify the individual. Most people have heard of this method, and it is easier to explain. It is also the method most often oversimplified.

What Safe Harbor Requires

Safe Harbor calls for the removal of 18 types of data, such as:

names
geographic areas smaller than a state, with limited ZIP code exceptions
all date details tied to the person (except year in most cases)
phone numbers
email addresses
Social Security numbers
medical record numbers
account numbers
certificate and license numbers
vehicle IDs
device IDs and serial numbers
URLs
IP addresses
biometric IDs
full-face images
any other unique number, trait, or code that could ID a person

That last item matters. Teams often learn the list but miss the bigger point. Safe Harbor is not about checking off fields. It is about the data that remains. Can it still point back to a person in the real world?

The 18 Safe Harbor Identifiers: Full Reference Table

OCR’s guidance lays out all 18 types in detail. Each one has edge cases that trip teams up in practice.

#	Identifier Category	What It Includes — Common Edge Cases
1	Names	First, last, middle, maiden, initials. Nicknames count if they can be linked.
2	Geographic data smaller than a state	Street address, city, county, ZIP code, geocodes. Three-digit ZIP codes may be kept in some cases — see the ZIP rule below.
3	Dates (except year) related to the individual	Admission date, discharge date, birth date, death date, surgery date. Ages over 89 must be grouped into a “90 or older” bucket.
4	Phone numbers	All phone numbers: cell, home, and work lines.
5	Fax numbers	Listed separately from phone numbers in the rule.
6	Email addresses	Personal and work email. A generic office address is not a patient identifier; a patient-linked one is.
7	Social Security numbers	Full or partial SSN. Even the last four digits must be removed.
8	Medical record numbers	EHR-assigned IDs, chart numbers, registration numbers.
9	Health plan beneficiary numbers	Insurance member IDs, Medicare/Medicaid member IDs.
10	Account numbers	Patient billing account numbers, financial account IDs.
11	Certificate and license numbers	Driver’s license, professional license, DEA number when tied to a patient.
12	Vehicle identifiers and serial numbers	VIN, license plate numbers, vehicle registration data.
13	Device identifiers and serial numbers	Medical device serial numbers, implant IDs, wearable device IDs.
14	Web URLs	Patient-linked web addresses, portal login URLs.
15	IP addresses	Patient device IP addresses found in logs or portal access records.
16	Biometric identifiers	Fingerprints, voiceprints, retinal scans, facial shape data.
17	Full-face photographs and comparable images	Clinical photos, ID photos, images where the face is visible and can be linked to a person.
18	Any other unique identifying number, characteristic, or code	The catch-all. If a piece of data can single out one person in a dataset, it fits here even if not named above.
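As a rough illustration of the table above, the sketch below drops fields that map to listed identifier categories from a single record. The column names and the `LISTED_IDENTIFIER_FIELDS` set are hypothetical, and field removal alone does not settle Safe Harbor: free-text notes, the catch-all category, and actual knowledge still need human review.

```python
# Hypothetical column names mapped to Safe Harbor categories; a real schema will differ.
LISTED_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_serial", "url", "ip_address", "biometric_id",
    "photo",
}


def drop_listed_fields(record: dict) -> dict:
    """Return a copy of the record without fields named in LISTED_IDENTIFIER_FIELDS."""
    return {k: v for k, v in record.items() if k not in LISTED_IDENTIFIER_FIELDS}


record = {
    "name": "Test Patient",
    "mrn": "TEST-0001",
    "ip_address": "192.0.2.10",  # documentation-only address range
    "state": "CO",
    "diagnosis_code": "DX-A",    # placeholder label, not a real code set
    "visit_year": 2023,
}

cleaned = drop_listed_fields(record)
print(cleaned)  # {'state': 'CO', 'diagnosis_code': 'DX-A', 'visit_year': 2023}
```

A deny-list like this only catches columns someone thought to name, so an allow-list of approved output fields may be the safer design in practice.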

Three-Digit ZIP Code Rule: When Geography Can Stay

ZIP codes are not always off-limits under Safe Harbor. The rule is based on how many people live in the area:

If all ZIP codes sharing the same first three digits cover a geographic area containing more than 20,000 people, the three-digit prefix may remain in the dataset.
If the three-digit ZIP area covers 20,000 or fewer people, the prefix must be changed to 000.

This matters most in rural areas. A three-digit ZIP prefix that covers a sparsely populated rural area may need to be zeroed out, while prefixes that cover dense metro areas usually clear the threshold. Teams working with any geographic data below the state level should run the population check before calling the data clean.
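The population test above is easy to express in code. The following is a minimal sketch, assuming a lookup table of population per three-digit prefix. The `PREFIX_POPULATION` values and the function name are hypothetical, and a real check would rely on current Census Bureau figures.

```python
# Hypothetical population totals per three-digit ZIP prefix (illustrative numbers only).
PREFIX_POPULATION = {"123": 1_500_000, "987": 14_000, "555": 20_000}


def safe_harbor_zip(zip_code: str, prefix_population: dict[str, int]) -> str:
    """Return the three-digit prefix if its area holds more than 20,000 people, else '000'."""
    prefix = zip_code.strip()[:3]
    # Prefixes missing from the table are treated as too small to keep.
    population = prefix_population.get(prefix, 0)
    return prefix if population > 20_000 else "000"


print(safe_harbor_zip("12345", PREFIX_POPULATION))  # 123
print(safe_harbor_zip("98765", PREFIX_POPULATION))  # 000
print(safe_harbor_zip("55501", PREFIX_POPULATION))  # 000 (exactly 20,000 does not pass)
```

Defaulting unknown prefixes to `000` is a conservative choice made for this sketch, not something the rule spells out.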

Date Handling: Ages, Years, and the 90-Plus Rule

Dates are the most often misread Safe Harbor item. The rule allows keeping year alone — not month, not day. All other date details tied to the person must go.

The second trap is age. Safe Harbor allows age as a value, with one hard cutoff: anyone aged 90 or older must have their age listed as “90 or older.” The reason is simple. Very old patients in small groups are easy to re-identify from age alone. A 97-year-old patient in a rural area who had a rare treatment can be picked out with no other data point.

In practice, this means any dataset that keeps exact ages above 89 does not pass Safe Harbor. It does not matter what other steps were taken.
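The date and age rules can be sketched as three small helpers. The function names are made up for illustration. The birth-year helper reflects a reading of the rule under which date elements that would reveal an age of 90 or more, including the year, are withheld along with the age itself.

```python
from datetime import date
from typing import Optional


def year_only(d: date) -> int:
    """Keep the year and discard month and day."""
    return d.year


def safe_harbor_age(age: int) -> str:
    """Report ages of 90 and above as a single '90 or older' category."""
    return "90 or older" if age >= 90 else str(age)


def safe_harbor_birth_year(birth: date, as_of: date) -> Optional[int]:
    """Return the birth year, or None when it would reveal an age of 90 or more."""
    # Subtract one if the birthday has not yet occurred in the as_of year.
    age = as_of.year - birth.year - ((as_of.month, as_of.day) < (birth.month, birth.day))
    return None if age >= 90 else birth.year


print(year_only(date(2023, 3, 14)))                                # 2023
print(safe_harbor_age(97))                                         # 90 or older
print(safe_harbor_age(89))                                         # 89
print(safe_harbor_birth_year(date(1930, 1, 1), date(2024, 6, 1)))  # None
print(safe_harbor_birth_year(date(1980, 5, 5), date(2024, 6, 1)))  # 1980
```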

Dates and Geography Cause More Problems Than People Expect

Staff often assume data is de-identified once names are gone. The problem is that the dataset may still hold:

exact admission dates
discharge dates
surgery dates
city-level location data
combinations of age, rare diagnosis, and event timing

Those details can point to a person fast, especially in small towns or for unusual clinical events. A dataset that shows a 92-year-old patient in one small town who had a rare event on a specific date is easy to re-identify even without a name attached.

Method 2: Expert Determination

Expert Determination is more flexible, but also harder. Under this method, a qualified expert applies statistical and scientific methods to determine that the risk of re-identification is very small. This path works when a team needs to keep more data than Safe Harbor allows. For example:

research-support datasets
operational analytics
product or model development
quality projects that need more time-based or location detail

But Expert Determination is not “our data scientist looked at it” or “IT said it seems fine.” It needs a strong expert process and clear records.

Who Qualifies as an Expert?

OCR does not certify experts for this method. The rule calls for a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable. In practice this means:

A statistician with privacy or health data experience
A biostatistician who knows the re-identification research
A data scientist with published work in health data masking
An academic or consultant who can write up their method and stand behind it in an audit

An internal IT analyst or compliance officer does not qualify unless they can show that background. The expert's qualifications must be defensible: if OCR asked, you could produce credentials, methods, and a signed report.

What the Expert’s Report Must Contain

There is no required template, but a solid Expert Determination report will cover:

The expert’s qualifications and relevant background
What data was reviewed and how it will be used
The statistical methods used to assess re-identification risk
The risk threshold applied and the reasoning behind it (the rule sets no number for “very small,” and OCR guidance says no single value fits every case; some practitioners cite benchmarks such as 0.09, roughly a 1-in-11 chance of re-identification, as a convention rather than a regulatory figure)
Any leftover risks found and the reason for accepting them
A signed statement that re-identification risk is very small

Without a written report, Expert Determination is just a loose opinion. OCR expects records that can hold up to review after the fact.

Statistical Methods Used in Expert Determination

Common approaches include:

K-anonymity: Each record looks the same as at least k-1 other records on a set of quasi-identifiers. A dataset with k=5 means any one person shares all key traits with at least four others.
L-diversity and T-closeness: These build on k-anonymity. They manage how sensitive traits spread within groups. This cuts guessing attacks even when a person’s identity is hidden.
Differential privacy: Adds tuned noise to query results so that no single record’s presence can be found from the output. Major health analytics tools use this method. OCR has begun to cite it in informal guidance.
Risk-based modeling: Rates the real chance of re-identification using group-level data, how unique records are, and known outside data sources (e.g., voter rolls, social media).
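To make the k-anonymity definition above concrete, the sketch below computes k for a small table by grouping records on their quasi-identifier values and taking the smallest group. The field names and sample rows are invented for illustration, and a real assessment would also weigh which fields count as quasi-identifiers.

```python
from collections import Counter


def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest group of records that share the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


# Invented sample rows; the last one is unique on all three fields.
records = [
    {"age_band": "40-49", "zip3": "123", "sex": "F"},
    {"age_band": "40-49", "zip3": "123", "sex": "F"},
    {"age_band": "40-49", "zip3": "123", "sex": "F"},
    {"age_band": "90 or older", "zip3": "000", "sex": "M"},
]

fields = ["age_band", "zip3", "sex"]
print(k_anonymity(records, fields))      # 1
print(k_anonymity(records[:3], fields))  # 3
```

A result of 1 means at least one record is unique on the chosen fields, which is the situation the combination warnings in this article describe.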

The choice of method depends on the data type, its planned use, and the risk down the line. A report that explains why a method was chosen and how it was used is far stronger than one that just states a result.
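Differential privacy, one of the approaches listed above, can be illustrated with the Laplace mechanism applied to a simple count. This is a minimal sketch that assumes each person contributes at most one record to the count (sensitivity 1). The function name and the epsilon value are illustrative choices, not recommendations.

```python
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise with scale 1/epsilon to a count (sensitivity assumed to be 1)."""
    # The difference of two exponential draws with rate epsilon follows a
    # Laplace distribution with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


rng = random.Random(42)
print(noisy_count(120, epsilon=0.5, rng=rng))  # a value near 120; exact output depends on the seed
```

Smaller epsilon values add more noise and give stronger protection at the cost of accuracy, which is the tradeoff an expert would need to justify in the report.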

When Expert Determination Is Better Than Safe Harbor

Expert Determination is the right choice when the data loses key research or operational value under Safe Harbor’s strict removal rules. Common cases:

Research datasets that need monthly admission trends, not just year-only dates
Health analytics that need sub-state location detail below the Safe Harbor ZIP threshold
AI and machine learning training sets where time or location detail makes the model more accurate
Quality projects that need links across data points that Safe Harbor would strip

The tradeoff is cost and time. Expert Determination means hiring a qualified pro, paying for their review, and keeping their report as a compliance record. For datasets that pass Safe Harbor cleanly, that extra work is not worth it.

Safe Harbor vs. Expert Determination

The practical tradeoff is simple:

Safe Harbor is more rigid but easier to explain
Expert Determination is more flexible but requires stronger expertise and records

Groups often choose Safe Harbor when they want a more standard compliance path. They choose Expert Determination when data loses too much value under Safe Harbor. The mistake is doing Expert Determination with only a rough internal review.

The Biggest De-Identification Mistakes

These are the failures that show up often:

1. Removing Names and Stopping There

This is the classic error. Teams strip the obvious direct IDs and think the dataset is now anonymous. It often is not.

2. Ignoring Combinations of Data Points

Single fields may seem harmless, but combinations create risk. Examples:

rare condition plus exact date
age over 89 plus location
very specific service line plus event timing
small staff or patient group plus internal operational data

3. Reusing Internal Data for New Purposes

Data that was fine inside one workflow may not be de-identified for other uses like:

External sharing
Vendor use
Marketing analysis
Testing

Context changes the risk.

4. Sending Live PHI to Vendors for Troubleshooting

This happens all the time. Teams send screenshots, exports, or logs. They hand vendors real patient data to skip the hassle of building sample data. That is not a de-identification plan. It is shortcut culture.
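One alternative to sending live records is a small synthetic dataset built from scratch. The sketch below is a hypothetical generator: the field names, value ranges, and placeholder codes are all invented, and the rows are drawn from a seeded random generator rather than derived from any real patient record.

```python
import random

PLACEHOLDER_CODES = ["DX-A", "DX-B", "DX-C"]  # made-up labels, not a real code set


def synthetic_patients(n: int, seed: int = 0) -> list[dict]:
    """Generate n fake patient rows that are not derived from any real record."""
    rng = random.Random(seed)
    return [
        {
            "patient_id": f"TEST-{i:04d}",
            "age": rng.randint(18, 89),
            "zip3": "000",
            "diagnosis_code": rng.choice(PLACEHOLDER_CODES),
            "visit_year": rng.randint(2020, 2024),
        }
        for i in range(n)
    ]


rows = synthetic_patients(5)
print(rows[0]["patient_id"])  # TEST-0000
print(len(rows))              # 5
```

Fixing the seed keeps the sample reproducible, so a vendor can be handed the same test rows each time a bug needs to be reproduced.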

5. No Documentation of the Method Used

If someone asks how the data was de-identified, there should be a clear reply:

which method was used
which identifiers were removed
who approved the process
what leftover risk was weighed

If nobody can answer those questions, the process is weak.

"Actual Knowledge" Still Matters

Safe Harbor is not just about deleting items. It also requires that the organization have no actual knowledge that the leftover information could identify a person. You cannot strip listed data and then ignore clues your team already knows. If everyone on the team can tell who the patient is from the remaining dataset, the data is not de-identified.

De-Identification vs. Limited Data Sets

This is another common confusion point.

A limited data set is not the same as de-identified data. A limited data set may still hold some dates and limited location details. It needs a data use agreement. Teams sometimes apply limited-data-set logic while calling the result de-identified. That is the wrong label, and it matters. If the data is only partly stripped and still depends on use limits, you may not be in de-identified space at all.

What a Limited Data Set May Contain

A limited data set strips direct IDs but keeps data that Safe Harbor would remove on purpose. It may retain:

Town, city, state, and five-digit ZIP codes (items Safe Harbor removes)
Date details — such as admission, discharge, and service dates — that Safe Harbor limits to year only
Ages, including ages over 89 that Safe Harbor requires grouping

What it must remove: names, street address, phone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate and license numbers, vehicle IDs, device IDs, web URLs, IP addresses, biometric identifiers, and full-face photos.

Data Use Agreement Requirements

A limited data set can only be shared with someone who has signed a Data Use Agreement (DUA). The DUA must state that the person or group will:

Use or share the limited data set only for the purposes named in the agreement
Not try to re-identify or contact the people in the data
Use proper safeguards to stop uses or sharing not allowed by the agreement
Report any use or sharing that breaks the agreement
Make sure any agents or subcontractors handling the data agree to the same rules

A DUA is not the same as a Business Associate Agreement, though some groups combine them. The DUA is specific to limited data sets. It governs how the receiving party handles data that is not fully de-identified.

Feature	De-Identified Data (Safe Harbor)	Limited Data Set
PHI status under HIPAA	Not PHI — HIPAA rules do not apply	Still PHI — HIPAA rules apply
Agreement required	None	Data Use Agreement (DUA)
Dates retained	Year only	Full dates permissible
Geographic detail	State-level or limited 3-digit ZIP	City, ZIP, county permissible
Ages over 89	Must be grouped as “90 or older”	Exact age permissible
Permitted uses	No limits — no longer PHI	Research, public health, healthcare operations only
Re-identification risk	Must be very small per OCR rules	Handled by DUA limits, not removed

The bottom line: if the data must keep dates and city-level details for research, public health, or healthcare operations, a limited data set with a proper DUA is often the right path. If the planned use falls outside those purposes (such as general business analytics or public release), full de-identification is needed.

Common De-Identification Use Cases

Knowing where de-identification is used in the real world helps teams set up the right steps before data leaves a controlled setting.

Clinical Research and IRB Studies

Academic medical centers and health systems often de-identify patient records for research studies. IRBs often accept Safe Harbor as enough to waive consent. But they need proof of the method used. Expert Determination is used when the study needs date or location detail that Safe Harbor strips.

Population Health Analytics

Health systems and payers use de-identified claims and clinical data to track disease rates, find care gaps, and model risk at the local level. Safe Harbor often works here. The exception is when the review covers small areas or rare conditions. Those cases may need Expert Determination to avoid re-identification through a mix of small groups and kept data points.

AI and Machine Learning Training Data

This is where teams most often miss the re-identification risk. Training a clinical model on de-identified patient data sounds simple. But large language models and neural nets can learn patterns that re-identify people from outputs. This risk goes beyond whether the input data was cleaned. If you use patient data for AI work, treat model outputs as a possible re-identification path, not just the training set.

Healthcare Marketing and Analytics

Using de-identified patient data for marketing analytics is allowed under HIPAA, but only if the data is truly de-identified, not just stripped of names. Ad platforms that get patient data for audience modeling are a common source of risk, especially when teams confuse limited data sets with de-identified data. The minimum necessary rule applies to what data reaches marketing workflows in the first place.

Re-Identification Risk: What “Very Small” Actually Means

HIPAA’s Expert Determination standard says re-identification risk must be “very small.” The rule does not set a number, and OCR’s guidance states that no single numeric level universally qualifies. Some practitioners work to a benchmark of roughly 0.09, which means no record should have more than about a 1-in-11 chance of being correctly re-identified using real-world methods. Treat that figure as a convention an expert must justify for the dataset at hand, not as a regulatory threshold.
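To make the 0.09 figure concrete, one common simplification treats worst-case risk as 1 divided by the size of the smallest group of records that look alike. That formula is an assumption used here for illustration, not something the rule prescribes, and the function names are hypothetical.

```python
def worst_case_risk(smallest_group_size: int) -> float:
    """Worst-case re-identification risk under the 1/k simplification."""
    if smallest_group_size < 1:
        raise ValueError("group size must be at least 1")
    return 1 / smallest_group_size


def within_benchmark(smallest_group_size: int, benchmark: float = 0.09) -> bool:
    """True when the worst-case risk falls strictly below the chosen benchmark."""
    return worst_case_risk(smallest_group_size) < benchmark


print(round(worst_case_risk(11), 4))  # 0.0909
print(within_benchmark(11))           # False: 1/11 is slightly above 0.09
print(within_benchmark(12))           # True: 1/12 is about 0.0833
print(worst_case_risk(5))             # 0.2
```

The arithmetic shows why "1 in 11" is only an approximation of 0.09: a smallest group of 11 lands just over the line, while 12 clears it.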

That threshold matters because re-identification attacks have become much easier. Research has shown that:

By one widely cited estimate, 87% of Americans can be singled out using only five-digit ZIP code, full birth date, and sex, three fields many teams see as harmless
Combining a rare diagnosis with age, location, and rough event date can single out a patient in a small group with no other data
Public data sources such as voter rolls, social media, and property records can be joined to “anonymous” datasets to re-identify people at scale

Under Safe Harbor, you manage re-identification risk by removing all 18 types and confirming you have no actual knowledge that the leftover data points to a person. Under Expert Determination, it takes formal testing and records. Either way, the standard is not an aspiration. It is a requirement that can be tested and enforced.

Operational Uses That Deserve Review

You should review de-identification steps if your group uses data for:

vendor troubleshooting
software testing
marketing analytics
AI or model training
internal dashboards
quality reporting
case studies
public presentations

These are the settings where teams often move fast. They assume the data is harmless because it has no patient names. That belief is what creates later exposure with vendors, contractors, and internal teams.

A Practical De-Identification Review Checklist

Which method are we using: Safe Harbor or Expert Determination?
Have all required data types been removed if we claim Safe Harbor?
Does the leftover dataset still create clear re-identification risk in context?
Are dates, location, age, and rarity creating a combined exposure?
Are vendors receiving data that should be masked or de-identified first?
Is the method written up enough to explain later?

If those questions produce fuzzy answers, the process needs work.

Final Takeaway

HIPAA de-identification rules are not met by surface-level edits to a dataset. The real question is not whether the most obvious IDs are gone. The question is: can the leftover information be connected back to a person with reasonable effort?

That is why disciplined de-identification matters:

choose the right method
document the process
think about combinations, not just fields
do not use live PHI when masked or fake data would work

Groups that get this right reduce risk. They do so without pretending data is safer than it is.

If your current process depends on ad hoc redaction, screenshots, or loose judgment calls, review it before it turns into a privacy problem.

Documentation Requirements for Both Methods

HIPAA does not have a single template for de-identification records. But both methods need files that hold up in an OCR audit or legal challenge. At minimum, a group should be able to produce:

Method chosen: Safe Harbor or Expert Determination, and the reason for the choice
Dataset description: What data was processed, from what source, and for what intended use
Items removed (Safe Harbor): A record showing all 18 types were reviewed and handled, including how edge cases (ZIP codes, ages over 89, dates) were treated
Expert’s report (Expert Determination): A signed report from a qualified expert stating that re-identification risk is very small, with the method described
Approval chain: Who approved the de-identification process and who reviewed the output before it was shared
Ongoing review: Whether the de-identification process gets reviewed again as the data setting or use case changes
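The items above could be captured in a simple structured record so that gaps are visible before data is shared. This is only a sketch: the `DeidentificationRecord` class, its field names, and the checks in `gaps` are hypothetical and would need to match whatever your compliance team actually tracks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeidentificationRecord:
    method: str                             # "safe_harbor" or "expert_determination"
    rationale: str                          # why this method was chosen
    dataset_description: str                # what data, from what source
    intended_use: str
    approved_by: str
    reviewed_by: str
    identifier_types_reviewed: int = 0      # out of the 18 Safe Harbor types
    edge_case_notes: dict = field(default_factory=dict)  # e.g. ZIP codes, ages over 89, dates
    expert_report_ref: Optional[str] = None  # pointer to the signed report, if any
    next_review_due: Optional[str] = None    # ISO date string

    def gaps(self) -> list[str]:
        """List documentation gaps worth closing before the data is shared."""
        found = []
        if self.method == "safe_harbor" and self.identifier_types_reviewed < 18:
            found.append("not all 18 identifier types were reviewed")
        if self.method == "expert_determination" and not self.expert_report_ref:
            found.append("signed expert report is missing")
        if not self.approved_by:
            found.append("no approver recorded")
        if not self.next_review_due:
            found.append("no follow-up review scheduled")
        return found


entry = DeidentificationRecord(
    method="safe_harbor",
    rationale="Year-level dates are enough for this report",
    dataset_description="Sample encounter extract",
    intended_use="Internal quality dashboard",
    approved_by="Privacy officer",
    reviewed_by="Data steward",
    identifier_types_reviewed=18,
    next_review_due="2026-01-15",
)
print(entry.gaps())  # []
```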

Teams that treat de-identification as a one-time tech step rather than a recorded process create a gap that shows up under audit. Good records also protect you if a later party misuses data. Proof that the method was applied right shifts blame where it belongs.

If your group needs help building a strong de-identification framework, HIPAA consulting support can speed up the process without starting from scratch.

Frequently Asked Questions

What are the 18 identifiers that must be removed for HIPAA Safe Harbor de-identification?

The 18 identifiers are: (1) names, (2) geographic data smaller than state level, (3) dates except year related to the individual, (4) phone numbers, (5) fax numbers, (6) email addresses, (7) Social Security numbers, (8) medical record numbers, (9) health plan beneficiary numbers, (10) account numbers, (11) certificate and license numbers, (12) vehicle identifiers, (13) device identifiers, (14) web URLs, (15) IP addresses, (16) biometric identifiers, (17) full-face photographs, and (18) any other unique identifying number or code. Ages over 89 must also be aggregated into a “90 or older” category.

What is the difference between Safe Harbor and Expert Determination under HIPAA?

Safe Harbor means removing all 18 listed identifiers — a checklist any compliance team can follow. Expert Determination lets you keep more detailed data such as monthly dates or sub-state geography, but a qualified statistician or privacy expert must formally certify that re-identification risk is “very small.” Expert Determination is more flexible but costs more and needs stronger records.

Can ZIP codes be included in de-identified data?

It depends on population size. Under Safe Harbor, you may retain the first three digits of a ZIP code only if the geographic area formed by all ZIP codes sharing those three digits contains more than 20,000 people. If that area covers 20,000 or fewer people, the ZIP must be recoded as 000. Rural ZIP prefixes often fail this threshold.

Does de-identified data still fall under HIPAA?

No. Once data is properly de-identified using either Safe Harbor or Expert Determination, it is no longer PHI and is not subject to HIPAA’s Privacy Rule. This is the core value of de-identification — it allows data to be used for research, analytics, or marketing without HIPAA restrictions. A limited data set, by contrast, remains PHI and still requires a Data Use Agreement.

What is a limited data set under HIPAA?

A limited data set is a middle-ground option — it strips direct identifiers like name, address, and SSN, but may keep geographic data at the city and ZIP level plus full dates. It can be shared for research, public health, or healthcare operations under a Data Use Agreement, without full de-identification. It is still PHI; HIPAA still applies.

What are the two HIPAA de-identification methods?

They are Safe Harbor and Expert Determination. Safe Harbor removes specific identifiers from a defined list of 18 categories. Expert Determination relies on a qualified expert applying statistical principles to certify that re-identification risk is very small.

Is deleting the patient's name enough to de-identify data?

No. Removing a name alone does not de-identify someone under HIPAA. If the remaining data — such as dates, ZIP codes, age, or rare diagnoses — can still identify a person through combination or context, the data is not de-identified.

What is the difference between de-identified data and a limited data set?

A limited data set is not de-identified. It can still contain certain dates and limited geographic details such as city and ZIP code. It also requires a Data Use Agreement and remains subject to HIPAA. Fully de-identified data carries none of those restrictions.

Can I send partially redacted patient data to a vendor for troubleshooting?

Not safely by default. If the data is still identifiable, you may still be sending PHI, which requires a Business Associate Agreement and appropriate safeguards. The better approach is masking, de-identifying, or using a small synthetic dataset that removes all real patient data.

Key point: The Safe Harbor method under 45 CFR 164.514(b)(2) requires removal of all 18 HIPAA identifiers, and the organization must have no actual knowledge that the remaining information could be used alone or in combination to identify an individual. The Expert Determination method under 45 CFR 164.514(b)(1) requires a qualified statistical or scientific expert to determine and document that re-identification risk is very small. Many healthcare organizations use Safe Harbor because Expert Determination is expensive and requires specialized expertise.
