Perhaps the most central and unique feature of the PEP system is that different users receive different row identifiers to refer to the same row. This prevents downloaders from blending their respective data into a single, larger data set. Thus, with its built-in pseudonymization mechanism, PEP provides some basic privacy safeguards when disseminating sensitive data such as medical or financial information.
# Traditional fixed identifiers
Data storage systems usually assign unique identifiers to the entries they store. For example, (relational) database tables typically include an `Id` column:

@@ -16,4 +16,10 @@ Identifiers are commonly generated when the entry is first created. Once availab

While a traditional `Id` column thus achieves some form of pseudonymization, it becomes a privacy hazard when access to the other columns is restricted. For example, financial service professionals may be allowed to read the table's `BankAccountNr` column, while medical personnel may be granted access to its `LastDoctorVisit` column. Since both parties also have access to the `Id` column, an accountant and a doctor who compare notes can build a combined data set on the basis of their common `Id` values. This gives them *a combination of* financial and medical information that no one has been granted access to.
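The blending scenario described above can be sketched in a few lines of Python. This is only an illustration: the column names come from the text, while the rows, values, and the `blend` helper are made up for the example.

```python
# Hypothetical views of the same table, as seen by two parties.
# Both views expose the fixed Id column next to one restricted column.
accountant_view = [
    {"Id": 1, "BankAccountNr": "NL00BANK0000000001"},
    {"Id": 2, "BankAccountNr": "NL00BANK0000000002"},
]
doctor_view = [
    {"Id": 1, "LastDoctorVisit": "2021-03-04"},
    {"Id": 2, "LastDoctorVisit": "2020-11-17"},
]


def blend(left, right, key="Id"):
    """Inner-join two lists of rows on a shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# The combined rows hold financial *and* medical data,
# a combination that neither party was granted access to.
blended = blend(accountant_view, doctor_view)
```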

Such "data blending" has been the subject of much debate, among others in the context of user profiling on the Internet. PEP addresses this issue by using a different type of identifier.
# Identifiers in PEP
Instead of assigning fixed identifiers to rows, PEP uses partially randomized identifiers called "polymorphic pseudonyms" (PPs). A new PP value is generated whenever a data entry is accessed, so different parties receive different PPs for the same row. Since parties cannot match PPs between their respective data sets, this removes a major enabler of data blending.
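The effect of per-access identifiers can be illustrated with a toy model. This sketch is not PEP's actual mechanism: the `ToyPseudonymIssuer` class is hypothetical, and it hands out fully random tokens, whereas real PPs are only partially randomized. It merely shows why two downloads no longer share a join key.

```python
import secrets


class ToyPseudonymIssuer:
    """Toy stand-in for a PEP-like store (hypothetical, not PEP's API).

    Every access yields a fresh random identifier for each row.
    """

    def __init__(self, rows):
        self._rows = rows  # internal row key -> column values

    def download(self, columns):
        """Return the requested columns, each row tagged with a new PP."""
        return [
            {"PP": secrets.token_hex(16), **{c: values[c] for c in columns}}
            for values in self._rows.values()
        ]


issuer = ToyPseudonymIssuer({
    1: {"BankAccountNr": "NL00BANK0000000001", "LastDoctorVisit": "2021-03-04"},
    2: {"BankAccountNr": "NL00BANK0000000002", "LastDoctorVisit": "2020-11-17"},
})

accountant_rows = issuer.download(["BankAccountNr"])
doctor_rows = issuer.download(["LastDoctorVisit"])

# The two parties have no identifier in common, so they cannot join their data.
shared_ids = {r["PP"] for r in accountant_rows} & {r["PP"] for r in doctor_rows}
```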

A downside of PPs is that a single party would also be unable to associate data that they retrieve at different times. But since the party could create a complete data set by downloading all data in one fell swoop (instead of in batches), PP volatility provides no security in this case. PEP therefore also has the ability to calculate "user pseudonyms" (UPs) for its rows. For any given party, the same row is assigned the same UP value at all times, allowing data from multiple downloads to be joined. But different parties receive different UP values for the same row, thus preventing data blending outside PEP.
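The two properties of UPs (stable for one party, different between parties) can be sketched as follows. This is an assumption-laden illustration rather than PEP's actual derivation: it uses a keyed hash (HMAC-SHA256) with a hypothetical per-party secret, and the function name, secrets, and row key are invented for the example.

```python
import hashlib
import hmac


def toy_user_pseudonym(party_secret: bytes, row_key: str) -> str:
    """Deterministic per-party identifier (hypothetical, not PEP's scheme).

    The same secret and row key always produce the same value.
    """
    return hmac.new(party_secret, row_key.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical per-party secrets; in this sketch they are just byte strings.
accountant_secret = b"hypothetical-secret-for-accountant"
doctor_secret = b"hypothetical-secret-for-doctor"

# Same party, same row, two separate downloads: the identifiers match,
# so the party can join data retrieved at different times.
first_download = toy_user_pseudonym(accountant_secret, "row-1")
later_download = toy_user_pseudonym(accountant_secret, "row-1")

# Another party receives a different identifier for the same row,
# so the two parties still cannot blend their data sets.
doctor_download = toy_user_pseudonym(doctor_secret, "row-1")
```

In this sketch, the separation between parties rests entirely on keeping the per-party secrets apart; how PEP itself achieves this is not covered here.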