Update home (home.md @ 429509ab), authored Jan 25, 2021 by Kai van Lopik
PEP is an acronym for "Polymorphic Encryption and Pseudonymization", which hints at its [features](#Features), and at the technology on which it is based. Functionally, PEP is software for the storage and retrieval of tabular data. PEP's storage consists of a single table. This rather limited data structure is offset by some features that run-of-the-mill database systems do not normally provide:
- PEP encrypts data both [at rest](https://en.wikipedia.org/wiki/Data_at_rest#Encryption) and in transit, effectively providing [end-to-end encryption](https://en.wikipedia.org/wiki/End-to-end_encryption) between the data's uploader(s) and downloader(s).
- PEP ensures that no single server, administrator, or hosting party can access the data (or provide access to it) by themselves.
- PEP counteracts data blending, preventing multiple downloaders from combining their data into a larger data set.
- PEP keeps previous data versions available after a cell's contents are overwritten.
More information on these four key features can be found below. Because of these features, PEP is usable for the storage of any (sensitive and/or confidential) information that must be made available in a pseudonymized form. Its current applications include the storage and dissemination of medical data for multiple academic research projects.
# Features
## Encryption
PEP applies strong cryptography to all data stored in the system. Cryptographic keys are only made available to authorized uploaders and downloaders, who (respectively) encrypt and decrypt the data on their local machines. Thus, data cannot be accessed by the PEP system itself, by its hosting parties, by its administrators, or by anyone else who may gain access to PEP's innards.
## Trust reduction
PEP server components complement, check, and audit each other's actions. This ensures that data confidentiality cannot be compromised by breaching a single server. A similar "four eyes" principle applies to PEP's authorization system. Multiple administrators must cooperate to grant access, preventing any single administrator from being able to expose confidential data.
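The idea that no single party holds enough to expose data can be illustrated with a simple two-way key split. This is only a sketch under stated assumptions: it is not PEP's actual protocol, and the names `split_key` and `combine_shares` are hypothetical. It shows that two shares together recover a key, while each share on its own is just random bytes.

```python
import secrets


def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split a key into two XOR shares (illustrative, not PEP's mechanism)."""
    # The first share is uniformly random and independent of the key.
    share_a = secrets.token_bytes(len(key))
    # The second share is the key XORed with the first share.
    share_b = bytes(k ^ a for k, a in zip(key, share_a))
    return share_a, share_b


def combine_shares(share_a: bytes, share_b: bytes) -> bytes:
    """Recover the key; this requires both shares."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))


key = secrets.token_bytes(32)
share_a, share_b = split_key(key)
recovered = combine_shares(share_a, share_b)
```

In this toy setup, a party that breaches only one share holder learns nothing about the key, which mirrors the "single breach is not enough" property described above.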
## Pseudonymization
Pseudonymization is perhaps the most central and distinctive feature of the system: different PEP users receive different row identifiers to refer to the same row. This prevents downloaders from blending their respective data into a single, larger data set. Thus, with its built-in pseudonymization mechanism, PEP provides some basic privacy safeguards when disseminating sensitive data such as medical or financial information.
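The effect of user-specific row identifiers can be sketched as follows. This example uses a keyed hash (HMAC) as a stand-in and is not the polymorphic scheme PEP itself uses; the function `local_pseudonym`, the secret, and the user and row names are all hypothetical.

```python
import hashlib
import hmac


def local_pseudonym(master_secret: bytes, user: str, row_id: str) -> str:
    """Derive a user-specific identifier for a row (illustrative only)."""
    # Derive a per-user key so every user gets a separate identifier space.
    user_key = hmac.new(master_secret, user.encode(), hashlib.sha256).digest()
    # Identify the row under that user's key; shortened for readability.
    return hmac.new(user_key, row_id.encode(), hashlib.sha256).hexdigest()[:16]


secret = b"demo-secret-not-for-production"
alice_id = local_pseudonym(secret, "alice", "participant-0001")
bob_id = local_pseudonym(secret, "bob", "participant-0001")
```

Here `alice_id` and `bob_id` refer to the same row but differ, so the two users cannot join their downloads on the identifier column. Each user's identifier stays stable across calls, so that user can still correlate their own downloads.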
## Retention
Data stored in PEP are never overwritten. When users store data, the system also retains any data that were previously stored in the same table cell. PEP can thus reconstruct its state as it was at any point in the past, allowing the exact same data set to be retrieved multiple times. This makes PEP eminently usable for the (storage and) retrieval of data for academic replication studies.
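A minimal sketch of this retention behaviour, assuming an in-memory, append-only store with caller-supplied timestamps. The class `VersionedTable` and its methods are hypothetical and do not reflect PEP's storage format or API.

```python
from bisect import bisect_right
from collections import defaultdict


class VersionedTable:
    """Append-only table: storing a value never discards earlier versions."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value), in ascending time order
        self._history = defaultdict(list)

    def store(self, row, column, value, timestamp):
        versions = self._history[(row, column)]
        if versions and timestamp < versions[-1][0]:
            raise ValueError("timestamps must be non-decreasing per cell")
        versions.append((timestamp, value))

    def retrieve(self, row, column, as_of):
        """Return the cell value as it was at time `as_of`, or None."""
        versions = self._history.get((row, column), [])
        timestamps = [t for t, _ in versions]
        # Index of the first version stored strictly after `as_of`.
        index = bisect_right(timestamps, as_of)
        return versions[index - 1][1] if index else None


table = VersionedTable()
table.store("row-1", "weight", 70, timestamp=1)
table.store("row-1", "weight", 72, timestamp=5)
earlier = table.retrieve("row-1", "weight", as_of=3)
latest = table.retrieve("row-1", "weight", as_of=5)
```

Retrieving with `as_of=3` yields the value stored at time 1, even though the cell was later updated. A replication study relies on this kind of repeatable retrieval.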
# Detailed documentation
The information in the pages below is to be merged into a more interlinked wiki structure:
- [General pepcli usage and examples](pepcli-usage)
- [Pseudonymized upload of MRI data](pseudonymized-upload)