# Data pseudonymization

While PEP generates pseudonymous subject identifiers, it does not automatically anonymize the *data* stored in the system. Generally speaking, downloaders receive exactly the data that uploaders have stored. Uploaders should therefore ensure that data are stripped of any personally identifying information before they are stored.

Uploaders should also ensure that their data do not contain any [fixed identifiers](Pseudonymization#traditional-fixed-identifiers) associated with the data subjects. Since all downloaders receive the same data (including any identifiers it contains), such identifiers would enable the data blending that PEP is intended to prevent. As with data *anonym*ization, the responsibility for data *pseudonym*ization lies with the uploader.

## Manually `list`ing and `get`ting data

As [described](#downloading-data), the `pepcli pull` command downloads data into a predefined local directory structure. Users who need finer-grained control over the download process can use the `pepcli list` and `pepcli get` commands instead.

Although these commands offer more control than `pull`, they do not support automatic [data pseudonymization](data-pseudonymization), which makes them unusable for certain types of data. **Use of these commands is therefore strongly discouraged.** They are retained only for backward compatibility and may be removed from future versions of PEP.

### `List`ing data

The [`pepcli list` command](Using-pepcli#list) accepts the same switches as the `pepcli pull` command for specifying the participants, participant groups, columns, and column groups to process. For example:

```
/app/pepcli list -P all-pit -P all-denovo -C DeNovoWatchData -C Castor -c IsTestParticipant
```

By default, the `pepcli list` command will
- immediately retrieve data smaller than 1000 bytes and include it in the output. This so-called *inlining* behavior can be adjusted with the `--inline-data-size-limit` and `--no-inline-data` switches. For data that is not inlined, the output includes an identifier instead.
- omit local pseudonyms from the output. Use the `--local-pseudonyms` switch to include them.
- output data in the order in which it is received from the PEP servers. Use the `--group-output` switch to have entries for the same data subject grouped into a single node.
- request a ticket and use it exclusively for its own data retrieval. This behavior can be overridden with the `--ticket` and `--ticket-out` switches.

The `pepcli list` command produces output in a [JSON](https://www.json.org/json-en.html) structure representing the data available in PEP:
- A top-level JSON array, containing:
    - (Unnamed) objects representing (sets of) data points. Each object contains:
        - A node named `pp` with a string value containing the subject's [polymorphic pseudonym](Pseudonymization#identifiers-in-pep).
        - If requested with `--local-pseudonyms`, a node named `lp` with a string value containing the subject's [local pseudonym](Pseudonymization#identifiers-in-pep).
        - A node named `data` with an object value representing data that was inlined. Within the object:
            - Node names correspond with column names.
            - Node values contain the (raw) inlined data retrieved from PEP.
        - A node named `ids` with an object value representing data that was not inlined. Within the object:
            - Node names correspond with column names.
            - Node values contain the data's identifier, which can be [passed to the `pepcli get` command](#getting-data) to retrieve the data itself.

@@@ more here @@@
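Because the output is plain JSON, it can be post-processed with any JSON library. The following is a minimal sketch in Python, assuming the output of `pepcli list` has exactly the structure described above and has been captured as a string. The function name `inlined_values` and all pseudonym, column, and value strings in the sample are hypothetical, and the sketch also assumes that the `lp`, `data`, and `ids` nodes may be absent from an entry.

```python
import json


def inlined_values(list_output: str) -> list:
    """Return (pp, column, value) tuples for all inlined data.

    Assumes list_output is a JSON array of objects, each with a "pp" string
    and optional "lp", "data" and "ids" nodes, as described for `pepcli list`.
    """
    values = []
    for entry in json.loads(list_output):
        # Only the "data" node holds inlined values; "ids" is skipped here.
        for column, value in entry.get("data", {}).items():
            values.append((entry["pp"], column, value))
    return values


# Hypothetical sample; real pseudonyms and values look different.
sample = """[
  {"pp": "PP-A", "data": {"IsTestParticipant": "false"},
   "ids": {"ExampleLargeColumn": "ID-1"}},
  {"pp": "PP-B", "lp": "LP-B", "data": {"IsTestParticipant": "true"}}
]"""

print(inlined_values(sample))
# [('PP-A', 'IsTestParticipant', 'false'), ('PP-B', 'IsTestParticipant', 'true')]
```

Returning a flat list of tuples, rather than a dictionary keyed by pseudonym, avoids assuming that each subject appears in only one entry; by default, entries are not grouped per subject.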

### `Get`ting data

When the `pepcli list` command produces identifiers for data that was not inlined, the `pepcli get` command can subsequently be used to download the data associated with those identifiers. The command requires a `--ticket` switch specifying (the path to) a previously saved ticket file. Such a file can only be produced by invoking the `pepcli list` command with the `--ticket-out` switch.

@@@ more here @@@
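Data that was not inlined must be fetched separately. As a minimal sketch in Python, assuming the JSON structure described for `pepcli list`, the following collects every identifier from the entries' `ids` nodes so that each can later be passed to `pepcli get`, together with a ticket file saved via `--ticket-out`. The function name `identifiers_to_get` and the sample values are hypothetical. The sketch deliberately does not invoke `pepcli get`, because the exact command-line syntax for passing an identifier is not covered here.

```python
import json


def identifiers_to_get(list_output: str) -> list:
    """Return (pp, column, identifier) tuples for all data that was not inlined.

    Assumes the structure described for `pepcli list`: each entry's optional
    "ids" node maps column names to data identifiers.
    """
    pending = []
    for entry in json.loads(list_output):
        for column, identifier in entry.get("ids", {}).items():
            pending.append((entry["pp"], column, identifier))
    return pending


# Hypothetical sample; real pseudonyms and identifiers look different.
sample = """[
  {"pp": "PP-A", "data": {"IsTestParticipant": "false"},
   "ids": {"ExampleLargeColumn": "ID-1"}},
  {"pp": "PP-B", "ids": {"ExampleLargeColumn": "ID-2"}}
]"""

for pp, column, identifier in identifiers_to_get(sample):
    print(pp, column, identifier)
# PP-A ExampleLargeColumn ID-1
# PP-B ExampleLargeColumn ID-2
```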

## 2. List data from PEP e.g. to retrieve a short pseudonym