The pepcli application is the primary command line interface (CLI) for interacting with the PEP system. It is available for multiple platforms, and is included in PEP's Docker images and in the Windows client software installer. Among pepcli's functionalities are the ability to upload and download data, and to administer the PEP system.
The use of command line utilities such as pepcli is subject to details of the platform on which it is used. For example, a literal * (asterisk) parameter value must be escaped to \* on Linux to prevent shell expansion. Such details are not (extensively) covered in this documentation. Users are expected to be knowledgeable enough about their platforms to perform basic tasks and avoid common pitfalls.
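The effect of shell expansion can be observed without pepcli. The following sketch is not part of PEP: it defines two hypothetical helper functions, count_args and demo_glob, and uses a scratch directory containing two files to show how many arguments a command receives for an unquoted, an escaped and a quoted asterisk in a POSIX shell.

```shell
# Prints the number of arguments it receives.
count_args() { echo "$#"; }

# Runs in a subshell so the caller's working directory is left untouched.
demo_glob() (
  dir=$(mktemp -d) || exit 1
  cd "$dir" || exit 1
  touch a.txt b.txt
  echo "unquoted: $(count_args *)"   # * is expanded to a.txt and b.txt
  echo "escaped: $(count_args \*)"   # one literal * argument
  echo "quoted: $(count_args '*')"   # one literal * argument
  cd / && rm -rf "$dir"
)

demo_glob
```

Only the escaped and quoted forms pass a single literal asterisk on to the invoked program, which is what pepcli needs for its special column and participant groups.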
General usage
The pepcli utility must be invoked from a command line, with parameters telling it what to do. The general form of invocation is:
pepcli [general flags] <COMMAND> [command flags] [parameters...]
The various commands are documented in some detail on this page. The general flags are documented separately. Some commands have subcommands:
pepcli [general flags] <COMMAND> <SUBCOMMAND> [subcommand flags] [parameters...]
Command line help
The pepcli application provides command line help if it is invoked without parameters, or with the --help switch:
pepcli # Produces command line help
pepcli --help # Produces command line help
The --help switch is also supported by most of pepcli's commands and subcommands. This can be used to "drill down" through command line help to construct an appropriate command line, e.g. by sequentially invoking:
pepcli --help # Output includes the ama command, so we then issue:
pepcli ama --help # Output includes the query subcommand, so we then issue:
pepcli ama query --help # Output mentions the --column-group switch, so we then issue:
pepcli ama query --column-group ShortPseudonyms # The completed command line
Enrollment
Most of pepcli's commands will connect to one or more of the PEP servers, and most server requests will require the user to be enrolled. There are two primary methods to get enrolled for pepcli usage:
- Have an OAuth token issued and present it to pepcli's --oauth-token switch. Contact the PEP support team for more information on obtaining such a prefab OAuth token.
- From the directory where you intend to use pepcli, run the pepLogon utility and log on interactively. Your enrollment data will be stored to a file and will remain valid for a period of 12 hours. During this period, use pepcli without its --oauth-token switch to perform commands in the role for which you logged on.
Note that prefab OAuth tokens are usually issued for longer validity periods (think months rather than hours), which makes them suitable for running pepcli from automated server processes.
General flags
The pepcli utility's general flags can be used to indicate how to connect to and enroll with the PEP system:
- --client-working-directory specifies a directory containing configuration files specifying how to connect to PEP's servers. If not specified, these configuration files are assumed to be located in the directory containing the pepcli executable file.
- --client-config-name specifies the name of the main (client) configuration file. If not specified, the file is assumed to be named ClientConfig.json.
- --oauth-token specifies an OAuth token to be used for enrollment, or the path to a file containing such an OAuth token. If not specified, the user is assumed to have been enrolled prior to pepcli's invocation, e.g. by means of the pepLogon utility.
For brevity, these general flags will not be mentioned in the documentation of individual commands, or in the examples below. They may nevertheless be included with any command issued to pepcli, e.g.:
pepcli --oauth-token /PATH/TO/OAuthToken.json --client-working-directory /PATH/TO/config-directory list -C \* -P \*
Other general flags exist, but are intended for use by developers of the PEP system. While mentioned in pepcli's command line help, they are not further documented here.
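To avoid repeating the general flags on every invocation, they could be supplied by a small wrapper function. This is a minimal sketch, not something shipped with PEP: the function name pep and the environment variables PEP_TOKEN_FILE, PEP_CONFIG_DIR and PEPCLI_DRY_RUN are assumptions made for illustration. Only the --oauth-token and --client-working-directory flags are taken from the documentation above.

```shell
# Hypothetical environment variables (not defined by PEP itself):
#   PEP_TOKEN_FILE  path to a file containing an OAuth token
#   PEP_CONFIG_DIR  directory containing the client configuration files
#   PEPCLI_DRY_RUN  when non-empty, print the command instead of running it
pep() {
  if [ -z "${PEP_TOKEN_FILE:-}" ] || [ -z "${PEP_CONFIG_DIR:-}" ]; then
    echo "pep: set PEP_TOKEN_FILE and PEP_CONFIG_DIR first" >&2
    return 1
  fi
  # Put the general flags in front of the command and its own flags.
  set -- --oauth-token "$PEP_TOKEN_FILE" \
         --client-working-directory "$PEP_CONFIG_DIR" "$@"
  if [ -n "${PEPCLI_DRY_RUN:-}" ]; then
    # For inspection only: arguments are printed without re-quoting.
    printf '%s\n' "pepcli $*"
  else
    pepcli "$@"
  fi
}
```

With both variables set, pep list -C \* -P \* would then run the same command as the example above. Setting PEPCLI_DRY_RUN=1 shows the assembled command line without contacting any server.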
Commands
The pepcli utility supports commands for various tasks, aimed at different types of users:
General purpose:
- query provides information on the PEP environment.
Data storage and retrieval:
- get retrieves data from a specific cell.
- list lists data available in PEP.
- pull stores a data set in files on your local machine.
- store stores data in a specific cell.
Administrative tasks:
- ama provides subcommands to perform administrative tasks related to the Access Manager service.
  - ama query summarizes the current data structure and access rules.
  - For users enrolled as a Data Administrator:
    - ama column can be used to create and remove columns, and to group and un-group them.
    - ama columnGroup can be used to create and remove column groups.
  - For users enrolled as an Access Administrator:
    - ama cgar can be used to manage the type(s) of access that access groups have to column groups.
ama
The ama command's various subcommands can be used to perform administrative tasks. While ama is short for "Access Manager Administration", it should be noted that ama provides subcommands for both the Access Administrator and Data Administrator roles. Users must be enrolled for the role appropriate for the subcommand they're invoking.
ama cgar
The cgar subcommand is short for "column group access rule". It allows an Access Administrator to determine the types of access that access groups have to column groups:
pepcli ama cgar create <column group name> <access group name> <mode>
pepcli ama cgar remove <column group name> <access group name> <mode>
The column group must have been previously created by a Data Administrator using the pepcli ama columnGroup subcommand. The mode parameter must be either read or write, indicating the type of access to grant or revoke.
A rule created or removed with the cgar subcommand takes effect immediately: users enrolled for the specified access group will immediately be granted (or denied) access to the specified column group.
ama column
The column subcommand allows a Data Administrator to create and remove columns:
pepcli ama column create <column name>
pepcli ama column remove <column name>
Because of technical limitations, PEP column names may contain only printable ASCII characters. Additional restrictions apply to the names of columns into which Castor data are imported.
Note that column removal will not discard data present in those columns; it will merely make the column's contents inaccessible. Therefore:
- when users retrieve data from an earlier moment in time, those data may include columns that have since been removed.
- when a column is removed and later re-added, the newly added column will contain any data that had previously been stored under the same column name.
The column subcommand can also be used to group and un-group columns into column groups:
pepcli ama column addTo <column name> <column group name>
pepcli ama column removeFrom <column name> <column group name>
When columns are added to a column group, those columns immediately become available to users who can access the column group.
Column groups can be created and removed using the pepcli ama columnGroup subcommand. An Access Administrator can grant access to column groups using the pepcli ama cgar subcommand.
ama columnGroup
The columnGroup subcommand allows a Data Administrator to create and remove column groups:
pepcli ama columnGroup create <column group name>
pepcli ama columnGroup remove <column group name>
Because of technical limitations, column group names may contain only printable ASCII characters. Note that some column groups are predefined and/or automatically managed by PEP software.
Once a column group has been created, use the pepcli ama column subcommand to determine which columns are included in the group. An Access Administrator can grant access to column groups using the pepcli ama cgar subcommand.
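The column, columnGroup and cgar subcommands can be combined to make a new column available to an access group. The sketch below only prints the commands rather than running them, because the steps must be performed under different roles. The function name plan_column_access is hypothetical, and names containing spaces or shell metacharacters would need additional quoting.

```shell
# Prints the pepcli commands that expose a new column to an access group.
# Arguments: <column> <column group> <access group> <mode>
plan_column_access() {
  column=$1 group=$2 access_group=$3 mode=$4
  # The cgar subcommand accepts only these two modes.
  case $mode in
    read|write) ;;
    *) echo "mode must be read or write" >&2; return 1 ;;
  esac
  echo "# As Data Administrator:"
  echo "pepcli ama columnGroup create $group"
  echo "pepcli ama column create $column"
  echo "pepcli ama column addTo $column $group"
  echo "# As Access Administrator:"
  echo "pepcli ama cgar create $group $access_group $mode"
}
```

For example, with made-up names, plan_column_access Weight Measurements Researchers read prints the four commands grouped by the role that has to run them.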
ama query
The query subcommand summarizes the current state of PEP's data structure and access rules. Both the Access Administrator and Data Administrator roles can invoke:
pepcli ama query
The output lists:
- The Columns that have been defined by a Data Administrator.
- The ColumnGroups that have been defined by a Data Administrator, and the columns included in each column group.
- The ColumnGroupAccessRules that have been defined by an Access Administrator, i.e. which access groups have what type(s) of access to which column groups.
- The (participant) Groups that have been defined by a Data Administrator, and the participants included in each group.
- The (participant) GroupAccessRules that have been defined by an Access Administrator, i.e. which access groups have what type(s) of access to which participant groups.
get
After you have listed data, you have identifiers for those data entries that were not inlined in the output. You can retrieve the data for such an entry as follows:
pepcli get -t <ticket file> -i <identifier> -o <output file>
The flags are:
- -t The ticket you stored with the -T flag of the list command.
- -i The identifier you got from pepcli list.
- -o The file to write the output to. A single hyphen (-) indicates stdout. This is the default.
- -m Also retrieve the metadata and write it to the given file. A single hyphen (-) indicates stdout. Default is to not retrieve metadata. If you do not know what you would need metadata for, then you probably don't need it.
list
You can list data e.g. as follows:
pepcli list -C <column group> -P <participant group> -T <ticket out file>
This will list the data that is in PEP, in JSON format. If a data entry is short enough, it will be displayed directly (inlined) in the output. For larger entries it will display an identifier, which can be used with the get command. There are flags to change this behaviour.
Important flags are:
- -C Column group to list data for. Can be repeated if you want data for more than one column group. There is a special column group * that contains all columns.
- -c Specific column to list data for. Can be repeated, and combined with -C if you want multiple columns and column groups.
- -P Participant group to list data for. Can be repeated if you want data for more than one participant group. There is a special participant group * that contains all participants.
- -p Specific participant to list data for. Can be repeated, and combined with -P if you want multiple participants and participant groups.
- -l Include the local pseudonyms in the output. By default pepcli will only show polymorphic pseudonyms (PP). These are not constant, and cannot be used to see whether data belongs to the same participant. You need the local pseudonyms (LP) for that.
- -T The first thing PEP does when you interact with it is check whether you have access to the participant(group)s and column(group)s you request. If you do have access, it will hand out a ticket. You can store this ticket with the -T flag, to use it for later actions.
- -t You can pass a ticket from an earlier request with the -t flag. The column(group)s and participant(group)s of this request must be a subset of the earlier request.
- -s The size limit (in bytes) for data that should be inlined. Currently defaults to 1000. Setting this to 0 means that data will ALWAYS be inlined.
- --no-inline-data Never inline data.
- -g Data MAY show up grouped, when it belongs to the same participant. By default this depends on the order in which data comes in, so this grouping is not guaranteed. Use -g to force grouping of data. This may impact performance.
Note: Shells use * for globbing. We do not want this behaviour, so make sure you escape the asterisk with a backslash (\*) or enclose it in quotes ("*") when invoking pepcli from e.g. bash.
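Because -C can be repeated, a command line covering several column groups can be assembled in a loop. The following sketch prints (but does not run) such a command for all participants, with the asterisk quoted as the note above requires. The function name list_command is hypothetical, and column group names are assumed to contain no spaces.

```shell
# Prints a pepcli list command for all participants and the given column groups.
# Arguments: <ticket file> <column group>...
list_command() {
  ticket=$1
  shift
  flags=""
  # Add one -C flag per requested column group.
  for group in "$@"; do
    flags="$flags -C $group"
  done
  echo "pepcli list$flags -P '*' -T $ticket"
}
```

Calling list_command ticket.out Example SomeData prints a single list command with two -C flags, which can be inspected before it is executed.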
pull
The pull command downloads a data set from PEP and stores the data in files. If you need more fine-grained control, use the list and get commands instead.
pepcli pull -C <column group> -P <participant group>
By default, this will store the data in the directory pulled-data.
Important flags are:
- -C Column group to download data for. Can be repeated if you want data for more than one column group. There is a special column group * that contains all columns.
- -c Specific column to download data for. Can be repeated, and combined with -C if you want multiple columns and column groups.
- -P Participant group to download data for. Can be repeated if you want data for more than one participant group. There is a special participant group * that contains all participants.
- -o Directory to write files to. Default is pulled-data.
- -f Overwrite or remove existing data in output directory.
- -r Resume an interrupted download.
- -u Updates an existing output directory, e.g. when new data is available. This will use the same participant(group)s and column(group)s as the original download, so -c, -C, -p and -P are not allowed with this flag.
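The restriction on -u means that a first download and a later refresh need different command lines. The sketch below chooses between them based on whether the default pulled-data directory already exists in the current directory. It prints the command instead of running it. The function name pull_command is hypothetical, and it is an assumption that -u on its own is sufficient to refresh the default output directory.

```shell
# Prints the pull command appropriate for the current directory.
# Arguments: <column group> <participant group>
pull_command() {
  if [ -d pulled-data ]; then
    # An earlier download exists: -u reuses its columns and participants,
    # so -C and -P must not be passed.
    echo "pepcli pull -u"
  else
    # First download: the participant group is quoted so that * stays literal.
    echo "pepcli pull -C $1 -P '$2'"
  fi
}
```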
query
Use the pepcli query command to query the system. It currently supports only a single subcommand, which lists the columns accessible to you:
pepcli query column-access
The output will include all columns and column groups accessible to you, as well as whether you have read and/or write access. Note that such access depends on the access group for which you are enrolled. Enroll for a specific access group:
- either by means of a prior call to pepLogon,
- or by passing an appropriate OAuth token to pepcli.
store
You can store data with this command:
pepcli store -c <column name> -p <participant> -i /PATH/TO/DATA/FILE
This will output the identifier of the stored entry.
The flags are:
- -c The column to store the data in.
- -p Either the participant identifier or the polymorphic pseudonym (PP) to store data for. PPs can be obtained with pepcli list.
- -i Path to the file to store. A single hyphen (-) means stdin, and is the default.
- -d Data to store. Use either this or -i.
- -T By default, pepcli will request a write-only ticket. You can use -T and give a path to store the ticket in. If you use this flag, pepcli will also request read access to the entry that is stored. You can then use the ID in the output, together with this ticket, for pepcli get. This way you can check whether the data was stored correctly. Note that pepcli also performs its own checks to see whether the data was stored correctly.
Specific examples
Let's say we have the following column groups, with the listed columns:
- Example
  - ExampleColumn1
  - ExampleColumn2
  - ExampleColumn3
  - ExampleColumn4
- SomeData
  - SomeData.ColumnA
  - SomeData.ColumnB
  - SomeData.ColumnC
Each participant must have a participant identifier, e.g. CP1234567890123, provided by you. This will be the seed of the pseudonymization performed by PEP. These identifiers are very sensitive, and may NOT be sent to others. Instead, others must use the pseudonyms created by PEP.
To store data from the file data.txt, for the participant identified by CP1234567890123, in column ExampleColumn1:
pepcli store -c ExampleColumn1 -p CP1234567890123 -i data.txt
To store the string Hello World in column SomeData.ColumnA:
pepcli store -c SomeData.ColumnA -p CP1234567890123 -d "Hello World"
To list the data for column groups Example and SomeData, and store the ticket in the file ticket.out:
pepcli list -C Example -C SomeData -P \* -T ticket.out
If you executed the previous examples, and data.txt was sufficiently large (>1000 bytes), this will display an ID for the column ExampleColumn1. You can get the data (printed to stdout) with:
pepcli get -t ticket.out -i <IDENTIFIER FROM LIST>