Last edited by Kai van Lopik Feb 25, 2021

Using pepcli

The pepcli application is the primary command line interface (CLI) application to interact with the PEP system. It is available for multiple platforms, and is included in PEP's Docker images @@@ which one(s)? @@@ and in the Windows client software installer. Among pepcli's functionalities are the ability to upload and download data, and to administer the PEP system.

The use of command line utilities such as pepcli is subject to details of the platform on which it is used. For example, on Linux a literal * (asterisk) parameter value must be escaped as \* to prevent shell expansion. Such details are not covered extensively in this documentation. Users are expected to know their platforms well enough to perform basic tasks and avoid common pitfalls.

General usage

The pepcli utility must be invoked from a command line, with parameters telling it what to do. The general form of invocation is:

pepcli [general flags] <COMMAND> [command flags] [parameters...]

The individual commands are documented on this page, and the general flags are described in a section of their own. Some commands have subcommands:

pepcli [general flags] <COMMAND> <SUBCOMMAND> [subcommand flags] [parameters...]
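When pepcli is called from a script rather than typed at a prompt, it can help to assemble the command line as an argument list that follows the general form above. The following Python sketch is illustrative only. The function name pepcli_argv is hypothetical and not part of PEP, and the function orders the pieces without validating any command or flag.

```python
from typing import Optional, Sequence


def pepcli_argv(
    command: str,
    subcommand: Optional[str] = None,
    general_flags: Sequence[str] = (),
    command_flags: Sequence[str] = (),
    parameters: Sequence[str] = (),
) -> list[str]:
    """Order the parts of a pepcli invocation as documented:

    pepcli [general flags] <COMMAND> [<SUBCOMMAND>] [(sub)command flags] [parameters...]
    """
    argv = ["pepcli", *general_flags, command]
    if subcommand is not None:
        argv.append(subcommand)
    argv.extend(command_flags)
    argv.extend(parameters)
    return argv
```

For example, pepcli_argv("ama", "query", command_flags=["--column-group", "ShortPseudonyms"]) yields the argument list for pepcli ama query --column-group ShortPseudonyms.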

Command line help

The pepcli application provides command line help if it is invoked without parameters, or with the --help switch:

pepcli         # Produces command line help
pepcli --help  # Produces command line help

The --help switch is also supported by most of pepcli's commands and subcommands. It can be used to "drill down" through the command line help to construct an appropriate command line, e.g. by sequentially invoking:

pepcli --help                                    # Output includes the ama command, so we then issue:
pepcli ama --help                                # Output includes the query subcommand, so we then issue:
pepcli ama query --help                          # Output mentions the --column-group switch, so we then issue:
pepcli ama query --column-group ShortPseudonyms  # The completed command line

Enrollment

Most of pepcli's commands connect to one or more of the PEP servers, and most server requests require the user to be enrolled. There are two primary methods to get enrolled for pepcli usage:

  • Have an OAuth token issued and present it to pepcli's --oauth-token switch. Contact the PEP support team for more information on obtaining such a prefab OAuth token.
  • From the directory where you intend to use pepcli, run the pepLogon utility and log on interactively. Your enrollment data will be stored in a file and will remain valid for 12 hours. During this period, use pepcli without its --oauth-token switch to perform commands in the role for which you logged on.

Note that prefab OAuth tokens are usually issued for longer validity periods (think months rather than hours), e.g. making them usable to execute pepcli from automatic server processes.

General flags

The pepcli utility's general flags can be used to indicate how to connect to and enroll with the PEP system:

  • --client-working-directory specifies a directory containing configuration files specifying how to connect to PEP's servers. If not specified, these configuration files are assumed to be located in the directory containing the pepcli executable file.
  • --client-config-name specifies the name of the main (client) configuration file. If not specified, the file is assumed to be named ClientConfig.json.
  • --oauth-token specifies an OAuth token to be used for enrollment, or the path to a file containing such an OAuth token. If not specified, the user is assumed to have been enrolled prior to pepcli's invocation, e.g. by means of the pepLogon utility.

For brevity, these general flags will not be mentioned in the documentation of individual commands, or in the examples below. But they may be included with any command issued to pepcli, e.g.:

pepcli --oauth-token /PATH/TO/OAuthToken.json --client-working-directory /PATH/TO/config-directory list -C \* -P \*

Other general flags exist, but are intended for use by developers of the PEP system. While mentioned in pepcli's command line help, they are not further documented here.
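A script that runs pepcli unattended, for example with a prefab OAuth token, might build the general flags from optional settings. It would pass only the flags it needs, so that pepcli's documented defaults apply to the rest. This is a minimal Python sketch. The helper name general_flags is hypothetical and not part of PEP, and it covers only the three flags documented above.

```python
from typing import Optional


def general_flags(
    oauth_token: Optional[str] = None,
    client_working_directory: Optional[str] = None,
    client_config_name: Optional[str] = None,
) -> list[str]:
    """Build pepcli's general flags, omitting any that are not set.

    Omitted flags fall back to pepcli's documented defaults: configuration
    files next to the pepcli executable, a main configuration file named
    ClientConfig.json, and prior enrollment (e.g. via pepLogon).
    """
    flags: list[str] = []
    if oauth_token is not None:
        # Either the token itself or the path to a file containing it.
        flags += ["--oauth-token", oauth_token]
    if client_working_directory is not None:
        flags += ["--client-working-directory", client_working_directory]
    if client_config_name is not None:
        flags += ["--client-config-name", client_config_name]
    return flags
```

The resulting list goes between "pepcli" and the command name, as in the example command line above.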

Commands

The pepcli utility supports commands for various tasks, aimed at different types of users:

General purpose:

  • query provides information on the PEP environment.

Data storage and retrieval:

  • get retrieves data from a specific cell.
  • list lists data available in PEP.
  • pull stores a data set in files on your local machine.
  • store stores data in a specific cell.

Administrative tasks:

  • ama provides subcommands to perform administrative tasks related to the Access Manager service.
    • ama query summarizes the current data structure and access rules.
    • For users enrolled as a Data Administrator:
      • ama column can be used to create and remove columns, and to group and un-group them.
      • ama columnGroup can be used to create and remove column groups.
    • For users enrolled as an Access Administrator:
      • ama cgar can be used to manage the type(s) of access that access groups have to column groups.

ama

The ama command's various subcommands can be used to perform administrative tasks. Although ama is short for "Access Manager Administration", it provides subcommands for both the Access Administrator and Data Administrator roles. Users must be enrolled for the role appropriate for the subcommand they're invoking.

ama cgar

The cgar subcommand is short for "column group access rule". It allows Access Administrators to determine the types of access that access groups have to column groups:

pepcli ama cgar create <column group name> <access group name> <mode>
pepcli ama cgar remove <column group name> <access group name> <mode>

The column group must have been previously created by a Data Administrator using the pepcli ama columnGroup subcommand. The mode parameter must be either read or write, indicating the type of access to grant or revoke.

Rules created or removed with the cgar subcommand take effect immediately: users enrolled for the specified access group are granted (or denied) access to the specified column group right away.
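Because the rule takes effect immediately, a script that manages access rules may want to reject invalid arguments before pepcli is ever invoked. This minimal Python sketch checks only what this page states. The action must be create or remove, and the mode must be read or write. The function name cgar_argv and the access group name used in the usage note are hypothetical.

```python
CGAR_ACTIONS = {"create", "remove"}
CGAR_MODES = {"read", "write"}


def cgar_argv(action: str, column_group: str, access_group: str, mode: str) -> list[str]:
    """Build a `pepcli ama cgar` command line after validating action and mode.

    Whether the column group and access group exist is not checked here;
    pepcli itself will report that.
    """
    if action not in CGAR_ACTIONS:
        raise ValueError(f"action must be one of {sorted(CGAR_ACTIONS)}, got {action!r}")
    if mode not in CGAR_MODES:
        raise ValueError(f"mode must be one of {sorted(CGAR_MODES)}, got {mode!r}")
    return ["pepcli", "ama", "cgar", action, column_group, access_group, mode]
```

For example, cgar_argv("create", "SomeData", "Researchers", "read") builds the command that would grant a (hypothetical) Researchers access group read access to the SomeData column group.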

ama column

The column subcommand allows Data Administrators to create and remove columns:

pepcli ama column create <column name>
pepcli ama column remove <column name>

Because of technical limitations, PEP column names may contain only printable ASCII characters. Additional restrictions apply to the names of columns into which Castor data are imported @@@ link and/or describe @@@.
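A script that creates many columns or column groups could check their names up front. The sketch below makes two assumptions that this page does not confirm. It treats "printable ASCII" as code points 0x20 (space) through 0x7E, and it treats an empty name as never valid. It does not check the additional restrictions for columns that receive Castor data. The function name is hypothetical.

```python
def is_printable_ascii_name(name: str) -> bool:
    """Return True if `name` is non-empty and consists solely of printable
    ASCII characters, taken here to be code points 0x20 (space) to 0x7E (~).
    PEP may impose further restrictions that this check does not cover."""
    return bool(name) and all(" " <= ch <= "~" for ch in name)
```

With these assumptions, SomeData.ColumnA passes, while a name containing ö or a tab character is rejected.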

Note that removing a column does not discard the data stored in it; it merely makes the column's contents inaccessible. Therefore:

  • when users retrieve data from an earlier moment in time, those data may include columns that have since been removed.
  • when a column is removed and later re-added, the newly added column will contain any data that had previously been stored in the same column name.

The column subcommand can also be used to group and un-group columns into column groups:

pepcli ama column addTo      <column name> <column group name>
pepcli ama column removeFrom <column name> <column group name>

When columns are added to a column group, those columns immediately become available to users who can access the column group.

Column groups can be created and removed using the pepcli ama columnGroup subcommand. Access Administrators can grant access to column groups using the pepcli ama cgar subcommand.

ama columnGroup

The columnGroup subcommand allows Data Administrators to create and remove column groups:

pepcli ama columnGroup create <column group name>
pepcli ama columnGroup remove <column group name>

Because of technical limitations, column group names may contain only printable ASCII characters. Note that some column groups are predefined and/or automatically managed by PEP software.

Once a column group has been created, use the pepcli ama column subcommand to determine which columns are included in the group. Access Administrators can grant access to column groups using the pepcli ama cgar subcommand.
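Setting up a new column group therefore takes several steps, run under two different roles. The column group must exist before an access rule can refer to it, and columns are added to it with ama column addTo. The Python sketch below lists those steps in order. It does not run anything: each command still has to be issued while enrolled for the role shown. It assumes the columns do not exist yet. The function name and the access group name Researchers are hypothetical.

```python
def column_group_setup_plan(
    column_group: str,
    columns: list[str],
    access_group: str,
    mode: str = "read",
) -> list[tuple[str, list[str]]]:
    """Return, in order, the pepcli commands that create a column group,
    create and add the given columns, and grant one access group access.
    Each command is paired with the role that must be enrolled to run it."""
    data_admin, access_admin = "Data Administrator", "Access Administrator"
    plan = [(data_admin, ["pepcli", "ama", "columnGroup", "create", column_group])]
    for column in columns:
        plan.append((data_admin, ["pepcli", "ama", "column", "create", column]))
        plan.append((data_admin, ["pepcli", "ama", "column", "addTo", column, column_group]))
    # Last step: the column group must already exist before an access rule can refer to it.
    plan.append((access_admin, ["pepcli", "ama", "cgar", "create", column_group, access_group, mode]))
    return plan


for role, argv in column_group_setup_plan(
    "SomeData", ["SomeData.ColumnA", "SomeData.ColumnB"], "Researchers"
):
    print(f"[{role}] {' '.join(argv)}")
```

Keeping the role next to each command makes the hand-over from Data Administrator to Access Administrator explicit.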

ama query

The query subcommand summarizes the current state of PEP's data structure and access rules. Both the Access Administrator and Data Administrator roles can invoke:

pepcli ama query

The output lists:

  • The Columns that have been defined by Data Administrators.
  • The ColumnGroups that have been defined by Data Administrators, and the columns included in each column group.
  • The ColumnGroupAccessRules that have been defined by Access Administrators, i.e. which access groups have what type(s) of access to which column groups.
  • The (participant) Groups that have been defined by Data Administrators, and the participants included in each group.
  • The (participant) GroupAccessRules that have been defined by Access Administrators, i.e. which access groups have what type(s) of access to which participant groups.

get

When pepcli list does not inline a data entry, it displays an identifier for that entry instead. You can retrieve such data as follows:

pepcli get -t <ticket file> -i <identifier> -o <output file>

The flags are:

  • -t The ticket you stored with the -T flag of the list command
  • -i The identifier you got from pepcli list
  • -o The file to write the output to. - indicates stdout, which is the default.
  • -m Also retrieve the metadata and write it to the given file. - indicates stdout. Default is to not retrieve metadata. If you do not know what you would need metadata for, then you probably don't need it.
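A script that retrieves entries one by one could build the get command as follows. It passes -o only when writing to a file, because stdout is already the default, and it passes -m only when metadata is wanted. This is a minimal Python sketch. The function name get_argv is hypothetical, and the identifier in the usage note is a placeholder.

```python
from typing import Optional


def get_argv(
    ticket_file: str,
    identifier: str,
    output: str = "-",
    metadata: Optional[str] = None,
) -> list[str]:
    """Build a `pepcli get` command line using the flags documented above."""
    argv = ["pepcli", "get", "-t", ticket_file, "-i", identifier]
    if output != "-":
        # "-" (stdout) is pepcli's default, so -o is only needed for a file.
        argv += ["-o", output]
    if metadata is not None:
        # Metadata is only retrieved when -m is given ("-" means stdout).
        argv += ["-m", metadata]
    return argv
```

For example, get_argv("ticket.out", "<IDENTIFIER FROM LIST>", output="data.bin") writes the entry to data.bin using a ticket previously stored with pepcli list -T.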

list

You can list data e.g. as follows:

pepcli list -C <column group> -P <participant group> -T <ticket out file>

This lists the data in PEP in JSON format. If a data entry is small enough, it is displayed directly in the output (inlined). For larger entries, an identifier is displayed instead, which can be used with the get command. There are flags to change this behaviour.

Important flags are:

  • -C Column group to list data for. Can be repeated if you want data for more than one column group. There is a special column group * that contains all columns.
  • -c Specific column to list data for. Can be repeated, and combined with -C if you want multiple columns and column groups.
  • -P Participant group to list data for. Can be repeated if you want data for more than one participant group. There is a special participant group * that contains all participants.
  • -p Specific participant to list data for. Can be repeated, and combined with -P if you want multiple participants and participant groups.
  • -l Include the local pseudonyms in the output. By default, pepcli only shows polymorphic pseudonyms (PP). These are not constant, and cannot be used to see whether data belongs to the same participant. You need the local pseudonyms (LP) for that.
  • -T The first thing PEP does when you interact with it is to check whether you have access to the participant(group)s and column(group)s you request. If you do, it hands out a ticket. You can store this ticket with the -T flag, to use it for later actions.
  • -t You can pass a ticket from an earlier request with the -t flag. The column(group)s and participant(group)s of this request must be a subset of those of the earlier request.
  • -s The size limit (in bytes) for data that should be inlined. Currently defaults to 1000. Setting this to 0 means that data will ALWAYS be inlined.
  • --no-inline-data Never inline data.
  • -g Data MAY show up grouped, when it belongs to the same participant. By default this depends on the order in which data comes in, so this grouping is not guaranteed. Use -g to force grouping of data. This may impact performance.
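Because -C, -c, -P and -p can each be repeated, a script typically emits each flag once per value. This is a minimal Python sketch covering a subset of the flags above. The function name list_argv is hypothetical. The special * group is passed as a plain string: when the argument list is handed directly to a program rather than to a shell, no escaping is involved.

```python
from typing import Iterable, Optional


def list_argv(
    column_groups: Iterable[str] = (),
    columns: Iterable[str] = (),
    participant_groups: Iterable[str] = (),
    participants: Iterable[str] = (),
    ticket_out: Optional[str] = None,
    local_pseudonyms: bool = False,
    group_by_participant: bool = False,
) -> list[str]:
    """Build a `pepcli list` command line from the documented flags."""
    argv = ["pepcli", "list"]
    # Each repeatable flag is emitted once per value.
    for flag, values in (("-C", column_groups), ("-c", columns),
                         ("-P", participant_groups), ("-p", participants)):
        for value in values:
            argv += [flag, value]
    if local_pseudonyms:
        argv.append("-l")  # include local pseudonyms (LP)
    if group_by_participant:
        argv.append("-g")  # force grouping by participant
    if ticket_out is not None:
        argv += ["-T", ticket_out]  # store the ticket for later use
    return argv
```

For example, list_argv(column_groups=["Example", "SomeData"], participant_groups=["*"], ticket_out="ticket.out") corresponds to pepcli list -C Example -C SomeData -P \* -T ticket.out typed in bash.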

Note: Shells use * for globbing. We do not want this behaviour, so make sure you escape it with a backslash or double quotes when invoking pepcli from e.g. bash.
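Escaping is only needed when a shell parses the command line. Python's subprocess.run, given a list of arguments, starts the program directly (shell=False is the default), so a literal * reaches the program unchanged. The sketch below demonstrates this with a small Python child process standing in for pepcli, so it runs without PEP installed. When a shell command string cannot be avoided, shlex.join quotes each argument for POSIX shells. The helper name args_seen_by_child is hypothetical.

```python
import shlex
import subprocess
import sys


def args_seen_by_child(*args: str) -> list[str]:
    """Start a child process with an argument list (no shell in between)
    and return the arguments it received, one per line of its output."""
    # Stand-in for pepcli: a tiny Python program that echoes its arguments.
    child = [sys.executable, "-c", "import sys; print('\\n'.join(sys.argv[1:]))"]
    result = subprocess.run(child + list(args), capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# No shell, so no globbing: the child receives a literal "*".
print(args_seen_by_child("-C", "*", "-P", "*"))  # ['-C', '*', '-P', '*']

# When a shell command string is unavoidable, quote each argument (POSIX shells).
print(shlex.join(["pepcli", "list", "-C", "*", "-P", "*"]))  # pepcli list -C '*' -P '*'
```

Single quotes, as produced by shlex.join, work as well as the backslash or double quotes mentioned above.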

pull

The pull command downloads a data set from PEP and stores the data in files. If you need more fine-grained control, use the list and get commands instead.

pepcli pull -C <column group> -P <participant group>

This will by default store the data to the directory pulled-data.

Important flags are:

  • -C Column group to pull data for. Can be repeated if you want data for more than one column group. There is a special column group * that contains all columns.
  • -c Specific column to pull data for. Can be repeated, and combined with -C if you want multiple columns and column groups.
  • -P Participant group to pull data for. Can be repeated if you want data for more than one participant group. There is a special participant group * that contains all participants.
  • -o Directory to write files to. Default is pulled-data.
  • -f Overwrite or remove existing data in the output directory.
  • -r Resume an interrupted download.
  • -u Update an existing output directory, e.g. when new data is available. This uses the same participant(group)s and column(group)s as the original download, so -c, -C, -p and -P are not allowed with this flag.
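A script that periodically refreshes a download should not combine -u with a new selection. The minimal Python sketch below enforces that rule, which is stated above, before building the command. It assumes -f, -r and -u are switches that take no value, as their descriptions suggest. The function name pull_argv is hypothetical.

```python
from typing import Iterable, Optional


def pull_argv(
    column_groups: Iterable[str] = (),
    columns: Iterable[str] = (),
    participant_groups: Iterable[str] = (),
    participants: Iterable[str] = (),
    output_dir: Optional[str] = None,
    force: bool = False,
    resume: bool = False,
    update: bool = False,
) -> list[str]:
    """Build a `pepcli pull` command line from the documented flags."""
    selection: list[str] = []
    for flag, values in (("-C", column_groups), ("-c", columns),
                         ("-P", participant_groups), ("-p", participants)):
        for value in values:
            selection += [flag, value]
    if update and selection:
        # -u reuses the original download's selection.
        raise ValueError("-c, -C, -p and -P are not allowed together with -u")
    argv = ["pepcli", "pull", *selection]
    if output_dir is not None:
        argv += ["-o", output_dir]  # default is pulled-data
    if force:
        argv.append("-f")
    if resume:
        argv.append("-r")
    if update:
        argv.append("-u")
    return argv
```

For example, pull_argv(update=True) refreshes the default pulled-data directory, while pull_argv(column_groups=["Example"], update=True) raises an error.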

query

Use the pepcli query command to query the system. It currently supports only a single subcommand, which lists the columns accessible to you:

pepcli query column-access

The output will include all columns and column groups accessible to you, as well as whether you have read and/or write access. Note that such access depends on the access group for which you are enrolled. Enroll for a specific access group:

  • either by means of a prior call to pepLogon,
  • or by passing an appropriate OAuth token to pepcli.

store

You can store data with this command:

pepcli store -c <column name> -p <participant> -i /PATH/TO/DATA/FILE

This will output the identifier of the stored entry.

The flags are:

  • -c The column to store the data in
  • -p Either the participant identifier or the polymorphic pseudonym to store data for. PPs can be obtained with pepcli list.
  • -i Path to the file to store. - means stdin, and is the default.
  • -d Data to store. Use either this or -i.
  • -T By default, pepcli will request a write-only ticket. You can use -T and give a path to store the ticket in. If you use this flag, pepcli will also request read access to the entry that is stored. You can then use the ID in the output, together with this ticket for pepcli get. This way you can check whether the data was stored correctly. Note that pepcli also performs its own checks to see whether the data was stored correctly.
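Since -i and -d are alternatives, a script can refuse to pass both. This is a minimal Python sketch using only the flags listed above. When neither -i nor -d is given, pepcli reads from stdin, the documented default for -i. The function name store_argv is hypothetical.

```python
from typing import Optional


def store_argv(
    column: str,
    participant: str,
    input_path: Optional[str] = None,
    data: Optional[str] = None,
    ticket_out: Optional[str] = None,
) -> list[str]:
    """Build a `pepcli store` command line; -i and -d are mutually exclusive."""
    if input_path is not None and data is not None:
        raise ValueError("use either -i (input file) or -d (data), not both")
    argv = ["pepcli", "store", "-c", column, "-p", participant]
    if data is not None:
        argv += ["-d", data]
    elif input_path is not None:
        argv += ["-i", input_path]
    # With neither -d nor -i, pepcli reads the data from stdin.
    if ticket_out is not None:
        # Also requests read access, so the stored entry can be checked with pepcli get.
        argv += ["-T", ticket_out]
    return argv
```

For example, store_argv("SomeData.ColumnA", "CP1234567890123", data="Hello World") builds the command from the examples that follow.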

Specific examples

Let's say we have the following column groups, with the listed columns:

  • Example
    • ExampleColumn1
    • ExampleColumn2
    • ExampleColumn3
    • ExampleColumn4
  • SomeData
    • SomeData.ColumnA
    • SomeData.ColumnB
    • SomeData.ColumnC

Each participant must have a participant identifier, e.g. CP1234567890123, provided by you. This identifier is the seed of the pseudonymization performed by PEP. Participant identifiers are very sensitive and may NOT be sent to others; others must use the pseudonyms created by PEP instead.

To store data from the file data.txt in column ExampleColumn1 for the participant identified by CP1234567890123:

pepcli store -c ExampleColumn1 -p CP1234567890123 -i data.txt

To store the string Hello World in column SomeData.ColumnA for the same participant:

pepcli store -c SomeData.ColumnA -p CP1234567890123 -d "Hello World"

To list the data for column groups Example and SomeData, and store the ticket in the file ticket.out:

pepcli list -C Example -C SomeData -P \* -T ticket.out

If you executed the previous examples and data.txt was sufficiently large (>1000 bytes), this will display an ID for the column ExampleColumn1. You can get the data (printed to stdout) with:

pepcli get -t ticket.out -i <IDENTIFIER FROM LIST>