Pseudonymisation


Last updated 3 years ago

It's possible to define your own format for automatically generated identifiers that increment sequentially. These can be used as pseudonyms for your rows. On this page we will use the following sequence, but you can configure the prefix and number of digits to your liking.

GEN-0000001
GEN-0000002
GEN-0000003
etc.

Besides incrementing sequentially, the digit part of the ids can also be scrambled:

GEN-5720385
GEN-1398822
GEN-9401776
etc.

These are not random: the generated ids are guaranteed to be unique and are based on the incrementing sequence.

Note: Be aware that the scrambled identifiers will repeat when all possibilities are exhausted. So make sure that the length of the digit-part is sufficient for your use case!
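The properties described above (a fixed prefix, a fixed number of digits, scrambled values that are unique but eventually repeat) can be illustrated with a small sketch. The scrambling below is a simple modular multiplication chosen purely for illustration; it is an assumption and not the algorithm MOLGENIS uses. The function names and the multiplier are hypothetical.

```python
from math import gcd


def format_id(n: int, prefix: str = "GEN-", digits: int = 7) -> str:
    """Format a sequence value as a zero-padded identifier."""
    return f"{prefix}{n:0{digits}d}"


def scramble(n: int, digits: int = 7, multiplier: int = 7_654_321) -> int:
    """Map n to a number in [0, 10**digits) without collisions.

    Illustrative only: NOT the scrambling that MOLGENIS applies.
    Multiplying by a number coprime with 10**digits is a bijection
    on the range, so every input below 10**digits gets a unique output.
    """
    space = 10 ** digits
    if gcd(multiplier, space) != 1:
        raise ValueError("multiplier must be coprime with 10**digits")
    return (n * multiplier) % space


# Sequential ids: GEN-0000001, GEN-0000002, ...
sequential = [format_id(n) for n in (1, 2, 3)]

# Scrambled ids are derived from the same incrementing sequence.
scrambled = [format_id(scramble(n)) for n in (1, 2, 3)]
```

With only 3 digits there are 1000 possible values, so in this sketch `scramble(n, digits=3)` and `scramble(n + 1000, digits=3)` yield the same number. This is the repetition the note above warns about.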

How to configure

To configure an attribute as an incrementing identifier, first make sure the attribute has the following properties:

idAttribute: AUTO
dataType: string

To begin, create two tags. You can add them to the tags sheet of your EMX file or create them in the Data Explorer. In the tag with relation IRI obo:IAO_0000596 you specify the number of digits. In the tag with relation IRI obo:IAO_0000599 you specify the ID prefix. The actual values you want to use go in the value attribute. Here are the tags for the sequence used on this page:

| identifier | relationLabel | relationIRI | value |
| --- | --- | --- | --- |
| hasIDDigitCount7 | hasIDDigitCount | http://purl.obolibrary.org/obo/IAO_0000596 | 7 |
| hasIDPrefixGen | hasIDPrefix | http://purl.obolibrary.org/obo/IAO_0000599 | GEN- |

Now it's time to add the tags to your id-attribute. You can do this in the Data Explorer (by opening the Attribute table and editing your attribute), or you can make it part of the model in your EMX file:

| name | dataType | nillable | idAttribute | tags |
| --- | --- | --- | --- | --- |
| pseudonym | string | false | AUTO | hasIDDigitCount7,hasIDPrefixGen |

If you want the identifiers to be scrambled, you should also add the scrambled tag to the attribute:

| name | dataType | nillable | idAttribute | tags |
| --- | --- | --- | --- | --- |
| pseudonym | string | false | AUTO | hasIDDigitCount7,hasIDPrefixGen,scrambled |
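If you maintain your EMX model as CSV sheets, the rows above could be generated along the following lines. This is a minimal sketch that uses only the columns shown on this page. A real attributes sheet will likely need further columns (for example one naming the entity the attribute belongs to), and the helper name `to_csv` is hypothetical.

```python
import csv
import io

TAG_COLUMNS = ["identifier", "relationLabel", "relationIRI", "value"]
ATTRIBUTE_COLUMNS = ["name", "dataType", "nillable", "idAttribute", "tags"]


def to_csv(columns, rows):
    """Serialise a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The two tags that define the digit count and the prefix.
tags_csv = to_csv(TAG_COLUMNS, [
    {"identifier": "hasIDDigitCount7",
     "relationLabel": "hasIDDigitCount",
     "relationIRI": "http://purl.obolibrary.org/obo/IAO_0000596",
     "value": "7"},
    {"identifier": "hasIDPrefixGen",
     "relationLabel": "hasIDPrefix",
     "relationIRI": "http://purl.obolibrary.org/obo/IAO_0000599",
     "value": "GEN-"},
])

# The id attribute, here with the scrambled tag included.
attributes_csv = to_csv(ATTRIBUTE_COLUMNS, [
    {"name": "pseudonym",
     "dataType": "string",
     "nillable": "false",
     "idAttribute": "AUTO",
     "tags": "hasIDDigitCount7,hasIDPrefixGen,scrambled"},
])
```

Because the tags cell contains commas, the `csv` module wraps it in double quotes, which keeps the tag list in a single column.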

Endpoints

To keep track of a sequence's current value, it is stored in the database. To interact with a sequence, the following endpoints are available:

GET <host>/plugin/metadata-manager/sequences

Returns the sequences present in the database.

Note: Keep in mind that the sequence identifiers contain number signs (#). These need to be URL escaped with %23 in the upcoming endpoints.
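As a small illustration of the escaping mentioned in the note, Python's standard library can percent-encode the identifier. The sequence id used here is a made-up example; look up the real ids with the endpoint that lists the sequences.

```python
from urllib.parse import quote


def escape_sequence_id(sequence_id: str) -> str:
    """Percent-encode a sequence id so it can be used as a URL path segment."""
    # safe="" makes sure reserved characters such as '#' and '/' are encoded too.
    return quote(sequence_id, safe="")


# Hypothetical sequence id, for illustration only.
escaped = escape_sequence_id("my_table#pseudonym")
```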

GET <host>/plugin/metadata-manager/sequences/<sequence_id>

Returns the current value of the sequence.

DELETE <host>/plugin/metadata-manager/sequences/<sequence_id>

Deletes a sequence in the database. Since sequences are not automatically deleted when an entity type is deleted, you can use this endpoint to clean up. This endpoint can also be used to "reset" a sequence: a new sequence starting at 1 will be created as soon as a new row is added.

POST <host>/plugin/metadata-manager/sequences/<sequence_id>?value=123

Sets a sequence to a specific value.
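The endpoints above could be called from a script roughly as follows. This sketch only builds the requests. The host, the sequence id and the helper name are hypothetical, and passing a token in the `x-molgenis-token` header is an assumption about how your server authenticates these calls.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_PATH = "/plugin/metadata-manager/sequences"


def sequence_request(host, method, sequence_id=None, value=None, token=None):
    """Build a request for one of the sequence endpoints.

    host        -- base URL of the server, e.g. "https://molgenis.example.org"
    method      -- "GET", "DELETE" or "POST"
    sequence_id -- raw sequence id; '#' is escaped as %23 here
    value       -- new sequence value, only meaningful with POST
    token       -- optional API token (assumed authentication scheme)
    """
    url = host.rstrip("/") + BASE_PATH
    if sequence_id is not None:
        url += "/" + quote(sequence_id, safe="")
    if value is not None:
        url += f"?value={int(value)}"
    headers = {"x-molgenis-token": token} if token else {}
    return Request(url, method=method, headers=headers)


# Hypothetical host and sequence id, for illustration only.
HOST = "https://molgenis.example.org"
list_all = sequence_request(HOST, "GET")
current = sequence_request(HOST, "GET", "my_table#pseudonym")
reset = sequence_request(HOST, "DELETE", "my_table#pseudonym")
set_value = sequence_request(HOST, "POST", "my_table#pseudonym", value=123)
```

Each request can then be sent with `urllib.request.urlopen(request)`. Keeping the construction separate from the sending makes it easy to inspect the URL before anything is changed on the server.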

Migration

A sequence's current value is not carried over when copying, exporting, migrating or re-importing a dataset. When you import/migrate a dataset that had sequences in it, do the following:

  1. Use the GET /sequences endpoint to retrieve the sequences in the database and find the one you need

  2. Look at your dataset's last rows and figure out what the value for the sequence should be

  3. Use the POST /sequences endpoint to set the sequence's value

Make sure everything is configured correctly (see How to configure).
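For the second step of the procedure above, the highest number already in use can be read from the imported ids, as in this minimal sketch. It assumes the ids are not scrambled (the digit part of a scrambled id does not reveal its position in the sequence) and that you have the id column available as a list of strings. The function name is hypothetical.

```python
import re


def highest_sequence_value(ids, prefix="GEN-", digits=7):
    """Return the largest numeric part among ids that match prefix + digits.

    Ids that do not match the expected format are ignored.
    Returns 0 when no id matches.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d{" + str(digits) + r"})")
    values = []
    for identifier in ids:
        match = pattern.fullmatch(identifier)
        if match:
            values.append(int(match.group(1)))
    return max(values, default=0)


# Example with made-up rows.
last_used = highest_sequence_value(["GEN-0000001", "GEN-0000042", "GEN-0000007"])
```

Whether the sequence should be set to this value or to the next one depends on how the server interprets the stored value, so check the result by adding a test row after the sequence has been set.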