EMX format
The default import format for MOLGENIS is 'EMX'. This is a flexible spreadsheet format (Excel, CSV) that allows you to annotate your data with a data model. This works because you can tell MOLGENIS the 'model' of your data via a special sheet named 'attributes'. Optionally, you can also add metadata on entities (i.e., classes, tables) and packages (i.e., models and submodels). It is also possible to provide packages in an EMX file via the 'packages' sheet without providing the 'attributes' sheet.
Note: In order to upload data, at least one group must be created. See the section on groups and roles for information on creating a group.
For example, if you want to upload an Excel file with a sheet named 'patients':
Then you must provide a model of your 'patients' via an Excel file with a sheet named 'attributes':
The 'entity' column should contain the name of your data sheet. Each attribute 'name' corresponds to a column header in your data. The default dataType is 'string', so you only need to specify the dataType for non-string values (int, date, decimal, etc.). You must always provide one idAttribute that has 'nillable' = 'FALSE'.
You can first upload the 'model' and then the 'data', or you can put both into one file and upload them in one go, whichever you prefer. [todo: provide example files for download]
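As a rough illustration of the rules above, the following Python sketch generates the rows of an 'attributes' sheet as CSV. The column headers in `PATIENT_COLUMNS` are hypothetical (the real 'patients' sheet may differ), and the helper names are made up for this example; only the columns entity, name, dataType, idAttribute and nillable come from the text.

```python
import csv
import io

# Hypothetical column headers of a 'patients' data sheet and their data types.
PATIENT_COLUMNS = {
    "displayName": "string",
    "firstName": "string",
    "lastName": "string",
    "birthdate": "date",
}

FIELDS = ["entity", "name", "dataType", "idAttribute", "nillable"]


def build_attribute_rows(entity, columns, id_attribute):
    """Return one 'attributes' row per data column of the given entity."""
    rows = []
    for name, data_type in columns.items():
        row = dict.fromkeys(FIELDS, "")
        row["entity"] = entity
        row["name"] = name
        if data_type != "string":
            # 'string' is the default, so only other types are written out.
            row["dataType"] = data_type
        if name == id_attribute:
            # The id attribute must not be nillable.
            row["idAttribute"] = "TRUE"
            row["nillable"] = "FALSE"
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


attributes_csv = to_csv(build_attribute_rows("patients", PATIENT_COLUMNS, "displayName"))
print(attributes_csv)
```

Leaving dataType blank for string columns relies on the default described above; writing 'string' explicitly would be equally valid.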
Let's assume we want to upload multiple data sheets, with relations between them:
Cities:
Patients:
Notes: birthplace refers to cityName values in the cities table. children contains comma-separated values, each referring to another patient via displayName (trailing spaces will be removed). Warning: when using Excel, make sure your decimal separator is "." rather than ",". Otherwise, mrefs whose ids are numbers may be interpreted as decimals: MOLGENIS then sees a dot between your references and the import fails.
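The comma-separated reference convention described in the notes can be sketched in Python as follows. This is not the MOLGENIS importer itself; the function names and the sample patients are hypothetical, and stripping whitespace on both sides of each id is an assumption (the text only mentions trailing spaces).

```python
def parse_mref(cell):
    """Split a comma-separated mref cell into a list of trimmed ids."""
    if cell is None or cell.strip() == "":
        return []
    # Assumption: whitespace around each id is not significant.
    return [part.strip() for part in cell.split(",")]


def missing_references(cell, known_ids):
    """Return the ids in the cell that do not exist in the referenced sheet."""
    return [ref for ref in parse_mref(cell) if ref not in known_ids]


# Hypothetical 'patients' rows: displayName -> children
patients = {
    "jdoe": "pdoe, adoe ",
    "pdoe": "",
    "adoe": "unknown",
}

for display_name, children in patients.items():
    print(display_name, missing_references(children, patients.keys()))
```

Running a check like this before upload can catch references that point at rows missing from the data sheet.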
Users:
Note: users look similar to patients, i.e. they are also persons having 'displayName', 'firstName', and 'lastName'. We will use this in the model below.
To model the advanced data example, you again need to provide the 'attributes' (i.e., columns, properties). Optionally, you can also describe entities (i.e., classes, tables) and packages (i.e., models and submodels), which gives you some advanced options.
Attributes:
The example below defines the model for entities 'city', 'patient' and 'user'. Note that 'users' share some attributes with 'patients', so we will use 'object orientation' to say that 'user' and 'patient' are both a special kind of 'persons'. This is defined using the 'extends' relation in the 'entities' sheet below.
Entities:
In most cases the 'attributes' sheet is all you need. However, in some cases you may want to add more details on the 'entity'. Here we wanted to show the use of 'abstract' (i.e., interfaces) to create the model class 'persons' and 'extends' (i.e., subclass, inheritance) to define that 'user' and 'patient' have the same attributes as 'persons'. When data models become larger, or when many data sheets are loaded, the 'package' construct enables you to group your (meta)data.
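To make the effect of 'abstract' and 'extends' concrete, here is a minimal Python sketch that resolves the full attribute list of an entity by walking its 'extends' chain. The dictionaries are a hypothetical in-memory stand-in for the 'entities' and 'attributes' sheets (the attribute 'userName' is invented for illustration), and this is not how MOLGENIS implements inheritance internally.

```python
# Hypothetical stand-in for the 'entities' sheet.
ENTITIES = {
    "persons": {"extends": None, "abstract": True},
    "user": {"extends": "persons", "abstract": False},
    "patient": {"extends": "persons", "abstract": False},
    "city": {"extends": None, "abstract": False},
}

# Hypothetical stand-in for the 'attributes' sheet: entity -> own attributes.
ATTRIBUTES = {
    "persons": ["displayName", "firstName", "lastName"],
    "user": ["userName"],
    "patient": ["birthplace", "children"],
    "city": ["cityName"],
}


def all_attributes(entity):
    """Return inherited attributes first, then the entity's own attributes."""
    chain = []
    current = entity
    while current is not None:
        if current in chain:
            raise ValueError(f"circular 'extends' chain at {current!r}")
        chain.append(current)
        current = ENTITIES[current]["extends"]
    result = []
    for name in reversed(chain):
        result.extend(ATTRIBUTES.get(name, []))
    return result


def accepts_data(entity):
    """Abstract entities are used for modeling only and hold no data rows."""
    return not ENTITIES[entity]["abstract"]


print(all_attributes("patient"))
print(accepts_data("persons"))
```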
Packages:
For names in the EMX format, the following rules apply:
Name cannot be empty.
Only letters (a-z, A-Z), digits (0-9), underscores (_) and dashes (-) are allowed.
The keywords login, logout, csv, base, exist, meta and _idValue are not allowed as entity and attribute names.
attribute names
Attribute names also allow the hash character (#), e.g. #CHROM is a valid attribute name.
In attribute names, the dash (-) is reserved for localization, e.g. description-nl contains the Dutch translation of the description attribute.
labels
These restrictions only apply to the technical names; labels are not limited by these rules.
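The naming rules above can be expressed as a small Python check. This is a sketch rather than the validator MOLGENIS uses: the function name is hypothetical, and treating the reserved keywords as case-sensitive exact matches is an assumption.

```python
import re

RESERVED = {"login", "logout", "csv", "base", "exist", "meta", "_idValue"}

# Letters, digits, underscores and dashes; attribute names also allow '#'.
ENTITY_NAME = re.compile(r"[A-Za-z0-9_-]+")
ATTRIBUTE_NAME = re.compile(r"[A-Za-z0-9_#-]+")


def name_problems(name, kind="entity"):
    """Return a list of rule violations for a technical name (empty if valid)."""
    if name == "":
        return ["name cannot be empty"]
    problems = []
    pattern = ATTRIBUTE_NAME if kind == "attribute" else ENTITY_NAME
    if not pattern.fullmatch(name):
        problems.append("contains characters that are not allowed")
    if name in RESERVED:  # assumption: exact, case-sensitive match
        problems.append("is a reserved keyword")
    return problems


for candidate, kind in [("#CHROM", "attribute"), ("#CHROM", "entity"), ("login", "entity")]:
    print(candidate, kind, name_problems(candidate, kind))
```

The check does not cover the localization role of the dash in attribute names; a name such as description-nl passes here and is interpreted as a translation on import.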
Creating a package without a parent package (also known as a root package) automatically results in the creation of a group. Initially, the group name is set to the package identifier, unless the package identifier is not a valid group name, in which case a unique group name is generated. The initial group label is set to the package label. Otherwise, group creation works the same as when a group is created using the security manager plugin. Both name and label can be modified afterwards.
See the section on groups and roles for information on creating a group.
The 'attributes' sheet supports the following columns:
entity : Name of the entity this attribute is part of.
name : Name of the attribute, unique per entity.
dataType : Defines the data type (default: string).
refEntity : Used in combination with xref, mref, categorical, categorical_mref or one_to_many. Should refer to an entity.
nillable : Whether the column may be left empty. Default: true.
idAttribute : Whether this field is the unique key for the entity. Default: false. Use 'AUTO' for auto-generated (string) identifiers.
auto : Whether the value for this field is automatically generated. Default: false. Can be set to true when idAttribute is true or the data type is one of [string, date, datetime].
description : Free-text documentation describing the attribute.
description-{languageCode} : Description for the specified language (can be multiple languages, example: description-nl).
rangeMin : Used to set the lower bound of the range for int or long attributes.
rangeMax : Used to set the upper bound of the range for int or long attributes.
lookupAttribute : true/false, default false. Indicates whether this attribute should appear in the xref/mref search dropdown in the dataexplorer. A lookupAttribute must be visible. An entity inherits the lookupAttributes from the entity it extends. If an entity has no lookupAttributes, the labelAttribute is used in the dropdown.
label : Optional human-readable name of the attribute.
label-{languageCode} : Label for the specified language (can be multiple languages, example: label-nl).
aggregateable : true/false to indicate whether the user can use this attribute in an aggregate query.
labelAttribute : true/false to indicate that the value of this attribute should be used as the label for the entity (in the dataexplorer when used in xref/mref). Default: false. A labelAttribute must be visible. If an entity's idAttribute is not visible, the entity should have a labelAttribute.
readOnly : true/false to indicate a read-only attribute.
tags : Ability to tag the data, referring to the tags section described below.
validationExpression : Magma JavaScript validation expression that must return a bool: true if the value is valid and false if it is invalid. See the Expressions section for a syntax description.
visible : true/false to indicate whether the attribute can be seen by users. Can also contain a Magma JavaScript expression to decide dynamically whether the attribute should be shown. See the Expressions section for a syntax description.
defaultValue : Value that will be filled in on the forms when a new entity instance is created. For mref and categorical_mref, this should be a comma-separated list of ids. For categorical and xref, this should be the id of the refEntity. For bool, it should be true or false. For datetime, it should be a string in the format YYYY-MM-DDTHH:mm:ssZZ. For date, it should be a string in the format YYYY-MM-DD.
partOfAttribute : Used to group attributes into a compound attribute. Put the name of the compound attribute here.
expression : Used to create computed attributes.
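To show how a few of these options constrain the data, the Python sketch below checks one cell value against its attribute definition. It assumes the EMX column names nillable, dataType, rangeMin and rangeMax, treats the range bounds as inclusive (an assumption), and uses a hypothetical 'age' attribute. It is an illustration only, not the validation MOLGENIS performs.

```python
def check_value(attribute, value):
    """Return a list of problems for one cell value, given its attribute row."""
    if value is None or value == "":
        # nillable defaults to true, so empty cells are fine unless stated otherwise.
        if not attribute.get("nillable", True):
            return ["value is required"]
        return []
    problems = []
    if attribute.get("dataType", "string") in ("int", "long"):
        try:
            number = int(value)
        except ValueError:
            return ["not a whole number"]
        range_min = attribute.get("rangeMin")
        range_max = attribute.get("rangeMax")
        # Assumption: both bounds are inclusive.
        if range_min is not None and number < range_min:
            problems.append("below rangeMin")
        if range_max is not None and number > range_max:
            problems.append("above rangeMax")
    return problems


# Hypothetical attribute definition.
AGE = {"entity": "patients", "name": "age", "dataType": "int",
       "nillable": False, "rangeMin": 0, "rangeMax": 150}

for cell in ["42", "", "200", "abc"]:
    print(repr(cell), check_value(AGE, cell))
```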
Computed object example: "computed myXref" (configured in the attributes table)
Create two new target attributes (attr1, attr2) in a new entity (newEntity).
Create an xref attribute (myXref) to contain the computed entity.
In the expression column of the new xref attribute (myXref), add the following script: "{attr1: myAttr1, attr2: myAttr2}"
The attributes to convert from (myAttr1, myAttr2) should be in the same entity as the new xref attribute (myEntity).
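One way to read the computed object expression is as a mapping from target attributes to source attributes of the same row. The Python sketch below illustrates that reading with a hypothetical row. It mirrors the names used in the steps above (attr1, attr2, myAttr1, myAttr2) and is not the MOLGENIS expression evaluator.

```python
# Target attribute in newEntity -> source attribute in myEntity,
# mirroring the expression "{attr1: myAttr1, attr2: myAttr2}".
MAPPING = {"attr1": "myAttr1", "attr2": "myAttr2"}


def compute_xref(row, mapping):
    """Build the computed newEntity object from one myEntity row."""
    return {target: row[source] for target, source in mapping.items()}


# Hypothetical myEntity row.
row = {"id": "row1", "myAttr1": "a", "myAttr2": 2}
computed = compute_xref(row, MAPPING)
print(computed)
```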
Template
In addition to basic 'computed strings' and 'computed objects', a template can be used as an expression. The template expression format is: {"template":"..."} with the value a Mustache template. Tags must refer to attribute identifiers. For attributes referencing another entity type, the attribute in the referencing entity type needs to be specified as well.
Example:
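As a rough, hypothetical illustration of the template expression format (not the original example from this page), the Python sketch below renders simple `{{tag}}` placeholders against a row. It is a simplified stand-in for Mustache that only handles plain variable tags. The dotted form for referenced entities is an assumption based on standard Mustache, and all attribute names and values are invented.

```python
import json
import re

# Matches simple variable tags such as {{firstName}} or {{birthplace.cityName}}.
TAG = re.compile(r"\{\{\s*([\w.#-]+)\s*\}\}")


def render(expression, row):
    """Render the 'template' value of an expression against one row."""
    template = json.loads(expression)["template"]

    def lookup(match):
        value = row
        # Assumption: a dotted tag walks into the referenced entity.
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return TAG.sub(lookup, template)


# Hypothetical expression and row.
expression = '{"template":"{{firstName}} {{lastName}} from {{birthplace.cityName}}"}'
row = {"firstName": "Jane", "lastName": "Doe", "birthplace": {"cityName": "Utrecht"}}
rendered = render(expression, row)
print(rendered)
```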
The 'entities' sheet supports the following columns:
name : Unique name of the entity. If packages are provided, the name must be unique within a package.
extends : Reference to another entity that is extended.
package : Name of the group this entity is part of.
abstract : Indicates whether data can be provided for this entity (abstract entities are only used for data modeling purposes and cannot accept data).
description : Free-text description of the entity.
description-{languageCode} : Description for the specified language (can be multiple languages, example: description-nl).
backend : The backend (database) to store the entities in (currently only PostgreSQL).
tags : Ability to tag the data, referring to the tags section described below.
The 'packages' sheet supports the following columns:
name : Unique name of the package. If a parent package is provided, the name must be unique within the parent.
description : Free-text description of the package.
parent : Use when the package is a sub-package of another package.
tags : Mechanism to add flexible metadata such as ontology references and hyperlinks.
Optionally, additional information can be provided beyond the standard metadata described above. To this end, all metadata elements can be tagged in simple or advanced ways (equivalent to using RDF triples). For example, the packages example above provides a 'homepage' tag:
The 'tags' sheet supports the following columns:
identifier : Unique name of this tag, such that it can be referenced.
label : The human-readable label of the tag (e.g. the 'like' tag as shown above).
objectIRI : URL to the value object (will become a hyperlink in the user interface).
relationLabel : Human-readable label of the relation, e.g. 'Documentation and Help'.
relationIRI : URL to the relation definition, e.g. http://edamontology.org/topic_3061
codeSystem : Name of the code system used, e.g. EDAM.
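Since tags are described as equivalent to RDF triples, the Python sketch below shows how a tag row attached to a package could be read as a (subject, relation, object) triple. It assumes the EMX tag column names identifier, label, objectIRI, relationLabel, relationIRI and codeSystem. The package name, tag label and object URL are hypothetical placeholders, while the relation values are the examples given above.

```python
# Hypothetical 'tags' row; objectIRI and label are placeholders.
HOMEPAGE_TAG = {
    "identifier": "homepage",
    "label": "Example homepage",
    "objectIRI": "http://www.example.org/",
    "relationLabel": "Documentation and Help",
    "relationIRI": "http://edamontology.org/topic_3061",
    "codeSystem": "EDAM",
}


def as_triple(subject, tag):
    """Read a tag attached to a metadata element as an RDF-style triple."""
    return (subject, tag["relationIRI"], tag["objectIRI"])


# 'my_package' is a hypothetical package name carrying the tag.
triple = as_triple("my_package", HOMEPAGE_TAG)
print(triple)
```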
You can internationalize attribute labels and descriptions, entity labels and descriptions, and you can define internationalized versions of entity attributes.
Attribute labels and descriptions:
description-{languageCode} : description for the specified language (can be multiple languages)
label-{languageCode} : label for the specified language (can be multiple languages)
Example:
Entity labels and descriptions:
description-{languageCode} : description for the specified language (can be multiple languages)
label-{languageCode} : label for the specified language (can be multiple languages)
Example:
You can internationalize attributes by postfixing the name with -{countryCode}.
If this is the label attribute, you must set all city-xx labelAttribute values to 'TRUE' on the 'entities' tab.
Example:
entities:
gender:
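The suffix convention for localized columns and attributes can be sketched in Python as follows. The helper names are hypothetical, and treating the suffix as a two-letter code is an assumption, since the text only shows examples such as description-nl.

```python
def split_localized(name):
    """Split 'description-nl' into ('description', 'nl'); plain names get None."""
    base, separator, suffix = name.rpartition("-")
    # Assumption: language codes are two letters, as in 'nl'.
    if separator and base and len(suffix) == 2 and suffix.isalpha():
        return base, suffix.lower()
    return name, None


def localized_columns(base, language_codes):
    """Return the localized column names for a base column."""
    return [f"{base}-{code}" for code in language_codes]


print(split_localized("description-nl"))
print(localized_columns("label", ["en", "nl"]))
```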