GDPR: Anonymisation and pseudonymisation

European Citizens have a fundamental right to privacy, it is important for organisations which process personal data to be cognisant of this right. When carried out effectively, anonymisation and pseudonymisation can be used to protect the privacy rights of individual data subjects and allow organisations to balance this right to privacy against their legitimate goals.

Read this guide to find out about using these techniques.

Key points

Irreversibly and effectively anonymised data is not “personal data” and the data protection principles do not have to be complied with in respect of such data. Pseudonymised data remains personal data.

If the source data is not deleted at the same time that the ‘anonymised’ data is prepared, where the source data could be used to identify an individual from the ‘anonymised’ data, the data may be considered only ‘pseudonymised’ and thus still ‘personal data’, subject to the relevant Data Protection legislation.

Data can be considered “anonymised” from a data protection perspective when data subjects are not identified or identifiable, having regard to all methods reasonably likely to be used by the data controller or any other person to identify the data subject, directly or indirectly.

What is personal data?

Personal data means any information relating to an identified or identifiable individual. This individual is also known as a ‘data subject’.

An identifiable individual is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that individual.

The definition above reflects the wording of both the General Data Protection Regulation (GDPR) and the Irish Data Protection Act 2018. Accordingly, data about living individuals which has been anonymised such that it is not possible to identify the data subject from the data or from the data together with certain other information, is not governed by the GDPR or the Data Protection Act 2018, and is not subject to the same restrictions on processing as personal data.

What is anonymisation?

“Anonymisation” of data means processing it with the aim of irreversibly preventing the identification of the individual to whom it relates. Data can be considered effectively and sufficiently anonymised if it does not relate to an identified or identifiable natural person or where it has been rendered anonymous in such a manner that the data subject is not or no longer identifiable.

There is a lot of research currently underway in the area of anonymisation, and knowledge about the effectiveness of various anonymisation techniques is constantly changing. It is therefore impossible to say that a particular technique will be 100% effective in protecting the identity of data subjects, but this guidance is intended to assist with identifying and minimising the risks to data subjects when anonymising data. In the case of anonymisation, by ‘identification’ we mean the possibility of retrieving a person’s name and/or address, but also the potential identifiability by singling out, linkability and inference.

What is pseudonymisation?

“Pseudonymisation” of data means replacing any identifying characteristics of data with a pseudonym, or, in other words, a value which does not allow the data subject to be directly identified.

The GDPR and the Data Protection Act 2018 define pseudonymisation as the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that (a) such additional information is kept separately, and (b) it is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable individual.

Although pseudonymisation has many uses, it should be distinguished from anonymisation, as it only provides a limited protection for the identity of data subjects in many cases as it still allows identification using indirect means. Where a pseudonym is used, it is often possible to identify the data subject by analysing the underlying or related data.

Uses of anonymisation and pseudonymisation

Data which has been irreversibly anonymised ceases to be “personal data”, and processing of such data does not require compliance with the Data Protection law. In principle, this means that organisations could use it for purposes beyond those for which it was originally obtained, and that it could be kept indefinitely.

In some cases, it is not possible to effectively anonymise data, either because of the nature or context of the data, or because of the use for which the data is collected and retained. Even in these circumstances, organisations might want to use anonymisation or pseudononymisation techniques:-

As part of a “privacy by design” strategy to provide improved protection for data subjects.

As part of a risk minimisation strategy when sharing data with data processers or other data controllers.

To avoid inadvertent data breaches occurring when your staff is accessing personal data.

As part of a “data minimisation” strategy aimed at minimising the risks of a data breach for data subjects.

Even where anonymisation is undertaken, it does retain some inherent risk. As mentioned, pseudonymisation is not the same as anonymisation and should not be equated as such – the information remains personal data. Even where effective anonymisation takes place, other regulations may apply – for instance the ePrivacy directive applies in many regards to information rather than personal data. And finally, even where effective anonymisation can be carried out, any release of a dataset may have residual privacy implications, and the expectations of the concerned individuals should be accounted for.

Identification – the test under the Data Protection Acts

In order to determine whether data has been sufficiently anonymised to bring it outside the scope of Data Protection law, it is necessary to consider the second element of the definition, relating to the identification of the data subject, in greater detail.

The Article 29 Working Party on Data Protection (now replaced by the European Data Protection board, or ’EDPB’) has previously suggested the following test for when an individual is identified or identifiable:

“In general terms, a natural person can be considered as “identified” when, within a group of persons, he or she is “distinguished” from all other members of the group. Accordingly, the natural person is “identifiable” when, although the person has not been identified yet, it is possible to do it…”

Thus, a person does not have to be named in order to be identified. If there is other information enabling an individual to be connected to data about them, which could not be about someone else in the group, they may still “be identified”.

“Identifiers are pieces of information which are closely connected with a particular individual, which could be used to single them out.”

In determining whether a person can be distinguished from others in a group, it is important to consider what “identifiers” are contained in the information held. Identifiers are pieces of information which are closely connected with a particular individual, which could be used to single them out. Such identifiers can be “direct”, like the data subject’s name or image, or “indirect”, like their phone number, email address or a unique identifier assigned to the data subject by the data controller. As a result, removing direct identifiers does not render data sets anonymous. Data which are not identifiers may also be used to provide context which may lead to identification or distinction between users – e.g. a series of data about their location, or perhaps their shopping or internet search history. Indeed, these kinds of data series on their own may be sufficient to distinguish and identify an individual.

However, just because data about individuals contains identifiers does not mean that the data subjects will be identified or identifiable. This will depend on contextual factors. Information about a child’s year of birth might allow them to be singled out in their family, but would probably not allow them to be distinguished from the rest of their school class, if there are a large number of other children with the same year of birth. Similarly, data about the family name of an individual may distinguish them from others in their workplace, but might not allow them to be identified in the general population if the family name is common.

On the other hand, data which appears to be stripped of any personal identifiers can sometimes be linked to an individual when combined with other information, which is available publicly or to a particular individual or organisation. This occurs particularly in cases where there are unique combinations of connected data. In the above case for instance, if there was one child with a particular birthday in the class then having that information alone allows identification.

Identifiability and anonymisation

The concept of “identifiability” is closely linked with the process of anonymisation. Even if all of the direct identifiers are stripped out of a data set, meaning that individuals are not “identified” in the data, the data will still be personal data if it is possible to link any data subjects to information in the data set relating to them.

Recital 26 of the GDPR provides that when determining whether an individual is identifiable or not “[…] account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly” and that when determining whether means are ‘reasonably likely to be used’ to identify the individual “[,,,] account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.” Recital 26 also clarifies that the principles of data protection do not apply to anonymous information. …”

Therefore, to determine when data is rendered anonymous for data protection purposes, you have to examine what means and available datasets might be used to re-identify a data subject. Organisations don’t have to be able to prove that it is impossible for any data subject to be identified in order for an anonymisation technique to be considered successful. Rather, if it can be shown that it is unlikely that a data subject will be identified given the circumstances of the individual case and the state of technology, the data can be considered anonymous.

Some different ways that re-identification can take place are discussed below.

If the source data is not deleted at the time of the anonymisation, the data controller who retains both the source data and the anonymised data will normally be in a position to identify individuals from the anonymised data. In such cases, the anonymised data must still be considered to be personal data while in the hands of the data controller, unless the anonymisation process would prevent the singling out of an individual data subject, even to someone in possession of the source data.

Identification risks…

Read The Full Article from the Irish DPA