How to ensure anonymity of AI systems?

When training artificial intelligence systems, developers need to use privacy-enhancing technologies to ensure that the subjects of the training data are not exposed, a new study suggests.
(Image: Yutong Liu)

Artificial intelligence (AI) systems trained using machine learning retain an imprint of their training data that can allow the data used to train them to be identified.

Systems trained on larger datasets are less vulnerable to identification of their training data, but eliminating the vulnerability entirely would require impractically large datasets.

The finding, from an article by researchers in the DataLit project at the Finnish Center for Artificial Intelligence (FCAI) at the University of Helsinki and at Kyoto University, published at the Conference on Neural Information Processing Systems (NeurIPS), has important implications for developers training AI systems on sensitive or personal data, such as health data.

"Developers need to use privacy-enhancing technologies such as differential privacy to ensure that the subjects of the training data are not exposed. Differential privacy makes it possible to prove mathematically that the trained model can never reveal too much information about any individual in the training dataset," says Professor Antti Honkela.
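To illustrate the kind of guarantee differential privacy provides, here is a minimal sketch of the classic Laplace mechanism for releasing a single statistic. This is an illustrative example only, not code from the study; the function name and parameters are my own.

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value with Laplace noise of scale sensitivity/epsilon.

    This satisfies epsilon-differential privacy for a query whose value
    changes by at most `sensitivity` when one individual's data changes.
    """
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform on (-0.5, 0.5)
    # Inverse-CDF sampling of the Laplace distribution.
    noise = -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return true_value + noise

# Example: privately release a count of 42 patients with epsilon = 1.
# Changing any one patient's record changes the count by at most 1,
# so sensitivity = 1.
rng = random.Random(0)
private_count = laplace_mechanism(42, sensitivity=1.0, epsilon=1.0, rng=rng)
```

The smaller epsilon is, the more noise is added and the stronger the privacy guarantee; the released value is close to the true count, but no single individual's presence can be confidently inferred from it.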

Risks in training AI with personal data

The European General Data Protection Regulation (GDPR) defines strict rules for the processing of personal data. According to a recent opinion by the European Data Protection Board, an AI system would itself be considered personal data if training data subjects can be identified from it.

The new result highlights this risk for AI systems trained using personal data.

"An important application of the result is in AI systems for health. The Finnish law on the secondary use of health data and the new European Health Data Space Act require that AI systems developed using health data be anonymous. In other words, it must not be possible to identify the training data subjects."

Previous work from the same group of researchers shows that it is possible to train provably anonymous AI systems using so-called differential privacy during training.
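Training with differential privacy is commonly done with DP-SGD: each example's gradient is clipped to bound any individual's influence, and calibrated Gaussian noise is added before the parameter update. The sketch below is a simplified, hypothetical illustration of that idea, not the training code used in the researchers' work.

```python
import math
import random

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update: per-example gradient clipping + Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        factor = min(1.0, clip_norm / max(norm, 1e-12))  # bound each example's norm
        clipped.append([x * factor for x in g])
    n = len(clipped)
    summed = [sum(col) for col in zip(*clipped)]
    # Noise standard deviation scales with clip_norm, the per-example sensitivity.
    noisy_mean = [(s + rng.gauss(0.0, noise_multiplier * clip_norm)) / n
                  for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]

# Toy example with noise disabled (noise_multiplier=0) so the clipping
# effect is visible: the [3, 4] gradient (norm 5) is scaled down to norm 1.
rng = random.Random(0)
new_params = dp_sgd_step(params=[0.0, 0.0],
                         per_example_grads=[[3.0, 4.0], [0.3, 0.4]],
                         clip_norm=1.0, noise_multiplier=0.0, lr=1.0, rng=rng)
```

In real training the noise multiplier is strictly positive and chosen, together with the number of steps, to give a provable (epsilon, delta) privacy guarantee.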

The reported result was obtained by studying the vulnerability of a large image classification model that is pretrained on a large dataset and then fine-tuned on a smaller sensitive dataset. According to the results, such a fine-tuned model is less vulnerable than a model trained from scratch on the sensitive data alone.
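The vulnerability in question is membership inference: deciding whether a given record was part of the training set. The classic baseline attack simply thresholds the model's loss, since models typically fit training ("member") examples better than unseen ones. Below is a toy sketch with made-up loss values, illustrating the principle rather than the attack used in the paper.

```python
def loss_threshold_attack(losses, threshold):
    """Predict 'member' (True) when the model's loss on an example is low."""
    return [loss < threshold for loss in losses]

# Made-up per-example losses: members tend to have lower loss than non-members.
member_losses = [0.05, 0.10, 0.30]      # examples that were in the training data
non_member_losses = [0.80, 1.20, 0.20]  # unseen examples

preds = loss_threshold_attack(member_losses + non_member_losses, threshold=0.35)
labels = [True] * 3 + [False] * 3
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

An attack accuracy well above 50% (random guessing) indicates that the model leaks membership information; differentially private training provably bounds how far above chance any such attack can get.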

Article information

Marlon Tobaben, Hibiki Ito, Joonas Jälkö, Yuan He and Antti Honkela. Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning. In Advances in Neural Information Processing Systems 39 (NeurIPS 2025).

This article was originally published on the University of Helsinki website on 27.11.2025.

