Page updated 29.12.2023

PHASE IV AI – Privacy-compliant health data as a service for AI development

Data-driven tools, especially AI, usually need large amounts of data to achieve relevance and accuracy. Developing such tools for healthcare therefore requires broadly applicable, validated data sets of appropriate size, drawn from large populations. However, the use of data in healthcare today is limited due to the sensitive nature of health data and its potentially high privacy impact.
Data-driven approaches may not work for diseases with a small local population. In such cases, de-identification is often not strong enough, or today's de-identification strategies do not achieve the required accuracy in detection or prediction. Such diseases therefore cannot be tackled with AI-based technology, given the limitations of existing anonymisation and synthetization techniques. Synthetic data and federated learning help here because they facilitate data sharing without compromising data security.
The usefulness of synthetic data is not obvious. It usually remains unclear until the real problem is tackled and the synthetic data is validated against the underlying real-world data in specific cases. The criteria for generating synthetic data in a generally useful way are currently a subject of research: the type of data and the data generation mechanism need to be analysed to reach a more generalised approach. Moreover, quality evaluation and validation tools are lacking, as are quality metrics measuring utility and privacy criteria.
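To make the idea of a utility metric concrete, here is a minimal sketch of one simple fidelity check: the total variation distance between the empirical marginal distributions of a real and a synthetic variable. The function `histogram_tvd` is hypothetical, for illustration only; it is not a metric defined by the project, and real validation would combine many such utility and privacy measures.

```python
import numpy as np

def histogram_tvd(real, synth, bins=10):
    """Total variation distance between the binned empirical distributions
    of a real and a synthetic 1-D variable.
    Returns 0.0 for identical marginals, 1.0 for fully disjoint ones.
    Illustrative sketch only, not the project's validation tooling."""
    # Use a common range so both histograms share the same bin edges.
    lo = min(real.min(), synth.min())
    hi = max(real.max(), synth.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(synth, bins=bins, range=(lo, hi))
    # Normalise counts to probability vectors.
    p = p / p.sum()
    q = q / q.sum()
    # TVD = half the L1 distance between the two distributions.
    return 0.5 * np.abs(p - q).sum()
```

A score near 0 says the synthetic marginal tracks the real one; it says nothing about joint structure or privacy, which is exactly why broader metrics are still a research topic.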


The project's objectives are to:

  • Improve technologies for (federated) anonymisation and synthetization of health data with strong de-identification properties
  • Enable AI developers to access larger pools of data for federated learning through easy-to-use, configurable data services
  • Establish a data market facilitating data sharing and monetization, including incentives for providing data to the services
  • Integrate the data market and the data service ecosystem as a cross-European health data hub in the European Health Data Space.

Turku UAS will investigate local and global weight aggregation models tailored with differential privacy techniques to develop a scalable, multi-layer secure federated learning system for privacy preservation in a health data setting. We will also investigate methods to assess generated synthetic image data from clinical and privacy points of view, and will consider statistical and mathematical techniques to evaluate synthetic data generated by AI models as well as the privacy guarantees of the FL and DP models.
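As an illustrative sketch of the differentially private weight aggregation mentioned above, the common DP-FedAvg pattern clips each client's update and adds Gaussian noise at the server. The function `dp_federated_average` and its parameters are assumptions for illustration, not the project's actual implementation or parameter choices.

```python
import numpy as np

def dp_federated_average(client_updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Aggregate client weight updates with L2 clipping and Gaussian noise
    (the DP-FedAvg pattern). Illustrative sketch only."""
    rng = np.random.default_rng(rng)
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        # Scale each update down so its L2 norm is at most clip_norm,
        # bounding any single client's influence on the global model.
        clipped.append(u * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    # Gaussian noise calibrated to the clipping bound provides the
    # differential privacy guarantee for the aggregated update.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)
```

With `noise_multiplier=0` this reduces to plain federated averaging of clipped updates; the multiplier trades model utility against the strength of the privacy guarantee, which is precisely the trade-off the project studies.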

[Phase and Horizon logos]