publications
2024
- [SIGKDD] Analyzing and explaining privacy risks on time series data: ongoing work and challenges. Jul 2024.
Currently, privacy risk assessment is mainly performed through audits conducted by data privacy analysts. In the TAILOR project, we promote a more systematic and automated approach based on interpretable metrics and formal methods to evaluate privacy risks and to control the tension between data privacy and utility. In this paper, we focus on the privacy risks raised by publishing time series datasets, and we survey the methods developed in TAILOR to analyze and quantify privacy risks under different publisher and attacker models.
2023
- [Univ. Rennes] Privacy Risk Analysis of Large-scale Temporal Data: Application to Electricity Consumption Data. Voyez, Antonin. Jul 2023.
Enedis, the leading French electricity distribution operator, is legally required to collect and publish electricity consumption time series. Series from households and companies are highly privacy-sensitive, so the publication is anonymized using threshold aggregates. This work studies the vulnerability of open-sourced electricity consumption time series. Our first contribution is a large-scale statistical study of French electricity measurements; in particular, we show that un-anonymized series are highly vulnerable to identification attacks. Our second contribution is a membership inference attack that finds every series forming an aggregate, based on a variant of the subset-sum problem. Our third contribution is a membership inference attack modeled as a time series classification problem; it requires little prior knowledge and can find a specific target in an aggregate. We perform in-depth experiments on the attacks, and the results offer insight into the choice of a relevant aggregation threshold. Finally, we propose a metric to estimate the potential vulnerability of individual series.
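To make the subset-sum formulation concrete, here is a minimal brute-force sketch (hypothetical data and function name, not code from the thesis): given a published aggregate series and the pool of candidate series an attacker knows, it enumerates the subsets whose element-wise sums reproduce the aggregate; a unique match reveals exactly which series participated. The thesis handles realistic scales with dedicated techniques, while exhaustive enumeration is only workable for tiny pools.

```python
from itertools import combinations

def find_matching_subsets(candidates, aggregate, tol=1e-6):
    """Return every subset of candidate series whose element-wise sum
    equals the published aggregate (a subset-sum check per time step).
    Exponential brute force: for illustration on tiny pools only."""
    matches = []
    for size in range(1, len(candidates) + 1):
        for subset in combinations(range(len(candidates)), size):
            sums = [sum(candidates[i][t] for i in subset)
                    for t in range(len(aggregate))]
            if all(abs(s - a) <= tol for s, a in zip(sums, aggregate)):
                matches.append(subset)
    return matches

# Toy example: the aggregate was built from series 0 and 2.
candidates = [
    [1.2, 0.8, 1.5],  # series 0
    [0.5, 0.9, 0.4],  # series 1
    [2.0, 1.1, 0.7],  # series 2
]
print(find_matching_subsets(candidates, [3.2, 1.9, 2.2]))  # [(0, 2)]
```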
2022
- [arXiv] Unique in the Smart Grid - The Privacy Cost of Fine-Grained Electrical Consumption Data. Nov 2022.
The collection of electrical consumption time series through smart meters grows with ambitious nationwide smart grid programs. This data is both highly sensitive and highly valuable: strong personal data protection laws apply to it, while open data laws aim at making it public after a privacy-preserving data publishing process. In this work, we study the uniqueness of large-scale, real-life, fine-grained electrical consumption time series and show its link to privacy threats. Our results show a worryingly high uniqueness rate in such datasets. In particular, we show that knowing 5 consecutive electric measures makes it possible to re-identify, on average, more than 90% of the households in our dataset of 2.5M half-hourly electric time series. Moreover, uniqueness remains high even when the data is severely degraded: for example, when measures are rounded to the nearest 100 watts, knowing 7 consecutive measures still re-identifies, on average, more than 40% of the households in the same dataset. We also study the relationships between uniqueness and entropy, uniqueness and electric consumption, and electric consumption and temperature, showing strong correlations.
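One plausible reading of this uniqueness measure fits in a few lines (the function and its exact definition are my assumptions, not the paper's code): for every window of k consecutive measures in every series, check how many series contain that window; the fraction of windows owned by a single series approximates the average re-identification rate of an adversary who knows k consecutive measures. The optional rounding parameter mimics the data-degradation experiments.

```python
def avg_uniqueness(series_list, k, round_to=None):
    """Average probability that k consecutive measures, taken anywhere
    in a series, occur in no other series of the dataset."""
    def norm(window):
        # Optionally coarsen measures to the nearest `round_to` watts,
        # mimicking the paper's data-degradation experiments.
        if round_to:
            window = [round_to * round(v / round_to) for v in window]
        return tuple(window)

    owners, windows = {}, []
    for idx, series in enumerate(series_list):
        for t in range(len(series) - k + 1):
            key = norm(series[t:t + k])
            owners.setdefault(key, set()).add(idx)
            windows.append(key)

    # A window re-identifies its household iff a single series owns it.
    return sum(1 for w in windows if len(owners[w]) == 1) / len(windows)

# e.g. avg_uniqueness(dataset, k=5) vs. avg_uniqueness(dataset, k=7, round_to=100)
```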
- [SECRYPT] Membership Inference Attacks on Aggregated Time Series with Linear Programming. In 19th International Conference on Security and Cryptography, Jul 2022.
Aggregating data is a widely used technique to protect privacy. Membership inference attacks on aggregated data aim to infer whether a specific target belongs to a given aggregate. We study how aggregated time series data can be susceptible to simple membership inference attacks in the presence of adversarial background knowledge. We design a linear programming attack that strongly benefits from the number of data points published in the series. An extensive experimental evaluation on multiple publicly available datasets shows the vulnerability of aggregates made of thousands of time series when the number of aggregated series is not carefully balanced with the published length of the time series.
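The core of such an attack can be written down as a small 0/1 integer program: one binary variable per candidate series, one equality constraint per published time step. The sketch below uses the open-source PuLP modeler with its bundled CBC solver rather than a commercial one; the data, names, and objective are illustrative assumptions, not the paper's implementation.

```python
# pip install pulp
import pulp

def lp_membership_attack(aggregate, candidates, target_idx):
    """Find which candidate series sum to the published aggregate:
    binary variable x[i] = 1 iff series i participates, with one
    equality constraint per published time step."""
    n = len(candidates)
    prob = pulp.LpProblem("membership_inference", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x{i}", cat=pulp.LpBinary) for i in range(n)]
    prob += pulp.lpSum(x)  # arbitrary objective: feasibility is what matters
    for t in range(len(aggregate)):
        prob += pulp.lpSum(x[i] * candidates[i][t] for i in range(n)) == aggregate[t]
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    assignment = [int(v.value()) for v in x]
    return assignment[target_idx] == 1, assignment

# Toy example with integer watt readings; the aggregate is series 0 + series 2.
candidates = [[3, 1, 4], [2, 2, 1], [5, 0, 3], [1, 3, 2]]
print(lp_membership_attack([8, 1, 7], candidates, target_idx=2))
# (True, [1, 0, 1, 0])
```

Each extra published time step adds one constraint, which is why longer published series make the reconstruction easier; this is exactly the imbalance the paper warns about.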
2021
- [BDA] Membership Inference Attack on Aggregated Time Series Using Constraint Programming. Oct 2021.
Aggregation is widely used as a privacy protection method. Membership inference attacks on aggregates aim to determine whether or not a given target participated in the computation of the attacked aggregate. In this paper, we study the vulnerability of aggregated time series, where each point is a timestamped aggregate, to membership inference attacks. The attacker we consider holds auxiliary knowledge about a superset of the aggregated data (e.g., obtained from a data leak). We propose a new attack that takes advantage of this kind of auxiliary knowledge and of the multiple points forming the aggregate time series. Our attack is modeled as an integer linear programming problem, letting the attacker benefit from the power of dedicated solvers (e.g., Gurobi). Tested on public datasets, this attack shows the vulnerability of publishing an aggregate time series when the number of aggregated series is too small relative to the number of points in the series.
2020
- [EDBT] Task-Tuning in Privacy-Preserving Crowdsourcing Platforms. Duguépéroux, Joris, Voyez, Antonin, and Allard, Tristan. Mar 2020.
Specialized worker profiles on crowdsourcing platforms may contain a large amount of identifying and possibly sensitive personal information (e.g., personal preferences, skills, available slots, available devices), raising strong privacy concerns. This has led to the design of privacy-preserving crowdsourcing platforms that aim to enable efficient crowdsourcing processes while providing strong privacy guarantees even when the platform is not fully trusted. We propose a demonstration of the PKD algorithm, a privacy-preserving space partitioning algorithm dedicated to enabling secondary usages of worker profiles within privacy-preserving crowdsourcing platforms by combining differentially private perturbation with additively homomorphic encryption. The demonstration scenario showcases the PKD algorithm by illustrating how it lets requesters tune their tasks according to the actual distribution of worker profiles while providing sound privacy guarantees.
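The two primitives PKD combines can be illustrated independently of the algorithm itself. The toy sketch below rests on my assumptions throughout (Paillier as the additively homomorphic scheme via the python-paillier library, a single counting query, hypothetical data): workers encrypt a per-cell membership bit, the untrusted platform sums ciphertexts without decrypting, and Laplace noise is added before release. It is not the PKD algorithm.

```python
# pip install phe  (python-paillier)
import random
from phe import paillier

# Key pair held by the party allowed to decrypt aggregate counts only.
# Small key size for demo speed; use a production size in practice.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Each worker encrypts the bit "my profile falls into this partition cell".
worker_bits = [1, 0, 1, 1, 0]  # hypothetical cell-membership indicators
ciphertexts = [public_key.encrypt(b) for b in worker_bits]

# The untrusted platform sums ciphertexts without learning any single bit:
# Paillier is additively homomorphic, so Enc(a) + Enc(b) decrypts to a + b.
encrypted_count = sum(ciphertexts[1:], ciphertexts[0])
count = private_key.decrypt(encrypted_count)

# Differentially private release: Laplace(1/epsilon) noise suits a counting
# query of sensitivity 1. A Laplace sample is the difference of two
# independent exponential samples of rate epsilon.
epsilon = 1.0
noisy_count = count + random.expovariate(epsilon) - random.expovariate(epsilon)
print(count, round(noisy_count, 2))
```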