Posts for: #cybersec

Becoming one in a million by giving up your data

Oh sweet, another lecture on why you should be careful with your personal data. And while you'd be right to think that, the field of record linkage provides unique tools for grasping just how little you need to disclose about yourself to become uniquely identifiable. I will prove this in the final instalment of my series of posts on my master's thesis.

You show me your bits, I show you who you are

Bloom filters irreversibly obscure the data that is inserted into them … right? Yes! But also no. In the field of record linkage, Bloom filters have the unfortunate downside of leaking information to an attacker who aims to reconstruct personal data from bit patterns. This post describes some attacks on Bloom filters in record linkage, as well as possible mitigation strategies.

Find duplicates in your datasets with this one weird data structure

Identifying similar data records sounds like something that should be doable with a single database query. As with many things in life, it's not as easy as it seems. This is the start of a series covering technical details about my master's thesis topic that didn't make the cut. In this post, I'll lay out the fundamentals you'll need to understand what's coming up.