Posts for: #pprl

Becoming one in a million by giving up your data

Oh sweet, another lecture on why you should be careful with your personal data. And while you’d be right in saying that, the field of record linkage provides unique tools to grasp just how little you need to disclose about yourself to be at risk of being uniquely identifiable. I will prove this in the final instalment of my series of posts on my master’s thesis.

How to count bits at the speed of light

How can one count the number of set bits in a stream of bits? And more importantly, how can one do it efficiently? This is one of the many rabbit holes I found myself in while working on my master’s thesis, although it hardly overlaps with my actual thesis topic. But that’s the point of this series! So allow me to indulge in one of the cooler software engineering topics that I came across.

You show me your bits, I show you who you are

Bloom filters irreversibly obscure the data that is inserted into them … right? Yes! But also no. In the field of record linkage, Bloom filters have the unfortunate downside of leaking information to an attacker who aims to reconstruct personal data from bit patterns. This post describes some attacks on Bloom filters in record linkage, as well as possible mitigation strategies.

Find duplicates in your datasets with this one weird data structure

Identifying similar data records sounds like something that should be doable with a single database query. As with many things in life, it’s not as easy as it seems. This is the start of a series covering technical details about my master’s thesis topic that didn’t make the cut. In this post, I’ll lay out some of the fundamentals needed to understand what’s coming up.