Oh sweet, another lecture on why you should be careful with your personal data. You'd be right to say so, but the field of record linkage provides unique tools to grasp just how little you need to disclose about yourself to risk becoming uniquely identifiable. I'll demonstrate this in the final instalment of my series of posts on my master's thesis.
On paper, I'm a research assistant. In reality, I'm more of a level 2 tech support and system administrator. Between reading and writing research papers, work never gets dull, with new issues popping up left and right. Recently, though, I had to deal with a problem that dwarfed all the weird edge cases I had encountered so far.
How can one count the number of set bits in a stream of bits? And more importantly, how can one do it efficiently? This is one of the many rabbit holes I fell into while working on my master's thesis, although it hardly overlaps with my actual thesis topic. But that's the point of this series! So allow me to indulge in one of the cooler software engineering topics I came across.
Bloom filters irreversibly obscure the data that is inserted into them … right? Yes! But also no. In the field of record linkage, Bloom filters have the unfortunate downside of leaking information to an attacker who aims to reconstruct personal data from bit patterns. This post describes some attacks on Bloom filters in record linkage, as well as possible mitigation strategies.
Identifying similar data records sounds like something that should be doable with a single database query. As with many things in life, it's not as easy as it seems. This is the start of a series covering technical details of my master's thesis topic that didn't make the cut. In this post, I'll lay out the fundamentals needed to understand what's coming up.