Personal Blog

View the blog index.



5 most recent items

12 Sep 2021 : Changing password approaches is really hard #
Last month I spent a couple of weeks with Joanna in a cottage next to a lake in the Finnish countryside. It was a time for reflection and an opportunity to re-evaluate my life choices. A chance happening raised the topics of passwords and phishing. It's a subject I used to be well-acquainted with, but which my work has drifted away from more recently.

Everyone needs a way to manage their passwords and ideally everyone should have a good way to manage their passwords. About a decade ago I started using PwdHash as my method. It has several advantages that are similar to those offered by the more-familiar database-backed password managers. For example it ensures you use a different password for every website; it guards against phishing attacks; it avoids the need to remember anything other than a master password; and it works across all devices (desktop, phone, browser, etc.). Because it generates passwords on-the-fly, it also has the benefit of not needing to store a database of passwords, neither in the cloud nor locally. Pretty neat. This last point makes it particularly attractive to me, since I'm generally uncomfortable relying on cloud services I don't host myself.
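To make the idea of generating passwords on-the-fly a bit more concrete, here's a minimal sketch in Python. It is not the real PwdHash algorithm: the function name site_password, the use of HMAC-SHA256 and the 16-character truncation are all assumptions chosen purely for illustration. It just shows the general shape of the approach, where the same master password and domain always produce the same site password, with nothing stored anywhere.

```python
import base64
import hashlib
import hmac


def site_password(master_password: str, domain: str, length: int = 16) -> str:
    """Derive a per-site password from a master password and a domain.

    Illustrative only: this is NOT the real PwdHash algorithm.
    """
    # Key the hash with the master password and mix in the site's domain,
    # so every domain gets a different, repeatable output.
    digest = hmac.new(
        master_password.encode("utf-8"),
        domain.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    # Encode the raw bytes as printable characters and trim to length.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


# The same inputs always give the same password, so nothing needs storing.
bank = site_password("my master password", "bank.example")
# A lookalike phishing domain yields a different, useless password.
phish = site_password("my master password", "bank-example.test")
```

Because the domain is an input, a password typed into a lookalike site won't match the one for the real site, which is where the phishing protection comes from in a scheme of this kind.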

It has downsides too though. The main practical downside is that — because they're deterministically generated — the passwords can't be amended. This causes issues for systems that require regular password changes (thankfully less of an issue now than it was five years ago), or if you have to change your password on a specific site for some other reason (e.g. the site's passwords are compromised by an attacker). There are ways to work around this, but they're pretty awkward and user-unfriendly.

A further downside is that the password generation can be reversed, meaning that compromising a password for a single account could lead to compromise of the master password used to generate all of the individual site passwords, and hence to all of your accounts. This was such a major concern that my Cambridge colleague Graham Rymer and I investigated it back in 2016 and showed it to be a very real threat. We even managed to extract 79 PwdHash master passwords from three well-known compromised and publicly-leaked password databases (Stratfor, Rootkit and LinkedIn). In the paper we published on it we discussed ways to mitigate this threat, presenting our own improved alternative scheme. Other than technical improvements, the most crucial countermeasure we suggested was to use a really strong master password. This may seem obvious, but apparently it wasn't for many users of PwdHash up to that point.
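As a rough illustration of why a weak master password is such a problem, here's a sketch of offline guessing against a simplified, made-up derivation scheme. The names site_password and recover_master, the HMAC-SHA256 construction and the tiny candidate list are all assumptions for the sake of the example; this is neither the real PwdHash algorithm nor the method from our paper.

```python
import base64
import hashlib
import hmac
from typing import Iterable, Optional


def site_password(master_password: str, domain: str, length: int = 16) -> str:
    """Toy deterministic derivation (NOT the real PwdHash algorithm)."""
    digest = hmac.new(
        master_password.encode("utf-8"),
        domain.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


def recover_master(
    leaked_password: str, domain: str, candidates: Iterable[str]
) -> Optional[str]:
    """Return the first candidate that regenerates the leaked site password."""
    for guess in candidates:
        # The attacker knows the domain and the scheme, so each guess can be
        # checked offline by simply re-running the derivation.
        if site_password(guess, domain) == leaked_password:
            return guess
    return None


# A site password leaked from a single compromised site.
leaked = site_password("letmein", "forum.example")
wordlist = ["123456", "password", "letmein", "qwerty"]
found = recover_master(leaked, "forum.example", wordlist)
```

With a master password that appears in a wordlist the loop finds it almost immediately, whereas a long, random master password simply never shows up among the candidates.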

Another significant problem with PwdHash is that changing to a different scheme at a later date is a tremendously painful experience. And that's exactly what I'm experiencing now.

The work Graham and I did convinced me I needed to change my approach. And what better approach than the one we recommended?

Yet it's been five years and I'm only just now starting to make the switch, which maybe tells you something about the effort involved. In theory I should have been able to switch over gradually, one site at a time, just updating my passwords on my next visit to each site. In practice the infrastructure wasn't there to allow me to do this easily enough. What I needed was a website and an app that would allow me to generate my old password, then seamlessly generate the new password without having to type in a new URL, open a new app or whatever. Basically, for something I'm going to have to perform hundreds of times, the process has to be as effortless as possible.

So this is what I've spent the last few weeks arranging. First off was the website. There are already sites for the original Stanford PwdHash and for our updated variant. But the thought of having to switch from one to the other across hundreds of sites was just too much. So I combined them into a single site that allows switching between the two with a single click. It's a really small, simple thing, but it's enough to make the difference between inertia holding me back and momentum pushing me forwards. I also made it easier to get the generated password onto the clipboard, allowing my web workflow to be fully optimised.

That deals with the web but what about apps on my phone? I've been using a PwdHash app on my phone for many years, written by Robert Gerlach. It's a really simple app, but all the more effective for it. The app only supported the original Stanford variant, so I've just spent my weekend updating it to support the new algorithm as well. Not only have I added support for the new algorithm, but following the advice from our paper of five years ago, I also added a password strength meter using the zxcvbn algorithm. Plus I also made a few other improvements to better suit my usual workflow.

So now the fun of changing all my passwords can finally begin. So far it's been more preparation than progress, but I have managed to convert the passwords for three sites, so there's no going back now. The whole experience has reaffirmed my empathy for everyone struggling with password management. There are plenty of good solutions out there for managing passwords of course, but frankly the fact there are so many options just adds to the complexity of making a good choice. Especially when it's the sort of decision you really want to get right first time. Or in my case, hopefully, second time.
29 Aug 2021 : New and improved waste data graphs #
I've just hit two full years of waste output data, which has given me a nice idea about how much waste I generate on a daily basis. Since I started back in August 2019 I've been updating a graph showing the results on my waste page. It's provided quite a fascinating picture. Not only has my waste output gone down over time, but it's also become more consistent.

I attribute this improvement squarely to the act of measuring my data each fortnight. The process has made me far more aware, not just about how much waste I produce, but also the sorts of products that generate more or less waste.

For example, glass is really heavy and it became clear quite early on that it was contributing significantly to the weight of waste I was producing. This motivated me to look into it more deeply, which ultimately resulted in me almost completely eradicating glass from my daily usage.

As a result of this and other changes, my daily usage has gone down from 322.80 g/day in 2019 to 154.98 g/day in 2020, and now in 2021 I'm currently averaging 123.34 g/day. Admittedly my average this year is likely to increase during the winter (and Christmas especially) but my aim is to keep it at least as low as my 2020 average.

One of the downsides to accumulating all this data is that the graphs I've been posting here have become increasingly hard to read. Placing all of the data onto a single graph has become unsustainable, so over the last week I've been updating my graph-generating scripts to make them more flexible. As a result, I'm now going to only show data for the current year on the main waste page. The data for previous years can still be viewed on the pages for 2019 and 2020, and I'll add new pages as the years tick forwards.

I've also created a new page showing the complete data set. These "all-data" graphs are plotted wider now, and while this makes it easier to read the individual entries, it also makes them impractically long and thin. The "fixed in time" preview below already gives an idea of the problem, but the graphs will only get wider, and the issue more acute, over time. So they're really only going to be of interest to the masochistic.
Daily waste data histocurve snapshot 29/08/2021

While the full-data graph is interesting by virtue of its absurdity, splitting the graph up into annual chunks turns out to be the more interesting case. In particular, because I take readings when I take out the rubbish, these rarely actually fall on the first or last day of the year. So, how to split the readings across the year boundaries?

The solution I've come up with is to scale the readings at each end of the year in proportion to how much of the period falls into the year in question. For example, here are the actual readings I took over the 2020-2021 year boundary.

Date        Paper  Card  Glass  Metal  Returnables  Composts  Plastic  General
12/12/2020     57   515      0      0            0       449      107      322
14/3/2021     641   225      0      0           93       443       88      473

This covers an unusually long period of time because I was stuck in the UK for January, February and most of March due to Covid travel restrictions. But this is also convenient for making a more exaggerated example. So the period between 12th December and 14th March contains a total of 92 days. That splits into the two periods "12th December - 31st December" and "1st January - 14th March", which contain 20 and 72 days respectively. The proportion of time for each of these periods is therefore 20 / 92 = 21.74% that falls into 2020 and 72 / 92 = 78.26% that falls into 2021.

To manage the data split across the year, we therefore have to scale it appropriately. Each entry represents the end of a period, so the 12th December data falls entirely within 2020. The 14th March data represents the period that's split across both years. We can therefore scale this entry and turn it into two separate entries like this, scaling each of the data points based on the proportions calculated above.
Date         Paper    Card  Glass  Metal  Returnables  Composts  Plastic  General
31/12/2020  139.35   48.91      0      0        20.22     96.30    19.13   102.83
14/3/2021   501.65  176.09      0      0        72.78    346.70    68.87   370.17

To get the correct picture this has to be done at both ends of the year being plotted.
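The scaling described above can be sketched in a few lines of Python. This isn't the actual graph-generating script; the function name split_reading and the dictionary layout are assumptions for illustration, and the day counts (20 and 72) are simply the figures quoted above rather than being computed from the dates.

```python
from typing import Dict, Tuple

Reading = Dict[str, float]


def split_reading(
    reading: Reading, days_before: int, days_after: int
) -> Tuple[Reading, Reading]:
    """Split one reading across a year boundary in proportion to days.

    Assumes waste is generated uniformly over the whole period.
    """
    total_days = days_before + days_after
    before = {k: round(v * days_before / total_days, 2) for k, v in reading.items()}
    after = {k: round(v * days_after / total_days, 2) for k, v in reading.items()}
    return before, after


# The 14/3/2021 reading, covering the period since 12/12/2020.
march_2021 = {
    "Paper": 641, "Card": 225, "Glass": 0, "Metal": 0,
    "Returnables": 93, "Composts": 443, "Plastic": 88, "General": 473,
}

# 20 days of the period fall in 2020 and 72 days fall in 2021.
part_2020, part_2021 = split_reading(march_2021, days_before=20, days_after=72)

for category in march_2021:
    print(f"{category:12} {part_2020[category]:8.2f} {part_2021[category]:8.2f}")
```

The two parts sum back to the original reading (give or take rounding), which is what keeps the annual averages consistent across the boundary.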

Managing the data this way makes some obvious assumptions which may not necessarily be true (it assumes I generate waste uniformly across the time period, which is obviously not the case). However it has several nice properties. The annual histograms get drawn in a way that broadly speaking matches up across the year boundary; and the annual averages also match up correctly. At least, it seems to me to be the most honest way to tackle the issue when apportioning the data across year boundaries.

Check back to my waste page over time to see how I'm getting on with keeping my waste output down (or not), and whether I'm able to hit my 2021 target.
12 Jun 2021 : A BIOS reset caused my hard drive to disappear #
First of all a little context. My laptop is a Dell XPS 13 9300 (2020 edition), running Ubuntu 20.04 LTS with BIOS version 1.5.0 and a 1 TB SSD. I use an encrypted LVM root partition with no Windows partition. I mention all of this in case someone else finds themselves in the same situation as me; maybe they'll find my story helpful.

Just before briefly heading outside yesterday evening (something I've not been doing very often recently) I suspended my laptop. It's something I've been doing every day for the last year without a hitch. When I got back later in the evening I found my laptop couldn't be awoken. Usually just hitting a random key on the keyboard is enough to pull it from its slumbers, but I ended up having to do a 15-second long-press on the power button.

What followed has been a very traumatic 16-hours-worth of frustration and worry. Rather than rebooting normally the laptop immediately dropped into the BIOS "SupportAssist Pre-Boot System Performance Check". It's never done this before so it felt like a bad sign. I left it to run through its checks, taking about 30 minutes to complete. All tests passed. "Phew", I thought.

I rebooted, and it dropped into GRUB. It usually skips GRUB so this was also a concerning sign. Eventually it halted on the Ubuntu pre-boot logo with the error "cryptsetup: Waiting for encrypted source device UUID=574e19ff-e89e-112b-ba3e-76e1-929f4473d12c...". I love the fact my drive is encrypted for security; I hate the worry that comes with knowing that corrupting certain parts can leave all my data unrecoverable.

Hitting a key dropped me to a BusyBox initramfs prompt with the following error:
Gave up waiting for root device. Common problems:
  - Boot args (cat /proc/cmdline)
    - Check rootdelay= (did the system wait long enough?)
    - Check root= (did the system wait for the right device?)
  - Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/mapper/ubuntu--vg-root does not exist. Dropping to a shell! 

I scoured the Web last night and into the early hours looking for a solution, and then again for several hours this morning. There's plenty of good advice out there about how to recover from failures involving a nonexistent /dev/mapper/ubuntu--vg-root. In fact I used to get this periodically on another laptop I use, fixed by running fsck from the initramfs terminal. But this was different. According to the output from lsblk it had literally no hard drive present.

Checking the BIOS, things looked okay, the hard drive was still present and there were no devastating failures shown in the logs. The last firmware update was a couple of months ago. The only odd thing was the following line:
02/03/2021 00:00:12 Time-of-day not set - please run SETUP program.
BIOS System Logs

Well, the current date according to the BIOS was a somewhat inexplicable 02/03/2021 (US-style date, mm/dd/yyyy), even though the date today is 12/6/2021 (European-style dd/mm/yyyy). So this log entry seemed surely to be related to the strange drop into the system check last night. For some reason, something had reset my BIOS settings. Nevertheless I couldn't see anything particularly out of order in the BIOS settings apart from the odd date and time. Neither the BIOS nor the boot shell were offering a way to access the drive any better, so I decided to try gparted from an Ubuntu Live CD.

Luckily I have a second laptop so was able to burn an Ubuntu ISO to USB drive and boot up into a live Ubuntu session. Firing up gparted also showed a complete lack of hard drives present. This didn't make me feel any better. The system check had tested the drive, and it showed up in the BIOS, so I knew it must still be there somewhere, with or without my data.

I tried contacting Dell support. Even though the legal warranty on the device is for two years, my one year support warranty ran out just 36 days ago, so my options were to pay another £109, or wait until Monday when the non-paid support lines opened. I was fairly certain that on Monday they'd tell me I had to stump up for a support contract in order to get help. And that they'd then tell me I had to send back my laptop for investigation and repairs. I really didn't want to lose my laptop for three months on a roundtrip from Finland to the UK and back.

So I checked my server, noted that my last backup was only the day before, and resigned myself to reinstalling everything. Even this felt like a lost cause, given I had no drive to install to. But in a triumph of hope over experience I decided to try it anyway. Inside the live Ubuntu session I clicked on the button to run the install, only after a few steps to be presented with this:
Turn off RST

That looks bad, but it was actually the hint I needed. I dropped back to the BIOS, switched from Intel Rapid Storage Technology (RST) to Advanced Host Controller Interface (AHCI) and rebooted, expecting everything to get wiped in the process.

But lo, it booted fine, found the encrypted volume and requested my passphrase. After offering it up the boot continued and I'm now back in my laptop, no need to re-install everything, and more relieved than I've felt in a long time.

I've grown to rely very heavily on my laptop over the last year due to the lockdown preventing me from travelling. The prospect of losing access to it for potentially months was pretty soul crushing. So I'm hugely relieved.

Trying to make sense of what happened, it seems the BIOS just decided to reset some of its settings for no apparent reason. So as a note to future me, or for anyone else who may find themselves in a similar situation: it can happen, and the solution for me was to disable RST, and re-enable AHCI in the BIOS. When making the switch the BIOS presents an ominous warning that making this switch could prevent the device from booting or even result in wiping the contents. Thankfully that didn't happen to me, but be aware that if you try the same, it may happen to you.
18 Apr 2021 : This site won't be contributing to Google FLoC's profiling #

What the FLoC?

There's been a lot of noise in the technology press recently about Google's FLoC proposal. On one side we heard from Google that they would be turning off third party cookie support in Chrome. This would improve privacy by preventing advertisers from tracking individual users as they traverse different sites across the Web. Google, which gains the majority of its income from advertising (81% of revenue in 2020), were introducing what they refer to as the Privacy Sandbox to Google Chrome to counteract the impact of no longer being able to track users, and allow behavioural advertising to continue without infringing their users' right to privacy. The Privacy Sandbox is more of an initiative than a technology. The technology that will replace tracking using third-party cookies and user fingerprinting has yet to be fully decided, but one of the options with the most momentum, given that it's been developed by Google itself, is called FLoC (Federated Learning of Cohorts). This paper released by Google provides a good summary of the way it's supposed to work.
The user found themselves trapped in a swirling mass of ominous-looking icons
In short, FLoC shifts the process of assigning behavioural labels from the advertising companies' servers to the client browser. This requires some clever algorithms, because an individual browser doesn't have access to data from other users. It essentially reverses the process: rather than collecting data about all users into one place (the advertisers' servers) so that it can be categorised into groups that are labelled based on behaviour, it instead sends the labels to the browser, where the browser then does the work of determining which labels apply based on the sites the user visits, allowing the browser to pick the most appropriate group. The user's group is then sent to the advertisers so that they can target their ads more effectively, while the user's complete viewing history never has to leave their browser. One notable facet of this approach is that FLoC is able to derive its labelling from all the sites a user visits, not just those serving third-party cookies as is the case now.

You can see how this can be presented as a privacy-win for users, if their browsing history no longer needs to be collected by random companies.

On the other side of the argument is the EFF, claiming that FLoC is a terrible idea. They cite a number of reasons for this. First, they claim that passing the group identifier of a user to a third party is actually just offering a new way for those third parties to fingerprint users, especially if the group doesn't contain many users. Second, they say that even though FLoC may prevent individual tracking, it nevertheless exposes information about a user to third parties. This isn't a big reveal, given that this is the stated purpose of FLoC, but it does bring into question the real privacy benefits that the approach is supposed to bring. Third, they highlight the fact that FLoC fails to mitigate the more tangible negative implications of targeted advertising and profiling in general, such as bias that leads to inequality and persecution. Since these are consequences of behavioural profiling, rather than the means of achieving it, solutions such as FLoC will always find it hard, if not impossible, to avoid them.

I'm a privacy-rights fundamentalist, but I also believe that in many cases privacy violations are caused by overreach rather than any fundamental need. In fact, in the majority of cases where privacy violations occur, my biggest frustration is that privacy-invasive techniques are chosen in favour of privacy-preserving ones, even though the stated aims are achieved by both. This is the clearest indicator of bad-intent that I know of, and is also a trap many organisations fall into.

A good recent example of where technology was used to achieve an important end without overly impacting user privacy was with the GAEN Contact Tracing protocol, which I wrote in favour of at the time. A good recent example of where a privacy-invasive technology was chosen unnecessarily, which I took to be an indication of bad-faith, was with the UK's Contact Tracing protocol, which failed to make use of the privacy-preserving protocols that were readily available.

So, to be clear, I'd be in favour of FLoC if it was a neat technological solution that addressed privacy concerns while still allowing targeted advertising to work.

Sadly, this isn't one of those cases, and my view falls squarely in line with the EFF's. Allowing third parties access to behavioural labels based on the sites a user has visited is a horrible intrusion of privacy, even if it's not sending the individual URLs that have been visited. I do not want data about my behaviour collected or shared with anyone; at least not without my explicit consent.

Google is trying to walk a fine line here, following the trend set by other browsers in blocking third-party tracking cookies, while not wanting to dent its advertising revenue or that of its partners. But to me the privacy-preserving rhetoric looks more like a gimmick than a reality: an attempt to placate those asking for better privacy, while hiding the real consequences of FLoC in amongst the technical detail.

The practicalities

If you're a Google Chrome or Chromium user, then you need to consider the implications of FLoC. If you're using some other browser (e.g. Firefox, Brave, Safari, etc.) you're almost certainly clear of it for the time being. The main non-Chrome browsers have been blocking third-party tracking cookies by default for some time already. The chance of them introducing FLoC in the near future seems small, and in fact right now there's very little incentive for them to do so. Unlike Google, they aren't reliant to the same extent on advertising for their revenue streams.

However, this might change. I was trying to think of scenarios in which other browser manufacturers were somehow incentivised to introduce the Privacy Sandbox. The obvious one is that websites start to demand it in an attempt to push up their own revenue from adverts. If users of Firefox start to see a lot of "Chrome-only" websites blocking any user that doesn't have an active Privacy-Sandbox, then users will also start to demand it as a feature, at which point we could see other browser developers introducing it.

So, right now, as long as you're not really tied in to Chrome and Android, you still have some choice over this as a user. But if you were paying close attention during the preceding paragraphs, you'll have noticed that it's not just users who have to be concerned about this. Webmasters also have a role here, because unlike tracking cookies which are only served by sites which invite them in (e.g. by including advertising or Google Analytics on the page), FLoC will track all of the webpages a user visits, independent of whether the website requests it or not.

In practice I see so many sites using Google Analytics, even sites that are championing online privacy, that it's reasonable to assume Google is already able to track you with this level of granularity, and that most webmasters are okay with this.

So, it's hard to frame this as being bad for website owners. After all, if a user wants to have the sites they visit tracked, then that's not really a decision for the site to make. Nevertheless it introduces a new dynamic that webmasters should be aware of, especially for sites that contain sensitive material (i.e. material that they think their users won't want tracked) and where their users may not be aware of the privacy implications of browsing the site with the Privacy Sandbox enabled.

My homepage certainly doesn't fall into that category of sites, but I've nevertheless worked hard to make the site respect my visitors' privacy. As someone who considers privacy to be a human right, respecting the privacy of my visitors is a minimal bar that I think every site should aspire to reach.

As a consequence I'm also requesting that FLoC not include visits to this site as an input to its profiling algorithm. Any webmaster can do this by adding the following line to their site's headers.

Permissions-Policy: interest-cohort=()
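How you add the header depends entirely on how your site is served. As a minimal sketch, assuming a site served by Python's standard-library http.server (not how this site is actually served), it could look something like this. The names OptOutHandler and make_server, the response body and port 8000 are all made up for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# The opt-out header: an empty allowlist for the interest-cohort feature.
FLOC_OPT_OUT = ("Permissions-Policy", "interest-cohort=()")


class OptOutHandler(BaseHTTPRequestHandler):
    """Serve a fixed page, sending the FLoC opt-out header with it."""

    def do_GET(self) -> None:
        body = b"Hello, visitor.\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header(*FLOC_OPT_OUT)
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but don't start) a local server using the handler above."""
    return HTTPServer(("127.0.0.1", port), OptOutHandler)
```

Calling make_server().serve_forever() starts it listening locally, after which the response headers can be checked in the browser's developer tools. On a real site you'd more likely set the header in the web server configuration instead.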

The fact that this is opt-out for sites, rather than opt-in, is frustrating, but also not at all surprising. FLoC is just one of the many proposals for how to use Google's Privacy Sandbox for tracking users. This header is FLoC-specific, and so I certainly hope I don't have to introduce new headers for every random technology that every random advertising company decides to try to deploy. But if that's the direction things go, then I will.
Permissions Policy header sent by the site
13 Mar 2021 : Male Violence against Women #
Today I watched this video of Chris Hemmings. It's inspiring and I can't find a word in it to disagree with.

After watching it I was really surprised to see the reaction of many men disagreeing, stating that it wasn't their fault (they're not predators) and that the focus should be put on “bad individuals” rather than “men as a group”.
Chris Hemmings on BBC News

The fact is, as a man I've benefited from living in a patriarchal society. Men have (predominantly, if not exclusively) made the laws, shaped the culture, chosen the path that society has taken. Individual men have reaped the greatest of the rewards from this by becoming rich and powerful. Society, the society I live in, has been like this for centuries if not longer.

The idea that I, as a man, haven't benefited from sexism, that I've not benefited from an entrenched prejudice against women and towards men, is frankly absurd. The fact is that I benefit from it every day, by being paid a bit more, from being taken more seriously, by being listened to more intently, from not having to do things I would otherwise have to do. By having more freedom to make my own choices. I couldn't pick out a specific instance where I've benefited from this system. I don't know of any time I got a job instead of a more qualified female candidate. I never fought for a pay rise that would otherwise have been given to a woman. I never knowingly took the men in my computer classes more seriously than the women. In fact, I have always gone out of my way to support women in technology (I hope in a practical, rather than patronising way).

So I couldn't give you a specific instance of where I've benefited from society being patriarchal. But that doesn't mean it isn't the case. It just seems so obvious to me that I benefit in unquantifiable ways on a daily basis. How could it not be so? How could it be that male preference is woven so tightly into the history of society, but that I haven't personally benefited as a man? Of course I have.

It's true that it may not feel like this for many men. Many men feel they are treated unfairly by the world, that they've earned what they have and deserve more than they're given. The fact is, I've never met anyone who didn't believe this. Everyone feels they're working hard for little reward. If you're a man, I hate to break it to you, but in the majority of cases this just isn't true: if it wasn't for living in a male-dominated society you would simply be getting less than you do now. It's true, of course, that there are other prejudices and that many people suffer from them. And you don't have to be a minority to be the victim of prejudice. But if you're a victim of some other form of bigotry, that doesn't mean you don't also benefit from being male. The fact is that cumulatively, sexism and prejudice against women is probably the most significant injustice that exists in society. As a man, it's almost impossible to have avoided benefiting from sexism.

So now, when women are asking to be treated with respect, and where the overwhelming cases are of men treating women badly, now is the time when some men want to disown the patriarchy and deny responsibility for the society that we helped create, and which we have benefited from.

Personally I don't buy it. I think we, as men, are very much to blame here. Not necessarily to blame for specific instances of violence or bigotry perpetrated against women, but certainly to blame for a society and culture that allows it to happen. As men, we've all taken the benefits, so now would be a good time to accept the responsibility.

I want to make clear that I accept my share of the responsibility. I'm not exactly sure what accepting it entails, but probably it means in the future having to accept more compromises in terms of my personal income, personal comfort, and how much my voice is heard, in order for women to gain a rightfully more dominant position in society.

The fact is I've met a lot of very competent women and the world would not be worse off if their voices had the same weight as their male counterparts. The men fighting against this will, I hope, be on the wrong side of history.