Gecko-dev Diary
Between August 2023 and September 2024 I upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91, writing a daily blog as I went along. This page catalogues my progress, alongside the other browser-related topics I've looked in to since.
Latest code changes are in the gecko-dev sailfishos-esr91 branch.
There is an index of all posts in case you want to jump to a particular day.
Gecko
5 most recent items
26 Nov 2024 : A Brief Embedded Browser Expo #
The question of whether Gecko is the most appropriate browser for use on Sailfish OS is a perennial one. Back in the days of Maemo, with a user interface built using Gtk, Gecko may have seemed like a natural choice. But with the shift to Qt on the N9 and with Sailfish OS sticking to Qt ever since, it's natural to ask whether something like WebKit might not be a better fit.
Indeed, for many years WebKit was also an integral part of Sailfish OS, providing the embeddable QtWebKit widget that many other apps used as a way of rendering Web content. It wasn't until Sailfish OS 4.2.0 that this was officially replaced by the Gecko-based WebView API.
The coexistence of multiple engines within the operating system isn't the only reason many people felt WebKit would make a better alternative to Gecko. Another is the fact that WebKit, and subsequently Blink, has become the defacto standard for embedded applications. In contrast, although Mozilla were pushing embedded support back when Maemo was being developed, it's since dropped official embedded support entirely.
So in this post I'm going to take a look at embedded browsers. What does it mean for a browser to be embedded, what APIs are supported by the most widely used embedded toolkits, and might it be true that Sailfish OS would be better off using Blink? In fact, I'll be leaving this last question for a future post, but my hope is that the discussion in this post will serve as useful groundwork.
Let's start by figuring out what an embedded browser actually is. In my mind there are two main definitions, each embracing a slightly different set of characteristics.
If you frame it right though, these two definitions can feel similar. Here's how the Web Platform for Embedded team describe it:
So minimal, encapsulated, targeted. Maybe something you don't even realise is a browser.
And what does this mean in practice? That might be a little hard to pin down, but for me it's all about the API. What API does the browser expose for use by other applications and systems? If it provides bare-bones render output, but with enough hooks to build a complete browser on top of (at a minimum) then you've got yourself an effective embedded browser.
In the past Gecko provided exactly this in the form of the EmbedLIte API and the XULRunner Development Kit. The former provides a set of APIs that allow Gecko to be embedded in other applications. The latter allows the Gecko build process to be harnessed in order to output all of the libraries and artefacts needed (such as the libxul.so library and the omni.ja resource archive) to integrate Gecko into another application.
Sadly Mozilla dropped support for both of these back in 2016, when it was decided the core Firefox browser needed to be prioritised over an embedding offering. Mozilla has made plenty of questionable decisions over the years and given the rise in use of WebKit and Chrome as embedded browsers, you might think this was one of them. But despite the lack of investment in the API, it's not been removed entirely, to Mozilla's credit. It is, in fact, still possible to access the EmbedLite APIs and to generate the XULRunner artefacts and get a very effective embedded browser.
We'll come back to the EmbedLite approach to embedding later. But in order to understand it better, I believe it's also helpful to understand the context. I therefore plan to look at three different embedded browser frameworks. These are CEF (the Chromium Embedded Framework), Qt WebEngine and then finally we'll return to Gecko by considering the Gecko WebView.
Looking through the documentation I was surprised at how similar these three frameworks appear to be. But trying them out I was quickly divested of this misapprehension. They do offer similar functionality, but turn out to be quite different to use in practice.
Before we get in to the API details, let's first consider what a minimal embedded browser external interface might look like.
Let's now turn to the three individual embedding frameworks to see how they approach all this.
And that's exactly how CEF brings value. It takes the Chromium internals (Blink, V8, etc.) and wraps them with the basic windowing scaffolding needed to get a browser working across multiple platforms (Linux, Windows, macOS). It then adds in a streamlined interface for controlling the most important features needed for embedding (settings, browser controls, JavaScript injection, message passing).
Having worked through the documentation and tutorials, the CEF project assumes a slightly different workflow from what I'd typically expect. Qt WebEngine and Gecko WebView are both provided as widgets that integrate with the graphical user interface (GUI) toolkit (which in these cases is Qt). On the other hand, CEF is intended for use with multiple different widget toolkits (Gtk, Qt, etc.). As a developer you're supposed to clone the cef-project repository with the CEF example code — which is complex and extensive — and build your application on top of that. As is explained in the documentation, the first thing a developer needs to do is to:
The documentation assumes you're starting from scratch; it's not clear to me how you're supposed to proceed if you want to retrofit CEF into an existing application. It looks like it may not be straightforward.
Nevertheless, assuming you're starting from scratch, CEF provides a solid base to build on, since you'll start with an application that already builds, runs and displays Web content. You can then immediately see how the main classes needed for controlling the browser are used.
There are many such classes, but I've picked out three that I think are especially important for understanding what's going on. As soon as you look into one of these classes you'll find references to other classes. You may need to look into these too if you want to fully understand what's going on; with each such step I found myself getting pulled a little further into the rabbit hole.
First the CefBrowserHost class. This class is pretty key as it handles the lifespan of the browser; it's described in the source as being "used to represent the browser process aspects of a browser". Here's a flavour of what the class looks like. I've cut a lot out for the sake of brevity, but you can check out the class header if you want to see everything.
Here's the interface for the CefBrowser object the lifecycle of which is being managed (you'll find the full class header in the same file):
Once we have this root CefFrame object we can then ask for a particular page to be loaded into the frame using the LoadURL() method:
So now we've seen enough of the API to initialise things and to then load up a particular URL into a particular view. But CefFrame offers us a lot more than just that. In particular it offers up two other pieces of functionality critical for embedded browser use: it allows us to execute code within the frame and it allows us to send messages between the application and the frame.
Why are these two things so critical? In order for the content shown by the browser to feel fully integrated into the application, the application must have a means to interact with it. These two capabilities are precisely what we need to do this.
Understanding CEF requires a lot more than these three classes, but this is a supposed to be a survey, not a tutorial. Still, it would be nice to know what it's like to use these classes in practice. To that end, I've put together a simple example CEF application that makes use of some of this functionality.
The application itself is simple and useless, but designed to capture functionality that might be repurposed for better use in other situations. The app displays a single window containing various widgets built using the native toolkit (in the case of our CEF example, these are Gtk widgets). The browser view is embedded between these widgets to demonstrate that it could potentially be embedded anywhere on the page.
The native widgets allow some limited control over the browser content: a URL bar, forwards and backwards. There's also an "execute JavaScript" button. This is the more interesting functionality. When the user presses this a small piece of JavaScript will be executed in the DOM context of the page being rendered.
Here's the JavaScript to be executed:
There is one peculiar aspect to this code though: having completed the walk and returned from collect_node_stats() the code then converts the results into a JSON string and passes the result into a function called dom_walk(). But this function doesn't exist. Huh?!
We'll come back to this.
The values that are calculated aren't really important, what is important is that we can return these values at the end and display them in the native user interface code. This highlights not only how an application can have its own code executed in the browser context, but also how the browser can communicate back information to the application. With these, we can make our browser and application feel seamlessly integrated, rather than appear as two different apps that happen to be sharing some screen real-estate.
Let's now delve in to some code and consider how our three classes are being used. We'll then move on to how the communication between app and browser is achieved.
To get the CEF example working I followed the advice in the documentation and made a fork of the cef-project repository. I then downloaded the binary install of the cef project, inside which is an example application called cefclient. I made my own copy of this inside my cef-project fork, hooked it into the CMake build files and started making changes to it.
There's a lot of code there which may look a bit overwhelming but bear in mind that the vast majority of this code is boilerplate taken directly from the example. Writing this all from scratch would have been... time consuming.
Most of the changes I did make were to the file. This handles the browser lifecycle as described above using an instance of the CefBrowserHost class.
We can see this in the RootWindowGtk::CreateRootWindow() method which is responsible for setting up the contents of the main application window. In there you'll see lots of calls for creating and arranging Gtk widgets (I love Gtk, but admittedly it can be a tad verbose). Further down in this same method we see the call to CefBrowserHost::CreateBrowser() that brings the browser component to life.
In the case of CEF the browser isn't actually a component. We tell the browser where in our window to render and it goes ahead and renders, so we actually create an empty widget and then update the browser content bounds every time the size or position of this widget changes.
This contrasts with the Qt WebEngine and Gecko WebView approach, where the embedded browser is provided as an actual widget and, consequently, the bounds are updated automatically as the widget updates. Here with CEF we have to do all this ourselves.
It's not hard to do, and it brings extra control for greater flexibility, but it also hints at why so much boilerplate code is needed.
The browser lives on until the app calls CefBrowserHost::CloseBrowser() in the event that the actual Gtk window containing it is deleted.
We already talked about the native controls in the window and the fact that we can enter a URL, as well as being able to navigate forwards and backwards through the browser history. For this functionality we use the CefBrowser object.
We can see this at work in the same file. Did I mention that this file is where most of the action happens? That's because this is the file that handles the Gtk window and all of the interactions with it.
When creating the window we set up a generic RootWindowGtk::NotifyButtonClicked() callback to handle interactions with the native Gtk widgets. Inside this we find some code to get our CefBrowser instance and call one of the navigation functions on it. The choice of which to call depends on the button that was pressed by the user:
The format is similar, but when clicked it extracts the main frame from the browser in the form of a CefFrame instance and calls the CefFrame::ExecuteJavaScript() method on this instead. Like this:
There are similar methods available for the Qt WebEngine and Gecko WebView as well. As we'll see, what makes the CEF version different is that it doesn't block the user interface thread during execution and doesn't return a value. But as we discussed above, we want to return a value, because otherwise how are we going to display the number of nodes, tree height and tree breadth in the user interface?
This is where that mysterious dom_walk() method that I mentioned earlier comes in. We're going to create this method on the C++ side so that when the JavaScript code calls it, it'll execute some C++ code rather than some JavaScript code.
We do this by extending the CefV8Handler class and overriding its CefV8Handler::Execute() method with the following code:
In JavaScript functions are just like any other value, so to get our new function into the DOM context all we need to do is create a CefV8Value object, which is the C++ name for a JavaScript value, and pass it in to the global context for the browser. We do this when the JavaScript context is created like so:
So now we've gone full circle: the user interface thread executes some JavaScript code on the render thread in the view's DOM context. This then calls a C++ method also on the render thread, which sends a message to the user interface thread, which updates the widgets to show the result.
All of the individual steps make sense in their own way, but it is, if I'm honest, a bit convoluted. I can fully understand that message passing is needed between the different threads, but it would have been nice to be able to send the message directly from the JavaScript. Although there are constraints that apply here for security reasons, the Qt WebEngine and Gecko WebView equivalents both abstract these steps away from the developer, which makes life a lot easier.
With all of this hooked up, pressing the execute JavaScript button now has the desired effect.
The CEF project works hard to make Blink accessible as an embedded browser, but there's still plenty of complexity to contend with. Given just the few pieces we've covered here — lifecycle, navigation, JavaScript execution and message passing — you'll likely be able to do the majority of things you might want with an embedded browser. Crucially, you can integrate the browser component seamlessly with the rest of your application.
It's powerful stuff, but it's also true to say that the other approaches I tried out managed to hide this complexity a little better. The main reason for this would seem to be because CEF doesn't target any particular widget toolkit. It can, in theory, be integrated with any toolkit, whether it be on Linux, Windows or macOS.
While that flexibility comes at a cost in terms of complexity, that hasn't stopped CEF becoming popular. It's widely used by both open source and commercial software, including the Steam client and Spotify desktop app.
In the next section we'll look at the Qt WebEngine, which provides an alternative way to embed the Blink rendering engine into your application.
Although both uses Blink, there are other important differences between the two. First, Qt WebEngine is tied to Qt. That means that all of the classes we'll look at bar one will inherit from QObject and the main user interface class will inherit from QWidget (which itself is a descendant of QObject).
While Qt is largely written in C++ and targets C++ applications, we'll also make use of QML for our example code. This will make the presentation easier, but in practice we could achieve exactly the same results using pure C++. We'd just end up with a bit more code.
So, with all that in mind, let's get to it.
The fact that Qt WebEngine exclusively targets Qt applications does make things a little simpler, both for the Qt WebEngine implementation and for our use of it. Consequently we can focus on just two classes. In practice there are many more classes that make up the API, but many of these have quite specific uses (such as interacting with items in the navigation history, or handling HTTPS certificates). All useful stuff for sure, but our aim here is just to give a flavour.
The two classes we're going to look at are QWebEnginePage and QWebEngineView. Here's an abridged version of the former:
According to the Qt documentation, the QWebEnginePage class...
That's reflected in the methods and member variables I've pulled out here. The title, url and load status of the page are all exposed by this class and it also allows us to search the page. The reference to actions in the documentation relates to the triggerAction() method. There are numerous types of WebAction that can be passed in to this. Things like Forward, Back, Reload, Copy, SavePage and so on.
You'll also notice there's a runJavaScript() method. If you've already read through the section on CEF you should have a pretty good idea about how we're planning to make use of this, but we'll talk in more detail about that later.
The other key class is QWebEngineView. This inherits from QWidget, which means we can actually embed this object in our window. It's the class that actually gets added to the user interface. It's therefore also the route through which we can interact with the QWebEnginePage page object that it holds.
There are also convenience slots for navigation (a slot is just a method that can be either called directly, or connected up to one of the signals I mentioned earlier).
And with these few classes we have what we need to create ourselves an example application. I've created something equivalent to our CEF example application described in the previous section, called WebEngineTest; all of the code for it is available on GitHub, but I'm also going to walk us through the most important parts here.
If you looked at the sprawling CEF code, you may be surprised to see how simple the Qt WebEngine equivalent is. The majority of what we need is encapsulated in this short snipped of QML code copied from the Main.qml file.
We'll take a look at the NavBar and InfoBar shortly, but let's first concentrate on the WebEngineView. It has a width set to match the width of the page and a height set to match the page height minus the size of the other widgets. We set the initial page to load and set JavaScript to be enabled, like so:
Then there's the getInfo() method that executes the following:
The method takes the JavaScript script to execute — as a string — for its first parameter and a callback that's called on completion of execution for the second parameter. Internally this is actually doing something very similar to the CEF code we saw above: it passes the code to the V8 JavaScript engine to execute inside the DOM, then waits on a message to return with the results of the call.
In our callback we simply copy the returned data into the infobar.dominfo variable, which is used to populate the widgets along the bottom of the screen.
It all looks very clean and simple. But there is some machinery needed in the background to make it all hang together. First, you may have noticed that for our script we simply pass in a domwalk variable. We set this up in the main.cpp file (which is the entrypoint of our application). There you'll see some code that looks like this:
Next up, let's take a look at the NavBar.qml file. QML automatically creates a widget named NavBar based on the name of the file, which we saw in use above as part of the main page.
Finally for the toolbar, the third button simply calls the getInfo() method that we created as part of the definition of our WebEngineView widget above. We already know what this does, but just to recap, this will execute the domwalk JavaScript inside the DOM context and store the result in the infobar.dominfo variable.
The NavButton component type used here is just a simple wrapper around the QML Button QML.
Now let's look at the code in the InfoBar.qml file:
And it works too. Clicking the execute JavaScript button will paint the elements of the page with a red border and display the stats in the infobar, just as happened with our CEF example:
When I first started using QML this automatic updating of the fields felt counter-intuitive. In most programming languages if an expression includes a variable, the value at the point of assignment is used and the expression isn't reevaluated if the variable changes value. In QML if a variable is defined using a colon : (as opposed to an equals =) symbol, it will be bound to the variables in the expression and updated if they change. This is what's happening here: when the dominfo variable is updated, all of its dependent bound variables will be updated too. All made possible using the magical signals from earlier.
Other user interface frameworks (Svelte springs to mind) have this feature as well; when used effectively it can make for super-simple and clean code.
There's just one last piece of the puzzle, which is the domwalk code itself. I'm not going to list it here, because it's practically identical to the code we used for the CEF example, which is listed above. The only difference is the way we return the result back at the end. You can check out the DomWalk.js source file if you'd like to compare.
And that's it. This is far simpler than the code needed for CEF, although admittedly the CEF code all made perfect sense. Unlike CEF, Qt WebEngine is only intended for use with Qt. This fact, combined with the somewhat less verbose syntax of QML compared to C++, is what makes the Qt version so much more concise.
In both cases the underlying Web rendering and JavaScript execution engines are the same: Blink and V8 respectively. It's only the way the Chromium API is exposed that differs.
Let's now move on to the Sailfish WebView, which has a similar interface to Qt WebEngine but uses a different engine in the background.
Although the Sailfish WebView may therefore not be so useful outside of Sailfish OS, it's Sailfish OS that drives my interest in mobile browsers. So from my point of view it's very natural for me to include it here.
Since it's built using Qt and is exposed as a QML widget, the Sailfish WebView has many similarities with Qt WebEngine. In fact the Qt WebEngine API is itself a successor of the Qt WebView API, which was previously available on Sailfish OS and which the Sailfish WebView was developed as a replacement for.
So expect similarities. However there's also one crucial difference between the two: whereas the Qt WebEngine is built using Blink and V8, the Sailfish WebView is built using Gecko and SpiderMonkey. So in the background they're making use of completely different Web rendering engines.
Like the other two examples, all of the code is available on GitHub. The repository structure is a little different, mostly because Sailfish OS has its own build engine that's designed for generating RPM packages. Although the directory structured differs, for the parts that interest us you'll find all of the same source files across both the WebEngineTest and harbour-webviewtest.
Looking first at the Main.qml file there are only a few differences between this and the equivalent file in the WebEngineTest repository.
The DomWalk.js code is also practically identical. One difference is that we don't use structuredClone() because the engine version is slightly older. We use a trick of converting to JSON and back instead to achieve the same result. This JavaScript is loaded in the main.cpp file in the same way as for WebEngineTest.
The NavBar.qml and InfoBar.qml files are to all intents and purposes identical, so I won't copy the code out here.
And that's it. Once again, it's a pretty clean and simple implementation. It demonstrates execution of JavaScript within the DOM that's able to manipulate elements and read data from them. It also shows data being passed back from Gecko to the native app that wraps it.
Although the Sailfish WebView is uses Gecko, from the point of view of the developer and the end user there's no real difference between the API offered by Qt WebEngine and that offered by the Sailfish WebView.
For Sailfish OS users it's natural to ask whether it makes sense to continue using Gecko, rather than switching to Blink. I'm hoping this investigation will help provide an answer, but right now I just want to reflect on the fact there's very little difference from the perspective of the API consumer.
While CEF and Qt WebEngine both share the same rendering backend, their APIs are quite different, at least when it comes to the specifics. But in fact, the functionalities exposed by both are similar.
The Sailfish WebView on the other hand uses completely different engines — Gecko and SpiderMonkey — and yet, in spite of this the WebView API is really very similar to the Qt WebEngine API.
So as a developer, why choose one of them over the other? When it comes to the WebEngine and the WebView the answer is happily straightforward: since they support different, non-overlapping platforms, if you're using Sailfish OS consider using the WebView; if you're not it's the WebEngine you should look at.
To wrap things up, lets consider the core functionalities the APIs provide. Although each has its own quirks, fundamentally they offer something similar:
At the start I said I'd consider whether Gecko is still appropriate for use by Sailfish OS for its embedded browser. This is an important question that we're now closer to having a clearer answer to. I'll take a look at this in more detail in a future post.
Comment
Indeed, for many years WebKit was also an integral part of Sailfish OS, providing the embeddable QtWebKit widget that many other apps used as a way of rendering Web content. It wasn't until Sailfish OS 4.2.0 that this was officially replaced by the Gecko-based WebView API.
The coexistence of multiple engines within the operating system isn't the only reason many people felt WebKit would make a better alternative to Gecko. Another is the fact that WebKit, and subsequently Blink, has become the defacto standard for embedded applications. In contrast, although Mozilla were pushing embedded support back when Maemo was being developed, it's since dropped official embedded support entirely.
So in this post I'm going to take a look at embedded browsers. What does it mean for a browser to be embedded, what APIs are supported by the most widely used embedded toolkits, and might it be true that Sailfish OS would be better off using Blink? In fact, I'll be leaving this last question for a future post, but my hope is that the discussion in this post will serve as useful groundwork.
Let's start by figuring out what an embedded browser actually is. In my mind there are two main definitions, each embracing a slightly different set of characteristics.
- A browser that runs on an embedded device.
- A browser that can be embedded in another application.
If you frame it right though, these two definitions can feel similar. Here's how the Web Platform for Embedded team describe it:
For many of us, a browser is an application like the one you’re probably using now. You click an icon on your graphical operating system (OS), navigate somewhere with a URL bar, search, and so on. You have bookmarks and tabs that you can drag around, and lots of other features.
In contrast, an embedded browser is contained within another application or is built for a specific purpose and runs in an embedded system, and the application controlling the embedded browser does not provide all the typical features of browsers running in desktops.
In contrast, an embedded browser is contained within another application or is built for a specific purpose and runs in an embedded system, and the application controlling the embedded browser does not provide all the typical features of browsers running in desktops.
So minimal, encapsulated, targeted. Maybe something you don't even realise is a browser.
And what does this mean in practice? That might be a little hard to pin down, but for me it's all about the API. What API does the browser expose for use by other applications and systems? If it provides bare-bones render output, but with enough hooks to build a complete browser on top of (at a minimum) then you've got yourself an effective embedded browser.
In the past Gecko provided exactly this in the form of the EmbedLIte API and the XULRunner Development Kit. The former provides a set of APIs that allow Gecko to be embedded in other applications. The latter allows the Gecko build process to be harnessed in order to output all of the libraries and artefacts needed (such as the libxul.so library and the omni.ja resource archive) to integrate Gecko into another application.
Sadly Mozilla dropped support for both of these back in 2016, when it was decided the core Firefox browser needed to be prioritised over an embedding offering. Mozilla has made plenty of questionable decisions over the years and given the rise in use of WebKit and Chrome as embedded browsers, you might think this was one of them. But despite the lack of investment in the API, it's not been removed entirely, to Mozilla's credit. It is, in fact, still possible to access the EmbedLite APIs and to generate the XULRunner artefacts and get a very effective embedded browser.
We'll come back to the EmbedLite approach to embedding later. But in order to understand it better, I believe it's also helpful to understand the context. I therefore plan to look at three different embedded browser frameworks. These are CEF (the Chromium Embedded Framework), Qt WebEngine and then finally we'll return to Gecko by considering the Gecko WebView.
Looking through the documentation I was surprised at how similar these three frameworks appear to be. But trying them out I was quickly divested of this misapprehension. They do offer similar functionality, but turn out to be quite different to use in practice.
Before we get in to the API details, let's first consider what a minimal embedded browser external interface might look like.
- Settings controller. An API, likely exposed as a class, to control browser settings such as cache and profile location, scaling, user agent string, privacy settings and so on. Browsers typically offer numerous configuration options and some of these such as profile location are especially important for the embedded case.
- JavaScript execution. Apps that embed a browser often have particular use-cases in mind. The ability to execute JavaScript is important for allowing interaction between the rendered Web content and the rest of the application (see also message passing interface).
- Web controls. There are a bunch of controls that are needed as a bare minimum for controlling browser content. Load URL; navigate forwards; navigate backwards; that kind of thing. An app that embeds a browser may choose to handle these controls itself, potentially hiding them from the user entirely, but at the very least the app has to be able to access these controls programmatically from its own code.
- Separate view widgets. The browser is an engine and often an app will want multiple views all of which make use of it, each rendering different content. An embedding framework should allow an app to embed multiple views, each making use of the same engine underneath.
- Message passing interface. The app and the browser need a way to communicate with one another. Browsers already work by broadcasting messages between different components, so there should be a way for the embedder to send and receive these messages as well. A common use case will involve the embedder injecting some JavaScript, with communication handled by message passing between the app and the JavaScript. The app can then act on the messages sent from inside the browser engine by the JavaScript.
- Populate a settings structure for the browser to capture the settings in an object.
- Instantiate a bunch of singleton browser classes. These will be for central management of the browser components.
- Pass in the settings object to these central browser components.
- Embed one or more browser widget into the user interface to create browser views.
- Inject some JavaScript into the browser views. This JavaScript listens for messages from the app, interacts with the browser content and sends messages back.
- Open the window containing the browser widget for the user.
- Interact with the JavaScript and browser engine by passing messages via the message passing interface.
- When the user closes the window, shut down the views.
Let's now turn to the three individual embedding frameworks to see how they approach all this.
CEF
Let's start by considering the CEF API as documented. I actually began by looking at the Blink source code, but it turns out this isn't set up well for easy integration into other projects. I should caveat this: the underlying structure may be carefully arranged to support it, but the Blink project itself doesn't seem to prioritise streamlining the process of embedding. For example it exposes the entire internal API with no simplified embedding wrapper and I didn't find good official documentation on the topic.And that's exactly how CEF brings value. It takes the Chromium internals (Blink, V8, etc.) and wraps them with the basic windowing scaffolding needed to get a browser working across multiple platforms (Linux, Windows, macOS). It then adds in a streamlined interface for controlling the most important features needed for embedding (settings, browser controls, JavaScript injection, message passing).
Having worked through the documentation and tutorials, the CEF project assumes a slightly different workflow from what I'd typically expect. Qt WebEngine and Gecko WebView are both provided as widgets that integrate with the graphical user interface (GUI) toolkit (which in these cases is Qt). On the other hand, CEF is intended for use with multiple different widget toolkits (Gtk, Qt, etc.). As a developer you're supposed to clone the cef-project repository with the CEF example code — which is complex and extensive — and build your application on top of that. As is explained in the documentation, the first thing a developer needs to do is to:
Fork the cef-project repository using Bitbucket and Git to store the source code for your own CEF-based project.
The documentation assumes you're starting from scratch; it's not clear to me how you're supposed to proceed if you want to retrofit CEF into an existing application. It looks like it may not be straightforward.
Nevertheless, assuming you're starting from scratch, CEF provides a solid base to build on, since you'll start with an application that already builds, runs and displays Web content. You can then immediately see how the main classes needed for controlling the browser are used.
There are many such classes, but I've picked out three that I think are especially important for understanding what's going on. As soon as you look into one of these classes you'll find references to other classes. You may need to look into these too if you want to fully understand what's going on; with each such step I found myself getting pulled a little further into the rabbit hole.
First the CefBrowserHost class. This class is pretty key as it handles the lifespan of the browser; it's described in the source as being "used to represent the browser process aspects of a browser". Here's a flavour of what the class looks like. I've cut a lot out for the sake of brevity, but you can check out the class header if you want to see everything.
class CefBrowserHost : public CefBaseRefCounted { public: static bool CreateBrowser(const CefWindowInfo& windowInfo, CefRefPtr<CefClient> client, const CefString& url, const CefBrowserSettings& settings, CefRefPtr<CefDictionaryValue> extra_info, CefRefPtr<CefRequestContext> request_context); CefRefPtr<CefBrowser> GetBrowser(); void CloseBrowser(bool force_close); [...] void StartDownload(const CefString& url); void PrintToPDF(const CefString& path, const CefPdfPrintSettings& settings, CefRefPtr<CefPdfPrintCallback> callback); void Find(const CefString& searchText, bool forward, bool matchCase, bool findNext); void StopFinding(bool clearSelection); bool IsFullscreen(); void ExitFullscreen(bool will_cause_resize); [...] };As a developer you call CreateBrowser() to start up your browser, which you can then access using GetBrowser(). Once you're done you can destroy it using CloseBrowser(). All of these are accessed via this CefBrowserHost interface. As you can see, there are also a bunch of browser-wide functionalities (search, fullscreen mode, printing, etc.) that are also managed through CefBrowserHost.
Here's the interface for the CefBrowser object the lifecycle of which is being managed (you'll find the full class header in the same file):
class CefBrowser : public CefBaseRefCounted { public: bool CanGoBack(); void GoBack(); bool CanGoForward(); void GoForward(); bool IsLoading(); void Reload(); void StopLoad(); bool HasDocument(); CefRefPtr<CefFrame> GetMainFrame(); [...] };Things are starting to look a lot more familiar now, with methods to perform Web navigation and the like. Notice however that we still haven't reached the interface for loading a specific URL yet. For that we need a CefFrame which the source code describes as being "used to represent a frame in the browser window.". A page can be made up of multiple such frames, but there's always a root frame which we can extract from the browser using the GetMainFrame() method you see above.
Once we have this root CefFrame object we can then ask for a particular page to be loaded into the frame using the LoadURL() method:
class CefFrame : public CefBaseRefCounted { public: CefRefPtr<CefBrowser> GetBrowser(); void LoadURL(const CefString& url); CefString GetURL(); CefString GetName(); void Cut(); void Copy(); void Paste(); void ExecuteJavaScript(const CefString& code, const CefString& script_url, int start_line); void SendProcessMessage(CefProcessId target_process, CefRefPtr<CefProcessMessage> message); [...] };The full definition of CefFrame can be seen in the cef_frame.h header file.
So now we've seen enough of the API to initialise things and to then load up a particular URL into a particular view. But CefFrame offers us a lot more than just that. In particular it offers up two other pieces of functionality critical for embedded browser use: it allows us to execute code within the frame and it allows us to send messages between the application and the frame.
Why are these two things so critical? In order for the content shown by the browser to feel fully integrated into the application, the application must have a means to interact with it. These two capabilities are precisely what we need to do this.
Understanding CEF requires a lot more than these three classes, but this is a supposed to be a survey, not a tutorial. Still, it would be nice to know what it's like to use these classes in practice. To that end, I've put together a simple example CEF application that makes use of some of this functionality.
The application itself is simple and useless, but designed to capture functionality that might be repurposed for better use in other situations. The app displays a single window containing various widgets built using the native toolkit (in the case of our CEF example, these are Gtk widgets). The browser view is embedded between these widgets to demonstrate that it could potentially be embedded anywhere on the page.
The native widgets allow some limited control over the browser content: a URL bar, forwards and backwards. There's also an "execute JavaScript" button. This is the more interesting functionality. When the user presses this a small piece of JavaScript will be executed in the DOM context of the page being rendered.
Here's the JavaScript to be executed:
function collect_node_stats(global_context, local_context, node) { // Update the context local_context.depth += 1; local_context.breadth.push(node.childNodes.length); global_context.nodes += 1; global_context.maxdepth = Math.max(local_context.depth, global_context.maxdepth); // Recurse into child nodes for (child of node.childNodes) { child_context = structuredClone(local_context); child_context.breadth = local_context.breadth.slice(0, local_context.depth + 1); child_context = collect_node_stats(global_context, child_context, child); // Recalculate the child breadths for (let i = local_context.depth + 1; i < child_context.breadth.length; ++i) { local_context.breadth[i] = (local_context.breadth[i]||0) + child_context.breadth[i]; } } // Paint the DOM red if (node.style) { node.style.boxShadow = "inset 0px 0px 1px 0.5px red"; } // Move back up the tree local_context.depth -= 1; return local_context; } function node_stats() { // Data available to all nodes let global_context = { "nodes": 0, "maxdepth": 0, "maxbreadth": 0 } // Data that's local to the node and shared with the parent let local_context = { "depth": 0, "breadth": [1] } // Off we go local_context = collect_node_stats(global_context, local_context, document); global_context.maxbreadth = Math.max.apply(null, local_context.breadth); return global_context; } // Return the results (only strings allowed) // See: DomWalkHandler::Execute() defined in renderer/client_renderer.cc dom_walk(JSON.stringify(node_stats()))I'm including all of it here because it's not too long, but there's no need to go through this line-by-line. All it does is walk the page DOM tree, giving each item in the DOM a red border and collecting some statistics as it goes. As it goes along it collects information that allows us to calculate the number of nodes, the maximum height of the DOM tree and the maximum breadth of the DOM tree.
There is one peculiar aspect to this code though: having completed the walk and returned from collect_node_stats() the code then converts the results into a JSON string and passes the result into a function called dom_walk(). But this function doesn't exist. Huh?!
We'll come back to this.
The values that are calculated aren't really important, what is important is that we can return these values at the end and display them in the native user interface code. This highlights not only how an application can have its own code executed in the browser context, but also how the browser can communicate back information to the application. With these, we can make our browser and application feel seamlessly integrated, rather than appear as two different apps that happen to be sharing some screen real-estate.
Let's now delve in to some code and consider how our three classes are being used. We'll then move on to how the communication between app and browser is achieved.
To get the CEF example working I followed the advice in the documentation and made a fork of the cef-project repository. I then downloaded the binary install of the cef project, inside which is an example application called cefclient. I made my own copy of this inside my cef-project fork, hooked it into the CMake build files and started making changes to it.
There's a lot of code there which may look a bit overwhelming but bear in mind that the vast majority of this code is boilerplate taken directly from the example. Writing this all from scratch would have been... time consuming.
Most of the changes I did make were to the file. This handles the browser lifecycle as described above using an instance of the CefBrowserHost class.
We can see this in the RootWindowGtk::CreateRootWindow() method which is responsible for setting up the contents of the main application window. In there you'll see lots of calls for creating and arranging Gtk widgets (I love Gtk, but admittedly it can be a tad verbose). Further down in this same method we see the call to CefBrowserHost::CreateBrowser() that brings the browser component to life.
In the case of CEF the browser isn't actually a component. We tell the browser where in our window to render and it goes ahead and renders, so we actually create an empty widget and then update the browser content bounds every time the size or position of this widget changes.
This contrasts with the Qt WebEngine and Gecko WebView approach, where the embedded browser is provided as an actual widget and, consequently, the bounds are updated automatically as the widget updates. Here with CEF we have to do all this ourselves.
It's not hard to do, and it brings extra control for greater flexibility, but it also hints at why so much boilerplate code is needed.
The browser lives on until the app calls CefBrowserHost::CloseBrowser() in the event that the actual Gtk window containing it is deleted.
We already talked about the native controls in the window and the fact that we can enter a URL, as well as being able to navigate forwards and backwards through the browser history. For this functionality we use the CefBrowser object.
We can see this at work in the same file. Did I mention that this file is where most of the action happens? That's because this is the file that handles the Gtk window and all of the interactions with it.
When creating the window we set up a generic RootWindowGtk::NotifyButtonClicked() callback to handle interactions with the native Gtk widgets. Inside this we find some code to get our CefBrowser instance and call one of the navigation functions on it. The choice of which to call depends on the button that was pressed by the user:
CefRefPtr<CefBrowser> browser = GetBrowser(); if (!browser.get()) { return; } switch (id) { case IDC_NAV_BACK: browser->GoBack(); break; case IDC_NAV_FORWARD: browser->GoForward(); [...]Earlier we mentioned that we also have this special execute JavaScript button. I've hooked this up slightly differently, so that it has its own callback for when clicked.
The format is similar, but when clicked it extracts the main frame from the browser in the form of a CefFrame instance and calls the CefFrame::ExecuteJavaScript() method on this instead. Like this:
void RootWindowGtk::DomWalkButtonClicked(GtkButton* button, RootWindowGtk* self) { CefRefPtr<CefBrowser> browser = self->GetBrowser(); if (browser.get()) { CefRefPtr<CefFrame> frame = browser->GetMainFrame(); frame->ExecuteJavaScript(self->dom_walk_js_, "", 0); } }The dom_walk_js_ member is just a string buffer containing the contents of our JavaScript file (which I load at app start up). As the method name implies, calling ExecuteJavaScript() will immediately execute the provided JavaScript code in the view's DOM context, starting execution from the line provided.
There are similar methods available for the Qt WebEngine and Gecko WebView as well. As we'll see, what makes the CEF version different is that it doesn't block the user interface thread during execution and doesn't return a value. But as we discussed above, we want to return a value, because otherwise how are we going to display the number of nodes, tree height and tree breadth in the user interface?
This is where that mysterious dom_walk() method that I mentioned earlier comes in. We're going to create this method on the C++ side so that when the JavaScript code calls it, it'll execute some C++ code rather than some JavaScript code.
We do this by extending the CefV8Handler class and overriding its CefV8Handler::Execute() method with the following code:
bool DomWalkHandler::Execute(const CefString& name, CefRefPtr<CefV8Value> object, const CefV8ValueList& arguments, CefRefPtr<CefV8Value>& retval, CefString& exception) { if (!arguments.empty()) { // Create the message object. CefRefPtr<CefProcessMessage> msg = CefProcessMessage::Create( "dom_walk"); // Retrieve the argument list object. CefRefPtr<CefListValue> args = msg->GetArgumentList(); // Populate the argument values. args->SetString(0, arguments[0]->GetStringValue()); // Send the process message to the main frame in the render process. // Use PID_BROWSER instead when sending a message to the browser process. browser->GetMainFrame()->SendProcessMessage(PID_BROWSER, msg); } return true; }This code is going to execute on the render thread, so we still need to get our result to the user interface thread. I say "thread", but it could even be a different process. So this is where the SendProcessMessage() call at the end of this code snippet comes in. The purpose of this is to create a message with a payload made up of the arguments passed in to the dom_walk() method (which, if you'll recall, is a stringified JSON structure). We then send this as a message to the browser process.
In JavaScript functions are just like any other value, so to get our new function into the DOM context all we need to do is create a CefV8Value object, which is the C++ name for a JavaScript value, and pass it in to the global context for the browser. We do this when the JavaScript context is created like so:
void OnContextCreated(CefRefPtr<ClientAppRenderer> app, CefRefPtr<CefBrowser> browser, CefRefPtr<CefFrame> frame, CefRefPtr<CefV8Context> context) override { message_router_->OnContextCreated(browser, frame, context); if (!dom_walk_handler) { dom_walk_handler = new DomWalkHandler(browser); } CefRefPtr<CefV8Context> v8_context = frame->GetV8Context(); if (v8_context.get() && v8_context->Enter()) { CefRefPtr<CefV8Value> global = v8_context->GetGlobal(); CefRefPtr<CefV8Value> dom_walk = CefV8Value::CreateFunction( "dom_walk", dom_walk_handler); global->SetValue("dom_walk", dom_walk, V8_PROPERTY_ATTRIBUTE_READONLY); CefV8ValueList args; dom_walk->ExecuteFunction(global, args); v8_context->Exit(); } }Finally in our browser thread we set up a message handler to listen for when the dom_walk message is received from the render thread.
if (message_name == "dom_walk") { if (delegate_) { delegate_->OnSetDomWalkResult(message->GetArgumentList()->GetString(0)); } }Back in our root_window_gtk.cc file is the implementation of OnSetDomWalkResult() which takes the string passed to it, parses it and displays the content in our info bar at the bottom of the window:
void RootWindowGtk::OnSetDomWalkResult(const std::string& result) { CefRefPtr<CefValue> parsed = CefParseJSON(result, JSON_PARSER_ALLOW_TRAILING_COMMAS); int nodes = parsed->GetDictionary()->GetInt("nodes"); int maxdepth = parsed->GetDictionary()->GetInt("maxdepth"); int maxbreadth = parsed->GetDictionary()->GetInt("maxbreadth"); gchar* nodes_str = g_strdup_printf("Node count: %d", nodes); gtk_label_set_text(GTK_LABEL(count_label_), nodes_str); g_free(nodes_str); gchar* maxdepth_str = g_strdup_printf("DOM height: %d", maxdepth); gtk_label_set_text(GTK_LABEL(height_label_), maxdepth_str); g_free(maxdepth_str); gchar* maxbreadth_str = g_strdup_printf("DOM width: %d", maxbreadth); gtk_label_set_text(GTK_LABEL(width_label_), maxbreadth_str); g_free(maxbreadth_str); }As you can see, most of this final piece of the puzzle is just calling the Gtk code needed to update the user interface.
So now we've gone full circle: the user interface thread executes some JavaScript code on the render thread in the view's DOM context. This then calls a C++ method also on the render thread, which sends a message to the user interface thread, which updates the widgets to show the result.
All of the individual steps make sense in their own way, but it is, if I'm honest, a bit convoluted. I can fully understand that message passing is needed between the different threads, but it would have been nice to be able to send the message directly from the JavaScript. Although there are constraints that apply here for security reasons, the Qt WebEngine and Gecko WebView equivalents both abstract these steps away from the developer, which makes life a lot easier.
With all of this hooked up, pressing the execute JavaScript button now has the desired effect.
The CEF project works hard to make Blink accessible as an embedded browser, but there's still plenty of complexity to contend with. Given just the few pieces we've covered here — lifecycle, navigation, JavaScript execution and message passing — you'll likely be able to do the majority of things you might want with an embedded browser. Crucially, you can integrate the browser component seamlessly with the rest of your application.
It's powerful stuff, but it's also true to say that the other approaches I tried out managed to hide this complexity a little better. The main reason for this would seem to be because CEF doesn't target any particular widget toolkit. It can, in theory, be integrated with any toolkit, whether it be on Linux, Windows or macOS.
While that flexibility comes at a cost in terms of complexity, that hasn't stopped CEF becoming popular. It's widely used by both open source and commercial software, including the Steam client and Spotify desktop app.
In the next section we'll look at the Qt WebEngine, which provides an alternative way to embed the Blink rendering engine into your application.
Qt WebEngine
In the last section we looked at CEF for embedding Blink into an application with minimal restrictions on the choice of GUI framework. We'll follow a similar approach as we investigate Qt WebEngine: first looking at the API, then seeing how we can apply it in practice.Although both uses Blink, there are other important differences between the two. First, Qt WebEngine is tied to Qt. That means that all of the classes we'll look at bar one will inherit from QObject and the main user interface class will inherit from QWidget (which itself is a descendant of QObject).
While Qt is largely written in C++ and targets C++ applications, we'll also make use of QML for our example code. This will make the presentation easier, but in practice we could achieve exactly the same results using pure C++. We'd just end up with a bit more code.
So, with all that in mind, let's get to it.
The fact that Qt WebEngine exclusively targets Qt applications does make things a little simpler, both for the Qt WebEngine implementation and for our use of it. Consequently we can focus on just two classes. In practice there are many more classes that make up the API, but many of these have quite specific uses (such as interacting with items in the navigation history, or handling HTTPS certificates). All useful stuff for sure, but our aim here is just to give a flavour.
The two classes we're going to look at are QWebEnginePage and QWebEngineView. Here's an abridged version of the former:
class QWebEnginePage : public QObject { Q_PROPERTY(QUrl requestedUrl...) Q_PROPERTY(qreal zoomFactor...) Q_PROPERTY(QString title..) Q_PROPERTY(QUrl url READ...) Q_PROPERTY(bool loading...) [...] public: explicit QWebEnginePage(QObject *parent); virtual void triggerAction(WebAction action, bool checked); void findText(const QString &subString, FindFlags options, const std::function<void(const QWebEngineFindTextResult &)> &resultCallback)); void load(const QUrl &url); void download(const QUrl &url, const QString &filename); void runJavaScript(const QString &scriptSource, const std::function<void(const QVariant &)> &resultCallback); void fullScreenRequested(QWebEngineFullScreenRequest fullScreenRequest); [...] };If you're not familiar with Qt those Q_PROPERTY macros at the top of the class may be a bit confusing. These introduce scaffolding for setters and getters of a named class variable. The developer still has to define and implement the setter and getter methods in the class. However properties come with a signal method which other methods can connect to. When the value of a property changes, the connected method is called, allowing for immediate reactions to be coded in whenever the property updates.
According to the Qt documentation, the QWebEnginePage class...
holds the contents of an HTML document, the history of navigated links, and actions.
That's reflected in the methods and member variables I've pulled out here. The title, url and load status of the page are all exposed by this class and it also allows us to search the page. The reference to actions in the documentation relates to the triggerAction() method. There are numerous types of WebAction that can be passed in to this. Things like Forward, Back, Reload, Copy, SavePage and so on.
You'll also notice there's a runJavaScript() method. If you've already read through the section on CEF you should have a pretty good idea about how we're planning to make use of this, but we'll talk in more detail about that later.
The other key class is QWebEngineView. This inherits from QWidget, which means we can actually embed this object in our window. It's the class that actually gets added to the user interface. It's therefore also the route through which we can interact with the QWebEnginePage page object that it holds.
class QWebEngineView : public QWidget { Q_PROPERTY(QString title...) Q_PROPERTY(QUrl url...) Q_PROPERTY(QString selectedText...) Q_PROPERTY(bool hasSelection...) Q_PROPERTY(qreal zoomFactor...) [...] public: explicit QWebEngineView(QWidget *parent); QWebEnginePage *page() const; void setPage(QWebEnginePage *page); void load(const QUrl &url); void findText(const QString &subString, FindFlags options, const std::function<void(const QWebEngineFindTextResult &)> &resultCallback); QWebEngineSettings *settings() const; void printToPdf(const QString &filePath, const QPageLayout &layout, const QPageRanges &ranges); [...] public slots: void stop(); void back(); void forward(); void reload(); [...] };Notice the interface includes a setter and getter for the QWebEnginePage object. Some of the page functionality is duplicated (for convenience as far as I can tell). We can also get access to the QWebEngineSettings object through this view, which allows us to configure similar browser settings to those we might find on the settings page of an actual browser.
There are also convenience slots for navigation (a slot is just a method that can be either called directly, or connected up to one of the signals I mentioned earlier).
And with these few classes we have what we need to create ourselves an example application. I've created something equivalent to our CEF example application described in the previous section, called WebEngineTest; all of the code for it is available on GitHub, but I'm also going to walk us through the most important parts here.
If you looked at the sprawling CEF code, you may be surprised to see how simple the Qt WebEngine equivalent is. The majority of what we need is encapsulated in this short snipped of QML code copied from the Main.qml file.
Column { anchors.fill: parent NavBar { id: toolbar webview: webview width: parent.width } WebEngineView { id: webview width: parent.width height: parent.height - toolbar.height - infobar.height url: "https://www.whatsmybrowser.org" onUrlChanged: toolbar.urltext.text = url settings.javascriptEnabled: true function getInfo() { runJavaScript(domwalk, function(result) { infobar.dominfo = JSON.parse(result); }); } } InfoBar { id: infobar } }This column fills the entire application window and essentially makes up the complete user interface for our application. The column contains three rows. At the top and bottom are a NavBar widget and an InfoBar widget respectively. Nestled between the two is a WebEngineView component, which is an instance of the class with the same name that we described above.
We'll take a look at the NavBar and InfoBar shortly, but let's first concentrate on the WebEngineView. It has a width set to match the width of the page and a height set to match the page height minus the size of the other widgets. We set the initial page to load and set JavaScript to be enabled, like so:
settings.javascriptEnabled: trueAs it happens JavaScript is enabled by default, so this line is redundant. It's there as a demonstration of how we can interact with the elements inside the QWebEngineSettings component we saw above.
Then there's the getInfo() method that executes the following:
runJavaScript(domwalk, function(result) { infobar.dominfo = JSON.parse(result); });These three lines of code are performing all of the complex message passing steps that we described at length for the CEF example. We call runJavaScript() which is provided by the QML interface as a shortcut to the method from QWebEnginePage.
The method takes the JavaScript script to execute — as a string — for its first parameter and a callback that's called on completion of execution for the second parameter. Internally this is actually doing something very similar to the CEF code we saw above: it passes the code to the V8 JavaScript engine to execute inside the DOM, then waits on a message to return with the results of the call.
In our callback we simply copy the returned data into the infobar.dominfo variable, which is used to populate the widgets along the bottom of the screen.
It all looks very clean and simple. But there is some machinery needed in the background to make it all hang together. First, you may have noticed that for our script we simply pass in a domwalk variable. We set this up in the main.cpp file (which is the entrypoint of our application). There you'll see some code that looks like this:
QString domwalk; QFile file(":/js/DomWalk.js"); if (file.open(QIODevice::ReadOnly)) { domwalk = file.readAll(); file.close(); } engine.rootContext()->setContextProperty("domwalk", domwalk);This is C++ code that simply loads the file from disk, stores it in the domwalk string and then adds the domwalk variable to the QML context. Doing this essentially makes domwalk globally accessible in all the QML code. If we were writing a larger more complex application we might approach this differently, but it's fine here for the purposes of demonstration.
Next up, let's take a look at the NavBar.qml file. QML automatically creates a widget named NavBar based on the name of the file, which we saw in use above as part of the main page.
Row { height: 48 property WebEngineView webview property alias urltext: urltext NavButton { icon.source: "../icons/back.png" onClicked: webview.goBack() enabled: webview.canGoBack } NavButton { icon.source: "../icons/forward.png" onClicked: webview.goForward() enabled: webview.canGoForward } NavButton { icon.source: "../icons/execute.png" onClicked: webview.getInfo() } Item { width: 8 height: parent.height } TextField { id: urltext y: (parent.height - height) / 2 text: webview.url width: parent.width - (parent.height * 3.6) - 16 color: palette.windowText onAccepted: webview.url = text } }As we can see, the toolbar is a row of five widgets. Three buttons, a spacer and the URL text field. The first two buttons simply call the goBack() and goForward() methods on our QWebEngineView class. The only other thing of note is that we also enable or disable the buttons based on the status of the canGoBack and canGoForward properties. This is where the signals we discussed earlier come in: when these variables change, they will output signals which are bound to these properties so that the change is propagated throughout the user interface. That's a Qt thing and it works really nicely for user interface development.
Finally for the toolbar, the third button simply calls the getInfo() method that we created as part of the definition of our WebEngineView widget above. We already know what this does, but just to recap, this will execute the domwalk JavaScript inside the DOM context and store the result in the infobar.dominfo variable.
The NavButton component type used here is just a simple wrapper around the QML Button QML.
Now let's look at the code in the InfoBar.qml file:
Row { height: 32 anchors.horizontalCenter: parent.horizontalCenter property var dominfo: { "nodes": 0, "maxdepth": 0, "maxbreadth": 0 } InfoText { text: qsTr("Node count: %1").arg(dominfo.nodes) } InfoText { text: qsTr("DOM height: %1").arg(dominfo.maxdepth) } InfoText { text: qsTr("DOM width: %1").arg(dominfo.maxbreadth) } }The overall structure here is similar: it's a row of widgets, in this case three of them, each a text label. The InfoText component is another simple wrapper, this time around a QML Text widget. As you can see the details shown in each text label are pulled in from the dominfo variable. Recall that when the domwalk JavaScript code completes execution, the callback will store the resulting structure into the dominfo variable we see here. This will cause the nodes, maxdepth and maxbreadth fields to be updated, which will in turn cause the labels to be updated as well.
And it works too. Clicking the execute JavaScript button will paint the elements of the page with a red border and display the stats in the infobar, just as happened with our CEF example:
When I first started using QML this automatic updating of the fields felt counter-intuitive. In most programming languages if an expression includes a variable, the value at the point of assignment is used and the expression isn't reevaluated if the variable changes value. In QML if a variable is defined using a colon : (as opposed to an equals =) symbol, it will be bound to the variables in the expression and updated if they change. This is what's happening here: when the dominfo variable is updated, all of its dependent bound variables will be updated too. All made possible using the magical signals from earlier.
Other user interface frameworks (Svelte springs to mind) have this feature as well; when used effectively it can make for super-simple and clean code.
There's just one last piece of the puzzle, which is the domwalk code itself. I'm not going to list it here, because it's practically identical to the code we used for the CEF example, which is listed above. The only difference is the way we return the result back at the end. You can check out the DomWalk.js source file if you'd like to compare.
And that's it. This is far simpler than the code needed for CEF, although admittedly the CEF code all made perfect sense. Unlike CEF, Qt WebEngine is only intended for use with Qt. This fact, combined with the somewhat less verbose syntax of QML compared to C++, is what makes the Qt version so much more concise.
In both cases the underlying Web rendering and JavaScript execution engines are the same: Blink and V8 respectively. It's only the way the Chromium API is exposed that differs.
Let's now move on to the Sailfish WebView, which has a similar interface to Qt WebEngine but uses a different engine in the background.
Sailfish WebView
The Sailfish WebView differs from our previous two examples in some important respects. First and foremost it's designed for use on Sailfish OS, a mobile Linux-based operating system. The Sailfish WebView won't run on other Linux distributions, mainly because Sailfish OS uses a bespoke user interface toolkit called Silica, which is built on top of Qt, but which is specifically targeted at mobile use.Although the Sailfish WebView may therefore not be so useful outside of Sailfish OS, it's Sailfish OS that drives my interest in mobile browsers. So from my point of view it's very natural for me to include it here.
Since it's built using Qt and is exposed as a QML widget, the Sailfish WebView has many similarities with Qt WebEngine. In fact the Qt WebEngine API is itself a successor of the Qt WebView API, which was previously available on Sailfish OS and which the Sailfish WebView was developed as a replacement for.
So expect similarities. However there's also one crucial difference between the two: whereas the Qt WebEngine is built using Blink and V8, the Sailfish WebView is built using Gecko and SpiderMonkey. So in the background they're making use of completely different Web rendering engines.
Like the other two examples, all of the code is available on GitHub. The repository structure is a little different, mostly because Sailfish OS has its own build engine that's designed for generating RPM packages. Although the directory structured differs, for the parts that interest us you'll find all of the same source files across both the WebEngineTest and harbour-webviewtest.
Looking first at the Main.qml file there are only a few differences between this and the equivalent file in the WebEngineTest repository.
Column { anchors.fill: parent NavBar { id: toolbar webview: webview width: parent.width } WebView { id: webview width: parent.width height: parent.height - (2 * Theme.iconSizeLarge) url: "https://www.whatsmybrowser.org/" onUrlChanged: toolbar.urltext.text = url Component.onCompleted: { WebEngineSettings.javascriptEnabled = true } function getInfo() { runJavaScript(domwalk, function(result) { infobar.dominfo = JSON.parse(result); }); } } InfoBar { id: infobar width: parent.width } }The main difference is the way the settings are accessed and set. Whereas the settings property could be accessed directly as a WebEngineView property, here we have to access the settings through the WebEngineSettings singleton object. Otherwise this initial page is the same and the approach to calling JavaScript in the DOM is also identical.
The DomWalk.js code is also practically identical. One difference is that we don't use structuredClone() because the engine version is slightly older. We use a trick of converting to JSON and back instead to achieve the same result. This JavaScript is loaded in the main.cpp file in the same way as for WebEngineTest.
The NavBar.qml and InfoBar.qml files are to all intents and purposes identical, so I won't copy the code out here.
And that's it. Once again, it's a pretty clean and simple implementation. It demonstrates execution of JavaScript within the DOM that's able to manipulate elements and read data from them. It also shows data being passed back from Gecko to the native app that wraps it.
Although the Sailfish WebView is uses Gecko, from the point of view of the developer and the end user there's no real difference between the API offered by Qt WebEngine and that offered by the Sailfish WebView.
For Sailfish OS users it's natural to ask whether it makes sense to continue using Gecko, rather than switching to Blink. I'm hoping this investigation will help provide an answer, but right now I just want to reflect on the fact there's very little difference from the perspective of the API consumer.
Wrap-up
We've seen three different embedding APIs. CEF uses Chromium and in particular Blink and V8 for its rendering and JavaScript engines respectively. CEF isn't aimed at any particular platform or user interface toolkit and consequently writing an application that uses it requires considerably more boilerplate than the Qt WebEngine or Sailfish OS WebView approaches. The latter two both build on the Qt toolkit and it wouldn't make sense to use them with something like Gtk.While CEF and Qt WebEngine both share the same rendering backend, their APIs are quite different, at least when it comes to the specifics. But in fact, the functionalities exposed by both are similar.
The Sailfish WebView on the other hand uses completely different engines — Gecko and SpiderMonkey — and yet, in spite of this the WebView API is really very similar to the Qt WebEngine API.
So as a developer, why choose one of them over the other? When it comes to the WebEngine and the WebView the answer is happily straightforward: since they support different, non-overlapping platforms, if you're using Sailfish OS consider using the WebView; if you're not it's the WebEngine you should look at.
To wrap things up, lets consider the core functionalities the APIs provide. Although each has its own quirks, fundamentally they offer something similar:
- Rendering of Web content. As an embedded browser, the content can be rendered in a window in amongst the other widgets of your application.
- Navigation. The usual Web navigation functionalities can be controlled programmatically. This includes setting the URL, moving through the history and so on
- Settings. The browser settings can be set and controlled programmatically.
- Access to other browser features. This includes profiles, password management, pop-up controls, downloading content and so on. All of these are also accessible via the API.
- JavaScript execution. JavaScript can be executed in the DOM and in different contexts, including with privileged access to all of the engine's functionalities.
- Message passing. Messages can be sent from the managing application to the renderer and in the other direction, allowing fine-grained integration of the two
At the start I said I'd consider whether Gecko is still appropriate for use by Sailfish OS for its embedded browser. This is an important question that we're now closer to having a clearer answer to. I'll take a look at this in more detail in a future post.
6 Oct 2024 : My Browser History #
Working on a presentation for work has given me an opportunity to read up about the history of the Web Browser. Turns out to be not such a simple thing to unravel, but I've given it a go.
6 Oct 2024 : Reviewing My Browser History #
For many years I thought it would be a mistake to mix my hobbies with my professional life. Blurring the boundary would prevent me defining a clear boundary between my work time and my relaxation time. I thought it could also lead to things I enjoy becoming contaminated, irreversibly harming the joy I get from them. It's not that I didn't want to enjoy my work: quite the opposite in fact. I felt in order to enjoy both I needed to maintain a separation.
As my life has progressed I've changed my opinion on this. It's great to separate work and play, but there's also immense joy to be had from doing something you love as a professional endeavour. Mixing the two together has the potential to amplify the joy from both.
Working for Jolla was what really brought this home to me. Smartphone development, user privacy and control, and Sailfish OS in particular were always part of the life I separated from my career. When I started working at Jolla I thought I was taking a risk. Would I lose my passion? Would I regret knowing what goes on inside the "sausage factory"?
My concerns were unfounded and, with this experience in hand, I now make it my aim to bring my personal passions into my professional life as well.
Now I'm at the Turing I'm no longer developing for Sailfish OS during work hours. As readers of my gecko dev diaries will know, upgrading the Sailfish browser has been one of my main activities outside of work. Finding opportunities to bring this Sailfish development into my professional world has been one of my objectives and recently just such an opportunity arose.
It's only a small overlap: in November I'll be giving a presentation about browsers at the Turing. The title of the talk will be "The Anatomy of a Browser: Embedded Mobile Lizards". Lizards being a reference to Gecko.
To help with this I've been digging a little into the history of browsers. They have a rich, fascinating and often fractious history that I find fascinating and one I want to talk a little more about today.
But to understand the history, we first need to understand a little about the internals of a browser.
The protocol client handles network interactions. It opens a network connection, sends a message to the server, then waits for and interprets the response. If the protocol uses a secure transport layer it handles certificate validation, checking certificate revocation, data encryption and integrity. The latest releases of Firefox and Chrome support HTTP, HTTPS, WebSockets, Secure WebSockets, Secure Real-time Transport Protocol, file access and probably others I'm not aware of. Firefox and Chrome used to support FTP but have since dropped it. Firefox dropped support in version 90 (July 2021) while Chrome dropped it in version 95 (September 2021). Unlike the rendering and JavaScript engines, protocol clients tend not to be given their own bespoke names separate from the browser. Maybe this is because they're often built from other libraries offering support for specific protocols. Nevertheless the protocol client is both a crucial and complex part of the browser.
I've listed the DOM as a separate piece of the browser, but it's usually tightly coupled with the layout and rendering engines. The DOM defines the internal data structures used to represent the page being rendered. For HTML, XML or SVG documents these are hierarchically built from nodes that have a parent and multiple (possibly zero) children. Typically the document structure will map naturally onto the DOM, with XML elements and attributes mapping onto nodes. Child nodes in the document will map to child nodes in the DOM. In practice nodes are likely to be represented as class objects in the code containing references to child nodes. The DOM is usually part of the rendering engine separate from the JavaScript engine, but if it weren't for JavaScript the DOM might be considered as just an implementation details. The existence of JavaScript elevates the DOM to something Web developers have to have a good understanding of, as we'll see.
The JavaScript engine allows execution of JavaScript code. JavaScript has an odd history. Originally invented at Netscape by Brendan Eich, you might think that the JavaScript language has something to do with Java. In fact they're very different. Java is a strictly-typed object-oriented garbage-collected language that compiles down to a bytecode representation that can be executed by a Java Virtual Machine. Although at one point Java "applets" that ran in the browser were a thing, you rarely see these nowadays (they're not supported without installing a plugin). Java is still used in server applications and to be honest, given Sun's expertise and revenue rested primarily with servers I always found it rather surprising that it was ever anything other than server-focused. JavaScript on the other hand is very much a client-side, dynamically-typed event-based scripting language with prototype-based object orientation. In recent years it's also become popular as a server-side language for reasons that I won't go in to here. Both Java and JavaScript are "curly-brace" languages with similarities to C++; and while I realise I've managed to make them sound quite similar, they're actually totally different. The only reason they share a name is that in a bit to ride the wave of Java's popularity, Netscape signed a licensing agreement with Sun to use the name. Marketing genius or ontological vandalism? You decide.
Another key difference between JavaScript and Java is that the DOM is a first-class entity in JavaScript. Although they live in different parts of the browser, the development of the DOM is tightly intertwined with that of the language. When first released by Netscape JavaScript could interact with only certain elements of the page, most notably form elements. The name now given to the set of elements exposed at that time is DOM Level 0. Access to the full document didn't come until DOM Level 1. While JavaScript is a perfectly good language even without the DOM, in a browser context the two are tightly coupled.
Although JavaScript refers to and can modify the DOM, the DOM implementation is part of the layout and rendering engine. When we refer to browser engines (WebKit, Gecko, Blink,...) we're usually referring to this layout/render engine portion of the browser. The layout engine takes the document, structured using the DOM, and lays it out as elements on the page in the way they'll be viewed by the user. This allows the browser to build up the equivalent of a scene graph which is then rendered by the rendering engine to some sort of canvas (the screen or an offscreen buffer). This rendering usually uses an appropriate render backend, for example on the Sailfish Browser it calls a serious of GLES commands. The layout engine follows a strict set of rules for positioning elements on the page. The HTML/CSS box model is used for rendering most items, but there are exceptions. For example SVG has its own rendering model which Gecko also supports as part of the same DOM hierarchy.
HTML and SVG documents embed or reference large numbers of other file types, which the browser has to support as well. These multimedia files include images, audio, animations and video. Historically browser support for different multimedia elements has been a mess, often delegated to some other operating system component (e.g. Windows Media Player, ffmpeg, gstreamer). Each file type will have its own decoder and there may be Digital Rights Management involved as well (e.g. Widevine). In practice browsers tend to separate raster and vector images from video and audio. The former have been tightly integrated into HTML for decades whereas the latter two only became standardised in HTML 5 with the introduction of the audio and video tags. These allow audio and video to be embedded with customisable controls.
Finally we have the user interface, which is the bit that we most associate with the browser. This is a little ironic given I'd argue the depth, complexity and maintenance burden is weighted towards the other layers. But most people aren't really concerned with the rendering or JavaScript engine, they care about whether a particular user interface feature is supported or not.
And to be fair, the user interface doesn't just display an address bar. It also has to provide tabs, JavaScript pop-ups, permissions dialogues, Settings controls, password management functionality, bookmarks, history management and a whole lot more.
In the embedded browser space the user interface is intentionally minimal. The idea is that the browser gets embedded into some other application which provides the user interface elements needed over and above those provided by the rendered Web page itself. On Sailfish OS this minimal interface is provided by the WebView. The additional capabilities are managed through the WebView's Application Programming Interface. On Sailfish OS there's also a Qt-based user interface to the browser, which brings its own complexity. For simplicity I've grouped together the user interface of the browser and the application programming interface of the embeddable WebView in the "Interface" section in the diagram.
During my time upgrading the Sailfish Browser from ESR 78 to ESR 91 I routinely referred to it as a Gecko upgrade. The name Gecko covers the DOM, layout engine and rendering engine but typically doesn't include the JavaScript engine or user interface. The user interface is typically referred to by the name of the browser itself. For example Firefox uses the Gecko rendering engine, the SpiderMonkey JavaScript engine and the Firefox user interface. For Safari it's WebKit, Nitro and Safari. For Chrome it's Blink, V8 and Chrome. And so on.
Now that we've broken down the different parts of the browser we're equipped to delve into the history of Web browsers in more detail.
Right at the start the history is a bit messy. The WorldWideWeb browser was the graphical HTML browser (and editor) written by Tim-Berners Lee in Objective-C to run on NeXTSTEP. The first version was completed at the end of 1990 with the browser being later renamed to Nexus. The code was re-written in C by Tim and Jean-François and turned into the Libwww library to become the very first browser engine. This was then used by Nicola Pellow at CERN to write the Line Mode Browser which was text-based, usable over telnet and released in 1991.
This was not just the birth of the Web, but also the genesis of structures that now define what it means to be a Web browser. These same structures can be seen in how browsers are built today.
Libwww and the Line Mode Browser that were created from it continued to be developed right up until 2017. Although the library is written in C it applies an object-oriented approach. Structures have constructors and destructors with the me context variable often used in places where you might find this or self in an object-oriented language. Reading this code in the early noughties had a profound influence on me, shaping my own style of C coding to this day.
More notably it was also used by the Amaya lightweight Web editor developed at INRIA and the Mosaic Browser developed at the NCSA. The Mosaic browser was popular in its day and the NCSA spun out a commercial entity in the form of Spyglass Mosaic which built on the NCSA Mosaic code. The company was set up to licence the browser to other companies.
The licensing agreement struck with Spyglass required Microsoft to pay a small monthly fee with additionally a portion of all non-Windoww revenue from the browser going to Spyglass.
As anyone who experienced the browser wars at that time will know, Microsoft proceeded to give Internet Explorer away for free with Windows. This ultimately earned them a lawsuit from Spyglass (settled out of court for $8 million) and an antitrust lawsuit from the US Government (eventually resulting in Microsoft having to change its approach to interoperability).
Internet Explorer remained as a core component of Windows until Windows 10, after which the company finally switched to offering Edge as the default browser. While Edge is built on Google's Blink engine, even that wasn't enough to dislodge Trident entirely. It remains to this day as the rendering engine powering Edge's compatibility mode. While it's not clear whether any of the original code can still be found in Edge (seems unlikely), a thirty-five year legacy is pretty good going.
Running up to 2000 Netscape completely re-wrote their browser engine. The result was what we now know as Gecko and which powers both Firefox and the Sailfish Browser. The purpose of the re-write was ostensibly to improve standards compliance and maintainability. But the highly abstracted code — arguably what has allowed the renderer to remain relevant to this day — resulted in poor performance. Netscape Navigator was a large programme, incorporating not just a browser but also a full email client and Website editor. In an attempt to improve performance in 2002 the components were split up to form Firefox as a stand-alone Web browser and Thunderbird as a stand-alone email client. My recollection is that this was controversial at the time and didn't improve performance a great deal. But the separation stuck. Splitting email from Web and dropping editing entirely seems to have resonated with users.
Gecko, in the form of Firefox, has experienced ups and downs. Browser statistics are notoriously subjective, but Statscounter registers Firefox market share as having dropped to just over 3% as of January 2024, having peaked in January 2010 at just over 30%.
There's plenty more to say about Gecko's history, not least in relation to its use as an embeddable component, but let's put that aside for today and I'll return to it in a future post.
Gecko remains relevant today as the most popular alternative to the WebKit/Blink family of browsers. While technically open source, both WebKit and Blink are directed by large corporations with few concessions to open source development methodologies. Mozilla on the other hand is a not-for-profit foundation that embraces the spirit of open source as well as the letter. For many, Gecko is an important bulwark against a corporate-controlled browser monoculture.
An interesting twist in Gecko's development comes from its adoption of the Rust language. Developed by Mozilla employee Graydon Hoare and officially adopted by Mozilla in 2009, Mozilla has been gradually moving Gecko's internal components from C++ to Rust.
This led to the development of the Servo engine, written wholly in Rust as a Mozilla research project. While never intended to replace Gecko, elements of the Servo engine were integrated back into the Gecko's WebRender rendering engine.
Servo is currently available as an engine with an intentionally bare-bones user interface. Mozilla divested itself of Servo in 2020, but development continues with the aim of specifying a WebView API during 2024 for use as an embeddable engine.
Through much of its life Opera forged its own path. It was the first mainstream browser to introduce tabs. It integrated a (very good) email client long after Mozilla had disentangled Firefox and Thunderbird from Netscape Navigator. It even integrated its own Web server at one point. Opera also made a point of sticking to W3C standards while other browsers were still trying to lock users in to a proprietary Web.
Perhaps it's for this reason that there was much disappointment when Opera switched to using Blink and V8 in 2013, soon after Google and announced it would fork WebKit. To find out how it got to this point we'll need to go back a bit again and look at the evolution of WebKit.
Initially part of the KDE project, WebKit provided the engine for Konquerer, the default browser for the KDE desktop environment. At that point the engine was referred to by the name KHTML, alongside the KJS JavaScript engine. It was picked up by Apple in 2001, apparently because of its small code footprint. Apple renamed KHTML and KJS to WebCore and JavaScriptCore respectively with the WebKit project encompassing both.
Contributions to WebKit came from both Apple and the KDE project, as well as from the Qt Project which offered the QtWebKit embeddable widget. Sailfish OS supported use of QtWebKit up until its deprecation in Sailfish OS 4.4 and removal in 4.5, the functionality being replaced by the Gecko WebView API.
In 2008 Google introduced its own Chrome browser also built on WebKit but using the new and Google-developed V8 JavaScript engine. Google's advertising for it emphasise speed (start times and JavaScript execution in particular). Chrome also had an — at the time — unusual sandboxing model with each tab executed as a separate process. This meant that crashes triggered by WebKit or V8 would only bring down a single tab, leaving other tabs and the browser intact.
Although built using many open source components, Chrome itself is made available under a proprietary licence. The Chromium project, also developed by Google, is a fully open source implementation of Chrome, but with the proprietary components removed.
From the outset Google had to make changes to WebKit to support its use in Chrome. Still it took another five years before Google officially forked WebKit in 2013, creating the Blink browser engine. Consequently Chrome now uses both its own renderer and JavaScript combination: Blink and V8.
One of the attractive features of the Blink engine, also particularly relevant to Sailfish OS, is its embedding API which allows it to be used separately from Chrome (or Chromium) and embedded in independent applications. A common example of this usage can be found in the Electron framework, which uses Blink for rendering.
This embeddable design, which neatly separates the chrome from the engine, also makes Blink attractive for use by other browser developers. As noted earlier, Opera switched from Presto and Caraken to Blink and V8 for rendering and JavaScript respectively. Microsoft similarly chose Blink and V8 as the basis for its Edge browser in 2019.
Qt introduced the Qt WebEngine component, wrapping Blink and V8 to offer an embeddable browser, around the release of Qt 5.2 in 2013. This was intended to replace QtWebKit, which was ultimately removed in Qt 5.6. The closest KDE has to a default browser is Falkon, which uses the Qt WebEngine. This therefore completed a strange cycle, with KHTML having been started as part of KDE, forked by Apple, forked again by Google and then integrated back in to KDE via Qt.
Recently the main Serenity OS developer, Andreas King, refocused his attention from the operating system to the Ladybird browser. Ladybird is built using the LibWeb and LibJS browser components of Serenity OS, but which he now develops independently. This arguably represents the first new engine to be introduced with the aim of being a fully-fledged browser for over twenty years, making for a particularly interesting development.
The first version of NetSurf was released in 2002. At that time it was developed exclusively for use on RISC OS, the operating system that powered the Acorn Archimedes (the first publicly available computer to use an ARM processor).
RISC OS is very different from most other operating systems available today. It makes no attempt to be Unix-like and has its own distinctive and cooperatively multitasking desktop environment. This heritage means that the browser is incredibly lightweight, with good CSS support but without viable JavaScript.
During the early days of development JavaScript support was considered out-of-scope for the browser. The reason for this is interesting: it wasn't for lack of a usable JavaScript interpreter, but because the browser lacked a standards-compliant DOM. It turns out JavaScript isn't especially useful without a standards-compliant way to access the elements of a Web page.
Despite the lack of JavaScript support NetSurf still managed to find a niche as a fast and lightweight browser, growing beyond RISC OS. As of today there are downloadable packages available for RISC OS, GTK (Linux), Haiku, AmigoOS, Atari and experimentally for Windows.
Even the question of what a browser is, presented in anything other than the most abstract terms, is likely to suffer exceptions.
What is clear is that browsers have become deeply integrated into our lives. Whether using a computer or smartphone, access to a browser has become a necessity. Over time they've continued to become more capable and more technically complex. Combined with their convoluted history, that makes them fascinating objects of study.
Comment
As my life has progressed I've changed my opinion on this. It's great to separate work and play, but there's also immense joy to be had from doing something you love as a professional endeavour. Mixing the two together has the potential to amplify the joy from both.
Working for Jolla was what really brought this home to me. Smartphone development, user privacy and control, and Sailfish OS in particular were always part of the life I separated from my career. When I started working at Jolla I thought I was taking a risk. Would I lose my passion? Would I regret knowing what goes on inside the "sausage factory"?
My concerns were unfounded and, with this experience in hand, I now make it my aim to bring my personal passions into my professional life as well.
Now I'm at the Turing I'm no longer developing for Sailfish OS during work hours. As readers of my gecko dev diaries will know, upgrading the Sailfish browser has been one of my main activities outside of work. Finding opportunities to bring this Sailfish development into my professional world has been one of my objectives and recently just such an opportunity arose.
It's only a small overlap: in November I'll be giving a presentation about browsers at the Turing. The title of the talk will be "The Anatomy of a Browser: Embedded Mobile Lizards". Lizards being a reference to Gecko.
To help with this I've been digging a little into the history of browsers. They have a rich, fascinating and often fractious history that I find fascinating and one I want to talk a little more about today.
But to understand the history, we first need to understand a little about the internals of a browser.
Browser Internals
What are the various pieces that make up a browser? Broadly speaking we can see it as being made up of five parts:- Protocol client (HTTP/S, WS/S, file, FTP,...).
- JavaScript engine.
- DOM - Document Object Model.
- Layout/rendering engine (HTML, CSS, SVG).
- Media encoder/decoder (JPEG, PNG, audio, video,...).
- User interface.
The protocol client handles network interactions. It opens a network connection, sends a message to the server, then waits for and interprets the response. If the protocol uses a secure transport layer it handles certificate validation, checking certificate revocation, data encryption and integrity. The latest releases of Firefox and Chrome support HTTP, HTTPS, WebSockets, Secure WebSockets, Secure Real-time Transport Protocol, file access and probably others I'm not aware of. Firefox and Chrome used to support FTP but have since dropped it. Firefox dropped support in version 90 (July 2021) while Chrome dropped it in version 95 (September 2021). Unlike the rendering and JavaScript engines, protocol clients tend not to be given their own bespoke names separate from the browser. Maybe this is because they're often built from other libraries offering support for specific protocols. Nevertheless the protocol client is both a crucial and complex part of the browser.
I've listed the DOM as a separate piece of the browser, but it's usually tightly coupled with the layout and rendering engines. The DOM defines the internal data structures used to represent the page being rendered. For HTML, XML or SVG documents these are hierarchically built from nodes that have a parent and multiple (possibly zero) children. Typically the document structure will map naturally onto the DOM, with XML elements and attributes mapping onto nodes. Child nodes in the document will map to child nodes in the DOM. In practice nodes are likely to be represented as class objects in the code containing references to child nodes. The DOM is usually part of the rendering engine separate from the JavaScript engine, but if it weren't for JavaScript the DOM might be considered as just an implementation details. The existence of JavaScript elevates the DOM to something Web developers have to have a good understanding of, as we'll see.
The JavaScript engine allows execution of JavaScript code. JavaScript has an odd history. Originally invented at Netscape by Brendan Eich, you might think that the JavaScript language has something to do with Java. In fact they're very different. Java is a strictly-typed object-oriented garbage-collected language that compiles down to a bytecode representation that can be executed by a Java Virtual Machine. Although at one point Java "applets" that ran in the browser were a thing, you rarely see these nowadays (they're not supported without installing a plugin). Java is still used in server applications and to be honest, given Sun's expertise and revenue rested primarily with servers I always found it rather surprising that it was ever anything other than server-focused. JavaScript on the other hand is very much a client-side, dynamically-typed event-based scripting language with prototype-based object orientation. In recent years it's also become popular as a server-side language for reasons that I won't go in to here. Both Java and JavaScript are "curly-brace" languages with similarities to C++; and while I realise I've managed to make them sound quite similar, they're actually totally different. The only reason they share a name is that in a bit to ride the wave of Java's popularity, Netscape signed a licensing agreement with Sun to use the name. Marketing genius or ontological vandalism? You decide.
Another key difference between JavaScript and Java is that the DOM is a first-class entity in JavaScript. Although they live in different parts of the browser, the development of the DOM is tightly intertwined with that of the language. When first released by Netscape JavaScript could interact with only certain elements of the page, most notably form elements. The name now given to the set of elements exposed at that time is DOM Level 0. Access to the full document didn't come until DOM Level 1. While JavaScript is a perfectly good language even without the DOM, in a browser context the two are tightly coupled.
Although JavaScript refers to and can modify the DOM, the DOM implementation is part of the layout and rendering engine. When we refer to browser engines (WebKit, Gecko, Blink,...) we're usually referring to this layout/render engine portion of the browser. The layout engine takes the document, structured using the DOM, and lays it out as elements on the page in the way they'll be viewed by the user. This allows the browser to build up the equivalent of a scene graph which is then rendered by the rendering engine to some sort of canvas (the screen or an offscreen buffer). This rendering usually uses an appropriate render backend, for example on the Sailfish Browser it calls a serious of GLES commands. The layout engine follows a strict set of rules for positioning elements on the page. The HTML/CSS box model is used for rendering most items, but there are exceptions. For example SVG has its own rendering model which Gecko also supports as part of the same DOM hierarchy.
HTML and SVG documents embed or reference large numbers of other file types, which the browser has to support as well. These multimedia files include images, audio, animations and video. Historically browser support for different multimedia elements has been a mess, often delegated to some other operating system component (e.g. Windows Media Player, ffmpeg, gstreamer). Each file type will have its own decoder and there may be Digital Rights Management involved as well (e.g. Widevine). In practice browsers tend to separate raster and vector images from video and audio. The former have been tightly integrated into HTML for decades whereas the latter two only became standardised in HTML 5 with the introduction of the audio and video tags. These allow audio and video to be embedded with customisable controls.
Finally we have the user interface, which is the bit that we most associate with the browser. This is a little ironic given I'd argue the depth, complexity and maintenance burden is weighted towards the other layers. But most people aren't really concerned with the rendering or JavaScript engine, they care about whether a particular user interface feature is supported or not.
And to be fair, the user interface doesn't just display an address bar. It also has to provide tabs, JavaScript pop-ups, permissions dialogues, Settings controls, password management functionality, bookmarks, history management and a whole lot more.
In the embedded browser space the user interface is intentionally minimal. The idea is that the browser gets embedded into some other application which provides the user interface elements needed over and above those provided by the rendered Web page itself. On Sailfish OS this minimal interface is provided by the WebView. The additional capabilities are managed through the WebView's Application Programming Interface. On Sailfish OS there's also a Qt-based user interface to the browser, which brings its own complexity. For simplicity I've grouped together the user interface of the browser and the application programming interface of the embeddable WebView in the "Interface" section in the diagram.
During my time upgrading the Sailfish Browser from ESR 78 to ESR 91 I routinely referred to it as a Gecko upgrade. The name Gecko covers the DOM, layout engine and rendering engine but typically doesn't include the JavaScript engine or user interface. The user interface is typically referred to by the name of the browser itself. For example Firefox uses the Gecko rendering engine, the SpiderMonkey JavaScript engine and the Firefox user interface. For Safari it's WebKit, Nitro and Safari. For Chrome it's Blink, V8 and Chrome. And so on.
Now that we've broken down the different parts of the browser we're equipped to delve into the history of Web browsers in more detail.
Libwww
We're going to start our history in 1990 when Tim-Berners Lee and Jean-François Groff, both working at CERN, created the HTTP protocol and HTML language that still define the Web today. It fascinates me that Tim-Berners Lee is so well-known as the inventor of the Web, but pioneers like Jean-François Groff and Nicola Pellow, who were there at the beginning, are scarcely recorded. But the Computer History Museum has documented a fascinating interview with Jean-François in which he gives am explanation of the very first Web engine.my main task during my days at CERN... was porting all the software libraries, I mean the software components that were on the NeXT system into a universal code library that was written in C, it's the 'libwww'" It didn't even have a name at the beginning, which is why in some history books, you see, 'Oh, libwww was released in November of 92.' No, it wasn't, you know? It was running since February '91, it just didn't have that name... We had the page rendering system, the parsing of HTML, and also all the URL mechanisms, history list, all that was abstracted into one software library as a package, as a toolset basically. And then in August 91, I think when we announced the World Wide Web, we also said 'You can use that toolset and build whatever you want with it'".
Right at the start the history is a bit messy. The WorldWideWeb browser was the graphical HTML browser (and editor) written by Tim-Berners Lee in Objective-C to run on NeXTSTEP. The first version was completed at the end of 1990 with the browser being later renamed to Nexus. The code was re-written in C by Tim and Jean-François and turned into the Libwww library to become the very first browser engine. This was then used by Nicola Pellow at CERN to write the Line Mode Browser which was text-based, usable over telnet and released in 1991.
This was not just the birth of the Web, but also the genesis of structures that now define what it means to be a Web browser. These same structures can be seen in how browsers are built today.
Libwww and the Line Mode Browser that were created from it continued to be developed right up until 2017. Although the library is written in C it applies an object-oriented approach. Structures have constructors and destructors with the me context variable often used in places where you might find this or self in an object-oriented language. Reading this code in the early noughties had a profound influence on me, shaping my own style of C coding to this day.
/* Create a Context Object ** ----------------------- */ PRIVATE Context * Context_new (LineMode *lm, HTRequest *request, LMState state) { Context * me; if ((me = (Context *) HT_CALLOC(1, sizeof (Context))) == NULL) HT_OUTOFMEM("Context_new"); me->state = state; me->request = request; me->lm = lm; HTRequest_setContext(request, (void *) me); HTList_addObject(lm->active, (void *) me); return me; }Besides the Line Mode Browser, Libwww was used in countless other projects. My own port to RISC OS from 2004 is still available. I used it to extend a forensic analysis tool for use on the Web (that the University I worked for later patented).
More notably it was also used by the Amaya lightweight Web editor developed at INRIA and the Mosaic Browser developed at the NCSA. The Mosaic browser was popular in its day and the NCSA spun out a commercial entity in the form of Spyglass Mosaic which built on the NCSA Mosaic code. The company was set up to licence the browser to other companies.
Trident
This Microsoft duly did. The browser engine of Internet Explorer — called Trident — was built on the Mosaic technology. The first version of Internet Explorer shipped without JavaScript support (the language hadn't been invented yet), but when it arrived in IE 3 in 1996 it was powered by Microsoft's Chakra JavaScript (nee JScript) engine.The licensing agreement struck with Spyglass required Microsoft to pay a small monthly fee with additionally a portion of all non-Windoww revenue from the browser going to Spyglass.
As anyone who experienced the browser wars at that time will know, Microsoft proceeded to give Internet Explorer away for free with Windows. This ultimately earned them a lawsuit from Spyglass (settled out of court for $8 million) and an antitrust lawsuit from the US Government (eventually resulting in Microsoft having to change its approach to interoperability).
Internet Explorer remained as a core component of Windows until Windows 10, after which the company finally switched to offering Edge as the default browser. While Edge is built on Google's Blink engine, even that wasn't enough to dislodge Trident entirely. It remains to this day as the rendering engine powering Edge's compatibility mode. While it's not clear whether any of the original code can still be found in Edge (seems unlikely), a thirty-five year legacy is pretty good going.
Gecko
Internet Explorer's arch rival during the browser wars was Netscape Navigator, offered to consumers by countless dial-up Internet providers bundling it on free CDs alongside their own dial-up software and configurations. Netscape was the first browser to incorporate JavaScript support, which it did using the SpiderMonkey JavaScript interpreter in 1995.Running up to 2000 Netscape completely re-wrote their browser engine. The result was what we now know as Gecko and which powers both Firefox and the Sailfish Browser. The purpose of the re-write was ostensibly to improve standards compliance and maintainability. But the highly abstracted code — arguably what has allowed the renderer to remain relevant to this day — resulted in poor performance. Netscape Navigator was a large programme, incorporating not just a browser but also a full email client and Website editor. In an attempt to improve performance in 2002 the components were split up to form Firefox as a stand-alone Web browser and Thunderbird as a stand-alone email client. My recollection is that this was controversial at the time and didn't improve performance a great deal. But the separation stuck. Splitting email from Web and dropping editing entirely seems to have resonated with users.
Gecko, in the form of Firefox, has experienced ups and downs. Browser statistics are notoriously subjective, but Statscounter registers Firefox market share as having dropped to just over 3% as of January 2024, having peaked in January 2010 at just over 30%.
There's plenty more to say about Gecko's history, not least in relation to its use as an embeddable component, but let's put that aside for today and I'll return to it in a future post.
Gecko remains relevant today as the most popular alternative to the WebKit/Blink family of browsers. While technically open source, both WebKit and Blink are directed by large corporations with few concessions to open source development methodologies. Mozilla on the other hand is a not-for-profit foundation that embraces the spirit of open source as well as the letter. For many, Gecko is an important bulwark against a corporate-controlled browser monoculture.
An interesting twist in Gecko's development comes from its adoption of the Rust language. Developed by Mozilla employee Graydon Hoare and officially adopted by Mozilla in 2009, Mozilla has been gradually moving Gecko's internal components from C++ to Rust.
This led to the development of the Servo engine, written wholly in Rust as a Mozilla research project. While never intended to replace Gecko, elements of the Servo engine were integrated back into the Gecko's WebRender rendering engine.
Servo is currently available as an engine with an intentionally bare-bones user interface. Mozilla divested itself of Servo in 2020, but development continues with the aim of specifying a WebView API during 2024 for use as an embeddable engine.
Presto
We're going to jump ahead a little in the diagram and turn our attention to the Opera browser. There are many unique and fascinating facets to Opera that it won't be possible to explore fully here, but it's still worth skimming the surface. Opera is unusual in that it was, for a long time, one of the few independent commercial browsers. When first released in 1995 it was shareware (requiring payment after a trial period). There was no JavaScript support (the language hadn't been invented yet) and at the outset the rendering engine wasn't named separately to the browser. This changed in 2000 with the introduction of the Elektra rendering engine and the Linear A JavaScript engine. In 2003 Opera switched to using what they claimed to be a new rendering engine, the internally developed Presto, alongside a new Linear B JavaScript engine. While the Presto name stuck, Opera's JavaScript engines have enjoyed periodic renaming: the Futhark JavaScript engine in 2008, followed by the Carakan JavaScript engine in 2010. Since the browser and all of these engines are closed source it's impossible to know to what extent they were really new technology as compared to an evolution of existing code.Through much of its life Opera forged its own path. It was the first mainstream browser to introduce tabs. It integrated a (very good) email client long after Mozilla had disentangled Firefox and Thunderbird from Netscape Navigator. It even integrated its own Web server at one point. Opera also made a point of sticking to W3C standards while other browsers were still trying to lock users in to a proprietary Web.
Perhaps it's for this reason that there was much disappointment when Opera switched to using Blink and V8 in 2013, soon after Google and announced it would fork WebKit. To find out how it got to this point we'll need to go back a bit again and look at the evolution of WebKit.
WebKit and Blink
At this point in time WebKit is the most popular engine for accessing the Web (in either its WebKit or Blink variants). Moreover it's also the go-to browser engine for use in embedded scenarios as we'll see shortly.Initially part of the KDE project, WebKit provided the engine for Konquerer, the default browser for the KDE desktop environment. At that point the engine was referred to by the name KHTML, alongside the KJS JavaScript engine. It was picked up by Apple in 2001, apparently because of its small code footprint. Apple renamed KHTML and KJS to WebCore and JavaScriptCore respectively with the WebKit project encompassing both.
Contributions to WebKit came from both Apple and the KDE project, as well as from the Qt Project which offered the QtWebKit embeddable widget. Sailfish OS supported use of QtWebKit up until its deprecation in Sailfish OS 4.4 and removal in 4.5, the functionality being replaced by the Gecko WebView API.
In 2008 Google introduced its own Chrome browser also built on WebKit but using the new and Google-developed V8 JavaScript engine. Google's advertising for it emphasise speed (start times and JavaScript execution in particular). Chrome also had an — at the time — unusual sandboxing model with each tab executed as a separate process. This meant that crashes triggered by WebKit or V8 would only bring down a single tab, leaving other tabs and the browser intact.
Although built using many open source components, Chrome itself is made available under a proprietary licence. The Chromium project, also developed by Google, is a fully open source implementation of Chrome, but with the proprietary components removed.
From the outset Google had to make changes to WebKit to support its use in Chrome. Still it took another five years before Google officially forked WebKit in 2013, creating the Blink browser engine. Consequently Chrome now uses both its own renderer and JavaScript combination: Blink and V8.
One of the attractive features of the Blink engine, also particularly relevant to Sailfish OS, is its embedding API which allows it to be used separately from Chrome (or Chromium) and embedded in independent applications. A common example of this usage can be found in the Electron framework, which uses Blink for rendering.
This embeddable design, which neatly separates the chrome from the engine, also makes Blink attractive for use by other browser developers. As noted earlier, Opera switched from Presto and Caraken to Blink and V8 for rendering and JavaScript respectively. Microsoft similarly chose Blink and V8 as the basis for its Edge browser in 2019.
Qt introduced the Qt WebEngine component, wrapping Blink and V8 to offer an embeddable browser, around the release of Qt 5.2 in 2013. This was intended to replace QtWebKit, which was ultimately removed in Qt 5.6. The closest KDE has to a default browser is Falkon, which uses the Qt WebEngine. This therefore completed a strange cycle, with KHTML having been started as part of KDE, forked by Apple, forked again by Google and then integrated back in to KDE via Qt.
LibWeb
An unexpected entrant into the browser space was recently announced in the form of Ladybird. To understand why Ladybird exists, it helps to understand a little about Serenity OS, the operating system project it grew out of and which it has now eclipsed. According to the FAQ of Serenity OS the developers try to "maximize hackability, accountability, and fun(!) by implementing everything ourselves.". And that includes the Web browser: the project developed its own renderer and JavaScript engine in the form of the imaginatively-named LibWWW and LibJS.Recently the main Serenity OS developer, Andreas King, refocused his attention from the operating system to the Ladybird browser. Ladybird is built using the LibWeb and LibJS browser components of Serenity OS, but which he now develops independently. This arguably represents the first new engine to be introduced with the aim of being a fully-fledged browser for over twenty years, making for a particularly interesting development.
NetSurf
Last but not least we have NetSurf, which like Ladybird, is a bit of an outlier. Like Ladybird it was originally developed for exclusive use on a non-mainstream operating system.The first version of NetSurf was released in 2002. At that time it was developed exclusively for use on RISC OS, the operating system that powered the Acorn Archimedes (the first publicly available computer to use an ARM processor).
RISC OS is very different from most other operating systems available today. It makes no attempt to be Unix-like and has its own distinctive and cooperatively multitasking desktop environment. This heritage means that the browser is incredibly lightweight, with good CSS support but without viable JavaScript.
During the early days of development JavaScript support was considered out-of-scope for the browser. The reason for this is interesting: it wasn't for lack of a usable JavaScript interpreter, but because the browser lacked a standards-compliant DOM. It turns out JavaScript isn't especially useful without a standards-compliant way to access the elements of a Web page.
Despite the lack of JavaScript support NetSurf still managed to find a niche as a fast and lightweight browser, growing beyond RISC OS. As of today there are downloadable packages available for RISC OS, GTK (Linux), Haiku, AmigoOS, Atari and experimentally for Windows.
The Truth About Browsers
Browser history is a tangled Web. While writing this it quickly became clear that, when it comes to browsers, any generalised claim is likely to turn out false. The date a browser came into existence? Do you mean the date the project was first thought of? The first commit? The first release? An alpha release? A beta release? Release 1.0? Is a particular engine entirely new, the redevelopment of an old engine, or just a rename? To what extent does the code of one engine flow into another when they both share libraries? Every browser out there is like the Ship of Theseus at this point. When we talk about a browser engine are we talking about the renderer, the layout engine, the JavaScript engine, the chrome? Sometimes these things can be separated, other times they're intrinsically tied together. Is it good to have a single reference engine that all browsers use for a consistent experience across the Web, or should we be championing diversity as way to prevent any single entity taking control? Do we even know how to calculate browser market share?Even the question of what a browser is, presented in anything other than the most abstract terms, is likely to suffer exceptions.
What is clear is that browsers have become deeply integrated into our lives. Whether using a computer or smartphone, access to a browser has become a necessity. Over time they've continued to become more capable and more technically complex. Combined with their convoluted history, that makes them fascinating objects of study.
23 Sep 2024 : Gecko retrospective #
Now that I'm no longer writing a daily Gecko Dev Diary I've had the time to reflect on my experiences upgrading the Sailfish browser to Gecko ESR 91.9,1 over the last year. Check out my Gecko Retrospective to read about what I think went well, what went badly and where things might go in the future.
23 Sep 2024 : Retrospective #
Way back last year in August 2023, before actually starting the process of upgrading the Gecko engine in Sailfish OS from ESR 78 to ESR 91, I wrote a preamble in which I set out my objectives and sketched a brief plan for how to achieve them. Although the work isn't entirely complete, after 339 days I consider the main bulk of my work on the project to be complete. We're now in the mopping up stage. That means it's a good time to look back at the process, find out what went well, what went badly, what I've learned from the experience and how I feel about things. If the preamble was the opening bracket, this retrospective can be considered its closing partner. Together they're the bookends encapsulating all the diary entries in between.
On day 149 I gave a presentation of this work at FOSDEM'24, including an earlier version of this diagram. I thought I was about half way through the work at that stage, but this turned out to be an underestimate, as is clear from this diagram. I was in fact only 11/25 of the way through.
The longest task of 87 days involved getting the WebView render pipeline working. In comparison getting the first successful build to complete took only 45 days. Both of these were quite dark and gloomy times. Without a working build it's impossible to debug or test the code, whilst without a working renderer nothing else can be effectively tested. Both periods felt like dark tunnels that took an age to emerge from.
Following these in terms of length of task were PDF Printing at 28 days, the WebGL renderer at 25 days and the Sec-Fetch-* headers at 15 days. These were the only tasks that took more than 14 days, which is a bit of a cut-off point for me. After two weeks of writing daily about the tasks it becomes really hard to write more without sounding (and seeming) a bit lost and exhausted.
It was particularly frustrating how long it took to get the WebView rendering working, given the browser already worked nicely. It could have been worse though: I had a clear plan which involved gradually stripping out and adjusting pieces of code to align with the code in the ESR 78 implementation. This gradual convergence towards ESR 78 meant I knew the task was inevitably going to be time-limited, also allowing me to identify progress on a daily basis. As it turned out I had to do it twice: first removing code, then adding code back in again. But it did eventually get there.
Realising on Day 254, after all this, that I'd broken the WebGL rendering process was also a bit of a low point. By then I really wanted to move on from the render pipeline.
But eventually I did emerge from all of these tunnels and the joy of getting something to work is a crucial counterbalance to the frustration when it isn't. In retrospect the low points were all worth it for the sake of the enjoyment I also got out of it.
Apart from how things turned out in practice it's also interesting to compare how closely it matched my initial expectations. Returning to the preamble once again, it's clear I was expecting a long haul, but I also had experience to draw from my previous involvement in browser upgrades at Jolla:
I think it's fair to say that I did follow this approach, starting with getting things to compile and then focusing on the details afterwards. This led on to the following decision about the structure of the work:
In hindsight I think that was the right thing to do. But it also felt like a natural consequence of the situation I found myself in. Given the upstream code changes the patches I did apply needed quite a lot of work to get them to stick. That gave me the impression that many of the existing patches might turn out to be redundant, superseded by changes in the upstream code.
By applying only the patches that were necessary it give me the opportunity to potentially avoid patches which were no longer relevant in a more intentional way. Hopefully the patches I've ended up with are closer to the minimum required and have a slightly cleaner structure than would otherwise have been the case.
But practically speaking I think my original plan was a good one and, in retrospect, I followed it pretty closely.
Abstractly speaking, one of the most compelling reasons to want to upgrade is because websites routinely attempt to fingerprint browsers and serve different content depending on the result. This practice is as old as the hills, yet remains as common today as it is problematic. I understand that different browsers have different capabilities and that website creators will be blamed (unfairly) if a page renders poorly as a result of a user failing to keep their browser up-to-date. But you'd have thought at the very least browsers could test for features rather than versions.
When browsing using ESR 78 it's not uncommon for a site to chastise its own customers. Updating the engine on Sailfish OS is one way to reduce the chance of seeing these invectives, even if just changing the user agent string is often just as effective as a browser upgrade without any of the effort.
One of the worst offenders is Cloudflare, which routinely blocks the Sailfish browser from accessing sites on its content delivery network. Upgrading to ESR 91 seems to circumvent this in at least some cases.
But browser upgrades also bring genuine improvements as well. New features, improved stability, increased security and bug fixes. There have been a total of 45 point releases between the previous Sailfish OS engine of 78.15.0 and the upgraded version at 91.9.1. Each of these point releases has brought improvements, although not all will be relevant to Sailfish OS. Major releases (e.g. from 78 to 79) will typically include new features, stability improvements and security fixes, whereas point releases (e.g. 91.1.0 to 91.2.0) will often only include security and regression fixes.
Working through the Firefox changelogs the following are some of the obvious improvements that have a direct impact on the Sailfish browser:
In addition to the above changes, there were 15 critical, 115 high severity, 68 medium severity and 30 low severity security fixes combined into these updates. The importance of these can be best understand with reference to Mozilla's security classification:
Whether this is actually the case is hard to say. My tests using various performance measurement tools don't suggest significant performance improvements. But I must admit to having the same feeling of improved responsiveness. I suspect that may be due to the upstream changes in version 91.0 that claim to have improved responsiveness for user-interactions by 10-20%. That would make a noticeable improvement for users in a way that may not show up in benchmarks. It's my suspicion that the page loading feedback that's used to drive the progress bar on Sailfish OS has also been improved, although I've not found any explicit changes that would do this.
What do all of these changes mean for the state of the code? The upgrade from ESR 78 to ESR 91 also, surprisingly for me, brought with it a larger codebase. Mozilla has been intentionally transitioning code from C++ to Rust, with the number of lines of Rust code increasing by 14%. But the number of lines of C++ code also increased by 3% and for the total combined C++, JavaScript and Rust code this increased by 7%. Plotting the lines of code categorised by language, these increases are clearly visible.
Although proportionally there's been a bigger increase in Rust code than C++, in absolute terms the increase in both is almost identical (380607 lines of Rust code added compared to 384080 lines of C++ code).
In the above diagram Docs refers to content that relates to documentation. Build refers to scripts used to manage the build pipeline. IDL refers to interface definition files.
It's worth pausing to consider the code needed to build the Gecko engine. Gecko has experienced several changes through its life accumulating a mixture technologies as it goes. As a result the build system is a strange combination of Build (the mozilla build system), Python, Make, ninja, GN and Cargo. At certain points the build system compiles Rust into native binaries that then become part of the build pipeline itself. This causes havoc for the scratchbox2 cross-platform build engine Sailfish OS uses. No small part of the work in getting gecko working for Sailfish OS involves taming these build systems.
Although the numbers for IDL shown in the graph are low compared to the other languages, I nevertheless wanted to include it because it's such a critical part of the way Gecko works. The combination of C++/Rust and JavaScript means that there needs to be a really solid way to expose native methods to JavaScript and JavaScript methods to native code. The type systems aren't equivalent and so this requires a careful arrangement. Gecko supports this using its Interface Definition Language. IDL files read a bit like C++ header files but are more generic. Any interface defined using IDL can be exposed both natively and to the JavaScript layer. It's critical glue that holds everything together.
The numbers shown in the graph are measured in millions of lines of code. They're big numbers, but it's worth bearing in mind that Gecko is a relative minnow when it comes to code size in the world of browsers. For comparison I ran the same code analysis on the Chromium source. I was pretty surprised by how large Chromium is compared to Gecko.
Chromium contains over four times the code: 154 981 674 lines of code for Chromium compared to a paltry 37 361 820 lines of code for Gecko ESR 91. It's also interesting to compare the range of technologies involved in the two projects. Chromium introduces TypeScript, Go, Java, Objective-C, Lua, AppleScript, TCL and WASM, although some of these will be target-specific.
As any Sailfish OS developer will be aware, Sailfish OS uses RPMs for packaging software, a technology that originated on Red Hat Linux as the Red Hat Package Management system. Work started on RPM in 1995, a good ten years before the initial release of git and two years before Netscape started work on Gecko. Back then it was commonplace for software to be provided in the form of a tarball and in some ways the RPM build process reflects this. Distribution-specific changes are provided as patches applied directly to the upstream source. These patches are all listed in the spec file which is passed to the rpm tool to perform the build. On Sailfish OS this is all hidden behind sfdk which is itself a wrapper for the scratchbox2 sb2 tool. It's a complex layered system with multiple abstractions.
The point is that even now on Sailfish OS packages that use upstream code can pull directly from the upstream repositories, rather than having to use Sailfish-specific implementations. Any Sailfish OS specific changes can then be applied onto this code in the form of patches. It's not a process I enjoy working with because patching is a lot messier and less flexible than working with commits in a repository. Even though it's possible to convert a patch list into a series of commits and back again this adds an extra step and contrains what actions can be performed at different times.
The benefit is that we always retain a very clean and clear distinction between the upstream code and the Sailfish OS specific changes, with the latter being encapsulated in the patches to be applied. We can use this separation to discover how the changes needed to get Gecko ESR 78 to work with Sailfish OS differ compared to those needed for ESR 91.
This figure shows only a very high-level view, but nevertheless tells a story. Note that unlike the previous figures the y-axis of this chart uses a logarithmic scale to account for the big differences in scale between different languages. This can make the values harder to read so, for clarity, here they are in tabular form.
These numbers represent the actual code I've been working on for the last year. In general the number of lines added or removed has reduced as we've moved from ESR 78 to ESR 91. This is a good thing. The fewer changes made to the upstream code the better. In general the difference isn't huge, but it does exist. The total number of patches reduced from 98 to 84. The number of lines added to ESR 91 was 82% of the number of lines added to ESR 78. The number of lines removed from ESR 91 was only 57% of the number removed from ESR 91.
Interestingly, while there were fewer change made, the differences practically balance themselves out. Overall the patches to ESR 78 increased the code size by 31 383 lines compared to 31 049 lines for ESR 91. That's astonishingly similar.
These numbers don't quite capture all of the changes because they relate only to the gecko code. There were also changes needed in the other four components that make up the Sailfish browser stack, as well as to the EmbedLIte code (which is handled separately from gecko but ends up in the same xulrunner package). Let's briefly take a look at these other components.
The gecko renderer is by far the largest of the components. The qtmozembed component provides a QT wrapper around the renderer. The embedlite-components package adds the privileged JavaScript shims needed for Sailfish OS, largely replacing equivalent privileged JavaScript that would typically run in Firefox. The sailfish-components-webview component provides Qt components needed in order to support both the browser and WebView (for example the pop-up dialogues), but also provides the code needed to offer the rendering engine as a WebView component to other Qt apps. Finally the sailfish-browser component is the actual browser app you run when you open the browser on your phone.
Apart from the gecko renderer all of these are Sailfish-specific packages, so they don't have any "upstream" code. The Jolla repositories are the upstream repositories for these. Consequently there's no need to apply patches and we can work on the code directly. That means that when analysing changes for these we're just using the commits that take the code from ESR 78 versions to ESR 91 versions. Between them they accumulated 169 commits with the following additions and removals (these numbers also including the changes to the gecko source):
This table essentially captures the sum total of the changes needed to move from one version to the next. As you can see, the majority of the additions have been to C++ code. The build scripts saw rather a lot of churn. I'm very surprised to see more Rust additions than JavaScript additions. The QML code changed very little, which is perhaps to be expected given the external appearance, renderer aside, is almost identical. That was intentional: there's always scope to improve the Sailfish browser user interface, but my objective with this work was to get the renderer upgraded as quickly as possible. Changing the interface would have been a diversion.
Nevertheless, this was a big deal for me. I'm not a natural blogger so the prospect of writing about my coding on a daily basis was daunting at the outset. But it turned out to be surprisingly easy. Writing about specific tasks is very different from having to come up with inventive and interesting topics to write about on a daily basis.
Having to write daily diary entries undoubtedly helped keep me on track and working on the project every day. The need to have at least a few paragraphs to write about drove me to do the coding work.
There were a few occasions when I struggled with this. Typically on a Friday night after having spent two and a half hours on public transport returning from work. Having to then write up a diary entry in a tired state of semi-consciousness was not always ideal. But these cases were relatively rare.
There were also occasions — mostly in the middle of the work to get the various rendering pipelines working — when the work really got me down. Writing the diary entries made me very conscious of the progress I was — or in many cases wasn't — making. In the middle of the trough when it's really not clear whether it will be possible to come up with a solution, some of those occasions felt quite dark. If I hadn't been writing the diary I can imagine myself choosing to take a break and then having that break go on for several days.
But, and this is a big but, I was supported the entire way through by the amazing Sailfish community who responded to my posts on Mastodon and the forum, always encouraging and supportive. I'm not a social person and this was a bit of a shock for me. People out there in the Sailfish community and beyond really are the most encouraging and thoughtful people you could hope to interact with.
The amazing images and poetry from the likes of Thigg (thigg) and Leif-Jöran Olsson (ljo) are beautiful cases in point.
But there are so many people who helped and contributed in so many ways, I couldn't possibly mention everyone here. I apologise for not mentioning you all individually, but I'm really grateful.
Besides the community I also have to mention Joanna, my wife, who's sacrificed more than anyone else for the sake of me spending three hours each day and most of my weekends on gecko development. She carried me through this.
With all of this support, I found the experience surprisingly effortless. Perhaps the biggest challenge, as it turns out, was being able to find a suitable point to wind things down. Dropping off from posting diary entries every day and having a very clear purpose for my free time has been hard to manage in a measured way. It was too much of a cliff edge and, if I do this again, I think I'd want to look into ways to mitigate this. But I don't yet have a good solution: writing these diary entries doesn't lend itself to a tapered reduction of work.
As I write this the current situation is that three out of five pull requests have been merged into Jolla's repositories. The remaining three have been through a couple of review rounds already. So the immediate task is to get them through review and merged in. This alone won't result in their release as part of Sailfish OS as they're currently being merged into bespoke ESR 91 branches. Jolla will need to merge these into the main branch before they can become part of any official Sailfish OS release.
It's nevertheless exciting to see that as part of the recent upgrade from Sailfish OS 4.6.0.13 to 4.6.0.15, several changes to libhybris were included that will support the move to ESR 91. As readers of my diary entries will know, there were several issues that caused the browser to crash or hang which were ultimately traced back to libhybris and which, looking at the changelog, will now be fixed. If ESR 91 does go out in some future Sailfish OS release, this will make the transition much smoother.
At present I've been building exclusively for aarch64. The build will need to be tested and potentially amended for armv7hl and i486 targets. On top of this, it appears that getting the browser to work on native platforms such as the emulator and the PinePhone, where there is no libhybris layer, will also require some additional work.
In the longer term, there are two, maybe three, objectives. The obvious next step after the release of ESR 91 would be to move to the next ESR release, which is 102.15.1. Checking the various release notes we can see that ESR 91.9.1 was released on 20 May 2022, whereas ESR 102.15.1 was released on 12 September 2023. That's a gap of around 16 months. So far the upgrade from ESR 78 has taken 13 months, so it looks like we may have an opportunity to catch up with Firefox ESR latest. In practice though it's usually around 12 months between ESR releases so some acceleration will be needed if we're to properly keep up. It's worth noting that the extended service releases have a much longer support cycle than other releases, which can lead to some overlap. For example both ESR 115.15.0 and ESR 128.2.0 were released on 3 September 2024.
Besides the obvious upgrade to the renderer engine it would also be great to add features to the browser. On the Sailfish OS Forum Niels (fingus) suggested supporting MPRIS for the video and audio controls of the browser. That's the sort of thing I'd love to add, but which would require some research and effort to investigate and implement. I'd also love to introduce support for reader mode, scrollbars and maybe even extensions. There's no shortage of interesting ideas for things to work on.
The third objective would be to properly support the WebRender compositor on Sailfish OS. It's not clear how much work this would involve, but it's potentially substantial. Integrating this with the Sailfish OS render pipeline could be quite a challenge.
Finally there's plenty of scope to make important improvements to the browser build process. Updating Rust, fixing the multi-process hang — which remains a significant barrier to reducing build times — and introducing a build cache would all help to make development easier.
But I've learnt a whole lot more than this and not just from the process of development, but also from the experience of writing a daily diary about it. I'd like to think that the work has helped demonstrate the importance and benefit of open source, for users of course, but also for Jolla. Jolla invested heavily in ensuring the browser is open source. Not just in giving the code the right licence and making the source available, but also in documenting it, following an open development model and supporting the community in making it accessible. In no way was this a "free" browser upgrade for Jolla, but I hope it goes some small way to justifying this open source strategy. I'd also like to think the diary entries have demonstrated some of the benefits of being open about progress as well.
I've also learnt more than I'd like to admit about Brownian debugging. This is the process of performing a random walk, changing bits of the code en route, until it works. It may not be the most efficient debugging approach and it may be that an element of strategic direction improves matters, but as long as the problem space can be constrained I've found Brownian debugging can be unexpectedly effective. Given enough time and patience.
There's a follow-up to this, which is that it also demonstrates how much can be achieved without the benefit of understanding or insight, but relying on perseverance alone. I'm definitely more familiar with the gecko code than when I started, but the gaps in my knowledge remain prodigious. Armed only with my abilities in Brownian debugging and enough time to deploy them, I managed to make some progress.
I admit this wasn't my first involvement in upgrading the browser. While working at Jolla I contributed to the upgrade from ESR 60 to ESR 68, and then again from ESR 68 to ESR 78. But that was as part of a team with an incredible depth of knowledge of the browser and impressive software development skills. When I started this process I wasn't at all certain whether I'd be able to make any meaningful contribution to the next upgrade. I'm now much more confident that not only has this been possible, but that I'd be able to do it again.
It's been great to feel some purpose within the Sailfish OS community. I really enjoyed working for Jolla, not least because it felt worthwhile contributing to an operating system I love using, but also contributing to the community I felt a part of. Doing this work has served as a great way to continue feeling like I have something to contribute.
Writing the development diaries was, I hope, helpful in demonstrating that work was continuing on the browser: it hadn't been forgotten or left to decay. It gave me a lot more visibility than I would have got otherwise. Crucially though it made me realise that there are many, many, Sailfish OS developers putting in similar or greater levels of commitment, for ports and apps and bug checking, who may not have the same visibility because they're not writing a diary, but who nevertheless put in more work and deserve the same appreciation that I've felt privileged to have received from the community.
Comment
The Journey
Back when I started I hadn't quite appreciated how long this whole process was going to take. Although somewhere between half a year and a year seemed reasonable, the final 339 day tally is a little closer to the latter than I'd hoped. Moreover a year in theory feels much shorter than a year in practice. Adjusting for the fact I'm employed full-time to not do Gecko work, in practice I must have worked only around three hours a day during the week and twelve hours at the weekend. Two thirds of that time was spent coding and the other third writing up the diary entries. Given that the 339 days was made up of 244 weekdays and 48 weekends, I can be a bit more precise about how much time I actually spent on it.$ time gecko-dev real 48w 3d 0h 0m 0.000s code 5w 1d 12h 0m 0.000s diary 2w 4d 4h 0m 0.000sLet's convert that into work time. This is interesting because practically speaking this is the "Full Time Equivalent" (FTE) or the amount of person-hours needed to complete the project from a commercial perspective. Typically the work would of course be distributed between multiple people to speed up project implementation, so the real time would be shorter.
$ time --work gecko-dev real 67w 4d 0h 0m 0.000s code 23w 1d 2h 0m 0.000s diary 11w 3d 1h 0m 0.000sLet's consider now how those days were partitioned into tasks. The following diagram shows the linear sequence of how I spent each day of work. This oversimplifies things a little given I didn't always complete tasks sequentially, but is pretty close to reality.
On day 149 I gave a presentation of this work at FOSDEM'24, including an earlier version of this diagram. I thought I was about half way through the work at that stage, but this turned out to be an underestimate, as is clear from this diagram. I was in fact only 11/25 of the way through.
The longest task of 87 days involved getting the WebView render pipeline working. In comparison getting the first successful build to complete took only 45 days. Both of these were quite dark and gloomy times. Without a working build it's impossible to debug or test the code, whilst without a working renderer nothing else can be effectively tested. Both periods felt like dark tunnels that took an age to emerge from.
Following these in terms of length of task were PDF Printing at 28 days, the WebGL renderer at 25 days and the Sec-Fetch-* headers at 15 days. These were the only tasks that took more than 14 days, which is a bit of a cut-off point for me. After two weeks of writing daily about the tasks it becomes really hard to write more without sounding (and seeming) a bit lost and exhausted.
It was particularly frustrating how long it took to get the WebView rendering working, given the browser already worked nicely. It could have been worse though: I had a clear plan which involved gradually stripping out and adjusting pieces of code to align with the code in the ESR 78 implementation. This gradual convergence towards ESR 78 meant I knew the task was inevitably going to be time-limited, also allowing me to identify progress on a daily basis. As it turned out I had to do it twice: first removing code, then adding code back in again. But it did eventually get there.
Realising on Day 254, after all this, that I'd broken the WebGL rendering process was also a bit of a low point. By then I really wanted to move on from the render pipeline.
But eventually I did emerge from all of these tunnels and the joy of getting something to work is a crucial counterbalance to the frustration when it isn't. In retrospect the low points were all worth it for the sake of the enjoyment I also got out of it.
Apart from how things turned out in practice it's also interesting to compare how closely it matched my initial expectations. Returning to the preamble once again, it's clear I was expecting a long haul, but I also had experience to draw from my previous involvement in browser upgrades at Jolla:
Another piece of wisdom that Raine taught me is that the first task of upgrading the engine should always be to get it to compile. Once it's compiling, getting it to actually run on a phone, patching all of the regressions and fixing up all the integrations can follow. But without a compiling build there's no point in spending time on these other parts.
I think it's fair to say that I did follow this approach, starting with getting things to compile and then focusing on the details afterwards. This led on to the following decision about the structure of the work:
I'm therefore going for a three-stage process with the upgrade:
Looking back I did broadly follow this structure. I got the build to complete, then I applied the patches needed to get rendering working and only after that did I apply the other patches. I did diverge from this advice in one important respect. Rather than applying all of the remaining patches I actually only applied a minimal set required to get the render working.- Apply a minimal set of changes and patches to get ESR 91 to build.
- Apply any remaining patches where possible and other changes to get it to run and render.
- Handle the Sailfish OS specific integrations.
In hindsight I think that was the right thing to do. But it also felt like a natural consequence of the situation I found myself in. Given the upstream code changes the patches I did apply needed quite a lot of work to get them to stick. That gave me the impression that many of the existing patches might turn out to be redundant, superseded by changes in the upstream code.
By applying only the patches that were necessary it give me the opportunity to potentially avoid patches which were no longer relevant in a more intentional way. Hopefully the patches I've ended up with are closer to the minimum required and have a slightly cleaner structure than would otherwise have been the case.
But practically speaking I think my original plan was a good one and, in retrospect, I followed it pretty closely.
Destination Gecko
Let's now consider where the journey took us. The point of all this work was to take the browser engine from ESR 78 to ESR 91. What does this give us?Abstractly speaking, one of the most compelling reasons to want to upgrade is because websites routinely attempt to fingerprint browsers and serve different content depending on the result. This practice is as old as the hills, yet remains as common today as it is problematic. I understand that different browsers have different capabilities and that website creators will be blamed (unfairly) if a page renders poorly as a result of a user failing to keep their browser up-to-date. But you'd have thought at the very least browsers could test for features rather than versions.
When browsing using ESR 78 it's not uncommon for a site to chastise its own customers. Updating the engine on Sailfish OS is one way to reduce the chance of seeing these invectives, even if just changing the user agent string is often just as effective as a browser upgrade without any of the effort.
One of the worst offenders is Cloudflare, which routinely blocks the Sailfish browser from accessing sites on its content delivery network. Upgrading to ESR 91 seems to circumvent this in at least some cases.
But browser upgrades also bring genuine improvements as well. New features, improved stability, increased security and bug fixes. There have been a total of 45 point releases between the previous Sailfish OS engine of 78.15.0 and the upgraded version at 91.9.1. Each of these point releases has brought improvements, although not all will be relevant to Sailfish OS. Major releases (e.g. from 78 to 79) will typically include new features, stability improvements and security fixes, whereas point releases (e.g. 91.1.0 to 91.2.0) will often only include security and regression fixes.
Working through the Firefox changelogs the following are some of the obvious improvements that have a direct impact on the Sailfish browser:
- Certificate performance improvements (80.0.1).
- WebGL rendering improvements (80.0.1).
- Support for viewing more filetypes (81.0).
- Improved element rendering (81.0.1, 86.0.1).
- Improved PDF export (81.0.1, 85.0.1, 90.0.2).
- Increased startup and rendering speeds (82.0).
- Fixes for WebSocket message duplication (82.0.2).
- SpiderMonkey JavaScript performance improvements (83.0).
- An HTTPS-Only mode option (83.0).
- Improved shared memory performance (84.0).
- Increased cookie and supercookie isolation (85.0, 86.0, 89.0, 90.0, 91.0).
- Deprecation of WebRTC DTLS 1.0 (86.0).
- Private browsing compatibility improvements (87.0).
- Increased referrer privacy (87.0, 88.0).
- Working hyperlinks in PDF export (90.0).
- Removal of FTP support (90.0).
- Improved user-action response times (91.0).
- Fixes for microsoft.com certificate errors (91.4.1).
- Many crash bug fixes (81.0.1, 82.0.1, 85.0.1).
In addition to the above changes, there were 15 critical, 115 high severity, 68 medium severity and 30 low severity security fixes combined into these updates. The importance of these can be best understand with reference to Mozilla's security classification:
- Critical: Vulnerability can be used to run attacker code and install software, requiring no user interaction beyond normal browsing.
- High: Vulnerability can be used to gather sensitive data from sites in other windows or inject data or code into those sites, requiring no more than normal browsing actions.
- Moderate: Vulnerabilities that would otherwise be High or Critical except they only work in uncommon non-default configurations or require the user to perform complicated and/or unlikely steps.
- Low: Minor security vulnerabilities such as Denial of Service attacks, minor data leaks, or spoofs. (Undetectable spoofs of SSL indicia would have "High" impact because those are generally used to steal sensitive data intended for other sites.)
Whether this is actually the case is hard to say. My tests using various performance measurement tools don't suggest significant performance improvements. But I must admit to having the same feeling of improved responsiveness. I suspect that may be due to the upstream changes in version 91.0 that claim to have improved responsiveness for user-interactions by 10-20%. That would make a noticeable improvement for users in a way that may not show up in benchmarks. It's my suspicion that the page loading feedback that's used to drive the progress bar on Sailfish OS has also been improved, although I've not found any explicit changes that would do this.
What do all of these changes mean for the state of the code? The upgrade from ESR 78 to ESR 91 also, surprisingly for me, brought with it a larger codebase. Mozilla has been intentionally transitioning code from C++ to Rust, with the number of lines of Rust code increasing by 14%. But the number of lines of C++ code also increased by 3% and for the total combined C++, JavaScript and Rust code this increased by 7%. Plotting the lines of code categorised by language, these increases are clearly visible.
Although proportionally there's been a bigger increase in Rust code than C++, in absolute terms the increase in both is almost identical (380607 lines of Rust code added compared to 384080 lines of C++ code).
In the above diagram Docs refers to content that relates to documentation. Build refers to scripts used to manage the build pipeline. IDL refers to interface definition files.
It's worth pausing to consider the code needed to build the Gecko engine. Gecko has experienced several changes through its life accumulating a mixture technologies as it goes. As a result the build system is a strange combination of Build (the mozilla build system), Python, Make, ninja, GN and Cargo. At certain points the build system compiles Rust into native binaries that then become part of the build pipeline itself. This causes havoc for the scratchbox2 cross-platform build engine Sailfish OS uses. No small part of the work in getting gecko working for Sailfish OS involves taming these build systems.
Although the numbers for IDL shown in the graph are low compared to the other languages, I nevertheless wanted to include it because it's such a critical part of the way Gecko works. The combination of C++/Rust and JavaScript means that there needs to be a really solid way to expose native methods to JavaScript and JavaScript methods to native code. The type systems aren't equivalent and so this requires a careful arrangement. Gecko supports this using its Interface Definition Language. IDL files read a bit like C++ header files but are more generic. Any interface defined using IDL can be exposed both natively and to the JavaScript layer. It's critical glue that holds everything together.
The numbers shown in the graph are measured in millions of lines of code. They're big numbers, but it's worth bearing in mind that Gecko is a relative minnow when it comes to code size in the world of browsers. For comparison I ran the same code analysis on the Chromium source. I was pretty surprised by how large Chromium is compared to Gecko.
Chromium contains over four times the code: 154 981 674 lines of code for Chromium compared to a paltry 37 361 820 lines of code for Gecko ESR 91. It's also interesting to compare the range of technologies involved in the two projects. Chromium introduces TypeScript, Go, Java, Objective-C, Lua, AppleScript, TCL and WASM, although some of these will be target-specific.
Destination Sailfish
So far we've considered the differences between ESR 78 and ESR 91 in some detail, but none of this has touched on the actual changes needed to get the code to run on Sailfish OS.As any Sailfish OS developer will be aware, Sailfish OS uses RPMs for packaging software, a technology that originated on Red Hat Linux as the Red Hat Package Management system. Work started on RPM in 1995, a good ten years before the initial release of git and two years before Netscape started work on Gecko. Back then it was commonplace for software to be provided in the form of a tarball and in some ways the RPM build process reflects this. Distribution-specific changes are provided as patches applied directly to the upstream source. These patches are all listed in the spec file which is passed to the rpm tool to perform the build. On Sailfish OS this is all hidden behind sfdk which is itself a wrapper for the scratchbox2 sb2 tool. It's a complex layered system with multiple abstractions.
The point is that even now on Sailfish OS packages that use upstream code can pull directly from the upstream repositories, rather than having to use Sailfish-specific implementations. Any Sailfish OS specific changes can then be applied onto this code in the form of patches. It's not a process I enjoy working with because patching is a lot messier and less flexible than working with commits in a repository. Even though it's possible to convert a patch list into a series of commits and back again this adds an extra step and contrains what actions can be performed at different times.
The benefit is that we always retain a very clean and clear distinction between the upstream code and the Sailfish OS specific changes, with the latter being encapsulated in the patches to be applied. We can use this separation to discover how the changes needed to get Gecko ESR 78 to work with Sailfish OS differ compared to those needed for ESR 91.
This figure shows only a very high-level view, but nevertheless tells a story. Note that unlike the previous figures the y-axis of this chart uses a logarithmic scale to account for the big differences in scale between different languages. This can make the values harder to read so, for clarity, here they are in tabular form.
Language | ESR 78 added | ESR 78 removed | ESR 91 added | ESR 91 removed |
---|---|---|---|---|
C++ | 22 726 | 606 | 22 476 | 631 |
Docs | 510 | 2 | 508 | 12 |
Build | 28 558 | 20 350 | 19 320 | 11 090 |
JavaScript | 158 | 6 | 170 | 43 |
Rust | 544 | 175 | 498 | 180 |
IDL | 29 | 3 | 39 | 6 |
These numbers represent the actual code I've been working on for the last year. In general the number of lines added or removed has reduced as we've moved from ESR 78 to ESR 91. This is a good thing. The fewer changes made to the upstream code the better. In general the difference isn't huge, but it does exist. The total number of patches reduced from 98 to 84. The number of lines added to ESR 91 was 82% of the number of lines added to ESR 78. The number of lines removed from ESR 91 was only 57% of the number removed from ESR 91.
Interestingly, while there were fewer change made, the differences practically balance themselves out. Overall the patches to ESR 78 increased the code size by 31 383 lines compared to 31 049 lines for ESR 91. That's astonishingly similar.
These numbers don't quite capture all of the changes because they relate only to the gecko code. There were also changes needed in the other four components that make up the Sailfish browser stack, as well as to the EmbedLIte code (which is handled separately from gecko but ends up in the same xulrunner package). Let's briefly take a look at these other components.
The gecko renderer is by far the largest of the components. The qtmozembed component provides a QT wrapper around the renderer. The embedlite-components package adds the privileged JavaScript shims needed for Sailfish OS, largely replacing equivalent privileged JavaScript that would typically run in Firefox. The sailfish-components-webview component provides Qt components needed in order to support both the browser and WebView (for example the pop-up dialogues), but also provides the code needed to offer the rendering engine as a WebView component to other Qt apps. Finally the sailfish-browser component is the actual browser app you run when you open the browser on your phone.
Apart from the gecko renderer all of these are Sailfish-specific packages, so they don't have any "upstream" code. The Jolla repositories are the upstream repositories for these. Consequently there's no need to apply patches and we can work on the code directly. That means that when analysing changes for these we're just using the commits that take the code from ESR 78 versions to ESR 91 versions. Between them they accumulated 169 commits with the following additions and removals (these numbers also including the changes to the gecko source):
Language | Lines added | Lines removed |
---|---|---|
C++ | 23 456 | 1 281 |
Docs | 508 | 12 |
Build | 19 724 | 11 381 |
JavaScript | 452 | 114 |
Rust | 498 | 180 |
IDL | 52 | 17 |
QML | 14 | 14 |
Total | 44 704 | 12 999 |
This table essentially captures the sum total of the changes needed to move from one version to the next. As you can see, the majority of the additions have been to C++ code. The build scripts saw rather a lot of churn. I'm very surprised to see more Rust additions than JavaScript additions. The QML code changed very little, which is perhaps to be expected given the external appearance, renderer aside, is almost identical. That was intentional: there's always scope to improve the Sailfish browser user interface, but my objective with this work was to get the renderer upgraded as quickly as possible. Changing the interface would have been a diversion.
Mental Health
I put a lot of myself into the Gecko upgrade. Working on it practically every day for a year, even if not full-time, required a level of commitment that I wouldn't typically give outside of my work hours. This is a personal perspective: the world is blessed with many people who commit far more for far less reward and who don't then feel the need to tell the world about it in a blog post.Nevertheless, this was a big deal for me. I'm not a natural blogger so the prospect of writing about my coding on a daily basis was daunting at the outset. But it turned out to be surprisingly easy. Writing about specific tasks is very different from having to come up with inventive and interesting topics to write about on a daily basis.
Having to write daily diary entries undoubtedly helped keep me on track and working on the project every day. The need to have at least a few paragraphs to write about drove me to do the coding work.
There were a few occasions when I struggled with this. Typically on a Friday night after having spent two and a half hours on public transport returning from work. Having to then write up a diary entry in a tired state of semi-consciousness was not always ideal. But these cases were relatively rare.
There were also occasions — mostly in the middle of the work to get the various rendering pipelines working — when the work really got me down. Writing the diary entries made me very conscious of the progress I was — or in many cases wasn't — making. In the middle of the trough when it's really not clear whether it will be possible to come up with a solution, some of those occasions felt quite dark. If I hadn't been writing the diary I can imagine myself choosing to take a break and then having that break go on for several days.
But, and this is a big but, I was supported the entire way through by the amazing Sailfish community who responded to my posts on Mastodon and the forum, always encouraging and supportive. I'm not a social person and this was a bit of a shock for me. People out there in the Sailfish community and beyond really are the most encouraging and thoughtful people you could hope to interact with.
The amazing images and poetry from the likes of Thigg (thigg) and Leif-Jöran Olsson (ljo) are beautiful cases in point.
But there are so many people who helped and contributed in so many ways, I couldn't possibly mention everyone here. I apologise for not mentioning you all individually, but I'm really grateful.
Besides the community I also have to mention Joanna, my wife, who's sacrificed more than anyone else for the sake of me spending three hours each day and most of my weekends on gecko development. She carried me through this.
With all of this support, I found the experience surprisingly effortless. Perhaps the biggest challenge, as it turns out, was being able to find a suitable point to wind things down. Dropping off from posting diary entries every day and having a very clear purpose for my free time has been hard to manage in a measured way. It was too much of a cliff edge and, if I do this again, I think I'd want to look into ways to mitigate this. But I don't yet have a good solution: writing these diary entries doesn't lend itself to a tapered reduction of work.
Future Work
Future work for this project comes in two forms. There's the future work needed to achieve the (hopefully) near-term goal of getting the browser released to users as part of Sailfish OS. Then there's the longer term goal of what to do beyond that.As I write this the current situation is that three out of five pull requests have been merged into Jolla's repositories. The remaining three have been through a couple of review rounds already. So the immediate task is to get them through review and merged in. This alone won't result in their release as part of Sailfish OS as they're currently being merged into bespoke ESR 91 branches. Jolla will need to merge these into the main branch before they can become part of any official Sailfish OS release.
It's nevertheless exciting to see that as part of the recent upgrade from Sailfish OS 4.6.0.13 to 4.6.0.15, several changes to libhybris were included that will support the move to ESR 91. As readers of my diary entries will know, there were several issues that caused the browser to crash or hang which were ultimately traced back to libhybris and which, looking at the changelog, will now be fixed. If ESR 91 does go out in some future Sailfish OS release, this will make the transition much smoother.
At present I've been building exclusively for aarch64. The build will need to be tested and potentially amended for armv7hl and i486 targets. On top of this, it appears that getting the browser to work on native platforms such as the emulator and the PinePhone, where there is no libhybris layer, will also require some additional work.
In the longer term, there are two, maybe three, objectives. The obvious next step after the release of ESR 91 would be to move to the next ESR release, which is 102.15.1. Checking the various release notes we can see that ESR 91.9.1 was released on 20 May 2022, whereas ESR 102.15.1 was released on 12 September 2023. That's a gap of around 16 months. So far the upgrade from ESR 78 has taken 13 months, so it looks like we may have an opportunity to catch up with Firefox ESR latest. In practice though it's usually around 12 months between ESR releases so some acceleration will be needed if we're to properly keep up. It's worth noting that the extended service releases have a much longer support cycle than other releases, which can lead to some overlap. For example both ESR 115.15.0 and ESR 128.2.0 were released on 3 September 2024.
Besides the obvious upgrade to the renderer engine it would also be great to add features to the browser. On the Sailfish OS Forum Niels (fingus) suggested supporting MPRIS for the video and audio controls of the browser. That's the sort of thing I'd love to add, but which would require some research and effort to investigate and implement. I'd also love to introduce support for reader mode, scrollbars and maybe even extensions. There's no shortage of interesting ideas for things to work on.
The third objective would be to properly support the WebRender compositor on Sailfish OS. It's not clear how much work this would involve, but it's potentially substantial. Integrating this with the Sailfish OS render pipeline could be quite a challenge.
Finally there's plenty of scope to make important improvements to the browser build process. Updating Rust, fixing the multi-process hang — which remains a significant barrier to reducing build times — and introducing a build cache would all help to make development easier.
Lessons Learned
The main outcome of this work for me has been the reaffirmation that the browser is a critical component of Sailfish OS. The better the browser the more usable Sailfish OS becomes as a daily driver. Make no mistake, the reason I wanted to do this work was for entirely selfish reasons: Sailfish OS is my mobile phone operating system of choice. I enjoy using it and I want it to remain relevant so that it continues to be supported. Upgrading the browser is my way of helping ensure this happens; it's my itch and I've been scratching it.But I've learnt a whole lot more than this and not just from the process of development, but also from the experience of writing a daily diary about it. I'd like to think that the work has helped demonstrate the importance and benefit of open source, for users of course, but also for Jolla. Jolla invested heavily in ensuring the browser is open source. Not just in giving the code the right licence and making the source available, but also in documenting it, following an open development model and supporting the community in making it accessible. In no way was this a "free" browser upgrade for Jolla, but I hope it goes some small way to justifying this open source strategy. I'd also like to think the diary entries have demonstrated some of the benefits of being open about progress as well.
I've also learnt more than I'd like to admit about Brownian debugging. This is the process of performing a random walk, changing bits of the code en route, until it works. It may not be the most efficient debugging approach and it may be that an element of strategic direction improves matters, but as long as the problem space can be constrained I've found Brownian debugging can be unexpectedly effective. Given enough time and patience.
There's a follow-up to this, which is that it also demonstrates how much can be achieved without the benefit of understanding or insight, but relying on perseverance alone. I'm definitely more familiar with the gecko code than when I started, but the gaps in my knowledge remain prodigious. Armed only with my abilities in Brownian debugging and enough time to deploy them, I managed to make some progress.
I admit this wasn't my first involvement in upgrading the browser. While working at Jolla I contributed to the upgrade from ESR 60 to ESR 68, and then again from ESR 68 to ESR 78. But that was as part of a team with an incredible depth of knowledge of the browser and impressive software development skills. When I started this process I wasn't at all certain whether I'd be able to make any meaningful contribution to the next upgrade. I'm now much more confident that not only has this been possible, but that I'd be able to do it again.
It's been great to feel some purpose within the Sailfish OS community. I really enjoyed working for Jolla, not least because it felt worthwhile contributing to an operating system I love using, but also contributing to the community I felt a part of. Doing this work has served as a great way to continue feeling like I have something to contribute.
Writing the development diaries was, I hope, helpful in demonstrating that work was continuing on the browser: it hadn't been forgotten or left to decay. It gave me a lot more visibility than I would have got otherwise. Crucially though it made me realise that there are many, many, Sailfish OS developers putting in similar or greater levels of commitment, for ports and apps and bug checking, who may not have the same visibility because they're not writing a diary, but who nevertheless put in more work and deserve the same appreciation that I've felt privileged to have received from the community.