flypig.co.uk

Blog

All items from October 2023

31 Oct 2023 : Day 76 #
Yesterday was taken up with forum moderation, but hopefully I'll be back on track today. Right now I'm on the train. I've debugged on the train before, but debugging with two phones simultaneously is a bit too much of a balancing act in the confined space. So I'm reading through the code to get familiar with it instead.

The method I'm reading through is SkDraw::drawRect() which I talked about a couple of days back. As the name suggests, this is a low-level primitive rectangle drawing routine. At this depth in the system the ESR 78 code and ESR 91 code are identical. It's a bit like being at the bottom of the ocean (I imagine): calm.

The method ends up either calling draw_rect_as_path() for a boundary rectangle or a variant of SkScan::FillRect() if it's filled.

In practice, when I run it in ESR 78 the rectangle type is kFill_RectType, which means it ends up going into this branch of a switch statement:
        case kFill_RectType:
            if (paint.isAntiAlias()) {
                SkScan::AntiFillRect(devRect, clip, blitter);
            } else {
                SkScan::FillRect(devRect, clip, blitter);
            }
            break;
It then goes into the AntiFillRect() branch of the condition inside it. This eventually ends up inside this bit of code:
static void antifillrect(const SkXRect& xr, SkBlitter* blitter) {
    antifilldot8(SkFixedToFDot8(xr.fLeft), SkFixedToFDot8(xr.fTop),
                 SkFixedToFDot8(xr.fRight), SkFixedToFDot8(xr.fBottom),
                 blitter, true);
}
That ends up calling blitter->blitRect(), which turns out to resolve to SkAAClipBlitter::blitRect(). That's quite a few nested calls, but as you can see, it ultimately comes down to clearing the background:
SkARGB32_Blitter::blitRect (height=45, width=777, y=0, x=256, this=0x7fde8d2038)
It still goes down a little further though. This calls the following method:
SkOpts::rect_memset32(device, color, width, rowBytes, height);
That in turn ends up running the following snippet of templated code:
    template <typename T>
    static void rect_memsetT(T buffer[], T value, int count, size_t rowBytes, int height) {
        while (height --> 0) {
            memsetT(buffer, value, count);
            buffer = (T*)((char*)buffer + rowBytes);
        }
    }
That gets us down to a memset, which is realistically as far as I can dig with this.
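To make that last step concrete, here's a standalone sketch of the same strided-fill pattern. It isn't Skia's code: memset_t(), rect_memset_t() and FillExample() are hypothetical stand-ins, with std::fill_n doing the job of memsetT. The detail worth noticing is that rowBytes is a byte stride, so the pointer is advanced as a char* and then cast back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill `count` consecutive values starting at `buffer`.
// Hypothetical stand-in for Skia's memsetT, using std::fill_n.
template <typename T>
static void memset_t(T buffer[], T value, int count) {
  std::fill_n(buffer, count, value);
}

// Fill a rectangle that is `count` values wide and `height` rows tall.
// `rowBytes` is the distance in bytes between the starts of consecutive rows,
// which exceeds count * sizeof(T) when the rectangle is narrower than the
// surface it sits in.
template <typename T>
static void rect_memset_t(T buffer[], T value, int count, std::size_t rowBytes,
                          int height) {
  while (height-- > 0) {
    memset_t(buffer, value, count);
    // rowBytes is a byte stride, so advance via char* and cast back.
    buffer = reinterpret_cast<T*>(reinterpret_cast<char*>(buffer) + rowBytes);
  }
}

// Example: a 4x4 surface of 32-bit pixels with a 2x2 block filled at (1, 1).
std::vector<std::uint32_t> FillExample() {
  std::vector<std::uint32_t> surface(4 * 4, 0);
  const std::size_t rowBytes = 4 * sizeof(std::uint32_t);  // full row in bytes
  rect_memset_t<std::uint32_t>(&surface[1 * 4 + 1], 0xFF00FF00u, 2, rowBytes, 2);
  return surface;
}
```

Applied to the blitRect() frame above, with width=777 and height=45 on a 32-bit ARGB surface, this pattern would amount to 45 row fills of 777 pixels, or 3,108 bytes per row, each row starting rowBytes after the previous one.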

Next up is the ESR 91 version. Does it get to the same place? That'll have to be something to check tomorrow. With any luck it will be identical and then we'll have exhausted this part of the investigation.

For all the other entries in my developer diary, check out the Gecko-dev Diary page.
30 Oct 2023 : Day 75 #
Yesterday I reached what felt like an exciting point in the development, since I'd identified that certain important-looking pieces of functionality weren't being called as the EGL display structures were created and configured. I'd really hoped to make progress with it today, but life intervened.

Rather than working on the gecko code I've spent the evening putting together text for the Sailfish OS Community Newsletter. Unfortunately it's just not been possible to squeeze both into the same day alongside my day job.

The newsletter will be out on Thursday so you'll be able to see what I've been working on in favour of gecko then. But my plan is to get back to gecko tomorrow evening, add in the missing functionality, and hope that it helps get the rendering working.

If you were really hoping to hear about code today, then you can always check out my earlier posts on my Gecko-dev Diary page.
29 Oct 2023 : Day 74 #
That extra hour was worth the wait. I've not yet decided exactly how I spent it (possibly asleep, possibly fixing my Hue Hub) but it felt good to have enough time for both, and now it feels like it's the middle of the night, but it's actually only six thirty in the evening.

So today I'm continuing to go deeper and deeper into the rendering code. I've now got as far as nsCSSBorderRenderer::DrawBorders(). This is one of the methods that's called to render a bordered rectangle. I'm rendering a page that contains such objects and the breakpoint is hit on both ESR 78 and ESR 91.

The method then makes lots of decisions about exactly what it's going to render: are all the borders the same? Do they have zero thickness? Are they solid? Things like that.

Eventually we get to the point where an actual rectangle is going to be plotted:
        mDrawTarget->FillRect(GetCornerRect(corner),
                              ColorPattern(ToDeviceColor(color)));
But what is mDrawTarget, and which version of FillRect() is being called (there appear to be at least 18 to choose from)? It looks like it should be an easy question to answer: just check what type mDrawTarget is. But with multiple inheritance and virtual methods it's easier said than done. In fact, it can be easier to do at runtime with a debugger than by manually inspecting the code.
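The reason the declared type isn't enough is virtual dispatch. In this minimal sketch the class names (Target, SkiaTarget, TiledTarget) and the Paint() helper are made up for illustration and aren't Gecko's: Paint() only ever sees a Target*, yet which FillRect() runs depends on the object behind the pointer.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for a DrawTarget-style class hierarchy.
struct Target {
  virtual ~Target() = default;
  virtual std::string FillRect() { return "Target::FillRect"; }
};

struct SkiaTarget : Target {
  std::string FillRect() override { return "SkiaTarget::FillRect"; }
};

struct TiledTarget : Target {
  std::string FillRect() override { return "TiledTarget::FillRect"; }
};

// The declared (static) type of `target` is Target*, but the override that
// runs is chosen at runtime from the object's actual (dynamic) type.
std::string Paint(Target* target) { return target->FillRect(); }
```

Reading the code at the call site only tells you the static type, which is why a debugger is often the quicker route.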

So I'm stepping inside the method on ESR 91. From this it seems to be the version in DrawTarget from 2D.h:
(gdb) explore *mDrawTarget
The value of '*mDrawTarget' is of type 'nsCSSBorderRenderer::DrawTarget' which is a typedef of type 'mozilla::gfx::DrawTarget'
The value of '*mDrawTarget' is a struct/class of type 'mozilla::gfx::DrawTarget' with the following fields:

  mozilla::external::AtomicRefCounted<mozilla::gfx::DrawTarget> =
                      <Enter 0 to explore this base class of type 'mozilla::
                      external::AtomicRefCounted<mozilla::gfx::DrawTarget>'>
          mUserData = <Enter 1 to explore this field of type 'mozilla::gfx::UserData'>
         mTransform = <Enter 2 to explore this field of type 'mozilla::gfx::Matrix'>
        mOpaqueRect = <Enter 3 to explore this field of type 'mozilla::gfx::IntRect'>
    mTransformDirty = true .. (Value of type 'bool')
  mPermitSubpixelAA = true .. (Value of type 'bool')
            mFormat = mozilla::gfx::SurfaceFormat::B8G8R8X8 ..
                      (Value of type 'mozilla::gfx::SurfaceFormat')

Enter the field number of choice: 
But as I mentioned, the method is virtual, and the above doesn't rule out it being overridden by a subclass. Let's try stepping inside it to see where we end up.
(gdb) n
3217            mDrawTarget->FillRect(GetCornerRect(corner),
(gdb) n
3218                                  ColorPattern(ToDeviceColor(color)));
(gdb) n
3217            mDrawTarget->FillRect(GetCornerRect(corner),
(gdb) s
mozilla::gfx::DrawOptions::DrawOptions (aAntialiasMode=mozilla::gfx::
    AntialiasMode::DEFAULT, aCompositionOp=mozilla::gfx::CompositionOp::OP_OVER, 
    aAlpha=1, this=0x7f9f3dfa90)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/2D.h:124
124     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/2D.h:
    No such file or directory.
(gdb) n
nsCSSBorderRenderer::DrawBorders (this=this@entry=0x7f9f3dfc20)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/painting/
    nsCSSRenderingBorders.cpp:3219
3219            continue;
That's a mess. It's mostly a mess because there are four method calls on what is essentially the same line as the FillRect() call we're interested in, so we want to step over three of them and into one:
  1. FillRect() (step into).
  2. GetCornerRect() (step over).
  3. ColorPattern() (step over).
  4. ToDeviceColor() (step over).
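Part of what makes stepping so confusing is that every argument, including any default argument, is evaluated at the call site before the callee's body starts, and C++ leaves the relative order of the argument evaluations unspecified. That would explain why the s command above first landed in the DrawOptions constructor, presumably a default-constructed DrawOptions argument. This sketch uses hypothetical stand-ins (Options, CornerRect(), ToDeviceColor(), MakePattern()) and records the order in which things run.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Records the order in which things run.
std::vector<std::string> g_trace;

// Hypothetical stand-ins for the calls on that line.
struct Options {
  Options() { g_trace.push_back("Options()"); }
};

int CornerRect() { g_trace.push_back("CornerRect"); return 1; }
int ToDeviceColor(int c) { g_trace.push_back("ToDeviceColor"); return c; }
int MakePattern(int c) { g_trace.push_back("MakePattern"); return c; }

// The default argument is constructed at the call site, before the body runs.
void FillRect(int rect, int pattern, const Options& options = Options()) {
  (void)rect;
  (void)pattern;
  (void)options;
  g_trace.push_back("FillRect body");
}
```

After calling FillRect(CornerRect(), MakePattern(ToDeviceColor(7))), the only guarantees are that ToDeviceColor() finishes before MakePattern() receives its result and that all four evaluations finish before the body of FillRect() runs.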
Eventually, after trying and failing to step through carefully, I decide to use brute force and place a breakpoint on every instance of FillRect(). This is from ESR 78:
(gdb) b FillRect
Breakpoint 3 at 0x7ff2858a58: FillRect. (25 locations)
(gdb) c
Continuing.

Thread 10 "GeckoWorkerThre" hit Breakpoint 3, mozilla::gfx::DrawTargetTiled::
    FillRect (this=0x7fb9250c10, aRect=..., aPattern=..., aDrawOptions=...)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/2d/
    DrawTargetTiled.cpp:221
221                                    const DrawOptions& aDrawOptions) {
(gdb) 
Happily this gives us what we need: it's the FillRect() method from DrawTargetTiled. This is what that method looks like:
void DrawTargetTiled::FillRect(const Rect& aRect, const Pattern& aPattern,
                               const DrawOptions& aDrawOptions) {
  Rect deviceRect = mTransform.TransformBounds(aRect);
  for (size_t i = 0; i < mTiles.size(); i++) {
    if (!mTiles[i].mClippedOut &&
        deviceRect.Intersects(Rect(mTiles[i].mTileOrigin.x,
                                   mTiles[i].mTileOrigin.y,
                                   mTiles[i].mDrawTarget->GetSize().width,
                                   mTiles[i].mDrawTarget->GetSize().height))) {
      mTiles[i].mDrawTarget->FillRect(aRect, aPattern, aDrawOptions);
    }
  }
}
On ESR 91 we get something very similar:
(gdb) b FillRect
Breakpoint 5 at 0x7fba5084ec: FillRect. (16 locations)
(gdb) c
Continuing.

Thread 8 "GeckoWorkerThre" hit Breakpoint 5, mozilla::gfx::DrawTargetTiled::
    FillRect (this=0x7f88dd55f0, aRect=..., aPattern=..., aDrawOptions=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/2d/DrawTargetTiled.cpp:221
221                                    const DrawOptions& aDrawOptions) {
(gdb) 
That's all encouraging and takes us a bit closer.
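Stripped of the Gecko types, the loop in DrawTargetTiled::FillRect() is a culling step: forward the fill only to tiles that aren't clipped out and that the device-space rectangle actually overlaps. This is a standalone sketch with hypothetical Rect and Tile types, a counter in place of each tile's real draw target, and the transform step left out. The overlap test here is a plain strict-inequality check and may differ in edge cases from gfx::Rect::Intersects().

```cpp
#include <cassert>
#include <vector>

// Hypothetical axis-aligned rectangle (not gfx::Rect).
struct Rect {
  float x, y, width, height;
  // True if the two rectangles overlap with non-zero area.
  bool Intersects(const Rect& o) const {
    return x < o.x + o.width && o.x < x + width &&
           y < o.y + o.height && o.y < y + height;
  }
};

// Hypothetical tile: its device-space bounds, a clip flag and a counter
// standing in for the tile's own draw target.
struct Tile {
  Rect bounds;
  bool clippedOut;
  int fillCount = 0;
};

// Forward a fill to every tile it touches, mirroring the loop above.
void TiledFillRect(std::vector<Tile>& tiles, const Rect& deviceRect) {
  for (Tile& tile : tiles) {
    if (!tile.clippedOut && deviceRect.Intersects(tile.bounds)) {
      ++tile.fillCount;  // the real code calls FillRect() on the tile here
    }
  }
}
```

One difference from the real method: it tests deviceRect but forwards the untransformed aRect, presumably because each tile's draw target carries its own transform.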

If we step through on either ESR 78 or ESR 91 we get to the same place:
(gdb) n
222       Rect deviceRect = mTransform.TransformBounds(aRect);
(gdb) n
223       for (size_t i = 0; i < mTiles.size(); i++) {
(gdb) n
224         if (!mTiles[i].mClippedOut &&
(gdb) n
313     obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h: No such file or directory.
(gdb) n
228                                        mTiles[i].mDrawTarget->GetSize().height))) {
(gdb) n
225             deviceRect.Intersects(Rect(mTiles[i].mTileOrigin.x,
(gdb) n
229           mTiles[i].mDrawTarget->FillRect(aRect, aPattern, aDrawOptions);
(gdb) c
Continuing.

Thread 10 "GeckoWorkerThre" hit Breakpoint 3, mozilla::gfx::DrawTargetSkia::
    FillRect (this=0x7fb8f8a3c0, aRect=..., aPattern=..., aOptions=...)
    at gfx/2d/DrawTargetSkia.cpp:775
775                                   const DrawOptions& aOptions) {
(gdb) 
The DrawTargetSkia::FillRect() method is more complex, but eventually calls this:
  mCanvas->drawRect(rect, paint.mPaint);
Breakpointing on that takes us another step closer:
(gdb) c
Continuing.
[Switching to LWP 20030]

Thread 10 "GeckoWorkerThre" hit Breakpoint 6, SkCanvas::drawRect
    (this=0x7fb90a72a0, r=..., paint=...)
    at gfx/skia/skia/src/core/SkCanvas.cpp:1803
1803    void SkCanvas::drawRect(const SkRect& r, const SkPaint& paint) {
(gdb) c
Continuing.

Thread 10 "GeckoWorkerThre" hit Breakpoint 6, SkBitmapDevice::drawRect
    (this=0x7fbb044a40, r=..., paint=...)
    at gfx/skia/skia/src/core/SkBitmapDevice.cpp:364
364     void SkBitmapDevice::drawRect(const SkRect& r, const SkPaint& paint) {
(gdb) c
Continuing.

Thread 10 "GeckoWorkerThre" hit Breakpoint 6, SkDraw::drawRect
    (paint=..., rect=..., this=0x7fde8d2d98)
    at gfx/skia/skia/src/core/SkDraw.h:42
42              this->drawRect(rect, paint, nullptr, nullptr);
(gdb) 

This takes us to the SkDraw::drawRect() method:
void SkDraw::drawRect(const SkRect& prePaintRect, const SkPaint& paint,
                      const SkMatrix* paintMatrix, const SkRect* postPaintRect)
The code reaches exactly the same place on ESR 78 and ESR 91. This seems like a good place to stop for the night. It'll take me a while to carefully read through this method to figure out what it's doing. It's also a good place to pause since putting a breakpoint on this method will allow me to easily return to this point tomorrow as well.

So that's it for tonight! Sleep well.

For all the other entries in my developer diary, check out the Gecko-dev Diary page.
28 Oct 2023 : Day 73 #
Yesterday I was, after much wrangling, finally able to get the render to be active when painting occurs. There's still no actual rendering taking place to the screen, so this isn't enough to get things working, but it's an essential step forwards.

Today I'm looking into the nsDisplayBorder::Paint() method, which is now getting successfully called, to find out if there's anything lower down the stack that may be preventing the actual render instructions from being enacted.

I ran the ESR 78 build first and got the following call stack for nsDisplayBorder::Paint():
Thread 10 "GeckoWorkerThre" hit Breakpoint 4, nsDisplayBorder::Paint
    (this=0x7fb82968c0, aBuilder=0x7fde8d4630, aCtx=0x7fb8e352a0)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/layout/
    painting/nsDisplayList.cpp:5031
5031	void nsDisplayBorder::Paint(nsDisplayListBuilder* aBuilder, gfxContext* aCtx) {
(gdb) bt
#0  nsDisplayBorder::Paint (this=0x7fb82968c0, aBuilder=0x7fde8d4630,
    aCtx=0x7fb8e352a0)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/layout/
    painting/nsDisplayList.cpp:5031
#1  0x0000007ff44e9bdc in mozilla::FrameLayerBuilder::PaintItems
    (this=this@entry=0x7fb8df2cc0, aItems=std::vector of length 60, capacity
    118 = {...}, aRect=..., aContext=aContext@entry=0x7fb8e352a0,
    aBuilder=aBuilder@entry=0x7fde8d4630, aPresContext=aPresContext@entry=
    0x7fb90afd30, aOffset=..., aXScale=<optimized out>, aYScale=<optimized out>)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/layout/
    painting/FrameLayerBuilder.cpp:7093
[...]
#46 0x0000007fef65b89c in ?? () from /lib64/libc.so.6
(gdb) 
I then ran the same command with the same breakpoints using the new ESR 91 executable. Here's what I got:
Thread 8 "GeckoWorkerThre" hit Breakpoint 5, nsDisplayBorder::Paint
    (this=0x7f88f05aa8, aBuilder=0x7f9f3d1268, aCtx=0x7f88db8e70)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/
    painting/nsDisplayList.cpp:5138
5138	void nsDisplayBorder::Paint(nsDisplayListBuilder* aBuilder, gfxContext* aCtx) {
(gdb) bt
#0  nsDisplayBorder::Paint (this=0x7f88f05aa8, aBuilder=0x7f9f3d1268,
    aCtx=0x7f88db8e70)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/
    painting/nsDisplayList.cpp:5138
#1  0x0000007fbc3a5584 in mozilla::FrameLayerBuilder::PaintItems
    (this=this@entry=0x7ef0005c10, aItems=std::vector of length 60, capacity
    64 = {...}, aRect=..., aContext=aContext@entry=0x7f88db8e70,
    aBuilder=aBuilder@entry=0x7f9f3d1268, aPresContext=aPresContext@entry=
    0x7f88b6d270, aOffset=..., aXScale=1, aYScale=<optimized out>)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/
    painting/FrameLayerBuilder.cpp:7112
[...]
#49 0x0000007fb78b389c in ?? () from /lib64/libc.so.6
(gdb) 
So they both look very similar and they're both being called correctly. There's not much else to be gained from these backtraces, except that they give us a platform to work from. The question now is: what happens inside these methods, and inside the methods they call?

Manually digging down into the code, I've read all the way down to nsCSSRendering::PaintBorderWithStyleBorder() but not yet quite as far as nsImageRenderer::Draw(). I'm definitely getting closer to the GLES or other commands that actually perform the rendering.

Let's look in more detail at the nsCSSBorderRenderer::DrawBorders() method. On ESR 78, the drawing calls inside it are all methods of the mDrawTarget member. We can see from the code that this is a pointer to an instance of type DrawTarget. What's harder to tell is whether it's actually a DrawTarget, or some other class that inherits from DrawTarget.

Stepping through the code allows us to check this.
(gdb) explore mDrawTarget
'mDrawTarget' is a pointer to a value of type 'nsCSSBorderRenderer::DrawTarget'
Continue exploring it as a pointer to a single value [y/n]: y

The value of '*mDrawTarget' is of type 'nsCSSBorderRenderer::DrawTarget' which
    is a typedef of type 'mozilla::gfx::DrawTarget'
The value of '*mDrawTarget' is a struct/class of type 'mozilla::gfx::DrawTarget'
    with the following fields:

  mozilla::external::AtomicRefCounted<mozilla::gfx::DrawTarget>
                  = <Enter 0 to explore this base class of type
                    'mozilla::external::AtomicRefCounted<mozilla::gfx::DrawTarget>'>
        mUserData = <Enter 1 to explore this field of type 'mozilla::gfx::UserData'>
       mTransform = <Enter 2 to explore this field of type 'mozilla::gfx::Matrix'>
      mOpaqueRect = <Enter 3 to explore this field of type 'mozilla::gfx::IntRect'>
  mTransformDirty = true .. (Value of type 'bool')
mPermitSubpixelAA = true .. (Value of type 'bool')
          mFormat = mozilla::gfx::SurfaceFormat::B8G8R8X8 ..
                    (Value of type 'mozilla::gfx::SurfaceFormat')

Enter the field number of choice: 
(gdb) 
This is useful, up to a point. It tells us that the declared type is nsCSSBorderRenderer::DrawTarget, which is just a typedef for mozilla::gfx::DrawTarget. The object it points to could still be a subclass that overrides some of the methods, so we should look at this carefully. Still, it's good to see that mDrawTarget looks very similar in both builds.
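A typedef doesn't introduce a new type, and what gdb's explore reports is the pointer's declared (static) type. Whether the object is really a subclass is a runtime property, which C++ exposes through typeid and dynamic_cast on polymorphic types. This sketch uses hypothetical classes (Target, TiledTarget) and helpers (IsTiled(), AsTiled()). In gdb itself, set print object on asks the debugger to use the vtable to show the actual derived type rather than the declared one, which may be worth trying here.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical polymorphic hierarchy standing in for DrawTarget and a subclass.
struct Target {
  virtual ~Target() = default;
};
struct TiledTarget : Target {};

// For a polymorphic type, typeid on the dereferenced pointer reports the
// dynamic type. `target` must be non-null: typeid(*nullptr) throws bad_typeid.
bool IsTiled(const Target* target) {
  return typeid(*target) == typeid(TiledTarget);
}

// dynamic_cast answers the same question and yields a usable pointer,
// or nullptr if the object isn't a TiledTarget.
const TiledTarget* AsTiled(const Target* target) {
  return dynamic_cast<const TiledTarget*>(target);
}
```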

As a result of looking through all this code, I'm now fairly confident that the rendering is happening to the render target and that it's the render target itself that's the problem: either it's not capturing the result or (which I think is more likely) it simply isn't making it onto the screen.

But this rendering is happening at the sharp end. If I need to consider the render target I'll need to move right out from the micro to the macro and consider what's happening elsewhere in the rendering pipeline.

What this means, as has generally been the case so far, is that I've made progress, eliminated one line of enquiry, and gained more focus on where the problem is hiding. And also a whole bunch of new things to check.

We'll get there.

For all the other entries in my developer diary, check out the Gecko-dev Diary page.
27 Oct 2023 : Day 72 #
After my distraction with moderation yesterday I'm back on track again today. You may recall I'd been looking at the BrowsingContext and why it isn't active by the time the first paint happens. I'd been comparing the execution path on an ESR 78 build with that on an ESR 91 build by stepping through both with the debugger. My aim was to find out where they diverged in a way that might cause the context to be active on one, but inactive on the other.

I continued today. There are two big difficulties. First, the code changes that cause the problem are in the EmbedLite code and aren't reflected in the other windowing approaches (e.g. on Android, Gtk or Windows). There seems to be a hint of related code in a snippet of nsCocoaWindow.mm in the changeset. Second, to make things harder, further changes have been layered on top of that changeset, so it's no longer entirely valid.

This means ESR 78 doesn't provide a good example to compare against any more. And because no relevant methods are firing in ESR 91 it's very hard to determine where any changes are actually needed in order to make the various methods that I need to fire actually, well, fire! So to speak.

Nevertheless today I persevered. There must be an answer to this after all (that's what I always tell myself: with open source code there's always an answer, even if it involves rewriting everything from scratch). Looking through the code I notice many hints, such as the code that suspends and resumes the compositor that's part of the CompositorBridge classes. But mostly nothing concrete.

But then comes a key realisation: the code for setting the active state is actually in qtmozembed rather than gecko. I only realise this because the backtrace on some calls mysteriously vanishes way too quickly. That turns out to be because the qtmozembed methods aren't being picked up by the debugger.

The latest changes, then, are to call BrowsingContext::SetExplicitActive() directly in the EmbedLiteViewChild::RecvSetIsActive() method, like this:
  Unused << docShell->GetBrowsingContext()->SetExplicitActive(
      dom::ExplicitActiveStatus::Inactive);
Having decided this is what I wanted, the task is then to figure out how to get the SetExplicitActive() method into scope. Getting at the right method isn't always straightforward. It usually involves following a breadcrumb path of pointers, class inheritance and Mozilla's magical runtime introspection GetInterface() calls. After various failures I eventually settle on the following for getting hold of the BrowsingContext. You'll notice I'm skipping a null check that might turn out to be necessary, but I'm hoping it will be okay.
  nsCOMPtr<nsIDocShell> docShell = do_GetInterface(mWebNavigation);
  if (NS_WARN_IF(!docShell)) {
    return IPC_OK();
  }

  Unused << docShell->GetBrowsingContext()->SetExplicitActive(
      dom::ExplicitActiveStatus::Inactive);
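For reference, here's the shape of that chain with the skipped null check put back. This is a standalone sketch using hypothetical mock types (ActiveStatus, BrowsingContextMock, DocShellMock, ApplyIsActive()) rather than the real nsIDocShell and BrowsingContext, and it assumes the intent is for the status to follow the aIsActive flag that RecvSetIsActive() receives, rather than always being Inactive as in the snippet above.

```cpp
#include <cassert>

// Hypothetical mocks; the real types are dom::BrowsingContext and nsIDocShell.
enum class ActiveStatus { Inactive, Active };

struct BrowsingContextMock {
  ActiveStatus status = ActiveStatus::Inactive;
  void SetExplicitActive(ActiveStatus s) { status = s; }
};

struct DocShellMock {
  BrowsingContextMock* browsingContext = nullptr;  // may legitimately be null
  BrowsingContextMock* GetBrowsingContext() { return browsingContext; }
};

// Guard every link in the chain rather than dereferencing blindly, and map
// the incoming flag onto the status. Returns false if nothing was updated.
bool ApplyIsActive(DocShellMock* docShell, bool isActive) {
  if (!docShell) return false;
  BrowsingContextMock* bc = docShell->GetBrowsingContext();
  if (!bc) return false;
  bc->SetExplicitActive(isActive ? ActiveStatus::Active : ActiveStatus::Inactive);
  return true;
}
```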
The partial build went through and I'm now transferring the massive 2.8 GiB resulting library over to my phone. It's big because it contains debug symbols, but I need those so I can put breakpoints on and step through the code.

After installing and debugging the new library, the good news is that the nsDisplayList::PaintRoot() method is now being called multiple times during the render, rather than not at all. This is excellent.

There are two pieces of bad news. The first is that the user interface has gone back to a state where it gets stuck. The second is that there's still no image being rendered to the screen.

But this feels like a step forwards. Tomorrow I'll have to figure out where the rendered material is actually ending up and why that's not the screen.

In the meantime, for all the other entries in my developer diary, check out the Gecko-dev Diary page.
26 Oct 2023 : Day 71 #
Yesterday I was still digging around trying to figure out why the BrowsingContext isn't active by the time the first paint happens. I stepped carefully through the code on an ESR 91 build and also on an ESR 78 build for comparison.

I'd every intention of continuing today. Unfortunately I got sidetracked.

There was an excellent Sailfish Community Meeting this morning where the topic of forum moderation came up. The Sailfish Forum has had a number of problematic threads recently. Since leaving Jolla I've not been doing any moderation on the forum for multiple reasons, but during the Community Meeting I offered to try to take a more active role again.

Since Jolla was happy with this I've spent the evening working through flagged posts on the forum to try to decide how best to proceed with them. Moderation is notoriously difficult and I'm notoriously bad at making decisions, so this took some time.

So unfortunately the result is that I've not had time to do any gecko development today. Hopefully I can continue my debugging session tomorrow. I'll be travelling by train, which should give me the opportunity!

Don't forget that for the other entries in my developer diary, you can check out the Gecko-dev Diary page.
25 Oct 2023 : Day 70 #
A small win this morning. The changes I've been making over the last week to the embedhelper.js code caused problems (or at least I thought they did) with the responsiveness of the Qt user interface layer of the browser. I went as far as reverting the changes to try to fix this.

Last night I'd reverted them all and still found the problem persisted. So I made one more change, to remove the forced activation state. That meant an overnight rebuild. This morning the issue is resolved.

So it turned out not to be the embedhelper.js change after all, but rather the change I'd added directly prior to that. My question now is whether I should reintroduce the embedhelper.js changes or not. But I have something else to do before that.

Over the last few days I've also been digging into the render code more. What I really want to know is what happens inside the nsDisplayList::PaintRoot() method, assuming it's getting called. I plan to find out.

Immediately on trying to investigate this I hit my first problem, which is that while it's called cleanly on ESR 78, it's not called at all on ESR 91. I know exactly why that is; it's because the compositor isn't active. That's why I made the change I reverted last night. So I need to get it activated, and I want to get it activated higher up the stack than before to avoid it introducing this user interface glitch.

You might ask why I don't just live with the user interface glitch. It's because it messes up control of the browser, which will make things far harder to test and debug. So I want to get to the bottom (or in this case, in relation to the stack, the top) of it.

Here, for reference, is the change that's caused the problems:
--- a/docshell/base/BrowsingContext.cpp
+++ b/docshell/base/BrowsingContext.cpp
@@ -375,6 +375,7 @@ already_AddRefed<BrowsingContext> BrowsingContext::CreateDetached(
     if (aType == Type::Content) {
       // Content gets managed by the chrome front-end / embedder element and
       // starts as inactive.
-      return ExplicitActiveStatus::Inactive;
+      return ExplicitActiveStatus::Active;
     }
There is this comment there: "Content gets managed by the chrome front-end / embedder element and starts as inactive." I need to figure out what that means in practice, so I'm going to review, again, the changes that brought us here.

First of all there's this commit from December 2020. This makes big changes to the way the IsActive flag gets set. In particular, it essentially moves responsibility for managing it from the docShell to the presShell.
$ git log -1 3987c781d028e
commit 3987c781d028e4edc599659f0776d26b747bfbd6
Author: Emilio Cobos Álvarez <emilio@crisal.io>
Date:   Fri Dec 11 15:43:19 2020 +0000

    Bug 1635914 - Move active flag handling explicitly to BrowsingContext. r=nika
    
    And have it mirror in the parent process more automatically.
    
    The docShellIsActive setter in the browser-custom-element side needs to
    be there rather than in the usual DidSet() calls because the
    AsyncTabSwitcher code relies on getting an exact amount of notifications
    as response to that specific setter. Not pretty, but...
    
    BrowserChild no longer sets IsActive() on the docshell itself for OOP
    iframes. This fixes bug 1679521. PresShell activeness is used to
    throttle rAF as well, which handles OOP iframes nicely as well.
    
    Differential Revision: https://phabricator.services.mozilla.com/D96072
In theory part of the fix is to switch from using the docShell to the presShell in the EmbedLite code that sets the active flag; so I'm amending it like this:
--- a/embedding/embedlite/embedshared/EmbedLiteViewChild.cpp
+++ b/embedding/embedlite/embedshared/EmbedLiteViewChild.cpp
@@ -626,12 +626,12 @@ mozilla::ipc::IPCResult EmbedLiteViewChild::RecvSetIsActive(const bool &aIsActiv
     widget->SetActive(aIsActive);
   }
 
-  // Update state via DocShell -> PresShell
-  nsCOMPtr<nsIDocShell> docShell = do_GetInterface(mWebNavigation);
-  if (NS_WARN_IF(!docShell)) {
+  if (NS_WARN_IF(!presShell)) {
     return IPC_OK();
   }
 
+  presShell->SetIsActive(aIsActive);
+
   mWidget->Show(aIsActive);
   mWebBrowser->SetVisibility(aIsActive);

Unfortunately this fails because PresShell::SetIsActive() is a private method. That surprised me, because I could see it being used in a very similar way in the earlier upstream changeset.

The reason is that the change that made it private came later. Here it is, from July 2021:
$ git log -1 a5f4c42f89d16
commit a5f4c42f89d1688dd037dfe7eeaedb1699d59e3c
Author: Emilio Cobos Álvarez <emilio@crisal.io>
Date:   Mon Jul 5 17:31:48 2021 +0000

    Bug 1717983 - Improve PresShell active flag handling. r=nika
    
    This moves the logic of whether a pres shell should be active to a
    single place to make it sane to reason about, and fixes the
    subdocument propagation when a BrowserChild becomes visible.
    
    Differential Revision: https://phabricator.services.mozilla.com/D118703
This does more than just make the method private. It switches it for a method PresShell::ActivenessMaybeChanged() that takes no parameters, but determines whether or not the document is active using some other criteria, rather than it being set explicitly.

I tried replacing the call to DocShell::IsActive() with a call to this magical method instead, but it didn't fix things. So I also tried updating the class to make the PresShell::SetIsActive() method public again. That had no obvious beneficial effect. But I'm building from here.

For completeness: EmbedLiteViewChild::RecvSetIsActive() is being called, and it in turn is calling PresShell::SetIsActive() to activate the document. So my next step is to follow the calls to PresShell::ActivenessMaybeChanged() and PresShell::SetIsActive() in case one or the other is deactivating the document.

If that's not the case, then maybe it's because the flag isn't propagating down to the level of nsLayoutUtils::PaintFrame(), which is the method that should be calling PaintRoot().

Incidentally, PresShell::ActivenessMaybeChanged() works by calling PresShell::ShouldBeActive(), which is returning false because in the following code bc->IsActive() is returning false:
  BrowsingContext* bc = doc->GetBrowsingContext();
  MOZ_LOG(gLog, LogLevel::Debug,
          (" > BrowsingContext %p  active: %d", bc, bc && bc->IsActive()));
  return bc && bc->IsActive();
Stepping through the code, there's now a bit of back-and-forth enabling and disabling the activeness. The result of bc->IsActive() isn't always false, but it looks like it would be worthwhile following this code manually to figure out the circumstances under which it's true or not. But even when the state is set to active, the PaintRoot() method still doesn't seem to be getting called. So in short there's still much more work to be done here.
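The pattern described above, where activeness is recomputed from the BrowsingContext rather than set explicitly, can be illustrated with a hypothetical mock. It isn't Gecko's PresShell (BrowsingContextMock, PresShellMock and ForceActive() are invented for the example), just the shape of the logic as described: a forced SetIsActive(true) only lasts until the next ActivenessMaybeChanged() call, which would be consistent with the back-and-forth seen while stepping through.

```cpp
#include <cassert>

// Hypothetical mock: activeness is derived from the BrowsingContext.
struct BrowsingContextMock {
  bool active = false;
  bool IsActive() const { return active; }
};

class PresShellMock {
 public:
  explicit PresShellMock(const BrowsingContextMock* bc) : mBC(bc) {}

  // Public entry point: recompute the state rather than accept a value.
  void ActivenessMaybeChanged() { SetIsActive(ShouldBeActive()); }

  bool IsActive() const { return mIsActive; }

  // Hypothetical hook mimicking making SetIsActive() public again.
  void ForceActive(bool active) { SetIsActive(active); }

 private:
  bool ShouldBeActive() const { return mBC && mBC->IsActive(); }
  void SetIsActive(bool active) { mIsActive = active; }

  const BrowsingContextMock* mBC;
  bool mIsActive = false;
};
```

Under this model, forcing the PresShell active achieves little on its own: the lasting fix has to make bc->IsActive() return true.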

I'll continue digging into this tomorrow.

For all the other entries in my developer diary, check out the Gecko-dev Diary page.
24 Oct 2023 : Day 69 #
Yesterday I was grappling with a full reinstall of my Sailfish SDK. Everything is upgraded, shiny and fresh now. I'm still getting used to the new quirks. I had hoped to run a build overnight but it failed repeatedly while attempting to create the mach environment, stumbling each time while trying to download and install Python modules.

That was using my rock-solid home broadband. This morning I'm sitting on the train between Cambridge and London enjoying the temperamental Great Northern Line train Wifi. It's not what I'd call rock solid (it's free; I'm not complaining), but despite that the modules were downloaded and installed first time. What gives?

Anyway, the important thing is that the build is happily chugging away. I now have eight hours of work ahead of me, which should be long enough to get the build through, assuming everything works as expected (which is rarely the case, but hope springs eternal!).

[...]

I'm now on another train, this time enjoying free Wifi from Thameslink. Wifi, but no power outlets. I'm squeezing the last juice out of my laptop to finish the gecko build I started this morning. It's been running all day and, as we pull into Letchworth Garden City (sounds better than it is), is writing out the final rpm packages. The last package is finally written out as we arrive into Baldock (sounds worse than it is). Sadly that doesn't leave time to do any proper development today.

I'll be hoping for a more productive day tomorrow.

For all the other entries in my developer diary, check out the Gecko-dev Diary page.
23 Oct 2023 : Day 68 #
Sometimes things go wrong. Last night I tried to build qtmozembed to avoid the crash that I was looking into related to embedhelper.js. The package refused to build because the SDK couldn't access the package repositories. There are reasons for this which make sense but which I don't want to go into here.

The solution to my problem requires reinstalling my build SDK targets. This is a lengthy process at the best of times, but made more so by the intricate configuration that gecko needs to build.

So it's frustrating that this is necessary at all, but in some sense it couldn't have come at a better time. Having just been through the entire process from start to finish, I know all of the steps needed. In theory if I go through the same steps again, things will just work. Famous last words.

Things are never quite so simple, of course. The first thing I try to do is remove the build target that's full of cruft and needs replacing. This should be a really straightforward step.
$ sfdk tools target remove SailfishOS-4.5.0.18-aarch64
IFW Version: 3.2.3, built with Qt 5.15.2.
Build date: Jan  4 2023
Installer Framework SHA1: fa8a71c1
[0] Language: en-GB
[...]
[837] Cannot proceed to the next step, canceling. Possible reason: No changes in component selection
[840] Created question message box "cancelInstallation": "Sailfish SDK Question", "Do you want to quit the maintenance application?"
Failed to uninstall installer-provided packages
Not straightforward today then. After playing around for a bit, attempting to remove the Early Access targets I have installed instead, the updater goes into a frenzy. I asked it to remove something, but now it's downloading something. Fine, I'm sure it knows what it's doing!

[...]

I'm not entirely sure that it did know what it was doing as it seems to have removed the sfdk binary completely. I'm switching to using the SDKMaintenanceTool to try to fix things and if that doesn't work it'll be time for a root-and-branch reinstall.
 
The SDKMaintenanceTool downloading components.

There's a good chance fixing my SDK will be the only thing I get done gecko-wise today. Just the download looks like it'll take at least an hour.
 
The SDKMaintenanceTool showing an error: Execution failed.

Normal service will be resumed shortly.

[...]

It's now the evening and the new SDK has installed. There are some new and interesting changes for me to take on board, such as the fact that snapshots are no longer shown by default in the target list or at the start of each build. That threw me for a bit. But it's just the way the newer SDK works.

On the positive side qtmozembed — the thing that I really want right now — built without me having to perform any forced install shenanigans. The build engine pulled in the packages it required (including the xulrunner-qt5-devel package that contains the gecko libraries) directly from my local repository. Nice!

Sadly that's all I can get done for today. Sometimes a couple of hours just isn't enough. The good news is that I have more time to spend on development over the next couple of days. After two weeks, I'm looking forward to finally having some time to really think about this.

For all the other entries in my developer diary, check out the Gecko-dev Diary page.
22 Oct 2023 : Day 67 #
Over the last few days I've spent some time testing out and writing up instructions for how to build gecko. With that done I'm now back to the broken renderer, the fix for which is still eluding me.

This evening I've been looking through the nsLayoutUtils::PaintFrame() method. I know the method is being called, but I'd like to be sure that an attempt is being made to actually render something, which means figuring out what happens inside the method.

Reading carefully through the code is quite time-consuming, but also doesn't leave me with much to write about today as I've mostly just been staring at code. Hopefully this will pay off both in terms of having something more interesting to write about in the future, and finding a solution to the rendering problem.

Tomorrow I intend to step through the code with the debugger to find out what is — and isn't — getting called. Unfortunately right now, just from reading the code, it's not at all clear what the active ingredient is. It looks to me like it's this call to PaintRoot():
  RefPtr<LayerManager> layerManager = list->PaintRoot(
      builder, aRenderingContext, flags, Some(geckoDLBuildTime));
But until I've done the debugging I can't be certain. So that's it for today. Hopefully I'll have more to say in the coming days.

For all the other entries in my developer diary, check out the Gecko-dev Diary page.
21 Oct 2023 : Day 66 #
Over the last couple of days I've been putting together instructions for how to build gecko ESR 91 from scratch. I started out by spinning up an AWS instance running Ubuntu 22.04 and went from start (installing the SDK) to finish (building all the packages). I then followed that with the same thing again, but this time with feeling... but also, more importantly, context. That is, I wrote out the instructions and explained my working as I went along.

Today I'm doing something similar, but instead of writing it here I'm posting the instructions to the sailfish-browser wiki. There may be better places to put these and I'm very open to suggestions on this, but at least for now this acts as a good single source for the instructions that others can edit too.

[...]

The instructions are now up on the wiki. It's just a short diary entry today because I did most of the work over there. Hopefully this can be useful for someone, and if you do give building gecko a go, please do also give feedback so the instructions can be improved.

For all the other entries in my developer diary, check out the Gecko-dev Diary page.
20 Oct 2023 : Day 65 #
Yesterday I worked out all the steps needed to build gecko-dev on a freshly installed Ubuntu 22.04 instance running in the cloud. It's a lengthy and complex process, so today I'm going to go through the same steps and try to explain what's going on with them.

There are a couple of reasons it's so complex. Mostly it's because there are slight differences between OBS and the local SDK, which means that some manual steps need to be performed if you're planning to build locally.

I'm going to write out the steps here. In practice these are likely not to be entirely correct for everyone. But I'm hoping for feedback to allow the steps to be refined until everything works. To that end I'll also aim to put the instructions up on the sailfish-browser project wiki on GitHub, so that others can comment there if they're inclined to.

Let's go.

Step 1: Install the Sailfish SDK

Before installing the SDK you should make sure you have docker set up and working on your system. I strongly recommend adding your user to the docker group to avoid having to run your docker commands as root. You can also use a VirtualBox VM instead, but my preference is to use docker.

Now you're ready to install the Sailfish SDK. All the testing I've done has been with the aarch64 target, but there's nothing stopping you trying with any of the others.

Instructions for installing the SDK can be found on the [Sailfish Docs site](https://docs.sailfishos.org/Tools/Sailfish_SDK/).

At time of writing the latest versions are:
  1. Sailfish SDK 3.10.4
  2. Build target aarch64 Early Access 4.5.0.18
 
The SDKMaintenanceTool window showing my currently installed SDK components.


I am personally using an Early Access target, but the latest release target works just fine as well. Here's my configuration:
  1. Sailfish SDK 3.10.4
  2. Docker build engine
  3. Build target aarch64 4.5.0.16
Yesterday when I was testing it I used 4.5.0.18 and that worked just as well. Although on my development machine I've installed QtCreator using the graphical tool, all of my development is done on the command line, and yesterday I did everything, from installation up, entirely at the command line, so the commands are there if you want to install things headlessly. But even if you install things using the graphical tool, that doesn't prevent you executing everything at the command line once you have things installed.

I like to add the following line to my ~/.bashrc file which allows me to use the sfdk command directly.
alias sfdk=/home/flypig/Programs/sailfish-sdk/sailfish-sdk/bin/sfdk
If you don't want it to persist you can just execute the command directly in your shell.

The latest targets will be installed automatically when you install the SDK, but if you want to install a new target from the command line, you can do so with the following commands.
$ sfdk tools target list -a
$ sfdk tools target install SailfishOS-4.5.0.18-aarch64
All of this is standard SDK stuff. If you've got a Sailfish SDK installed with the latest targets, you'll already be set.

Step 2: Configure your SDK

I recommend you clone your target. Snapshots are already there to keep your targets clean, but in the context of this walkthrough cloning has two benefits. First it avoids any mishaps with snapshots. If something goes wrong you can just delete the clone and restart. Second it allows us to use a consistent name for the target, irrespective of the actual version you're using.

Here's how I clone it. I've called the clone SailfishOS-devel-aarch64 because it's the target I do my development in.
$ sfdk tools target clone SailfishOS-4.5.0.18-aarch64 SailfishOS-devel-aarch64
From now on I'll use this "devel" target in commands, but if you choose not to clone your target, or choose a different name, you'll need to amend the commands accordingly.
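If you'd rather not type the versioned target name by hand, the clone can be scripted. This is a minimal sketch, assuming `sfdk tools target list` prints one target per line with the name in the first column and a `latest` flag alongside it (as in the listing in the Day 64 entry). `latest_aarch64_target` is a hypothetical helper, not something the SDK provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `sfdk tools target list` output on stdin and print
# the name of the first aarch64 target flagged as "latest".
# Assumes the target name is the first whitespace-separated column.
latest_aarch64_target() {
  awk '$1 ~ /-aarch64$/ && /latest/ { print $1; exit }'
}

# Example usage (needs a working SDK):
#   target=$(sfdk tools target list | latest_aarch64_target)
#   sfdk tools target clone "${target}" SailfishOS-devel-aarch64
```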

For building gecko it's especially helpful — and necessary for the commands here to work in fact — to create a unified output folder using the output-prefix configuration option of the SDK. Usually when you build something with the SDK it will place the resulting packages in a folder relative to where the build takes place (either inside the project folder or in a folder at the same level, depending on whether you're performing in-tree or out-of-tree builds). We want all of the packages to end up in a single folder. And we want the folder to act as a repository for our build dependencies too.

To this end we run the following commands to set up an output folder. I've chosen ~/RPMS but you're safe to set this to anything you like, as long as you don't change it midway through.
$ mkdir ~/RPMS
$ sfdk config --global --push output-prefix ~/RPMS
Note that the --global flag will make this change persist across reboots. If you want to use a different arrangement later or for other builds, you'll need to explicitly change it.

We're also going to set up a specific target to use to make future commands easier to execute and to avoid us accidentally building something for the wrong target (which is surprisingly easy to do):
$ sfdk config --global --push target SailfishOS-devel-aarch64
Here's how things look on my system after we've done this:
$ sfdk config
# ---- command scope ---------
# 

# ---- session scope ---------
# 

# ---- global scope ---------
output-prefix = ~/RPMS
target = SailfishOS-devel-aarch64
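If you want to double-check this from a script rather than by eye, something like the following would do it. It's a minimal sketch that assumes the `key = value` layout and `# ---- global scope` header shown above; I haven't checked whether that layout is guaranteed to stay the same across SDK releases, and `check_global_config` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `sfdk config` output on stdin and succeed only if
# the global scope sets both output-prefix and target.
# Assumes "key = value" lines under a "# ---- global scope" header.
check_global_config() {
  awk '
    /^# ---- global scope/ { in_global = 1; next }
    /^# ---- /             { in_global = 0; next }
    in_global && $1 == "output-prefix" && $2 == "=" { prefix = $3 }
    in_global && $1 == "target"        && $2 == "=" { target = $3 }
    END {
      if (prefix == "" || target == "") {
        print "missing output-prefix or target"
        exit 1
      }
      print "output-prefix=" prefix
      print "target=" target
    }
  '
}

# Example usage (needs a working SDK):
#   sfdk config | check_global_config
```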
Step 3: Increase swap memory

Gecko takes a lot of memory to build and if you don't have enough the tooling is liable to spit out cryptic segmentation fault errors. I found total memory of 32 GiB to be too little but 48 GiB to be enough. For what it's worth, this is what I have on a machine with 32.0 GiB of actual RAM.
flypig@chattan:~$ swapon --show
NAME      TYPE      SIZE   USED PRIO
/swapfile file       16G 755.5M   -2
/dev/dm-2 partition 976M     0B   -3
If you don't have sufficient physical memory you should create sufficient swap space to make up for it. The exact method you use to increase your swap size will depend on your distro and how you prefer to set things up, but it shouldn't be hard to do. For example, I just created a swap file and that was that. Here's what I did to configure an extra 16 GiB of swap (these commands are just illustrative; I don't recommend you run these commands as they are; check the docs for your distribution instead).
$ sudo dd if=/dev/zero of=/swapfile bs=16384 count=1048576
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
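The dd numbers above are simply the desired size divided by the block size: 16 GiB is 17,179,869,184 bytes, and dividing that by a 16,384-byte block gives the count of 1,048,576. The sketch below does the same arithmetic for other sizes, and works out how much swap would bring your total up to the 48 GiB that was enough for me. Both function names are hypothetical, and the 48 GiB default is only what worked on my machine, not a hard requirement.

```shell
#!/usr/bin/env bash
# Number of dd blocks needed for a swap file of the given size.
# Usage: dd_count SIZE_GIB BLOCK_SIZE_BYTES
dd_count() {
  local size_gib=$1 block_size=$2
  echo $(( size_gib * 1024 * 1024 * 1024 / block_size ))
}

# Extra swap (GiB) needed so that RAM + swap reaches a target total.
# Prints 0 if RAM alone is already enough. Existing swap is ignored.
# Usage: swap_needed_gib RAM_GIB [TARGET_GIB]   (target defaults to 48)
swap_needed_gib() {
  local ram_gib=$1 target_gib=${2:-48}
  local needed=$(( target_gib - ram_gib ))
  (( needed > 0 )) || needed=0
  echo "${needed}"
}
```

For example, on a 32 GiB machine `swap_needed_gib 32` gives 16, and `dd_count 16 16384` gives the 1048576 used above. Since any swap you already have isn't counted, treat the result as an upper bound.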
Step 4: Configure the build engine

Now we're into the gecko-specific stuff. For this you'll need to do some configuration of your build engine. These steps are counterintuitive, because you have to install various clang components into the SDK build engine. We'd usually be installing things into a target rather than the build engine. We need to do this because the non-i486 builds still need access to the i486 clang library and gcc compiler suite.
$ sfdk engine exec
$ sudo zypper -n install clang-libs llvm-libs gcc-c++
$ exit
For context, the reason this is needed is that the build chain checks the clang version by dynamically loading it and checking that it contains various link points. This is done in the i486 environment of the tooling, so we need to have an i486 version of clang floating around in order for this to correctly complete.

There's an additional complexity in that Rust is used in two ways when building gecko. On the one hand some of the Rust code is compiled and incorporated into the final library binary. On the other hand some of the Rust code is compiled and incorporated into the build process itself. The former must output target-specific code (e.g. aarch64) whereas the latter must output tooling-specific code (usually i486). So we need both.

It's true that Rust can be cross-compiled, but scratchbox takes precedence in the Sailfish SDK, so we have to play by its rules.

Depending on whether you're running an aarch64 or an i486 build you may also need to apply, or remove, a semaphore fix. I'd ignore this unless you run into problems.

Step 5: Get the code

We're now ready to get all the ESR 91 gecko and related code; there's quite a lot of it. This is how I'm doing it right now, but hopefully in the near future the repository can be changed to use the official sailfishos repositories instead.

If you plan to make changes to this code you may also want to fork the repositories and use your own, but it's also fine to deal with that later if you need to.

First the pre-requisites. These are the packages you'll need to build gecko.
$ git clone -b master --recurse-submodules https://github.com/sailfishos/rust-cbindgen.git
$ git clone -b master https://github.com/sailfishos/nspr.git
$ git clone -b sailfishos-esr91 --recurse-submodules --depth 256 https://github.com/llewelld/gecko-dev.git
Note that the gecko code is really big, made worse by the history going back twenty-five years.
$ git log --reverse 
commit 3b56a9af51519d2e77e05efa672a13e6be2e9ebc
Author: ltabb <ltabb>
Date:   Sat Mar 28 02:44:41 1998 +0000

    Free the lizard
Hence I've restricted the depth to the last 256 commits, which should be plenty for most purposes.

Second the post-requisites, by which I mean the packages that depend on gecko. You don't need to build these to build gecko, but you will need to have them built against your version of gecko if you want to install the updated browser on your device.
$ git clone -b sailfishos-esr91 https://github.com/llewelld/qtmozembed.git
$ git clone -b sailfishos-esr91 https://github.com/llewelld/embedlite-components.git
$ git clone -b sailfishos-esr91 https://github.com/llewelld/sailfish-browser.git
$ git clone -b master https://github.com/sailfishos/mapplauncherd-booster-browser
Step 6: Build the pre-requisites

You now have all the code. Building the pre-requisites will take some time, but hopefully will be smooth sailing as long as you've configured things correctly up until now. I'll write this as a sequence of commands, but you'll need to wait for each build to complete before moving on to the next.

In the commands below I've added a -p flag so that the prep ("prepare") stage in the spec file is executed. On subsequent builds you should leave this out.

The rust-cbindgen build is straightforward.
$ cd rust-cbindgen
$ sfdk build -d -p
$ cd ..
The NSPR package is in the form of what's called a "dummy" structure. These work fine on OBS, but I don't know of a way to get the SDK to build them in their default form (can anyone explain how this can be done?). To get around this you can restructure the repository into something the SDK understands. The commands below perform all of this restructuring and then kick off the build. It's messy, but it should work.
$ cd nspr
$ mkdir rpm
$ mv *.patch *.spec *.changes rpm
$ sed -i -e 's/"@${SOURCE_DATE_EPOCH}"/"${SOURCE_DATE_EPOCH}"/g' rpm/nspr.spec 
$ tar -xvf nspr-4.35.tar.gz --strip-components=1
$ SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
$ git add -u .
$ git add nspr rpm
$ git commit -m "temp"
$ sfdk build -d -p
$ cd ..
The dependencies should be installed automatically when you build gecko, including these ones you've just built. They'll be detected by the SDK in the ~/RPMS directory and will be installed as if they were coming from a remote repository. That's why we need the output-prefix in our config.

The exception is gcc which is a pain to build. So it'll be easier to grab the pre-built packages from OBS and use those.

Note that we're installing this into the snapshot of our cloned target, not the clone itself. This is to avoid mess, but it means you'll have to repeat these steps if you delete the snapshot.
$ sfdk engine exec
$ sb2 -R -m sdk-install -t SailfishOS-devel-aarch64.default
$ zypper ar -f https://repo.sailfishos.org/obs/home:/flypig:/gecko-esr91/sailfish_latest_aarch64 gecko-esr91
$ zypper --no-gpg-checks -n ref --repo gecko-esr91
$ zypper ref
$ zypper -n install gmp mpc
$ zypper -n install --force --repo gecko-esr91 cpp gcc gcc libstdc++ libgomp libstdc++
$ zypper rr gecko-esr91
$ exit
$ exit
These commands add a remote repository from community OBS, install what we need from it, and then remove the repository again.

Step 7: Build!

You've set up and configured your SDK, installed gcc, built all of the pre-requisites. Finally you can build yourself a gecko. The build is likely to take a long time (many hours) so be prepared to set it running and forget about it for a bit.
$ cd gecko-dev/
$ sfdk -c no-fix-version build -d -p --with git_workaround
$ cd ..
For this build command we add a few flags. Here's what they do:
  1. -c no-fix-version: use the details from the spec file rather than the latest git tag to determine the package version number.
  2. -d: generate debuginfo and debugsource packages.
  3. -p: run the prepare step. On subsequent builds you should skip this flag.
  4. --with git_workaround: rename the .git directory to .git-disabled at the start of the build.
The reason for having to rename the .git directory is that cargo is used to build some of the required components, and if it thinks it's building from a git repository it will cause an error when the file fingerprints it's expecting don't match.

Step 8: Clean up

The build can fail for many reasons. If for example the rust-cbindgen or nspr packages don't install automatically, then you may need to install them manually (although I tend to find that running the build a second time can help here; I have no idea why this would make a difference).

If you do find you have to install local packages manually, you can do something along the following lines.
$ sfdk engine exec
$ sb2 -R -m sdk-install -t SailfishOS-devel-aarch64.default
$ rpm -U --oldpackage ~/RPMS/SailfishOS-devel-aarch64/cbindgen-0.*.rpm ~/RPMS/SailfishOS-devel-aarch64/nspr-4.*.rpm ~/RPMS/SailfishOS-devel-aarch64/nspr-devel-4.*.rpm
$ exit
$ exit
If the build fails before a certain point you may also have to reverse the git_workaround by renaming the .git directory manually (this will be performed automatically if the build succeeds).
$ mv .git-disabled .git
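Since it's easy to lose track of which state the directory was left in, a slightly more defensive version of that rename might look like this. It's a hypothetical helper (not part of the spec file's git_workaround) that only renames .git-disabled back if there's no .git already, so it's safe to run more than once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: undo the git_workaround rename after a failed build.
# Pass the gecko-dev directory as an argument, or run it from inside it.
# It never overwrites an existing .git, so nothing can be lost.
restore_git_dir() {
  local dir=${1:-.}
  if [ -d "${dir}/.git-disabled" ] && [ ! -e "${dir}/.git" ]; then
    mv "${dir}/.git-disabled" "${dir}/.git"
    echo "restored ${dir}/.git"
  elif [ -e "${dir}/.git" ]; then
    echo "${dir}/.git already present; nothing to do"
  else
    echo "no .git-disabled found in ${dir}" >&2
    return 1
  fi
}

# Example usage:
#   restore_git_dir gecko-dev
```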
Step 9: Build the post-requisites

There are several things which depend on gecko. If you want to install gecko on your device you'll need to build these too. We already got the code in Step 5 above, so now we just need to run the SDK over it.
$ cd qtmozembed/
$ sfdk -c no-fix-version build -d -p
$ cd ..

$ cd embedlite-components/
$ sfdk build -d -p
$ cd ..

$ cd sailfish-browser
$ sfdk build -d -p
$ cd ..

$ cd mapplauncherd-booster-browser
$ sfdk build -d -p
$ cd ..
Step 10: Install the results on your device

You'll find all of the packages you built in the ~/RPMS/SailfishOS-devel-aarch64/ directory. The following are the packages you'll need to install on your device. I've left off the version numbers to hopefully make things a little clearer.
embedlite-components-qt5
mapplauncherd-booster-browser
nspr
qtmozembed-qt5
sailfish-browser
sailfish-browser-settings
xulrunner-qt5
xulrunner-qt5-misc
If you want to perform debugging, you should also install the following.
embedlite-components-qt5-debuginfo
embedlite-components-qt5-debugsource
qtmozembed-qt5-debuginfo
qtmozembed-qt5-debugsource
sailfish-browser-debuginfo
sailfish-browser-debugsource
xulrunner-qt5-debuginfo
xulrunner-qt5-debugsource
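To pick the runtime packages out of the output directory without copying version numbers around, something along these lines should work. It's a sketch that assumes the ~/RPMS/SailfishOS-devel-aarch64/ layout described above and the file names from the Day 64 listing, where the embedlite-components package is named embedlite-components-qt5. `list_install_rpms` is a hypothetical name. Matching on a digit straight after the package name stops, for example, `sailfish-browser` from also matching `sailfish-browser-settings` or the debuginfo packages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the runtime packages from the list above that
# exist in an output directory (defaults to ~/RPMS/SailfishOS-devel-aarch64).
# "<name>-<digit>" matching excludes -settings, -misc, -devel, -debuginfo etc.
list_install_rpms() {
  local dir=${1:-$HOME/RPMS/SailfishOS-devel-aarch64}
  local name f
  for name in embedlite-components-qt5 mapplauncherd-booster-browser nspr \
              qtmozembed-qt5 sailfish-browser sailfish-browser-settings \
              xulrunner-qt5 xulrunner-qt5-misc; do
    for f in "${dir}/${name}"-[0-9]*.rpm; do
      # An unmatched glob stays literal, so skip anything that doesn't exist
      if [ -e "${f}" ]; then
        echo "${f}"
      fi
    done
  done
}
```

How you then get the files onto the phone (scp, for example) is up to you.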
Having written these steps up, tomorrow I'm going to post up the results to the sailfish-browser repository for others to use and edit.

Don't forget, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
19 Oct 2023 : Day 64 #
When we hit Day 50 I made a point about it being a milestone. We live in such a decimalised world. Well, I'm more of a binary person myself, so Day 2⁶ is a much more exciting event!

As I explained yesterday, over the last couple of days I've been figuring out the exact steps needed to build gecko from start to finish.

Today I'm going to share the results in the most basic form: as a sequence of commands with very little explanation.

So that I know the correct commands I'm firing up a cloud server instance to test out the build steps from a completely clean start. I'm going to build this on an r6a.xlarge server from AWS with 32 GiB of RAM, 4 CPUs, 128 GiB of SSD persistent storage and running Ubuntu 22.04. That means it's also going to be headless, so I'll have to do everything at the command line.

I'm using a cloud server so that it's a completely clean install. If it works here, it should work on any x86 machine with Ubuntu 22.04 installed.

Today I'm just going to literally run through all of the commands needed. Tomorrow I'll write all this up with some better explanations alongside.
# Login to fresh AWS Ubuntu 22.04 instance
$ sudo apt upgrade
$ sudo reboot
# Reboot and log straight back in

# Install and configure docker
# See https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-22-04
$ sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
$ echo "deb [arch=$(dpkg --print-architecture)" \
  "signed-by=/usr/share/keyrings/docker-archive-keyring.gpg]" \
  "https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
$ sudo apt update
$ apt-cache policy docker-ce
$ sudo apt install -y docker-ce
$ sudo usermod -aG docker ${USER}
# Restart the shell

# Install SDK pre-requisites
$ sudo apt install -y libxcb-glx0 libx11-xcb1 libxcb-icccm4 libxcb-image0 \
  libxcb-keysyms1 libxcb-randr0 libxcb-render-util0 libxcb-shape0 libxcb-sync1 \
  libxcb-xfixes0 libxcb-xinerama0 libxcb-xkb1 libsm6 libxkbcommon-x11-0 \
  libwayland-egl1 libegl1 libxcomposite1 libfontconfig1 libwayland-cursor0 \
  libharfbuzz0b libgl1 openssl 
# See https://forum.sailfishos.org/t/installing-sailfish-sdk-on-ubuntu-22-04/14121
$ wget http://security.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
$ sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb

# Download the SDK
$ mkdir -p Programs/sailfish-sdk
$ cd ~/Programs/sailfish-sdk
$ wget https://releases.sailfishos.org/sdk/installers/3.10.4/SailfishSDK-3.10.4-linux64-online.run
$ chmod +x SailfishSDK-3.10.4-linux64-online.run

# Headless install of the SDK
$ QT_QPA_PLATFORM=minimal ./SailfishSDK-3.10.4-linux64-online.run --verbose \
  non-interactive=1 accept-licenses=1 build-engine-type=docker

$ echo "alias sfdk=/home/ubuntu/SailfishOS/bin/sfdk" >> ~/.bashrc 
$ alias sfdk=/home/ubuntu/SailfishOS/bin/sfdk

# Check that things are looking okay
$ sfdk --version
SDK_RELEASE=3.10.4
SDK_RELEASE_CYCLE=Stable
SDK_CONFIG_DIR=SailfishSDK
SDK_VENDOR="Jolla"

$ sfdk tools target list
sfdk: [I] Starting the build engine…
SailfishOS-4.5.0.18-aarch64  sdk-provided,latest
SailfishOS-4.5.0.18-armv7hl  sdk-provided,latest
SailfishOS-4.5.0.18-i486     sdk-provided,latest

# Clone the latest target
$ sfdk tools target clone SailfishOS-4.5.0.18-aarch64 SailfishOS-devel-aarch64

# Configure a unified folder for our packages to be output to
$ mkdir ~/RPMS
$ sfdk config --global --push output-prefix ~/RPMS
$ sfdk config --global --push target SailfishOS-devel-aarch64

# Configure 16 GiB of swap memory
$ sudo dd if=/dev/zero of=/swapfile bs=16384 count=1048576
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Clone all the repositories
$ mkdir -p ~/Documents && cd ~/Documents
# First the pre-requisites
$ git clone -b master --recurse-submodules https://github.com/sailfishos/rust-cbindgen.git
$ git clone -b master https://github.com/sailfishos/nspr.git
$ git clone -b sailfishos-esr91 --recurse-submodules --depth 256 https://github.com/llewelld/gecko-dev.git
# Second the post-requisites
$ git clone -b sailfishos-esr91 https://github.com/llewelld/qtmozembed.git
$ git clone -b sailfishos-esr91 https://github.com/llewelld/embedlite-components.git
$ git clone -b sailfishos-esr91 https://github.com/llewelld/sailfish-browser.git
$ git clone -b master https://github.com/sailfishos/mapplauncherd-booster-browser

# Build the pre-requisites
$ cd rust-cbindgen
$ sfdk build -d -p
$ cd ..

# Building NSPR is messy, but can be done
$ cd nspr
$ mkdir rpm
$ mv *.patch *.spec *.changes rpm
$ sed -i -e 's/"@${SOURCE_DATE_EPOCH}"/"${SOURCE_DATE_EPOCH}"/g' rpm/nspr.spec 
$ tar -xvf nspr-4.35.tar.gz --strip-components=1
$ SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
$ git add -u .
$ git add nspr rpm
$ git commit -m "temp"
$ sfdk build -d -p
$ cd ..

# Install the patched version of gcc
$ sfdk engine exec
$ sb2 -R -m sdk-install -t SailfishOS-devel-aarch64.default

$ zypper ar -f https://repo.sailfishos.org/obs/home:/flypig:/gecko-esr91/sailfish_latest_aarch64 gecko-esr91
$ zypper --no-gpg-checks -n ref --repo gecko-esr91
$ zypper -n install gmp mpc
$ zypper -n install --force --repo gecko-esr91 cpp gcc gcc libstdc++ libgomp libstdc++
$ zypper rr gecko-esr91
$ exit

# Install some requirements into the build engine (not the target!)
$ sudo zypper -n install clang-libs llvm-libs gcc-c++
$ exit

# Build gecko (this takes many hours)
$ cd gecko-dev/
$ sfdk -c no-fix-version build -d -p --with git_workaround
$ cd ..

# Build the post-requisites
$ cd qtmozembed/
$ sfdk -c no-fix-version build -d -p
$ cd ..

$ cd embedlite-components/
$ sfdk build -d -p
$ cd ..

$ cd sailfish-browser
$ sfdk build -d -p
$ cd ..

$ cd mapplauncherd-booster-browser
$ sfdk build -d -p
$ cd ..

# Let's see what we got!
$ ls ~/RPMS/SailfishOS-devel-aarch64/
cbindgen-0.19.0+git1-0.aarch64.rpm
embedlite-components-qt5-1.23.1+sailfishos.esr91.20231008210351.b410bec-1.aarch64.rpm
embedlite-components-qt5-debuginfo-1.23.1+sailfishos.esr91.20231008210351.b410bec-1.aarch64.rpm
embedlite-components-qt5-debugsource-1.23.1+sailfishos.esr91.20231008210351.b410bec-1.aarch64.rpm
mapplauncherd-booster-browser-0.2.1-1.aarch64.rpm
mapplauncherd-booster-browser-debuginfo-0.2.1-1.aarch64.rpm
mapplauncherd-booster-browser-debugsource-0.2.1-1.aarch64.rpm
nspr-4.35.0+git1+master.20231016112133.ae5e6ed-1.aarch64.rpm
nspr-debuginfo-4.35.0+git1+master.20231016112133.ae5e6ed-1.aarch64.rpm
nspr-debugsource-4.35.0+git1+master.20231016112133.ae5e6ed-1.aarch64.rpm
nspr-devel-4.35.0+git1+master.20231016112133.ae5e6ed-1.aarch64.rpm
qtmozembed-qt5-1.53.9-1.aarch64.rpm
qtmozembed-qt5-debuginfo-1.53.9-1.aarch64.rpm
qtmozembed-qt5-debugsource-1.53.9-1.aarch64.rpm
qtmozembed-qt5-devel-1.53.9-1.aarch64.rpm
qtmozembed-qt5-tests-1.53.9-1.aarch64.rpm
rust-cbindgen-debuginfo-0.19.0+git1-0.aarch64.rpm
rust-cbindgen-debugsource-0.19.0+git1-0.aarch64.rpm
sailfish-browser-1.18.14+sailfishos.esr91.20231003080304.8563696b-1.aarch64.rpm
sailfish-browser-debuginfo-1.18.14+sailfishos.esr91.20231003080304.8563696b-1.aarch64.rpm
sailfish-browser-debugsource-1.18.14+sailfishos.esr91.20231003080304.8563696b-1.aarch64.rpm
sailfish-browser-settings-1.18.14+sailfishos.esr91.20231003080304.8563696b-1.aarch64.rpm
sailfish-browser-tests-1.18.14+sailfishos.esr91.20231003080304.8563696b-1.aarch64.rpm
sailfish-browser-ts-devel-1.18.14+sailfishos.esr91.20231003080304.8563696b-1.aarch64.rpm
xulrunner-qt5-91.9.1-1.aarch64.rpm
xulrunner-qt5-debuginfo-91.9.1-1.aarch64.rpm
xulrunner-qt5-debugsource-91.9.1-1.aarch64.rpm
xulrunner-qt5-devel-91.9.1-1.aarch64.rpm
xulrunner-qt5-misc-91.9.1-1.aarch64.rpm
It took a bit of back-and-forth, but ultimately that all seemed to work, so I'm confident that these steps should work on a clean Ubuntu system. Tomorrow I'll need to write the steps up properly with a bit of helpful context.
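The four post-requisite builds at the end of that sequence all follow the same pattern, so if you'd rather not babysit them they could be wrapped in a loop that stops at the first failure. This is a sketch assuming the repositories sit side by side in the current directory under their default names; `build_post_requisites` is a hypothetical function, not something the SDK provides. One caveat: aliases aren't expanded in non-interactive scripts, so if you rely on the sfdk alias, source this into your interactive shell rather than running it as a standalone script.

```shell
#!/usr/bin/env bash
# Build the post-requisites in order, stopping at the first failure.
# Any arguments are passed through to sfdk build, so use
#   build_post_requisites -p
# on the first run (to execute the prep stage) and no arguments afterwards.
build_post_requisites() {
  local repo
  for repo in qtmozembed embedlite-components sailfish-browser \
              mapplauncherd-booster-browser; do
    echo "==> ${repo}"
    if [ "${repo}" = qtmozembed ]; then
      # qtmozembed takes its version from the spec file, as in the steps above
      ( cd "${repo}" && sfdk -c no-fix-version build -d "$@" ) || return 1
    else
      ( cd "${repo}" && sfdk build -d "$@" ) || return 1
    fi
  done
}
```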

Don't forget, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
18 Oct 2023 : Day 63 #
Yesterday I tested out the browser with embedhelper.js disabled. While I was able to get it not to crash, it — perhaps not unexpectedly — did mess up the user interface. Eventually I decided to revert the change back.

It might feel like this was a wasted diversion, but it was also a good way to learn a bit more about how things are working. So hopefully not totally without merit.

This weekend I'm travelling. I'm also pretty exhausted from looking through backtraces for a hint of what the problem might be. What's more they also make for rubbish diary entries. So I've decided to take the time to do something very different.

I've had reports of at least three people attempting to build ESR 91, or being interested to do so. But the build process isn't straightforward. So rather than spend this time coding, I've decided to spend it writing up the build process. In particular, the commands needed to get it to build using the Sailfish OS SDK.

This is going to take a few days to get right. Today and tomorrow I'll do a basic run through of all the steps. This means that tomorrow there will be a dump of all the console commands needed to get a fully built (but not yet working I'm afraid) ESR 91 gecko engine.

For some people a dump of console commands will be more useful than a set of step-by-step instructions. But for others context is important. So then the day after I'll write out the full instructions with some associated explanations.

It's just a short one today, then, but that's the ebb and flow of development for you.

If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
17 Oct 2023 : Day 62 #
This morning the build has completed, but I didn't have a chance to work on it before work. So by the time I'm writing this it's already evening. You may recall that yesterday I was trying to avoid a crash after disabling the loading of embedhelper.js. Having installed the packages that built overnight and given them a spin, I'm still getting a crash, and frustratingly it's coming from the same place and happening for exactly the same reason. It's time for more backtraces I'm afraid.
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 5050]
mozilla::embedlite::BrowserChildHelper::DispatchMessageManagerMessage
    (this=0x7f885ddf80, aMessageName=..., aJSONData=...)
    at mobile/sailfishos/utils/BrowserChildHelper.cpp:219
219	  RefPtr<nsFrameMessageManager> mm = kungFuDeathGrip->GetMessageManager();
(gdb) bt
#0  mozilla::embedlite::BrowserChildHelper::DispatchMessageManagerMessage
    (this=0x7f885ddf80, aMessageName=..., aJSONData=...)
    at mobile/sailfishos/utils/BrowserChildHelper.cpp:219
#1  0x0000007fbcc861f4 in mozilla::embedlite::EmbedLiteViewChild::RecvAsyncMessage
    (this=0x7f885d6e20, aMessage=..., aData=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#2  0x0000007fba30b5b8 in mozilla::embedlite::PEmbedLiteViewChild::OnMessageReceived
    (this=0x7f885d6e20, msg__=...) at PEmbedLiteViewChild.cpp:2560
[...]
#30 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) p mBrowserChildMessageManager
$1 = {mRawPtr = 0x0}
(gdb) 
Rather than digging down into this any further I've decided to add a null check instead. It's only a temporary thing after all. I'll do this by adding the following code into the BrowserChildHelper::DispatchMessageManagerMessage() method:
  if (NS_WARN_IF(!mBrowserChildMessageManager)) {
    return;
  }
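The effect of this guard can be modelled in a few lines. This is a simplified, hypothetical stand-in: WarnIfSketch, FakeHelper and FakeMessageManager are invented names, and the warning mimics what NS_WARN_IF does in debug builds (in release builds it just evaluates the condition). A message that arrives while the manager is still null is dropped with a warning instead of dereferencing a null pointer.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Simplified stand-in for NS_WARN_IF: evaluate the condition, print a warning
// if it is true, and return the result so it can be used inside an `if`.
inline bool WarnIfSketch(bool aCondition, const char* aText) {
  if (aCondition) {
    std::fprintf(stderr, "WARNING: %s\n", aText);
  }
  return aCondition;
}
#define WARN_IF_SKETCH(cond) WarnIfSketch(static_cast<bool>(cond), #cond)

// Hypothetical stand-in for the message manager.
struct FakeMessageManager {
  std::vector<std::string> received;
};

// Hypothetical stand-in for BrowserChildHelper.
struct FakeHelper {
  FakeMessageManager* mManager = nullptr;  // may be null, as in the crash
  int mDropped = 0;

  void DispatchMessage(const std::string& aName) {
    if (WARN_IF_SKETCH(!mManager)) {
      ++mDropped;  // the message is discarded rather than crashing
      return;
    }
    mManager->received.push_back(aName);
  }
};
```

The cost of this approach is that any message arriving before the manager exists is silently lost.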
A quick partial build later and I've transferred the new library over to my phone.

With this change it looks like not all messages are now travelling between the gecko library and the user interface chrome. But it doesn't crash any more. I'm not completely certain this is an improvement (in terms of figuring out rendering issues). So I've reluctantly decided to revert the change that necessitated this in qtmozembed.

Having reverted it, I've now built myself some fresh packages and installed them. With that all in place I need to go back to checking the Active/Suspend/Resume status again.

Unfortunately it has been a very long day at work today, so I'm going to have to continue with that tomorrow. I'm frustrated at the lack of progress, but also more convinced than ever that it'll get there.

As always, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
Comment
16 Oct 2023 : Day 61 #
This morning I've woken up to two things I like. The autumn rain pouring down on the world outside (while I stay cosy and warm inside) and a completed build from yesterday. There's no better way to meet the day!

During the community meeting last week Raine pointed out that it might be sensible to comment out loading of embedhelper.js; so I thought I'd give that a go today. The code that does this is in qtmozembed:
    loadFrameScript(QStringLiteral("chrome://embedlite/content/embedhelper.js"));
After commenting this out the browser now crashes almost immediately on start up. We got ourselves a backtrace:
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 31158]
mozilla::embedlite::BrowserChildHelper::DispatchMessageManagerMessage
    (this=0x7f8894f540, aMessageName=..., aJSONData=...)
    at mobile/sailfishos/utils/BrowserChildHelper.cpp:219
219	  RefPtr mm = kungFuDeathGrip->GetMessageManager();
(gdb) bt
#0  mozilla::embedlite::BrowserChildHelper::DispatchMessageManagerMessage
    (this=0x7f8894f540, aMessageName=..., aJSONData=...)
    at mobile/sailfishos/utils/BrowserChildHelper.cpp:219
#1  0x0000007fbcc861f4 in mozilla::embedlite::EmbedLiteViewChild::RecvAsyncMessage
    (this=0x7f885d83b0, aMessage=..., aData=...)
    at ${PROJECT}/..//obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#2  0x0000007fba30b5b8 in mozilla::embedlite::PEmbedLiteViewChild::
    OnMessageReceived (this=0x7f885d83b0, msg__=...) at PEmbedLiteViewChild.cpp:2560
#3  0x0000007fba2f6e24 in mozilla::embedlite::PEmbedLiteAppChild::
    OnMessageReceived (this=<optimized out>, msg__=...)
    at ${PROJECT}/..//obj-build-mer-qt-xr/dist/include/mozilla/ipc/
    ProtocolUtils.h:675
#4  0x0000007fba1e3630 in mozilla::ipc::MessageChannel::DispatchAsyncMessage
    (this=this@entry=0x7f888b7658, aProxy=aProxy@entry=0x7ec4001b40, aMsg=...)
    at ${PROJECT}/..//obj-build-mer-qt-xr/dist/include/mozilla/ipc/
    ProtocolUtils.h:675
#5  0x0000007fba1f20ac in mozilla::ipc::MessageChannel::DispatchMessage
    (this=this@entry=0x7f888b7658, aMsg=...)
    at ipc/glue/MessageChannel.cpp:2001
#6  0x0000007fba1f3504 in mozilla::ipc::MessageChannel::RunMessage
    (this=0x7f888b7658, aTask=...)
    at ipc/glue/MessageChannel.cpp:1860
#7  0x0000007fba1f3664 in mozilla::ipc::MessageChannel::MessageTask::Run
    (this=0x5555e39de0)
    at ${PROJECT}/..//obj-build-mer-qt-xr/dist/include/mozilla/ipc/
    MessageChannel.h:588
#8  0x0000007fb9e03bc0 in mozilla::RunnableTask::Run (this=0x55559588f0)
    at ${PROJECT}/..//obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
[...]
#30 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) p kungFuDeathGrip
$1 = {mRawPtr = 0x0}
(gdb) p mBrowserChildMessageManager
$2 = {mRawPtr = 0x0}
(gdb) 
The reason for the crash is pretty clear from those null pointers at the end. So why is mBrowserChildMessageManager null? It's supposed to be instantiated in InitBrowserChildHelperMessageManager() like this:
bool
BrowserChildHelper::InitBrowserChildHelperMessageManager()
{
  mShouldSendWebProgressEventsToParent = true;

  if (mBrowserChildMessageManager) {
    return true;
  }

  nsCOMPtr window = do_GetInterface(WebNavigation());
  NS_ENSURE_TRUE(window, false);
  RefPtr chromeHandler(window->GetChromeEventHandler());
  NS_ENSURE_TRUE(chromeHandler, false);

  RefPtr scope = mBrowserChildMessageManager =
      new BrowserChildHelperMessageManager(this);

  MOZ_ALWAYS_TRUE(nsMessageManagerScriptExecutor::Init());

  nsCOMPtr root = do_QueryInterface(chromeHandler);
  if (NS_WARN_IF(!root)) {
    mBrowserChildMessageManager = nullptr;
    return false;
  }
  root->SetParentTarget(scope);

  RefPtr wasvc = JSActorService::GetSingleton();
  wasvc->RegisterChromeEventTarget(scope);

  return true;
}
By stepping through the code I can see that it's not getting as far as constructing the BrowserChildHelperMessageManager() because do_GetInterface(WebNavigation()) is returning null and the method is being exited on this line:
  NS_ENSURE_TRUE(window, false);
This line is the Mozilla way (NS stands for "Netscape" I think) of saying "ensure window is non-null; if it isn't, leave the method with a return value of false".
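For anyone unfamiliar with this style of macro, here's a simplified re-creation of the pattern. It is not Gecko's actual definition, which differs in its details; ENSURE_TRUE_SKETCH, Window and InitSketch are hypothetical names used purely for illustration. The do/while(false) wrapper is the usual trick that lets a multi-statement macro be used like a single statement.

```cpp
#include <cassert>
#include <cstdio>

// Simplified re-creation of the NS_ENSURE_TRUE(x, ret) pattern: if the
// expression is false (or a null pointer), print a warning and return `ret`
// from the enclosing function.
#define ENSURE_TRUE_SKETCH(x, ret)                           \
  do {                                                       \
    if (!(x)) {                                              \
      std::fprintf(stderr, "ENSURE_TRUE(%s) failed\n", #x);  \
      return ret;                                            \
    }                                                        \
  } while (false)

struct Window {};  // hypothetical stand-in for the DOM window

// Hypothetical version of the start of InitBrowserChildHelperMessageManager().
bool InitSketch(Window* aWindow, bool* aReachedConstruction) {
  *aReachedConstruction = false;
  ENSURE_TRUE_SKETCH(aWindow, false);  // bail out early when the window is null
  *aReachedConstruction = true;        // the message manager would be built here
  return true;
}
```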

The underlying problem, then, is that mWebNavigation isn't being set, and the only way that can happen is through a call to BrowserChildHelper::SetWebNavigation().

It turns out that this is being called, but too late in the day. So it may be that this can be fixed by tightening up the ordering. Or it may be that the null value just needs to be masked out.

These two calls happen in the wrong order. Here's the first, the call that should be creating the message manager:
Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    BrowserChildHelper::InitBrowserChildHelperMessageManager
    (this=this@entry=0x7f885d8780)
    at mobile/sailfishos/utils/BrowserChildHelper.cpp:253
253	{
(gdb) bt
#0  mozilla::embedlite::BrowserChildHelper::InitBrowserChildHelperMessageManager
    (this=this@entry=0x7f885d8780)
    at mobile/sailfishos/utils/BrowserChildHelper.cpp:253
#1  0x0000007fbcc9c80c in mozilla::embedlite::BrowserChildHelper::
    BrowserChildHelper (this=0x7f885d8780, aView=<optimized out>,
    aId=<optimized out>)
    at mobile/sailfishos/utils/BrowserChildHelper.cpp:103
#2  0x0000007fbcc912f4 in mozilla::embedlite::EmbedLiteViewChild::InitGeckoWindow
    (this=0x7f885d6960, parentId=<optimized out>, parentBrowsingContext=
    0x0, isPrivateWindow=false, isDesktopMode=false)
    at ${PROJECT}/..//obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#3  0x0000007fbcc829f4 in mozilla::detail::RunnableMethodArguments::
    applyImpl, StoreRefPtrPassByPtr
    , StoreCopyPassByConstLRef,
    StoreCopyPassByConstLRef, 0ul, 1ul, 2ul, 3ul>
    (args=..., m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/..//obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:280
[...]
#28 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
And here's the second, the call that sets mWebNavigation. It needs to happen before the first for that call to succeed:
Thread 8 "GeckoWorkerThre" hit Breakpoint 2, mozilla::embedlite::
    BrowserChildHelper::SetWebNavigation
    (this=0x7f885d8780, aWebNavigation=0x7f885d9878)
    at mobile/sailfishos/utils/BrowserChildHelper.cpp:702
702	void BrowserChildHelper::SetWebNavigation(nsIWebNavigation *aWebNavigation) {
(gdb) bt
#0  mozilla::embedlite::BrowserChildHelper::SetWebNavigation (this=0x7f885d8780,
    aWebNavigation=0x7f885d9878)
    at mobile/sailfishos/utils/BrowserChildHelper.cpp:702
#1  0x0000007fbcc91474 in mozilla::embedlite::EmbedLiteViewChild::InitGeckoWindow
    (this=0x7f885d6960, parentId=<optimized out>, 
    parentBrowsingContext=0x0, isPrivateWindow=<optimized out>,
    isDesktopMode=false)
    at ${PROJECT}/..//obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#2  0x0000007fbcc829f4 in mozilla::detail::RunnableMethodArguments
    ::applyImpl, StoreRefPtrPassByPtr,
    StoreCopyPassByConstLRef, StoreCopyPassByConstLRef,
    0ul, 1ul, 2ul, 3ul> (args=..., m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/..//obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:280
[...]
#27 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
From the backtraces we can see that both are being called from inside EmbedLiteViewChild::InitGeckoWindow(): the first from line 267, the second from line 322.

I've made a small tweak to the code so that the value now also gets set slightly later, at a point where it's more likely to succeed. In fact it now attempts this twice: first on line 267 as before, and again directly after line 322, in the hope that the second attempt will succeed.
  mHelper->SetWebNavigation(mWebNavigation);
  
  // Attempt this again, just in case it failed the first time
  mChrome->SetBrowserChildHelper(mHelper.get());
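The ordering problem, and why a second attempt after SetWebNavigation() helps, can be illustrated with a toy model. HelperSketch and WebNavigation here are hypothetical stand-ins, not the real BrowserChildHelper API: initialising the message manager fails while no web navigation has been supplied, and succeeds once the setter has run.

```cpp
#include <cassert>

struct WebNavigation {};  // hypothetical stand-in

// Hypothetical model of BrowserChildHelper: the message manager can only be
// created once the web navigation has been supplied.
class HelperSketch {
 public:
  bool InitMessageManager() {
    if (mHasManager) return true;       // already initialised
    if (!mWebNavigation) return false;  // the early exit seen in the debugger
    mHasManager = true;
    return true;
  }
  void SetWebNavigation(WebNavigation* aNav) { mWebNavigation = aNav; }
  bool HasManager() const { return mHasManager; }

 private:
  WebNavigation* mWebNavigation = nullptr;
  bool mHasManager = false;
};
```

Calling InitMessageManager() once before and once after SetWebNavigation() mirrors the "try again" workaround; fixing the call order would make the first attempt unnecessary.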
None of this should be necessary to get a working browser, but it will help at this still early stage to get things working before we re-establish the proper loading of embedhelper.js.

But now I'm having to build another version, so that's it for today.

Here's hoping tomorrow morning starts as well as today with a completed build!

As always, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
Comment
15 Oct 2023 : Day 60 #
Yesterday I put you through backtrace after backtrace, in the hope of understanding why the mIsActive flag was being unset. This flag is critical in the render pipeline because, simply put, when it's unset all of the rendering is skipped.

We didn't get to the bottom of it yesterday so I'm continuing today. I have to warn you in advance that there will be a lot more backtraces today. I mean a lot. This may be one to skip.

We want to know why the value is being set to false. This will most likely be happening as a result of a call to PresShell::SetIsActive() with the parameter set to false. There are four calls to this method when rendering a simple page with ESR 78. In all cases the aIsActive parameter is passed in as true.

Here are all four of the ESR 78 calls to PresShell::SetIsActive(), as caught by the debugger breakpoints. Two of them have truncated backtraces because gdb hit an internal error while printing them, but they're still valid hits. You can also see that the aIsActive input parameter is set to true for all of these calls.
Thread 10 "GeckoWorkerThre" hit Breakpoint 2, mozilla::PresShell::SetIsActive
    (this=this@entry=0x7fb87b2b90, aIsActive=true)
    at layout/base/PresShell.cpp:10724
10724   nsresult PresShell::SetIsActive(bool aIsActive) {
(gdb) bt
#0  mozilla::PresShell::SetIsActive (this=this@entry=0x7fb87b2b90, aIsActive=true)
    at layout/base/PresShell.cpp:10724
#1  0x0000007ff4223770 in mozilla::PresShell::QueryIsActive
    (this=this@entry=0x7fb87b2b90)
    at layout/base/PresShell.cpp:10709
#2  0x0000007ff424065c in mozilla::PresShell::Init (
dwarf2read.c:10473: internal-error: process_die_scope::process_die_scope
    (die_info*, dwarf2_cu*): Assertion `!m_die->in_process' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

Thread 10 "GeckoWorkerThre" hit Breakpoint 2, mozilla::PresShell::SetIsActive
    (this=this@entry=0x7fb87b2b90, aIsActive=aIsActive@entry=true)
    at layout/base/PresShell.cpp:10724
10724   nsresult PresShell::SetIsActive(bool aIsActive) {
(gdb) bt
#0  mozilla::PresShell::SetIsActive (this=this@entry=0x7fb87b2b90,
    aIsActive=aIsActive@entry=true)
    at layout/base/PresShell.cpp:10724
#1  0x0000007ff4acae90 in nsDocShell::SetIsActive (this=0x7fb90793d0, aIsActive=true)
    at docshell/base/nsDocShell.cpp:4607
#2  0x0000007ff4d9291c in mozilla::embedlite::EmbedLiteViewChild::RecvSetIsActive
    (this=0x7fb9070f10, aIsActive=@0x7fde8d68f0: true)
    at mobile/sailfishos/embedshared/EmbedLiteViewChild.cpp:623
#3  0x0000007ff23d3a00 in mozilla::embedlite::PEmbedLiteViewChild::
    OnMessageReceived (this=0x7fb9070f10, msg__=...) at PEmbedLiteViewChild.cpp:1158
[...]
#26 0x0000007fef65b89c in ?? () from /lib64/libc.so.6
(gdb) 

Thread 10 "GeckoWorkerThre" hit Breakpoint 2, mozilla::PresShell::SetIsActive
    (this=this@entry=0x7fb93ef050, aIsActive=true)
    at layout/base/PresShell.cpp:10724
10724   nsresult PresShell::SetIsActive(bool aIsActive) {
(gdb) bt
#0  mozilla::PresShell::SetIsActive (this=this@entry=0x7fb93ef050, aIsActive=true)
    at layout/base/PresShell.cpp:10724
#1  0x0000007ff4223770 in mozilla::PresShell::QueryIsActive
    (this=this@entry=0x7fb93ef050)
    at layout/base/PresShell.cpp:10709
#2  0x0000007ff424065c in mozilla::PresShell::Init (
dwarf2read.c:10473: internal-error: process_die_scope::process_die_scope
    (die_info*, dwarf2_cu*): Assertion `!m_die->in_process' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

Thread 10 "GeckoWorkerThre" hit Breakpoint 2, mozilla::PresShell::SetIsActive
    (this=this@entry=0x7fb93ef050, aIsActive=aIsActive@entry=true)
    at layout/base/PresShell.cpp:10724
10724   nsresult PresShell::SetIsActive(bool aIsActive) {
(gdb) bt
#0  mozilla::PresShell::SetIsActive (this=this@entry=0x7fb93ef050,
    aIsActive=aIsActive@entry=true)
    at layout/base/PresShell.cpp:10724
#1  0x0000007ff4ac3fa4 in nsDocShell::SetupNewViewer (this=this@entry=0x7fb90793d0, aNewViewer=aNewViewer@entry=0x7fb93cc1d0, 
    aWindowActor=aWindowActor@entry=0x0) at docshell/base/nsDocShell.cpp:7879
#2  0x0000007ff4acd75c in nsDocShell::Embed (this=this@entry=0x7fb90793d0,
    aContentViewer=0x7fb93cc1d0, aWindowActor=aWindowActor@entry=0x0)
    at docshell/base/nsDocShell.cpp:5441
#3  0x0000007ff4add358 in nsDocShell::CreateContentViewer (this=0x7fb90793d0,
    aContentType=..., aRequest=0x7fb926aa60, aContentHandler=<optimized out>)
    at docshell/base/nsDocShell.cpp:7662
#4  0x0000007ff4adde80 in nsDSURIContentListener::DoContent
    (this=this@entry=0x5555deec70, aContentType=..., 
    aIsContentPreferred=aIsContentPreferred@entry=false,
    aRequest=aRequest@entry=0x7fb926aa60, aContentHandler=0x7fb92b8ef0, 
    aAbortProcess=aAbortProcess@entry=0x7fde8d68a0)
    at docshell/base/nsDSURIContentListener.cpp:178
#5  0x0000007ff27af350 in nsDocumentOpenInfo::TryContentListener
    (this=0x7fb92b8ed0, aListener=0x5555deec70, aChannel=0x7fb926aa60)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:1351
[...]
#31 0x0000007fef65b89c in ?? () from /lib64/libc.so.6
(gdb) 
Now let's compare those to the equivalent calls to PresShell::SetIsActive() in ESR 91. There are five of these with backtraces that are very different to those for ESR 78. In all but one case the aIsActive input parameter is set to false.
Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::PresShell::SetIsActive
    (this=this@entry=0x7f889b5330, aIsActive=false)
    at layout/base/PresShell.cpp:10865
10865	void PresShell::SetIsActive(bool aIsActive) {
(gdb) bt
#0  mozilla::PresShell::SetIsActive (this=this@entry=0x7f889b5330, aIsActive=false)
    at layout/base/PresShell.cpp:10865
#1  0x0000007fbc2236d0 in mozilla::PresShell::ActivenessMaybeChanged
    (this=this@entry=0x7f889b5330)
    at layout/base/PresShell.cpp:10794
#2  0x0000007fbc247da4 in mozilla::PresShell::Init (this=this@entry=0x7f889b5330, aPresContext=aPresContext@entry=0x7f885e1760, 
    aViewManager=aViewManager@entry=0x7f885e22f0) at layout/base/PresShell.cpp:1035
#3  0x0000007fbab910ac in mozilla::dom::Document::CreatePresShell
    (this=0x7f885dbc50, aContext=0x7f885e1760, aViewManager=0x7f885e22f0)
    at dom/base/Document.cpp:6637
#4  0x0000007fbc27b7bc in nsDocumentViewer::InitPresentationStuff
    (this=this@entry=0x7f885dd490, aDoInitialReflow=aDoInitialReflow@entry=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:289
#5  0x0000007fbc27d4c0 in nsDocumentViewer::InitInternal (this=0x7f885dd490,
    aParentWidget=<optimized out>, aState=aState@entry=0x0, aActor=0x0, 
    aBounds=..., aDoCreation=aDoCreation@entry=true, aNeedMakeCX=aNeedMakeCX@entry=true, aForceSetNewDocument=aForceSetNewDocument@entry=true)
    at layout/base/nsDocumentViewer.cpp:913
#6  0x0000007fbc27d68c in nsDocumentViewer::Init (this=<optimized out>,
    aParentWidget=<optimized out>, aBounds=..., aActor=<optimized out>)
    at layout/base/nsDocumentViewer.cpp:682
#7  0x0000007fbc969a04 in nsDocShell::SetupNewViewer (this=this@entry=0x7f885da880, aNewViewer=aNewViewer@entry=0x7f885dd490, 
    aWindowActor=aWindowActor@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#8  0x0000007fbc96e728 in nsDocShell::Embed (this=this@entry=0x7f885da880,
    aContentViewer=0x7f885dd490, aWindowActor=aWindowActor@entry=0x0, 
    aIsTransientAboutBlank=aIsTransientAboutBlank@entry=true,
    aPersist=aPersist@entry=false)
    at docshell/base/nsDocShell.cpp:5552
#9  0x0000007fbc96eb9c in nsDocShell::CreateAboutBlankContentViewer
    (this=this@entry=0x7f885da880, aPrincipal=aPrincipal@entry=0x0, 
    aPartitionedPrincipal=aPartitionedPrincipal@entry=0x0, aCSP=<optimized out>,
    aBaseURI=0x0, aCOEP=..., aTryToSaveOldPresentation=<optimized out>, 
    aTryToSaveOldPresentation@entry=true,
    aCheckPermitUnload=aCheckPermitUnload@entry=true, aActor=<optimized out>,
    aActor@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#10 0x0000007fbc96f008 in nsDocShell::EnsureContentViewer
    (this=this@entry=0x7f885da880)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/MaybeStorageBase.h:79
#11 0x0000007fbc96f9d8 in nsDocShell::GetDocument (this=0x7f885da880) at
    docshell/base/nsDocShell.cpp:3053
#12 0x0000007fbaad8780 in nsPIDOMWindowOuter::MaybeCreateDoc (this=<optimized out>)
    at dom/base/nsGlobalWindowOuter.cpp:7678
#13 0x0000007fbaad8aa4 in non-virtual thunk to nsGlobalWindowOuter::
    WrapObject(JSContext*, JS::Handle<JSObject*>) ()
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/js/HeapAPI.h:727
[...]
#47 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 

Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::PresShell::SetIsActive
    (this=this@entry=0x7f889b5330, aIsActive=false)
    at layout/base/PresShell.cpp:10865
10865	void PresShell::SetIsActive(bool aIsActive) {
(gdb) bt
#0  mozilla::PresShell::SetIsActive (this=this@entry=0x7f889b5330, aIsActive=false)
    at layout/base/PresShell.cpp:10865
#1  0x0000007fbc2236d0 in mozilla::PresShell::ActivenessMaybeChanged
    (this=this@entry=0x7f889b5330)
    at layout/base/PresShell.cpp:10794
#2  0x0000007fbc969a54 in nsDocShell::SetupNewViewer (this=this@entry=0x7f885da880, aNewViewer=aNewViewer@entry=0x7f885dd490, 
    aWindowActor=aWindowActor@entry=0x0) at docshell/base/nsDocShell.cpp:8058
#3  0x0000007fbc96e728 in nsDocShell::Embed (this=this@entry=0x7f885da880,
    aContentViewer=0x7f885dd490, aWindowActor=aWindowActor@entry=0x0, 
    aIsTransientAboutBlank=aIsTransientAboutBlank@entry=true, aPersist=aPersist@entry=false)
    at docshell/base/nsDocShell.cpp:5552
#4  0x0000007fbc96eb9c in nsDocShell::CreateAboutBlankContentViewer
    (this=this@entry=0x7f885da880, aPrincipal=aPrincipal@entry=0x0, 
    aPartitionedPrincipal=aPartitionedPrincipal@entry=0x0, aCSP=<optimized out>,
    aBaseURI=0x0, aCOEP=..., aTryToSaveOldPresentation=<optimized out>, 
    aTryToSaveOldPresentation@entry=true,
    aCheckPermitUnload=aCheckPermitUnload@entry=true, aActor=<optimized out>,
    aActor@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#5  0x0000007fbc96f008 in nsDocShell::EnsureContentViewer
    (this=this@entry=0x7f885da880)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/MaybeStorageBase.h:79
#6  0x0000007fbc96f9d8 in nsDocShell::GetDocument (this=0x7f885da880) at
    docshell/base/nsDocShell.cpp:3053
#7  0x0000007fbaad8780 in nsPIDOMWindowOuter::MaybeCreateDoc (this=<optimized out>)
    at dom/base/nsGlobalWindowOuter.cpp:7678
#8  0x0000007fbaad8aa4 in non-virtual thunk to nsGlobalWindowOuter::
    WrapObject(JSContext*, JS::Handle<JSObject*>) ()
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/js/HeapAPI.h:727
[...]
#42 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 

Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::PresShell::SetIsActive
    (this=this@entry=0x7f88c0a200, aIsActive=false)
    at layout/base/PresShell.cpp:10865
10865	void PresShell::SetIsActive(bool aIsActive) {
(gdb) bt
#0  mozilla::PresShell::SetIsActive (this=this@entry=0x7f88c0a200, aIsActive=false)
    at layout/base/PresShell.cpp:10865
#1  0x0000007fbc2236d0 in mozilla::PresShell::ActivenessMaybeChanged
    (this=this@entry=0x7f88c0a200)
    at layout/base/PresShell.cpp:10794
#2  0x0000007fbc247da4 in mozilla::PresShell::Init
    (this=this@entry=0x7f88c0a200, aPresContext=aPresContext@entry=0x7f88bdfe80, 
    aViewManager=aViewManager@entry=0x7f88b80b60) at
    layout/base/PresShell.cpp:1035
#3  0x0000007fbab910ac in mozilla::dom::Document::CreatePresShell
    (this=0x7f88ab64e0, aContext=0x7f88bdfe80, aViewManager=0x7f88b80b60)
    at dom/base/Document.cpp:6637
#4  0x0000007fbc27b7bc in nsDocumentViewer::InitPresentationStuff
    (this=this@entry=0x7f88b16b60, aDoInitialReflow=aDoInitialReflow@entry=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:289
#5  0x0000007fbc27d4c0 in nsDocumentViewer::InitInternal (this=0x7f88b16b60,
    aParentWidget=<optimized out>, aState=aState@entry=0x0, aActor=0x0, 
    aBounds=..., aDoCreation=aDoCreation@entry=true,
    aNeedMakeCX=aNeedMakeCX@entry=true,
    aForceSetNewDocument=aForceSetNewDocument@entry=true)
    at layout/base/nsDocumentViewer.cpp:913
#6  0x0000007fbc27d68c in nsDocumentViewer::Init (this=<optimized out>,
    aParentWidget=<optimized out>, aBounds=..., aActor=<optimized out>)
    at layout/base/nsDocumentViewer.cpp:682
#7  0x0000007fbc969a04 in nsDocShell::SetupNewViewer (this=this@entry=0x7f885da880, aNewViewer=aNewViewer@entry=0x7f88b16b60, 
    aWindowActor=aWindowActor@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#8  0x0000007fbc96e728 in nsDocShell::Embed (this=this@entry=0x7f885da880, aContentViewer=aContentViewer@entry=0x7f88b16b60, 
    aWindowActor=aWindowActor@entry=0x0,
    aIsTransientAboutBlank=aIsTransientAboutBlank@entry=false, aPersist=true)
    at docshell/base/nsDocShell.cpp:5552
#9  0x0000007fbc97cf8c in nsDocShell::CreateContentViewer
    (this=this@entry=0x7f885da880, aContentType=...,
    aRequest=aRequest@entry=0x7f88b075e0, 
    aContentHandler=aContentHandler@entry=0x7f88ac5130)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#10 0x0000007fbc97d6a8 in nsDSURIContentListener::DoContent (this=0x55559666c0,
    aContentType=..., aIsContentPreferred=<optimized out>, 
    aRequest=0x7f88b075e0, aContentHandler=0x7f88ac5130, aAbortProcess=0x7f9f4f35c0)
    at docshell/base/nsDSURIContentListener.cpp:179
[...]
#51 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 

Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::PresShell::SetIsActive
    (this=this@entry=0x7f88c0a200, aIsActive=false)
    at layout/base/PresShell.cpp:10865
10865	void PresShell::SetIsActive(bool aIsActive) {
(gdb) bt
#0  mozilla::PresShell::SetIsActive (this=this@entry=0x7f88c0a200, aIsActive=false)
    at layout/base/PresShell.cpp:10865
#1  0x0000007fbc2236d0 in mozilla::PresShell::ActivenessMaybeChanged
    (this=this@entry=0x7f88c0a200)
    at layout/base/PresShell.cpp:10794
#2  0x0000007fbc969a54 in nsDocShell::SetupNewViewer (this=this@entry=0x7f885da880, aNewViewer=aNewViewer@entry=0x7f88b16b60, 
    aWindowActor=aWindowActor@entry=0x0) at docshell/base/nsDocShell.cpp:8058
#3  0x0000007fbc96e728 in nsDocShell::Embed (this=this@entry=0x7f885da880, aContentViewer=aContentViewer@entry=0x7f88b16b60, 
    aWindowActor=aWindowActor@entry=0x0,
    aIsTransientAboutBlank=aIsTransientAboutBlank@entry=false, aPersist=true)
    at docshell/base/nsDocShell.cpp:5552
#4  0x0000007fbc97cf8c in nsDocShell::CreateContentViewer
    (this=this@entry=0x7f885da880, aContentType=...,
    aRequest=aRequest@entry=0x7f88b075e0, 
    aContentHandler=aContentHandler@entry=0x7f88ac5130)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#5  0x0000007fbc97d6a8 in nsDSURIContentListener::DoContent (this=0x55559666c0,
    aContentType=..., aIsContentPreferred=<optimized out>, 
    aRequest=0x7f88b075e0, aContentHandler=0x7f88ac5130, aAbortProcess=0x7f9f4f35c0)
    at docshell/base/nsDSURIContentListener.cpp:179
#6  0x0000007fba5dc5c8 in nsDocumentOpenInfo::TryContentListener
    (this=0x7f88ac5110, aListener=0x55559666c0, aChannel=0x7f88b075e0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:359
#7  0x0000007fba5dc91c in nsDocumentOpenInfo::TryDefaultContentListener
    (this=<optimized out>, aChannel=<optimized out>)
    at uriloader/base/nsURILoader.cpp:626
[...]
#46 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 

Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::PresShell::SetIsActive
    (this=this@entry=0x7f8868ab50, aIsActive=true)
    at layout/base/PresShell.cpp:10865
10865	void PresShell::SetIsActive(bool aIsActive) {
(gdb) bt
#0  mozilla::PresShell::SetIsActive (this=this@entry=0x7f8868ab50, aIsActive=true)
    at layout/base/PresShell.cpp:10865
#1  0x0000007fbc2236d0 in mozilla::PresShell::ActivenessMaybeChanged
    (this=this@entry=0x7f8868ab50)
    at layout/base/PresShell.cpp:10794
#2  0x0000007fbc247da4 in mozilla::PresShell::Init (this=this@entry=0x7f8868ab50, aPresContext=aPresContext@entry=0x7f88b50b10, 
    aViewManager=aViewManager@entry=0x7f88590220) at
    layout/base/PresShell.cpp:1035
#3  0x0000007fbab910ac in mozilla::dom::Document::CreatePresShell
    (this=0x7f88680b10, aContext=0x7f88b50b10, aViewManager=0x7f88590220)
    at dom/base/Document.cpp:6637
#4  0x0000007fbc27b7bc in nsDocumentViewer::InitPresentationStuff
    (this=this@entry=0x7f8886baf0, aDoInitialReflow=aDoInitialReflow@entry=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:289
#5  0x0000007fbc27d4c0 in nsDocumentViewer::InitInternal (this=0x7f8886baf0,
    aParentWidget=<optimized out>, aState=aState@entry=0x0, aActor=0x0, 
    aBounds=..., aDoCreation=aDoCreation@entry=true, aNeedMakeCX=aNeedMakeCX@entry=true, aForceSetNewDocument=aForceSetNewDocument@entry=true)
    at layout/base/nsDocumentViewer.cpp:913
#6  0x0000007fbc27d68c in nsDocumentViewer::Init (this=<optimized out>,
    aParentWidget=<optimized out>, aBounds=..., aActor=<optimized out>)
    at layout/base/nsDocumentViewer.cpp:682
#7  0x0000007fba96f1b4 in gfxSVGGlyphsDocument::SetupPresentation
    (this=this@entry=0x7f8867a940)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/BaseRect.h:53
#8  0x0000007fba96f3c8 in gfxSVGGlyphsDocument::gfxSVGGlyphsDocument
    (this=0x7f8867a940, 
    aBuffer=0x7ea8f3c42d "<svg xmlns=\"http://www.w3.org/2000/svg\"
    id=\"glyph690\"><g transform=\"translate(0 -6.75) translate(0,-1638.4)
    scale(56.", '8' <repeats 14 times>, "6)\"><path fill=\"#C1694F\" d=\"M32
    34a2 2 0 0 1-2 2H6a2 2 0 0 1-2-2V7a2 "..., aBufLen=<optimized out>,
    aSVGGlyphs=<optimized out>)
    at gfx/thebes/gfxSVGGlyphs.cpp:311
[...]
#120 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
That's quite a difference. The backtraces are so different that it's hard to compare the two sets effectively. I also notice that although these last five backtraces for ESR 91 look quite similar, they're actually all quite different. We can see that just from the index of the final frame in each: 47, 42, 51, 46 and 120. These all represent a very deep level of nesting. I believe this is to do with the way the page is laid out.

Also worth noting is that the glyph that we can see being rendered as an SVG image right at the end of the final backtrace is a glyph that does appear in the page being rendered.

In the case of ESR 91 the value passed in is coming from the PresShell::ShouldBeActive() method, so the obvious question is why this isn't returning true in the majority of the cases.
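To make the ESR 91 flow concrete, here's a heavily simplified model of it. Judging by the backtraces, ActivenessMaybeChanged() passes the result of ShouldBeActive() into SetIsActive(), and an inactive PresShell skips painting. All the class names below are hypothetical, and ShouldBeActive() is reduced to its final bc && bc->IsActive() check, ignoring the earlier document and BrowserChild cases.

```cpp
#include <cassert>

// Hypothetical stand-in for BrowsingContext.
struct BrowsingContextSketch {
  bool active = false;
  bool IsActive() const { return active; }
};

// Hypothetical, simplified model of how the PresShell's active flag is
// derived and how it gates painting.
class PresShellSketch {
 public:
  explicit PresShellSketch(const BrowsingContextSketch* aBC) : mBC(aBC) {}

  bool ShouldBeActive() const { return mBC && mBC->IsActive(); }
  void ActivenessMaybeChanged() { SetIsActive(ShouldBeActive()); }
  void SetIsActive(bool aIsActive) { mIsActive = aIsActive; }

  // Returns true only if a paint actually happened.
  bool Paint() {
    if (!mIsActive) return false;  // inactive shells skip rendering
    ++mPaintCount;
    return true;
  }
  int PaintCount() const { return mPaintCount; }

 private:
  const BrowsingContextSketch* mBC;
  bool mIsActive = false;
  int mPaintCount = 0;
};
```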

So I've put a breakpoint on PresShell::ShouldBeActive() to find out. Here's what stepping through the method gives us. Note that when we reach the IsActive() call we step inside (rather than stepping over it) to find out what's happening inside.
Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::PresShell::ShouldBeActive
    at layout/base/PresShell.cpp:10797
10797	bool PresShell::ShouldBeActive() const {
(gdb) n
[LWP 32361 exited]
10798	  MOZ_LOG(gLog, LogLevel::Debug,
(gdb) n
10805	  Document* doc = mDocument;
(gdb) n
10807	  if (doc->IsBeingUsedAsImage()) {
(gdb) n
10814	  if (Document* displayDoc = doc->GetDisplayDocument()) {
(gdb) n
10823	  Document* root = nsContentUtils::GetInProcessSubtreeRootDocument(doc);
(gdb) n
10824	  if (auto* browserChild = BrowserChild::GetFrom(root->GetDocShell())) {
(gdb) n
10859	  BrowsingContext* bc = doc->GetBrowsingContext();
(gdb) n
10860	  MOZ_LOG(gLog, LogLevel::Debug,
(gdb) n
10862	  return bc && bc->IsActive();
(gdb) p bc
$1 = (mozilla::dom::BrowsingContext *) 0x7f885e1ff0
(gdb) s
mozilla::dom::BrowsingContext::IsActive (this=this@entry=0x7f885e1ff0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/dom/
    BrowsingContext.h:227
227	${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/dom/BrowsingContext.h:
    No such file or directory.
(gdb) n
611	    if (explicit_ != ExplicitActiveStatus::None) {
(gdb) n
612	      return explicit_ == ExplicitActiveStatus::Active;
(gdb) p explicit_
$2 = mozilla::dom::ExplicitActiveStatus::Inactive
(gdb) n
From this we can see that the explicit_ variable, which is used to determine the return result, is set to ExplicitActiveStatus::Inactive, which is why the function is returning false.
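The logic in those two lines can be sketched as a standalone function. This is a simplified, hypothetical model rather than the real BrowsingContext code: it assumes that a context with no explicit status (None) defers to its parent, which is what the comment in BrowsingContext::CreateDetached() suggests, and it leaves out any other checks the real method makes.

```cpp
#include <cassert>

enum class ExplicitActiveStatus { None, Active, Inactive };

// Hypothetical simplified model of BrowsingContext::IsActive(): use the
// context's own explicit status if it has one, otherwise ask the parent.
struct ContextSketch {
  ExplicitActiveStatus explicitStatus = ExplicitActiveStatus::None;
  const ContextSketch* parent = nullptr;

  bool IsActive() const {
    for (const ContextSketch* current = this; current;
         current = current->parent) {
      auto explicit_ = current->explicitStatus;
      if (explicit_ != ExplicitActiveStatus::None) {
        return explicit_ == ExplicitActiveStatus::Active;
      }
    }
    return false;  // nothing in the chain expressed a preference
  }
};
```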

We're a bit further along. Now it's clear that BrowsingContext::GetExplicitActive() is returning ExplicitActiveStatus::Inactive when we want it to return ExplicitActiveStatus::Active.

This value gets set through calls to the BrowsingContext::SetExplicitActive() method, so we should try to determine where that gets called.

But putting a breakpoint on it doesn't result in any hits, so it seems it's not being called anywhere. Hmmm; that's unexpected. It suggests that the initial value set in BrowsingContext::CreateDetached() is still the one being used. There it's initialised by an immediately invoked lambda that looks like this:
  fields.mExplicitActive = [&] {
    if (parentBC) {
      // Non-root browsing-contexts inherit their status from its parent.
      return ExplicitActiveStatus::None;
    }
    if (aType == Type::Content) {
      // Content gets managed by the chrome front-end / embedder element and
      // starts as inactive.
      return ExplicitActiveStatus::Inactive;
    }
    // Chrome starts as active.
    return ExplicitActiveStatus::Active;
  }();
What does this function do? It first checks whether parentBC is non-null; if there is a parent, it returns ExplicitActiveStatus::None so that the context inherits its status from its parent. Otherwise it checks whether aType is set to Type::Content, in which case it returns ExplicitActiveStatus::Inactive. Only if neither condition holds do we reach the end of the lambda, where it returns ExplicitActiveStatus::Active.
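Pulled out into a free function, the lambda's decision table is easy to check. This is a restatement for illustration only: InitialExplicitActive and ContextType are hypothetical names, and the boolean aHasParent stands in for the parentBC null check. The important case is top-level content: it starts out Inactive, and since nothing calls SetExplicitActive() in our case, it stays that way.

```cpp
#include <cassert>

enum class ExplicitActiveStatus { None, Active, Inactive };
enum class ContextType { Chrome, Content };

// Standalone restatement of the CreateDetached() lambda.
// aHasParent stands in for `parentBC != nullptr`.
ExplicitActiveStatus InitialExplicitActive(bool aHasParent, ContextType aType) {
  if (aHasParent) {
    return ExplicitActiveStatus::None;      // inherit from the parent
  }
  if (aType == ContextType::Content) {
    return ExplicitActiveStatus::Inactive;  // embedder is expected to activate it
  }
  return ExplicitActiveStatus::Active;      // top-level chrome starts active
}
```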

I want to see whether this is where our rendering is getting blocked. So I've changed the second of the three return statements so it now looks like this:
      return ExplicitActiveStatus::Active;
In other words, the lambda will now return an active status for top-level content where it would have returned inactive before. I'm hoping this will force the page to be treated as active. I've done a partial build to test this and copied the resulting libxul.so file over to my phone using scp. On running it, nsLayoutUtils::PaintFrame() is now being called:
Thread 8 "GeckoWorkerThre" hit Breakpoint 1, nsLayoutUtils::PaintFrame
    (aRenderingContext=aRenderingContext@entry=0x0,
    aFrame=aFrame@entry=0x7f88bea280, 
    aDirtyRegion=..., aBackstop=aBackstop@entry=4294967295, 
    aBuilderMode=aBuilderMode@entry=nsDisplayListBuilderMode::Painting, 
    aFlags=aFlags@entry=(nsLayoutUtils::PaintFrameFlags::WidgetLayers |
    nsLayoutUtils::PaintFrameFlags::ExistingTransaction |
    nsLayoutUtils::PaintFrameFlags::NoComposite)) at
    ${PROJECT}/gecko-dev/layout/base/nsLayoutUtils.cpp:3144
3144	${PROJECT}/gecko-dev/layout/base/nsLayoutUtils.cpp: No such file or directory.
(gdb) bt
#0  nsLayoutUtils::PaintFrame (aRenderingContext=aRenderingContext@entry=0x0,
    aFrame=aFrame@entry=0x7f88bea280, aDirtyRegion=..., 
    aBackstop=aBackstop@entry=4294967295,
    aBuilderMode=aBuilderMode@entry=nsDisplayListBuilderMode::Painting, 
    aFlags=aFlags@entry=(nsLayoutUtils::PaintFrameFlags::WidgetLayers |
    nsLayoutUtils::PaintFrameFlags::ExistingTransaction |
    nsLayoutUtils::PaintFrameFlags::NoComposite)) at
    ${PROJECT}/gecko-dev/layout/base/nsLayoutUtils.cpp:3144
#1  0x0000007fbc230c24 in mozilla::PresShell::Paint (this=this@entry=0x7f88ba9150, aViewToPaint=aViewToPaint@entry=0x7f880dfe90, aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at ${PROJECT}/gecko-dev/layout/base/PresShell.cpp:6400
#2  0x0000007fbc068ae4 in nsViewManager::ProcessPendingUpdatesPaint
    (this=this@entry=0x7f88b81020, aWidget=aWidget@entry=0x7f88a229f0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/RectAbsolute.h:43
#3  0x0000007fbc068e98 in nsViewManager::ProcessPendingUpdatesForView
    (this=this@entry=0x7f88b81020, aView=<optimized out>, 
    aFlushDirtyRegion=aFlushDirtyRegion@entry=true)
    at ${PROJECT}/gecko-dev/view/nsViewManager.cpp:394
#4  0x0000007fbc069488 in nsViewManager::ProcessPendingUpdates
    (this=this@entry=0x7f88b81020)
    at ${PROJECT}/gecko-dev/view/nsViewManager.cpp:972
#5  0x0000007fbc069548 in nsViewManager::WillPaintWindow
    (this=this@entry=0x7f88b81020, aWidget=0x7f88a229f0)
    at ${PROJECT}/gecko-dev/view/nsViewManager.cpp:625
#6  0x0000007fbc069594 in nsView::WillPaintWindow (this=<optimized out>,
    aWidget=<optimized out>)
    at ${PROJECT}/gecko-dev/view/nsView.cpp:1051
#7  0x0000007fbcc95974 in mozilla::embedlite::PuppetWidgetBase::Invalidate
    (this=0x7f88a229f0, aRect=...)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:268
#8  0x0000007fbcc9a558 in mozilla::embedlite::PuppetWidgetBase::Show
    (this=this@entry=0x7f88a229f0, aState=aState@entry=true)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:129
#9  0x0000007fbcc90a58 in mozilla::embedlite::EmbedLitePuppetWidget::Show
    (this=0x7f88a229f0, aState=<optimized out>)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/
    EmbedLitePuppetWidget.cpp:97
#10 0x0000007fbc27e11c in nsDocumentViewer::Show (this=0x7f88ac6100)
    at ${PROJECT}/gecko-dev/layout/base/nsDocumentViewer.cpp:2082
#11 0x0000007fbc28c2f8 in nsPresContext::EnsureVisible (this=0x7f88a59d80)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:869
#12 0x0000007fbc244ab4 in mozilla::PresShell::UnsuppressAndInvalidate
    (this=this@entry=0x7f88ba9150)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#13 0x0000007fbc24b338 in mozilla::PresShell::ProcessReflowCommands
    (this=this@entry=0x7f88ba9150, aInterruptible=aInterruptible@entry=false)
    at ${PROJECT}/gecko-dev/layout/base/PresShell.cpp:9812
#14 0x0000007fbc24a024 in mozilla::PresShell::DoFlushPendingNotifications
    (this=this@entry=0x7f88ba9150, aFlush=..., aFlush@entry=...)
    at ${PROJECT}/gecko-dev/layout/base/PresShell.cpp:4233
#15 0x0000007fbab915d8 in mozilla::PresShell::FlushPendingNotifications
    (aType=..., this=0x7f88ba9150)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/PresShell.h:1413
#16 mozilla::dom::Document::FlushPendingNotifications (this=0x7f88b780f0, aFlush=...)
    at ${PROJECT}/gecko-dev/dom/base/Document.cpp:10613
#17 0x0000007fbab91754 in mozilla::dom::Document::FlushPendingNotifications
    (this=<optimized out>, aType=aType@entry=mozilla::FlushType::Layout)
    at ${PROJECT}/gecko-dev/dom/base/Document.cpp:10534
#18 0x0000007fbabec604 in mozilla::dom::Selection::ScrollIntoView
    (this=this@entry=0x7f88b59c70, aRegion=1, aVertical=..., aHorizontal=...,
    aFlags=10)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:289
[...]
#50 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
Sadly there's still nothing appearing on the screen though.

Phew, that's a lot of debugging and investigation. And it's been quite exhausting. It sadly hasn't got us where we need to be yet, but it has been illuminating. I'll be persevering with this over the coming days.

As always, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
14 Oct 2023 : Day 59 #
Happy weekend everyone! First a quick recap. Getting logging to work by fixing some small JavaScript errors in the front end code caused a segfault to trigger. This turned out to be due to the code that switches between light and dark themes. When the browser starts up it sets this theme, but the interface had changed to require a value to specify how much re-rendering is needed. For example, does the layout need updating, or can that be skipped? The C++ code was expecting this value, found null instead and promptly crashed.

The fix was to pass in a suitable value and handily the gecko devs already included a nice helper function for doing it all automatically. So I used that yesterday and now there's no crash.

So now it's back to rendering. I've been looking at the rendering for the last couple of days and it feels like I'm going round in circles a bit. Having log output without crashes is going to help. I hope.

There are a bunch of errors all related to Services.appinfo being undefined. It might make sense to fix these to get a load more modules potentially working quite easily. But for now I'm just going to take a note of it.
JavaScript error: file:///usr/lib64/mozembedlite/components/
  UserAgentOverrideHelper.js, line 110: TypeError: Services.appinfo is undefined
JavaScript error: resource://gre/modules/EnterprisePoliciesParent.jsm, line 500:
  TypeError: Services.appinfo is undefined
JavaScript error: resource://gre/modules/URLQueryStrippingListService.jsm, line
  42: TypeError: Services.appinfo is undefined
JavaScript Error: "TypeError: Services.appinfo is undefined" {file:
  "resource://gre/modules/LoginRecipes.jsm" line: 56}
JavaScript Error: "1696965251556       addons.manager  ERROR   startup failed:
  TypeError: Services.appinfo is undefined(resource://gre/modules/AddonManager.jsm:680:1) JS Stack trace: startup@AddonManager.jsm:680:1 startup@AddonManager.jsm:3511:26
  observe@addonManager.js:81:29" {file: "resource://gre/modules/Log.jsm"
  line: 723}
JavaScript error: resource://gre/modules/Region.jsm, line 119: TypeError:
  Services.appinfo is undefined
Let's focus on the SetIsActive and SuspendRendering/ResumeRendering states now.

For the first of these, an important method is EmbedLiteViewChild::RecvSetIsActive(), since this seems to propagate the active flag through to lots of different places: PresShell, nsWebBrowser, EmbedLitePuppetWidget and DocShell. My plan is to step through this method to check they're all getting set appropriately and, if not, figure out why not.

First here's the backtrace we get when we hit the breakpoint.
Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
  EmbedLiteViewChild::RecvSetIsActive (this=0x7f885831b0,
    aIsActive=@0x7f9f4f3600: true)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/EmbedLiteViewChild.cpp:607
607     ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/EmbedLiteViewChild.cpp:
    No such file or directory.
(gdb) bt
#0  mozilla::embedlite::EmbedLiteViewChild::RecvSetIsActive (this=0x7f885831b0,
    aIsActive=@0x7f9f4f3600: true)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/EmbedLiteViewChild.cpp:607
#1  0x0000007fba3094e0 in mozilla::embedlite::PEmbedLiteViewChild::OnMessageReceived
    (this=0x7f885831b0, msg__=...) at PEmbedLiteViewChild.cpp:1218
#2  0x0000007fba2f6e84 in mozilla::embedlite::PEmbedLiteAppChild::OnMessageReceived
    (this=, msg__=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:675
#3  0x0000007fba1e3690 in mozilla::ipc::MessageChannel::DispatchAsyncMessage
    (this=this@entry=0x7f88876968, aProxy=aProxy@entry=0x7ef4004830, aMsg=...)
[...]
#29 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
The app is paused at the point where the breakpoint was hit. Now it's time to step through the code.
(gdb) n
Missing separate debuginfo for /usr/lib64/libnssckbi.so
Try: zypper install -C "debuginfo(build-id)=c2e22ddaa9e2d7802fd7ff14979ba02fb6508bcd"
Missing separate debuginfo for /usr/lib64/libtasn1.so.6
Try: zypper install -C "debuginfo(build-id)=95f83c2cec19f8b9030e796b6c25c31a10c185e8"
[LWP 7851 exited]
310     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:
        No such file or directory.
(gdb) n
867     ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:
        No such file or directory.
(gdb) n
313     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:
        No such file or directory.
(gdb) n
609       RefPtr presShell = mHelper->GetTopLevelPresShell();
(gdb) n
610       if (aIsActive) {
(gdb) p aIsActive
$1 = (const bool &) @0x7f9f4df600: true
(gdb) n
616           mWebBrowser->FocusActivate(nsFocusManager::GenerateFocusActionId());
(gdb) n
617           LOGT("Activate browser");
(gdb) n
624       EmbedLitePuppetWidget *widget = GetPuppetWidget();
(gdb) n
625       if (widget) {
(gdb) p widget
$2 = (mozilla::embedlite::EmbedLitePuppetWidget *) 0x7f88587490
(gdb) n
626         widget->SetActive(aIsActive);
(gdb) n
630       nsCOMPtr docShell = do_GetInterface(mWebNavigation);
(gdb) n
867     ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:
        No such file or directory.
(gdb) p docShell
$3 = { = {mRawPtr = 0x7f887b3b18}, }
(gdb) n
635       mWidget->Show(aIsActive);
(gdb) n
313     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:
        No such file or directory.
(gdb) n
638       if (aIsActive) {
(gdb) n
639         RecvScheduleUpdate();
(gdb) n
641       return IPC_OK();
(gdb) n
630       nsCOMPtr docShell = do_GetInterface(mWebNavigation);
(gdb) n
609       RefPtr presShell = mHelper->GetTopLevelPresShell();
(gdb) c
Continuing.
This is quite hard to follow without actually having stepped through it, so let me highlight a few parts. First, it's worth noting that this all looks pretty sensible: the widget is set to something valid and aIsActive is set to true.

To try to progress further I've also put breakpoints on a few more things to see if there's any noticeable difference between ESR 78 and ESR 91.

What I find is that on ESR 78 CompositorOGL::BeginFrame() is called (unsurprisingly, since it gets called every time a frame is rendered). But on ESR 91 it's not being called at all. I think it's worth finding out why.

Here's the backtrace for it on ESR 78:
(gdb) bt
#0  mozilla::layers::CompositorOGL::BeginFrame (this=0x7ed0003450,
    aInvalidRegion=..., aClipRect=..., aRenderBounds=..., aOpaqueRegion=...)
    at gfx/
    layers/opengl/CompositorOGL.cpp:967
#1  0x0000007ff2999070 in mozilla::layers::CompositorOGL::BeginFrameForWindow
    (this=, aInvalidRegion=..., aClipRect=..., 
    aRenderBounds=..., aOpaqueRegion=...) at /usr/src/debug/
    xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/layers/opengl/
    CompositorOGL.cpp:793
#2  0x0000007ff2a5fcc8 in mozilla::layers::LayerManagerComposite::Render
    (this=this@entry=0x7ed021df20, aInvalidRegion=..., aOpaqueRegion=...)
    at 
    obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#3  0x0000007ff2a602c4 in mozilla::layers::LayerManagerComposite::UpdateAndRender
    (this=this@entry=0x7ed021df20)
    at gfx/
    layers/composite/LayerManagerComposite.cpp:647
#4  0x0000007ff2a60514 in mozilla::layers::LayerManagerComposite::EndTransaction
    (aFlags=mozilla::layers::LayerManager::END_DEFAULT, aTimeStamp=..., 
    this=0x7ed021df20) at /usr/src/debug/
    xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/layers/composite/
    LayerManagerComposite.cpp:566
#5  mozilla::layers::LayerManagerComposite::EndTransaction (this=0x7ed021df20,
    aTimeStamp=..., aFlags=mozilla::layers::LayerManager::END_DEFAULT)
    at gfx/
    layers/composite/LayerManagerComposite.cpp:536
#6  0x0000007ff2a87f9c in mozilla::layers::CompositorBridgeParent::
    CompositeToTarget (this=0x7fb89bdf20, aId=..., aTarget=0x0,
    aRect=)
    at 
    obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#7  0x0000007ff4d8288c in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7fb89bdf20, aId=...)
    at mobile/
    sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:159
#8  0x0000007ff2a67988 in mozilla::layers::CompositorVsyncScheduler::Composite
    (this=0x7fb8cd6390, aId=..., aVsyncTimestamp=...)
    at gfx/
    layers/ipc/CompositorVsyncScheduler.cpp:249
#9  0x0000007ff2a65ff0 in mozilla::detail::RunnableMethodArguments, mozilla::TimeStamp>::
    applyImpl, mozilla::TimeStamp), StoreCopyPassByConstLRef >, StoreCopyPassByConstLRef, 0ul, 1ul> (args=..., m=, o=)
    at 
    obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:925
[...]
#22 0x0000007fef65b89c in ?? () from /lib64/libc.so.6
(gdb) 
What we know is that on ESR 91 something in this chain of calls is being skipped. A few more breakpoints make it clear that CompositorBridgeParent::CompositeToTarget() is being entered. So why isn't it calling LayerManagerComposite::EndTransaction()?

Here's a potentially interesting aside. I'm using gdb for debugging, and by default it's configured to stop when the process receives signals such as SIGPIPE. If you leave a connection unattended for too long it will get closed by the other end, and when the process next tries to write to it the operating system will send it a SIGPIPE (broken pipe) signal.

When you're debugging it's common to leave the process in a suspended state, blocked from executing by the debugger. If you do this for too long, a SIGPIPE signal can end up being raised for every connection that's been dropped (which for a browser can be quite a few). Each one halts execution. It's a pain because you have to tell the debugger to continue after every one, and it completely breaks the debugging flow.

Anyway, the solution is to run the following command in gdb:
handle SIGPIPE nostop
This tells the debugger not to halt the process if it receives the SIGPIPE signal. It's a simple step, but saves a lot of hassle.
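For anyone who hasn't run into SIGPIPE before, here's a small standalone POSIX sketch (not Gecko code) of the underlying mechanism: writing to a pipe whose reading end has gone away raises SIGPIPE, and if the signal is ignored the write fails with EPIPE instead. Network sockets behave the same way when the peer has closed the connection. The function name is made up for illustration.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Writes one byte to a pipe whose read end has already been closed and
// returns the resulting errno (0 if the write unexpectedly succeeded).
// SIGPIPE is ignored first; otherwise its default action would terminate
// the process before write() could return.
int ErrnoAfterWriteToClosedPipe() {
  std::signal(SIGPIPE, SIG_IGN);

  int fds[2];
  if (pipe(fds) != 0) {
    return errno;  // Couldn't even create the pipe.
  }
  close(fds[0]);  // Close the read end: nobody is listening any more.

  const char byte = 'x';
  errno = 0;
  ssize_t written = write(fds[1], &byte, 1);
  int result = (written < 0) ? errno : 0;  // Expected to be EPIPE.
  close(fds[1]);
  return result;
}
```

Under gdb with the default signal handling, the same write without the SIG_IGN line would stop the debugger at the SIGPIPE, which is exactly the interruption that `handle SIGPIPE nostop` avoids.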

Now back to the issue at hand. Stepping through the code, it looks like the problem is that mLayerManager has its mRoot value unset. As a consequence, CanComposite() is returning false:
bool CompositorBridgeParent::CanComposite() {
  return mLayerManager && mLayerManager->GetRoot() && !mPaused;
}
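A minimal model of that check makes the failure mode easy to see. The types below are hypothetical, heavily simplified stand-ins (raw pointers instead of RefPtr, and only the state the check uses), not the real Gecko classes; a layer manager that exists but has no root is enough to make the whole condition false.

```cpp
// Hypothetical stand-ins for the Gecko classes involved.
struct Layer {};

struct LayerManager {
  Layer* mRoot = nullptr;  // Stays null until something calls SetRoot().
  Layer* GetRoot() const { return mRoot; }
};

struct CompositorBridgeParentModel {
  LayerManager* mLayerManager = nullptr;
  bool mPaused = false;

  // Same condition as CompositorBridgeParent::CanComposite(): a layer
  // manager must exist, it must have a root layer, and compositing must
  // not be paused.
  bool CanComposite() const {
    return mLayerManager && mLayerManager->GetRoot() && !mPaused;
  }
};
```

Setting mLayerManager to a manager with a null mRoot and mPaused to false (the ESR 91 values shown next) gives false, while giving the manager a root layer (as on ESR 78) gives true.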
Here's the result of stepping through:
Thread 34 "Compositor" hit Breakpoint 6, mozilla::layers::
    CompositorBridgeParent::CompositeToTarget (this=0x7f888b4b10, aId=...,
    aTarget=0x0, aRect=0x0)
    at gfx/
    layers/ipc/CompositorBridgeParent.cpp:936
936                                                    const gfx::IntRect* aRect) {
(gdb) n
937       AUTO_PROFILER_TRACING_MARKER("Paint", "Composite", GRAPHICS);
(gdb) n
938       AUTO_PROFILER_LABEL("CompositorBridgeParent::CompositeToTarget", GRAPHICS);
(gdb) n
939       PerfStats::AutoMetricRecording autoRecording;
(gdb) n
943       TimeStamp start = TimeStamp::Now();
(gdb) n
945       if (!CanComposite()) {
(gdb) p CanComposite()
$4 = false
(gdb) p mLayerManager
$5 = {mRawPtr = 0x7eb8113150}
(gdb) p mLayerManager.mRawPtr->mRoot
$6 = {mRawPtr = 0x0}
(gdb) p mPaused
$7 = false
(gdb) 
Compare this with stepping through the ESR 78 code:
Thread 39 "Compositor" hit Breakpoint 4, mozilla::layers::
    CompositorBridgeParent::CompositeToTarget (this=0x7fb89bdf20, aId=...,
    aTarget=0x0, aRect=0x0)
    at gfx/
    layers/ipc/CompositorBridgeParent.cpp:932
932                                                    const gfx::IntRect* aRect) {
(gdb) n
933       AUTO_PROFILER_TRACING_MARKER("Paint", "Composite", GRAPHICS);
(gdb) n
934       AUTO_PROFILER_LABEL("CompositorBridgeParent::CompositeToTarget", GRAPHICS);
(gdb) n
935       PerfStats::AutoMetricRecording autoRecording;
(gdb) n
939       TimeStamp start = TimeStamp::Now();
(gdb) n
941       if (!CanComposite()) {
(gdb) p CanComposite()
Cannot evaluate function -- may be inlined
(gdb) p mLayerManager
$1 = {mRawPtr = 0x7ed021df20}
(gdb) p mLayerManager.mRawPtr->mRoot
$2 = {mRawPtr = 0x7ed027e880}
(gdb) p mPaused
$3 = false
We can see the crucial difference here, which is that mRoot is set. It's also worth exploring mLayerManager a bit to see what its class hierarchy looks like. Knowing these details helps us understand which methods, potentially overridden, are likely to be called and which source files are relevant. It also highlights how powerful gdb is.
(gdb) explore mLayerManager
The value of 'mLayerManager' is a struct/class of type 'RefPtr' with the following fields:

  mRawPtr = 

Enter the field number of choice: 0
'mLayerManager.mRawPtr' is a pointer to a value of type
    'mozilla::layers::HostLayerManager'
Continue exploring it as a pointer to a single value [y/n]: y

      The value of '*(mLayerManager.mRawPtr)' is a struct/class of type
      'mozilla::layers::HostLayerManager' with the following fields:

  mozilla::layers::LayerManager = 
    mDebugOverlayWantsNextFrame = false .. (Value of type 'bool')
   mImageCompositeNotifications = '>
                  mWarningLevel = 0 .. (Value of type 'float')
                      mWarnTime = 
                   mDiagnostics =  >'>
            mCompositorBridgeID = 
                 mLastPaintTime = 
               mRenderStartTime = 
           mCompositionRecorder =  >'>
               mCompositionTime = 
            mCompositeUntilTime = 

Enter the field number of choice: 0
The value of '(*(mLayerManager.mRawPtr)).mozilla::layers::LayerManager' is a
  struct/class of type 'mozilla::layers::LayerManager' with the following fields:

  mozilla::layers::FrameRecorder = 
                         mRefCnt = 
                           mRoot = '>
                       mUserData = 
                      mDestroyed = false .. (Value of type 'bool')
        mSnapEffectiveTransforms = true .. (Value of type 'bool')
                  mRegionToClear = 
                             mId = 
                  mInTransaction = false .. (Value of type 'bool')
                    mContainsSVG = false .. (Value of type 'bool')
             mAnimationReadyTime = 
              mPaintedPixelCount = 
                        mPayload = '>
           mPendingScrollUpdates = 
I've now put a breakpoint on LayerManagerComposite::SetRoot() in the ESR 78 code to find out where it gets called from, in the hope of finding out why it's not (presumably) getting called on ESR 91.

And the answer is: it isn't getting called!

But there are other classes with a SetRoot() method too; gdb finds six locations in total. Maybe it's being set by one of these instead?
(gdb) b SetRoot
Breakpoint 7 at 0x7ff29af3e0: SetRoot. (6 locations)
(gdb) info b
Num     Type           Disp Enb Address            What
7       breakpoint     keep y            
7.1                         y     0x0000007ff29af3e0 in WebRenderLayerManager::
                                  SetRoot(mozilla::layers::Layer*) 
                                                   at WebRenderLayerManager.cpp:682
7.2                         y     0x0000007ff2a0f5f0 in BasicLayerManager::
                                  SetRoot(mozilla::layers::Layer*) 
                                                   at RefPtr.h:187
7.3                         y     0x0000007ff2a22928 in ClientLayerManager::
                                  SetRoot(mozilla::layers::Layer*) 
                                                   at ClientLayerManager.cpp:153
7.4                         y     0x0000007ff2a22980 in ClientLayerManager::
                                  SetRoot(mozilla::layers::Layer*) 
                                                   at RefPtr.h:313
7.5                         y     0x0000007ff2a4dca0 in LayerManagerComposite::
                                  SetRoot(mozilla::layers::Layer*) 
                                                   at RefPtr.h:187
7.6                         y     0x0000007ff2a94928 in ShadowLayerForwarder::
                                  SetRoot(mozilla::layers::ShadowableLayer*) 
                                                   at ShadowLayers.cpp:297
(gdb) 
A little incongruously, the ESR 91 build only has five such locations: ClientLayerManager::SetRoot() appears once rather than twice.
(gdb) break SetRoot
Breakpoint 9 at 0x7fba7d5400: SetRoot. (5 locations)
(gdb) info break
Num     Type           Disp Enb Address            What
9       breakpoint     keep y            
9.1                         y     0x0000007fba7d5400 in WebRenderLayerManager::
                                  SetRoot(mozilla::layers::Layer*) 
                                                   at WebRenderLayerManager.cpp:748
9.2                         y     0x0000007fba8698ac in BasicLayerManager::
                                  SetRoot(mozilla::layers::Layer*) 
                                                   at RefPtr.h:187
9.3                         y     0x0000007fba8799f0 in ClientLayerManager::
                                  SetRoot(mozilla::layers::Layer*) 
                                                   at RefPtr.h:514
9.4                         y     0x0000007fba88c4e0 in LayerManagerComposite::
                                  SetRoot(mozilla::layers::Layer*) 
                                                   at RefPtr.h:187
9.5                         y     0x0000007fba8da6d8 in ShadowLayerForwarder::
                                  SetRoot(mozilla::layers::ShadowableLayer*) 
                                                   at ShadowLayers.cpp:301
(gdb) 
On ESR 78 it's the ClientLayerManager version which is being called:
Thread 10 "GeckoWorkerThre" hit Breakpoint 7, ClientLayerManager::SetRoot
    (this=0x7fb8cdc820, aLayer=0x7fb8722560)
    at gfx/layers/client/ClientLayerManager.cpp:153
153     void ClientLayerManager::SetRoot(Layer* aLayer) {
(gdb) bt
#0  mozilla::layers::ClientLayerManager::SetRoot (this=0x7fb8cdc820,
    aLayer=0x7fb8722560)
    at gfx/layers/client/ClientLayerManager.cpp:153
#1  0x0000007ff423c36c in mozilla::PresShell::Paint (this=this@entry=0x7fb864a0f0,
    aViewToPaint=aViewToPaint@entry=0x7fb8cef9b0, aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at layout/base/PresShell.cpp:6282
#2  0x0000007ff40b30ac in nsViewManager::ProcessPendingUpdatesPaint
    (this=this@entry=0x7fb8cef930, aWidget=aWidget@entry=0x7fb8cefa30)
    at obj-build-mer-qt-xr/dist/include/nsTArray.h:554
#3  0x0000007ff40b33c0 in nsViewManager::ProcessPendingUpdatesForView
    (this=this@entry=0x7fb8cef930, aView=, 
    aFlushDirtyRegion=aFlushDirtyRegion@entry=true) at view/nsViewManager.cpp:395
#4  0x0000007ff40b3b50 in nsViewManager::ProcessPendingUpdates (this=0x7fb8cef930)
    at view/nsViewManager.cpp:1018
#5  nsViewManager::ProcessPendingUpdates (this=)
    at view/nsViewManager.cpp:1004
#6  0x0000007ff40b3c0c in nsViewManager::WillPaintWindow
    (this=this@entry=0x7fb8cef930, aWidget=0x7fb8cefa30)
    at view/nsViewManager.cpp:664
#7  0x0000007ff40b3c88 in nsView::WillPaintWindow
    (this=, aWidget=)
    at view/nsView.cpp:1048
#8  0x0000007ff4d98344 in mozilla::embedlite::PuppetWidgetBase::Invalidate
    (aRect=..., this=0x7fb8cefa30)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:268
#9  mozilla::embedlite::PuppetWidgetBase::Invalidate (this=0x7fb8cefa30, aRect=...)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:253
#10 0x0000007ff4d91768 in mozilla::embedlite::EmbedLitePuppetWidget::Show
    (this=0x7fb8cefa30, aState=)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:97
#11 0x0000007ff42923e8 in nsDocumentViewer::Show (this=0x7fb8ce70b0)
    at layout/base/nsDocumentViewer.cpp:2132
#12 0x0000007ff4a9cf64 in nsDocShell::SetVisibility (this=,
    aVisibility=)
    at docshell/base/nsDocShell.cpp:4742
#13 0x0000007ff4bfc78c in nsWebBrowser::SetVisibility (this=0x7fb8b1e190,
    aVisibility=true)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:857
#14 0x0000007ff4d92958 in mozilla::embedlite::EmbedLiteViewChild::RecvSetIsActive
    (this=0x7fb8cde9d0, aIsActive=@0x7fde8d68f0: true)
    at obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
[...]
#38 0x0000007fef65b89c in ?? () from /lib64/libc.so.6
(gdb) 
It gets called in some other places too, including the ShadowLayerForwarder version. On ESR 91 none of them get called, so maybe we're getting closer.

In this callstack it looks like PresShell::Paint() is being called in ESR 91 but nsLayoutUtils::PaintFrame() is not, so that's worth investigating.

And here we have something substantial. In ESR 91 mIsActive is set to false. As a result the Paint() method is being exited almost immediately because of this:
  if (!mIsActive) {
    return;
  }
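This explains why PresShell::Paint() shows up on ESR 91 while nsLayoutUtils::PaintFrame() never does. Here's a deliberately simplified, hypothetical model of that guard (PresShellModel and its counter are made up for illustration; the real Paint() does much more), showing that an inactive shell returns before the frame is ever painted.

```cpp
// Hypothetical model of the early-return guard in PresShell::Paint().
struct PresShellModel {
  bool mIsActive = false;
  int mPaintFrameCalls = 0;  // Counts how often "PaintFrame" would run.

  void PaintFrame() { ++mPaintFrameCalls; }

  void Paint() {
    if (!mIsActive) {
      return;  // Bail out early, as seen on ESR 91.
    }
    PaintFrame();  // Only reached when the shell is active.
  }
};
```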
I've set a breakpoint on PresShell::SetIsActive() and on ESR 91 I get a backtrace like this:
(gdb) bt
#0  mozilla::PresShell::SetIsActive (this=this@entry=0x7f889b4c30, aIsActive=false)
    at layout/base/PresShell.cpp:10865
#1  0x0000007fbc2236e8 in mozilla::PresShell::ActivenessMaybeChanged
    (this=this@entry=0x7f889b4c30)
    at layout/base/PresShell.cpp:10794
#2  0x0000007fbc247dbc in mozilla::PresShell::Init (this=this@entry=0x7f889b4c30, aPresContext=aPresContext@entry=0x7f885e0f10,
    aViewManager=aViewManager@entry=0x7f885e1a60) at layout/base/PresShell.cpp:1035
#3  0x0000007fbab910ac in mozilla::dom::Document::CreatePresShell
    (this=0x7f885dcec0, aContext=0x7f885e0f10, aViewManager=0x7f885e1a60)
    at dom/base/Document.cpp:6637
#4  0x0000007fbc27b7d4 in nsDocumentViewer::InitPresentationStuff
    (this=this@entry=0x7f885d9c60, aDoInitialReflow=aDoInitialReflow@entry=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:289
#5  0x0000007fbc27d4d8 in nsDocumentViewer::InitInternal (this=0x7f885d9c60,
    aParentWidget=, aState=aState@entry=0x0, aActor=0x0,
    aBounds=..., aDoCreation=aDoCreation@entry=true,
    aNeedMakeCX=aNeedMakeCX@entry=true,
    aForceSetNewDocument=aForceSetNewDocument@entry=true)
    at layout/base/nsDocumentViewer.cpp:913
#6  0x0000007fbc27d6a4 in nsDocumentViewer::Init (this=,
    aParentWidget=, aBounds=..., aActor=)
    at layout/base/nsDocumentViewer.cpp:682
#7  0x0000007fbc969a1c in nsDocShell::SetupNewViewer
    (this=this@entry=0x7f88915160, aNewViewer=aNewViewer@entry=0x7f885d9c60,
    aWindowActor=aWindowActor@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#8  0x0000007fbc96e740 in nsDocShell::Embed (this=this@entry=0x7f88915160,
    aContentViewer=0x7f885d9c60, aWindowActor=aWindowActor@entry=0x0,
    aIsTransientAboutBlank=aIsTransientAboutBlank@entry=true,
    aPersist=aPersist@entry=false)
    at docshell/base/nsDocShell.cpp:5552
#9  0x0000007fbc96ebb4 in nsDocShell::CreateAboutBlankContentViewer
    (this=this@entry=0x7f88915160, aPrincipal=aPrincipal@entry=0x0,
    aPartitionedPrincipal=aPartitionedPrincipal@entry=0x0, aCSP=,
    aBaseURI=0x0, aCOEP=..., aTryToSaveOldPresentation=,
    aTryToSaveOldPresentation@entry=true, aCheckPermitUnload=
    aCheckPermitUnload@entry=true, aActor=, aActor@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#10 0x0000007fbc96f020 in nsDocShell::EnsureContentViewer
    (this=this@entry=0x7f88915160)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/MaybeStorageBase.h:79
#11 0x0000007fbc96f9f0 in nsDocShell::GetDocument (this=0x7f88915160) at
    docshell/base/nsDocShell.cpp:3053
#12 0x0000007fbaad8780 in nsPIDOMWindowOuter::MaybeCreateDoc (this=)
    at dom/base/nsGlobalWindowOuter.cpp:7678
#13 0x0000007fbaad8aa4 in non-virtual thunk to nsGlobalWindowOuter::WrapObject
    (JSContext*, JS::Handle) ()
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/js/HeapAPI.h:727
#14 0x0000007fba57190c in XPCConvert::NativeInterface2JSObject
    (cx=cx@entry=0x7f88233350, d=d@entry=..., aHelper=..., iid=iid@entry=0x7f9f4eb410,
    allowNativeWrapper=allowNativeWrapper@entry=true, pErr=pErr@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/js/RootingAPI.h:593
#15 0x0000007fba572a44 in XPCConvert::NativeData2JS (cx=cx@entry=0x7f88233350,
    d=d@entry=..., s=s@entry=0x7f9f4eb588, type=...,
    iid=iid@entry=0x7f9f4eb410, arrlen=, pErr=pErr@entry=0x0)
    at js/xpconnect/src/XPCConvert.cpp:355
#16 0x0000007fba591ac0 in nsXPCWrappedJS::CallMethod (this=,
    methodIndex=, info=0x7fbe37e340 , 
    nativeParams=0x7f9f4eb588) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    js/RootingAPI.h:1320
#17 0x0000007fb9e2b140 in PrepareAndDispatch (self=0x7f8849f8e0,
    methodIndex=, args=, gprData=0x7f9f4eb640, 
    fprData=0x7f9f4eb600) at xpcom/reflect/xptcall/md/unix/xptcstubs_aarch64.cpp:190
#18 0x0000007fb9e2a110 in SharedStub () at xpcom/reflect/xptcall/md/unix/
    xptcstubs_asm_aarch64.S:47
#19 0x0000007fb9dab038 in nsObserverList::NotifyObservers (this=,
    aSubject=aSubject@entry=0x7f8854c650,
    aTopic=aTopic@entry=0x7fbe2cea78 "embedliteviewcreated",
    someData=someData@entry=0x0)
    at xpcom/ds/nsTArray.h:413
#20 0x0000007fb9db6718 in nsObserverService::NotifyObservers (this=0x7f88046b10,
    aSubject=0x7f8854c650, aTopic=0x7fbe2cea78 "embedliteviewcreated",
    aSomeData=0x0) at xpcom/ds/nsObserverService.cpp:291
#21 0x0000007fbcc916f8 in mozilla::embedlite::EmbedLiteViewChild::
    InitGeckoWindow (this=0x7f885d7b40, parentId=,
    parentBrowsingContext=, isPrivateWindow=,
    isDesktopMode=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
[...]
#47 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb)
In contrast, on ESR 78 we have the following:
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::PresShell::SetIsActive
    (this=this@entry=0x7fb864a250, aIsActive=aIsActive@entry=true)
    at layout/base/PresShell.cpp:10724
10724   nsresult PresShell::SetIsActive(bool aIsActive) {
(gdb) bt
#0  mozilla::PresShell::SetIsActive (this=this@entry=0x7fb864a250,
    aIsActive=aIsActive@entry=true)
    at layout/base/PresShell.cpp:10724
#1  0x0000007ff4acae90 in nsDocShell::SetIsActive (this=0x7fb8cea3e0,
    aIsActive=true)
    at docshell/base/nsDocShell.cpp:4607
#2  0x0000007ff4d9291c in mozilla::embedlite::EmbedLiteViewChild::
    RecvSetIsActive (this=0x7fb8cdf500, aIsActive=@0x7fde8d68f0: true)
    at mobile/sailfishos/embedshared/EmbedLiteViewChild.cpp:623
#3  0x0000007ff23d3a00 in mozilla::embedlite::PEmbedLiteViewChild::
    OnMessageReceived (this=0x7fb8cdf500, msg__=...) at
    PEmbedLiteViewChild.cpp:1158
#4  0x0000007ff23c12f0 in mozilla::embedlite::PEmbedLiteAppChild::
    OnMessageReceived (this=, msg__=...)
    at obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:866
[...]
#26 0x0000007fef65b89c in ?? () from /lib64/libc.so.6
(gdb) 
I wondered whether EmbedLiteViewChild::RecvSetIsActive() was being called at all on ESR 91. Well, it turns out it is, and with the value true, but the value isn't sticking! I wonder why not.

This is all very messy, and needs a clear head. But it's now late here and my head is anything but clear. So time to leave this until tomorrow.

As always, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
13 Oct 2023 : Day 58 #
This morning I'm on the train to work, having mulled over the crash from yesterday. The line that's causing trouble, right at the top of the stack, is this one:
    auto kind = widget::ThemeChangeKind(aData[0]);
    ThemeChanged(kind);
I've intentionally included the line before as well, because the segmentation fault doesn't appear to be happening inside ThemeChanged() but on the line before it. Maybe the compiler optimised the two lines together?

Looking at the parameters passed on the stack to PresShell::Observe(), we can see that aData is a null pointer, so reading aData[0] is going to cause a crash.
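To make the failure mode concrete, here's a minimal standalone sketch (not the actual PresShell code) of what reading the kind out of aData amounts to, alongside a defensive variant. The enum values mirror Gecko's widget::ThemeChangeKind; the function names are hypothetical, and falling back to StyleAndLayout when no payload arrives is my own assumption about a sensible default.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the values of widget::ThemeChangeKind from ThemeChangeKind.h.
enum class ThemeChangeKind : uint8_t {
  MediaQueriesOnly = 0,
  Style = 1 << 0,
  Layout = 1 << 1,
  StyleAndLayout = Style | Layout,
};

// Roughly what the ESR 91 observer does: read aData[0] unconditionally.
// If aData is null this is undefined behaviour, typically a SIGSEGV.
ThemeChangeKind KindFromDataUnchecked(const char16_t* aData) {
  return ThemeChangeKind(aData[0]);
}

// Hypothetical defensive variant: treat a missing payload as the most
// conservative kind, so both style and layout get recomputed.
ThemeChangeKind KindFromDataOrDefault(const char16_t* aData) {
  if (!aData) {
    return ThemeChangeKind::StyleAndLayout;
  }
  return ThemeChangeKind(aData[0]);
}

void Demo() {
  const char16_t payload[] = {char16_t(ThemeChangeKind::Style), 0};
  assert(KindFromDataUnchecked(payload) == ThemeChangeKind::Style);
  assert(KindFromDataOrDefault(nullptr) == ThemeChangeKind::StyleAndLayout);
  // KindFromDataUnchecked(nullptr) would dereference a null pointer.
}
```

A guard like this would hide the symptom rather than fix the sender, which is why it's worth tracking down where the null payload comes from.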

This is a "look-and-feel-changed" message, and the fact that it has only started happening since we sorted out some of the JavaScript suggests that it's being triggered by something in the JavaScript. That doesn't necessarily mean that's where the error originates, of course. It pleases me that the "look and feel" code is part of the code I had to mess around with earlier. That means it's something of a known quantity.

The fact that this is a "received" message means its origin is going to be masked by the message queue, so there's no point looking further down the call stack to try to work out where it was sent from. Instead I'm going to grep the code for "look-and-feel-changed" and see what the possibilities are.

It's worth noting that there aren't any instances of this in the embedlite-embedding code, so this didn't come directly from the JavaScript we were just hacking around with.

There are quite a few in the gecko code though:
$ grep -rIn "\"look-and-feel-changed\"" *
gecko-dev/layout/base/PresShell.cpp:1004:
  os->AddObserver(this, "look-and-feel-changed", false);
gecko-dev/layout/base/PresShell.cpp:1337:
  os->RemoveObserver(this, "look-and-feel-changed");
gecko-dev/layout/base/PresShell.cpp:9873:
  if (!nsCRT::strcmp(aTopic, "look-and-feel-changed")) {
gecko-dev/browser/base/content/browser.js:8677:
  Services.obs.addObserver(this, "look-and-feel-changed");
gecko-dev/browser/base/content/browser.js:8694:
  Services.obs.removeObserver(this, "look-and-feel-changed");
gecko-dev/browser/base/content/browser.js:8702:
  if (topic != "look-and-feel-changed") {
gecko-dev/widget/nsXPLookAndFeel.cpp:902:
  obs->NotifyObservers(nullptr, "look-and-feel-changed", kind);
gecko-dev/widget/qt/nsLookAndFeel.cpp:94:
  obs->NotifyObservers(nullptr, "look-and-feel-changed", nullptr);
rpm/0093-sailfishos-gecko-Add-support-for-prefers-color-schem.patch:95:
  +            obs->NotifyObservers(nullptr, "look-and-feel-changed", nullptr);
Only the last three of these are actually of interest. It's worth noting that patch 0093, "Add support for prefers-color-scheme", is one of the ones we've applied. It adds support for dark mode, which doesn't seem essential for the build to complete, so I'm not sure why I applied it, but there it is.

The line in nsLookAndFeel.cpp, which also happens to be one of the lines added by the patch, is notable in that it passes nullptr as the data value. That could be the null value that PresShell is choking on.

The patch hasn't changed since ESR 78, so I'm curious to know why it didn't cause trouble in previous versions. The answer lies in the PresShell code from ESR 78:
  if (!nsCRT::strcmp(aTopic, "look-and-feel-changed")) {
    ThemeChanged();
    return NS_OK;
  }
Notice there's no kind parameter being passed in here. That means code that wasn't expecting a data parameter before is expecting one now. Let's find out why that is.
$ git blame layout/base/PresShell.cpp -L 9873
bb457d5b09e9a layout/base/PresShell.cpp
  (Emilio Cobos Álvarez  2020-05-22 23:03:43 +0000  9873)
  if (!nsCRT::strcmp(aTopic, "look-and-feel-changed")) {
d622f54db0014 layout/base/PresShell.cpp
  (Emilio Cobos Álvarez  2020-10-27 10:24:40 +0000  9874)
  // See how LookAndFeel::NotifyChangedAllWindows encodes this.
0080b2b390d92 layout/base/PresShell.cpp
  (Emilio Cobos Álvarez  2021-05-13 01:53:08 +0200  9875)
  auto kind = widget::ThemeChangeKind(aData[0]);
d622f54db0014 layout/base/PresShell.cpp
  (Emilio Cobos Álvarez  2020-10-27 10:24:40 +0000  9876)
  ThemeChanged(kind);
7f498141e7939 layout/base/PresShell.cpp
  (Adam Gashlin          2019-09-19 19:05:01 +0000  9877)
  return NS_OK;
7f498141e7939 layout/base/PresShell.cpp
  (Adam Gashlin          2019-09-19 19:05:01 +0000  9878)
  }

$ git log -1 d622f54db0014
commit d622f54db00143472ac60aecf7e18728c6a5908c
Author: Emilio Cobos Álvarez 
Date:   Tue Oct 27 10:24:40 2020 +0000

    Bug 1668875 - Distinguish theme changes that can and cannot affect style/layout. r=tnikkel
    
    This should make the optimization landed earlier in this bug apply for
    some of the NotifyThemeChanged() calls in nsWindow.cpp which are causing
    all the extra invalidations.
    
    If we know that system colors/fonts didn't change, we can avoid doing a
    bunch of reflow work and the patch from earlier in the bug can avoid
    re-rasterizing images too.
    
    Differential Revision: https://phabricator.services.mozilla.com/D94425
From this commit's diff we can see there's a new file, widget/ThemeChangeKind.h, with a new enum:
enum class ThemeChangeKind : uint8_t {
  // This is the cheapest change, no need to forcibly recompute style and/or
  // layout.
  MediaQueriesOnly = 0,
  // Style needs to forcibly be recomputed because some of the stuff that may
  // have changed, like system colors, are reflected in the computed style but
  // not in the specified style.
  Style = 1 << 0,
  // Layout needs to forcibly be recomputed because some of the stuff that may
  // have changed is layout-dependent, like system font.
  Layout = 1 << 1,
  // The union of the two flags above.
  StyleAndLayout = Style | Layout,
  // For IPC serialization purposes.
  AllBits = Style | Layout,
};
The new parameter is supposed to be set to one of these enum values.
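According to the commit message, the point of the split is to skip unnecessary restyle and reflow work. Here's a small sketch of how bit-flag values like these are typically tested. HasFlag(), ThemeWork and WorkFor() are hypothetical names for illustration, not Gecko functions.

```cpp
#include <cassert>
#include <cstdint>

enum class ThemeChangeKind : uint8_t {
  MediaQueriesOnly = 0,
  Style = 1 << 0,
  Layout = 1 << 1,
  StyleAndLayout = Style | Layout,
  AllBits = Style | Layout,
};

// Hypothetical helper: true if all bits of aFlag are set in aKind.
constexpr bool HasFlag(ThemeChangeKind aKind, ThemeChangeKind aFlag) {
  return (uint8_t(aKind) & uint8_t(aFlag)) == uint8_t(aFlag);
}

// Hypothetical summary of how much work a given kind would force.
struct ThemeWork {
  bool restyle;
  bool reflow;
};

constexpr ThemeWork WorkFor(ThemeChangeKind aKind) {
  return ThemeWork{HasFlag(aKind, ThemeChangeKind::Style),
                   HasFlag(aKind, ThemeChangeKind::Layout)};
}

void Demo() {
  // MediaQueriesOnly is the cheap case: neither flag is set.
  assert(!WorkFor(ThemeChangeKind::MediaQueriesOnly).restyle);
  assert(!WorkFor(ThemeChangeKind::MediaQueriesOnly).reflow);
  // Something like a system colour change only needs a restyle.
  assert(WorkFor(ThemeChangeKind::Style).restyle);
  assert(!WorkFor(ThemeChangeKind::Style).reflow);
  // StyleAndLayout forces both.
  assert(WorkFor(ThemeChangeKind::StyleAndLayout).restyle);
  assert(WorkFor(ThemeChangeKind::StyleAndLayout).reflow);
}
```

Because MediaQueriesOnly is zero it has no bits set, so it naturally falls through to the cheapest path.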

Rather than setting it explicitly, there's now a LookAndFeel::NotifyChangedAllWindows() method that does this. It's static but also public, and it's inherited by nsXPLookAndFeel, which is itself inherited by our Qt nsLookAndFeel class, so we should be able to call it directly.

So I have replaced this:
obs->NotifyObservers(nullptr, "look-and-feel-changed", nullptr);
With this new call, which essentially wraps the same code in a function but passes a valid kind:
NotifyChangedAllWindows(widget::ThemeChangeKind::StyleAndLayout);
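The blame output above points at LookAndFeel::NotifyChangedAllWindows for how the kind gets encoded. Here's a self-contained model of that round trip, assuming (as the aData[0] read in PresShell implies) that the kind is sent as a short char16_t string whose first character holds the enum value. ObserverServiceSketch and the other *Sketch names are hypothetical stand-ins, not Gecko APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

enum class ThemeChangeKind : uint8_t {
  MediaQueriesOnly = 0,
  Style = 1 << 0,
  Layout = 1 << 1,
  StyleAndLayout = Style | Layout,
};

// Simplified stand-in for nsIObserverService: observers receive a topic
// and an optional UTF-16 payload, synchronously.
using Observer = std::function<void(const char* aTopic, const char16_t* aData)>;

struct ObserverServiceSketch {
  std::vector<Observer> mObservers;
  void NotifyObservers(const char* aTopic, const char16_t* aData) {
    for (auto& observer : mObservers) {
      observer(aTopic, aData);
    }
  }
};

// Sketch of the encoding PresShell expects: the kind travels as the first
// character of a null-terminated char16_t string.
void NotifyChangedAllWindowsSketch(ObserverServiceSketch& aService,
                                   ThemeChangeKind aKind) {
  const char16_t kind[] = {char16_t(aKind), 0};
  aService.NotifyObservers("look-and-feel-changed", kind);
}

// Consumer side, modelled on PresShell::Observe(): decode aData[0].
ThemeChangeKind gLastKind = ThemeChangeKind::MediaQueriesOnly;

void PresShellObserveSketch(const char* aTopic, const char16_t* aData) {
  if (std::strcmp(aTopic, "look-and-feel-changed") == 0) {
    // Safe here because the sender always provides a payload; passing
    // nullptr instead is exactly what made the real line crash.
    gLastKind = ThemeChangeKind(aData[0]);
  }
}

void Demo() {
  ObserverServiceSketch service;
  service.mObservers.push_back(PresShellObserveSketch);
  NotifyChangedAllWindowsSketch(service, ThemeChangeKind::StyleAndLayout);
  assert(gLastKind == ThemeChangeKind::StyleAndLayout);
}
```

The design point is that sender and receiver now share an implicit contract about the payload, which the old Qt code, written before the contract existed, doesn't honour.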
It's time to set things compiling. Building using mach would take at least a couple of hours, so I'm using the shortcut described on the "Working with Browser" page, for both the compile step and the libxul.so creation step. This should be a lot quicker.

The build is indeed quicker, although copying the library, including all the debug symbols, over to my phone took an age. Admittedly not a seven-hour age, but still quite some time.

On running the browser, many (but not all) of the JavaScript errors are now gone. There's no segfault any more and no other crashes. The rendering still doesn't work, but at least we now have logging:
$ EMBED_CONSOLE=1 sailfish-browser
[D] unknown:0 - Using Wayland-EGL
library "libGLESv2_adreno.so" not found
library "eglSubDriverAndroid.so" not found
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
Created LOG for EmbedLiteTrace
[...]
CONSOLE message:
OpenGL compositor Initialized Succesfully.
Version: OpenGL ES 3.2 V@415.0 (GIT@248cd04, I42b5383e2c, 1569430435) (Date:09/25/19)
Vendor: Qualcomm
Renderer: Adreno (TM) 610
FBO Texture Target: TEXTURE_2D
JavaScript error: file:///usr/lib64/mozembedlite/components/
  EmbedLiteChromeManager.js, line 213: NS_ERROR_FILE_NOT_FOUND: 
JSScript: ContextMenuHandler.js loaded
JSScript: SelectionPrototype.js loaded
JSScript: SelectionHandler.js loaded
JSScript: SelectAsyncHelper.js loaded
JSScript: FormAssistant.js loaded
JSScript: InputMethodHandler.js loaded
EmbedHelper init called
[...]
Tomorrow I'll continue working on the rendering pipeline.

As always, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
12 Oct 2023 : Day 57 #
We continue today with the render pipeline, following the application of a few additional patches yesterday. We also set the values correctly for a couple of preferences (or, more accurately, changed the hard-coded values that are currently acting as proxies for them).

After building packages overnight, I've been testing them this morning. Yesterday the initial crash happened due to an attempt to call a SwapChain method before the SwapChain object had been created. Recall that this resulted in a segmentation fault with the following backtrace:
Thread 35 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 9321]
mozilla::gl::SwapChain::OffscreenSize (this=0x0) at 
    /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/gl/GLScreenBuffer.cpp:129
129       return mPresenter->mBackBuffer->mFb->mSize;
(gdb) bt
#0  mozilla::gl::SwapChain::OffscreenSize (this=0x0) at
    /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/gl/GLScreenBuffer.cpp:129
#1  0x0000007fbcc8149c in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7f8859d7f0, aId=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#2  0x0000007fba8d1bec in mozilla::layers::CompositorVsyncScheduler::
    ForceComposeToTarget (this=0x7f88737560, aTarget=aTarget@entry=0x0, 
    aRect=aRect@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h:82
#3  0x0000007fba8d1c48 in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=this@entry=0x7f8859d7f0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#4  0x0000007fba8d1cd4 in mozilla::layers::CompositorBridgeParent::
    ResumeCompositionAndResize (this=0x7f8859d7f0, x=, y=, 
    width=, height=) at /usr/src/debug/
    xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/CompositorBridgeParent.cpp:794
I wasn't able to replicate this backtrace on ESR 78, and the reason turned out to be that ESR 91 was set to render off-screen.

After applying the patches and preference fixes there's no longer a crash. But do we still get this backtrace? I've abridged the output for clarity, but this is what happens:
$ EMBED_CONSOLE=1 gdb sailfish-browser
GNU gdb (GDB) Mer (8.2.1+git9)
(gdb) b EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget
(gdb) r
Starting program: /usr/bin/sailfish-browser 
[...]
Thread 34 "Compositor" hit Breakpoint 1, non-virtual thunk to mozilla::
  embedlite::EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget
  (mozilla::layers::BaseTransactionId) ()
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedthread/EmbedLiteCompositorBridgeParent.h:58
58	  virtual void CompositeToDefaultTarget(VsyncId aId) override;
(gdb) bt
#0  non-virtual thunk to mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget(mozilla::layers::BaseTransactionId
    ) () at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/
    mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.h:58
#1  0x0000007fba8d1bec in mozilla::layers::CompositorVsyncScheduler::
    ForceComposeToTarget (this=0x7f88709ef0, aTarget=aTarget@entry=0x0, 
    aRect=aRect@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h:82
#2  0x0000007fba8d1c48 in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=this@entry=0x7f8877d1c0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#3  0x0000007fba8d1cd4 in mozilla::layers::CompositorBridgeParent::
    ResumeCompositionAndResize (this=0x7f8877d1c0, x=,
    y=, width=, height=) at
    /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:794
#4  0x0000007fba8ca870 in mozilla::detail::RunnableMethodArguments::applyImpl, StoreCopyPassByConstLRef,
    StoreCopyPassByConstLRef, StoreCopyPassByConstLRef, 0ul, 1ul, 2ul,
    3ul> (args=..., m=, o=)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
[...]
(gdb) 
I'm a little surprised to see that this still matches up with the path from yesterday: ResumeCompositionAndResize() calling ResumeComposition() calling ForceComposeToTarget() calling CompositeToDefaultTarget().

By comparison, on ESR 78 the first hit is ResumeComposition() calling CompositeToDefaultTarget() directly:
Thread 40 "Compositor" hit Breakpoint 1, non-virtual thunk to mozilla::
    embedlite::EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget
    (mozilla::layers::BaseTransactionId) ()
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/mobile/
    sailfishos/embedthread/EmbedLiteCompositorBridgeParent.h:58
58        virtual void CompositeToDefaultTarget(VsyncId aId) override;
(gdb) bt
#0  non-virtual thunk to mozilla::embedlite::EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget
    (mozilla::layers::BaseTransactionId) () at /usr/src/
    debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/mobile/sailfishos/
    embedthread/EmbedLiteCompositorBridgeParent.h:58
#1  0x0000007ff2a729b0 in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=0x7fb89be110)
    at /home/abuild/rpmbuild/BUILD/xulrunner-qt5-78.15.1+git33.2/
    obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#2  0x0000007ff2a662ec in mozilla::detail::RunnableMethodArguments::applyImpl, StoreCopyPassByConstLRef,
    StoreCopyPassByConstLRef, StoreCopyPassByConstLRef, 0ul, 1ul, 2ul,
    3ul> (args=..., m=, o=)
    at /home/abuild/rpmbuild/BUILD/xulrunner-qt5-78.15.1+git33.2/
    obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1188
[...]
#15 0x0000007fef65b89c in ?? () from /lib64/libc.so.6
(gdb) 
It's always hard to follow the call stack back beyond any of the Runnable steps, because these are events that may have been scheduled from a separate thread. Looking at the code, though, the resize appears to be triggered by a call to CompositorBridgeParent::ScheduleResumeOnCompositorThread(), which resizes or not depending on whether it's given dimension arguments.
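To see why the trail goes cold at the Runnable frames, here's a toy single-threaded model of dispatching work to the compositor thread. It's a simplification under stated assumptions: TaskQueue and BridgeSketch are hypothetical, the real CompositorBridgeParent also involves locking and a real thread, and I'm assuming the no-argument and four-argument forms map to "resume only" and "resize then resume" respectively. The resize path calling ResumeComposition() mirrors the backtraces above.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the compositor thread's event loop: tasks are
// queued now and run later, so the original caller is gone from the stack.
struct TaskQueue {
  std::queue<std::function<void()>> tasks;
  void Dispatch(std::function<void()> aTask) { tasks.push(std::move(aTask)); }
  void RunAll() {
    while (!tasks.empty()) {
      std::function<void()> task = std::move(tasks.front());
      tasks.pop();
      task();
    }
  }
};

// Simplified model of the two scheduling paths; not the real class.
struct BridgeSketch {
  TaskQueue& compositorThread;
  std::vector<std::string> calls;

  void ResumeComposition() { calls.push_back("ResumeComposition"); }

  void ResumeCompositionAndResize(int aX, int aY, int aWidth, int aHeight) {
    calls.push_back("ResumeCompositionAndResize");
    // A real implementation would apply the new geometry here.
    (void)aX; (void)aY; (void)aWidth; (void)aHeight;
    ResumeComposition();
  }

  // No dimensions: just resume.
  void ScheduleResumeOnCompositorThread() {
    compositorThread.Dispatch([this] { ResumeComposition(); });
  }

  // With dimensions: resize, then resume.
  void ScheduleResumeOnCompositorThread(int aX, int aY, int aWidth, int aHeight) {
    compositorThread.Dispatch([this, aX, aY, aWidth, aHeight] {
      ResumeCompositionAndResize(aX, aY, aWidth, aHeight);
    });
  }
};

void Demo() {
  TaskQueue compositorThread;
  BridgeSketch bridge{compositorThread};
  bridge.ScheduleResumeOnCompositorThread(0, 0, 1080, 2520);
  assert(bridge.calls.empty());  // nothing runs until the loop spins
  compositorThread.RunAll();
  assert((bridge.calls == std::vector<std::string>{
              "ResumeCompositionAndResize", "ResumeComposition"}));
}
```

By the time the queued task runs, the code that called ScheduleResumeOnCompositorThread() has long since returned, which is why a breakpoint on the scheduling method itself is the way to find out who asked for the resume.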

So I've placed a breakpoint on this method and kicked off the execution again for both versions.

On the ESR 91 side I get this:
Thread 1 "sailfish-browse" hit Breakpoint 2, mozilla::layers::
    CompositorBridgeParent::ScheduleResumeOnCompositorThread
    (this=this@entry=0x7f8877d670, 
    x=0, y=0, width=1080, height=2520) at /usr/src/debug/
    xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/CompositorBridgeParent.cpp:829
829	                                                              int height) {
(gdb) bt
#0  mozilla::layers::CompositorBridgeParent::ScheduleResumeOnCompositorThread
    (this=this@entry=0x7f8877d670, x=0, y=0, width=1080, height=2520)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:829
#1  0x0000007fbcc81b74 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    ResumeRendering (this=0x7f8877d670)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedthread/EmbedLiteCompositorBridgeParent.cpp:295
#2  0x0000007fbcc99ccc in mozilla::embedlite::EmbedLiteWindowParent::
    ResumeRendering (this=)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedshared/EmbedLiteWindowParent.cpp:100
#3  0x0000007fbcc84688 in mozilla::embedlite::EmbedLiteWindow::
    ResumeRendering (this=)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    EmbedLiteWindow.cpp:70
#4  0x000000555559004c in _start ()
(gdb) 
While on the ESR 78 side I get this:
Thread 1 "sailfish-browse" hit Breakpoint 2, mozilla::layers::
    CompositorBridgeParent::ScheduleResumeOnCompositorThread
    (this=this@entry=0x7fb89c5e20, 
    x=0, y=0, width=1080, height=2520) at /usr/src/debug/
    xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:814
814       MonitorAutoLock lock(mResumeCompositionMonitor);
(gdb) bt
#0  mozilla::layers::CompositorBridgeParent::ScheduleResumeOnCompositorThread
    (this=this@entry=0x7fb89c5e20, x=0, y=0, width=1080, height=2520)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/
    layers/ipc/CompositorBridgeParent.cpp:814
#1  0x0000007ff4d83188 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    ResumeRendering (this=0x7fb89c5e20)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/mobile/
    sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:279
#2  0x000000555559004c in _start ()
(gdb) 
It's also notable that on the ESR 78 side I'm getting this, which isn't appearing for ESR 91:
OpenGL compositor Initialized Succesfully.
Version: OpenGL ES 3.2 V@0502.0 (GIT@704ecd9a2b, Ib3f3e69395, 1609240670) (Date:12/29/20)
Vendor: Qualcomm
Renderer: Adreno (TM) 619
FBO Texture Target: TEXTURE_2D
These are all hinting at significant and important differences between the two. Plenty to look into.

But I have to stop for a bit; I'll return to this later.

[...]

Now I'm back and trying to focus again. I've placed a couple of breakpoints on CompositorOGL::Initialize(), which is where the "OpenGL compositor Initialized Succesfully" text is being printed from. My guess is that this will fire on ESR 78 but not on ESR 91. I'm hoping that the ESR 78 backtrace will offer up some hints.

In fact the breakpoint fires for both executables. Here's the ESR 91 backtrace:
Thread 34 "Compositor" hit Breakpoint 2, mozilla::layers::CompositorOGL::
    Initialize (this=0x7eb8110650, out_failureReason=0x7f9c078560)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
    CompositorOGL.cpp:380
380	bool CompositorOGL::Initialize(nsCString* const out_failureReason) {
(gdb) bt
#0  mozilla::layers::CompositorOGL::Initialize (this=0x7eb8110650,
    out_failureReason=0x7f9c078560)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
    CompositorOGL.cpp:380
#1  0x0000007fba8e03c4 in mozilla::layers::CompositorBridgeParent::NewCompositor
    (this=this@entry=0x7f884c91c0, aBackendHints=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:1493
#2  0x0000007fba8eb440 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7f884c91c0, aBackendHints=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:1436
#3  0x0000007fba8eb570 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7f884c91c0,
    aBackendHints=..., aId=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:1546
#4  0x0000007fbcc81dac in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f884c91c0, aBackendHints=..., 
    aId=...) at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedthread/EmbedLiteCompositorBridgeParent.cpp:86
#5  0x0000007fba27f8a4 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7f884c91c0, msg__=...) at
    PCompositorBridgeParent.cpp:1285
[...]
#20 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
The backtrace from ESR 78 is basically identical, just with slightly different line numbers.

Stepping through the method makes clear that it is firing the same debug output on both after all. For some reason it's not showing on the ESR 91 version, but it is getting logged. So this isn't the smoking gun I thought it might be.

This isn't quite a dead end, though: it raises a new question about log output. Logging is clearly broken in ESR 91, and it's worth getting it working before proceeding, since it could pay dividends in the long run. So I'm diverting briefly to try to address this.

Let's get straight into it. When we execute the browser we get immediate errors that look like this:
JavaScript error: file:///usr/lib64/mozembedlite/components/
  EmbedLiteConsoleListener.js, line 251: TypeError:
  XPCOMUtils.generateNSGetFactory is not a function
JavaScript error: file:///usr/lib64/mozembedlite/components/
  ContentPermissionManager.js, line 94: TypeError:
  XPCOMUtils.generateNSGetFactory is not a function
These are going to stop a lot of the front-end and logging capabilities from working, even if they wouldn't necessarily stop the browser from rendering.

Looking through the code, it quickly becomes clear that XPCOMUtils.generateNSGetFactory() has been moved from the XPCOMUtils.jsm file to the ComponentUtils.jsm file. Therefore we have to add one line and amend another, so that instead of this:
const { XPCOMUtils } = ChromeUtils.import("resource://gre/modules/XPCOMUtils.jsm");
const { Services } = ChromeUtils.import("resource://gre/modules/Services.jsm");
[...]
this.NSGetFactory = XPCOMUtils.generateNSGetFactory([$EmbedLiteConsoleListener]);
We have this:
const { ComponentUtils } = ChromeUtils.import("resource://gre/modules/ComponentUtils.jsm");
const { XPCOMUtils } = ChromeUtils.import("resource://gre/modules/XPCOMUtils.jsm");
const { Services } = ChromeUtils.import("resource://gre/modules/Services.jsm");
[...]
this.NSGetFactory = ComponentUtils.generateNSGetFactory([$EmbedLiteConsoleListener]);
Just changing this in EmbedLiteConsoleListener.js is enough to get logging working. For good measure, though, I should quickly fix it in all of the files shown in the logs. So that's all of these files that were generating the error:
EmbedLiteConsoleListener.js
ContentPermissionManager.js
EmbedLiteChromeManager.js
EmbedLiteErrorPageHandler.js
EmbedLiteFaviconService.js
EmbedLiteOrientationChangeHandler.js
EmbedLiteSearchEngine.js
EmbedLiteSyncService.js
EmbedLiteWebrtcUI.js
EmbedPrefService.js
EmbedliteDownloadManager.js
LoginsHelper.js
PrivateDataManager.js
UserAgentOverrideHelper.js
After updating these files the debug output is both cleaner and more informative. But the browser also now crashes with the following output:
$ EMBED_CONSOLE=1 sailfish-browser
[D] unknown:0 - Using Wayland-EGL
library "libGLESv2_adreno.so" not found
library "eglSubDriverAndroid.so" not found
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
Created LOG for EmbedLiteTrace
[...]
OpenGL compositor Initialized Succesfully.
Version: OpenGL ES 3.2 V@415.0 (GIT@248cd04, I42b5383e2c, 1569430435) (Date:09/25/19)
Vendor: Qualcomm
Renderer: Adreno (TM) 610
FBO Texture Target: TEXTURE_2D
Segmentation fault (core dumped)
And here's the backtrace:
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 7247]
mozilla::PresShell::Observe (this=0x7f8895ec90, aSubject=,
    aTopic=0x7fbe27b780 "look-and-feel-changed", aData=0x0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/base/PresShell.cpp:9876
9876        ThemeChanged(kind);
(gdb) bt
#0  mozilla::PresShell::Observe (this=0x7f8895ec90, aSubject=,
    aTopic=0x7fbe27b780 "look-and-feel-changed", aData=0x0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/layout/base/PresShell.cpp:9876
#1  0x0000007fb9dab038 in nsObserverList::NotifyObservers (this=,
    aSubject=aSubject@entry=0x0, 
    aTopic=aTopic@entry=0x7fbe27b780 "look-and-feel-changed", someData=someData@entry=0x0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/xpcom/ds/nsTArray.h:413
#2  0x0000007fb9db6718 in nsObserverService::NotifyObservers (this=0x7f88046f30,
    aSubject=0x0, aTopic=0x7fbe27b780 "look-and-feel-changed", aSomeData=0x0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/xpcom/ds/nsObserverService.cpp:291
#3  0x0000007fbc0c0060 in nsLookAndFeel::Observer::Observe (this=,
    aTopic=, aData=)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/widget/qt/nsLookAndFeel.cpp:94
#4  0x0000007fb9dab038 in nsObserverList::NotifyObservers (this=,
    aSubject=aSubject@entry=0x0, 
    aTopic=aTopic@entry=0x7f88a092d8 "ambience-theme-changed",
    someData=someData@entry=0x7f889b37c8 u"dark")
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/xpcom/ds/nsTArray.h:413

[...]
#34 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
I can only apologise for all of the backtraces today. These aren't so much rabbit holes as full-blown rabbit warrens. But this is the nature of the work right now, and I'm afraid it's likely to continue to be until this rendering is working. Probably even a lot beyond that.

For now, these horrific backtraces have led us somewhere useful, but actually getting to the bottom of it will have to wait until tomorrow.

As always, if you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
11 Oct 2023 : Day 56 #
This morning I woke up feeling much refreshed. That's just as well, because it's a bit of a long one today (you have been warned!). I checked the build as soon as I got up, but was surprised to find it still running. It had reached the final step of writing out the rpm packages, but wasn't quite there yet. This is either an indication that the build is taking far too long, or that my sleep is too short. Given my bright demeanour this morning, I'm going for the former.

Thankfully by the end of breakfast the build had completed. I'm now scp-ing it over to my phone to test. The changes from yesterday will hopefully have hooked the EmbedLite compositor into the render pipeline. There's a very small chance that it will attempt to render something now. More likely is that this will be only one step on the way to fixing the render. Even more likely yet is that this change will cause the browser to crash. None of these would necessarily be a bad thing.

As the files copy over to my phone I realise I've made a stupid mistake:
xulrunner-qt5-78.15.1+git36+sailfishos.esr91.20231006232636.0ad0f289bdc6+gecko.dev.49148b0b591a-1.aarch64.rpm
xulrunner-qt5-debuginfo-78.15.1+git36+sailfishos.esr91.20231006232636.0ad0f289bdc6+gecko.dev.49148b0b591a-1.aarch64.rpm
xulrunner-qt5-debugsource-78.15.1+git36+sailfishos.esr91.20231006232636.0ad0f289bdc6+gecko.dev.49148b0b591a-1.aarch64.rpm
xulrunner-qt5-devel-78.15.1+git36+sailfishos.esr91.20231006232636.0ad0f289bdc6+gecko.dev.49148b0b591a-1.aarch64.rpm
xulrunner-qt5-misc-78.15.1+git36+sailfishos.esr91.20231006232636.0ad0f289bdc6+gecko.dev.49148b0b591a-1.aarch64.rpm
The version numbers on those packages are all 78.15.1. Aaargh. I've built the correct code, but with the wrong version number.

How does this happen? It's all down to the way Jolla specifies version numbers, which is baked into the Sailfish SDK. As any Sailfish OS developer will know, in order to package up some code into an RPM package you have to create what's called a spec-file. This specifies all sorts of aspects of the packages, including their names, how to build the code, which files to include and so on. It also specifies the version number of the packages.

Jolla chooses not to use the version number in the spec file. Instead Jolla devs tag commits in the git repository with the version number. When you build a package using the Platform or Application SDK, the build tooling will check the latest tag and use that instead of the value in the spec file. I'm not entirely certain why Jolla does it this way. Maybe it's to avoid having to commit changes to the repo just to set the version number. Maybe it's because having a dynamic version number can be helpful. Whatever the reason, this is how it is.

I don't want to tag gecko with the new version number yet because it's not ready. So I prefer to use the version in the spec file during development. There is a way to get sfdk to honour this by setting the no-fix-version configuration option. Do this and the build chain will use the version number in the spec file.

I usually set this configuration option like this:
sfdk config --session --push no-fix-version
That essentially sets the option until you close the shell you're working in. I've been doing it this way because I only want the option to apply to gecko, not to other things I build. So it's convenient to constrain it just to a single session.

Somehow my session changed. Maybe I opened a new shell or switched my GNU screen window; probably it was a consequence of me being so tired. Either way, checking this morning I can see the option hasn't been set.

So the build used the last tag rather than the version in the spec file. I think I need to fix my processes. So rather than using a session configuration value, in future I'm going to add it to my build command, like this:
sfdk -c no-fix-version build -d -p --with git_workaround
I need the build to use the correct version number; otherwise it will mess up the dependencies, possibly mess up where the files are installed, and potentially cause other messy problems. So after it spent all night building, I've just now run this command to do it all over again.

[...]

The build completed (much more quickly this time); I've scp'd the packages over to my phone and installed them. Time for another run.
$ EMBED_CONSOLE=1 sailfish-browser 
[D] unknown:0 - Using Wayland-EGL
library "libGLESv2_adreno.so" not found
library "eglSubDriverAndroid.so" not found
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
Created LOG for EmbedLiteTrace
[W] unknown:0 - Unable to open bookmarks  "/home/defaultuser/.local/share/
  org.sailfishos/browser/bookmarks.json"
[D] onCompleted:105 - ViewPlaceholder requires a SilicaFlickable parent
Created LOG for EmbedLite
JavaScript error: file:///usr/lib64/mozembedlite/components/
  EmbedLiteConsoleListener.js, line 251: TypeError: 
  XPCOMUtils.generateNSGetFactory is not a function
JavaScript error: file:///usr/lib64/mozembedlite/components/
  ContentPermissionManager.js, line 94: TypeError:
  XPCOMUtils.generateNSGetFactory is not a function
[...]
JavaScript error: resource://gre/modules/EnterprisePoliciesParent.jsm, line 500:
  TypeError: Services.appinfo is undefined
JavaScript error: resource://gre/modules/AddonManager.jsm, line 1479:
  NS_ERROR_NOT_INITIALIZED: AddonManager is not initialized
JavaScript error: resource://gre/modules/URLQueryStrippingListService.jsm, line 42:
  TypeError: Services.appinfo is undefined
Created LOG for EmbedPrefs
Created LOG for EmbedLiteLayerManager
JSScript: ContextMenuHandler.js loaded
JSScript: SelectionPrototype.js loaded
JSScript: SelectionHandler.js loaded
JSScript: SelectAsyncHelper.js loaded
JSScript: FormAssistant.js loaded
JSScript: InputMethodHandler.js loaded
EmbedHelper init called
Available locales: en-US, fi, ru
Frame script: embedhelper.js loaded
JavaScript error: chrome://embedlite/content/embedhelper.js, line 259: TypeError: sessionHistory is null
Segmentation fault (core dumped)
The application runs for a bit and starts downloading Jolla's Web site. It gets far enough to download and decode the TLS certificate. But as soon as it hits rendering properly it now crashes with a segmentation fault. This doesn't sound great but it might turn out to be good news. It gives us a lead. If we run it through the debugger we can figure out where the crash is occurring.

So having done that, here's the backtrace.
Thread 35 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 9321]
mozilla::gl::SwapChain::OffscreenSize (this=0x0) at
  /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/gl/GLScreenBuffer.cpp:129
129       return mPresenter->mBackBuffer->mFb->mSize;
(gdb) bt
#0  mozilla::gl::SwapChain::OffscreenSize (this=0x0) at
    /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/gl/GLScreenBuffer.cpp:129
#1  0x0000007fbcc8149c in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7f8859d7f0, aId=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#2  0x0000007fba8d1bec in mozilla::layers::CompositorVsyncScheduler::
    ForceComposeToTarget (this=0x7f88737560, aTarget=aTarget@entry=0x0, 
    aRect=aRect@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h:82
#3  0x0000007fba8d1c48 in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=this@entry=0x7f8859d7f0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#4  0x0000007fba8d1cd4 in mozilla::layers::CompositorBridgeParent::
    ResumeCompositionAndResize (this=0x7f8859d7f0, x=,
    y=, width=, height=) at
    /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/CompositorBridgeParent.cpp:794
#5  0x0000007fba8ca870 in mozilla::detail::RunnableMethodArguments::applyImpl, StoreCopyPassByConstLRef,
    StoreCopyPassByConstLRef, StoreCopyPassByConstLRef, 0ul, 1ul, 2ul,
    3ul> (args=..., m=, o=)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
[...]
#17 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
From this we can see that the crash is happening on line 129 of GLScreenBuffer.cpp, in the SwapChain::OffscreenSize() method (note the this=0x0), which is called from EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget().

Crashes are never good of course, but I'm nevertheless quite happy about this. This is part of the code I had to hack around with to get the library to build, so it's not so surprising that it's causing problems. The issue seems to be that the SwapChain instance hasn't been constructed at all; the pointer to it is set to null. That should be possible to fix.
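To make the failure mode concrete, here's a minimal standalone sketch of what seems to be going on. A smart-pointer member is never assigned, a getter hands out the result of get(), and the caller ends up invoking a method with this set to null. The types below (IntSize, FakeSwapChain, FakeContext) and the SwapChainMatches() helper are hypothetical stand-ins rather than gecko classes, and std::unique_ptr stands in for mozilla::UniquePtr. A null check like this would only avoid the crash; the real fix is to make sure the swap chain actually gets constructed.

```cpp
#include <memory>

// Hypothetical stand-ins for gecko types; not the real gecko classes.
struct IntSize {
  int width = 0;
  int height = 0;
  bool operator==(const IntSize& other) const {
    return width == other.width && height == other.height;
  }
};

struct FakeSwapChain {
  IntSize mSize{1080, 2520};
  // Reads a member, so calling this through a null pointer is undefined
  // behaviour, which in practice typically means a segmentation fault.
  const IntSize& OffscreenSize() const { return mSize; }
};

struct FakeContext {
  // Mirrors the situation with GLContext::mSwapChain: nothing assigns it,
  // so it stays null.
  std::unique_ptr<FakeSwapChain> mSwapChain;
  FakeSwapChain* GetSwapChain() const { return mSwapChain.get(); }
};

// Checks the pointer before calling through it, returning false when there
// is no swap chain instead of crashing.
bool SwapChainMatches(const FakeContext& context, const IntSize& wanted) {
  FakeSwapChain* swapChain = context.GetSwapChain();
  if (!swapChain) {
    return false;
  }
  return swapChain->OffscreenSize() == wanted;
}
```

With mSwapChain left empty the helper returns false rather than dereferencing null; once a swap chain has been created with std::make_unique, the size comparison goes ahead as normal.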

The backtrace doesn't show it, but the call to SwapChain::OffscreenSize() happens on line 160 of EmbedLiteCompositorBridgeParent.cpp. The backtrace attributes that frame to UniquePtr.h instead, which masks it. Here's the problem line:
    if (context->GetSwapChain()->OffscreenSize() != mEGLSurfaceSize
      && !context->GetSwapChain()->Resize(mEGLSurfaceSize)) {
      return;
    }
Presumably GetSwapChain() is returning null. Let's dig into that. Before I changed it this code used to look like this:
    if (context->OffscreenSize() != mEGLSurfaceSize
      && !context->ResizeOffscreen(mEGLSurfaceSize)) {
      return;
    }
Through a roundabout kind of route, I added the context->GetSwapChain() method which looks like this:
  SwapChain* GetSwapChain() const { return mSwapChain.get(); }
The equivalent method used for returning mScreen used to look like this:
  GLScreenBuffer* Screen() const { return mScreen.get(); }
I guess you can see the pattern there. The context->OffscreenSize() method used to look like this:
const gfx::IntSize& GLContext::OffscreenSize() const {
  MOZ_ASSERT(IsOffscreen());
  return mScreen->Size();
}
So in essence what's happened is I've swapped a call (excuse the notational looseness here) to GLContext::mScreen.get()->Size() with a call to GLContext::mSwapChain.get()->OffscreenSize().

What we've drilled down to — and the point I'm rather clumsily trying to make — is that mSwapChain is the new mScreen and at this point in the previous code mScreen contained a live instance. So in the current code mSwapChain should be pointing to a live instance by now as well. But it's not.

Wherever mScreen was being created in the old code, mSwapChain should probably be created in the equivalent place now.

Checking the code, right now it looks like mSwapChain is never constructed. The method I'd expect to create it would be a member of GLContext, and the context is retrieved like this:
  GLContext* context = static_cast<CompositorOGL*>(state->mLayerManager->GetCompositor())->gl();
In the previous version mScreen was being created like this:
bool GLContext::CreateScreenBufferImpl(const IntSize& size,
                                       const SurfaceCaps& caps) {
  UniquePtr<GLScreenBuffer> newScreen =
      GLScreenBuffer::Create(this, size, caps);
  if (!newScreen) return false;

  if (!newScreen->Resize(size)) {
    return false;
  }

  // This will rebind to 0 (Screen) if needed when
  // it falls out of scope.
  ScopedBindFramebuffer autoFB(this);

  mScreen = std::move(newScreen);

  return true;
}
Just by looking at the code I can see the call stack for this is something like this:
GLContext::CreateScreenBufferImpl()
GLContext::CreateScreenBuffer()
GLContext::InitOffscreen()
GLContextProviderEGL::CreateOffscreen()
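The old CreateScreenBufferImpl() follows a pattern that seems worth copying for mSwapChain. It builds the new object in a local, checks that creation and the initial resize both succeed, and only then moves it into the member, so a failure leaves the context untouched. Here's a hypothetical sketch of that pattern applied to a swap chain. FakeSwapChain, FakeContext and CreateSwapChain() are invented for illustration and don't exist in gecko, and the real fix may well need to live somewhere else entirely.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-ins; these are not gecko's classes.
struct IntSize {
  int width = 0;
  int height = 0;
};

class FakeSwapChain {
 public:
  // Rejects empty sizes, standing in for a resize that can fail.
  bool Resize(const IntSize& size) {
    if (size.width <= 0 || size.height <= 0) {
      return false;
    }
    mSize = size;
    return true;
  }
  const IntSize& OffscreenSize() const { return mSize; }

 private:
  IntSize mSize;
};

class FakeContext {
 public:
  // Same shape as CreateScreenBufferImpl(): create into a local, validate,
  // then commit. On failure the existing mSwapChain is left as it was.
  bool CreateSwapChain(const IntSize& size) {
    auto newSwapChain = std::make_unique<FakeSwapChain>();
    if (!newSwapChain->Resize(size)) {
      return false;
    }
    mSwapChain = std::move(newSwapChain);
    return true;
  }

  FakeSwapChain* GetSwapChain() const { return mSwapChain.get(); }

 private:
  std::unique_ptr<FakeSwapChain> mSwapChain;
};
```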
None of these methods exist in ESR 91 any more, and following this back through the ESR 78 code is turning out to be a bit troublesome, so I'm going to fire up a version of the old browser in the debugger to check this properly.
$ gdb sailfish-browser
(gdb) b GLContext::CreateScreenBufferImpl
(gdb) r
Starting program: /usr/bin/sailfish-browser 
[...]
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000007ff29237d8 in mozilla::gl::GLContext::
  CreateScreenBufferImpl(mozilla::gfx::IntSizeTyped
  const&, mozilla::gl::SurfaceCaps const&) 
                                                   at /usr/src/debug/
  xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/gl/GLContext.cpp:2120
But the breakpoint is never hit. It's definitely set correctly, it's just that this particular piece of code is never executed.

That's interesting.

Some of the methods higher up the stack are being called.
Thread 39 "Compositor" hit Breakpoint 2, mozilla::layers::
    CompositorBridgeParent::ResumeCompositionAndResize (this=0x7fb89bd230, x=0,
    y=0, width=1080, height=2520)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/
    layers/ipc/CompositorBridgeParent.cpp:777
777	  SetEGLSurfaceRect(x, y, width, height);
(gdb) bt
#0  mozilla::layers::CompositorBridgeParent::ResumeCompositionAndResize
    (this=0x7fb89bd230, x=0, y=0, width=1080, height=2520)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/
    layers/ipc/CompositorBridgeParent.cpp:777
#1  0x0000007ff2a662ec in mozilla::detail::RunnableMethodArguments::applyImpl, StoreCopyPassByConstLRef,
    StoreCopyPassByConstLRef, StoreCopyPassByConstLRef, 0ul, 1ul, 2ul,
    3ul> (args=..., m=, o=)
    at /home/abuild/rpmbuild/BUILD/xulrunner-qt5-78.15.1+git33.2/
    obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1188
#2  mozilla::detail::RunnableMethodArguments::apply
     (m=, 
    o=, this=)
    at /home/abuild/rpmbuild/BUILD/xulrunner-qt5-78.15.1+git33.2/
    obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1191
[...]
#14 0x0000007fef65b89c in ?? () from /lib64/libc.so.6
(gdb) 
But I can't reproduce the ESR 91 backtrace any further up than this. There are calls to EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget(), but these don't go via CompositorBridgeParent::ResumeCompositionAndResize().

It looks like the crucial difference is that context->IsOffscreen() returns a different value in each version. The implementations also differ. In ESR 91:
  bool IsOffscreen() const { return mDesc.isOffscreen; }
And in ESR 78:
  bool IsOffscreen() const { return mIsOffscreen; }
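Both getters report a flag that's fixed when the context is constructed; the difference is only in where it's stored. ESR 91 bundles the creation options into a descriptor that's copied into the context, while ESR 78 passes the offscreen flag as its own constructor argument. A simplified sketch of the two shapes follows. The CreateFlags, OldContext, Desc and NewContext types are hypothetical simplifications, not the real GLContextDesc or GLContext.

```cpp
// Hypothetical simplifications of the two designs; not gecko's real types.
enum class CreateFlags { NONE, REQUIRE_COMPAT_PROFILE };

// ESR 78 style: the offscreen flag is its own constructor argument.
class OldContext {
 public:
  OldContext(CreateFlags flags, bool isOffscreen)
      : mFlags(flags), mIsOffscreen(isOffscreen) {}
  CreateFlags Flags() const { return mFlags; }
  bool IsOffscreen() const { return mIsOffscreen; }

 private:
  const CreateFlags mFlags;
  const bool mIsOffscreen;  // fixed for the lifetime of the context
};

// ESR 91 style: creation options travel together in a descriptor.
struct Desc {
  CreateFlags flags = CreateFlags::NONE;
  bool isOffscreen = false;
};

class NewContext {
 public:
  explicit NewContext(const Desc& desc) : mDesc(desc) {}
  bool IsOffscreen() const { return mDesc.isOffscreen; }

 private:
  const Desc mDesc;  // copied at construction, never changed afterwards
};
```

Either way, the value can only be influenced by whoever constructs the context, which is why the investigation has to move up to the code that creates it.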
Let's check the value for ESR 78 in the debugger:
(gdb) p context
$1 = (mozilla::gl::GLContext *) 0x7ed41a2840
(gdb) p context->IsOffscreen()
Cannot evaluate function -- may be inlined
(gdb) p context->mIsOffscreen
$3 = false
(gdb) 
In ESR 78 this is set when the context is created and never changed:
GLContext::GLContext(CreateContextFlags flags, const SurfaceCaps& caps,
                     GLContext* sharedContext, bool isOffscreen,
                     bool useTLSIsCurrent)
    : mUseTLSIsCurrent(ShouldUseTLSIsCurrent(useTLSIsCurrent)),
      mIsOffscreen(isOffscreen),
      mDebugFlags(ChooseDebugFlags(flags)),
      mSharedContext(sharedContext),
      mCaps(caps),
      mWorkAroundDriverBugs(
          StaticPrefs::gfx_work_around_driver_bugs_AtStartup()) {
  mOwningThreadId = PlatformThread::CurrentId();
  MOZ_ALWAYS_TRUE(sCurrentContext.init());
  sCurrentContext.set(0);
}
This is also true in ESR 91, except that in that case a GLContextDesc is passed to the GLContext constructor and stored as mDesc. Here's the constructor being called for ESR 91; notice that in this case desc.isOffscreen is set to true:
Thread 35 "Compositor" hit Breakpoint 1, mozilla::gl::GLContext::GLContext
    (this=this@entry=0x7eb0108e40, desc=..., 
    sharedContext=sharedContext@entry=0x0, useTLSIsCurrent=useTLSIsCurrent@entry=false)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/gl/GLContext.cpp:283
283     GLContext::GLContext(const GLContextDesc& desc, GLContext* sharedContext,
(gdb) p desc
$1 = (const mozilla::gl::GLContextDesc &) @0x7ebbf7b0f0: {
     = {
    flags = mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE},
    isOffscreen = true}
(gdb) bt
#0  mozilla::gl::GLContext::GLContext (this=this@entry=0x7eb0108e40, desc=...,
    sharedContext=sharedContext@entry=0x0, 
    useTLSIsCurrent=useTLSIsCurrent@entry=false) at /usr/src/debug/
    xulrunner-qt5-91.9.1-1.aarch64/gfx/gl/GLContext.cpp:283
#1  0x0000007fba727f58 in mozilla::gl::GLContextEGL::GLContextEGL
    (this=0x7eb0108e40, egl=std::shared_ptr (use count
    5, weak count 2) = {...}, desc=..., config=0x5555a80fc0, surface=0x7eb0003ce0, 
    context=0x7eb0004bb0) at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/
    gl/GLContextProviderEGL.cpp:377
#2  0x0000007fba74c314 in mozilla::gl::GLContextEGL::CreateGLContext
    (egl=std::shared_ptr (use count 5, weak count 2) =
    {...}, desc=..., config=, config@entry=0x5555a80fc0,
    surface=surface@entry=0x7eb0003ce0, useGles=useGles@entry=true, 
    out_failureId=out_failureId@entry=0x7ebbf7b208)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
[...]
#27 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
And here's the same on ESR 78; note the value of isOffscreen is set to false:
Thread 38 "Compositor" hit Breakpoint 2, mozilla::gl::GLContext::GLContext
    (this=this@entry=0x7ed41a2810, flags=mozilla::gl::CreateContextFlags::NONE, 
    caps=..., sharedContext=sharedContext@entry=0x0, isOffscreen=false,
    useTLSIsCurrent=useTLSIsCurrent@entry=false)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/gl/
    GLContext.cpp:274
274	GLContext::GLContext(CreateContextFlags flags, const SurfaceCaps& caps,
(gdb) bt
#0  mozilla::gl::GLContext::GLContext (this=this@entry=0x7ed41a2810,
    flags=mozilla::gl::CreateContextFlags::NONE, caps=..., 
    sharedContext=sharedContext@entry=0x0, isOffscreen=false,
    useTLSIsCurrent=useTLSIsCurrent@entry=false)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/gl/
    GLContext.cpp:274
#1  0x0000007ff2909ad0 in mozilla::gl::GLContextEGL::GLContextEGL
    (this=0x7ed41a2810, egl=0x7ed41a25c0, flags=, caps=..., 
    isOffscreen=, config=0x0, surface=0x5555ba33a0,
    context=0x7ed4004e40)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/gl/
    GLContextProviderEGL.cpp:472
#2  0x0000007ff29110b8 in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7ed4004e40, aSurface=0x5555ba33a0, 
    aDisplay=) at /home/abuild/rpmbuild/BUILD/
    xulrunner-qt5-78.15.1+git33.2/obj-build-mer-qt-xr/dist/include
    /mozilla/cxxalloc.h:33
[...]
#26 0x0000007fef65b89c in ?? () from /lib64/libc.so.6
(gdb) 
The crucial difference appears to be that on ESR 91 CompositorOGL::CreateContext() ends up calling GLContextProviderEGL::CreateHeadless(), whereas on ESR 78 it calls embedlite::nsWindow::GetGLContext().

However there's a crucial part inside this CreateContext() method that's the same in both versions and looks like this:
  void* widgetOpenGLContext =
      widget ? widget->GetNativeData(NS_NATIVE_OPENGL_CONTEXT) : nullptr;
The decision about whether or not to go headless depends on whether this returns null. We're homing in. The embedlite::nsWindow::GetNativeData() method looks like this:
void *
nsWindow::GetNativeData(uint32_t aDataType)
{
  LOGT("t:%p, DataType: %i", this, aDataType);
  switch (aDataType) {
    case NS_NATIVE_SHAREABLE_WINDOW: {
      LOGW("aDataType:%i\n", aDataType);
      return (void*)nullptr;
    }
    case NS_NATIVE_OPENGL_CONTEXT: {
      MOZ_ASSERT(!GetParent());
      return GetGLContext();
    }
    case NS_NATIVE_WINDOW:
    case NS_NATIVE_DISPLAY:
    case NS_NATIVE_PLUGIN_PORT:
    case NS_NATIVE_GRAPHIC:
    case NS_NATIVE_SHELLWIDGET:
    case NS_NATIVE_WIDGET:
      LOGW("nsWindow::GetNativeData not implemented for this type");
      break;
    case NS_RAW_NATIVE_IME_CONTEXT:
      return NS_ONLY_ONE_NATIVE_IME_CONTEXT;
    default:
      NS_WARNING("nsWindow::GetNativeData called with bad value");
      break;
  }

  return nullptr;
}
This is one of the methods explicitly highlighted to me by Raine in our call yesterday. We checked it at the time (I even wrote about it) and decided the correct aDataType was being passed in. When I check it today that's still the case. Inside this method the active ingredient is the call to GetGLContext() which looks like this:
GLContext*
nsWindow::GetGLContext() const
{
  LOGT("this:%p, UseExternalContext:%d", this, sUseExternalGLContext);
  if (sUseExternalGLContext) {
    void* context = nullptr;
    void* surface = nullptr;
    void* display = nullptr;
    if (mWindow && mWindow->GetListener()->RequestGLContext(context, surface,
      display)) {
      MOZ_ASSERT(context && surface);
      RefPtr<GLContext> mozContext = GLContextProvider::CreateWrappingExisting(
          context, surface, display);
      if (!mozContext || !mozContext->Init()) {
        NS_ERROR("Failed to initialize external GL context!");
        return nullptr;
      }
      return mozContext.forget().take();
    } else {
      NS_ERROR("Embedder wants to use external GL context without actually providing it!");
    }
  }
  return nullptr;
}
When I step through this with ESR 91 I can see that it isn't returning anything useful, in the first instance because sUseExternalGLContext is set to false. If I force the value to be true, the rest of the method steps through okay.
Thread 35 "Compositor" hit Breakpoint 1, mozilla::embedlite::nsWindow::
    GetNativeData (this=0x7f88778610, aDataType=12)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedshared/nsWindow.cpp:180
180     {
(gdb) n
181       LOGT("t:%p, DataType: %i", this, aDataType);
(gdb) n
182       switch (aDataType) {
(gdb) n
189           return GetGLContext();
(gdb) s
mozilla::embedlite::nsWindow::GetGLContext (this=this@entry=0x7f88778610)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedshared/nsWindow.cpp:413
413       LOGT("this:%p, UseExternalContext:%d", this, sUseExternalGLContext);
(gdb) n
414       if (sUseExternalGLContext) {
(gdb) p sUseExternalGLContext
$1 = false
(gdb) set sUseExternalGLContext = true
(gdb) p sUseExternalGLContext
$2 = true
(gdb) n
415         void* context = nullptr;
(gdb) n
416         void* surface = nullptr;
(gdb) n
417         void* display = nullptr;
(gdb) n
418         if (mWindow && mWindow->GetListener()->RequestGLContext(context,
                surface, display)) {
(gdb) n
420           RefPtr mozContext = GLContextProvider::
                CreateWrappingExisting(context, surface, display);
(gdb) n
421           if (!mozContext || !mozContext->Init()) {
(gdb) p mozContext
$3 = {mRawPtr = 0x7eb0111680}
(gdb) n
mozilla::layers::CompositorOGL::CreateContext (this=this@entry=0x7eb0002f10)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
    CompositorOGL.cpp:234
234       if (widgetOpenGLContext) {
(gdb) 
Having forced this change in the debugger, now when I continue the browser runs without crashing. Still no rendering, but no crashes either.
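Putting the pieces together, the chain of decisions runs like this. sUseExternalGLContext gates whether GetGLContext() returns anything at all, and that return value then decides whether CreateContext() wraps the embedder's context or goes headless. Here's a hypothetical model of that chain; GetWidgetContext(), ChooseContextPath() and the ContextPath enum are illustrative names rather than gecko code, and the real methods do a lot more besides.

```cpp
// Hypothetical model of the decision chain; names are illustrative, not gecko's.
enum class ContextPath { WrapWidgetContext, CreateHeadless };

// Stands in for nsWindow::GetGLContext(): with the flag off it returns
// nullptr without even asking the embedder for a context.
const void* GetWidgetContext(bool useExternalGLContext,
                             const void* embedderContext) {
  if (!useExternalGLContext) {
    return nullptr;
  }
  return embedderContext;
}

// Stands in for the branch on widgetOpenGLContext in
// CompositorOGL::CreateContext().
ContextPath ChooseContextPath(const void* widgetOpenGLContext) {
  return widgetOpenGLContext ? ContextPath::WrapWidgetContext
                             : ContextPath::CreateHeadless;
}
```

Flipping the flag in the debugger is equivalent to passing true for useExternalGLContext here, which is why the headless path, and the unconstructed swap chain that goes with it, was avoided.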

Once again it looks like the underlying cause is the BoolVarCache config values, which I replaced with hard-coded values and which need to be switched to static preferences. These are set in sailfish-browser like this:
    // Use external Qt window for rendering content
    webEngineSettings->setPreference(QString("gfx.compositor.external-window"),
                                     QVariant(true));
    webEngineSettings->setPreference(QString("gfx.compositor.clear-context"),
                                     QVariant(false));
    webEngineSettings->setPreference(QString("gfx.webrender.force-disabled"),
                                     QVariant(true));
    webEngineSettings->setPreference(QString("embedlite.compositor.external_gl_context"),
                                     QVariant(true));
    webEngineSettings->setPreference(QString("embedlite.compositor.request_external_gl_context_early"),
                                     QVariant(true));
I need to make sure these values are reflected in the gecko code that I changed. I've updated both embedlite.compositor.external_gl_context and embedlite.compositor.request_external_gl_context_early so that they're true, where previously they were set to false.

They're only small changes, but I'll need to perform a rebuild to establish their full effects. Which means that this will be all the changes for today. We'll see whether this has had any positive effect tomorrow.
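Why the hard-coded values matter can be shown with a toy model. A BoolVarCache kept a static bool in sync with a preference, so a value set by sailfish-browser at runtime reached the gecko code; once that's replaced by a hard-coded constant, whatever the embedder sets is silently ignored. PrefStore, AddBoolMirror() and kHardCodedExternalGLContext below are hypothetical, and this isn't how gecko's Preferences or static prefs are actually implemented.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy preference store; hypothetical, not gecko's Preferences API.
class PrefStore {
 public:
  void SetBool(const std::string& name, bool value) {
    mValues[name] = value;
    for (auto& observer : mObservers[name]) {
      observer(value);  // push the new value to every mirror
    }
  }

  // Roughly what a var cache did: keep *target in sync with the pref.
  void AddBoolMirror(bool* target, const std::string& name, bool defaultValue) {
    auto it = mValues.find(name);
    *target = (it != mValues.end()) ? it->second : defaultValue;
    mObservers[name].push_back([target](bool value) { *target = value; });
  }

 private:
  std::map<std::string, bool> mValues;
  std::map<std::string, std::vector<std::function<void(bool)>>> mObservers;
};

// A hard-coded replacement ignores the store entirely, so the embedder's
// setting of true never takes effect.
constexpr bool kHardCodedExternalGLContext = false;
```

Registering a mirror for "embedlite.compositor.external_gl_context" and then setting it to true updates the mirrored bool, while the constant stays false no matter what the browser does.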

If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
10 Oct 2023 : Day 55 #
Yesterday I started looking at the rendering pipeline — currently broken — to try to make headway on task #1020. I posted a message on the mozilla graphics matrix channel in the evening and although I did get a response, so far nothing that's going to move things forwards.

So this morning I'm working through the D75055 changeset again. My hunch is that this holds the key. I'm going to write up my thoughts as I work through it.

However, this entire task does highlight a problem with this diary format. Up until now I've mostly had to tackle small incremental and well-defined changes. Rarely did I hit anything that couldn't be solved in one day, two days at most. That works well for a daily diary.

But if this task is going to require "deep thought"/"head scratching" for multiple days there will be very little to show for it.

So you'll have to forgive me if these diary entries turn into inarticulate mind dumps for a bit. I'm certain that writing out my thoughts can help crystallise my ideas, but whether they make sense to anyone else... well, that remains to be seen.

So, back to rendering.

There are a few key files in this process, which you can see in the change set.

Core to all this is the GLContext class defined in GLContext.h and GLContext.cpp. These files provide base functionality for the graphics context.

Then we have the GLContextEGL class defined in GLContextEGL.h which inherits from GLContext. Just to add confusion the implementation for this subclass is provided in GLContextProviderEGL.cpp (there is no GLContextEGL.cpp).

The other crucial files for Sailfish OS are GLScreenBuffer.h and GLScreenBuffer.cpp. As you might expect these define a class called GLScreenBuffer in ESR 78. It looks like this provided the surfaces to render on and also the double-buffering which allowed gecko to render off-screen, switch buffers to show the result on-screen, then rinse and repeat.
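For anyone who hasn't come across double-buffering before, the idea is simple enough to sketch. You draw into a back buffer that isn't being displayed, then swap it with the front buffer so the finished frame appears all at once. The DoubleBuffer class below is a hypothetical illustration using plain memory rather than GL surfaces, and isn't based on GLScreenBuffer's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical illustration of double-buffering with CPU-side pixel arrays.
class DoubleBuffer {
 public:
  DoubleBuffer(int width, int height)
      : mFront(static_cast<std::size_t>(width) * height, 0u),
        mBack(static_cast<std::size_t>(width) * height, 0u) {}

  // Rendering always targets the back buffer, which isn't on screen.
  std::vector<std::uint32_t>& BackBuffer() { return mBack; }

  // What the display reads from.
  const std::vector<std::uint32_t>& FrontBuffer() const { return mFront; }

  // Present the finished frame. Swapping the vectors exchanges their storage
  // without copying any pixels; the old front becomes the next back buffer.
  void Swap() { std::swap(mFront, mBack); }

 private:
  std::vector<std::uint32_t> mFront;
  std::vector<std::uint32_t> mBack;
};
```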

This is where the big change happened. If you check the diff for the file between ESR 78 and ESR 91, you'll notice that this GLScreenBuffer class was completely removed. In its place we find a class called SwapChain.
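The name hints at a slightly different model from a fixed pair of buffers: a swap chain typically recycles surfaces from a pool, presenting one while others are free to be drawn into. Here's a toy sketch of that pooling idea to help reason about it. Surface, ToySwapChain, AcquireBackBuffer() and Publish() are made up for illustration, and this is definitely not a description of how gecko's SwapChain and SwapChainPresenter actually interact.

```cpp
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical toy illustrating a pooled swap chain; not gecko code.
struct Surface {
  int id;
};

class ToySwapChain {
 public:
  // Reuse a pooled surface if one is free, otherwise make a new one.
  std::shared_ptr<Surface> AcquireBackBuffer() {
    if (!mPool.empty()) {
      auto surface = mPool.front();
      mPool.pop();
      return surface;
    }
    return std::make_shared<Surface>(Surface{mNextId++});
  }

  // Present a finished back buffer; the previous front buffer is recycled.
  void Publish(std::shared_ptr<Surface> backBuffer) {
    if (mFrontBuffer) {
      mPool.push(mFrontBuffer);
    }
    mFrontBuffer = std::move(backBuffer);
  }

  const std::shared_ptr<Surface>& FrontBuffer() const { return mFrontBuffer; }
  std::size_t PoolSize() const { return mPool.size(); }

 private:
  std::queue<std::shared_ptr<Surface>> mPool;
  std::shared_ptr<Surface> mFrontBuffer;
  int mNextId = 0;
};
```

After two frames have been published, the first surface is back in the pool and is the next one handed out, so only as many surfaces get allocated as are actually in use at once.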

But while GLScreenBuffer did all of this exciting stuff with surfaces, in contrast SwapChain is really sparse. It's sparse enough that I can list the entire definition of the class here without breaking a sweat.
class SwapChain final {
  friend class SwapChainPresenter;

 public:
  UniquePtr<SurfaceFactory> mFactory;
  bool mPreserve = false;

 private:
  std::queue<std::shared_ptr<SharedSurface>> mPool;
  std::shared_ptr<SharedSurface> mFrontBuffer;

 public:
  std::shared_ptr<SharedSurface>
      mPrevFrontBuffer;  // Hold this ref while it's in-flight.
 private:
  SwapChainPresenter* mPresenter = nullptr;

 public:
  SwapChain();
  virtual ~SwapChain();

  void ClearPool();
  const auto& FrontBuffer() const { return mFrontBuffer; }
  UniquePtr<SwapChainPresenter> Acquire(const gfx::IntSize&);

  const gfx::IntSize& Size() const;
  const gfx::IntSize& OffscreenSize() const;
  bool Resize(const gfx::IntSize& size);
  bool PublishFrame(const gfx::IntSize& size) { return Swap(size); }
  void Morph(UniquePtr<SurfaceFactory> newFactory);

 private:
  // Returns false on error or inability to resize.
  bool Swap(const gfx::IntSize& size);
};
Compare that to the GLScreenBuffer monstrosity from ESR 78. So one of the main questions I think I need to answer is "how should I make use of the SwapChain class? Is it a replacement for GLScreenBuffer? If it is, do I need to patch a SwapChain instance into GLContext in the same way GLScreenBuffer was or should it be stored somewhere else?"

Okay, that's actually several questions. I'll have to post all of this up on Matrix.

Before I do that I want to write up my meeting with Raine. As I've mentioned previously, Raine was the lead on the Gecko project while I was there and is super-knowledgeable about how it works; far more knowledgeable than I am. So if anyone can help with this, he can.

We discussed the patches I've already applied and the ones I've not applied but which may be necessary for the rendering pipeline. We also checked a few things to see what was — and wasn't — happening at run time.

This was a really helpful conversation.

We started out by looking at CompositorOGL::CreateContext() and whether this is being called. This is one of the early stages of the rendering pipeline which sets up the widget and rendering context, so it's a useful sanity check. We ran the debugger during the call to check it and yes, it is being called. Here's the call stack.
Thread 33 "Compositor" hit Breakpoint 1, mozilla::layers::CompositorOGL::
    CreateContext (this=this@entry=0x7eb8002e90)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
    CompositorOGL.cpp:227
227	already_AddRefed CompositorOGL::CreateContext() {
(gdb) bt
#0  mozilla::layers::CompositorOGL::CreateContext (this=this@entry=0x7eb8002e90)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
    CompositorOGL.cpp:227
#1  0x0000007fba7c9020 in mozilla::layers::CompositorOGL::Initialize
    (this=0x7eb8002e90, out_failureReason=0x7f002fd580)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
    CompositorOGL.cpp:389
#2  0x0000007fba8dea5c in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7f8879e5b0, aBackendHints=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:1493
#3  0x0000007fba8e9ad8 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7f8879e5b0, aBackendHints=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:1436
#4  0x0000007fba8e9c08 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f8879e5b0, aBackendHints=..., aId=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
    CompositorBridgeParent.cpp:1546
#5  0x0000007fba27e828 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7f8879e5b0, msg__=...) at
    PCompositorBridgeParent.cpp:1285
This all looks okay. Raine also noted that there's an important change made by Patch 0031 "Create EmbedLiteCompositorBridgeParent in CompositorManagerParent (part 2)". This switches out a call to gecko's CompositorBridgeParent constructor for a call to the EmbedLiteCompositorBridgeParent constructor instead. This is crucial because it essentially links the compositor and EmbedLite together. Without this there'll be no rendering.

As it happens I've not applied this patch. So this is the first thing to fix. It's only a small patch, so should be easy, but crucial.

We also looked at patch 0039 "Do not create CreateFallbackSurface". It's not clear how crucial this is right now. Some time ago Adam Pigg made some changes to the rendering pipeline to get it to work on native ports. We think the patch may have been part of these changes. Now while the changes were only needed for native ports (so not needed on the Xperia 10 II I'm testing on for example), there were also changes required to QtMozEmbed. It's possible that having the QtMozEmbed change without this patch will result in problems. Better to apply the patch.

Happily it's another self-contained patch, so should be easy to apply. For the same reason I should certainly apply patch 0038 "Fix mesa egl display and buffer initialisation" as well.

We checked that GetNativeData() is being called and that the values it's being called with are correct:
$ MOZ_LOG="EmbedLiteTrace:5" sailfish-browser 2>&1 | grep GetNativeData
[Parent 16595: Compositor]: D/EmbedLiteTrace TRACE::virtual void*
  mozilla::embedlite::nsWindow::GetNativeData(uint32_t):181 t:74fc788480,
  DataType: 12
[Parent 16595: Unnamed thread 74fc0020b0]: D/EmbedLiteTrace TRACE::virtual void* mozilla::embedlite::EmbedLitePuppetWidget::GetNativeData(uint32_t):114 t: 74fc4528d0,
  DataType: 3
This is called as part of the CompositorOGL constructor, so the fact it's being called is a good sign. The DataType of 12 in the debug output corresponds to NS_NATIVE_OPENGL_CONTEXT, which is defined in nsIWidget.h with the value 12, so that's correct too. So that's also promising.

When the browser screen comes up, the Sailfish Browser code performs a dummy swap to render the screen. This is definitely happening because, as Raine explained, without it you don't even get the native QML user interface rendered, and that is being rendered. However, apart from the missing patches above, there are multiple other reasons why rendering might not be progressing beyond that point.

For example, we have various checks in place which intentionally pause or block rendering. These include SetIsActive and the SuspendRendering/ResumeRendering states, all of which need to be set appropriately (to one of true or false). Raine recommended that I go through and force all of these to the correct states.

When the browser runs it will usually toggle these states for performance reasons, for example if the browser goes into the background. But right now performance isn't our problem, rendering is, so better to make sure none of these are blocking rendering unintentionally.
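The important thing about these gates is that rendering only proceeds if every one of them is in its permissive state, so a single stale flag is enough to block it. Here's a hypothetical model of that; the RenderGates struct, CanRender() and ForceAllOpen() are invented names, and this is just a way to reason about the gates rather than how the real code is structured.

```cpp
// Hypothetical model of independent rendering gates; not EmbedLite code.
struct RenderGates {
  bool isActive = false;           // stands in for the SetIsActive state
  bool renderingSuspended = true;  // stands in for SuspendRendering/ResumeRendering
};

// Rendering only happens when every gate is open.
bool CanRender(const RenderGates& gates) {
  return gates.isActive && !gates.renderingSuspended;
}

// Force every gate into the rendering-friendly state, as Raine suggested
// doing while debugging.
void ForceAllOpen(RenderGates& gates) {
  gates.isActive = true;
  gates.renderingSuspended = false;
}
```

Opening just one gate isn't enough: with isActive set but rendering still suspended, CanRender() is still false, which is why it makes sense to force all of them at once while debugging.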

Raine also noted that mUseExternalGLContext should be set to true in EmbedLiteCompositorBridgeParent. Previously this was set by a BoolVarCache value. But you may recall I stripped the code of all of these way back on Day 17. I set them all to hard coded values with the intention of switching them out for static prefs later. As part of this I had to choose what I thought were sensible fixed values to set them to.

Well, it turns out for this one I made the wrong choice! So I'll need to flip this to true. Small things can make a huge difference.

There was other useful advice too. The browser and the WebView use slightly different rendering techniques. In particular the WebView uses what's called off-screen rendering, which the browser doesn't make use of. Raine felt that it was quite possible the SwapChain changes I've made would affect the WebView but not the Browser. If so, getting the browser to render may be easier than I thought (although all of this will have to be fixed for the WebView as well, but that can come later).

Once I've made these changes, when running the browser I should check for the following debug output:
=============== Preparing offscreen rendering context ===============
That's an indicator that the WebView rendering approach is in use. I should therefore check that this isn't appearing for the browser.

And as a final thought, Raine noted that when initially testing the rendering, I should start with the about:blank page and move on to the about:license page since these are text-only, so will put the least strain possible on the system. Once they're working, then it makes sense to check with more complex sites.

So there you have it. Lots of really useful advice. I'm going to take a bit of time now to apply the changes and perform the checks Raine suggested. Rest assured I'll come back and tell you about the results!

First I'm applying patch 0031:
$ git am --3way ../rpm/0031-sailfishos-gecko-Create-EmbedLiteCompositorBridgePar.patch
Applying: Create EmbedLiteCompositorBridgeParent in CompositorManagerParent (part 2). JB#50505
Using index info to reconstruct a base tree...
M       gfx/layers/ipc/CompositorManagerParent.cpp
Falling back to patching base and 3-way merge...
Auto-merging gfx/layers/ipc/CompositorManagerParent.cpp
Patch 0038 doesn't go quite so smoothly:
$ git am --3way ../rpm/0038-sailfishos-embedlite-egl-Fix-mesa-egl-display-and-bu.patch
Applying: Fix mesa egl display and buffer initialisation
Using index info to reconstruct a base tree...
M       gfx/gl/GLContextProviderEGL.cpp
M       gfx/gl/GLContextProviderImpl.h
M       gfx/gl/GLLibraryEGL.cpp
M       gfx/gl/GLLibraryEGL.h
Falling back to patching base and 3-way merge...
Auto-merging gfx/gl/GLLibraryEGL.h
CONFLICT (content): Merge conflict in gfx/gl/GLLibraryEGL.h
Auto-merging gfx/gl/GLLibraryEGL.cpp
CONFLICT (content): Merge conflict in gfx/gl/GLLibraryEGL.cpp
Auto-merging gfx/gl/GLContextProviderImpl.h
Auto-merging gfx/gl/GLContextProviderEGL.cpp
CONFLICT (content): Merge conflict in gfx/gl/GLContextProviderEGL.cpp
error: Failed to merge in the changes.
Patch failed at 0001 Fix mesa egl display and buffer initialisation
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
So I had to apply Patch 38 "Fix mesa egl display and buffer initialisation" manually, but as it's such a small patch it was relatively straightforward to do. Patch 39 "Do not create CreateFallbackSurface" on the other hand wouldn't apply at all. Checking it manually it appears a huge amount of gecko code related to it has been rearranged or removed entirely.

Right now I feel absolutely exhausted. That's nothing to do with gecko and more to do with a long day at work followed by a long commute. So I don't have the energy to look into Patch 39 further tonight.

Having applied as many of these patches as practically possible I've set the build to run overnight again. Feels like old times.

9 Oct 2023 : Day 54 #
I didn't mention it yesterday, but while putting together the OBS build I realised I'd not submitted the PR for the nspr update the build needs. It took just a few hours for mal to check, test and merge the PR; so I extend much gratitude and thanks to him for the fast turnaround.

This PR spawned some interesting talk on IRC. A couple of days ago I mentioned in my diary that the build took 7 hours 36 minutes and 10.3 seconds. This provoked Nico, direc85 and mal to consider the reasons for it taking so long. Nico highlighted the fact that this is much longer than for a normal i486 Firefox build.
 
Nico: What computer are you using anyway, that it takes so long to build gecko? :D
Nico: Because I am used to a firefox build taking less than an hour, usually more around 30 minutes :D

That's a huge difference. As I noted in the discussion, just the linking step takes 20 minutes for a Gecko build targeting aarch64 on Sailfish OS (although if you've read my missive from Day 52, you'll know that this figure comes with some caveats).

During the discussion some of the reasons for the longer build time became apparent. First of all, builds simply do take longer using the Sailfish SDK, especially when Rust is involved. This is something that direc85 has experienced through his work with rubdos on Whisperfish. Here's how he describes the current situation for the Whisperfish build pipeline:
 
direc85: CI is also single threaded. The host compilation takes 7 minutes (not in SFDK, so it's threaded), but armv7hl and aarch64 take around 38 minutes and i486 106 minutes (because it can't use sccache for reasons).

The issue isn't just for local builds using sfdk either, as mal pointed out.
 
mal: Nico: even on jolla obs the build takes quite a while, about 1.5 hours for x86 and longer for arm and aarch64
mal: Nico: quite likely the issues come somehow from scratchbox2 used for arm and aarch64 builds

There's always going to be a discrepancy with cross-compilation, because some of the build will be happening under emulation using QEMU. But much of that is supposed to be abstracted so that native tools are used where possible (for example compiling using clang or gcc and linking using ld). But as direc85 alludes to with his comment, it's also because we're having to perform builds sequentially, a single job at a time, rather than scaling it up with the number of processors.
 
direc85: With Whisperfish — a Rust application — we also must use a single thread when compiling, or the compiler is likely to hang. Running something like "taskset 0x555555 cargo -j8" (pin it for 12 cores and use 8) helps somewhat, but it still hangs almost every time
direc85: I'm not sure if it's the same underlying issue, but back in 2006-ish in uni when we made apps for Nokia N810 (with C or C++) we were also forced to use -j1 to prevent hanging. So I have a long history with that one :

For those who, like me, are unfamiliar with it, the taskset command allows you to execute a particular process (in this case cargo) so that it only runs on specific CPU cores. The set of cores a process is allowed to run on is known as its CPU affinity.
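To make direc85's mask concrete: 0x555555 is the binary pattern 0101 repeated six times, so every other bit of the lowest 24 is set, which selects twelve CPUs (0, 2, 4, ... 22). Here's a small sketch that decodes a taskset-style hex mask. The function names are my own, and it only mimics how the mask is interpreted, not how taskset actually applies it (on Linux that happens via the sched_setaffinity() system call).

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// Return the CPU indices selected by a taskset-style affinity mask.
// Bit N set means "this process may run on CPU N".
std::vector<int> cpusInMask(std::uint64_t mask) {
    std::vector<int> cpus;
    for (int cpu = 0; cpu < 64; ++cpu) {
        if (mask & (std::uint64_t{1} << cpu)) {
            cpus.push_back(cpu);
        }
    }
    return cpus;
}

// Count how many CPUs a mask allows.
int cpuCount(std::uint64_t mask) {
    return static_cast<int>(std::bitset<64>(mask).count());
}
```

For example, cpuCount(0x555555) gives 12, matching the "12 cores" in the quote, and cpusInMask(0x555555) runs from CPU 0 up to CPU 22 in steps of two.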

So there is a problem that seems to afflict Rust builds in particular (gecko includes a lot of Rust components) that means that if you try to run them with more than one job, there's a high likelihood the build will hang and have to be restarted.

During the discussion direc85 and Nico had many useful suggestions, for example around futexes and unimplemented Rust features, which could be related. But as yet there is no clear cause and the issue persists.

There's definitely some interesting work to be done around this. I have my own nascent theories about what might be causing the problem, but they're too poorly defined to expand on here. If anyone has thoughts about how to fix this and would like to do some investigation, the whole Sailfish ecosystem (including Jolla's build pipeline) would benefit from a solution. I did create an issue related to this for ESR 91, although it probably deserves a more global issue lodged against scratchbox2 or the SDK tooling.

That's a bit of an aside, but an interesting one I hope.

Today I'm looking at the gecko rendering pipeline. Right now the problem of how to get the rendering pipeline to work remains mysterious, but I do have four leads:
  1. Check the code changes I had to make in relation to switching GLScreenBuffer for SwapChain. This is almost certainly a contributing factor to the problem, although how to fix it is another matter.
  2. Check the code changes I made in relation to EglDisplay. Although less likely to be the issue than GLScreenBuffer, it's quite likely that both will need fixing.
  3. Back on Day 18 I received some great advice from Fabrice, who recommended that I get in touch with the Mozilla graphics team on Matrix. Now would seem like a good time to go that route.
  4. Finally I have a meeting arranged with Raine from Jolla in a couple of days. As I've explained before, Raine is the master of all things Sailfish OS Gecko, so if anyone can help, he can.
[...]

I have asked my question on the gfx-firefox Matrix channel.
 
Hello all. I'm currently upgrading the gecko-based browser on Sailfish OS from ESR 78 to ESR 91 (we're a bit behind). We previously used GLScreenBuffer, but this has been replaced with SwapChain. I've read through the changes, but am pretty lost. Would this be a good place to get help on this? I'm trying to understand how to make use of SwapChain as a replacement for how we were using GLScreenBuffer.

Hopefully that will get a response. But this query is a bit open-ended; I think I might get a more useful response if I can formulate a more precise question as well.

Unfortunately it looks like that's all I'm going to have time for today. I'll spend tomorrow poring over the rendering pipeline code to see what I can make of it. If I'm going to make progress I'll also need to actually debug the running code, to see which parts it's touching. After all, I may have completely the wrong idea about this.

More tomorrow!

If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
Comment
8 Oct 2023 : Day 53 #
When I wrote my diary entry yesterday I spent the first couple of paragraphs talking about Autumn. I've previously mentioned how the transition between seasons is the most exciting time of year for me. It occurred to me this morning that this project is moving through its own transition right now: from Stage 1 to Stage 2.

This was always planned as being a three-stage project. As I wrote way back in the preamble to these diaries, the three stages were intended to look like this:
  1. Apply a minimal set of changes and patches to get ESR 91 to build.
  2. Apply any remaining patches where possible and other changes to get it to run and render.
  3. Handle the Sailfish OS specific integrations.
It was already about a week ago on Day 45 that I declared Stage 1 to be complete. Yesterday I talked about tackling the render pipeline as being one of the next things I'd planned to work on. But right now we're sort-of sitting in between the transition from Stage 1 to Stage 2.

This has taken the form of me writing out a load of issues to the sailfish-browser bug-tracker as I did yesterday. And it's continuing today with me setting up various projects on OBS. For those that don't know, OBS is the Open Build Service provided by Jolla and used by Sailfish OS, which is essentially the Sailfish OS continuous delivery service. All of the packages that make up Sailfish OS are built on OBS. Jolla make a Community OBS available which members of the community like me can submit their own software to, so that it can be built against the Sailfish OS packages.

That includes building updated versions of the existing packages, which is what I plan to set up today if I can.

There are various glitches that may prevent this from working. For example, I'll need to create packages for everything that I've changed, even changes to the build tooling. I'm not sure whether it's going to be possible to have the build-tooling built on the system then used by the system. Probably not, but maybe I can work around that. The remote OBS isn't configured quite the same way as my local build system, so that may also cause problems. Finally I've only ever built the aarch64 version of Gecko. Building for other targets (arm32 and i486) is also likely to introduce problems that until now I've not had to worry about.

The following are the packages I'll likely need to add to the project:
  1. nspr, 4.35.0
  2. xulrunner-qt5, 91.9.1
  3. qtmozembed-qt5, 1.53.25
  4. embedlite-components-qt5, 1.24.34
  5. sailfish-components-webview, 1.5.17
  6. sailfish-browser, 2.2.61
  7. mapplauncherd-booster-browser, 0.2.1
The following have updated versions, but because these have already been merged in I'm hoping I won't need to add these to the OBS project. That should become clearer after OBS has attempted a full build:
  1. icu, 0.19.0
  2. rust-cbindgen, 0.19.0
  3. gcc, 8.3.0-7 (patched)
As things stand I've added these to a gecko-esr91 project I've set up on OBS and things are building away. OBS claims it can't build qtmozembed because xulrunner is the incorrect version, but I'm hoping that will be resolved once xulrunner has been successfully built. As I write this the xulrunner (gecko-dev) package is still marked as blocked, but hopefully that's just because it's waiting for a runner to become available.

So that's it. I'm up-to-date with my untechnical debt. I can't put it off any longer: I'm going to have to have a go at tackling the rendering pipeline issue.

I've assigned myself the issue. No turning back.

We're now in Stage 2.

If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
Comment
7 Oct 2023 : Day 52 #
As I mentioned yesterday the view outside my window has become thoroughly autumnal. After a night of solid rain the situation hasn't changed. There's a real feeling of anticipation in the air as the world starts preparing for the winter months. Work on gecko also reached a point of anticipation yesterday as the browser stack — gecko, qtmozembed, embedlite-components, sailfish-components-webview, sailfish-browser — now starts up and continues running without crashing. There's no rendering, so it's completely useless as a browser, but that goal looks ever closer.

Of the five components mentioned, two of them (embedlite-components and sailfish-components-webview) remain as-yet unchanged. The others I've had to make changes to in order to adjust to the ESR 91 API and changes in the underlying configuration.

My next steps are therefore to get these changes pushed to the repositories (not in the main branches, but so that they're publicly accessible) and built using OBS. This will then open the way for others to contribute if they want to.

If you've been following along you'll also know that I took numerous shortcuts to get to this point. Breaking the rendering pipeline was just one of them. The time has now come for me to collate all of these issues in the public sailfish-browser issue tracker on GitHub. That'll allow for a more structured approach to addressing the issues, as well as making it easier for others to pick up tasks to contribute to in case anyone fancies it.

I've no idea whether anyone will have the time or inclination, but that's not really the point. While it would be great if others were to, the important thing is to make this as open and accessible as possible.

So this morning I'm committing and pushing. Here are the results of these efforts:
  1. gecko-mirror patches.
  2. gecko-dev.
  3. qtmozembed.
  4. sailfish-browser.
Time works in strange ways when you're writing a diary like this. It's quite late now and I've spent the evening combing through the various commits I've been making to gecko and its related projects to see how many of them left lingering tasks behind.

It turns out, there's quite a lot of unfinished business. I've written up all of the ones I could find and added them to the sailfish-browser issue tracker. I've given each of them an ESR 91 tag to make them easier to find, but here also is the list in full:
  1. Set up ESR 91 OBS project build: #1016.
  2. Check active flag still works on ESR 91: #1017.
  3. Check effect of user interaction flags on GoBack and GoForward for ESR 91: #1018.
  4. Check whether GetChromeOuterWindowID() returns the correct value: #1019.
  5. Fix ESR 91 rendering pipeline: #1020.
  6. Check whether Fission methods are needed for ESR 91: #1021.
  7. Check whether MessageChannel is broken on ESR 91: #1022.
  8. Check unapplied patches on ESR 91: #1023.
  9. Restore WebRTC code to ESR 91: #1024.
  10. Restore build processes for ESR 91: #1025.
  11. Convert ESR 91 gecko-mirror commits to patches: #1026.
  12. Convert VarCache values to static prefs for ESR 91: #1027.
  13. Restore ESR 91 build optimisation: #1028.
  14. Check Qt theming in ESR 91: #1029.
  15. Fix Qt printing pipeline for ESR 91: #1030.
  16. Build ESR 91 against newer Linux kernel headers: #1031.
  17. Avoid compiler crash when building swgl optimised for ESR 91: #1032.
  18. Enable debuginfo for ESR 91 Rust components: #1033.
  19. Check browser functionality with ESR 91: #1034.
I've also created an ESR 91 milestone and added them all to that too.

It's taken quite a while to write all of these up (not that I've done a particularly good job, but it's the quantity). So no time for any actual code development today.

The next step, which will be for tomorrow, will be to set up the build on OBS. After that, it's the rendering pipeline.

Before I sign off for today, let me just throw some timings out there, primarily for the benefit of Nico. During a conversation with Nico I mentioned that the link time for gecko using the Sailfish SDK was around 20 minutes. But when I said that it was just a guess and I promised to time it to check.

This turned out to be a misrepresentation, although arguably an understandable one. The actual link time is just 3 minutes and 20.32 seconds. That's a lot quicker than I claimed. However the final packaging step (the time needed to package everything up into rpms) takes 13 minutes 26.26 seconds. In between the two there's 3 minutes 45.91 seconds of general jiggery pokery (performing checks, marshalling files, printing out information) that also needs to complete. So the time from when the linking starts to when the packages are actually spat out (which is the point at which the build finishes) is 20 minutes and 32.49 seconds.
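As a quick check on the arithmetic, here's a small sketch that adds the three stages together using std::chrono. Each duration is stored as a whole number of hundredths of a second so that the sum stays exact; the variable names are just for illustration.

```cpp
#include <chrono>
#include <ratio>

// Durations measured in hundredths of a second, so no floating-point
// rounding creeps into the sum.
using centiseconds = std::chrono::duration<long long, std::centi>;

// The three stages of the timed build, as reported above.
constexpr centiseconds linkTime      = std::chrono::minutes(3)  + centiseconds(2032);  // 3m 20.32s
constexpr centiseconds checksTime    = std::chrono::minutes(3)  + centiseconds(4591);  // 3m 45.91s
constexpr centiseconds packagingTime = std::chrono::minutes(13) + centiseconds(2626);  // 13m 26.26s

// Time from the start of linking to the rpms being written.
constexpr centiseconds totalTime = linkTime + checksTime + packagingTime;
```

The total comes to 123249 hundredths of a second: 20 whole minutes plus 32.49 seconds.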

Alright, that's enough for today. If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
Comment
6 Oct 2023 : Day 51 #
It's most definitely autumn now. The trees and bushes outside are still green, but covered with bright red berries and fruit. The birds are making the most almighty racket in the trees, preparing for their annual migration. I do love this time of year. The nights are longer and darker and very soon the leaves will also start their migration through fiery colours and then from branch to floor.

When I'm not staring out of the window contemplating autumn you'll find me staring at my laptop screen, unaffected by the changing seasons. It looks very much as it did when I started this process. But things have been progressing even if this isn't reflected in any changes onscreen.

If you've been following along you'll know that yesterday I got to the point where ESR 91 was running on my phone. I appreciate all your comments on the forum including the suggestions for how to work around the C++17 issue with the MOC. I plan to try them out in due course.

The fact that the executable runs is a pretty exciting step, but it also comes with major caveats. It was running without crashing, but it wasn't rendering anything. It could only be coaxed into running beyond the first moments of execution with some nasty symlink hacks, so that it could find the library, and some pretty intrusive debugger surgery.

That leaves us with two immediate issues to address. The first is ensuring it looks in the correct folder for the libraries so it can actually run. The second is ensuring it doesn't immediately crash by forcing the WEBRENDER_SOFTWARE graphics variable to false (which is what the gdb-hackery was all about).

I'm going to focus on the second first, because it's the more interesting.

One note worth recording is that the enum list of feature names can be found in the gfxFeature.h header file. Gecko has this wonderful pre-processor macro approach to defining enums along with their names. It's obfuscatory, but I like it.

Then the work of collating and managing these configuration variables happens in gfxConfig.h. Finally the code that needs to use the value (which is the point where the debugger currently comes in useful) is in the InitWebRenderConfig() method which can be found in the gfxPlatform.cpp file.

While combing through the code this morning before work I found this method inside gfxFeature.cpp that looks like it's going to read the default state of the variable from a settings file.
void FeatureState::SetDefaultFromPref(const char* aPrefName, bool aIsEnablePref,
                                      bool aDefaultValue,
                                      Maybe<bool> aUserValue) {
  bool baseValue =
      Preferences::GetBool(aPrefName, aDefaultValue, PrefValueKind::Default);
  SetDefault(baseValue == aIsEnablePref, FeatureStatus::Disabled,
             "Disabled by default");

  if (aUserValue) {
    if (*aUserValue == aIsEnablePref) {
      nsCString message("Enabled via ");
      message.AppendASCII(aPrefName);
      UserEnable(message.get());
    } else {
      nsCString message("Disabled via ");
      message.AppendASCII(aPrefName);
      UserDisable(message.get(), "FEATURE_FAILURE_PREF_OFF"_ns);
    }
  }
}
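The key detail here is the aIsEnablePref flag, which flips the meaning of the preference: for an "enable" style preference a value of true switches the feature on, while for a "disable" style preference a value of true switches it off. Below is a simplified, hypothetical model of that decision. It uses std::optional in place of mozilla::Maybe and assumes a user value simply overrides the default; Gecko's real FeatureState keeps track of more state than this.

```cpp
#include <optional>

// Simplified, hypothetical model of FeatureState::SetDefaultFromPref().
// Returns a plain "enabled?" flag instead of updating a FeatureState.
bool FeatureEnabledFromPref(bool aPrefValue, bool aIsEnablePref,
                            std::optional<bool> aUserValue) {
  // Default state: enabled when the pref's value matches its polarity.
  // For an "enable" pref that means true; for a "force-disabled" style
  // pref (aIsEnablePref == false) it means false.
  bool enabled = (aPrefValue == aIsEnablePref);

  // Assumption: a user-set value simply overrides the default, using the
  // same polarity rule.
  if (aUserValue) {
    enabled = (*aUserValue == aIsEnablePref);
  }
  return enabled;
}
```

For example, FeatureEnabledFromPref(true, false, std::nullopt) returns false: a disable-style preference set to true turns the feature off.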
Following this lead takes me to gfxConfigManager::Init() where we can see that it's possible to forcefully disable Web Render in full using an environment variable. Here's the line in this method:
  mWrEnvForceDisabled = gfxPlatform::WebRenderEnvvarDisabled();
And this is what it's calling (this can be found in gfxPlatform.cpp):
/*static*/
bool gfxPlatform::WebRenderEnvvarDisabled() {
  const char* env = PR_GetEnv("MOZ_WEBRENDER");
  return (env && *env == '0');
}
And this environment variable does seem to work in practice. If I run the browser like this, it will execute and stay executing:
MOZ_WEBRENDER=0 sailfish-browser
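It's worth being precise about what that check accepts, since it only ever looks at the first character of the value. Here's a sketch of the same logic, with the standard std::getenv() standing in for NSPR's PR_GetEnv(); the function names are my own.

```cpp
#include <cstdlib>

// Same test as gfxPlatform::WebRenderEnvvarDisabled(): the value must
// exist and its first character must be '0'.
bool ValueDisablesWebRender(const char* aValue) {
  return aValue && *aValue == '0';
}

// Read the variable with std::getenv() instead of PR_GetEnv().
bool WebRenderEnvvarDisabledSketch() {
  return ValueDisablesWebRender(std::getenv("MOZ_WEBRENDER"));
}
```

So MOZ_WEBRENDER=0 triggers this check, as would any value starting with 0, whereas MOZ_WEBRENDER=false or an empty value wouldn't.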
That's good, but I want a permanent solution based on a preference value that I can set. Further up in the same Init() method is this line, which looks like it should have a similar effect:
  mWrForceDisabled = StaticPrefs::gfx_webrender_force_disabled_AtStartup();
Let's follow this path a bit too. This is defined in StaticPrefList_gfx.h and associated with the gfx.webrender.force-disabled preference. If we can set that in a configuration file, we might be done.

There are a bunch of default preferences that can be found in ~/.local/share/org.sailfishos/browser/.mozilla/prefs.js and if I add the preference there like this:
user_pref("gfx.webrender.force-disabled", true);
Then that also allows the browser to run. But as it warns at the top of this file, this isn't the place to put it:
// Mozilla User Preferences

// DO NOT EDIT THIS FILE.
//
// If you make changes to this file while the application is running,
// the changes will be overwritten when the application exits.
//
// To change a preference value, you can either:
// - modify it via the UI (e.g. via about:config in the browser); or
// - set it within a user.js file in your profile.
So we really want to put it elsewhere. The correct place seems to be in sailfish-browser. Either in the data/prefs.js file, or set in code in the DeclarativeWebUtils::setRenderingPreferences() method. I think I'm actually going to go for both of those for good measure.

Having made those changes I've built myself some fresh-looking sailfish-browser rpms. The sailfish-browser package essentially provides the user interface to the browser engine. It's not as big as Gecko, but it's still a large piece of software. It takes a good five minutes to build from scratch. I'll need to transfer the new packages over to my phone to test them out. The gecko rpms that were building overnight have also completed, so I'll need to install those as well.
$ cd sailfish-browser
$ sfdk -c snapshot=temp -c no-pull-build-requires build -d -p
$ scp sailfish-browser-*.rpm defaultuser@10.0.0.116:~/Documents/Development/gecko/
$ ssh defaultuser@10.0.0.116
$ cd ~/.local/share/org.sailfishos/browser
$ mv .mozilla .mozilla.bak
$ cd ~/Documents/Development/gecko/
$ devel-su rpm -U sailfish-browser-2.*.rpm sailfish-browser-settings-2.*.rpm
$ sailfish-browser
After this change the browser now starts up and continues running without errors. Astonishingly (to me) it even shuts down without errors as well. Of course it's still not doing any rendering.

Before moving on to the rendering we need to get it finding the correct location of the library. In other words, it needs to look in /usr/lib64/xulrunner-qt5-91.9.1/ rather than /usr/lib64/xulrunner-qt5-78.15.1/. This was an issue we were seeing yesterday.

One possibility is that rebuilding sailfish-browser against the new gecko library may have already fixed the issue. Worth checking like this:
$ devel-su rm /usr/lib64/xulrunner-qt5-78.15.1
$ sailfish-browser
The results are good! The browser now picks up libxul.so from the intended location in /usr/lib64/xulrunner-qt5-91.9.1/.

That resolves all of the immediate non-rendering-related problems. But I do still have one further task to do before moving on to this rendering issue. It's that pesky untechnical-debt again.

In particular, I need to transfer all of the issues that have been collecting during these last fifty-one days and register them on the sailfish-browser bug tracker. But that's going to have to wait until tomorrow.

If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
Comment
5 Oct 2023 : Day 50 #
It's the big five-oh. The last few days things have moved frustratingly slowly, but not through lack of effort. There have been a bunch of glitches that have just taken longer to work through than I was expecting. Which means that the gecko-qtmozembed combination still isn't quite there yet.

There's a bit of compensation today with the result that this is a longer post than usual. I apologise for this, but it is rather the nature of the development process: sometimes it goes fast, sometimes it goes slow.

The QtMozEmbed build from yesterday didn't produce any obvious compilation errors, but there was an error during the moc pass that looked like this:
usr/include/xulrunner-qt5-91.9.1/mozilla/MaybeStorageBase.:16: Parse error at "mozilla"
That's a strange error. The filename extension is missing; the path appears to be relative. I guess I was expecting the detailed level of output that comes from gcc, but this isn't gcc, it's moc, which has its own more terse approach.

Right now I'm confused by what's going on. But we're going to work through the investigation.

If you're unfamiliar with what the MOC is, now might be a good time to check out my short explanation from Day 15.

From the build command and error message we can immediately see that the error is happening when the moc is trying to consume the qmozview_p.h file. Within that file we have this line:
#include <mozilla/embedlite/EmbedLiteView.h>
This line appears to be the point at which the failure occurs within the file. If I remove it, the moc pass completes successfully. This file itself has changed very little since ESR 78:
$ diff ./embedding/embedlite/EmbedLiteView.h \
  ../../gecko-dev-project/gecko-dev/./embedding/embedlite/EmbedLiteView.h
89,90c89,90
<   virtual void GoBack(bool aRequireUserInteraction, bool aUserActivation);
<   virtual void GoForward(bool aRequireUserInteraction, bool aUserActivation);
---
>   virtual void GoBack();
>   virtual void GoForward();
But that's not really the point, because the real issue isn't happening in this file. It's happening in a file included from it. Finding out these include chains is a bit of a pain (can anyone recommend a good command-line tool?), but with a bit of manual digging I came up with the following, which is at least one path through which this problem file gets included.
qmozview_p.h 
embedding/embedlite/EmbedLiteView.h
gecko-dev/gfx/thebes/gfxRect.h
gecko-dev/gfx/2d/Rect.h
gecko-dev/mfbt/Maybe.h
gecko-dev/mfbt/MaybeStorageBase.h
At which point it hits this line inside MaybeStorageBase.h that causes the error:
namespace mozilla::detail {
This syntax of avoiding deep nesting by using the double colon :: notation was introduced in C++17:
namespace ns-name :: member-name { declarations }
Which, by Sailfish OS Qt tooling standards, is quite new. Could that be it?
$ moc --help
Usage: /usr/lib64/qt5/bin/moc [options] [header-file] [@option-file]
Qt Meta Object Compiler version 67 (Qt 5.6.3)
[...]
This file is part of Gecko and I don't want to have to do a complete rebuild to find out, so I'm going to hack around inside the SDK to edit this file manually. I do this by loading the file /usr/include/xulrunner-qt5-91.9.1/mozilla/MaybeStorageBase.h into vim from within the SDK.
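The edit itself is small: the C++17 nested namespace definition is rewritten as one namespace block per level, a form that older parsers such as this Qt 5.6 moc appear to handle without complaint. Here's a sketch of the two forms, with a hypothetical placeholder struct standing in for the header's real contents.

```cpp
// C++17 nested namespace definition, the form used in MaybeStorageBase.h
// that the Qt 5.6 moc fails to parse:
//
//   namespace mozilla::detail {
//   ...
//   }
//
// The equivalent pre-C++17 spelling, with one block per level:
namespace mozilla {
namespace detail {

// Hypothetical placeholder for the header's real contents.
template <typename T>
struct ExampleStorage {
  T mValue;
};

}  // namespace detail
}  // namespace mozilla
```

Both spellings declare exactly the same names, so code elsewhere that refers to mozilla::detail is unaffected by the change.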

Sure enough after I make the edit to split the namespace into nested blocks the moc command goes through. But I still have to try the full build. Let's see.
$ sfdk -c snapshot=temp -c no-pull-build-requires build -d -p
[...]
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
       qtmozembed-qt5-tests-1.53.25+master.20231001133137.3db972e-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
       qtmozembed-qt5-devel-1.53.25+master.20231001133137.3db972e-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
       qtmozembed-qt5-debugsource-1.53.25+master.20231001133137.3db972e-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
       qtmozembed-qt5-1.53.25+master.20231001133137.3db972e-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
       qtmozembed-qt5-debuginfo-1.53.25+master.20231001133137.3db972e-1.aarch64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.N1oCG5
+ umask 022
+ cd /home/flypig/Documents/Development/jolla/qtmozembed
+ /bin/rm -rf /home/deploy/installroot
+ RPM_EC=0
++ jobs -p
+ exit 0
Lovely! The full QtMozEmbed build now goes through without error and generates some nice solid rpm packages. I've updated the Gecko code to properly incorporate this change. I'll need to rebuild the rpm packages again. But to avoid another multi-hour wait I should first try linking mapplauncherd-booster-browser against the new library too.

After manually installing its dependencies, including our new qtmozembed-qt5 and qtmozembed-qt5-devel rpms, the mapplauncherd-booster-browser package builds first time and without any issues. That's not so surprising, because we didn't make any changes to the QtMozEmbed API. But I'm still happy to see it.
$ sfdk -c snapshot=temp -c no-pull-build-requires build -d -p
[...]
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
       mapplauncherd-booster-browser-debugsource-0.2.1-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
       mapplauncherd-booster-browser-0.2.1-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/
       mapplauncherd-booster-browser-debuginfo-0.2.1-1.aarch64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.BSqNUV
+ umask 022
+ cd /home/flypig/Documents/Development/jolla/mapplauncherd-booster-browser
+ /bin/rm -rf /home/deploy/installroot
+ RPM_EC=0
++ jobs -p
+ exit 0
This brings us back to where we were on Day 46, but this time armed with a bag of shiny new rpm packages to install alongside our Gecko build. But first, I need to do that Gecko rebuild. That'll need a bit of time to complete.

[...]

Build complete. Now it's time to scp them all over to my phone and give it a go.
$ devel-su rpm -U --oldpackage xulrunner-qt5-91.*.rpm qtmozembed-qt5-1.53.*.rpm \
    mapplauncherd-booster-browser-0.2.*.rpm
$ sailfish-browser 
sailfish-browser: error while loading shared libraries: libxul.so: cannot open
  shared object file: No such file or directory
Once again, this isn't such a surprise. The library has moved from /usr/lib64/xulrunner-qt5-78.15.1/ to /usr/lib64/xulrunner-qt5-91.9.1/ and at the very least, some path or other likely needs updating. I'll probably need to build sailfish-browser and the other browser packages as well to avoid this. But my gecko build is currently running, so in the meantime I'll work around it like this:
mv /usr/lib64/xulrunner-qt5-91.9.1 /usr/lib64/xulrunner-qt5-78.15.1
Now when I run it, something else happens.
$ sailfish-browser
[D] unknown:0 - Using Wayland-EGL
library "libGLESv2_adreno.so" not found
library "eglSubDriverAndroid.so" not found
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
libxul.so is not found, in /usr/lib64/xulrunner-qt5-91.9.1/libxul.so return fail
Couldn't load XPCOM from 
[F] unknown:0 - ASSERT failure in QMozContextPrivate::QMozContextPrivate(QObject*):
    "Failed load XPCOMGlue", file qmozcontext.cpp, line 67
Redirecting call to abort() to mozalloc_abort

Segmentation fault (core dumped)
There's a file and line specified for the error, which makes things easier. Let's try that in the debugger for good measure.
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
libxul.so is not found, in /usr/lib64/xulrunner-qt5-91.9.1/libxul.so return fail
Couldn't load XPCOM from 
[F] unknown:0 - ASSERT failure in QMozContextPrivate::QMozContextPrivate(QObject*):
    "Failed load XPCOMGlue", file qmozcontext.cpp, line 67
Redirecting call to abort() to mozalloc_abort


Thread 1 "sailfish-browse" received signal SIGSEGV, Segmentation fault.
0x0000007fbfb95d78 in mozalloc_abort () from /usr/lib64/libqt5embedwidget.so.1
(gdb) bt
#0  0x0000007fbfb95d78 in mozalloc_abort () from /usr/lib64/libqt5embedwidget.so.1
#1  0x0000007fbfb95d34 in abort () from /usr/lib64/libqt5embedwidget.so.1
#2  0x0000007fb7dd7bec in QMessageLogger::fatal(char const*, ...) const () from
                          /usr/lib64/libQt5Core.so.5
#3  0x0000007fb7de7f70 in qt_assert_x(char const*, char const*, char const*, int)
                          () from /usr/lib64/libQt5Core.so.5
#4  0x0000007fbfb666ec in QMozContextPrivate::QMozContextPrivate(QObject*) ()
                          from /usr/lib64/libqt5embedwidget.so.1
#5  0x0000007fbfb66808 in QMozContextPrivate::instance() () from
                          /usr/lib64/libqt5embedwidget.so.1
#6  0x0000007fbfb668b8 in QMozContext::QMozContext(QObject*) () from
                          /usr/lib64/libqt5embedwidget.so.1
#7  0x0000007fbfc5378c in SailfishOS::WebEngine::WebEngine(QObject*) () from
                          /usr/lib64/libsailfishwebengine.so.1
#8  0x0000007fbfc537fc in SailfishOS::WebEngine::instance() () from
                          /usr/lib64/libsailfishwebengine.so.1
#9  0x0000007fbfc53a10 in SailfishOS::WebEngine::initialize(QString const&, bool)
                          () from /usr/lib64/libsailfishwebengine.so.1
#10 0x0000005555583764 in _start ()
(gdb) 
Here's the line causing the problem.
    Q_ASSERT_X(LoadEmbedLite(), __PRETTY_FUNCTION__, "Failed load XPCOMGlue");
The LoadEmbedLite() function lives inside EmbedInitGlue.cpp. It's actually the method that we were playing around with yesterday. It's apparently returning false.
bool LoadEmbedLite(int argc, char** argv)
{
  // start the glue, i.e. load and link against xpcom shared lib
  std::string xpcomPath = ResolveXPCOMPath(argc, argv);
  BootstrapResult bootstrapResult = mozilla::GetBootstrap(xpcomPath.c_str());
  if (bootstrapResult.isErr()) {
    printf("Couldn't load XPCOM from %s\n", xpcomPath.c_str());
    return false;
  }
  gBootstrap = bootstrapResult.unwrap();
  return true;
}
Now although this code is part of the gecko source, it's being compiled into the QtMozEmbed code, which is why the error is happening in the libqt5embedwidget.so.1 library.

Looking through the code where this error is happening (and especially the code in ResolveXPCOMPath) and comparing it against the output from the original failed load, it becomes clear that we need the library to be stored in /usr/lib64/xulrunner-qt5-91.9.1 as well. So let's try this another way.
mv /usr/lib64/xulrunner-qt5-78.15.1 /usr/lib64/xulrunner-qt5-91.9.1
ln -s /usr/lib64/xulrunner-qt5-91.9.1 /usr/lib64/xulrunner-qt5-78.15.1
This simple change allows us to get a fair bit further.
$ sailfish-browser
[D] unknown:0 - Using Wayland-EGL
library "libGLESv2_adreno.so" not found
library "eglSubDriverAndroid.so" not found
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
Created LOG for EmbedLiteTrace
[W] unknown:0 - Unable to open bookmarks
    "/home/defaultuser/.local/share/org.sailfishos/browser/bookmarks.json"
[D] onCompleted:105 - ViewPlaceholder requires a SilicaFlickable parent
Created LOG for EmbedLite
JavaScript error: file:///usr/lib64/mozembedlite/components/
                  EmbedLiteConsoleListener.js, line 251: TypeError:
                  XPCOMUtils.generateNSGetFactory is not a function
JavaScript error: file:///usr/lib64/mozembedlite/components/
                  ContentPermissionManager.js, line 94: TypeError:
                  XPCOMUtils.generateNSGetFactory is not a function
JavaScript error: file:///usr/lib64/mozembedlite/components/
                  EmbedLiteChromeManager.js, line 226: TypeError:
                  XPCOMUtils.generateNSGetFactory is not a function
[...]
JavaScript error: resource://gre/modules/EnterprisePoliciesParent.jsm, line 500:
                  TypeError: Services.appinfo is undefined
JavaScript error: resource://gre/modules/AddonManager.jsm, line 1479:
                  NS_ERROR_NOT_INITIALIZED: AddonManager is not initialized
JavaScript error: resource://gre/modules/URLQueryStrippingListService.jsm,
                  line 42: TypeError: Services.appinfo is undefined
Created LOG for EmbedPrefs
Created LOG for EmbedLiteLayerManager
Segmentation fault (core dumped)
Running it in the debugger gives us more info.
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 19550]
0x0000007fbcc9390c in mozilla::embedlite::PuppetWidgetBase::Invalidate
  (this=0x7f8849b2a0, aRect=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedshared/PuppetWidgetBase.cpp:274
274         MOZ_CRASH("Unexpected layer manager type");
(gdb) bt
#0  0x0000007fbcc9390c in mozilla::embedlite::PuppetWidgetBase::Invalidate
    (this=0x7f8849b2a0, aRect=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedshared/PuppetWidgetBase.cpp:274
#1  0x0000007fbcc980b8 in mozilla::embedlite::PuppetWidgetBase::UpdateBounds
    (this=0x7f8849b2a0, aRepaint=aRepaint@entry=true)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedshared/PuppetWidgetBase.cpp:395
#2  0x0000007fbcca12b0 in mozilla::embedlite::EmbedLiteWindowChild::CreateWidget
    (this=0x7f88737860)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/xpcom/base/nsCOMPtr.h:851
#3  0x0000007fbcc918f8 in mozilla::detail::RunnableMethodArguments<>::applyImpl
    (mozilla::embedlite::EmbedLiteWindowChild*,
    void (mozilla::embedlite::EmbedLiteWindowChild::*)(),
    mozilla::Tuple<>&, std::integer_sequence)
    (args=..., m=, o=)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
#4  mozilla::detail::RunnableMethodArguments<>::apply
     (
    m=, o=, this=)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1154
#5  mozilla::detail::RunnableMethodImpl::Run (this=)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1201
#6  0x0000007fb9e02b60 in mozilla::RunnableTask::Run (this=0x5555d9fd50)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
[...]
#28 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
The error is coming from this check here in PuppetWidgetBase.cpp:
  if (mozilla::layers::LayersBackend::LAYERS_CLIENT == lm->GetBackendType()) {
    // No need to do anything, the compositor will handle drawing
  } else {
    MOZ_CRASH("Unexpected layer manager type");
  }
Happily we can coax the debugger into giving us the value returned by lm->GetBackendType():
(gdb) p lm
$1 = (nsIWidget::LayerManager *) 0x7f8869fae0
(gdb) p lm->GetBackendType()
$2 = mozilla::layers::LayersBackend::LAYERS_WR
So PuppetWidgetBase is expecting a layer backend of type LAYERS_CLIENT, but what we actually have is a backend of type LAYERS_WR. That is, we're getting a WebRender layer manager, when what we want is a Client layer manager. Here are the relevant bits of code. First from LayersTypes.h:
enum class LayersBackend : int8_t {
  LAYERS_NONE = 0,
  LAYERS_BASIC,
  LAYERS_OPENGL,
  LAYERS_D3D11,
  LAYERS_CLIENT,
  LAYERS_WR,
  LAYERS_LAST
};
Then from WebRenderLayerManager.h:
  LayersBackend GetBackendType() override { return LayersBackend::LAYERS_WR; }
And finally from ClientLayerManager.h:
  LayersBackend GetBackendType() override {
    return LayersBackend::LAYERS_CLIENT;
  }
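To make the mismatch concrete, here's a heavily simplified sketch of how the virtual GetBackendType() override decides which branch of the PuppetWidgetBase check is taken. The classes are stand-ins that model only this one method, and InvalidateWouldSucceed() is a hypothetical helper that returns false where the real code would hit MOZ_CRASH().

```cpp
#include <cassert>
#include <cstdint>

// Same enumerators as LayersTypes.h above.
enum class LayersBackend : std::int8_t {
  LAYERS_NONE = 0,
  LAYERS_BASIC,
  LAYERS_OPENGL,
  LAYERS_D3D11,
  LAYERS_CLIENT,
  LAYERS_WR,
  LAYERS_LAST
};

// Simplified stand-ins for the Gecko classes; only GetBackendType() is modelled.
class LayerManager {
 public:
  virtual ~LayerManager() = default;
  virtual LayersBackend GetBackendType() = 0;
};

class ClientLayerManager : public LayerManager {
 public:
  LayersBackend GetBackendType() override { return LayersBackend::LAYERS_CLIENT; }
};

class WebRenderLayerManager : public LayerManager {
 public:
  LayersBackend GetBackendType() override { return LayersBackend::LAYERS_WR; }
};

// Hypothetical helper mirroring the check in PuppetWidgetBase::Invalidate():
// returns false in the case where the real code calls MOZ_CRASH().
bool InvalidateWouldSucceed(LayerManager& aLm) {
  return aLm.GetBackendType() == LayersBackend::LAYERS_CLIENT;
}
```

Whichever concrete manager is created, the check only looks at the overridden return value, so a WebRenderLayerManager always ends up in the crashing branch.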
Clearly the wrong type of layer manager is being instantiated. Finding out why will take a bit more digging.

To try to figure out why the wrong layer manager is being created I've put a breakpoint on the WebRenderLayerManager constructor. That should be enough to get the answer we need:
Thread 8 "GeckoWorkerThre" hit Breakpoint 1,
    mozilla::layers::WebRenderLayerManager::WebRenderLayerManager
    (this=0x7f887e5040, aWidget=0x7f887e0dc0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/wr/
    WebRenderLayerManager.cpp:38
38      WebRenderLayerManager::WebRenderLayerManager(nsIWidget* aWidget)
(gdb) bt
#0  mozilla::layers::WebRenderLayerManager::WebRenderLayerManager
    (this=0x7f887e5040, aWidget=0x7f887e0dc0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/wr/
    WebRenderLayerManager.cpp:38
#1  0x0000007fbc06fd98 in nsBaseWidget::CreateCompositorSession
    (this=this@entry=0x7f887e0dc0, aWidth=aWidth@entry=1080,
    aHeight=aHeight@entry=2520, 
    aOptionsOut=aOptionsOut@entry=0x7f9f4eb7b0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#2  0x0000007fbc074348 in nsBaseWidget::CreateCompositor
    (this=this@entry=0x7f887e0dc0, aWidth=aWidth@entry=1080,
    aHeight=aHeight@entry=2520)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/widget/nsBaseWidget.cpp:1440
#3  0x0000007fbcc93d58 in mozilla::embedlite::nsWindow::CreateCompositor
    (this=0x7f887e0dc0, aWidth=1080, aHeight=2520)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
    embedshared/nsWindow.cpp:175
#4  0x0000007fbcc92e2c in mozilla::embedlite::nsWindow::CreateCompositor
    (this=0x7f887e0dc0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
     embedshared/nsWindow.cpp:168
#5  0x0000007fbcc95c40 in mozilla::embedlite::nsWindow::GetLayerManager
    (this=0x7f887e0dc0, aShadowManager=,
    aBackendHint=, 
    aPersistence=nsIWidget::LAYER_MANAGER_CURRENT) at
    /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/embedshared/
    nsWindow.cpp:230
#6  0x0000007fbcc9387c in nsIWidget::GetLayerManager (this=0x7f887e0dc0) at
    /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/widget/nsIWidget.h:1303
[...]
#35 0x0000007fb79cf89c in ?? () from /lib64/libc.so.6
(gdb) 
Following these breadcrumbs and checking inside nsBaseWidget::CreateCompositorSession() I see the following code, which looks very much like it may be causing this situation:
    RefPtr lm;
    if (options.UseWebRender()) {
      lm = new WebRenderLayerManager(this);
    } else {
      lm = new ClientLayerManager(this);
    }
Now it may be that we actually want the WebRender layer manager. But for now I'm going to stick with what we know and try to get it to produce a Client layer manager instead. At any rate, the bit of code above isn't new. What is new is the way the value of options.UseWebRender() gets set. Here's the relevant code in ESR 78:
    bool enableWR =
        gfx::gfxVars::UseWebRender() && WidgetTypeSupportsAcceleration();
Pretty concise. By contrast, here's the equivalent code from ESR 91:
    bool supportsAcceleration = WidgetTypeSupportsAcceleration();
    bool enableWR;
    bool enableSWWR;
    if (supportsAcceleration ||
        StaticPrefs::gfx_webrender_unaccelerated_widget_force()) {
      enableWR = gfx::gfxVars::UseWebRender();
      enableSWWR = gfx::gfxVars::UseSoftwareWebRender();
    } else if (gfxPlatform::DoesFissionForceWebRender() ||
               StaticPrefs::
                   gfx_webrender_software_unaccelerated_widget_allow()) {
      enableWR = enableSWWR = gfx::gfxVars::UseWebRender();
    } else {
      enableWR = enableSWWR = false;
    }
Not so concise!

Although there's a lot more code here, it's still quite simple. By stepping through the code using the debugger we can find out what some of these values are set to at runtime.
(gdb) p supportsAcceleration
$4 = 
(gdb) p WidgetTypeSupportsAcceleration()
$5 = true
(gdb) p StaticPrefs::gfx_webrender_unaccelerated_widget_force()
No symbol "gfx_webrender_unaccelerated_widget_force" in namespace
"mozilla::StaticPrefs".
(gdb) p gfx::gfxVars::UseWebRender()
Cannot evaluate function -- may be inlined
(gdb) p gfx::gfxVars::sInstance.mRawPtr->mVarUseWebRender.mValue
$15 = true
(gdb) p enableWR
$6 = true
(gdb) p enableSWWR
$7 = 
(gdb) p enableAPZ
$9 = false
(gdb) p options
$10 = {mUseAPZ = false, mUseWebRender = true, mUseSoftwareWebRender = true,
  mAllowSoftwareWebRenderD3D11 = false, mAllowSoftwareWebRenderOGL = false,
  mUseAdvancedLayers = false, mUseWebGPU = false, mInitiallyPaused = false}
(gdb) 
The crucial points here are that supportsAcceleration and gfx::gfxVars::UseWebRender() are both true. It looks to me like the latter should be false. According to the code in gfxVars.h the default value is false, so it must be getting set to true elsewhere at runtime.
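It helps to see the ESR 91 branch structure as a plain function. In this sketch every runtime check and pref is passed in as a boolean; the struct and function names are hypothetical, and the comments note which Gecko call each field stands in for. The branches follow the snippet above exactly.

```cpp
#include <cassert>

// Hypothetical inputs standing in for the runtime checks and prefs read by
// nsBaseWidget::CreateCompositorSession() in ESR 91.
struct CompositorInputs {
  bool supportsAcceleration;        // WidgetTypeSupportsAcceleration()
  bool unacceleratedWidgetForce;    // gfx_webrender_unaccelerated_widget_force
  bool fissionForcesWebRender;      // gfxPlatform::DoesFissionForceWebRender()
  bool softwareUnacceleratedAllow;  // gfx_webrender_software_unaccelerated_widget_allow
  bool useWebRender;                // gfx::gfxVars::UseWebRender()
  bool useSoftwareWebRender;        // gfx::gfxVars::UseSoftwareWebRender()
};

struct CompositorFlags {
  bool enableWR;
  bool enableSWWR;
};

// Same branch structure as the ESR 91 snippet.
CompositorFlags DecideWebRender(const CompositorInputs& aIn) {
  if (aIn.supportsAcceleration || aIn.unacceleratedWidgetForce) {
    return {aIn.useWebRender, aIn.useSoftwareWebRender};
  }
  if (aIn.fissionForcesWebRender || aIn.softwareUnacceleratedAllow) {
    return {aIn.useWebRender, aIn.useWebRender};
  }
  return {false, false};
}
```

Plugging in what the debugger showed (supportsAcceleration and UseWebRender() both true, with options reporting mUseSoftwareWebRender = true) lands in the first branch, so enableWR comes out true whatever the prefs say. The only way to get a Client layer manager on this path is for UseWebRender() itself to be false.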

There are only a couple of places where UseWebRender can be set to true, so I'll stick some breakpoints on them to see if they get triggered.

The first one to get triggered is the call to InitWebRenderConfig():
Thread 8 "GeckoWorkerThre" hit Breakpoint 4, gfxPlatform::InitWebRenderConfig
    (this=0x7f88791ac0)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/thebes/gfxPlatform.cpp:2646
2646    void gfxPlatform::InitWebRenderConfig() {
As it happens, inside this method there's a clear difference in approach between the ESR 78 and ESR 91 versions of the code. The code that sets the WebRender variable in ESR 78 looks like this:
  if (gfxConfig::IsEnabled(Feature::WEBRENDER)) {
    gfxVars::SetUseWebRender(true);
This suggests that the WEBRENDER feature has to be explicitly enabled in the configuration for the WebRender variable to follow. In ESR 91, on the other hand, we have some code that looks like this:
  bool hasHardware = gfxConfig::IsEnabled(Feature::WEBRENDER);
  bool hasSoftware = gfxConfig::IsEnabled(Feature::WEBRENDER_SOFTWARE);
  bool hasWebRender = hasHardware || hasSoftware;
[...]
  if (hasWebRender) {
    gfxVars::SetUseWebRender(true);
There's a new Feature::WEBRENDER_SOFTWARE feature, which can now also cause the variable to be set. And indeed, stepping through this code we see exactly what we expect:
(gdb) p hasHardware
$16 = false
(gdb) p hasSoftware
$17 = true
(gdb) p hasWebRender
$19 = true
(gdb) 
So there we have it.
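Reduced to its essentials, the change looks something like the sketch below. This is a simplification under the assumption that nothing in the elided part of InitWebRenderConfig() overrides the result; the two booleans stand in for gfxConfig::IsEnabled(Feature::WEBRENDER) and gfxConfig::IsEnabled(Feature::WEBRENDER_SOFTWARE), and the function names are hypothetical.

```cpp
#include <cassert>

// ESR 78: only the hardware WEBRENDER feature leads to SetUseWebRender(true).
bool Esr78SetsWebRender(bool aWebRenderFeature) {
  return aWebRenderFeature;
}

// ESR 91: either feature is enough to call SetUseWebRender(true).
bool Esr91SetsWebRender(bool aHasHardware, bool aHasSoftware) {
  bool hasWebRender = aHasHardware || aHasSoftware;
  return hasWebRender;
}
```

With the values seen above (hasHardware false, hasSoftware true) the ESR 78 logic would have left WebRender off, while the ESR 91 logic switches it on.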

This WEBRENDER_SOFTWARE variable is used in quite a few places, so it seems the sensible thing to do here would be to flip the setting so it returns false everywhere. I don't yet know how to do this. But in the meantime, I can try flipping the value just in this one spot using the debugger to see what happens.
(gdb) set hasWebRender = false
(gdb) p hasWebRender
$21 = false
(gdb) c
Continuing.
The thing is, at this point, it really does continue. There are no crashes or really serious errors. There's no rendering either! But a bunch of other stuff does still seem to be working.

For example, it downloads pages, switches between mobile and desktop mode, and even allows searching within the page.


 
The browser actually running. There's no rendering, but it is downloading something.

Make no mistake, this is a thoroughly broken version of the browser. The screen remains stubbornly blank no matter what I do. But given how much I had to cut out of the rendering pipeline in order to get it to build, this is no real surprise.

I'm pretty excited by this. I don't want to imply there isn't a huge amount of work still to be done. There really is. But the fact that it's not just flat-out crashing is a positive. It means we're on the right track.

But this is also enough for today. I need to figure out how to set this WEBRENDER_SOFTWARE variable properly, rather than just forcing the outcome using the debugger. I also need to write up all the tickets that I've yet to detail. This second task is all the more important now that things are at a stage where others may be able to contribute as well. That's for tomorrow.

If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
4 Oct 2023 : Day 49 #
Unfortunately the build I set off yesterday didn't complete due to an error in some of the changes I made. The way I'd arranged the code forced the BootstrapResult instance I added to be default-constructed first and only then assigned the return value of the GetBootstrap() method, like this:
  BootstrapResult bootstrapResult;
  bootstrapResult = mozilla::GetBootstrap(xpcomPath.c_str());
It didn't like that, so I had to change it to this:
  BootstrapResult bootstrapResult = mozilla::GetBootstrap(xpcomPath.c_str());
Presumably the default constructor wasn't available. Now that I've made this change it builds correctly, but it means I've had to set the build off again. I'm not really able to tackle any more of the QtMozEmbed changes until the build has completed, so I guess I'll just have to drink coffee and watch a Wheel of Time episode instead. I also have to write some text for the App Roundup in the next newsletter, so I have plenty of non-gecko tasks to be getting on with. I should probably do those instead!
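Here's a small illustration of why the declare-then-assign pattern can fail, assuming (as seems likely) that the result type has no default constructor. LoadResult and LoadXpcom() are hypothetical stand-ins for BootstrapResult and GetBootstrap(), not the real mozilla types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical Result-like type: it can only be built from a value or an
// error code, so there is deliberately no default constructor.
class LoadResult {
 public:
  explicit LoadResult(std::unique_ptr<std::string> aValue)
      : mValue(std::move(aValue)), mError(0) {}
  explicit LoadResult(int aError) : mError(aError) {}
  LoadResult() = delete;  // "LoadResult r;" fails to compile

  bool isErr() const { return mError != 0; }
  std::unique_ptr<std::string> unwrap() { return std::move(mValue); }

 private:
  std::unique_ptr<std::string> mValue;
  int mError;
};

// Hypothetical loader standing in for mozilla::GetBootstrap().
LoadResult LoadXpcom(const char* aPath) {
  if (aPath == nullptr) {
    return LoadResult(1);
  }
  return LoadResult(std::make_unique<std::string>(aPath));
}

// Mirrors the fix: initialise the result in its declaration.
bool TryLoad(const char* aPath, std::unique_ptr<std::string>& aOut) {
  // LoadResult result;           // error: use of deleted function
  // result = LoadXpcom(aPath);
  LoadResult result = LoadXpcom(aPath);
  if (result.isErr()) {
    return false;
  }
  aOut = result.unwrap();
  return true;
}
```

Initialising in the declaration never needs a default-constructed object, which is why the one-line form compiles where the two-line form doesn't.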

The extended build time can be frustrating, but it can also be a good way to force myself to break up my development into chunks. So it's actually quite convenient that I'm forced not to rush the process. It would be different if someone were paying me to do it of course.

[...]

After 7 hours 36 minutes and 10.3 seconds the build completes.
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/xulrunner-qt5-devel-91.9.1-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/xulrunner-qt5-misc-91.9.1-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/xulrunner-qt5-91.9.1-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/xulrunner-qt5-debugsource-91.9.1-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/xulrunner-qt5-debuginfo-91.9.1-1.aarch64.rpm
So now it's time to install the built RPMs and try to build QtMozEmbed against them again.

After doing this and running the QtMozEmbed build it still, unfortunately, comes up with an error. The error looks like this (I've reformatted it a bit to try to make things a little clearer):
SailfishOS-devel-aarch64.temp/usr/lib64/qt5/bin/moc -DMESA_EGL_NO_X11_HEADERS \
  -DXPCOM_GLUE=1 -DXPCOM_GLUE_USE_NSPR=1 -DMOZ_GLUE_IN_PROGRAM=1 \
  -DBUILD_GRE_HOME="\"SailfishOS-devel-aarch64.temp/usr/lib64/xulrunner-qt5-91.9.1\"" \
  -DQT_OPENGLEXTENSIONS_LIB -DQT_QUICK_LIB -DQT_GUI_LIB -DQT_QML_LIB \
  -DQT_NETWORK_LIB -DQT_CORE_LIB -I/usr/share/qt5/mkspecs/linux-g++ \
  -I./qtmozembed/src -I./qtmozembed/src/ -I/usr/include/nspr4 \
  -I/usr/include/nspr4 -I/usr/include/pixman-1 -I/usr/include/systemsettings \
  -I/usr/include/qt5/QtDBus -I/usr/include/qt5 -I/usr/include/qt5/QtCore \
  -I/usr/include/profiled -I/usr/include/dbus-1.0 -I/usr/lib64/dbus-1.0/include \
  -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include \
  -I/usr/include/libsailfishkeyprovider -I/usr/include/connman-qt5 \
  -I/usr/include/xulrunner-qt5-91.9.1 -I/usr/include/qt5/QtOpenGLExtensions \
  -I/usr/include/qt5/QtQuick -I/usr/include/qt5/QtGui -I/usr/include/qt5/QtQml \
  -I/usr/include/qt5/QtNetwork \
  -I/opt/cross/aarch64-meego-linux-gnu/include/c++/8.3.0 \
  -I/opt/cross/aarch64-meego-linux-gnu/include/c++/8.3.0/aarch64-meego-linux-gnu \
  -I/opt/cross/aarch64-meego-linux-gnu/include/c++/8.3.0/backward \
  -I/opt/cross/lib/gcc/aarch64-meego-linux-gnu/8.3.0/include \
  -I/usr/local/include \
  -I/opt/cross/lib/gcc/aarch64-meego-linux-gnu/8.3.0/include-fixed \
  -I/opt/cross/aarch64-meego-linux-gnu/include -I/usr/include qmozview_p.h \
  -o ../src/moc/release_static/moc_qmozview_p.cpp
usr/include/xulrunner-qt5-91.9.1/mozilla/MaybeStorageBase.:16: Parse error at "mozilla"
make[1]: *** [Makefile:566: ../src/moc/release_static/moc_qmozview_p.cpp] Error 1
This error is odd, in that there doesn't seem to be a filetype suffix:
usr/include/xulrunner-qt5-91.9.1/mozilla/MaybeStorageBase.:16: Parse error at "mozilla"
The path usr/include/xulrunner-qt5-91.9.1 is also missing the leading /. Strange. I'm not sure what to make of this. Unfortunately it's quite late now, so although we've finally reached the point where we can start fixing things, this will have to wait until the morning.

Most of the day was spent waiting for the build to finish, which is a little frustrating. It means there wasn't so much progress today. Tomorrow will be Day 50 and even though it's just a quirk of decimal numbers, that still feels like a big deal. Hopefully it'll be possible to make more progress tomorrow.

If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
3 Oct 2023 : Day 48 #
Yesterday we started out working on getting QtMozEmbed to build. I made all the changes to QtMozEmbed that looked sensible and also added a new file, EmbedInitGlue.cpp, to the Gecko code. The build of this was set to run overnight.

This morning I unfortunately found that the build had failed. I actually spent a fair bit of last night getting the sfdk build target into a usable state; there were lots of problems getting the dependencies installed. So my initial response was exasperation: is this some intermittent sfdk Heisenbug? Thankfully not. On closer inspection it turns out to be a genuine error in the new EmbedInitGlue.cpp file that I added. Phew. That's likely to be much easier to track down and fix than an SDK issue.
219:03.60 mozglue/sailfishos
219:05.45 ${PROJECT}/gecko-dev/mozglue/sailfishos/EmbedInitGlue.cpp: In function
          ‘bool LoadEmbedLite(int, char**)’:
219:05.45 ${PROJECT}/gecko-dev/mozglue/sailfishos/EmbedInitGlue.cpp:89:55: error:
          no match for ‘operator=’ (operand types are ‘mozilla::Bootstrap::UniquePtr’
          {aka ‘mozilla::UniquePtr’}
          and ‘mozilla::BootstrapResult’ {aka
          ‘mozilla::Result, mozilla::Variant > > >’})
219:05.45    gBootstrap = mozilla::GetBootstrap(xpcomPath.c_str());
219:05.45                                                        ^
It looks like the problem here is the return value that's intended to end up stored in gBootstrap:
using BootstrapResult = ::mozilla::Result;

inline BootstrapResult GetBootstrap(const char* aXPCOMFile = nullptr) {
Upstream diff D104263 changed the return type from Bootstrap::UniquePtr to BootstrapResult, with the definition above, so we'll need to update the EmbedInitGlue.cpp code to match.

That means we should also update the error condition from this ("check for null pointer"):
  gBootstrap = mozilla::GetBootstrap(xpcomPath.c_str());
  if (!gBootstrap) {
    printf("Couldn't load XPCOM from %s\n", xpcomPath.c_str());
    return false;
  }
To this ("check for explicit error return value"):
  BootstrapResult bootstrapResult;
  bootstrapResult = mozilla::GetBootstrap(xpcomPath.c_str());
  if (bootstrapResult.isErr()) {
    printf("Couldn't load XPCOM from %s\n", xpcomPath.c_str());
    return false;
  }
  gBootstrap = bootstrapResult.unwrap();
This change looks to me like it's coming from Rust, which has these nice Result enums which act similarly to this and force explicit error checks. It's very easy to forget to include a null-pointer check.
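To show the idea, here's a bare-bones Result in the spirit of Rust's Result<T, E>, built on std::variant. It's loosely modelled on the isErr()/unwrap() calls used above, but SimpleResult, LoadError, error() and LoadHandle() are all hypothetical and not mfbt's actual Result implementation. The point is that the value and the error share storage, so the caller has to ask which one it got instead of remembering a null check.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error type for the sketch.
struct LoadError {
  std::string message;
};

template <typename T, typename E>
class SimpleResult {
 public:
  SimpleResult(T aValue) : mStorage(std::move(aValue)) {}
  SimpleResult(E aError) : mStorage(std::move(aError)) {}

  bool isErr() const { return std::holds_alternative<E>(mStorage); }
  // Only valid when isErr() is false; std::get throws otherwise.
  T unwrap() { return std::move(std::get<T>(mStorage)); }
  const E& error() const { return std::get<E>(mStorage); }

 private:
  std::variant<T, E> mStorage;
};

using HandleResult = SimpleResult<std::unique_ptr<int>, LoadError>;

// Hypothetical loader: fails for an empty path, otherwise returns a value.
HandleResult LoadHandle(const std::string& aPath) {
  if (aPath.empty()) {
    return LoadError{"empty path"};
  }
  return std::make_unique<int>(static_cast<int>(aPath.size()));
}
```

Calling unwrap() on an error result throws std::bad_variant_access rather than handing back a null pointer that might get dereferenced later.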

That's the only error coming out right now, so it's worth triggering another rebuild of Gecko.

Before I do that I want to run another build of QtMozEmbed to see whether the changes I've made have had a positive effect.

Obviously the EmbedInitGlue.h header will still be missing, but maybe the other changes I've made will throw up something new.

And they do! The following new errors now appear.
SailfishOS-devel-aarch64.temp/usr/include/xulrunner-qt5-91.9.1/
  nsReadableUtils.h: In function ‘bool EnsureUTF16Validity(nsAString&)’:
SailfishOS-devel-aarch64.temp/usr/include/xulrunner-qt5-91.9.1/
  nsReadableUtils.h:414:26: error: ‘Utf16ValidUpTo’ is not a member of ‘mozilla’
   size_t upTo = mozilla::Utf16ValidUpTo(aString);
                          ^~~~~~~~~~~~~~
SailfishOS-devel-aarch64.temp/usr/include/xulrunner-qt5-91.9.1/
  nsReadableUtils.h:425:12: error: ‘EnsureUtf16ValiditySpan’ is not a member of
  ‘mozilla’
   mozilla::EnsureUtf16ValiditySpan(span.From(upTo + 1));
            ^~~~~~~~~~~~~~~~~~~~~~~
Investigating these turns out to be a bit of a rabbit hole. The error is coming from the nsReadableUtils.h file. The functions the compiler claims are missing already exist in Gecko's TextUtils.h file. But nsReadableUtils.h does include TextUtils.h, so why isn't it picking it up?

My suspicion is that it relates to the MOZ_HAS_JSRUST preprocessor define which wraps the two functions. If this define is set to 1 the functions will be included, but if it's set to 0 they won't be.

Where is this set? It looks like it's in the mozilla/JsRust.h file, which has this tangle of defines:
#if (defined(MOZ_HAS_MOZGLUE) || defined(MOZILLA_INTERNAL_API)) && \
    !defined(MOZ_PRETEND_NO_JSRUST)
#  define MOZ_HAS_JSRUST() 1
#else
#  define MOZ_HAS_JSRUST() 0
#endif
The mozilla/JsRust.h file is included in mozilla/Latin1.h, which is included in TextUtils.h. So it's definitely getting into the build.

It seems the problem here is that MOZILLA_INTERNAL_API isn't defined. Digging further into the logs I see this error being output:
SailfishOS-devel-aarch64.temp/usr/include/xulrunner-qt5-91.9.1/
  nsTSubstring.h:29:4: error: #error "Using XPCOM strings is limited to code
  linked into libxul."
 #  error "Using XPCOM strings is limited to code linked into libxul."
    ^~~~~
This error is also due to MOZILLA_INTERNAL_API not being defined.
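The effect of the gate is easy to reproduce in isolation. The sketch below copies the preprocessor logic from mozilla/JsRust.h quoted above; JsRustHelpersAvailable() is a hypothetical helper. Nothing defines MOZ_HAS_MOZGLUE or MOZILLA_INTERNAL_API here, which is roughly the situation for code compiled outside libxul.

```cpp
#include <cassert>

// Copied gate from mozilla/JsRust.h.
#if (defined(MOZ_HAS_MOZGLUE) || defined(MOZILLA_INTERNAL_API)) && \
    !defined(MOZ_PRETEND_NO_JSRUST)
#  define MOZ_HAS_JSRUST() 1
#else
#  define MOZ_HAS_JSRUST() 0
#endif

// Whether helpers such as Utf16ValidUpTo() would be declared in this
// translation unit.
constexpr bool JsRustHelpersAvailable() { return MOZ_HAS_JSRUST() == 1; }
```

Compiled as-is this reports false. Adding -DMOZILLA_INTERNAL_API to the compiler command line flips it to true, because the define is then visible before the gate is evaluated.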

I did also check that none of these files have changed since ESR 78. They haven't. So there must be something else happening with these defines.

To see whether it really is the lack of this define that's causing the problem, I've forcefully set it in qopenglwebpage.cpp, which is the root from which all of these header files cascade in. Adding this define doesn't help, so maybe I have this wrong.

At this point I notice that the reason the header file is being included at all is because of the changes I made yesterday. Adding in nsFocusManager::GenerateFocusActionId() forced me to also include the nsFocusManager.h header, which has triggered all of these problems.

So maybe I should row back on that addition? After all, we don't want to be accessing an internal API, so probably we shouldn't be using GenerateFocusActionId().

One potential way to tackle this is to push the code that generates the aActionId parameter inside the Gecko library, rather than trying to expose it for the browser wrapper code to deal with. That way the code can call GenerateFocusActionId() without having to worry about these errors (because MOZILLA_INTERNAL_API will be set). Since I'm still holding back a Gecko build, now would be a good time to make this change.

There's also this error to tackle in the list:
quickmozview.cpp: In destructor ‘virtual QuickMozView::~QuickMozView()’:
quickmozview.cpp:104:36: error: no matching function for call to
  ‘mozilla::embedlite::EmbedLiteView::SetIsActive(bool)’
         d->mView->SetIsActive(false);
                                    ^
This is another instance of SetIsActive() that needs the additional aActionId parameter. There's a similar error further down in the same file. The changes I've made to Gecko to remove this parameter should also resolve these errors automatically, or so I believe.

That's the last of the errors that doesn't seem related to the missing EmbedInitGlue files.

So, time to run the Gecko build overnight so I can retry the QtMozEmbed build again in the morning.

If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.
2 Oct 2023 : Day 47 #
Yesterday was a good day given Thigg's graphical intervention. I also tried installing the newly generated gecko 91.9.1 rpm packages on my phone, with predictably unsuccessful results. Installation was blocked because there are at least a couple of packages (QtMozEmbed and the browser booster) which depend on it, and they insisted on linking against only the previous version of the library.

This isn't a disaster, it just means I have to rebuild them against the new library. Unfortunately it isn't as simple as just performing a rebuild, because the libxul API has changed.

None of these API changes are a particular surprise: I recognise them as changes that I made to various headers from the library. But I will need to update QtMozEmbed to use them properly.

One of the nice things about this piece of work is that at least QtMozEmbed compiles pretty swiftly. That'll make the development process much easier than it is for gecko. Frankly, it's a breath of fresh air.

The first issue is that the qmozcontext.cpp file is trying to import mozilla/embedlite/EmbedInitGlue.h which apparently no longer exists. First thing to check is whether or not that's actually true.

And it is true: it's not there. But interestingly it's not there in ESR 78 either. Or rather, it is there, but it's not in the git tree. That's because it's been generated by a patch; a patch that I've not applied: patch 0018 "Introduce EmbedInitGlue to the mozglue".

I'll try to introduce it now.
$ git am --3way ../rpm/0019-sailfishos-mozglue-Introduce-EmbedInitGlue-to-the-mo.patch
Applying: Introduce EmbedInitGlue to the mozglue. JB#50788
Using index info to reconstruct a base tree...
M       mozglue/moz.build
Falling back to patching base and 3-way merge...
Auto-merging mozglue/moz.build
CONFLICT (content): Merge conflict in mozglue/moz.build
error: Failed to merge in the changes.
Patch failed at 0001 Introduce EmbedInitGlue to the mozglue. JB#50788
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
It's not quite a smooth application, but with some manual tinkering the conflict turns out to be pretty straightforward to resolve. The thing to do now is create a new set of gecko rpm packages. But there are more errors in QtMozEmbed that are worth trying to sort out first. It doesn't look like all of them are caused by the lack of this header file.

There's an error about ArrayLength() not being defined. This function (or maybe macro?.. no, it's a function) hasn't been removed: it's widely used in the gecko code. So it's probably just a case of a header file not making it into the QtMozEmbed code.

Just to complicate things, gecko has it defined in three distinct header files, so it's not clear which one we need to use.

I've had a look at a couple of gecko source files and the common include seems to be mozilla/ArrayUtils.h so I've decided to go for that one.
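For anyone unfamiliar with it, ArrayLength() returns the number of elements in a fixed-size C array. The sketch below shows the usual technique for this kind of helper: the size N is deduced at compile time from a reference-to-array parameter. It's named ArrayLengthSketch to make clear it's an illustration, not a copy of the code in mozilla/ArrayUtils.h.

```cpp
#include <cassert>
#include <cstddef>

// Deduces N from the array type; passing a plain pointer won't compile,
// which is the main advantage over the sizeof(a) / sizeof(a[0]) idiom.
template <typename T, std::size_t N>
constexpr std::size_t ArrayLengthSketch(T (&)[N]) {
  return N;
}
```

Note that a string literal counts its trailing NUL, so "gecko" has length 6.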

Next up we have this:
qopenglwebpage.cpp: In destructor ‘virtual QOpenGLWebPage::~QOpenGLWebPage()’:
qopenglwebpage.cpp:76:36: error: no matching function for call to
  ‘mozilla::embedlite::EmbedLiteView::SetIsActive(bool)’
         d->mView->SetIsActive(false);
                                    ^
You may recall that back on Day 30 we had to change the signature for SetIsActive(). Here's the relevant commit log message:
commit 96ae695458d3beb6ca4614f07e1915eff3577fe3
Author: Henri Sivonen 
Date:   Mon Nov 16 19:16:20 2020 +0000

    Bug 1618386 - Add action ids to filter out stale active browsing context updates. r=nika
    
    Differential Revision: https://phabricator.services.mozilla.com/D94969
This was the active ingredient for that commit as far as the current error is concerned:
-  virtual void SetIsActive(bool);
+  virtual void SetIsActive(bool, uint64_t aActionId);
So to get the QtMozEmbed code to compile we have to add an appropriate uint64_t aActionId to the SetIsActive() calls. I'm going to send in the value returned by nsFocusManager::GenerateFocusActionId() here, since this is what code elsewhere in gecko seems to be using.
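Here's a sketch of what the call-site change amounts to. FakeEmbedLiteView, NextActionId() and Deactivate() are all hypothetical. In particular NextActionId() is just a monotonic counter, based on the assumption (suggested by the commit title) that later ids let stale updates be filtered out; it is not Gecko's nsFocusManager::GenerateFocusActionId().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for EmbedLiteView that records the last call.
class FakeEmbedLiteView {
 public:
  virtual ~FakeEmbedLiteView() = default;
  // ESR 78 signature was: virtual void SetIsActive(bool);
  virtual void SetIsActive(bool aIsActive, std::uint64_t aActionId) {
    mIsActive = aIsActive;
    mLastActionId = aActionId;
  }
  bool mIsActive = true;
  std::uint64_t mLastActionId = 0;
};

// Hypothetical id source: each call returns a larger id than the last.
std::uint64_t NextActionId() {
  static std::uint64_t sCounter = 0;
  return ++sCounter;
}

// What a destructor call site changes to: SetIsActive(false) becomes this.
void Deactivate(FakeEmbedLiteView& aView) {
  aView.SetIsActive(false, NextActionId());
}
```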

Next we have this:
SailfishOS-devel-aarch64.temp/usr/include/xulrunner-qt5-91.9.1/mozilla/embedlite/
  EmbedLiteView.h:83:16: note:   candidate expects 2 arguments, 1 provided
qmozview_p.cpp: In member function ‘void QMozViewPrivate::goBack()’:
qmozview_p.cpp:444:19: error: no matching function for call to
  ‘mozilla::embedlite::EmbedLiteView::GoBack()’
     mView->GoBack();
                   ^
Plus a similar error for GoForward(). You might recall that we also changed these back on Day 30 to have an extra couple of parameters, bool aRequireUserInteraction and bool aUserActivation.

Ultimately we may want to expose these parameters from QtMozEmbed, but for the time being I've set both parameters to true in the two places these were used.

In theory that should cover all of the QtMozEmbed errors, but we won't find out until the gecko build has finished. Then we can install the new gecko packages in the SDK, build QtMozEmbed against them and see what happens.

So that's it for today. More tomorrow!

If you'd like to read more about all this gecko stuff, take a look at my full Gecko Dev Diary.
1 Oct 2023 : Day 46 #
We hit an important milestone yesterday with the generation of our first set of complete rpm packages that could — theoretically — be installed on a device. I'll test that theory later today.

But before I get into that I want to first give a big shout out to Thigg, who created (with a little AI help) this amazing picture showing the inside of the lab at flypig HQ.
 
A pig with wings dressed as a doctor fixing the limb of a gecko with a bandage on a surgery table. Comic style.

It's an astonishingly accurate representation! I want to give the description Thigg gave of the prompt in full because I think it's just brilliant.
 
"A pig with wings dressed as a doctor fixing the limb of a gecko with a bandage on a surgery table. Comic style" ... Did i specify pig without legs…?

I've said it before, but I'll say it again. Feedback like this is by far the best motivation. It reaffirms what I've always known to be true, which is that the Sailfish community is just great! I'm going to carry straight on because we have a lot to get through today.

So, from yesterday we have a set of packages, but the version numbering on the packages is incorrect because I set it running with a faulty build configuration.

Overnight I ran it with the configuration adjusted, and this morning I can see we have some better results.
Processing files: xulrunner-qt5-misc-91.9.1-1.aarch64
warning: absolute symlink: /usr/bin/xulrunner-qt5 -> /home/flypig/Programs/
         sailfish-sdk/sailfish-sdk/mersdk/targets/SailfishOS-devel-aarch64.default/
         usr/lib64/xulrunner-qt5-91.9.1/xulrunner-qt5
Provides: xulrunner-qt5-misc = 91.9.1-1 xulrunner-qt5-misc(aarch-64) = 91.9.1-1
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests)
                  <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Processing files: xulrunner-qt5-debuginfo-91.9.1-1.aarch64
Provides: debuginfo(build-id) = 19a5cafc6a2d227874330bb042601dcb29564851
          debuginfo(build-id) = 5d45fc4b8f12fda685eb62a8fe5f568f9510c8fa
          debuginfo(build-id) = 73026cf7982ef61f8707c3e0f1459f3892784c86
          debuginfo(build-id) = f309784dbe98020d6ab529f9d645caca0860d065
          debuginfo(build-id) = fa071e3777327733d34f625430b1a3cc5ce5f28c
          xulrunner-qt5-debuginfo = 91.9.1-1 xulrunner-qt5-debuginfo(aarch-64)
          = 91.9.1-1
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests)
                  <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Recommends: xulrunner-qt5-debugsource(aarch-64) = 91.9.1-1
Processing files: xulrunner-qt5-debugsource-91.9.1-1.aarch64
Provides: xulrunner-qt5-debugsource = 91.9.1-1
          xulrunner-qt5-debugsource(aarch-64) = 91.9.1-1
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests)
                  <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Checking for unpackaged file(s): /home/flypig/Programs/sailfish-sdk/
                                 sailfish-sdk/mersdk/targets/
                                 SailfishOS-devel-aarch64.default/usr/lib/rpm/
                                 check-files /home/deploy/installroot
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/xulrunner-qt5-devel-91.9.1-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/xulrunner-qt5-misc-91.9.1-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/xulrunner-qt5-91.9.1-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/xulrunner-qt5-debugsource-91.9.1-1.aarch64.rpm
Wrote: /home/flypig/RPMS/SailfishOS-devel-aarch64/xulrunner-qt5-debuginfo-91.9.1-1.aarch64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.9I0Dnw
+ umask 022
+ cd ${PROJECT}/gecko-dev
+ /bin/rm -rf /home/deploy/installroot
+ RPM_EC=0
++ jobs -p
+ exit 0
Let's take a quick look at those packages.
$ pushd ~/RPMS/SailfishOS-devel-aarch64/
$ ls -lh x*
-rw-rw-r-- 1 flypig 100000  34M Sep 27 00:08 xulrunner-qt5-91.9.1-1.aarch64.rpm
-rw-rw-r-- 1 flypig 100000 644M Sep 27 00:19 xulrunner-qt5-debuginfo-91.9.1-1.aarch64.rpm
-rw-rw-r-- 1 flypig 100000  26M Sep 27 00:08 xulrunner-qt5-debugsource-91.9.1-1.aarch64.rpm
-rw-rw-r-- 1 flypig 100000 6.5M Sep 27 00:07 xulrunner-qt5-devel-91.9.1-1.aarch64.rpm
-rw-rw-r-- 1 flypig 100000  16K Sep 27 00:07 xulrunner-qt5-misc-91.9.1-1.aarch64.rpm
$ mkdir temp
$ pushd temp
$ rpm2cpio ../xulrunner-qt5-91.9.1-1.aarch64.rpm | cpio -idmv
$ tree -sh
[4.0K]  .
└── [4.0K]  usr
    └── [4.0K]  lib64
        └── [4.0K]  xulrunner-qt5-91.9.1
            ├── [4.0K]  defaults
            ├── [  25]  dependentlibs.list
            ├── [  18]  dictionaries -> /usr/share/myspell
            ├── [ 66K]  liblgpllibs.so
            ├── [324K]  libmozavcodec.so
            ├── [259K]  libmozavutil.so
            ├── [ 95M]  libxul.so
            ├── [ 26M]  omni.ja
            ├── [  48]  platform.ini
            └── [451K]  plugin-container

4 directories, 9 files
$ rm -rf usr/
$ rpm2cpio ../xulrunner-qt5-debuginfo-91.9.1-1.aarch64.rpm | cpio -idmv
$ tree -sh
[4.0K]  .
└── [4.0K]  usr
    └── [4.0K]  lib
        └── [4.0K]  debug
            └── [4.0K]  usr
                └── [4.0K]  lib64
                    └── [4.0K]  xulrunner-qt5-91.9.1
                        ├── [205K]  liblgpllibs.so-91.9.1-1.aarch64.debug
                        ├── [1.3M]  libmozavcodec.so-91.9.1-1.aarch64.debug
                        ├── [595K]  libmozavutil.so-91.9.1-1.aarch64.debug
                        ├── [2.6G]  libxul.so-91.9.1-1.aarch64.debug
                        └── [10.0M]  plugin-container-91.9.1-1.aarch64.debug

6 directories, 5 files
$ popd
$ popd
Those are the contents of the two most important packages. I'll skip the others because they contain a large number of small files, which aren't very convenient or useful to show here.
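The same unpack-and-inspect steps get repeated for each package, so they could be wrapped in a small function. This is a minimal sketch that assumes `rpm2cpio` and `cpio` are on the path. `unpack_rpm` is a hypothetical helper, not something from the SDK. It drops the `-v` flag to keep the output quiet and leaves the current directory unchanged.

```shell
#!/bin/sh
# unpack_rpm PACKAGE.rpm DEST_DIR  (hypothetical helper)
# Extract the payload of PACKAGE.rpm into DEST_DIR, creating DEST_DIR if
# needed. The extraction runs in a subshell, so the caller's working
# directory is not changed.
unpack_rpm() {
    pkg=$1
    dest=$2
    case $pkg in
        /*) ;;                  # already an absolute path
        *) pkg=$PWD/$pkg ;;     # make relative paths survive the cd below
    esac
    mkdir -p "$dest" &&
    (cd "$dest" && rpm2cpio "$pkg" | cpio -idm)
}
```

For example, `unpack_rpm xulrunner-qt5-91.9.1-1.aarch64.rpm temp` would leave the `usr/lib64/...` tree under `temp/`.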

Let's compare that with the latest version from the Jolla repositories. Just so you know, kolbe is the name of my phone.
$ ssh kolbe
$ mkdir cache
$ cd cache
$ zypper --pkg-cache-dir . download xulrunner-qt5 xulrunner-qt5-debuginfo xulrunner-qt5-debugsource xulrunner-qt5-devel xulrunner-qt5-misc
$ cd jolla/oss/aarch64/
$ ls -lh
total 765M   
-rw-r--r--    1 defaultu defaultu   38.1M Sep 27 08:15 xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64.rpm
-rw-r--r--    1 defaultu defaultu  692.5M Sep 27 08:19 xulrunner-qt5-debuginfo-78.15.1+git33.2-1.21.1.jolla.aarch64.rpm
-rw-r--r--    1 defaultu defaultu   28.1M Sep 27 08:19 xulrunner-qt5-debugsource-78.15.1+git33.2-1.21.1.jolla.aarch64.rpm
-rw-r--r--    1 defaultu defaultu    6.3M Sep 27 08:19 xulrunner-qt5-devel-78.15.1+git33.2-1.21.1.jolla.aarch64.rpm
-rw-r--r--    1 defaultu defaultu   68.6K Sep 27 08:19 xulrunner-qt5-misc-78.15.1+git33.2-1.21.1.jolla.aarch64.rpm
$ mkdir temp
$ cd temp/
$ rpm2cpio ../xulrunner-qt5-78.15.1\+git33.2-1.21.1.jolla.aarch64.rpm | cpio -idmv

$ ~/tree -lh
[4.0K]  .
└── [4.0K]  usr
    └── [4.0K]  lib64
        └── [4.0K]  xulrunner-qt5-78.15.1
            ├── [4.0K]  defaults
            ├── [  10]  dependentlibs.list
            ├── [  18]  dictionaries -> /usr/share/myspell
            ├── [ 38K]  liblgpllibs.so
            ├── [260K]  libmozavcodec.so
            ├── [203K]  libmozavutil.so
            ├── [101M]  libxul.so
            ├── [ 25M]  omni.ja
            ├── [  49]  platform.ini
            └── [460K]  plugin-container

5 directories, 9 files
$ rm -rf usr/
$ rpm2cpio ../xulrunner-qt5-debuginfo-78.15.1\+git33.2-1.21.1.jolla.aarch64.rpm | cpio -idmv
$ ~/tree -lh
[4.0K]  .
└── [4.0K]  usr
    └── [4.0K]  lib
        └── [4.0K]  debug
            └── [4.0K]  usr
                └── [4.0K]  lib64
                    └── [4.0K]  xulrunner-qt5-78.15.1
                        ├── [214K]  liblgpllibs.so-78.15.1+git33.2-1.21.1.jolla.aarch64.debug
                        ├── [1.3M]  libmozavcodec.so-78.15.1+git33.2-1.21.1.jolla.aarch64.debug
                        ├── [621K]  libmozavutil.so-78.15.1+git33.2-1.21.1.jolla.aarch64.debug
                        └── [2.1G]  libxul.so-78.15.1+git33.2-1.21.1.jolla.aarch64.debug

7 directories, 4 files
$ cd ../../../../..
$ rm -rf cache/
The contents certainly look similar enough to be plausible. The key part is the libxul.so file. It'll be interesting to see what happens when we try to use it. I'm not expecting good results: most likely the executable will fail outright when it tries to link against the library. Even if it runs, I'm certainly not expecting it to render. But we have to take this one issue at a time. Baby steps (or as DrYak aptly puts it, "inching closer...").
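Comparing two `tree` listings by eye works, but a small script makes it less error-prone. This is a minimal POSIX shell sketch that assumes both RPM payloads have already been unpacked with `rpm2cpio | cpio -idmv` into two separate directories. It matches files by basename because the versioned directory names (`xulrunner-qt5-91.9.1` vs `xulrunner-qt5-78.15.1`) differ. That also means it only lines up the main package contents: the debuginfo file names embed the version string, so they won't match. `compare_sizes` is a hypothetical helper.

```shell
#!/bin/sh
# compare_sizes OLD_DIR NEW_DIR  (hypothetical helper)
# For every regular file under OLD_DIR, print its basename, its size in
# bytes, and the size of the first same-named file under NEW_DIR ("-" if
# there is none). Symlinks such as "dictionaries" are skipped by -type f.
# Assumes basenames are unique within each tree.
compare_sizes() {
    old_dir=$1
    new_dir=$2
    find "$old_dir" -type f | sort | while IFS= read -r old_file; do
        name=$(basename "$old_file")
        old_size=$(wc -c < "$old_file" | tr -d ' ')
        new_file=$(find "$new_dir" -type f -name "$name" | head -n 1)
        if [ -n "$new_file" ]; then
            new_size=$(wc -c < "$new_file" | tr -d ' ')
        else
            new_size=-
        fi
        printf '%s\t%s\t%s\n' "$name" "$old_size" "$new_size"
    done
}
```

Running `compare_sizes esr78/usr esr91/usr` (directory names hypothetical) would print one tab-separated line per file, with libxul.so and omni.ja side by side.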

So, let's give this a go and see what happens. I'm using a development Xperia 10 II rather than my daily phone. It's aarch64 running the latest Sailfish OS 4.5.0.24, so it should be able to run the code we've built.

Let's start by getting the packages on the phone:
$ scp xulrunner-qt5-*.rpm defaultuser@192.168.2.15:~
xulrunner-qt5-91.9.1-1.aarch64.rpm                 100%   33MB  19.5MB/s   00:01    
xulrunner-qt5-debuginfo-91.9.1-1.aarch64.rpm       100%  643MB  18.5MB/s   00:34    
xulrunner-qt5-debugsource-91.9.1-1.aarch64.rpm     100%   26MB  17.5MB/s   00:01    
xulrunner-qt5-devel-91.9.1-1.aarch64.rpm           100% 6604KB  17.7MB/s   00:00    
xulrunner-qt5-misc-91.9.1-1.aarch64.rpm            100%   16KB   6.8MB/s   00:00
Now let's try installing them while maintaining low expectations.
$ ssh defaultuser@192.168.2.15
Last login: Wed Sep  6 08:38:23 2023 from 192.168.2.7
,---
| Sailfish OS 4.5.0.24 (Struven ketju)
'---
$ ls -lh x*
-rw-r--r--    1 defaultu defaultu   33.5M Sep 27 18:55 xulrunner-qt5-91.9.1-1.aarch64.rpm
-rw-r--r--    1 defaultu defaultu  643.1M Sep 27 18:56 xulrunner-qt5-debuginfo-91.9.1-1.aarch64.rpm
-rw-r--r--    1 defaultu defaultu   25.7M Sep 27 18:56 xulrunner-qt5-debugsource-91.9.1-1.aarch64.rpm
-rw-r--r--    1 defaultu defaultu    6.4M Sep 27 18:56 xulrunner-qt5-devel-91.9.1-1.aarch64.rpm
-rw-r--r--    1 defaultu defaultu   15.6K Sep 27 18:56 xulrunner-qt5-misc-91.9.1-1.aarch64.rpm
$ devel-su 
$ rpm -U xulrunner-qt5-91.9.1-1.aarch64.rpm
error: Failed dependencies:
        libxul.so(xul78)(64bit) is needed by (installed) qtmozembed-qt5-1.53.25-1.23.1.jolla.aarch64
        libxul.so(xul78)(64bit) is needed by (installed) mapplauncherd-booster-browser-0.2.1-1.8.1.jolla.aarch64
That's an abrupt halt. It looks like I'll need to build QtMozEmbed and the browser booster against the new library. That sounds achievable. I'll do it by creating a new temporary snapshot and building them both in it.
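Before rebuilding, it can help to ask rpm on the phone which installed packages depend on the old library and what the new package actually provides. This is a minimal sketch that uses the standard rpm query options `-q --whatrequires` and `-qp --provides`. The capability string is copied from the error above. `check_libxul_deps` is a hypothetical name, and the guard means it simply reports failure when run somewhere without rpm.

```shell
#!/bin/sh
# Capability string taken verbatim from the failed `rpm -U` output above.
old_cap='libxul.so(xul78)(64bit)'
new_rpm='xulrunner-qt5-91.9.1-1.aarch64.rpm'

# check_libxul_deps  (hypothetical helper)
check_libxul_deps() {
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not found; run this on the phone" >&2
        return 1
    fi
    # Installed packages that still need the ESR 78 soname.
    rpm -q --whatrequires "$old_cap"
    # Capabilities the freshly built package offers, filtered to libxul.
    rpm -qp --provides "$new_rpm" | grep libxul
}
```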
$ cd qtmozembed
$ # Update the spec so it's forced to pull in the new xulrunner package
$ sed -i -e "s/60\.9\.1/90\.9\.1/g" rpm/qtmozembed-qt5.spec
$ sfdk -c snapshot=temp build -d -p
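The `sed` line bumps the `60.9.1` version strings in the spec to `90.9.1`. The ESR 78 package can't satisfy that requirement, but the 91.9.1 build can. This self-contained sketch shows the effect on a throwaway file containing a hypothetical `BuildRequires` line, since the real spec's contents aren't reproduced here. The replacement is written without backslashes: they're only needed on the pattern side, to stop `.` matching any character.

```shell
#!/bin/sh
# Throwaway stand-in for rpm/qtmozembed-qt5.spec; its contents are hypothetical.
spec=$(mktemp)
printf 'BuildRequires: xulrunner-qt5-devel >= 60.9.1\n' > "$spec"

# Same substitution as above. The escaped dots in the pattern match literal
# dots; the replacement needs no escaping.
sed -e 's/60\.9\.1/90.9.1/g' "$spec" > "$spec.new" && mv "$spec.new" "$spec"

result=$(cat "$spec")
rm -f "$spec"
echo "$result"   # BuildRequires: xulrunner-qt5-devel >= 90.9.1
```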
Unfortunately it won't build straight away because some of its build dependencies aren't installed. I'm never quite certain why it won't install them automatically (they're all available from the repos), but there it is.
error: Failed build dependencies:
        pkgconfig(Qt5QuickTest) is needed by qtmozembed-qt5-1.53.25+master.20230927180924.0decaf4-1.aarch64
        pkgconfig(nspr) >= 4.13.1 is needed by qtmozembed-qt5-1.53.25+master.20230927180924.0decaf4-1.aarch64
        pkgconfig(pixman-1) >= 0.19.2 is needed by qtmozembed-qt5-1.53.25+master.20230927180924.0decaf4-1.aarch64
        xulrunner-qt5-devel >= 90.9.1 is needed by qtmozembed-qt5-1.53.25+master.20230927180924.0decaf4-1.aarch64
Instead I'll have to install them manually.
$ sfdk engine exec
$ sb2 -R -m sdk-install -t SailfishOS-devel-aarch64.temp
$ cd /home/flypig/RPMS/SailfishOS-devel-aarch64
$ rpm -U nspr-devel-4.35.0-1.aarch64.rpm nspr-4.35.0-1.aarch64.rpm
$ zypper install qt5-qtdeclarative-qtquicktest-devel pixman-devel nss-devel
$ rpm -U xulrunner-qt5-devel-91.9.1-1.aarch64.rpm xulrunner-qt5-91.9.1-1.aarch64.rpm 
$ exit
$ exit
Now we can try the build again.
$ sfdk -c snapshot=temp -c no-pull-build-requires build -d -p
The build now fails, but at least in an interesting way. I've copied out some of the more representative errors below (and left plenty out, given how many there were):
usr/include/xulrunner-qt5-91.9.1/mozilla/MaybeStorageBase.:16:
  Parse error at "mozilla"
make[1]: *** [Makefile:566: ../src/moc/release_static/moc_qmozview_p.cpp] Error 1
make[1]: *** Waiting for unfinished jobs....
qmozcontext.cpp:31:10: fatal error: mozilla/embedlite/EmbedInitGlue.h:
  No such file or directory
 #include "mozilla/embedlite/EmbedInitGlue.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[1]: *** [Makefile:608: ../src/qmozcontext.o] Error 1
EmbedQtKeyUtils.cpp: In static member function ‘static int
  MozKey::QtKeyCodeToDOMKeyCode(int, int)’:
EmbedQtKeyUtils.cpp:365:21: error: ‘ArrayLength’ was not declared in this scope
     for (i = 0; i < ArrayLength(nsKeycodes); i++) {
                     ^~~~~~~~~~~
EmbedQtKeyUtils.cpp: In static member function ‘static int
  MozKey::DOMKeyCodeToQtKeyCode(int)’:
EmbedQtKeyUtils.cpp:386:17: warning: comparison of integer expressions of
  different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare]
     if (aKeysym >= dom::KeyboardEventBinding::DOM_VK_A && aKeysym <= dom::KeyboardEventBinding::DOM_VK_Z) {
         ~~~~~~~~^~~~~~
EmbedQtKeyUtils.cpp:386:67: warning: comparison of integer expressions of
  different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare]
     if (aKeysym >= dom::KeyboardEventBinding::DOM_VK_A && aKeysym <= dom::KeyboardEventBinding::DOM_VK_Z) {
                                                           ~~~~~~~~^~~~~~
EmbedQtKeyUtils.cpp:392:17: warning: comparison of integer expressions of
[...]
make[1]: *** [Makefile:630: ../src/EmbedQtKeyUtils.o] Error 1
qopenglwebpage.cpp: In destructor ‘virtual QOpenGLWebPage::~QOpenGLWebPage()’:
qopenglwebpage.cpp:76:36: error: no matching function for call to
  ‘mozilla::embedlite::EmbedLiteView::SetIsActive(bool)’
         d->mView->SetIsActive(false);
                                    ^
[...]
/home/flypig/Programs/sailfish-sdk/sailfish-sdk/mersdk/targets/
  SailfishOS-devel-aarch64.temp/usr/include/xulrunner-qt5-91.9.1/mozilla/
  embedlite/EmbedLiteView.h:83:16: note:
  candidate expects 2 arguments, 1 provided
make[1]: *** [Makefile:665: ../src/qopenglwebpage.o] Error 1
qmozview_p.cpp: In member function ‘void QMozViewPrivate::goBack()’:
qmozview_p.cpp:444:19: error: no matching function for call to
  ‘mozilla::embedlite::EmbedLiteView::GoBack()’
     mView->GoBack();
                   ^
[...]
make[1]: *** [Makefile:689: ../src/quickmozview.o] Error 1
make[1]: Leaving directory '/home/flypig/Documents/Development/jolla/qtmozembed/src'
make: *** [Makefile:45: sub-src-make_first-ordered] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.EBjdEW (%build)
It's immediately clear that these errors are due to the library interface having changed! In fact, most of them are due to changes I made myself. That means there's more work to do to fix these errors before it'll be possible to test the library.
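When an embedding API changes like this, the quickest way to see a method's new shape is usually to grep the installed headers. This is a minimal sketch that assumes the header directory reported in the compiler notes above. Set `INC` to wherever the target's `xulrunner-qt5-91.9.1` headers actually live. `find_decl` is a hypothetical helper, and `--include` and `-r` are GNU grep options.

```shell
#!/bin/sh
# Header root reported in the compiler notes above; override INC if needed.
INC=${INC:-/home/flypig/Programs/sailfish-sdk/sailfish-sdk/mersdk/targets/SailfishOS-devel-aarch64.temp/usr/include/xulrunner-qt5-91.9.1}

# find_decl SYMBOL  (hypothetical helper)
# Print file:line:text for every .h line under $INC that mentions SYMBOL
# as a whole word, e.g. SetIsActive, GoBack or ArrayLength.
find_decl() {
    grep -rnw --include='*.h' -- "$1" "$INC"
}
```

Running `find_decl SetIsActive` and `find_decl GoBack` would show the current declarations to compare against the calls in qopenglwebpage.cpp and qmozview_p.cpp.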

That's okay; that's the whole point of doing these checks. But it's not work for today. It'll have to wait until tomorrow.

As always, if you want to read more about all this gecko stuff, take a look at my full Gecko Dev Diary.