flypig.co.uk

Gecko

All items from June 2024

17 Jun 2024 : Day 261
Yesterday I collected a bunch of backtraces. These were all from methods added in the last commit for the purpose of WebView rendering, but which also trigger when rendering WebGL content on the browser. They're the methods I'm really interested in.

The very first of the methods to hit a breakpoint was the CompositorOGL::CreateContext() method. Checking the diff between the previous commit and HEAD, we can see that I made the following changes to this method:
@@ -247,11 +247,9 @@ already_AddRefed<mozilla::gl::GLContext> CompositorOGL::
    CreateContext() {
   // Allow to create offscreen GL context for main Layer Manager
   if (!context && gfxEnv::LayersPreferOffscreen()) {
     nsCString discardFailureId;
-    context = GLContextProvider::CreateHeadless(
-        {CreateContextFlags::REQUIRE_COMPAT_PROFILE}, &discardFailureId);
-    if (!context->CreateOffscreenDefaultFb(mSurfaceSize)) {
-      context = nullptr;
-    }
+    context = GLContextProvider::CreateOffscreen(
+        mSurfaceSize, CreateContextFlags::REQUIRE_COMPAT_PROFILE,
+        &discardFailureId);
   }
 
   if (!context) {
In other words, I messed around with the generation of the RefPtr<GLContext> context variable so that it gets created using GLContextProvider::CreateOffscreen() rather than GLContextProvider::CreateHeadless() as it was doing before.

Okay, that's something I can change back easily. So I've added this change on top, effectively reversing the change that I made before:
diff --git a/gfx/layers/opengl/CompositorOGL.cpp b/gfx/layers/opengl/
    CompositorOGL.cpp
index 122709eaf2de..06c84a9ebdaa 100644
--- a/gfx/layers/opengl/CompositorOGL.cpp
+++ b/gfx/layers/opengl/CompositorOGL.cpp
@@ -247,9 +247,15 @@ already_AddRefed<mozilla::gl::GLContext> CompositorOGL::
    CreateContext() {
   // Allow to create offscreen GL context for main Layer Manager
   if (!context && gfxEnv::LayersPreferOffscreen()) {
     nsCString discardFailureId;
-    context = GLContextProvider::CreateOffscreen(
-        mSurfaceSize, CreateContextFlags::REQUIRE_COMPAT_PROFILE,
-        &discardFailureId);
+
+    context = GLContextProvider::CreateHeadless(
+        {CreateContextFlags::REQUIRE_COMPAT_PROFILE}, &discardFailureId);
+    if (!context->CreateOffscreenDefaultFb(mSurfaceSize)) {
+      context = nullptr;
+    }
+//    context = GLContextProvider::CreateOffscreen(
+//        mSurfaceSize, CreateContextFlags::REQUIRE_COMPAT_PROFILE,
+//        &discardFailureId);
   }
 
   if (!context) {
As you can see, this comments out the new call to CreateOffscreen() and replaces it with a call to CreateHeadless() so that it's doing the same as it did before. I've left the removed code in as a comment to make it easier for me to compare, but actually git is keeping track of all this for me already, so I could have safely just deleted that code. I may yet do that!
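The restored code is a two-step pattern: first create a headless context, then attach an offscreen default framebuffer of the required size, dropping the context if that second step fails. Here's a minimal sketch of that pattern. The FakeContext type and the CreateHeadlessSketch() and CreateOffscreenTwoStep() functions are hypothetical stand-ins I've made up for illustration, not the Gecko classes. The null check before the dereference is also my own addition; the diff above dereferences context directly.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a GL context; not the real Gecko GLContext.
struct FakeContext {
  bool fbWillSucceed;  // whether attaching the default framebuffer works

  // Pretend to create an offscreen default framebuffer of the given size.
  bool CreateOffscreenDefaultFb(int width, int height) const {
    return fbWillSucceed && width > 0 && height > 0;
  }
};

// Hypothetical headless factory that may fail and return null.
std::shared_ptr<FakeContext> CreateHeadlessSketch(bool succeed,
                                                  bool fbWillSucceed) {
  if (!succeed) {
    return nullptr;
  }
  return std::make_shared<FakeContext>(FakeContext{fbWillSucceed});
}

// Two-step creation: headless context first, framebuffer second.
// The context is discarded if the framebuffer can't be created.
std::shared_ptr<FakeContext> CreateOffscreenTwoStep(bool headlessOk,
                                                    bool fbWillSucceed,
                                                    int width, int height) {
  std::shared_ptr<FakeContext> context =
      CreateHeadlessSketch(headlessOk, fbWillSucceed);
  // Assumption: guard against a failed headless creation before
  // dereferencing; the original diff has no such check.
  if (context && !context->CreateOffscreenDefaultFb(width, height)) {
    context = nullptr;
  }
  return context;
}
```

Calling CreateOffscreenTwoStep(true, true, 64, 64) gives back a usable context, while a failure at either step leaves the caller with a null pointer, which is what the following if (!context) check in CreateContext() relies on.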

Having rebuilt, reinstalled and executed these changes, there's no change on the browser side, but for the WebView I now get a segfault crash occurring like this:
Thread 37 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 305]
0x0000007ff366470c in mozilla::gl::GLScreenBuffer::Size (this=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
290     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h: No 
    such file or directory.
(gdb) bt
#0  0x0000007ff366470c in mozilla::gl::GLScreenBuffer::Size (this=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#1  mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7fc4b0a5a0, aId=...)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:151
#2  0x0000007ff12b49d8 in mozilla::layers::CompositorVsyncScheduler::
    ForceComposeToTarget (this=0x7fc4c98760, aTarget=aTarget@entry=0x0, 
    aRect=aRect@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h:
    82
#3  0x0000007ff12b4a34 in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=this@entry=0x7fc4b0a5a0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#4  0x0000007ff12b4ac0 in mozilla::layers::CompositorBridgeParent::
    ResumeCompositionAndResize (this=0x7fc4b0a5a0, x=<optimized out>, 
    y=<optimized out>, 
    width=<optimized out>, height=<optimized out>)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:794
#5  0x0000007ff12ad65c in mozilla::detail::RunnableMethodArguments<int, int, 
    int, int>::applyImpl<mozilla::layers::CompositorBridgeParent, void (mozilla:
    :layers::CompositorBridgeParent::*)(int, int, int, int), 
    StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, 
    StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, 0ul, 1ul, 
    2ul, 3ul> (args=..., m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
#6  mozilla::detail::RunnableMethodArguments<int, int, int, int>::apply<mozilla:
    :layers::CompositorBridgeParent, void (mozilla::layers::
    CompositorBridgeParent::*)(int, int, int, int)> (m=<optimized out>, 
    o=<optimized out>, this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1154
#7  mozilla::detail::RunnableMethodImpl<mozilla::layers::
    CompositorBridgeParent*, void (mozilla::layers::CompositorBridgeParent::*)(
    int, int, int, int), true, (mozilla::RunnableKind)0, int, int, int, int>::
    Run (this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1201
#8  0x0000007ff07fd018 in nsThread::ProcessNextEvent (this=0x7fc4c25780, 
    aMayWait=<optimized out>, aResult=0x7f1f608cb7)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:869
[...]
#17 0x0000007ff6a0289c in ?? () from /lib64/libc.so.6
(gdb) 
It seems that a call is now being made to GLScreenBuffer::Size() on a null pointer: the backtrace shows this=0x0, so the GLScreenBuffer object that the method belongs to doesn't exist.

This is fine. At present I really only care about getting the browser to work. If the change I just made turns out to be necessary I'll have to return to this and fix it, so I'll need this backtrace, but right now I have to forge on and focus on the WebGL rendering.
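For when I do return to it, here's a minimal sketch of the kind of guard that would avoid calling a method on a missing screen buffer. The FakeSize, FakeScreenBuffer and FakeGLContext types and the TryGetScreenSize() function are all hypothetical, invented for this example. Whether a guard is the right fix, rather than making sure the screen buffer actually gets created, is something I've not yet established.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; not the real Gecko types.
struct FakeSize {
  int width;
  int height;
};

struct FakeScreenBuffer {
  FakeSize size;
};

struct FakeGLContext {
  // May legitimately be null if no screen buffer was ever created.
  std::unique_ptr<FakeScreenBuffer> screen;
};

// Returns false, leaving *out untouched, when there's no screen buffer,
// instead of dereferencing a null pointer.
bool TryGetScreenSize(const FakeGLContext& context, FakeSize* out) {
  if (!context.screen) {
    return false;
  }
  *out = context.screen->size;
  return true;
}
```

A caller in the position of CompositeToDefaultTarget() could then skip the composite when TryGetScreenSize() returns false rather than crashing.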

I'm now checking the following change:
@@ -1081,7 +1251,11 @@ static void FillContextAttribs(bool es3, bool useGles, 
    nsTArray<EGLint>* out) {
   } else
 #endif
   {
-    out->AppendElement(LOCAL_EGL_PBUFFER_BIT);
+    if (useWindow) {
+      out->AppendElement(LOCAL_EGL_WINDOW_BIT);
+    } else {
+      out->AppendElement(LOCAL_EGL_PBUFFER_BIT);
+    }
   }
 
   if (useGles) {
As we can see, previously it was always LOCAL_EGL_PBUFFER_BIT being appended. Now there's a switch, so the value could potentially change. Here's what the debugger shows us:
Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::gl::
    FillContextAttribs (out=0x7fdf293b78, useWindow=false, useGles=false, 
    es3=false)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1245
1245    ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp: No such file or 
    directory.
(gdb) p useWindow
$1 = false
(gdb) 
With useWindow set to false we'll still be appending LOCAL_EGL_PBUFFER_BIT, so this isn't a relevant change. It also appears to get called at initialisation, but not when the WebGL view is created. So this looks safe to leave as it is. We see something similar happening in CreateEGLPBufferOffscreenContextImpl():
@@ -1202,9 +1387,14 @@ RefPtr<GLContextEGL> GLContextEGL::
    CreateEGLPBufferOffscreenContextImpl(
   } else
 #endif
   {
-    surface = GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo(
-        *egl, config, LOCAL_EGL_NONE, pbSize);
+    if (useWindow) {
+      surface = CreateEmulatorBufferSurface(egl->mLib, config, pbSize);
+    } else {
+      surface = GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo(
+          *egl, config, LOCAL_EGL_NONE, pbSize);
+    }
   }
Again, what we want is for useWindow to be set to false and debugging the app shows this to be the case. We have to work a little harder for it this time though, because the variable is set based on conditions at the start of the method and the condition that tests it occurs further down. I would have placed a breakpoint on the exact line, but our partial build shenanigans have broken line-by-line debugging, so I've placed breakpoints on each of the called methods instead:
(gdb) break CreateEmulatorBufferSurface
Breakpoint 6 at 0x7ff28c1538: file ${PROJECT}/gecko-dev/gfx/gl/
    GLContextProviderEGL.cpp, line 982.
(gdb) break GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo
Breakpoint 7 at 0x7ff28ae934: file ${PROJECT}/gecko-dev/gfx/gl/
    GLContextProviderEGL.cpp, line 911.
(gdb) c
Continuing.

Thread 8 "GeckoWorkerThre" hit Breakpoint 7, mozilla::gl::
    GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo (egl=..., 
    config=config@entry=0x5555ab3950, 
    bindToTextureFormat=bindToTextureFormat@entry=12344, pbsize=...)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:911
911     in ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp
(gdb) 
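Both of these changes hinge on the same useWindow switch, which selects between a window surface and a pbuffer surface. Here's a minimal sketch of that selection. The constant values are the ones from the Khronos EGL headers, but the BuildSurfaceAttribs() function and the shape of the attribute list are my own simplification for illustration; the real FillContextAttribs() builds a longer list.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Values as defined in the Khronos EGL headers.
constexpr std::int32_t kEglPbufferBit = 0x0001;   // EGL_PBUFFER_BIT
constexpr std::int32_t kEglWindowBit = 0x0004;    // EGL_WINDOW_BIT
constexpr std::int32_t kEglSurfaceType = 0x3033;  // EGL_SURFACE_TYPE
constexpr std::int32_t kEglNone = 0x3038;         // EGL_NONE (12344)

// Hypothetical helper: builds a terminated key/value attribute list
// containing only the surface type, chosen by the useWindow flag.
std::vector<std::int32_t> BuildSurfaceAttribs(bool useWindow) {
  std::vector<std::int32_t> attribs;
  attribs.push_back(kEglSurfaceType);
  attribs.push_back(useWindow ? kEglWindowBit : kEglPbufferBit);
  attribs.push_back(kEglNone);  // list terminator
  return attribs;
}
```

With useWindow set to false the list carries the pbuffer bit, exactly as it did before my change. As an aside, the value 12344 shown for bindToTextureFormat in the breakpoint output is 0x3038, which is EGL_NONE, matching the LOCAL_EGL_NONE argument in the diff.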
I'm going to try to tackle GLLibraryEGL::Init() now. There seems to be one significant chunk of code removed from this method. Adding it back in would be straightforward were it not for the fact the display value is no longer passed in to the method. Adding it back in required a cascade of changes, but in essence the important change is the following:
-bool GLLibraryEGL::Init(bool forceAccel, nsACString* const out_failureId) {
+bool GLLibraryEGL::Init(bool forceAccel, nsACString* const out_failureId, 
    EGLDisplay aDisplay) {
   MOZ_RELEASE_ASSERT(!mSymbols.fTerminate);
 
   mozilla::ScopedGfxFeatureReporter reporter("EGL");
@@ -501,6 +501,11 @@ bool GLLibraryEGL::Init(bool forceAccel, nsACString* const 
    out_failureId) {
   }
 
   // -
+  std::shared_ptr<EglDisplay> defaultDisplay = CreateDisplay(forceAccel, 
    out_failureId, aDisplay);
+  if (!defaultDisplay) {
+    return false;
+  }
+  mDefaultDisplay = defaultDisplay;
 
   InitLibExtensions(); 
Now when I execute this code I get an almost immediate segfault:
Thread 38 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 32407]
0x0000007fe7e324cc in wl_proxy_marshal_constructor () from /usr/lib64/
    libwayland-client.so.0
(gdb) bt
#0  0x0000007fe7e324cc in wl_proxy_marshal_constructor () from /usr/lib64/
    libwayland-client.so.0
#1  0x0000007fe7b8242c in ServerWaylandBuffer::ServerWaylandBuffer(unsigned 
    int, unsigned int, int, int, android_wlegl*, wl_event_queue*) ()
   from /usr/lib64/libhybris//eglplatform_wayland.so
#2  0x0000007fe7b824c8 in WaylandNativeWindow::addBuffer() () from /usr/lib64/
    libhybris//eglplatform_wayland.so
#3  0x0000007fe7b81728 in WaylandNativeWindow::dequeueBuffer(
    BaseNativeWindowBuffer**, int*) () from /usr/lib64/libhybris//
    eglplatform_wayland.so
#4  0x0000007fe7b48124 in BaseNativeWindow::_dequeueBuffer(ANativeWindow*, 
    ANativeWindowBuffer**, int*) () from /usr/lib64/
    libhybris-platformcommon.so.1
#5  0x0000007fe4f69188 in ?? ()
#6  0x0000000000000438 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 
It's not clear to me why the change I made has this effect, given all of this seems to be happening inside the Wayland code. However, by stepping through the code, even without the source lines, I've been able to establish that the Init() method is called, as is the new code I added in. This new code executes without error, so there's nothing in the new code that's directly triggering the segfault. But some consequence of the change certainly is.
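For reference, the code restored to Init() follows a create, check, store pattern: create the display from the value passed in, bail out if that fails, otherwise keep hold of it as the default. Here's a minimal sketch of that shape. The FakeLibrary and FakeEglDisplay types are hypothetical stand-ins, and the default argument on Init() is my assumption for the sketch; the diff above doesn't show how the declaration handles callers that have no display to pass.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the native EGLDisplay handle.
using FakeNativeDisplay = void*;

// Hypothetical stand-in for the display wrapper object.
struct FakeEglDisplay {
  FakeNativeDisplay native;
};

class FakeLibrary {
 public:
  explicit FakeLibrary(bool driverAvailable)
      : mDriverAvailable(driverAvailable) {}

  // Create the display, fail early if that doesn't work, then store it.
  // Assumption: a default argument lets existing callers stay unchanged.
  bool Init(FakeNativeDisplay aDisplay = nullptr) {
    std::shared_ptr<FakeEglDisplay> display = CreateDisplay(aDisplay);
    if (!display) {
      return false;
    }
    mDefaultDisplay = display;
    return true;
  }

  const FakeEglDisplay* DefaultDisplay() const {
    return mDefaultDisplay.get();
  }

 private:
  // Pretend display creation; fails when no driver is available.
  std::shared_ptr<FakeEglDisplay> CreateDisplay(FakeNativeDisplay aDisplay) {
    if (!mDriverAvailable) {
      return nullptr;
    }
    return std::make_shared<FakeEglDisplay>(FakeEglDisplay{aDisplay});
  }

  bool mDriverAvailable;
  std::shared_ptr<FakeEglDisplay> mDefaultDisplay;
};
```

The point of threading the display through is that the stored default then wraps the handle the caller supplied, rather than one the library picked for itself.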

I've continued stepping through the method, out of the method and up the stack, but so far without yet hitting the crash. I'm going to continue looking at this, but to do so will have to wait until tomorrow now. I'm sure I'll get to the bottom of it and that we're approaching a solution, even if it is now taking longer than I'd have preferred.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
16 Jun 2024 : Day 260
I'm getting well into the swing of things now, doing my best to get to the bottom of the WebGL issue. Yesterday I tried removing some changes I made in the last commit. Today I'll be placing breakpoints on methods to find out which get hit and which sail through without incident.

A key difference I'm interested in is whether the methods get hit while running the browser on a page that displays WebGL content. Most of the methods should already be used when running a WebView — that is, after all, their purpose — and in theory few of them should be used — by default — by the browser. That's because they were added specifically to be used for offscreen rendering, which is what the WebView is all about.

The exception is WebGL, which also uses offscreen rendering, so it's the methods that are used on both the WebView and the browser that are of interest to us today.

I've been testing a few out. For example the following SharedSurfaceTextureClient constructor is used by the WebView, but not by the Sailfish Browser. Here it is being hit when executing the WebView:
Thread 37 "Compositor" hit Breakpoint 1, mozilla::layers::
    SharedSurfaceTextureClient::SharedSurfaceTextureClient (this=0x7ee01ab110, 
    aData=0x7ee01ab060, 
    aFlags=34, aAllocator=0x0)
    at ${PROJECT}/gecko-dev/gfx/layers/client/TextureClientSharedSurface.cpp:105
105     ${PROJECT}/gecko-dev/gfx/layers/client/TextureClientSharedSurface.cpp: 
    No such file or directory.
(gdb) 
This surprised me a little because I thought it'd be used for offscreen rendering more generally. But apparently not. I can conclude that this method is unlikely to be the source of the WebGL problems.

In contrast, I found that the following method is used by neither the WebView nor the browser:
(gdb) info break    
Num     Type           Disp Enb Address            What
2       breakpoint     keep y   0x0000007ff28c14fc in mozilla::gl::
    CreateEmulatorBufferSurface(mozilla::gl::GLLibraryEGL*, void*, mozilla::gfx:
    :IntSizeTyped<mozilla::gfx::UnknownUnits>&) at ${PROJECT}/gecko-dev/gfx/gl/
    GLContextProviderEGL.cpp:982
This is also surprising. I thought maybe that I should therefore think about getting rid of this method entirely, but then I realised there's a reason for this method existing: it's intended for use with native rendering and for the emulator. These are both cases I've not yet tested, but they're still important for Sailfish OS. I added these to try to map ESR 78 changes and, while I've not tested them, I can only hope they work, which is still preferable to not including them at all.

The following is the CreateEGLPBufferOffscreenContextImpl() method used by the WebView, but which appears not to be used by the browser:
Thread 38 "Compositor" hit Breakpoint 2, mozilla::gl::GLContextEGL::
    CreateEGLPBufferOffscreenContextImpl (
    egl=std::shared_ptr<mozilla::gl::EglDisplay> (use count 3, weak count 2) = 
    {...}, desc=..., size=..., useGles=useGles@entry=false, 
    out_failureId=out_failureId@entry=0x7f1f94c1c8)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1359
1359    ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp: No such file or 
    directory.
(gdb) bt
#0  mozilla::gl::GLContextEGL::CreateEGLPBufferOffscreenContextImpl (egl=std::
    shared_ptr<mozilla::gl::EglDisplay> (use count 3, weak count 2) = {...}, 
    desc=..., size=..., useGles=useGles@entry=false, 
    out_failureId=out_failureId@entry=0x7f1f94c1c8)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1359
#1  0x0000007ff112f9f8 in mozilla::gl::GLContextEGL::
    CreateEGLPBufferOffscreenContext (
    display=std::shared_ptr<mozilla::gl::EglDisplay> (use count 3, weak count 
    2) = {...}, desc=..., size=..., 
    out_failureId=out_failureId@entry=0x7f1f94c1c8)
    at include/c++/8.3.0/ext/atomicity.h:96
#2  0x0000007ff112fbcc in mozilla::gl::GLContextProviderEGL::CreateHeadless (
    desc=..., out_failureId=out_failureId@entry=0x7f1f94c1c8)
    at include/c++/8.3.0/ext/atomicity.h:96
#3  0x0000007ff1130454 in mozilla::gl::GLContextProviderEGL::CreateOffscreen (
    size=..., 
    flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, 
    out_failureId=out_failureId@entry=0x7f1f94c1c8)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1456
#4  0x0000007ff1198048 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7ed4002f50)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:250
#5  0x0000007ff11ad820 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7ed4002f50, out_failureReason=0x7f1f94c520)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:387
#6  0x0000007ff12c359c in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7fc4b671f0, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#7  0x0000007ff12ce618 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7fc4b671f0, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#8  0x0000007ff12ce748 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7fc4b671f0, 
    aBackendHints=..., aId=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#9  0x0000007ff3665368 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7fc4b671f0, aBackendHints=..., 
    aId=...)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:80
#10 0x0000007ff0c5f2f0 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7fc4b671f0, msg__=...) at 
    PCompositorBridgeParent.cpp:1285
[...]
#25 0x0000007ff6a0289c in ?? () from /lib64/libc.so.6
(gdb) 
As you can see I've captured a backtrace; I'm honestly not expecting it to be useful but I'd rather keep it just in case.

I've checked a few methods now but this is beginning to feel like a rather hit and miss approach: debugging as Brownian motion. I'm not averse to a bit of Brownian debugging, but I prefer something more structured if there's an alternative available. To this end, rather than testing methods individually I've now placed all of the following breakpoints on the executing browser to find out which ones hit:
1   mozilla::gl::GLScreenBuffer::Create(mozilla::gl::GLContext*, mozilla::gfx::
    IntSizeTyped<mozilla::gfx::UnknownUnits> const&)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:171
2   mozilla::gl::GLScreenBuffer::GLScreenBuffer(mozilla::gl::GLContext*,
    mozilla::UniquePtr<mozilla::gl::SurfaceFactory, mozilla::DefaultDelete
    <mozilla::gl::SurfaceFactory> >) 
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:183
3   mozilla::gl::GLScreenBuffer::Swap(mozilla::gfx::IntSizeTyped<mozilla::
    gfx::UnknownUnits> const&)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:308
4   mozilla::gl::GLScreenBuffer::Resize(mozilla::gfx::IntSizeTyped<mozilla::
    gfx::UnknownUnits> const&)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:339
5   mozilla::gl::ReadBuffer::Attach(mozilla::gl::SharedSurface*)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:408
6   mozilla::gl::TileGenFunc
    at ${PROJECT}/gecko-dev/gfx/gl/GLTextureImage.cpp:352
7   mozilla::gl::SurfaceFactory_Basic::SurfaceFactory_Basic(mozilla::gl::
    GLContext&)
    at ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceGL.cpp:43
7.2 mozilla::gl::SurfaceFactory_Basic::SurfaceFactory_Basic(mozilla::gl::
    GLContext*, mozilla::layers::TextureFlags const&)
    at ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceGL.cpp:47
8   mozilla::gl::SharedSurface_Basic::Create(mozilla::gl::GLContext*,
    mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&)
    at ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceGL.cpp:56
9   GLFormatForImage
10  GLTypeForImage
11  TextureImageEGL::TextureImageEGL
12  TextureImageEGL::BindTexture
13  TextureImageEGL::BindTexImage
14  mozilla::gl::CreateTextureImageEGL(mozilla::gl::GLContext*, mozilla::gfx::
    IntSizeTyped<mozilla::gfx::UnknownUnits> const&, gfxContentType, unsigned 
    int, mozilla::gl::TextureImage::Flags, mozilla::gfx::SurfaceFormat)
    at ${PROJECT}/gecko-dev/gfx/gl/TextureImageEGL.cpp:185
15  mozilla::layers::SharedSurfaceTextureClient::Create(mozilla::UniquePtr
    <mozilla::gl::SharedSurface, mozilla::DefaultDelete<mozilla::gl::
    SharedSurface> >, mozilla::gl::SurfaceFactory*, mozilla::layers::
    LayersIPCChannel*, mozilla::layers::TextureFlags)
    at ${PROJECT}/gecko-dev/gfx/layers/client/TextureClientSharedSurface.cpp:114
16  mozilla::layers::CompositorOGL::CreateContext()
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:227
17  mozilla::layers::CompositorOGL::ClearRect(mozilla::gfx::RectTyped<mozilla::
    gfx::UnknownUnits, float> const&)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:766
I get some results back. Each time something hits I've recorded a backtrace, since these are the really crucial methods I need to look into. I won't have time to look into them today, but I can at least take a record of their details. First up is CompositorOGL::CreateContext(). This is a totally new method added as part of my changes, so a good candidate for where the issue might live:
Thread 37 "Compositor" hit Breakpoint 16, mozilla::layers::
    CompositorOGL::CreateContext (this=this@entry=0x7ed8002f10)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:227
227     ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp: No such file 
    or directory.
(gdb) bt
#0  mozilla::layers::CompositorOGL::CreateContext (this=this@entry=0x7ed8002f10)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:227
#1  0x0000007ff2950820 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7ed8002f10, out_failureReason=0x7f2e4ed510)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:387
#2  0x0000007ff2a6659c in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7fc8a7e820, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#3  0x0000007ff2a71618 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7fc8a7e820, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#4  0x0000007ff2a71748 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7fc8a7e820, 
    aBackendHints=..., aId=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#5  0x0000007ff4e08368 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7fc8a7e820, aBackendHints=..., 
    aId=...)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:80
#6  0x0000007ff24022f0 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7fc8a7e820, msg__=...) at 
    PCompositorBridgeParent.cpp:1285
[...]
#21 0x0000007fefbac89c in ?? () from /lib64/libc.so.6
(gdb)
Next we have the SurfaceFactory_Basic constructor. This isn't new but has an updated signature. This seems less likely to be the cause, but worth checking just in case.
Thread 8 "GeckoWorkerThre" hit Breakpoint 7, mozilla::gl::
    SurfaceFactory_Basic::SurfaceFactory_Basic (this=0x7e9c005810, gl=...)
    at ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceGL.cpp:43
43      ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceGL.cpp: No such file or 
    directory.
(gdb) bt
#0  mozilla::gl::SurfaceFactory_Basic::SurfaceFactory_Basic (this=0x7e9c005810, 
    gl=...)
    at ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceGL.cpp:43
#1  0x0000007ff369ce20 in mozilla::MakeUnique<mozilla::gl::
    SurfaceFactory_Basic, mozilla::gl::GLContext&> ()
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#2  mozilla::WebGLContext::Present (this=this@entry=0x7fc961d590, 
    xrFb=<optimized out>, 
    consumerType=consumerType@entry=mozilla::layers::TextureType::Unknown, 
    webvr=webvr@entry=false)
    at ${PROJECT}/gecko-dev/dom/canvas/WebGLContext.cpp:929
#3  0x0000007ff3664e68 in mozilla::HostWebGLContext::Present (webvr=false, 
    t=mozilla::layers::TextureType::Unknown, xrFb=<optimized out>, 
    this=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    mozilla/RefPtr.h:280
#4  mozilla::ClientWebGLContext::Run<void (mozilla::HostWebGLContext::*)(
    unsigned long, mozilla::layers::TextureType, bool) const, &(mozilla::
    HostWebGLContext::Present(unsigned long, mozilla::layers::TextureType, 
    bool) const), unsigned long, mozilla::layers::TextureType const&, bool 
    const&> (
    this=<optimized out>, args#0=@0x7fdf2926c0: 0, args#1=@0x7fdf2926bf: 
    mozilla::layers::TextureType::Unknown, args#2=@0x7fdf2926be: false)
    at ${PROJECT}/gecko-dev/dom/canvas/ClientWebGLContext.cpp:313
#5  0x0000007ff3664fd0 in mozilla::ClientWebGLContext::Present (
    this=this@entry=0x7fc9611ef0, xrFb=xrFb@entry=0x0, type=<optimized out>, 
    webvr=<optimized out>, webvr@entry=false)
    at ${PROJECT}/gecko-dev/dom/canvas/ClientWebGLContext.cpp:363
#6  0x0000007ff36907e0 in mozilla::ClientWebGLContext::OnBeforePaintTransaction 
    (this=0x7fc9611ef0)
    at ${PROJECT}/gecko-dev/dom/canvas/ClientWebGLContext.cpp:345
#7  0x0000007ff28ffc7c in mozilla::layers::CanvasRenderer::
    FirePreTransactionCallback (this=this@entry=0x7fc989b9a0)
    at ${PROJECT}/gecko-dev/gfx/layers/CanvasRenderer.cpp:75
[...]
#54 0x0000007fefbac89c in ?? () from /lib64/libc.so.6
(gdb) 
Similarly for SharedSurface_Basic::Create(). The sequencing here makes perfect sense: first you'd want to create a factory, next up you'd want to use it to create an instance of the object it's designed to create, which is exactly what we've just seen above. Here we have a SurfaceFactory_Basic that's generating a SharedSurface_Basic object:
Thread 8 "GeckoWorkerThre" hit Breakpoint 8, mozilla::gl::
    SharedSurface_Basic::Create (gl=0x7fc985a8e0, size=...)
    at ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceGL.cpp:56
56      in ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceGL.cpp
(gdb) bt
#0  mozilla::gl::SharedSurface_Basic::Create (gl=0x7fc985a8e0, size=...)
    at ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceGL.cpp:56
#1  0x0000007ff28a6578 in mozilla::gl::SurfaceFactory_Basic::CreateSharedImpl (
    this=<optimized out>, desc=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/WeakPtr.h:185
#2  0x0000007ff28a6488 in mozilla::gl::SurfaceFactory::CreateShared (
    this=0x7e9c005810, size=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefCounted.h:240
#3  0x0000007ff28a9504 in mozilla::gl::SwapChain::Acquire (
    this=this@entry=0x7fc961da28, size=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#4  0x0000007ff369ca3c in mozilla::WebGLContext::PresentInto (
    this=this@entry=0x7fc961d590, swapChain=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#5  0x0000007ff369ce6c in mozilla::WebGLContext::Present (
    this=this@entry=0x7fc961d590, xrFb=<optimized out>, 
    consumerType=consumerType@entry=mozilla::layers::TextureType::Unknown, 
    webvr=webvr@entry=false)
    at ${PROJECT}/gecko-dev/dom/canvas/WebGLContext.cpp:936
#6  0x0000007ff3664e68 in mozilla::HostWebGLContext::Present (webvr=false, 
    t=mozilla::layers::TextureType::Unknown, xrFb=<optimized out>, 
    this=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    mozilla/RefPtr.h:280
#7  mozilla::ClientWebGLContext::Run<void (mozilla::HostWebGLContext::*)(
    unsigned long, mozilla::layers::TextureType, bool) const, &(mozilla::
    HostWebGLContext::Present(unsigned long, mozilla::layers::TextureType, 
    bool) const), unsigned long, mozilla::layers::TextureType const&, bool 
    const&> (
    this=<optimized out>, args#0=@0x7fdf2926c0: 0, args#1=@0x7fdf2926bf: 
    mozilla::layers::TextureType::Unknown, args#2=@0x7fdf2926be: false)
    at ${PROJECT}/gecko-dev/dom/canvas/ClientWebGLContext.cpp:313
#8  0x0000007ff3664fd0 in mozilla::ClientWebGLContext::Present (
    this=this@entry=0x7fc9611ef0, xrFb=xrFb@entry=0x0, type=<optimized out>, 
    webvr=<optimized out>, webvr@entry=false)
    at ${PROJECT}/gecko-dev/dom/canvas/ClientWebGLContext.cpp:363
#9  0x0000007ff36907e0 in mozilla::ClientWebGLContext::OnBeforePaintTransaction 
    (this=0x7fc9611ef0)
    at ${PROJECT}/gecko-dev/dom/canvas/ClientWebGLContext.cpp:345
[...]
#57 0x0000007fefbac89c in ?? () from /lib64/libc.so.6
(gdb) 
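The sequencing in these last two backtraces, with Present() first constructing a factory and then going through it to obtain a surface, is a plain factory pattern. Here's a minimal sketch of it. All of the types here (FakeSurface, FakeSurfaceFactory, FakePresenter) are hypothetical stand-ins, and creating the factory lazily on the first call is my assumption rather than something the backtraces confirm.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; not the real Gecko classes.
struct FakeSize {
  int width;
  int height;
};

class FakeSurface {
 public:
  explicit FakeSurface(FakeSize size) : mSize(size) {}
  FakeSize Size() const { return mSize; }

 private:
  FakeSize mSize;
};

// The factory's only job is to produce surfaces of a requested size.
class FakeSurfaceFactory {
 public:
  std::unique_ptr<FakeSurface> CreateShared(FakeSize size) {
    if (size.width <= 0 || size.height <= 0) {
      return nullptr;  // refuse degenerate sizes
    }
    return std::unique_ptr<FakeSurface>(new FakeSurface(size));
  }
};

// Mirrors the ordering seen in the backtraces: factory first, then
// a surface obtained through it.
class FakePresenter {
 public:
  std::unique_ptr<FakeSurface> Present(FakeSize size) {
    if (!mFactory) {
      // Assumption: the factory is created on first use.
      mFactory.reset(new FakeSurfaceFactory());
    }
    return mFactory->CreateShared(size);
  }

  bool HasFactory() const { return mFactory != nullptr; }

 private:
  std::unique_ptr<FakeSurfaceFactory> mFactory;
};
```

Placing a breakpoint on both the factory constructor and the creation method, as I did above, catches each half of this sequence separately, which is why they hit one after the other.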
Next we have a call to CreateTextureImageEGL(). It's not clear to me how this relates to the earlier calls; I think this may be one to look into in a bit more depth:
Thread 37 "Compositor" hit Breakpoint 14, mozilla::gl::
    CreateTextureImageEGL (gl=gl@entry=0x7ed81a29e0, aSize=...,
    aContentType=aContentType@entry=gfxContentType::COLOR_ALPHA, 
    aWrapMode=aWrapMode@entry=33071,
    aFlags=aFlags@entry=mozilla::gl::TextureImage::OriginBottomLeft, 
    aImageFormat=aImageFormat@entry=mozilla::gfx::SurfaceFormat::B8G8R8A8)
    at ${PROJECT}/gecko-dev/gfx/gl/TextureImageEGL.cpp:185
185     ${PROJECT}/gecko-dev/gfx/gl/TextureImageEGL.cpp: No such file or 
    directory.
(gdb) bt
#0  mozilla::gl::CreateTextureImageEGL (gl=gl@entry=0x7ed81a29e0, aSize=..., 
    aContentType=aContentType@entry=gfxContentType::COLOR_ALPHA, 
    aWrapMode=aWrapMode@entry=33071, aFlags=aFlags@entry=mozilla::gl::
    TextureImage::OriginBottomLeft, 
    aImageFormat=aImageFormat@entry=mozilla::gfx::SurfaceFormat::B8G8R8A8)
    at ${PROJECT}/gecko-dev/gfx/gl/TextureImageEGL.cpp:185
#1  0x0000007ff28b9154 in mozilla::gl::CreateTextureImage (
    gl=gl@entry=0x7ed81a29e0, aSize=..., 
    aContentType=aContentType@entry=gfxContentType::COLOR_ALPHA, 
    aWrapMode=aWrapMode@entry=33071, 
    aFlags=aFlags@entry=mozilla::gl::TextureImage::OriginBottomLeft, 
    aImageFormat=<optimized out>)
    at ${PROJECT}/gecko-dev/gfx/gl/GLTextureImage.cpp:30
#2  0x0000007ff294e9d4 in mozilla::layers::TextureImageTextureSourceOGL::Update 
    (this=0x7ed81a2940, aSurface=0x7ed822aa70, aDestRegion=0x0, 
    aSrcOffset=0x0, aDstOffset=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/
    include/gfx2DGlue.h:70
#3  0x0000007ff2a43bf4 in mozilla::layers::BufferTextureHost::Upload (
    this=this@entry=0x7ed81fca30, aRegion=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#4  0x0000007ff2a4422c in mozilla::layers::BufferTextureHost::MaybeUpload (
    this=this@entry=0x7ed81fca30, aRegion=<optimized out>)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/TextureHost.cpp:1046
#5  0x0000007ff2a44554 in mozilla::layers::BufferTextureHost::UploadIfNeeded (
    this=this@entry=0x7ed81fca30)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/TextureHost.cpp:1031
#6  0x0000007ff2a44570 in mozilla::layers::BufferTextureHost::Lock (
    this=0x7ed81fca30)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/TextureHost.cpp:650
#7  0x0000007ff2a35ad8 in mozilla::layers::ImageHost::Lock (this=0x7ed81fbe20)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#8  0x0000007ff2a35f68 in mozilla::layers::AutoLockCompositableHost::
    AutoLockCompositableHost (aHost=0x7ed81fbe20, this=0x7f2e4ecca0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#9  mozilla::layers::ImageHost::Composite (this=this@entry=0x7ed81fbe20, 
    aCompositor=aCompositor@entry=0x7ed8002f10, 
    aLayer=aLayer@entry=0x7ed81db4d0, 
    aEffectChain=..., aOpacity=1, aTransform=..., aSamplingFilter=<optimized 
    out>, aClipRect=..., aVisibleRegion=aVisibleRegion@entry=0x0, aGeometry=...)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/ImageHost.cpp:197
#10 0x0000007ff2a26a88 in mozilla::layers::CanvasLayerComposite::<lambda(
    mozilla::layers::EffectChain&, const IntRect&)>::operator() (clipRect=..., 
    effectChain=..., __closure=<synthetic pointer>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/MaybeStorageBase.h:50
#11 mozilla::layers::RenderWithAllMasks<mozilla::layers::CanvasLayerComposite::
    RenderLayer(const IntRect&, const mozilla::Maybe<mozilla::gfx::
    PolygonTyped<mozilla::gfx::UnknownUnits> >&)::<lambda(mozilla::layers::
    EffectChain&, const IntRect&)> >(mozilla::layers::Layer *, mozilla::layers::
    Compositor *, const mozilla::gfx::IntRect &, mozilla::layers::
    CanvasLayerComposite::<lambda(mozilla::layers::EffectChain&, const 
    IntRect&)>) (aLayer=aLayer@entry=
    0x7ed81db0c0, aCompositor=<optimized out>, aClipRect=..., 
    aRenderCallback=aRenderCallback@entry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/
    LayerManagerCompositeUtils.h:69
#12 0x0000007ff2a26ddc in mozilla::layers::CanvasLayerComposite::RenderLayer (
    this=0x7ed81db0c0, aClipRect=..., aGeometry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:289
#13 0x0000007ff2a32cd4 in mozilla::layers::RenderLayers<mozilla::layers::
    ContainerLayerComposite> (aContainer=aContainer@entry=0x7ed81f1600, 
    aManager=aManager@entry=0x7ed81a44e0, aClipRect=..., aGeometry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/Maybe.h:443
[...]
#41 0x0000007fefbac89c in ?? () from /lib64/libc.so.6
(gdb) 
Next up we have a call to TileGenFunc(). You may recall that we already fixed this to align its functionality with the code as it was before the last commit, so this one should already be safe. I'll still keep its backtrace here though, both for completeness and in case I missed something earlier:
Thread 37 "Compositor" hit Breakpoint 6, 0x0000007ff28b8b88 in 
    mozilla::gl::TileGenFunc (aImageFormat=<optimized out>, aFlags=<optimized 
    out>,
    aContentType=<optimized out>, aSize=..., gl=<optimized out>)
    at ${PROJECT}/gecko-dev/gfx/gl/GLTextureImage.cpp:352
352     ${PROJECT}/gecko-dev/gfx/gl/GLTextureImage.cpp: No such file or 
    directory.
(gdb) bt
#0  0x0000007ff28b8b88 in mozilla::gl::TileGenFunc (aImageFormat=<optimized 
    out>, aFlags=<optimized out>, aContentType=<optimized out>, aSize=..., 
    gl=<optimized out>) at ${PROJECT}/gecko-dev/gfx/gl/GLTextureImage.cpp:352
#1  mozilla::gl::TiledTextureImage::Resize (this=this@entry=0x7ed81b5b10, 
    aSize=...)
    at ${PROJECT}/gecko-dev/gfx/gl/GLTextureImage.cpp:402
#2  0x0000007ff28b8fd0 in mozilla::gl::TiledTextureImage::TiledTextureImage (
    this=0x7ed81b5b10, aGL=0x7ed81a29e0, aSize=..., 
    aContentType=<optimized out>, aFlags=<optimized out>, 
    aImageFormat=<optimized out>)
    at ${PROJECT}/gecko-dev/gfx/gl/GLTextureImage.cpp:224
#3  0x0000007ff28d4908 in mozilla::gl::CreateTextureImageEGL (
    gl=gl@entry=0x7ed81a29e0, aSize=..., 
    aContentType=aContentType@entry=gfxContentType::COLOR_ALPHA, 
    aWrapMode=aWrapMode@entry=33071, 
    aFlags=aFlags@entry=mozilla::gl::TextureImage::OriginBottomLeft, 
    aImageFormat=aImageFormat@entry=mozilla::gfx::SurfaceFormat::B8G8R8A8)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#4  0x0000007ff28b9154 in mozilla::gl::CreateTextureImage (
    gl=gl@entry=0x7ed81a29e0, aSize=..., 
    aContentType=aContentType@entry=gfxContentType::COLOR_ALPHA, 
    aWrapMode=aWrapMode@entry=33071, 
    aFlags=aFlags@entry=mozilla::gl::TextureImage::OriginBottomLeft, 
    aImageFormat=<optimized out>)
    at ${PROJECT}/gecko-dev/gfx/gl/GLTextureImage.cpp:30
#5  0x0000007ff294e9d4 in mozilla::layers::TextureImageTextureSourceOGL::Update 
    (this=0x7ed81a2940, aSurface=0x7ed822aa70, aDestRegion=0x0, 
    aSrcOffset=0x0, aDstOffset=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/
    include/gfx2DGlue.h:70
#6  0x0000007ff2a43bf4 in mozilla::layers::BufferTextureHost::Upload (
    this=this@entry=0x7ed81fca30, aRegion=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#7  0x0000007ff2a4422c in mozilla::layers::BufferTextureHost::MaybeUpload (
    this=this@entry=0x7ed81fca30, aRegion=<optimized out>)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/TextureHost.cpp:1046
#8  0x0000007ff2a44554 in mozilla::layers::BufferTextureHost::UploadIfNeeded (
    this=this@entry=0x7ed81fca30)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/TextureHost.cpp:1031
#9  0x0000007ff2a44570 in mozilla::layers::BufferTextureHost::Lock (
    this=0x7ed81fca30)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/TextureHost.cpp:650
#10 0x0000007ff2a35ad8 in mozilla::layers::ImageHost::Lock (this=0x7ed81fbe20)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#11 0x0000007ff2a35f68 in mozilla::layers::AutoLockCompositableHost::
    AutoLockCompositableHost (aHost=0x7ed81fbe20, this=0x7f2e4ecca0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#12 mozilla::layers::ImageHost::Composite (this=this@entry=0x7ed81fbe20, 
    aCompositor=aCompositor@entry=0x7ed8002f10, 
    aLayer=aLayer@entry=0x7ed81db4d0, 
    aEffectChain=..., aOpacity=1, aTransform=..., aSamplingFilter=<optimized 
    out>, aClipRect=..., aVisibleRegion=aVisibleRegion@entry=0x0, aGeometry=...)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/ImageHost.cpp:197
#13 0x0000007ff2a26a88 in mozilla::layers::CanvasLayerComposite::<lambda(
    mozilla::layers::EffectChain&, const IntRect&)>::operator() (clipRect=..., 
    effectChain=..., __closure=<synthetic pointer>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/MaybeStorageBase.h:50
#14 mozilla::layers::RenderWithAllMasks<mozilla::layers::CanvasLayerComposite::
    RenderLayer(const IntRect&, const mozilla::Maybe<mozilla::gfx::
    PolygonTyped<mozilla::gfx::UnknownUnits> >&)::<lambda(mozilla::layers::
    EffectChain&, const IntRect&)> >(mozilla::layers::Layer *, mozilla::layers::
    Compositor *, const mozilla::gfx::IntRect &, mozilla::layers::
    CanvasLayerComposite::<lambda(mozilla::layers::EffectChain&, const 
    IntRect&)>) (
    aLayer=aLayer@entry=0x7ed81db0c0, aCompositor=<optimized out>, 
    aClipRect=..., aRenderCallback=aRenderCallback@entry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/
    LayerManagerCompositeUtils.h:69
[...]
#44 0x0000007fefbac89c in ?? () from /lib64/libc.so.6
(gdb) 
Phew: that's a lot to check. There's plenty to look into, but as I mentioned at the outset, this is just the preliminary info-capturing work. I'll do my best to use this information tomorrow, when I plan to investigate it all in much more depth. That's it for now though. More tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
16 Jun 2024 : Day 259 #
I've spent this last week participating in a HackWeek. Organised at my place of work for the Research Engineering Group I'm part of, it allowed us to work with a different team on different projects and with different technologies than we usually would do.

My team developed a "Plantcraft" simulation, built on a three-dimensional grid where each cell satisfies a small set of physical rules approximating those of reality. A cell can be one of Air, Soil, Rock or Plant, each with a state that determines its water content, energy level, colour and memory. Despite the naming, the four types of cell are actually identical, save for the state values and the fact that Plant cells execute a state-machine programme which determines their behaviour.
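
As a rough illustration of that cell model, here's a minimal C++ sketch. Every name in it (CellType, Cell, runsProgramme) is hypothetical, as are the field types and the guess at what the memory field holds; it isn't taken from the project's repository and only mirrors the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout: this is not the Plantcraft project's code.
enum class CellType { Air, Soil, Rock, Plant };

// Every cell carries the same state fields, whatever its type.
struct Cell {
  CellType type;
  float water;           // water content
  float energy;          // energy level
  std::uint32_t colour;  // packed colour value (assumed representation)
  std::uint8_t memory;   // small scratch value (assumed representation)
};

// Only Plant cells execute a state-machine programme.
inline bool runsProgramme(const Cell& cell) {
  return cell.type == CellType::Plant;
}

// Small usage check: identical layout, different behaviour by type.
inline void exampleCells() {
  Cell soil{CellType::Soil, 0.5f, 0.0f, 0x8B4513u, 0};
  Cell plant{CellType::Plant, 0.2f, 1.0f, 0x228B22u, 0};
  assert(!runsProgramme(soil));
  assert(runsProgramme(plant));
}
```

The point of the sketch is just that the type is a label on otherwise uniform state, with the Plant label being the only one that switches on extra behaviour.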

If you're interested, all the documentation and executable code is available in the project's git repository. Please don't judge the code too harshly: it was all written under tight time constraints!
 
A 3D grid showing some land covered in soil, all in the form of different coloured blocks, along with some plants growing in the soil

While it was huge fun to work with such a great team during the week, it's also nice to now be getting back to gecko once again. Time to get things moving and maybe apply some of the pressure of a time-constrained project to getting the gecko changes over the line too.

Let's recap where we were at last weekend before I paused during the week. I'll be continuing to work on getting WebGL rendering once again. The WebGL code was working nicely at one point, but since it makes use of the same offscreen rendering pipeline as the WebView, the changes I made to get the latter working seem to have broken the former.

I've already established and tweaked some of the relevant changes, namely that TileGenFunc() now executes CreateBasicTextureImage() in all circumstances and GLContext::ResizeScreenBuffer() now acts on SwapChain rather than GLScreenBuffer. Here's the full diff showing the changes I've made up to now to reverse these:
$ git diff
diff --git a/gfx/gl/GLContext.cpp b/gfx/gl/GLContext.cpp
index 1177768bb92e..aac6912bb914 100644
--- a/gfx/gl/GLContext.cpp
+++ b/gfx/gl/GLContext.cpp
@@ -1875,7 +1875,8 @@ void GLContext::MarkDestroyed() {
 
   // Null these before they're naturally nulled after dtor, as we want 
    GLContext
   // to still be alive in *their* dtors.
-  mScreen = nullptr;
+  //mScreen = nullptr;
+  mSwapChain = nullptr;
   mBlitHelper = nullptr;
   mReadTexImageHelper = nullptr;
 
@@ -1886,7 +1887,7 @@ void GLContext::MarkDestroyed() {
 bool GLContext::ResizeScreenBuffer(const gfx::IntSize& size) {
   if (!IsOffscreenSizeAllowed(size)) return false;
 
-  return mScreen->Resize(size);
+  return mSwapChain->Resize(size);
 }
 // -
 
diff --git a/gfx/gl/GLTextureImage.cpp b/gfx/gl/GLTextureImage.cpp
index c2def2dedb18..8152128bdc9c 100644
--- a/gfx/gl/GLTextureImage.cpp
+++ b/gfx/gl/GLTextureImage.cpp
@@ -47,6 +47,9 @@ already_AddRefed<TextureImage> CreateTextureImage(
 static already_AddRefed<TextureImage> TileGenFunc(
     GLContext* gl, const IntSize& aSize, TextureImage::ContentType 
    aContentType,
     TextureImage::Flags aFlags, TextureImage::ImageFormat aImageFormat) {
+  return CreateBasicTextureImage(gl, aSize, aContentType,
+                                 LOCAL_GL_CLAMP_TO_EDGE, aFlags);
+
   switch (gl->GetContextType()) {
     case GLContextType::EGL:
       return TileGenFuncEGL(gl, aSize, aContentType, aFlags, aImageFormat);
As I left things last weekend these changes were triggering a segfault. My task for today is to check the backtrace of the crash. It's bound to reveal something useful...

But oddly it doesn't. Or it might have were it not for the fact there's no crash after all. So since there's no backtrace I've had to go with a different approach. Instead I've been through the diff of the previous commit again to see whether it reveals any further gentle differences I can try to reverse. Ones that are unlikely to cause damage while at the same time might help resolve the WebGL issue.

One such change can be found in the SurfaceFactory constructor. This accepts an allocator and a flags parameter, neither of which appear to be used. So I've removed them to see what happens, setting the allocator to be nullptr where it's needed later instead.

Here's the diff of the changes I made:
diff --git a/gfx/gl/SharedSurface.cpp b/gfx/gl/SharedSurface.cpp
index 687d18b95893..1d911b84379a 100644
--- a/gfx/gl/SharedSurface.cpp
+++ b/gfx/gl/SharedSurface.cpp
[...]
@@ -149,10 +164,105 @@ UniquePtr<SurfaceFactory> SurfaceFactory::Create(
   return nullptr;
 }
 
-SurfaceFactory::SurfaceFactory(const PartialSharedSurfaceDesc& partialDesc)
-    : mDesc(partialDesc), mMutex("SurfaceFactor::mMutex") {}
+SurfaceFactory::SurfaceFactory(const PartialSharedSurfaceDesc& partialDesc,
+                               const RefPtr<layers::LayersIPCChannel>& 
    allocator,
+                               const layers::TextureFlags& flags)
+    : mDesc(partialDesc),
+      mAllocator(allocator),
+      mFlags(flags),
+      mMutex("SurfaceFactor::mMutex")
+{
+}
[...]
Changing, building, installing and testing this doesn't result in any change. The browser and WebView both work as before, but the WebGL functionality is still broken.

Given that I'm not getting a crash and that the various changes I've made today haven't had any apparent effect, tomorrow I'm going to go through the methods in the previous commit again, set breakpoints on them and see which are being used by the browser. Hopefully this will shed more light, while also giving me the opportunity to refresh my memory about the changes. A refresh is going to be helpful given I spent last week thinking about other things.

So, more on this tomorrow.

9 Jun 2024 : Day 258 #
As I mentioned a couple of days back, I'm taking part in a hackathon for my work during the next week, so I'm not planning to make any posts for the next five days. This coming Saturday I'll pick up right where I leave off at the end of today though.

For today, I'm looking further into why WebGL might not be doing what it's supposed to be doing. So far I've found that there are two methods in my commit diff that get hit when executing the broken code. These are:
  1. SurfaceFactory::SurfaceFactory()
  2. TextureImageEGL::TextureImageEGL()

Looking at the code and observing the execution using the debugger I can see that the stack trace for the second of these includes TileGenFunc(), which calls TileGenFuncEGL() which then calls TextureImageEGL::TextureImageEGL(). And the flow is definitely being affected by what happens in TileGenFunc().

Here's the diff between the two versions:
 static already_AddRefed<TextureImage> TileGenFunc(
     GLContext* gl, const IntSize& aSize, TextureImage::ContentType 
    aContentType,
     TextureImage::Flags aFlags, TextureImage::ImageFormat aImageFormat) {
-  return CreateBasicTextureImage(gl, aSize, aContentType,
-                                 LOCAL_GL_CLAMP_TO_EDGE, aFlags);
+  switch (gl->GetContextType()) {
+    case GLContextType::EGL:
+      return TileGenFuncEGL(gl, aSize, aContentType, aFlags, aImageFormat);
+    default:
+      return CreateBasicTextureImage(gl, aSize, aContentType,
+                                     LOCAL_GL_CLAMP_TO_EDGE, aFlags);
+  }
 }
As we can see, the original version always calls CreateBasicTextureImage(), whereas in the new version there's a switch to contend with. That means that the new version, rather than doing the same thing as the original, will on occasion call TileGenFuncEGL() instead. So this is clearly a candidate for where things are going wrong.

To see whether this is having an important effect I've amended the method so that it has the same approach as previously, by changing it to this:
$ git diff
diff --git a/gfx/gl/GLTextureImage.cpp b/gfx/gl/GLTextureImage.cpp
index c2def2dedb18..8152128bdc9c 100644
--- a/gfx/gl/GLTextureImage.cpp
+++ b/gfx/gl/GLTextureImage.cpp
@@ -47,6 +47,9 @@ already_AddRefed<TextureImage> CreateTextureImage(
 static already_AddRefed<TextureImage> TileGenFunc(
     GLContext* gl, const IntSize& aSize, TextureImage::ContentType 
    aContentType,
     TextureImage::Flags aFlags, TextureImage::ImageFormat aImageFormat) {
+  return CreateBasicTextureImage(gl, aSize, aContentType,
+                                 LOCAL_GL_CLAMP_TO_EDGE, aFlags);
+
   switch (gl->GetContextType()) {
     case GLContextType::EGL:
       return TileGenFuncEGL(gl, aSize, aContentType, aFlags, aImageFormat);
Now when this gets called, it will immediately call CreateBasicTextureImage() rather than going into the switch conditional. This isn't a long term solution, it's just a way for me to test things out.
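
To make the effect of that early return concrete, here's a minimal self-contained sketch. The ContextType enum and both functions are hypothetical stand-ins that return strings rather than textures; they aren't the real gecko code, they just mimic the shape of the two versions of TileGenFunc().

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the context type; not the real gecko enum.
enum class ContextType { EGL, Other };

// Shape of the new version: dispatch on the context type.
inline std::string tileGenDispatch(ContextType type) {
  switch (type) {
    case ContextType::EGL:
      return "egl";
    default:
      return "basic";
  }
}

// Shape of the test version: the early return is always taken, so the
// switch below it is never reached, whatever the context type is.
inline std::string tileGenForced(ContextType type) {
  return "basic";

  switch (type) {
    case ContextType::EGL:
      return "egl";
    default:
      return "basic";
  }
}

// Small usage check comparing the two on an EGL context.
inline void exampleEarlyReturn() {
  assert(tileGenDispatch(ContextType::EGL) == "egl");
  assert(tileGenForced(ContextType::EGL) == "basic");
}
```

Leaving the switch in place below the return keeps the test change to a tiny diff that's easy to back out again, which is all that's needed for an experiment like this.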

Unfortunately though, rebuilding and executing this change gives me the same result as before, in that the WebGL is still not showing signs of life.

So it's back to the code again. There's another important change too: in some cases I've switched from using mSwapChain to using mScreen, the GLScreenBuffer, instead. The two have quite different characteristics, so I should try switching this back as well, for example like this:
 bool GLContext::ResizeScreenBuffer(const gfx::IntSize& size) {
   if (!IsOffscreenSizeAllowed(size)) return false;
 
-  return mScreen->Resize(size);
+  return mSwapChain->Resize(size);
 }
Now when I build and try this something different happens. Now the app crashes when it tries to render the WebGL. That's not a bad thing, because the debugger will tell me where the crash is taking place.

I'll need to investigate this further. Not today though as I'm out of time, and I won't be picking this up tomorrow either. Instead there will be the five-day pause I mentioned at the top of this post, but I'll be back to continue this where I've left it this coming Saturday.

8 Jun 2024 : Day 257 #
It's an early start for me today as I'm travelling to London and back. But I had difficulty sleeping last night and am up even earlier than I usually would be, so I'm pleased to discover that the build I kicked off last night has already completed.

This means I now have two sets of RPM packages. One set that represents the last commit of ESR 91 when WebGL was working and a second set that adds a commit on top of this, but which breaks WebGL.

Here's a list of the packages, where the sailfishos.esr91 branch represents the most recent changes that caused the breakage, while the temp branch has these changes reverted.
$ ls webgl-broken/ webgl-working/
webgl-broken/:
xulrunner-qt5-91.9.1+git1+sailfishos.esr91.
  20240604225626.a84dc7d4765d+gecko.dev.7437a9d17284-1.aarch64.rpm
xulrunner-qt5-debuginfo-91.9.1+git1+sailfishos.esr91.
  20240604225626.a84dc7d4765d+gecko.dev.7437a9d17284-1.aarch64.rpm
xulrunner-qt5-debugsource-91.9.1+git1+sailfishos.esr91.
  20240604225626.a84dc7d4765d+gecko.dev.7437a9d17284-1.aarch64.rpm
xulrunner-qt5-devel-91.9.1+git1+sailfishos.esr91.
  20240604225626.a84dc7d4765d+gecko.dev.7437a9d17284-1.aarch64.rpm
xulrunner-qt5-misc-91.9.1+git1+sailfishos.esr91.
  20240604225626.a84dc7d4765d+gecko.dev.7437a9d17284-1.aarch64.rpm

webgl-working/:
xulrunner-qt5-91.9.1+git1+temp.
  20240212214917.9f64ce35a187-1.aarch64.rpm
xulrunner-qt5-debuginfo-91.9.1+git1+temp.
  20240212214917.9f64ce35a187-1.aarch64.rpm
xulrunner-qt5-debugsource-91.9.1+git1+temp.
  20240212214917.9f64ce35a187-1.aarch64.rpm
xulrunner-qt5-devel-91.9.1+git1+temp.
  20240212214917.9f64ce35a187-1.aarch64.rpm
xulrunner-qt5-misc-91.9.1+git1+temp.
  20240212214917.9f64ce35a187-1.aarch64.rpm
While the newly built RPMs transfer over to my phone, let me summarise what I'm expecting.

Previously the broken RPMs were crashing on a call to ToSurfaceDescriptor(). The reason for the crash is that I'd added an explicit request for the app to crash if this was ever called:
Maybe<layers::SurfaceDescriptor> SharedSurface_Basic::ToSurfaceDescriptor() {
  MOZ_CRASH("GFX: ToSurfaceDescriptor");
  return Nothing();
}
I added it for debugging purposes while working on the WebView changes. These latest packages have this MOZ_CRASH statement removed, so I'm no longer expecting a crash to happen here. However, I do expect it to crash nevertheless, just in some other location. Removing the MOZ_CRASH would be too simple a fix for it to actually work as a solution!

So I'm expecting to get a new backtrace from the crash. The question will be: what is this crash and how does it compare with the execution of the working version? As soon as I have this backtrace, the path the execution took to get there will hopefully be clear. Then I'll reinstall the working version and compare against the equivalent path there to establish what's changed.

This is the plan, at least.

The packages have copied over, so let's get to work.
$ sailfish-browser https://shadertoy.com
[...]
Created LOG for EmbedLiteLayerManager
JavaScript warning: https://www.shadertoy.com/, line 2388: WebGL warning: 
    drawArraysInstanced: Tex image TEXTURE_2D level 0 is incurring lazy 
    initialization.
[...]
Well, that's interesting. There is now no crash, so that failure was entirely self-induced. However the WebGL is broken. It's just displaying an empty canvas where the WebGL should be rendered. This makes things somewhat harder to debug, because now there's no obvious place to start from.

So my new plan is to debug the same piece of code that I debugged yesterday on the working version. Let's see if anything has changed.
(gdb) info break
Num     Type           Disp Enb Address            What
2       breakpoint     keep y   0x0000007ff29b0cfc in mozilla::layers::
    ShareableCanvasRenderer::UpdateCompositableClient() 
                                                   at gfx/layers/
    ShareableCanvasRenderer.cpp:191
        breakpoint already hit 1 time
(gdb) c
[...]

Thread 8 "GeckoWorkerThre" hit Breakpoint 2, mozilla::layers::
    ShareableCanvasRenderer::UpdateCompositableClient (this=0x7fc963c520)
    at gfx/layers/ShareableCanvasRenderer.cpp:192
192         FirePreTransactionCallback();
(gdb) n
195         auto tc = fnGetExistingTc();
(gdb) n
196         if (!tc) {
(gdb) p tc
$1 = {mRawPtr = 0x0}
(gdb) n
198           tc = fnMakeTcFromSnapshot();
(gdb) n
200         if (tc != mFrontBufferFromDesc) {
(gdb) p tc
$2 = {mRawPtr = 0x7fc8ceb370}
(gdb) p tc.mRawPtr
$3 = (mozilla::layers::TextureClient *) 0x7fc8ceb370
(gdb) 
This matches the flow in the working version, so it seems this isn't where the problem is. I'm going to have to look further afield.
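
The flow traced in that gdb session (try for an existing texture client, then fall back to making one from a snapshot if that comes back null) can be sketched in isolation like this. The TextureClient struct and pickTextureClient() are hypothetical stand-ins using std::shared_ptr; only the two callback names echo the ones visible in the debugger output, and the real code in ShareableCanvasRenderer.cpp does more than this.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for the real texture client class.
struct TextureClient {
  int id;
};

using TcPtr = std::shared_ptr<TextureClient>;

// Try the existing client first; only fall back to the snapshot path if
// there isn't one, mirroring the null check seen in the gdb session.
inline TcPtr pickTextureClient(
    const std::function<TcPtr()>& fnGetExistingTc,
    const std::function<TcPtr()>& fnMakeTcFromSnapshot) {
  TcPtr tc = fnGetExistingTc();
  if (!tc) {
    tc = fnMakeTcFromSnapshot();
  }
  return tc;
}

// Small usage check: with no existing client the snapshot one is used.
inline void exampleFallback() {
  auto none = []() -> TcPtr { return nullptr; };
  auto snapshot = []() -> TcPtr {
    return std::make_shared<TextureClient>(TextureClient{2});
  };
  TcPtr tc = pickTextureClient(none, snapshot);
  assert(tc && tc->id == 2);
}
```

In the session above the first call returned a null pointer and the second a valid one, which corresponds to the fallback branch being taken here.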

To help with this search I've attached breakpoints to the majority of the new functions that have been added or seen significant changes in the latest commit. Here they all are (there are quite a few):
(gdb) break GLScreenBuffer::Create
Breakpoint 4 at 0x7ff28a7d94: file gfx/gl/GLScreenBuffer.cpp, line 171.
(gdb) break InitOffscreen
Breakpoint 5 at 0x7ff28d2500: file gfx/gl/GLContext.cpp, line 2345.
(gdb) break GLContext::CreateScreenBuffer
Breakpoint 6 at 0x7ff28d2428: file gfx/gl/GLContext.cpp, line 2073.
(gdb) b WaylandGLSurface::WaylandGLSurface
Breakpoint 7 at 0x7ff28c084c: file gfx/gl/GLContextProviderEGL.cpp, line 954.
(gdb) b GLContextProviderEGL::CreateOffscreen
Breakpoint 8 at 0x7ff28d2610: file gfx/gl/GLContextProviderEGL.cpp, line 1451.
(gdb) b ReadBuffer::Create
Breakpoint 9 at 0x7ff28a6ea8: file gfx/gl/GLScreenBuffer.cpp, line 358.
(gdb) b SurfaceFactory::SurfaceFactory
Breakpoint 10 at 0x7ff28acdc4: file gfx/gl/SharedSurface.cpp, line 167.
(gdb) b SharedSurface_EGLImage::SharedSurface_EGLImage
Breakpoint 11 at 0x7ff28d363c: file gfx/gl/SharedSurfaceEGL.cpp, line 95.
(gdb) b TextureImageEGL::TextureImageEGL
Breakpoint 12 at 0x7ff28d3e80: file gfx/gl/TextureImageEGL.cpp, line 46.
(gdb) r
[...]
If any of these breakpoints hit, that means they'd be good candidates for comparing against the working version. If they're new (rather than just heavily amended) methods then that'll be even more relevant, because that'll indicate a wholesale change of flow. In that case I'll need to work backwards through the call stack to see where, and why, the divergence happened.

Contrariwise if they're not hit then they're not part of the execution flow and it should be safe for me to ignore them in my investigation.

When I now debug the program there are three breakpoints that hit; or rather two breakpoints are hit a total of three times:
Thread 8 "GeckoWorkerThre" hit Breakpoint 10, mozilla::gl::
    SurfaceFactory::SurfaceFactory (this=0x7fc95eb220, partialDesc=..., 
    allocator=..., 
    flags=@0x7fdf29256c: mozilla::layers::TextureFlags::NO_FLAGS)
    at gfx/gl/SharedSurface.cpp:167
167     SurfaceFactory::SurfaceFactory(const PartialSharedSurfaceDesc& 
    partialDesc,

Thread 37 "Compositor" hit Breakpoint 12, mozilla::gl::
    TextureImageEGL::TextureImageEGL (this=0x7ed81ab2d0, aTexture=20, 
    aSize=..., aWrapMode=33071, 
    aContentType=gfxContentType::COLOR_ALPHA, aContext=0x7ed81a2780, 
    aFlags=mozilla::gl::TextureImage::OriginBottomLeft, 
    aTextureState=mozilla::gl::TextureImage::Created, aImageFormat=mozilla::gfx:
    :SurfaceFormat::B8G8R8A8)
    at gfx/gl/TextureImageEGL.cpp:46
46      TextureImageEGL::TextureImageEGL(GLuint aTexture, const gfx::IntSize& 
    aSize,

Thread 37 "Compositor" hit Breakpoint 12, mozilla::gl::
    TextureImageEGL::TextureImageEGL (this=0x7ed825ed80, aTexture=21, 
    aSize=..., aWrapMode=33071, 
    aContentType=gfxContentType::COLOR_ALPHA, aContext=0x7ed81a2780, 
    aFlags=mozilla::gl::TextureImage::OriginBottomLeft, 
    aTextureState=mozilla::gl::TextureImage::Created, aImageFormat=mozilla::gfx:
    :SurfaceFormat::B8G8R8A8)
    at gfx/gl/TextureImageEGL.cpp:46
46      TextureImageEGL::TextureImageEGL(GLuint aTexture, const gfx::IntSize& 
    aSize,
Let's get some backtraces from those. These are really long backtraces and I do apologise for that. I want to keep copies here for future reference, but there's no need to look at them in any detail. Certainly not right now anyway. Here's the first one:
Thread 8 "GeckoWorkerThre" hit Breakpoint 10, mozilla::gl::
    SurfaceFactory::SurfaceFactory (this=0x7fc95e9c10, partialDesc=..., 
    allocator=...,
    flags=@0x7fdf29256c: mozilla::layers::TextureFlags::NO_FLAGS)
    at gfx/gl/SharedSurface.cpp:167
167     SurfaceFactory::SurfaceFactory(const PartialSharedSurfaceDesc& 
    partialDesc,
(gdb) bt
#0  mozilla::gl::SurfaceFactory::SurfaceFactory (this=0x7fc95e9c10, 
    partialDesc=..., allocator=...,
    flags=@0x7fdf29256c: mozilla::layers::TextureFlags::NO_FLAGS)
    at gfx/gl/SharedSurface.cpp:167
#1  0x0000007ff28d3950 in mozilla::gl::SurfaceFactory_Basic::
    SurfaceFactory_Basic (this=0x7fc95e9c10, gl=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:113
#2  0x0000007ff369d0d4 in mozilla::MakeUnique<mozilla::gl::
    SurfaceFactory_Basic, mozilla::gl::GLContext&> ()
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#3  mozilla::WebGLContext::Present (this=this@entry=0x7fc93ab2a0, 
    xrFb=<optimized out>,
    consumerType=consumerType@entry=mozilla::layers::TextureType::Unknown, 
    webvr=webvr@entry=false)
    at dom/canvas/WebGLContext.cpp:929  
#4  0x0000007ff366511c in mozilla::HostWebGLContext::Present (webvr=false, 
    t=mozilla::layers::TextureType::Unknown, xrFb=<optimized out>,
    this=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    mozilla/RefPtr.h:280
#5  mozilla::ClientWebGLContext::Run<void (mozilla::HostWebGLContext::*)(
    unsigned long, mozilla::layers::TextureType, bool) const, &(mozilla::
    HostWebGLContext::Present(unsigned long, mozilla::layers::TextureType, 
    bool) const), unsigned long, mozilla::layers::TextureType const&, bool 
    const&> (
    this=<optimized out>, args#0=@0x7fdf2926c0: 0, args#1=@0x7fdf2926bf: 
    mozilla::layers::TextureType::Unknown, args#2=@0x7fdf2926be: false)
    at dom/canvas/ClientWebGLContext.cpp:313
#6  0x0000007ff3665284 in mozilla::ClientWebGLContext::Present (
    this=this@entry=0x7f28004210, xrFb=xrFb@entry=0x0, type=<optimized out>,
    webvr=<optimized out>, webvr@entry=false)
    at dom/canvas/ClientWebGLContext.cpp:363
#7  0x0000007ff3690a94 in mozilla::ClientWebGLContext::OnBeforePaintTransaction 
    (this=0x7f28004210)
    at dom/canvas/ClientWebGLContext.cpp:345
#8  0x0000007ff28fff30 in mozilla::layers::CanvasRenderer::
    FirePreTransactionCallback (this=this@entry=0x7fc93fb900)
    at gfx/layers/CanvasRenderer.cpp:75 
#9  0x0000007ff29b0d04 in mozilla::layers::ShareableCanvasRenderer::
    UpdateCompositableClient (this=0x7fc93fb900)
    at gfx/layers/ShareableCanvasRenderer.cpp:192
#10 0x0000007ff29f08a0 in mozilla::layers::ClientCanvasLayer::RenderLayer (
    this=0x7fc95fc380)
    at gfx/layers/client/ClientCanvasLayer.cpp:25
#11 0x0000007ff29ef9c0 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#12 0x0000007ff29ffd08 in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc92fc450)
    at gfx/layers/Layers.h:1051
#13 0x0000007ff29ef9c0 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#14 0x0000007ff29ffd08 in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc934a230)
    at gfx/layers/Layers.h:1051
#15 0x0000007ff29ef9c0 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#16 0x0000007ff29ffd08 in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc8d123e0)
    at gfx/layers/Layers.h:1051
#17 0x0000007ff2a069ec in mozilla::layers::ClientLayerManager::
    EndTransactionInternal (this=this@entry=0x7fc8a5ea90, 
    aCallback=aCallback@entry=
    0x7ff46a31ec <mozilla::FrameLayerBuilder::DrawPaintedLayer(mozilla::layers::
    PaintedLayer*, gfxContext*, mozilla::gfx::IntRegionTyped<mozilla::gfx::
    UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::
    UnknownUnits> const&, mozilla::layers::DrawRegionClip, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, void*)>, 
    aCallbackData=aCallbackData@entry=0x7fdf293268)
    at gfx/layers/client/ClientLayerManager.cpp:341
#18 0x0000007ff2a118ec in mozilla::layers::ClientLayerManager::EndTransaction (
    this=0x7fc8a5ea90,
    aCallback=0x7ff46a31ec <mozilla::FrameLayerBuilder::DrawPaintedLayer(
    mozilla::layers::PaintedLayer*, gfxContext*, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::
    DrawRegionClip, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> 
    const&, void*)>, aCallbackData=0x7fdf293268, aFlags=mozilla::layers::
    LayerManager::END_DEFAULT)
    at gfx/layers/client/ClientLayerManager.cpp:397
#19 0x0000007ff46a060c in nsDisplayList::PaintRoot (
    this=this@entry=0x7fdf295078, aBuilder=aBuilder@entry=0x7fdf293268, 
    aCtx=aCtx@entry=0x0,
    aFlags=aFlags@entry=13, aDisplayListBuildTime=...)
    at layout/painting/nsDisplayList.cpp:2622
#20 0x0000007ff442c968 in nsLayoutUtils::PaintFrame (
    aRenderingContext=aRenderingContext@entry=0x0, 
    aFrame=aFrame@entry=0x7fc9280d10, aDirtyRegion=...,
    aBackstop=aBackstop@entry=4294967295, 
    aBuilderMode=aBuilderMode@entry=nsDisplayListBuilderMode::Painting,
    aFlags=aFlags@entry=(nsLayoutUtils::PaintFrameFlags::WidgetLayers | 
    nsLayoutUtils::PaintFrameFlags::ExistingTransaction | nsLayoutUtils::
    PaintFrameFlags::NoComposite)) at ${PROJECT}/obj-build-mer-qt-xr/dist/
    include/mozilla/MaybeStorageBase.h:80
#21 0x0000007ff43b705c in mozilla::PresShell::Paint (
    this=this@entry=0x7fc921c9a0, aViewToPaint=aViewToPaint@entry=0x7fc8563cb0, 
    aDirtyRegion=...,
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at layout/base/PresShell.cpp:6400
#22 0x0000007ff41eef2c in nsViewManager::ProcessPendingUpdatesPaint (
    this=this@entry=0x7fc8563c70, aWidget=aWidget@entry=0x7fc90d0760)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/RectAbsolute.h:43
[...]
#55 0x0000007fefbab89c in ?? () from /lib64/libc.so.6
(gdb)
Here's the second one:
Thread 37 "Compositor" hit Breakpoint 12, mozilla::gl::
    TextureImageEGL::TextureImageEGL (this=0x7ee01faa90, aTexture=21, 
    aSize=..., aWrapMode=33071,
    aContentType=gfxContentType::COLOR_ALPHA, aContext=0x7ee01a28a0, 
    aFlags=mozilla::gl::TextureImage::OriginBottomLeft,
    aTextureState=mozilla::gl::TextureImage::Created, aImageFormat=mozilla::gfx:
    :SurfaceFormat::B8G8R8A8)
    at gfx/gl/TextureImageEGL.cpp:46
46      TextureImageEGL::TextureImageEGL(GLuint aTexture, const gfx::IntSize& 
    aSize,
(gdb) bt
#0  mozilla::gl::TextureImageEGL::TextureImageEGL (this=0x7ee01faa90, 
    aTexture=21, aSize=..., aWrapMode=33071, aContentType=gfxContentType::
    COLOR_ALPHA,
    aContext=0x7ee01a28a0, aFlags=mozilla::gl::TextureImage::OriginBottomLeft, 
    aTextureState=mozilla::gl::TextureImage::Created,
    aImageFormat=mozilla::gfx::SurfaceFormat::B8G8R8A8)
    at gfx/gl/TextureImageEGL.cpp:46
#1  0x0000007ff28d42e0 in mozilla::gl::TileGenFuncEGL (
    gl=gl@entry=0x7ee01a28a0, aSize=..., 
    aContentType=aContentType@entry=gfxContentType::COLOR_ALPHA,
    aFlags=aFlags@entry=mozilla::gl::TextureImage::OriginBottomLeft, 
    aImageFormat=aImageFormat@entry=mozilla::gfx::SurfaceFormat::B8G8R8A8)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#2  0x0000007ff28b7ec8 in mozilla::gl::TileGenFunc (aImageFormat=mozilla::gfx::
    SurfaceFormat::B8G8R8A8,
    aFlags=mozilla::gl::TextureImage::OriginBottomLeft, 
    aContentType=gfxContentType::COLOR_ALPHA, aSize=..., gl=0x7ee01a28a0)
    at gfx/gl/GLTextureImage.cpp:52
#3  mozilla::gl::TiledTextureImage::Resize (this=this@entry=0x7ee01d7660, 
    aSize=...)
    at gfx/gl/GLTextureImage.cpp:399
#4  0x0000007ff28b81cc in mozilla::gl::TiledTextureImage::TiledTextureImage (
    this=0x7ee01d7660, aGL=0x7ee01a28a0, aSize=...,
    aContentType=<optimized out>, aFlags=<optimized out>, 
    aImageFormat=<optimized out>)
    at gfx/gl/GLTextureImage.cpp:221
#5  0x0000007ff28d41f8 in mozilla::gl::CreateTextureImageEGL (
    gl=gl@entry=0x7ee01a28a0, aSize=...,
    aContentType=aContentType@entry=gfxContentType::COLOR_ALPHA, 
    aWrapMode=aWrapMode@entry=33071,
    aFlags=aFlags@entry=mozilla::gl::TextureImage::OriginBottomLeft, 
    aImageFormat=aImageFormat@entry=mozilla::gfx::SurfaceFormat::B8G8R8A8)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#6  0x0000007ff28b8350 in mozilla::gl::CreateTextureImage (
    gl=gl@entry=0x7ee01a28a0, aSize=...,
    aContentType=aContentType@entry=gfxContentType::COLOR_ALPHA, 
    aWrapMode=aWrapMode@entry=33071,
    aFlags=aFlags@entry=mozilla::gl::TextureImage::OriginBottomLeft, 
    aImageFormat=<optimized out>)
    at gfx/gl/GLTextureImage.cpp:30
#7  0x0000007ff294ec88 in mozilla::layers::TextureImageTextureSourceOGL::Update 
    (this=0x7ee01c70f0, aSurface=0x7ee019b290, aDestRegion=0x0,
    aSrcOffset=0x0, aDstOffset=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/
    include/gfx2DGlue.h:70
#8  0x0000007ff2a43ea8 in mozilla::layers::BufferTextureHost::Upload (
    this=this@entry=0x7ee01bb470, aRegion=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#9  0x0000007ff2a444e0 in mozilla::layers::BufferTextureHost::MaybeUpload (
    this=this@entry=0x7ee01bb470, aRegion=<optimized out>)
    at gfx/layers/composite/TextureHost.cpp:1046
#10 0x0000007ff2a44808 in mozilla::layers::BufferTextureHost::UploadIfNeeded (
    this=this@entry=0x7ee01bb470)
    at gfx/layers/composite/TextureHost.cpp:1031
#11 0x0000007ff2a44824 in mozilla::layers::BufferTextureHost::Lock (
    this=0x7ee01bb470)
    at gfx/layers/composite/TextureHost.cpp:650
#12 0x0000007ff2a35d8c in mozilla::layers::ImageHost::Lock (this=0x7ee01b7c60)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#13 0x0000007ff2a3621c in mozilla::layers::AutoLockCompositableHost::
    AutoLockCompositableHost (aHost=0x7ee01b7c60, this=0x7f364e8ca0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#14 mozilla::layers::ImageHost::Composite (this=this@entry=0x7ee01b7c60, 
    aCompositor=aCompositor@entry=0x7ee0002ed0, 
    aLayer=aLayer@entry=0x7ee0265370,
    aEffectChain=..., aOpacity=1, aTransform=..., aSamplingFilter=<optimized 
    out>, aClipRect=..., aVisibleRegion=aVisibleRegion@entry=0x0, aGeometry=...)
    at gfx/layers/composite/ImageHost.cpp:197
#15 0x0000007ff2a26d3c in mozilla::layers::CanvasLayerComposite::<lambda(
    mozilla::layers::EffectChain&, const IntRect&)>::operator() (clipRect=...,
    effectChain=..., __closure=<synthetic pointer>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/MaybeStorageBase.h:50
#16 mozilla::layers::RenderWithAllMasks<mozilla::layers::CanvasLayerComposite::
    RenderLayer(const IntRect&, const mozilla::Maybe<mozilla::gfx::
    PolygonTyped<mozilla::gfx::UnknownUnits> >&)::<lambda(mozilla::layers::
    EffectChain&, const IntRect&)> >(mozilla::layers::Layer *, mozilla::layers::
    Compositor *, const mozilla::gfx::IntRect &, mozilla::layers::
    CanvasLayerComposite::<lambda(mozilla::layers::EffectChain&, const 
    IntRect&)>) (aLayer=aLayer@entry=
    0x7ee0264f60, aCompositor=<optimized out>, aClipRect=..., 
    aRenderCallback=aRenderCallback@entry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/
    LayerManagerCompositeUtils.h:69
#17 0x0000007ff2a27090 in mozilla::layers::CanvasLayerComposite::RenderLayer (
    this=0x7ee0264f60, aClipRect=..., aGeometry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:289
#18 0x0000007ff2a32f88 in mozilla::layers::RenderLayers<mozilla::layers::
    ContainerLayerComposite> (aContainer=aContainer@entry=0x7ee025d580,
    aManager=aManager@entry=0x7ee01a43a0, aClipRect=..., aGeometry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/Maybe.h:443
#19 0x0000007ff2a33e78 in mozilla::layers::ContainerRender<mozilla::layers::
    ContainerLayerComposite> (aContainer=0x7ee025d580, aManager=0x7ee01a43a0,
    aClipRect=..., aGeometry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/BaseRect.h:53
#20 0x0000007ff2a33fc0 in mozilla::layers::ContainerLayerComposite::RenderLayer 
    (this=<optimized out>, aClipRect=..., aGeometry=...)
    at gfx/layers/composite/ContainerLayerComposite.cpp:745
#21 0x0000007ff2a32f88 in mozilla::layers::RenderLayers<mozilla::layers::
    ContainerLayerComposite> (aContainer=aContainer@entry=0x7ee01d0140,
    aManager=aManager@entry=0x7ee01a43a0, aClipRect=..., aGeometry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/Maybe.h:443
#22 0x0000007ff2a33e78 in mozilla::layers::ContainerRender<mozilla::layers::
    ContainerLayerComposite> (aContainer=0x7ee01d0140, aManager=0x7ee01a43a0,
    aClipRect=..., aGeometry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/BaseRect.h:53
#23 0x0000007ff2a33fc0 in mozilla::layers::ContainerLayerComposite::RenderLayer 
    (this=<optimized out>, aClipRect=..., aGeometry=...)
    at gfx/layers/composite/ContainerLayerComposite.cpp:745
#24 0x0000007ff2a32f88 in mozilla::layers::RenderLayers<mozilla::layers::
    ContainerLayerComposite> (aContainer=aContainer@entry=0x7ee01b0d00,
    aManager=aManager@entry=0x7ee01a43a0, aClipRect=..., aGeometry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/Maybe.h:443
#25 0x0000007ff2a33e78 in mozilla::layers::ContainerRender<mozilla::layers::
    ContainerLayerComposite> (aContainer=0x7ee01b0d00, aManager=0x7ee01a43a0,
    aClipRect=..., aGeometry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/BaseRect.h:53
#26 0x0000007ff2a33fc0 in mozilla::layers::ContainerLayerComposite::RenderLayer 
    (this=<optimized out>, aClipRect=..., aGeometry=...)
    at gfx/layers/composite/ContainerLayerComposite.cpp:745
#27 0x0000007ff2a1bc84 in mozilla::layers::LayerManagerComposite::<lambda(const 
    IntRect&)>::operator()(const mozilla::gfx::IntRect &) const (
    __closure=__closure@entry=0x7f364e98c8, aClipRect=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/MaybeStorageBase.h:50
#28 0x0000007ff2a30e68 in mozilla::layers::LayerManagerComposite::Render (
    this=this@entry=0x7ee01a43a0, aInvalidRegion=..., aOpaqueRegion=...)
    at gfx/layers/composite/LayerManagerComposite.cpp:1237
#29 0x0000007ff2a3148c in mozilla::layers::LayerManagerComposite::
    UpdateAndRender (this=this@entry=0x7ee01a43a0)
    at gfx/layers/composite/LayerManagerComposite.cpp:657
#30 0x0000007ff2a3183c in mozilla::layers::LayerManagerComposite::
    EndTransaction (this=this@entry=0x7ee01a43a0, aTimeStamp=...,
    aFlags=aFlags@entry=mozilla::layers::LayerManager::END_DEFAULT)
    at gfx/layers/composite/LayerManagerComposite.cpp:572
#31 0x0000007ff2a72fbc in mozilla::layers::CompositorBridgeParent::
    CompositeToTarget (this=0x7fc89b9920, aId=..., aTarget=0x0, 
    aRect=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#32 0x0000007ff4e07e38 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7fc89b9920, aId=...)
    at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:160
#33 0x0000007ff2a58718 in mozilla::layers::CompositorVsyncScheduler::Composite (
    this=0x7fc8bd6dd0, aVsyncEvent=...)
    at gfx/layers/ipc/CompositorVsyncScheduler.cpp:256
#34 0x0000007ff2a50b98 in mozilla::detail::RunnableMethodArguments<mozilla::
    VsyncEvent>::applyImpl<mozilla::layers::CompositorVsyncScheduler, void (
    mozilla::layers::CompositorVsyncScheduler::*)(mozilla::VsyncEvent const&), 
    StoreCopyPassByConstLRef<mozilla::VsyncEvent>, 0ul> (args=..., m=<optimized 
    out>,
    o=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    nsThreadUtils.h:887
[...]
#46 0x0000007fefbab89c in ?? () from /lib64/libc.so.6
(gdb)                              
To prevent this becoming tiresome I'm going to skip the last backtrace, since it relates to the same TextureImageEGL::TextureImageEGL() call we've just seen.

That feels like plenty to be getting on with. Tomorrow I'll need to compare these backtraces with the working ESR 91 code to see whether it's possible to get to the same place or not and, if it is, what might have changed.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
7 Jun 2024 : Day 256 #
It's the big one! A full 2^8 days of development have gone into this now, which seems like an absurd amount of effort.
 
[Image: 2^8 in the centre of a bright coloured flash]

Unfortunately, while numerically this is very exciting, the actual work I'm doing right now isn't, so there's no big reveal to impress you with. Instead I'm going to continue hacking away at the WebGL bug I discovered a couple of days back.

To elaborate, I'm currently trying to find out why the WebView rendering fix has caused WebGL rendering to fail. Both are types of offscreen rendering, so it's not surprising that one has affected the other, but it's important that both of them are working correctly.

Over the last couple of days I discovered that the problem definitely exists in the latest commit added to the code. I checked that by rolling the repository back one commit, rebuilding and checking that the problem doesn't happen with the slightly older version.

Now I need to find out what has changed in the flow of the code to make the problem appear.

From the earlier backtraces we know that the problem involves a call to SharedSurface_Basic::ToSurfaceDescriptor(), which itself is called from WebGLContext::GetFrontBuffer(). Stepping through this method in the working build I can see that no crash occurs there, and execution continues into ShareableCanvasRenderer::UpdateCompositableClient(). The code being executed there looks like this:
    // First, let's see if we can get a no-copy TextureClient from the canvas.
    auto tc = fnGetExistingTc();
    if (!tc) {
      // Otherwise, snapshot the surface and copy into a TexClient.
      tc = fnMakeTcFromSnapshot();
    }
    if (tc != mFrontBufferFromDesc) {
      mFrontBufferFromDesc = nullptr;
    }
Both fnGetExistingTc() and fnMakeTcFromSnapshot() are lambda functions defined inside the method. The first of these is where the call to SharedSurface_Basic::ToSurfaceDescriptor() occurs. It returns null because SharedSurface_Basic::ToSurfaceDescriptor() always returns Nothing().

However, the following call to fnMakeTcFromSnapshot() is returning a value, as we can see in the following debug steps:
(gdb) n
32        return Nothing();
(gdb) n
50      ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/MaybeStorageBase.h: 
    No such file or directory.
(gdb) n
mozilla::ClientWebGLContext::GetFrontBuffer (this=this@entry=0x7fc8b4a4b0, 
    fb=fb@entry=0x0, vr=<optimized out>, vr@entry=false)
    at dom/canvas/ClientWebGLContext.cpp:368
368       const auto notLost = mNotLost;
(gdb) n
mozilla::layers::ShareableCanvasRenderer::<lambda()>::operator() (
    __closure=<synthetic pointer>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/Maybe.h:443
443     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/Maybe.h: No such 
    file or directory.
(gdb) 
149         if (!desc) return nullptr;
(gdb) n
148         const auto desc = webgl->GetFrontBuffer(nullptr);
(gdb) n
mozilla::layers::ShareableCanvasRenderer::UpdateCompositableClient (
    this=0x7fc98a98e0)
    at gfx/layers/ShareableCanvasRenderer.cpp:196
196         if (!tc) {
(gdb) p tc
$8 = {mRawPtr = 0x0}
(gdb) n
198           tc = fnMakeTcFromSnapshot();
(gdb) n
200         if (tc != mFrontBufferFromDesc) {
(gdb) p tc
$9 = {mRawPtr = 0x7fc93bc9a0}
(gdb) 
This will need comparing against what happens in our newer build where the crash occurs. Thinking back, I'm now a little concerned that the sole reason for the crash is this line that I added to SharedSurface_Basic::ToSurfaceDescriptor():
Maybe<layers::SurfaceDescriptor> SharedSurface_Basic::ToSurfaceDescriptor() {
  MOZ_CRASH("GFX: ToSurfaceDescriptor");
  return Nothing();
}
Certainly this will cause a crash, but I thought I'd also tested it without this. Now I'm not so sure...
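
To make the reasoning concrete, here's a minimal, self-contained sketch of the fallback pattern seen in UpdateCompositableClient(). It isn't the Gecko code: Descriptor, Texture, toDescriptor(), getExisting(), makeFromSnapshot() and pickTexture() are all hypothetical stand-ins, with std::optional playing the role of Maybe<>. It's only meant to show that an empty descriptor routes execution to the snapshot path, which is why an abort placed inside the descriptor method would stop the fallback from ever being reached.

```cpp
#include <memory>
#include <optional>
#include <string>

// Hypothetical stand-ins; none of these are Gecko types.
struct Descriptor {};
struct Texture {
  std::string origin;
};

// Mirrors the shape of a ToSurfaceDescriptor() that always returns Nothing().
// If this aborted instead, nothing below the call would ever run.
std::optional<Descriptor> toDescriptor() {
  return std::nullopt;
}

// No-copy path: only yields a texture when a descriptor is available.
std::shared_ptr<Texture> getExisting() {
  const auto desc = toDescriptor();
  if (!desc) return nullptr;
  return std::make_shared<Texture>(Texture{"descriptor"});
}

// Fallback path: build the texture from a snapshot copy.
std::shared_ptr<Texture> makeFromSnapshot() {
  return std::make_shared<Texture>(Texture{"snapshot"});
}

// Try the no-copy path first, otherwise fall back to the snapshot.
std::shared_ptr<Texture> pickTexture() {
  auto tc = getExisting();
  if (!tc) {
    tc = makeFromSnapshot();
  }
  return tc;
}
```

In this sketch pickTexture() always ends up on the snapshot path, matching the null-then-non-null values of tc seen in the debugger steps above.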

Sadly I didn't keep copies of the newer packages to install back again, but I do have a copy of the libxul.so library from back then. I'm not sure if I'll be able to debug using it, but it's worth a try. If it turns out not to be debuggable I'll just have to do another complete rebuild (although, this time, I'll keep a copy of the current packages so I can reinstall them if I need to do another comparison!).

Unfortunately I don't get any joy testing the library:
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation 
    fault.
0x0000007fe5ee13a8 in ?? ()
(gdb) bt
#0  0x0000007fe5ee13a8 in ?? ()
#1  0x0000007fdf293e08 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) 
I'm going to have to do a rebuild. This means restoring the original branch, then performing the build to create the full set of RPM packages.
$ cd gecko-dev
$ git checkout -b temp
$ git checkout FIREFOX_ESR_91_9_X_RELBRANCH_patches
$ git log --oneline -5
7437a9d17284 (HEAD -> FIREFOX_ESR_91_9_X_RELBRANCH_patches) Restore 
    GLScreenBuffer and TextureImageEGL
d3ba4df29a32 (temp) Restore NotifyDidPaint event and timers
f55057391ac0 Prevent errors from DownloadPrompter
eab04b8c0d80 Enable dconf
c6ea49286566 (origin/FIREFOX_ESR_91_9_X_RELBRANCH_patches) Disable SessionStore 
    functionality
$ cd ..
Before performing the build I must remove the code that's guaranteed to cause a crash:
Maybe<layers::SurfaceDescriptor> SharedSurface_Basic::ToSurfaceDescriptor() {
  return Nothing();
}
Now to build:
$ sfdk build -d --with git_workaround
[...]
The build won't be ready until the morning at the earliest. So I'm going to pause there and come back to this tomorrow.

6 Jun 2024 : Day 255 #
It's day 0b011111111 today, or to put it another way, day (2^8 - 1). That means tomorrow is the big one. I'm certainly hoping I won't need until day 2^9 before ESR 91 is released, which hopefully means this will be the last big one, numerically speaking, for this project.

A couple of months back Adam Pigg (piggz) claimed he suspected me of holding out on a solution:
 
[M]y theory is that its all working just fine, and he's just dragging it out to the big reveal on day 2^8 :)

The truth was that at that stage I wasn't at all convinced I'd be able to get the WebView working in time. Thankfully it is now working, in the nick of time as it turns out, but nevertheless the task isn't quite complete. Even once I've finalised this WebView patch, there'll still be more work to do in areas including video rendering, WebRTC videoconferencing, patch refactoring and a bunch of smaller glitches to iron out. So I'm sorry to say there's still no release on the horizon just yet. But as I hope is clear by now, I'm playing the long game. Not only am I committed to getting it finished, but I'm also doing my best to help ensure the process is as streamlined as possible for the future too. Hopefully, when it comes to the next release, things will be easier.

Before I get back to coding, I also need to give advance warning that I'll not be posting entries next week. Next week is Hackweek at work, which means a week-long intensive coding session with my colleagues. There's a good chance that this won't leave much in the way of free time for me to be working on Gecko. That'll be from Monday 10th June to Friday 14th June. I'll start up right back where I leave things off on the Saturday though.

Alright, now back to coding. Yesterday you'll recall I discovered a problem with WebGL rendering. I know this was working back in February because I demoed it at FOSDEM, but some change I've made between then and now has broken it.

Yesterday I recorded a couple of backtraces around the crash. My suspicion is that the problem relates to the recent changes to offscreen rendering.

To test this theory out I've created a new branch and rolled the project back a single commit to before I started making the WebView changes. The wonders of version control! During the day today I set it building a completely fresh set of RPM packages based on this slightly older version of the code.
$ cd gecko-dev
$ git checkout -b temp
$ git log FIREFOX_ESR_91_9_X_RELBRANCH_patches_temp --oneline -5
eb40ffd47432 (FIREFOX_ESR_91_9_X_RELBRANCH_patches_temp) Restore GLScreenBuffer 
    and TextureImageEGL
d3ba4df29a32 (HEAD -> temp) Restore NotifyDidPaint event and timers
f55057391ac0 Prevent errors from DownloadPrompter
eab04b8c0d80 Enable dconf
c6ea49286566 (origin/FIREFOX_ESR_91_9_X_RELBRANCH_patches) Disable SessionStore 
    functionality
$ git reset --hard d3ba4df29a32d53c38c68e4512d1fa82073ecdf4
$ git log --oneline -4
d3ba4df29a32 (HEAD -> temp) Restore NotifyDidPaint event and timers
f55057391ac0 Prevent errors from DownloadPrompter
eab04b8c0d80 Enable dconf
c6ea49286566 (origin/FIREFOX_ESR_91_9_X_RELBRANCH_patches) Disable SessionStore 
    functionality
$ cd ..
$ sfdk build -d --with git_workaround
[...]
Testing these new packages this evening I find that WebGL is indeed working with this one-commit-older version. That narrows down the problem to somewhere in the most recent commit eb40ffd47432.

That's a big help. With the two backtraces captured yesterday my plan is to compare the execution flow with the working version to see how they differ. Here's what I believe to be the equivalent backtrace:
Thread 10 "GeckoWorkerThre" hit Breakpoint 2, mozilla::gl::
    SharedSurface_Basic::SharedSurface_Basic (this=0x7f81347dc0, 
    gl=0x7f815d82c0, size=...,
    hasAlpha=true, tex=1, ownsTex=true) at gfx/gl/SharedSurfaceGL.cpp:54
54      SharedSurface_Basic::SharedSurface_Basic(GLContext* gl, const IntSize& 
    size,
This leads us to the second backtrace, for the ToSurfaceDescriptor conversion method:
Thread 8 "GeckoWorkerThre" hit Breakpoint 5, mozilla::gl::
    SharedSurface_Basic::ToSurfaceDescriptor (this=0x7fc8d8c9f0)
    at gfx/gl/SharedSurfaceGL.cpp:31
31      Maybe<layers::SurfaceDescriptor> SharedSurface_Basic::
    ToSurfaceDescriptor() {
(gdb) bt
#0  mozilla::gl::SharedSurface_Basic::ToSurfaceDescriptor (this=0x7fc8d8c9f0)
    at gfx/gl/SharedSurfaceGL.cpp:31
#1  0x0000007ff3694278 in mozilla::WebGLContext::GetFrontBuffer (
    this=this@entry=0x7fc94b8d10, xrFb=<optimized out>, webvr=webvr@entry=false)
    at dom/canvas/WebGLContext.cpp:949
#2  0x0000007ff365c528 in mozilla::HostWebGLContext::GetFrontBuffer (
    this=<optimized out>, xrFb=<optimized out>, webvr=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:280
#3  0x0000007ff365c5d8 in mozilla::ClientWebGLContext::GetFrontBuffer (
    this=this@entry=0x7fc8b4a4b0, fb=fb@entry=0x0, vr=<optimized out>, 
    vr@entry=false)
    at dom/canvas/ClientWebGLContext.cpp:373
#4  0x0000007ff29b2410 in mozilla::layers::ShareableCanvasRenderer::<lambda()>::
    operator() (__closure=<synthetic pointer>)
    at gfx/layers/ShareableCanvasRenderer.cpp:148
#5  mozilla::layers::ShareableCanvasRenderer::UpdateCompositableClient (
    this=0x7fc98a98e0)
    at gfx/layers/ShareableCanvasRenderer.cpp:195
#6  0x0000007ff29f1e10 in mozilla::layers::ClientCanvasLayer::RenderLayer (
    this=0x7fc959bd60)
    at gfx/layers/client/ClientCanvasLayer.cpp:25
#7  0x0000007ff29f0f30 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#8  0x0000007ff2a01054 in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc9798e60)
    at gfx/layers/Layers.h:1051
#9  0x0000007ff29f0f30 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#10 0x0000007ff2a01054 in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc8d810f0)
    at gfx/layers/Layers.h:1051
#11 0x0000007ff29f0f30 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#12 0x0000007ff2a01054 in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc93748a0)
    at gfx/layers/Layers.h:1051
#13 0x0000007ff2a08270 in mozilla::layers::ClientLayerManager::
    EndTransactionInternal (this=this@entry=0x7fc8b18a30, 
    aCallback=aCallback@entry=0x7ff46a44d0 <mozilla::FrameLayerBuilder::
    DrawPaintedLayer(mozilla::layers::PaintedLayer*, gfxContext*, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::
    DrawRegionClip, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> 
    const&, void*)>, aCallbackData=aCallbackData@entry=0x7fdf2dd268)
    at gfx/layers/client/ClientLayerManager.cpp:341
#14 0x0000007ff2a12be4 in mozilla::layers::ClientLayerManager::EndTransaction (
    this=0x7fc8b18a30, 
    aCallback=0x7ff46a44d0 <mozilla::FrameLayerBuilder::DrawPaintedLayer(
    mozilla::layers::PaintedLayer*, gfxContext*, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::
    DrawRegionClip, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> 
    const&, void*)>, aCallbackData=0x7fdf2dd268, aFlags=mozilla::layers::
    LayerManager::END_DEFAULT)
    at gfx/layers/client/ClientLayerManager.cpp:397
#15 0x0000007ff46a18f0 in nsDisplayList::PaintRoot (
    this=this@entry=0x7fdf2df078, aBuilder=aBuilder@entry=0x7fdf2dd268, 
    aCtx=aCtx@entry=0x0, 
    aFlags=aFlags@entry=13, aDisplayListBuildTime=...)
    at layout/painting/nsDisplayList.cpp:2622
#16 0x0000007ff442dc4c in nsLayoutUtils::PaintFrame (
    aRenderingContext=aRenderingContext@entry=0x0, 
    aFrame=aFrame@entry=0x7fc9362940, aDirtyRegion=..., 
    aBackstop=aBackstop@entry=4294967295, 
    aBuilderMode=aBuilderMode@entry=nsDisplayListBuilderMode::Painting, 
    aFlags=aFlags@entry=(nsLayoutUtils::PaintFrameFlags::WidgetLayers | 
    nsLayoutUtils::PaintFrameFlags::ExistingTransaction | nsLayoutUtils::
    PaintFrameFlags::NoComposite)) at ${PROJECT}/obj-build-mer-qt-xr/dist/
    include/mozilla/MaybeStorageBase.h:80
#17 0x0000007ff43b8340 in mozilla::PresShell::Paint (
    this=this@entry=0x7fc92df890, aViewToPaint=aViewToPaint@entry=0x7fc8570b20, 
    aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at layout/base/PresShell.cpp:6400
#18 0x0000007ff41f0210 in nsViewManager::ProcessPendingUpdatesPaint (
    this=this@entry=0x7fc8570ae0, aWidget=aWidget@entry=0x7fc8570ba0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/RectAbsolute.h:43
#19 0x0000007ff41f05c4 in nsViewManager::ProcessPendingUpdatesForView (
    this=this@entry=0x7fc8570ae0, aView=<optimized out>, 
    aFlushDirtyRegion=aFlushDirtyRegion@entry=true)
    at view/nsViewManager.cpp:394
#20 0x0000007ff41f0bb4 in nsViewManager::ProcessPendingUpdates (
    this=this@entry=0x7fc8570ae0)
    at view/nsViewManager.cpp:972
[...]
#51 0x0000007fefbb189c in ?? () from /lib64/libc.so.6
(gdb)
There's actually very little difference between these calls, as we can see if we look at just the first couple of frames of each next to each other:
#0  0x0000007ff28d1ca4 in mozilla::gl::SharedSurface_Basic::ToSurfaceDescriptor 
    (this=<optimized out>)
    at gfx/gl/SharedSurfaceGL.cpp:38
#1  0x0000007ff36920a4 in mozilla::WebGLContext::GetFrontBuffer (
    this=this@entry=0x7fc94889c0, xrFb=<optimized out>, webvr=webvr@entry=false)
    at dom/canvas/WebGLContext.cpp:949
#0  mozilla::gl::SharedSurface_Basic::ToSurfaceDescriptor (this=0x7fc8d8c9f0)
    at gfx/gl/SharedSurfaceGL.cpp:31
#1  0x0000007ff3694278 in mozilla::WebGLContext::GetFrontBuffer (
    this=this@entry=0x7fc94b8d10, xrFb=<optimized out>, webvr=webvr@entry=false)
    at dom/canvas/WebGLContext.cpp:949
The first of these is the broken version, while the second is working. In order to get a deeper understanding, I'm going to want to step through the code between here and the crash.
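
Comparing frames by eye works for a couple of lines, but the return addresses and argument values differ between builds even when the call path is the same. As a rough illustration, the comparison could be done on function names alone. The frameFunction() helper below is hypothetical and assumes the gdb frame layout shown in the backtraces above ("#N", an optional "0x... in " address, then the name, then the argument list); it's a sketch rather than a robust parser.

```cpp
#include <cstddef>
#include <string>

// Extract the function name from the first line of a gdb backtrace frame,
// dropping the frame number, any return address and the argument list.
std::string frameFunction(const std::string& line) {
  std::size_t pos = 0;
  // Skip the "#N" frame marker and the spaces that follow it.
  if (pos < line.size() && line[pos] == '#') {
    pos = line.find(' ', pos);
    if (pos == std::string::npos) return "";
    pos = line.find_first_not_of(' ', pos);
    if (pos == std::string::npos) return "";
  }
  // Skip an optional "0x... in " return address.
  if (line.compare(pos, 2, "0x") == 0) {
    const std::size_t in = line.find(" in ", pos);
    if (in != std::string::npos) pos = in + 4;
  }
  // The name runs up to the " (" that opens the arguments, or to the end
  // of the line when the argument list has wrapped onto the next line.
  std::size_t end = line.find(" (", pos);
  if (end == std::string::npos) end = line.size();
  // Trim any trailing spaces.
  while (end > pos && line[end - 1] == ' ') --end;
  return line.substr(pos, end - pos);
}
```

Applied to frame #0 from each of the two backtraces, both lines reduce to the same function name, so only the source line and addresses differ.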

Unfortunately after the build during the day I'm a bit short on time to delve deeper into this now. But I'll pick this up again tomorrow to try to figure out what the difference is. Once I have that it will hopefully give a much clearer idea about how to fix the problem with my latest changeset. I can then roll back to my original commit, fix it, and... well, let's see.

5 Jun 2024 : Day 254 #
I'm continuing with my attempts to simplify the latest commit by removing unnecessary changes and doing my best to align the changes with the upstream ESR 91 changes.

To do this I've been looking carefully through the code to try to find unused methods that were added in my latest commit. I managed to find a couple. Both GLContext::ResizeOffscreen() and GLContext::OffscreenSize() have equivalents in the GLScreenBuffer class which can be used as drop-in replacements. So those two methods, which were previously added, are now added no more.

The other change I've made is to remove the following member variable from GLContext:
  std::map<GLuint, SharedSurface*> mFBOMapping
With this removed I'm also able to remove the related code from the source file as well as the dependency on the standard map header. I've tested the result and it doesn't seem to have any negative effects on either the browser or the WebView.

A further change I've made today is to simplify how the EGLDisplay is passed around while things are still being initialised. I'd created quite a web of methods to pass it between, each of which needed the variable to be passed in. Here they are, with the aDisplay parameter at the end being the one I'd really like to avoid the need for.
RefPtr<GLLibraryEGL> DefaultEglLibrary(nsACString* const out_failureId, 
    EGLDisplay aDisplay);
inline std::shared_ptr<EglDisplay> DefaultEglDisplay(nsACString* const 
    out_failureId, EGLDisplay aDisplay);
RefPtr<GLLibraryEGL> GLLibraryEGL::Create(nsACString* const out_failureId, 
    EGLDisplay aDisplay);
bool GLLibraryEGL::Init(bool forceAccel, nsACString* const out_failureId, 
    EGLDisplay aDisplay);
Just by rearranging things a little I've been able to remove the aDisplay parameter from all of these. It turns out that most of them were just passing the value on to one of the other methods. Once they were taken away from one, they weren't needed in the others either.
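
The general shape of this kind of clean-up can be sketched as follows. This is an illustration only: Display, currentDisplay() and the three functions in each namespace are hypothetical stand-ins rather than the real Gecko signatures, and the sketch simply assumes the value can be obtained at the one place it's actually used.

```cpp
// Hypothetical stand-in for a display handle; not the real EGLDisplay.
using Display = int;

// Hypothetical source of the display, assumed to be reachable from the
// one function that genuinely needs it.
Display currentDisplay() { return 42; }

// Before: every layer forwards a value it doesn't use itself.
namespace before {
bool init(Display display) { return display != 0; }
bool create(Display display) { return init(display); }
bool defaultLibrary(Display display) { return create(display); }
}  // namespace before

// After: only the innermost function touches the display, so the
// parameter disappears from the whole chain of callers.
namespace after {
bool init() { return currentDisplay() != 0; }
bool create() { return init(); }
bool defaultLibrary() { return create(); }
}  // namespace after
```

Both versions give the same result; the second just has fewer signatures to keep in step with upstream.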

After these changes the patch is still pretty large, but looking considerably better.

However, during my testing I hit another problem. It turns out that somewhere, some of the changes I made (probably related to offscreen rendering) have broken the WebGLContext capabilities. That means that if a web page uses WebGL it'll now trigger a crash. I know for sure that this was working earlier, so something has changed.

You'll have to forgive me for including a lengthy backtrace. I admit it's not very enlightening, but I want to keep a copy here so I have something to refer to. This is the backtrace for the crash:
Thread 8 "GeckoWorkerThre" received signal SIGSEGV, Segmentation 
    fault.
[Switching to LWP 10166]
0x0000007ff28d1ca4 in mozilla::gl::SharedSurface_Basic::ToSurfaceDescriptor (
    this=<optimized out>)
    at gfx/gl/SharedSurfaceGL.cpp:38
38        MOZ_CRASH("GFX: ToSurfaceDescriptor");
(gdb) bt
#0  0x0000007ff28d1ca4 in mozilla::gl::SharedSurface_Basic::ToSurfaceDescriptor 
    (this=<optimized out>)
    at gfx/gl/SharedSurfaceGL.cpp:38
#1  0x0000007ff36920a4 in mozilla::WebGLContext::GetFrontBuffer (
    this=this@entry=0x7fc94889c0, xrFb=<optimized out>, webvr=webvr@entry=false)
    at dom/canvas/WebGLContext.cpp:949
#2  0x0000007ff365a410 in mozilla::HostWebGLContext::GetFrontBuffer (
    this=<optimized out>, xrFb=<optimized out>, webvr=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:280
#3  0x0000007ff365a4c0 in mozilla::ClientWebGLContext::GetFrontBuffer (
    this=this@entry=0x7fc8b48ec0, fb=fb@entry=0x0, vr=<optimized out>, 
    vr@entry=false)
    at dom/canvas/ClientWebGLContext.cpp:373
#4  0x0000007ff29b0084 in mozilla::layers::ShareableCanvasRenderer::<lambda()>::
    operator() (__closure=<synthetic pointer>)
    at gfx/layers/ShareableCanvasRenderer.cpp:148
#5  mozilla::layers::ShareableCanvasRenderer::UpdateCompositableClient (
    this=0x7fc96d7db0)
    at gfx/layers/ShareableCanvasRenderer.cpp:195
#6  0x0000007ff29efa84 in mozilla::layers::ClientCanvasLayer::RenderLayer (
    this=0x7fc978af90)
    at gfx/layers/client/ClientCanvasLayer.cpp:25
#7  0x0000007ff29eeba4 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#8  0x0000007ff29feeec in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc978a680)
    at gfx/layers/Layers.h:1051
#9  0x0000007ff29eeba4 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#10 0x0000007ff29feeec in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc9424e20)
    at gfx/layers/Layers.h:1051
#11 0x0000007ff29eeba4 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#12 0x0000007ff29feeec in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc8dd8d50)
    at gfx/layers/Layers.h:1051
#13 0x0000007ff2a05bd0 in mozilla::layers::ClientLayerManager::
    EndTransactionInternal (this=this@entry=0x7fc8b17440, 
    aCallback=aCallback@entry=
    0x7ff46a23d0 <mozilla::FrameLayerBuilder::DrawPaintedLayer(mozilla::layers::
    PaintedLayer*, gfxContext*, mozilla::gfx::IntRegionTyped<mozilla::gfx::
    UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::
    UnknownUnits> const&, mozilla::layers::DrawRegionClip, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, void*)>, 
    aCallbackData=aCallbackData@entry=0x7fdf293268)
    at gfx/layers/client/ClientLayerManager.cpp:341
#14 0x0000007ff2a10ad0 in mozilla::layers::ClientLayerManager::EndTransaction (
    this=0x7fc8b17440, 
    aCallback=0x7ff46a23d0 <mozilla::FrameLayerBuilder::DrawPaintedLayer(
    mozilla::layers::PaintedLayer*, gfxContext*, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::
    DrawRegionClip, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> 
    const&, void*)>, aCallbackData=0x7fdf293268, aFlags=mozilla::layers::
    LayerManager::END_DEFAULT)
    at gfx/layers/client/ClientLayerManager.cpp:397
#15 0x0000007ff469f7f0 in nsDisplayList::PaintRoot (
    this=this@entry=0x7fdf295078, aBuilder=aBuilder@entry=0x7fdf293268, 
    aCtx=aCtx@entry=0x0, 
    aFlags=aFlags@entry=13, aDisplayListBuildTime=...)
    at layout/painting/nsDisplayList.cpp:2622
#16 0x0000007ff442bb4c in nsLayoutUtils::PaintFrame (
    aRenderingContext=aRenderingContext@entry=0x0, 
    aFrame=aFrame@entry=0x7fc932f550, aDirtyRegion=..., 
    aBackstop=aBackstop@entry=4294967295, 
    aBuilderMode=aBuilderMode@entry=nsDisplayListBuilderMode::Painting, 
    aFlags=aFlags@entry=(nsLayoutUtils::PaintFrameFlags::WidgetLayers | 
    nsLayoutUtils::PaintFrameFlags::ExistingTransaction | nsLayoutUtils::
    PaintFrameFlags::NoComposite)) at ${PROJECT}/obj-build-mer-qt-xr/dist/
    include/mozilla/MaybeStorageBase.h:80
#17 0x0000007ff43b6240 in mozilla::PresShell::Paint (
    this=this@entry=0x7fc92a8a70, aViewToPaint=aViewToPaint@entry=0x7fc9287140, 
    aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at layout/base/PresShell.cpp:6400
#18 0x0000007ff41ee110 in nsViewManager::ProcessPendingUpdatesPaint (
    this=this@entry=0x7fc92870d0, aWidget=aWidget@entry=0x7fc92871c0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/RectAbsolute.h:43
[...]
#51 0x0000007fefba989c in ?? () from /lib64/libc.so.6
(gdb)
Reading through just the first few items in this backtrace, it's clear that the crash is caused by the call to SharedSurface_Basic::ToSurfaceDescriptor(), which, based on the code that's there, intentionally triggers a crash. Given this, it's possible the problem is that the SharedSurface_Basic object being used should have been one of the several alternative surface variants.
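To make the shape of this failure concrete, here's a minimal sketch of the pattern, assuming a base surface class with a virtual ToSurfaceDescriptor() method. The Surface, BasicSurface, ShareableSurface and Descriptor names are hypothetical stand-ins rather than the gecko definitions, and the exception stands in for the deliberate crash.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins; none of these are the gecko definitions.
struct Descriptor {
  bool valid = false;
};

class Surface {
 public:
  virtual ~Surface() = default;
  // Fills in a descriptor that could be handed on to a compositor.
  virtual bool ToSurfaceDescriptor(Descriptor* out) = 0;
};

// Assumption: this variant is never meant to be shared, so asking it for a
// descriptor is treated as a programming error. The exception stands in for
// a deliberate crash.
class BasicSurface : public Surface {
 public:
  bool ToSurfaceDescriptor(Descriptor*) override {
    throw std::logic_error("ToSurfaceDescriptor called on a basic surface");
  }
};

// A variant that can be shared simply fills in the descriptor.
class ShareableSurface : public Surface {
 public:
  bool ToSurfaceDescriptor(Descriptor* out) override {
    out->valid = true;
    return true;
  }
};

// Returns true if requesting a descriptor "crashes" (throws) for the surface.
bool CrashesWhenShared(Surface& surface) {
  Descriptor desc;
  try {
    surface.ToSurfaceDescriptor(&desc);
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}

// Usage example: only the basic variant fails.
void Example() {
  BasicSurface basic;
  ShareableSurface shareable;
  assert(CrashesWhenShared(basic));
  assert(!CrashesWhenShared(shareable));
}
```

If the real code follows this pattern, the fix would be less about the crashing method itself and more about which surface variant gets created in the first place.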

So because I think it could be helpful in getting to the bottom of this, I'm also going to record how and where this surface is being created. Here's the backtrace for its creation:
#0  mozilla::gl::SharedSurface_Basic::SharedSurface_Basic (this=0x7fc8d77e40, 
    gl=0x7fc97f9830, size=..., hasAlpha=false, tex=1, ownsTex=true)
    at gfx/gl/SharedSurfaceGL.cpp:78
#1  0x0000007ff28d3e6c in mozilla::gl::SharedSurface_Basic::Create (
    gl=0x7fc97f9830, formats=..., size=..., hasAlpha=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#2  0x0000007ff28a3720 in mozilla::gl::SurfaceFactory_Basic::CreateSharedImpl (
    this=<optimized out>, desc=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/WeakPtr.h:185
#3  0x0000007ff28a3628 in mozilla::gl::SurfaceFactory::CreateShared (
    this=0x7fc93e17c0, size=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefCounted.h:240
#4  0x0000007ff28a5f24 in mozilla::gl::SwapChain::Acquire (
    this=this@entry=0x7fc92ed298, size=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#5  0x0000007ff369bed4 in mozilla::WebGLContext::PresentInto (
    this=this@entry=0x7fc92ece00, swapChain=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#6  0x0000007ff369c304 in mozilla::WebGLContext::Present (
    this=this@entry=0x7fc92ece00, xrFb=<optimized out>, 
    consumerType=consumerType@entry=mozilla::layers::TextureType::Unknown, 
    webvr=webvr@entry=false)
    at dom/canvas/WebGLContext.cpp:936
#7  0x0000007ff3664300 in mozilla::HostWebGLContext::Present (webvr=false, 
    t=mozilla::layers::TextureType::Unknown, xrFb=<optimized out>, 
    this=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    mozilla/RefPtr.h:280
#8  mozilla::ClientWebGLContext::Run<void (mozilla::HostWebGLContext::*)(
    unsigned long, mozilla::layers::TextureType, bool) const, &(mozilla::
    HostWebGLContext::Present(unsigned long, mozilla::layers::TextureType, 
    bool) const), unsigned long, mozilla::layers::TextureType const&, bool 
    const&> (
    this=<optimized out>, args#0=@0x7fdf2926c0: 0, args#1=@0x7fdf2926bf: 
    mozilla::layers::TextureType::Unknown, args#2=@0x7fdf2926be: false)
    at dom/canvas/ClientWebGLContext.cpp:313
#9  0x0000007ff3664468 in mozilla::ClientWebGLContext::Present (
    this=this@entry=0x7fc9558160, xrFb=xrFb@entry=0x0, type=<optimized out>, 
    webvr=<optimized out>, webvr@entry=false)
    at dom/canvas/ClientWebGLContext.cpp:363
#10 0x0000007ff368fc78 in mozilla::ClientWebGLContext::OnBeforePaintTransaction 
    (this=0x7fc9558160)
    at dom/canvas/ClientWebGLContext.cpp:345
#11 0x0000007ff28ff0dc in mozilla::layers::CanvasRenderer::
    FirePreTransactionCallback (this=this@entry=0x7fc9646e50)
    at gfx/layers/CanvasRenderer.cpp:75
#12 0x0000007ff29afee8 in mozilla::layers::ShareableCanvasRenderer::
    UpdateCompositableClient (this=0x7fc9646e50)
    at gfx/layers/ShareableCanvasRenderer.cpp:192
#13 0x0000007ff29efa84 in mozilla::layers::ClientCanvasLayer::RenderLayer (
    this=0x7fc98f02c0)
    at gfx/layers/client/ClientCanvasLayer.cpp:25
#14 0x0000007ff29eeba4 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#15 0x0000007ff29feeec in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc991a8d0)
    at gfx/layers/Layers.h:1051
#16 0x0000007ff29eeba4 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#17 0x0000007ff29feeec in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc935acd0)
    at gfx/layers/Layers.h:1051
#18 0x0000007ff29eeba4 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#19 0x0000007ff29feeec in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc8d8f2f0)
    at gfx/layers/Layers.h:1051
#20 0x0000007ff2a05bd0 in mozilla::layers::ClientLayerManager::
    EndTransactionInternal (this=this@entry=0x7fc8b17e40, 
    aCallback=aCallback@entry=
    0x7ff46a23d0 <mozilla::FrameLayerBuilder::DrawPaintedLayer(mozilla::layers::
    PaintedLayer*, gfxContext*, mozilla::gfx::IntRegionTyped<mozilla::gfx::
    UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::
    UnknownUnits> const&, mozilla::layers::DrawRegionClip, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, void*)>, 
    aCallbackData=aCallbackData@entry=0x7fdf293268)
    at gfx/layers/client/ClientLayerManager.cpp:341
#21 0x0000007ff2a10ad0 in mozilla::layers::ClientLayerManager::EndTransaction (
    this=0x7fc8b17e40, 
    aCallback=0x7ff46a23d0 <mozilla::FrameLayerBuilder::DrawPaintedLayer(
    mozilla::layers::PaintedLayer*, gfxContext*, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::
    DrawRegionClip, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> 
    const&, void*)>, aCallbackData=0x7fdf293268, aFlags=mozilla::layers::
    LayerManager::END_DEFAULT)
    at gfx/layers/client/ClientLayerManager.cpp:397
#22 0x0000007ff469f7f0 in nsDisplayList::PaintRoot (
    this=this@entry=0x7fdf295078, aBuilder=aBuilder@entry=0x7fdf293268, 
    aCtx=aCtx@entry=0x0, 
    aFlags=aFlags@entry=13, aDisplayListBuildTime=...)
    at layout/painting/nsDisplayList.cpp:2622
#23 0x0000007ff442bb4c in nsLayoutUtils::PaintFrame (
    aRenderingContext=aRenderingContext@entry=0x0, 
    aFrame=aFrame@entry=0x7fc93240a0, aDirtyRegion=..., 
    aBackstop=aBackstop@entry=4294967295, 
    aBuilderMode=aBuilderMode@entry=nsDisplayListBuilderMode::Painting, 
    aFlags=aFlags@entry=(nsLayoutUtils::PaintFrameFlags::WidgetLayers | 
    nsLayoutUtils::PaintFrameFlags::ExistingTransaction | nsLayoutUtils::
    PaintFrameFlags::NoComposite)) at ${PROJECT}/obj-build-mer-qt-xr/dist/
    include/mozilla/MaybeStorageBase.h:80
#24 0x0000007ff43b6240 in mozilla::PresShell::Paint (
    this=this@entry=0x7fc9295080, aViewToPaint=aViewToPaint@entry=0x7fc9273470, 
    aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at layout/base/PresShell.cpp:6400
#25 0x0000007ff41ee110 in nsViewManager::ProcessPendingUpdatesPaint (
    this=this@entry=0x7fc9273430, aWidget=aWidget@entry=0x7fc92734f0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/RectAbsolute.h:43
[...]
#58 0x0000007fefba989c in ?? () from /lib64/libc.so.6
(gdb) 
Again, apologies for the lengthy backtraces. I'm not going to try to debug and fix this today, but I think it'll be helpful to have these backtraces to refer back to in the future.

Sadly I don't have the energy to dig into this crash further today, so I'll have to leave things there. I hope it won't be too hard to fix, but at this point it's already looking a bit tricky, so we'll have to see.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
4 Jun 2024 : Day 253 #
Overnight I've been mulling options. They are to either strip out the code that chooses texture format based on device capabilities, or leave the existing large patch as it is. Going with the former will be more work and could result in incompatibilities on some devices, but has the potential to massively simplify the patch.

My decision is that I'm going to strip out the code. Already the way the decision is performed is messy. Stripping it all back and going with the bare essentials is exactly what's happened upstream; I think it makes sense to mirror these changes as much as possible. If it does cause problems for some devices, then we can look to reintroduce some of these changes. But at that point, it should be possible to do so in a much cleaner and more structured way.

The first step needed is to strip out the code from GLContext::ChooseGLFormats() so that it returns the values we saw yesterday, which were the following.
  bool any = false;
  bool color = true;
  bool alpha = false;
  bool bpp16 = false;
  bool depth = false;
  bool stencil = false;
  bool premultAlpha = true;
Stepping through the ChooseGLFormats() method we can see the consequences of these values, combined with the results returned from the GLES driver. When following this and subsequent comments it may be helpful to refer to the method as it looks upstream in ESR 78.
1763        formats.color_texType = LOCAL_GL_UNSIGNED_BYTE;
(gdb) p bpp16
$6 = false
(gdb) p caps.alpha
$7 = false
(gdb) n
1765        if (caps.alpha) {
(gdb) n
1772          formats.color_texFormat = LOCAL_GL_RGB;
(gdb) n
1773          formats.color_rbFormat = LOCAL_GL_RGB8;
(gdb) n
1779      if (IsSupported(GLFeature::packed_depth_stencil)) {
(gdb) n
1780        formats.depthStencil = LOCAL_GL_DEPTH24_STENCIL8;
(gdb) n
260         return mProfile == ContextProfile::OpenGLES;
(gdb) n
1785        if (IsExtensionSupported(OES_depth24)) {
(gdb) n
1794      formats.stencil = LOCAL_GL_STENCIL_INDEX8;
(gdb) n
1796      return formats;
(gdb) p formats
$8 = {color_texInternalFormat = 6407, color_texFormat = 6407, color_texType = 
    5121, color_rbFormat = 32849, depthStencil = 35056, depth = 33190, stencil 
    = 36168}
(gdb) 
The key values needed to determine flow through the method are these:
bpp16 == false
caps.alpha == false
IsSupported(GLFeature::packed_depth_stencil) == true
IsGLES() == true
IsExtensionSupported(OES_depth24) == true
At the end of the method, we're left with the following capabilities:
color_texInternalFormat = 6407 = 0x1907 = LOCAL_GL_RGB
color_texFormat = 6407 = 0x1907 = LOCAL_GL_RGB
color_texType = 5121 = 0x1401 = LOCAL_GL_UNSIGNED_BYTE
color_rbFormat = 32849 = 0x8051 = LOCAL_GL_RGB8
depthStencil = 35056 = 0x88f0 = LOCAL_GL_DEPTH24_STENCIL8
depth = 33190 = 0x81a6 = LOCAL_GL_DEPTH_COMPONENT24
stencil = 36168 = 0x8d48 = LOCAL_GL_STENCIL_INDEX8
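The decimal-to-name mapping above can be double-checked mechanically. What follows is a minimal sketch, assuming the hex values are copied by hand from the list above rather than pulled in from the gecko headers. The kGL_* constants and the NameForValue() helper are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Values copied by hand from the list above. The kGL_* names are
// illustrative and are not gecko identifiers.
constexpr std::uint32_t kGL_RGB = 0x1907;
constexpr std::uint32_t kGL_UNSIGNED_BYTE = 0x1401;
constexpr std::uint32_t kGL_RGB8 = 0x8051;
constexpr std::uint32_t kGL_DEPTH24_STENCIL8 = 0x88F0;
constexpr std::uint32_t kGL_DEPTH_COMPONENT24 = 0x81A6;
constexpr std::uint32_t kGL_STENCIL_INDEX8 = 0x8D48;

// Hypothetical helper: map a raw value printed by gdb back to a name.
constexpr const char* NameForValue(std::uint32_t value) {
  switch (value) {
    case kGL_RGB:
      return "LOCAL_GL_RGB";
    case kGL_UNSIGNED_BYTE:
      return "LOCAL_GL_UNSIGNED_BYTE";
    case kGL_RGB8:
      return "LOCAL_GL_RGB8";
    case kGL_DEPTH24_STENCIL8:
      return "LOCAL_GL_DEPTH24_STENCIL8";
    case kGL_DEPTH_COMPONENT24:
      return "LOCAL_GL_DEPTH_COMPONENT24";
    case kGL_STENCIL_INDEX8:
      return "LOCAL_GL_STENCIL_INDEX8";
    default:
      return "unknown";
  }
}

// Compile-time checks that the decimals printed by gdb match the hex values.
static_assert(kGL_RGB == 6407, "color_texFormat");
static_assert(kGL_UNSIGNED_BYTE == 5121, "color_texType");
static_assert(kGL_RGB8 == 32849, "color_rbFormat");
static_assert(kGL_DEPTH24_STENCIL8 == 35056, "depthStencil");
static_assert(kGL_DEPTH_COMPONENT24 == 33190, "depth");
static_assert(kGL_STENCIL_INDEX8 == 36168, "stencil");
```

Because the checks are static_assert statements, a mistyped value shows up as a compile error rather than something to spot by eye.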
To perform the conversion from hex value to GLenum I cross-referenced against the gfx/gl/GLConsts.h file. Based on this analysis, I'm testing with the following newly reworked ChooseGLFormats() method:
GLFormats GLContext::ChooseGLFormats(const SurfaceCaps& caps) const {
  GLFormats formats;

  formats.color_texType = LOCAL_GL_UNSIGNED_BYTE;
  formats.color_texInternalFormat = LOCAL_GL_RGB;
  formats.color_texFormat = LOCAL_GL_RGB;
  formats.color_rbFormat = LOCAL_GL_RGB8;
  formats.depthStencil = LOCAL_GL_DEPTH24_STENCIL8;
  formats.depth = LOCAL_GL_DEPTH_COMPONENT24;
  formats.stencil = LOCAL_GL_STENCIL_INDEX8;

  return formats;
}
I should make clear that if this works, my plan is to remove this method entirely. I'm doing this check just to ensure I have everything correct before propagating these changes at a more fundamental level.
$ make -j1 -C obj-build-mer-qt-xr/gfx/
$ make -j16 -C `pwd`/obj-build-mer-qt-xr/toolkit
A quick test shows things are still working as expected, so the task now is to propagate this fixed configuration throughout the changes I've made. With any luck this will simplify things considerably, potentially allowing the use of SurfaceCaps to be eliminated entirely. The SurfaceCaps structure was removed between ESR 78 and ESR 91, so it'd be great if there's no need to restore it. The same goes for the AttachmentType enumeration.

[...]

I've now spent a good few hours removing all references to SurfaceCaps from the code, to the extent that the structure isn't even defined in SurfaceTypes.h any more. That means big changes to the code: 79 new lines added, but more importantly 379 lines removed. That'll make a big difference to the final patch. I've also checked that the code compiles and, in theory, it's not doing anything different to the previous version that had the values fixed.

I'm going to test this out now, which means a partial build, but that will still take half an hour or so.

[...]

I've tested it; it all works well. So now I'm going to move on to the AttachmentType enumeration. Looking through the code, this attachment type is only ever set in three places: in the SharedSurface_EGLImage constructor and in two SharedSurface_Basic constructors. In all cases it gets set to AttachmentType::GLTexture. Here are the three places:
SharedSurface_EGLImage::SharedSurface_EGLImage(GLContext* gl,
                                               const gfx::IntSize& size,
                                               const GLFormats& formats,
                                               GLuint prodTex, EGLImage image)
    : SharedSurface(
          SharedSurfaceType::EGLImageShare, AttachmentType::GLTexture, gl, size,
          false)  // Can't recycle, as mSync changes never update TextureHost.
      ,
[...]

SharedSurface_Basic::SharedSurface_Basic(const SharedSurfaceDesc& desc,
                                         UniquePtr<MozFramebuffer>&& fb)
    : SharedSurface(desc, std::move(fb), AttachmentType::GLTexture),

[...]

SharedSurface_Basic::SharedSurface_Basic(GLContext* gl, const IntSize& size,
                                         GLuint tex,
                                         bool ownsTex)
    : SharedSurface(SharedSurfaceType::Basic, AttachmentType::GLTexture, gl,
                    size, true),
Hopefully that gives the right idea. Given this is the case, it should be safe to remove the enumeration entirely and assume that the value is AttachmentType::GLTexture.

That simplifies the code a bunch more: we're now up to 98 inserted lines and 444 removed lines. Finally, the GLFormats structure that's defined in GLContextTypes.h no longer exists in the upstream ESR 91 code. If you look back at the GLContext::ChooseGLFormats() method listed above, you'll notice that the values get set to fixed defaults and never get changed. So this structure also looks to be a good candidate for removal.

Thankfully removing it is also pretty clean and straightforward. That leaves things in a much better state. These changes add a total of 95 lines but remove a total of 472 lines. I've checked that both the browser and WebView are still working as expected after these changes, so that feels like a good result for today.

That still leaves behind a potentially huge patch though. After these changes the patch still removes 145 lines and adds 1702 lines of code, compared to 156 removals and 2090 additions before making these improvements. We want both of these numbers to be as small as possible.

I think we can do better; there are still further candidates for simplification. For example, one big change I had to make was to pass around the EGLDisplay value, which adds a new parameter to several methods. If I can remove this requirement, that'll take us closer again to the upstream ESR 91 code. But I've reached my limit for today, so this will have to wait until tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
3 Jun 2024 : Day 252 #
Before I get started, I want to mention that I've been heartened by all of the encouraging comments on the Sailfish Forum recently. I've not had a chance to reply there (I will do), but let me just say that while I appreciate the incredibly generous offers of donations, there's absolutely no need. It might seem a bit strange, but I'm doing this for my own enjoyment and because just like everyone else I'd love to see the browser get a boost to the next version. But I also know there are many fabulous Sailfish developers who are far more deserving than I am, so if you want to splash the cash, you'll find willing recipients on the excellent The big Thank You & Coffee thread and I encourage you to donate to them!

With that said, let's get into the day's development. Yesterday I was able to remove sixteen unused methods from my offscreen rendering patch. That's good progress, but it still leaves us with a very large patch. Large enough that it's worth continuing with the process in the hope of trimming it down further.

There are a few places where small changes upstream have caused the code to diverge, but in a way that could potentially be ironed out. If I can do this it'll remove code from the patch that can safely follow the upstream changes instead.

Chief among these is a change to the way GLContext is passed between methods. It's quite a large structure and so in ESR 78 it's typically passed as a pointer. In ESR 91 this has been changed in places so that it's now passed by reference.

In C++ there's not much practical difference between these apart from syntax: dereferencing isn't needed in the latter case. But the syntax changes propagate throughout the methods it's passed to. This can lead to what appear to be significant changes where they're actually pretty minor.
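As a toy illustration of that point, here's a minimal sketch using a hypothetical Context structure as a stand-in for something large like GLContext. The function names are made up for the example. Both variants modify the caller's object without copying it; only the syntax at the call site and inside the function differs.

```cpp
// Hypothetical stand-in for a large structure such as GLContext.
struct Context {
  int drawCalls = 0;
};

// Pointer style: members are accessed with "->".
constexpr void DrawViaPointer(Context* aCtx) { aCtx->drawCalls += 1; }

// Reference style: members are accessed with ".".
constexpr void DrawViaReference(Context& ctx) { ctx.drawCalls += 1; }

// Calls both variants on the same object and returns the resulting count.
constexpr int CountDraws() {
  Context ctx{};
  DrawViaPointer(&ctx);   // the caller takes the address explicitly
  DrawViaReference(ctx);  // the caller passes the object directly
  return ctx.drawCalls;
}

// Both calls updated the same object, so the count is two.
static_assert(CountDraws() == 2, "both styles modify the caller's object");
```

One difference that does matter in practice is that a pointer can be null whereas a reference can't, so the reference form also documents that the argument is required.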

Let's take an example. Here's a method taken from ESR 78:
GLuint CreateTexture(GLContext* aGL, GLenum aInternalFormat, GLenum aFormat,
                     GLenum aType, const gfx::IntSize& aSize, bool linear) {
  GLuint tex = 0;
  aGL->fGenTextures(1, &tex);
  ScopedBindTexture autoTex(aGL, tex);

  aGL->fTexParameteri(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_MIN_FILTER,
                      linear ? LOCAL_GL_LINEAR : LOCAL_GL_NEAREST);
  aGL->fTexParameteri(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_MAG_FILTER,
                      linear ? LOCAL_GL_LINEAR : LOCAL_GL_NEAREST);
  aGL->fTexParameteri(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_WRAP_S,
                      LOCAL_GL_CLAMP_TO_EDGE);
  aGL->fTexParameteri(LOCAL_GL_TEXTURE_2D, LOCAL_GL_TEXTURE_WRAP_T,
                      LOCAL_GL_CLAMP_TO_EDGE);

  aGL->fTexImage2D(LOCAL_GL_TEXTURE_2D, 0, aInternalFormat, aSize.width,
                   aSize.height, 0, aFormat, aType, nullptr);

  return tex;
}
The equivalent code in ESR 91 looks like this:
UniquePtr<Texture> CreateTexture(GLContext& gl, const gfx::IntSize& size) {
  const GLenum target = LOCAL_GL_TEXTURE_2D;
  const GLenum format = LOCAL_GL_RGBA;

  auto tex = MakeUnique<Texture>(gl);
  ScopedBindTexture autoTex(&gl, tex->name, target);

  gl.fTexParameteri(target, LOCAL_GL_TEXTURE_MIN_FILTER, LOCAL_GL_LINEAR);
  gl.fTexParameteri(target, LOCAL_GL_TEXTURE_MAG_FILTER, LOCAL_GL_LINEAR);
  gl.fTexParameteri(target, LOCAL_GL_TEXTURE_WRAP_S, LOCAL_GL_CLAMP_TO_EDGE);
  gl.fTexParameteri(target, LOCAL_GL_TEXTURE_WRAP_T, LOCAL_GL_CLAMP_TO_EDGE);

  gl.fTexImage2D(target, 0, format, size.width, size.height, 0, format,
                 LOCAL_GL_UNSIGNED_BYTE, nullptr);

  return tex;
}
As you can see, several lines have their access operators changed from an arrow ("->") to a dot ("."). That's dereferenced access vs. direct access.

The changes needed to get offscreen rendering to work mean I've switched out the ESR 91 version for a copy of the ESR 78 version. While there are clearly differences between the two, they're smaller than they look at first glance. If I can refactor the code so it's more like the ESR 91 version, that should save effort in the future when the patch is applied to the next upstream changes beyond ESR 91.

This isn't just an idle preference. Much of the effort I've spent on the upgrade to ESR 91 has been caused by having to refactor patches where the underlying code has changed upstream. I predict that for every line of code I can remove from a patch I'm going to save future developers two or three times the effort, since they won't have to worry about that change in the future. The smaller and more efficient the patches, the less work is needed when reapplying them later.

Apart from the pointer vs. reference difference, another important difference between these two methods is that the ESR 78 version has the format, type and linear flag passed in, whereas in ESR 91 they're all defined statically.

Let's tackle the last of these first: the linear flag. In the ESR 78 version of the method this defaults to a value of true if left unspecified. It looks like the only place this is called is from CreateTextureForOffscreen() where the parameter is left as its default value. So I'm going to remove the parameter and assume it's always true. The compiler will tell us if there are cases where it's set to false which I missed.
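Here's a minimal sketch of that refactoring, assuming a defaulted bool parameter that selects the texture filter. FilterBefore() and FilterAfter() are hypothetical names for illustration; the two constants are the standard GL_NEAREST and GL_LINEAR values.

```cpp
#include <cstdint>

// Standard OpenGL values for GL_NEAREST and GL_LINEAR.
constexpr std::uint32_t kNearest = 0x2600;
constexpr std::uint32_t kLinear = 0x2601;

// Before: the flag is a parameter that defaults to true.
constexpr std::uint32_t FilterBefore(bool linear = true) {
  return linear ? kLinear : kNearest;
}

// After: the parameter is gone and linear filtering is assumed.
constexpr std::uint32_t FilterAfter() { return kLinear; }

// Callers that relied on the default see no change in behaviour.
static_assert(FilterBefore() == FilterAfter(), "default path is unchanged");

// A leftover call such as FilterAfter(false) would no longer compile, which
// is how the compiler flags any caller that passed the flag explicitly.
```

The safety net here is the type system: removing the parameter turns every explicit use of it into a build failure rather than a silent behaviour change.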

To understand whether the other parameters could change we have to take a look at the GLContext::ChooseGLFormats() method, which is where the format value is chosen. And the return value depends on the SurfaceCaps structure that's passed in.

Some potential changes to the SurfaceCaps are performed in GLContextProviderEGL::CreateOffscreen(), but looking carefully at this, they can only happen if canOffscreenUseHeadless is set to false, which itself can only happen if MOZ_WIDGET_ANDROID is defined. It's never defined for us (we're not Android!), so I can safely remove not just the assumption, but the related code as well. That's much cleaner.

Now we go through to CompositorOGL::CreateContext() where the SurfaceCaps objects are originally created. Here's the code that creates them:
    SurfaceCaps caps = SurfaceCaps::ForRGB();
    caps.bpp16 = gfxVars::OffscreenFormat() == SurfaceFormat::R5G6B5_UINT16;
By default gfxVars::OffscreenFormat() will return SurfaceFormat::X8R8G8B8_UINT32. I don't see any reason why that would be changed, which would mean that caps.bpp16 is by default set to false. What does the SurfaceCaps::ForRGB() method return? Apparently these values, according to the code:
  bool any = false;
  bool color = true;
  bool alpha = false;
  bool bpp16 = false;
  bool depth = false;
  bool stencil = false;
  bool premultAlpha = true;
This tallies with the values I'm seeing in practice using the debugger as well:
(gdb) p minOffscreenCaps
$1 = {any = false, color = true, alpha = false, bpp16 = false, depth = false, 
    stencil = false, premultAlpha = true, surfaceAllocator = {mRawPtr = 0x0}}
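To spell out how those values come about, here's a minimal sketch that mirrors the two lines of CompositorOGL::CreateContext() shown above. The Caps structure, the Format enumeration and the MakeCaps() helper are hypothetical stand-ins, not the gecko definitions. The field values are the ones listed above.

```cpp
// Hypothetical stand-in for the two surface formats discussed above.
enum class Format { X8R8G8B8_UINT32, R5G6B5_UINT16 };

// Hypothetical mirror of the fields printed by gdb; not the gecko structure.
struct Caps {
  bool any = false;
  bool color = false;
  bool alpha = false;
  bool bpp16 = false;
  bool depth = false;
  bool stencil = false;
  bool premultAlpha = true;
};

// Stand-in for SurfaceCaps::ForRGB(): colour on, everything else off.
constexpr Caps ForRGB() {
  Caps caps{};
  caps.color = true;
  return caps;
}

// Mirrors the two lines from CreateContext(): start from the RGB
// capabilities, then request 16 bits per pixel only for the 565 format.
constexpr Caps MakeCaps(Format offscreenFormat) {
  Caps caps = ForRGB();
  caps.bpp16 = (offscreenFormat == Format::R5G6B5_UINT16);
  return caps;
}

// With the default offscreen format, bpp16 stays false.
static_assert(!MakeCaps(Format::X8R8G8B8_UINT32).bpp16, "default is 32-bit");
static_assert(MakeCaps(Format::X8R8G8B8_UINT32).color, "colour is enabled");
```

Under these assumptions the only input that can vary is the offscreen format, which is why bpp16 is the one field worth checking.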
Armed with this knowledge I can now simplify the GLContext::ChooseGLFormats() method appropriately. Although the internal format might change, the color_texFormat value is now always set to LOCAL_GL_RGB. That gets passed in to CreateTexture() as the third parameter (aFormat). So we can remove that parameter and simply set it to LOCAL_GL_RGB in all cases. In ESR 91 this is almost what's happening as well, except there it's set to LOCAL_GL_RGBA (an extra alpha channel). I think this is a difference we're going to have to maintain. We're also going to have to keep the aInternalFormat parameter as this might change depending on the GLES capabilities of the device.

Given all of this, I'm not convinced we're going to get much more out of trying to simplify this CreateTexture() method.

It's getting quite late here now and I think I've reached the end of my viable energy for the day. I'll have to return to the topic of simplification tomorrow. Overnight I'll be pondering whether to lock the texture down to 24-bit RGB or allow other textures (primarily 16-bit textures) to be supported as well. Checking the ESR 91 code, it seems to be always assuming a LOCAL_GL_RGBA texture. Maybe we'd be safe just to do that?

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
2 Jun 2024 : Day 251 #
Unsurprisingly the build I started last night had completed by this morning. Unsurprising because it already passed through multiple partial builds yesterday. I've uploaded and installed the packages; now it's time to find out which methods are being called and which are not.

The approach I've devised for this is to load my test harbour-webview app in the debugger. I'll then attach breakpoints to all of the methods I listed back on Day 247, execute the application and record which breakpoints get hit.

As each is hit I'll disable the breakpoint so it doesn't trigger multiple times. Eventually enough breakpoints will be disabled that the app will be running without interference from the debugger.

At that point I should have captured a pretty good list of which methods are necessary and which are unused.
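The bookkeeping behind this amounts to a set difference: every method with a breakpoint, minus the methods whose breakpoints were hit. Here's a minimal sketch of that calculation. The UnusedMethods() helper is hypothetical and isn't part of gecko or gdb; the method names in the usage example are taken from this diary entry.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: given every method a breakpoint was placed on and the
// methods whose breakpoints were hit, return those that were never hit.
std::vector<std::string> UnusedMethods(const std::set<std::string>& candidates,
                                       const std::set<std::string>& hit) {
  std::vector<std::string> unused;
  // std::set iterates in sorted order, which std::set_difference requires.
  std::set_difference(candidates.begin(), candidates.end(), hit.begin(),
                      hit.end(), std::back_inserter(unused));
  return unused;
}

// Usage example with three method names from this entry.
void Example() {
  const std::set<std::string> candidates = {
      "CreateTexture()", "GLScreenBuffer::GetFB()", "GLScreenBuffer::Swap()"};

  std::set<std::string> hit;
  hit.insert("CreateTexture()");
  // A repeat hit changes nothing, much like a breakpoint that has been
  // disabled after it first triggers.
  hit.insert("CreateTexture()");
  hit.insert("GLScreenBuffer::Swap()");

  const std::vector<std::string> unused = UnusedMethods(candidates, hit);
  assert(unused.size() == 1);
  assert(unused[0] == "GLScreenBuffer::GetFB()");
}
```

A method landing in the unused list only means it wasn't hit during this particular run, so each one still needs checking by hand before it's removed.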

I've ended up running this in three batches to keep things manageable. The result is that the breakpoints for all of the 39 methods listed below were triggered, meaning that these methods are definitely being used by the WebView renderer.

Batch 1:
  1. GLContextProviderEGL::CreateOffscreen()
  2. DefaultEglLibrary()
  3. GLContext::fBindFramebuffer()
  4. GLContext::raw_fBindFramebuffer()
  5. GLContext::InitOffscreen()
  6. GLContext::CreateScreenBuffer()
  7. GLScreenBuffer::Create()
  8. GLScreenBuffer::GLScreenBuffer()
  9. CreateTextureForOffscreen()
  10. CreateTexture()
  11. GLContext::OffscreenSize()

Batch 2:
  1. SurfaceFactory::SurfaceFactory()
  2. ChooseBufferBits()
  3. GLScreenBuffer::Resize()
  4. SurfaceFactory::NewTexClient()
  5. SharedSurface::SharedSurface()
  6. SharedSurface::GetTextureFlags()
  7. SurfaceFactory::StartRecycling()
  8. GLScreenBuffer::Attach()
  9. GLScreenBuffer::CreateRead()
  10. ReadBuffer::Create()
  11. CreateRenderbuffersForOffscreen()
  12. GLScreenBuffer::BindFB()
  13. GLScreenBuffer::Morph()
  14. SurfaceFactory::~SurfaceFactory()
  15. SurfaceFactory::StopRecycling()
  16. ReadBuffer::Size()
  17. GLScreenBuffer::Swap()
  18. ReadBuffer::Attach()
  19. SurfaceFactory::RecycleCallback()

Batch 3:
  1. SurfaceFactory_Basic::SurfaceFactory_Basic()
  2. SharedSurface_Basic::SharedSurface_Basic()
  3. SharedSurfaceTextureClient::Create()
  4. SharedSurfaceTextureClient::SharedSurfaceTextureClient()
  5. SharedSurface_EGLImage::SharedSurface_EGLImage()
  6. SharedSurfaceTextureClient::~SharedSurfaceTextureClient()
  7. SharedSurface_Basic::~SharedSurface_Basic()
  8. CreateTextureImageEGL()
  9. TileGenFuncEGL()

The above methods were all hit before the Web site had been fully rendered. The following were also hit, but a little later in the process, after rendering had apparently completed. I'm not sure whether this is really relevant, but I found it interesting:
  1. TextureImageEGL::TextureImageEGL()
  2. TextureImageEGL::BindTexture()
  3. TextureImageEGL::Resize()
  4. GLFormatForImage()
  5. GLTypeForImage()
  6. TextureImageEGL::DirectUpdate()
  7. TextureImageEGL::~TextureImageEGL()
  8. TextureImageEGL::ReleaseTexImage()
  9. TextureImageEGL::DestroyEGLSurface()

That leaves the following methods that were added as a result of the changes I've made in order to get the WebView renderer working, but don't actually appear to be used by it. These are all candidates to be removed. I've actually marked the ones which I eventually did remove, but will come to those as we progress.
  1. GLContext::GuaranteeResolve() - Removed
  2. GLContext::CreateScreenBufferImpl() - Removed
  3. GLScreenBuffer::CreateFactory() - Removed
  4. GLScreenBuffer::~GLScreenBuffer()
  5. GLScreenBuffer::BindDrawFB()
  6. GLScreenBuffer::BindReadFB()
  7. GLScreenBuffer::BindReadFB_Internal() - Removed
  8. GLScreenBuffer::GetDrawFB() - Removed
  9. GLScreenBuffer::GetReadFB() - Removed
  10. GLScreenBuffer::GetFB() - Removed
  11. GLScreenBuffer::CopyTexImage2D() - Removed
  12. GLScreenBuffer::ReadPixels() - Removed
  13. ReadBuffer::~ReadBuffer()
  14. SharedSurface::ProdCopy() - Removed
  15. SurfaceFactory::Recycle()
  16. SharedSurface_EGLImage::ReadPixels() - Removed
  17. SharedSurface_Basic::Wrap() - Removed
  18. SharedSurface_GLTexture::Create() - Removed
  19. SharedSurface_GLTexture::~SharedSurface_GLTexture() - Removed
  20. SharedSurface_GLTexture::ProducerReleaseImpl() - Removed
  21. SharedSurface_GLTexture::ToSurfaceDescriptor() - Removed
  22. TextureImageEGL::BindTexImage()

I'm now working through the code, checking where each of the above methods is actually called, if it is at all. This is quite an intricate process because even if a method isn't called within gecko, that doesn't mean it's not exported and called in code that links to libxul.so, such as qtmozembed or the sailfish-components-webview code.

After carefully working through the list above, it looks like I should be able to safely remove the following nine methods:
  1. GLContext::GuaranteeResolve() - Removed
  2. GLContext::CreateScreenBufferImpl() - Removed
  3. GLScreenBuffer::CreateFactory() - Removed
  4. GLScreenBuffer::BindReadFB_Internal() - Removed
  5. GLScreenBuffer::GetDrawFB() - Removed
  6. GLScreenBuffer::GetReadFB() - Removed
  7. GLScreenBuffer::GetFB() - Removed
  8. GLScreenBuffer::CopyTexImage2D() - Removed
  9. GLScreenBuffer::ReadPixels() - Removed

In addition to these, it looks like I'm also safe to remove the SurfaceCaps::preserve flag, since this is always set to false. Removing this flag also allows me to remove additional methods which are only ever called in the situations when this flag is set to true. So I've both removed the flag and simplified the code that's conditional on it.

Having made these changes I've built the library, installed it and run the test app again. The code built fine and everything appears to be working correctly, so these changes look to be safe. With these having gone through successfully, there are now some more methods which have become orphans as a result, so I can safely remove these as well:
  1. SharedSurface::ProdCopy() - Removed
  2. SharedSurface_EGLImage::ReadPixels() - Removed
  3. SharedSurface_Basic::Wrap() - Removed

None of the methods from SharedSurface_GLTexture are being called either, so I'm wondering whether it would make sense to remove the entire class. But on doing a quick search I can see that there is a single place where a SharedSurface_GLTexture instance is created. This is in the EmbedLiteCompositorBridgeParent::PrepareOffscreen() method where the relevant code looks like this:
  if (context->GetContextType() == GLContextType::EGL) {
    // [Basic/OGL Layers, OMTC] WebGL layer init.
    factory = SurfaceFactory_EGLImage::Create(context, screen->mCaps, nullptr, 
    flags);
  } else {
    // [Basic Layers, OMTC] WebGL layer init.
    // Well, this *should* work...
    factory = MakeUnique<SurfaceFactory_GLTexture>(context, screen->mCaps, 
    nullptr, flags);
  }
On the device I'm using for testing the context type is set to GLContextType::EGL and so it's always SurfaceFactory_EGLImage that's used to create the surface. But can I be sure there won't be situations in which the other branch will be executed? Perhaps on other devices with different hardware and drivers available?

The value returned by context->GetContextType() depends on which class context is an instance of. Multiple classes derive from GLContext and so could potentially be in use here, each overriding the GetContextType() method to return a different value. Here's an example from GLContextEGL:
  virtual GLContextType GetContextType() const override {
    return GLContextType::EGL;
  }
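As a minimal sketch of the dispatch described above, the following stand-alone code shows how the concrete class decides the value returned through a base-class pointer, and how a check shaped like the one in PrepareOffscreen() then picks a branch. The names here (ContextType, Context, ContextEGL, ContextGLX and ChooseFactoryName()) are hypothetical stand-ins invented for the example, not the gecko types.

```cpp
#include <string>

// Hypothetical stand-ins for GLContextType and the GLContext hierarchy.
enum class ContextType { Unknown, EGL, GLX };

class Context {
 public:
  virtual ~Context() = default;
  // Each concrete subclass reports its own type.
  virtual ContextType GetContextType() const { return ContextType::Unknown; }
};

class ContextEGL : public Context {
 public:
  ContextType GetContextType() const override { return ContextType::EGL; }
};

class ContextGLX : public Context {
 public:
  ContextType GetContextType() const override { return ContextType::GLX; }
};

// Same shape as the branch in PrepareOffscreen(): only the dynamic type of
// the object behind the pointer decides which factory gets chosen.
std::string ChooseFactoryName(const Context* context) {
  if (context->GetContextType() == ContextType::EGL) {
    return "EGLImage";
  }
  return "GLTexture";
}
```

Passing a ContextEGL selects the first branch, while any other subclass falls through to the else branch. That's why the question comes down to which class actually gets instantiated.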
To find out whether anything other than GLContextType::EGL will ever be returned I need to find out how the context object is created. Unfortunately this is a bit of a maze. For example, this is how the context is pulled in for use inside the PrepareOffscreen() method:
  GLContext* context = static_cast<CompositorOGL*>(
    state->mLayerManager->GetCompositor())->gl();
Not pretty. The place where the context appears to be created is in CompositorOGL::CreateContext(). The logic there is also a bit serpentine, but at least for the WebView, it eventually leads to a call of the following:
    context = GLContextProvider::CreateOffscreen(
        mSurfaceSize, caps, CreateContextFlags::REQUIRE_COMPAT_PROFILE,
        &discardFailureId);
That GLContextProvider::CreateOffscreen() call actually goes through to GLContextProviderEGL::CreateOffscreen(). Why is that? It's because GLContextProvider is only a macro, and it's GLContextProviderEGL that actually provides the implementation. In fact, no other GLContextProvider type implements CreateOffscreen() at all.
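To illustrate the kind of aliasing involved, here's a small sketch in which a macro name stands in for a concrete provider class, so that a call written against the generic name ends up in the EGL-specific implementation at compile time. The names ProviderEGL, Provider and MakeContext(), along with the string return value, are made up for the example; the real gecko headers are more involved than this.

```cpp
#include <string>

// Hypothetical concrete provider; stands in for GLContextProviderEGL.
class ProviderEGL {
 public:
  static std::string CreateOffscreen() { return "EGL offscreen context"; }
};

// The generic name is only a macro that the preprocessor replaces with the
// concrete class, so there's no separate generic implementation to call.
#define Provider ProviderEGL

std::string MakeContext() {
  // After preprocessing this reads ProviderEGL::CreateOffscreen().
  return Provider::CreateOffscreen();
}
```

Because the substitution happens in the preprocessor, there's no runtime choice being made here: whichever class the macro names is the only one that can be reached through it.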

For completeness, CreateOffscreen() goes on to call the following, where as we progress through the list we go deeper down the call stack:
  1. GLContextProviderEGL::CreateHeadless()
  2. GLContextEGL::CreateEGLPBufferOffscreenContext()
  3. GLContextEGL::CreateEGLPBufferOffscreenContextImpl()
  4. GLContextEGL::CreateGLContext()
  5. new GLContextEGL()

As you can see, the deepest of these calls creates a GLContextEGL, which has type GLContextType::EGL. So it really does look like the context will always be of type GLContextEGL when this runs on a Sailfish device. I've therefore decided to remove the SharedSurface_GLTexture class completely, along with all of its associated code (e.g. SurfaceFactory_GLTexture) since this will never get used. That means removing all of the following:
  1. SharedSurface_GLTexture::Create() - Removed
  2. SharedSurface_GLTexture::~SharedSurface_GLTexture() - Removed
  3. SharedSurface_GLTexture::ProducerReleaseImpl() - Removed
  4. SharedSurface_GLTexture::ToSurfaceDescriptor() - Removed

Now to build and test the result:
$ make -j1 -C obj-build-mer-qt-xr/gfx/
$ make -j16 -C `pwd`/obj-build-mer-qt-xr/toolkit
With the class removed the code compiles fine, but hits a problem at link time:
aarch64-meego-linux-gnu-ld: libxul.so: hidden symbol 
    `_ZN7mozilla2gl23SharedSurface_GLTexture6CreateEPNS0_9GLContextERKNS0_9GL
    FormatsERKNS_3gfx12IntSizeTypedINS7_12UnknownUnitsEEEb' isn't defined
aarch64-meego-linux-gnu-ld: final link failed: bad value
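The mangled name in that error is hard to read, so here's a small sketch of how it could be decoded using abi::__cxa_demangle(), which is available from cxxabi.h with GCC and Clang. The Demangle() helper and the kMissingSymbol constant are names I've introduced for the example; the symbol itself is copied from the error above with the line wrapping removed.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangle an Itanium C++ ABI symbol name using the GCC/Clang runtime.
// Returns the input unchanged if it can't be demangled.
std::string Demangle(const std::string& mangled) {
  int status = 0;
  char* result =
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || result == nullptr) {
    return mangled;
  }
  std::string demangled(result);
  std::free(result);  // The returned buffer is allocated with malloc().
  return demangled;
}

// The symbol from the linker error, with the line wrapping removed.
const char* const kMissingSymbol =
    "_ZN7mozilla2gl23SharedSurface_GLTexture6CreateEPNS0_9GLContextERKNS0_"
    "9GLFormatsERKNS_3gfx12IntSizeTypedINS7_12UnknownUnitsEEEb";
```

Calling Demangle(kMissingSymbol) should show that the linker is looking for mozilla::gl::SharedSurface_GLTexture::Create(), which is one of the methods that was just removed.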
The reason for the failure is that I've also made changes to EmbedLiteCompositorBridgeParent.cpp, but the commands I ran to recompile the code didn't incorporate this file. That's how these partial builds work: you have to be careful to include all relevant directories in the make commands. So I'll need to ask the compiler to do a bit more work.
$ make -j1 -C obj-build-mer-qt-xr/mobile/sailfishos/
$ make -j16 -C `pwd`/obj-build-mer-qt-xr/toolkit
$ strip obj-build-mer-qt-xr/toolkit/library/build/libxul.so
This time it builds successfully and I'm able to copy the resulting library over to my device. With this new version of the library installed both the browser and WebView app run successfully without any problems.

That's at least three edit-rebuild-test cycles we've been round today. More than enough for one day I'd say. Tomorrow I'll continue stripping out unused code from the offscreen rendering patch. It's still a very large patch, so if there's any more redundant code it'd be really good to get rid of it.

After having spent so long floundering around trying to fix offscreen rendering over the last few months, it's really nice to be making steady progress again. Solid, mundane and steady, but progress nonetheless.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
1 Jun 2024 : Day 250 #
The build completed overnight and since then I've also cleaned up and rebuilt all of the other components as well: qtmozembed, embedlite-components, sailfish-browser and sailfish-components-webview.

After installing them all on my development phone both the browser and the WebView are now working correctly and the additional debug prints I added for testing are no longer showing in the console output. So things are looking in decent shape. I'm now ready for this edit-rebuild-test cycle that I've been going on about for the last few days.

To prove the point (to myself, as much as to anyone else) that this can be a tight cycle, I'm going to start off small. The changes to mCaps look like they may not be necessary, so let's see what happens if I remove them. To do this I've made the following changes to the code.
$ git diff
diff --git a/gfx/gl/GLContext.cpp b/gfx/gl/GLContext.cpp
index 03ee715a8f35..9e07257837f8 100644
--- a/gfx/gl/GLContext.cpp
+++ b/gfx/gl/GLContext.cpp
@@ -288,8 +288,6 @@ GLContext::GLContext(const GLContextDesc& desc, GLContext* 
    sharedContext,
       mSharedContext(sharedContext),
       mWorkAroundDriverBugs(
           StaticPrefs::gfx_work_around_driver_bugs_AtStartup()) {
-  mCaps.any = true;
-  mCaps.color = true;
   mOwningThreadId = PlatformThread::CurrentId();
   MOZ_ALWAYS_TRUE(sCurrentContext.init());
   sCurrentContext.set(0);
@@ -952,13 +950,6 @@ bool GLContext::InitImpl() {
   // We're ready for final setup.
   fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, 0);
 
-  // TODO: Remove SurfaceCaps::any.
-  if (mCaps.any) {
-    mCaps.any = false;
-    mCaps.color = true;
-    mCaps.alpha = false;
-  }
-
   MOZ_GL_ASSERT(this, IsCurrent());
 
   if (ShouldSpew() && IsExtensionSupported(KHR_debug)) {
Having made these changes I can now perform a quick partial build directly from inside the build target using the following commands:
$ make -j1 -C obj-build-mer-qt-xr/gfx/gl/
$ make -j16 -C `pwd`/obj-build-mer-qt-xr/toolkit
$ strip obj-build-mer-qt-xr/toolkit/library/build/libxul.so
This gives us a new libxul.so file from which I've stripped the debug symbols in order to keep the file size down. I've transferred this new library over to my phone and am now checking both the browser and the WebView to see whether I can spot any regressions following the changes:
sailfish-browser
harbour-webview
After running both of these and checking that the rendering is working correctly, I don't see any problems. Both are working nicely. So these changes look good and I can add them to the commit. Adding these changes to the commit will actually be reversing some changes that are in the commit already, so this will make the final patch simpler.

Next up, GLContext::GuaranteeResolve() looks unused, so I've removed it. I also notice that CreateScreenBufferImpl() only ever gets called by CreateScreenBuffer() so the two may as well be combined into a single method.

Finally there are some changes to GLContext::fCopyTexImage2D() which may or may not be important. I'm going to revert them to see whether that breaks anything.

Having made these changes it's now time to move to step two of the edit-rebuild-test cycle.
$ make -j1 -C obj-build-mer-qt-xr/gfx/gl/
$ make -j16 -C `pwd`/obj-build-mer-qt-xr/toolkit
$ strip obj-build-mer-qt-xr/toolkit/library/build/libxul.so
This is the same as before, which is why this is a cycle. With the freshly built library installed on my phone, time to check both the browser and the WebView once again for any potential regressions:
sailfish-browser
harbour-webview
All working! Great! Next, I notice that sSafeModeInitialized is set to true in gfxPlatform.cpp. I suspect this was a change I made back when I was basically trying everything and anything I could to get the code to work. But the upstream code has it set to false and the eventual patch will be simpler if I can leave it in this original state of being set to false. So I've made the following change to reflect this:
$ git diff
diff --git a/gfx/thebes/gfxPlatform.cpp b/gfx/thebes/gfxPlatform.cpp
index 9c217dfc81e9..79e261e54f83 100644
--- a/gfx/thebes/gfxPlatform.cpp
+++ b/gfx/thebes/gfxPlatform.cpp
@@ -2045,7 +2045,7 @@ BackendType gfxPlatform::GetBackendPref(const char* 
    aBackendPrefName,
 }
 
 bool gfxPlatform::InSafeMode() {
-  static bool sSafeModeInitialized = true;
+  static bool sSafeModeInitialized = false;
   static bool sInSafeMode = false;
 
   if (!sSafeModeInitialized) {
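To see why the starting value of this flag matters, here's a minimal sketch of the same lazy-initialisation pattern. It assumes the block guarded by the flag is what works out the real value (the diff above only shows the first line of it), so QuerySafeMode() and the SafeModeCache class are hypothetical stand-ins rather than the gecko code. Members are used in place of function-local statics purely so that both starting values can be compared side by side.

```cpp
// Hypothetical stand-in for whatever the real check does.
bool QuerySafeMode() { return true; }

// Lazy initialisation in the style of gfxPlatform::InSafeMode().
class SafeModeCache {
 public:
  explicit SafeModeCache(bool initialised) : mInitialised(initialised) {}

  bool InSafeMode() {
    if (!mInitialised) {
      // Only runs on the first call, and only if the flag started as false.
      mInitialised = true;
      mInSafeMode = QuerySafeMode();
      ++mQueries;
    }
    return mInSafeMode;
  }

  // Number of times the real check has been performed.
  int Queries() const { return mQueries; }

 private:
  bool mInitialised;
  bool mInSafeMode = false;
  int mQueries = 0;
};
```

Constructing the cache with false performs the check once and then reuses the result. Constructing it with true skips the check entirely, so the default value of false is returned every time, which under these assumptions is the effect the earlier change would have had.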
Time now for step two of the edit-rebuild-test cycle again.

All working!

All of these changes have now been tested and committed. I need to continue working through small changes and tests like these tomorrow, and to keep going until the commit that fixes the offscreen rendering pipeline is looking more manageable in terms of size. Unfortunately, as we've discussed before, the partial builds I've been doing mess up the debugging symbols, and in fact I've been stripping the debug symbols from the library so that it can be copied to my phone over the network more rapidly. So I'm going to run another full build overnight to get the debug symbols back.
$ sfdk build -d --with git_workaround
So with that running overnight, that's it for today. Tomorrow I'll be back to simplifying the code.
