flypig.co.uk

Gecko-dev Diary

Starting in August 2023 I'll be upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91. This page catalogues my progress.

Latest code changes are in the gecko-dev sailfishos-esr91 branch.

There is an index of all posts in case you want to jump to a particular day.


28 Jun 2024 : Day 272 #
We're getting a crash with the WebView because the GLScreenBuffer isn't defined before it's first accessed. We've been here before, but the code has changed and I need to go through the process of figuring out why, again.

Finding out shouldn't be difficult, just a little laborious. I need to install the old working packages, place a breakpoint on the GLScreenBuffer constructor, run the code and wait for a backtrace to magically appear. Here's the result:
Thread 38 "Compositor" hit Breakpoint 1, mozilla::gl::GLScreenBuffer::
    GLScreenBuffer (this=0x7edc003900, gl=0x7edc19aa50, factory=...)
    at gfx/gl/GLScreenBuffer.cpp:183
183     GLScreenBuffer::GLScreenBuffer(GLContext* gl, UniquePtr<SurfaceFactory> 
    factory)
(gdb) bt
#0  mozilla::gl::GLScreenBuffer::GLScreenBuffer (this=0x7edc003900, 
    gl=0x7edc19aa50, factory=...)
    at gfx/gl/GLScreenBuffer.cpp:183
#1  0x0000007ff1104e00 in mozilla::gl::GLScreenBuffer::Create (
    gl=gl@entry=0x7edc19aa50, size=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#2  0x0000007ff112f49c in mozilla::gl::GLContext::CreateScreenBuffer (
    this=this@entry=0x7edc19aa50, size=...)
    at gfx/gl/GLContext.cpp:2076
#3  0x0000007ff112f51c in mozilla::gl::GLContext::InitOffscreen (
    this=this@entry=0x7edc19aa50, size=...)
    at gfx/gl/GLContext.cpp:2346
#4  0x0000007ff112f65c in mozilla::gl::GLContextProviderEGL::CreateOffscreen (
    size=..., 
    flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, 
    out_failureId=out_failureId@entry=0x7f1f7471c8)
    at gfx/gl/GLContextProviderEGL.cpp:1462
#5  0x0000007ff11982fc in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7edc002f10)
    at gfx/layers/opengl/CompositorOGL.cpp:250
#6  0x0000007ff11adad4 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7edc002f10, out_failureReason=0x7f1f747520)
    at gfx/layers/opengl/CompositorOGL.cpp:387
#7  0x0000007ff12c3850 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7fc4b7a7e0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#8  0x0000007ff12ce8cc in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7fc4b7a7e0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#9  0x0000007ff12ce9fc in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7fc4b7a7e0, 
    aBackendHints=..., aId=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#10 0x0000007ff3665628 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7fc4b7a7e0, aBackendHints=..., 
    aId=...)
    at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:80
#11 0x0000007ff0c5e490 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7fc4b7a7e0, msg__=...) at 
    PCompositorBridgeParent.cpp:1285
[...]
#26 0x0000007ff6a0289c in ?? () from /lib64/libc.so.6
(gdb) 
I've needed this backtrace so many times, I should probably have it printed on a poster to hang on my wall.

Next up I've installed my latest WebGL-broken version and have placed breakpoints on the top n methods that appear in the backtrace:
(gdb) info break
Num     Type           Disp Enb Address    What
1       breakpoint     keep y   <PENDING>  GLContextProviderEGL::CreateOffscreen
2       breakpoint     keep y   <PENDING>  GLContext::InitOffscreen
3       breakpoint     keep y   <PENDING>  GLContext::CreateScreenBuffer
4       breakpoint     keep y   <PENDING>  GLScreenBuffer::Create
If I've chosen a list that's long enough, something at the bottom end will trigger a breakpoint, while the code at the top end will never be reached, all happening before the segfault occurs.

That will tell me the method I need to look in to find out why the GLScreenBuffer isn't being created. I don't expect the result to be a surprise, but going through this process is actually going to be quicker than me trying to remember.
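
The reasoning behind the breakpoint list can be sketched in a few lines of C++. This is only an illustration of the idea, not something gdb provides: the function name deepestFrameReached and the plain string method names are assumptions made for the example. Given the call chain from the working build (outermost caller first) and the set of breakpoints that actually fired in the broken build, it returns the last method that was reached before the chain broke.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only.
// `chain` is the expected call chain from the working build, outermost
// caller first. `hits` is the set of methods whose breakpoints fired in
// the broken build. Returns the last method in the chain that was reached,
// or an empty string if even the outermost one never fired.
std::string deepestFrameReached(const std::vector<std::string>& chain,
                                const std::set<std::string>& hits) {
  std::string last;
  for (const std::string& method : chain) {
    if (hits.count(method) == 0) {
      // This method never ran, so the call to it is missing from `last`.
      break;
    }
    last = method;
  }
  return last;
}
```

The method it returns is the one worth diffing, since that's where the expected call has gone missing. An empty result means the list didn't start high enough up the stack and needs extending.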

First time around I got no hit... I guess I didn't cast my hook far enough. I'll need to add extra breakpoints and try again:
(gdb) info break
Num     Type           Disp Enb Address    What
1       breakpoint     keep y   <PENDING>  GLScreenBuffer::GLScreenBuffer
2       breakpoint     keep y   <PENDING>  GLScreenBuffer::Create
3       breakpoint     keep y   <PENDING>  GLContext::CreateScreenBuffer
4       breakpoint     keep y   <PENDING>  GLContext::InitOffscreen
5       breakpoint     keep y   <PENDING>  GLContextProviderEGL::CreateOffscreen
6       breakpoint     keep y   <PENDING>  CompositorOGL::CreateContext
7       breakpoint     keep y   <PENDING>  CompositorOGL::Initialize
8       breakpoint     keep y   <PENDING>  CompositorBridgeParent::NewCompositor
9       breakpoint     keep y   <PENDING>  CompositorBridgeParent::
    InitializeLayerManager
With the extra layers added, now the result comes back positive:
Thread 38 "Compositor" hit Breakpoint 9, mozilla::layers::
    CompositorBridgeParent::InitializeLayerManager (
    this=this@entry=0x7fc4b01f70, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1432
1432        const nsTArray<LayersBackend>& aBackendHints) {
(gdb) c
Continuing.

Thread 38 "Compositor" hit Breakpoint 8, mozilla::layers::
    CompositorBridgeParent::NewCompositor (this=this@entry=0x7fc4b01f70, 
    aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1455
1455        const nsTArray<LayersBackend>& aBackendHints) {
(gdb) c
Continuing.

Thread 38 "Compositor" hit Breakpoint 7, mozilla::layers::
    CompositorOGL::Initialize (this=0x7ed4002ed0, 
    out_failureReason=0x7f1f9a5520)
    at gfx/layers/opengl/CompositorOGL.cpp:384
384     bool CompositorOGL::Initialize(nsCString* const out_failureReason) {
(gdb) c
Continuing.

Thread 38 "Compositor" hit Breakpoint 6, mozilla::layers::
    CompositorOGL::CreateContext (this=this@entry=0x7ed4002ed0)
    at gfx/layers/opengl/CompositorOGL.cpp:227
227     already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() 
    {
(gdb) c
Continuing.
=============== Preparing offscreen rendering context ===============

Thread 38 "Compositor" received signal SIGSEGV, Segmentation fault.
0x0000007ff366520c in mozilla::gl::GLScreenBuffer::Size (this=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
290     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h: No 
    such file or directory.
(gdb) 
This tells us that CompositorOGL::CreateContext() is being called, but then that method is failing to call GLContextProviderEGL::CreateOffscreen(). Let's find out why. A diff on the source file immediately clarifies the reason:
$ git diff gfx/layers/opengl/CompositorOGL.cpp
diff --git a/gfx/layers/opengl/CompositorOGL.cpp b/gfx/layers/opengl/
    CompositorOGL.cpp
index 122709eaf2de..06c84a9ebdaa 100644
--- a/gfx/layers/opengl/CompositorOGL.cpp
+++ b/gfx/layers/opengl/CompositorOGL.cpp
@@ -247,9 +247,15 @@ already_AddRefed<mozilla::gl::GLContext> CompositorOGL::
    CreateContext() {
   // Allow to create offscreen GL context for main Layer Manager
   if (!context && gfxEnv::LayersPreferOffscreen()) {
     nsCString discardFailureId;
-    context = GLContextProvider::CreateOffscreen(
-        mSurfaceSize, CreateContextFlags::REQUIRE_COMPAT_PROFILE,
-        &discardFailureId);
+
+    context = GLContextProvider::CreateHeadless(
+        {CreateContextFlags::REQUIRE_COMPAT_PROFILE}, &discardFailureId);
+    if (!context->CreateOffscreenDefaultFb(mSurfaceSize)) {
+      context = nullptr;
+    }
   }
 
   if (!context) {
Looking at this code and checking the diff of the other code I've removed since my last commit, I can see that the entire chain of calls shown in the backtrace above is missing from the source. Here are the missing methods:
CreateOffscreen()
InitOffscreen()
CreateScreenBuffer()
GLScreenBuffer::Create()
That's because I removed all of them, hoping that GLContextProvider::CreateHeadless() would provide a suitable alternative. But if I'm going to use CreateHeadless() as an alternative I'm going to have to ensure a GLScreenBuffer instance is created somewhere in the process.
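
To make the failure mode concrete, here's a minimal sketch using stand-in classes. Context, ScreenBuffer and OffscreenSize() are hypothetical names invented for this example and are not the Gecko types. It shows a headless-style factory that leaves the screen buffer unset, an offscreen-style factory that attaches one, and an accessor that checks for the missing buffer rather than dereferencing a null pointer the way the crashing Size() call did.

```cpp
#include <memory>

// Hypothetical stand-ins for illustration; not the real Gecko classes.
struct Size {
  int width;
  int height;
};

class ScreenBuffer {
 public:
  explicit ScreenBuffer(Size size) : mSize(size) {}
  Size GetSize() const { return mSize; }

 private:
  Size mSize;
};

class Context {
 public:
  // Like the headless path: the context exists but has no screen buffer.
  static std::unique_ptr<Context> CreateHeadless() {
    return std::unique_ptr<Context>(new Context());
  }

  // Like the old offscreen path: a headless context plus a screen buffer.
  static std::unique_ptr<Context> CreateOffscreen(Size size) {
    std::unique_ptr<Context> context = CreateHeadless();
    context->mScreen.reset(new ScreenBuffer(size));
    return context;
  }

  bool HasScreen() const { return mScreen != nullptr; }

  // Writes the buffer size to `out` and returns true, or returns false
  // without touching `out` when no screen buffer has been attached.
  bool OffscreenSize(Size* out) const {
    if (!mScreen) {
      return false;
    }
    *out = mScreen->GetSize();
    return true;
  }

 private:
  Context() = default;
  std::unique_ptr<ScreenBuffer> mScreen;
};
```

A guard like this would only turn the crash into an error. The real fix still has to make sure the buffer gets created somewhere along the headless path.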

I now need to decide whether to restore the four missing methods, or instead extract their core functionality and merge it into a single method that hooks into the CreateHeadless() flow. This will require some thought and a bit of deeper investigation.

Let's summarise things. First, this is the code that's currently being executed.
    context = GLContextProvider::CreateHeadless(
        {CreateContextFlags::REQUIRE_COMPAT_PROFILE}, &discardFailureId);
    if (!context->CreateOffscreenDefaultFb(mSurfaceSize)) {
      context = nullptr;
    }
The active ingredients here are CreateHeadless() and CreateOffscreenDefaultFb(). This replaces the following three lines which were in use before:
    context = GLContextProvider::CreateOffscreen(
        mSurfaceSize, CreateContextFlags::REQUIRE_COMPAT_PROFILE,
        &discardFailureId);
Here we can see that the active ingredient is CreateOffscreen(). We want to restore this functionality, so the next question is exactly what CreateOffscreen() does.

We already know that it calls InitOffscreen() which calls CreateScreenBuffer() which calls GLScreenBuffer::Create(). So let's be explicit about this and take a look at all of those methods. Here they are, out of context and lined up in a row like ducks:
already_AddRefed<GLContext> GLContextProviderEGL::CreateOffscreen(
    const mozilla::gfx::IntSize& size,
    CreateContextFlags flags, nsACString* const out_failureId) {

  RefPtr<GLContext> gl;
  const GLContextCreateDesc desc{flags = flags};

  gl = CreateHeadless(desc, out_failureId);
  if (!gl) {
    return nullptr;
  }

  // Init the offscreen with the updated offscreen caps.
  if (!gl->InitOffscreen(size)) {
    *out_failureId = "FEATURE_FAILURE_EGL_OFFSCREEN"_ns;
    return nullptr;
  }

  return gl.forget();
}

bool GLContext::InitOffscreen(const gfx::IntSize& size) {
  if (!CreateScreenBuffer(size)) return false;

  if (!MakeCurrent()) {
    return false;
  }
  fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, 0);
  fScissor(0, 0, size.width, size.height);
  fViewport(0, 0, size.width, size.height);

  return true;
}

bool GLContext::CreateScreenBuffer(const IntSize& size) {
  if (!IsOffscreenSizeAllowed(size)) return false;

  UniquePtr<GLScreenBuffer> newScreen =
      GLScreenBuffer::Create(this, size);
  if (!newScreen) return false;

  if (!newScreen->Resize(size)) {
    return false;
  }

  // This will rebind to 0 (Screen) if needed when
  // it falls out of scope.
  ScopedBindFramebuffer autoFB(this);

  mScreen = std::move(newScreen);

  return true;
}
The neat thing is that almost everything these methods touch is publicly accessible. That means we can roll all three of them into a single method that we could, say, include as part of GLContextProviderEGL. Here's the refactored functionality combined together:
already_AddRefed<GLContext> GLContextProviderEGL::CreateOffscreen(
    const mozilla::gfx::IntSize& size,
    CreateContextFlags flags, nsACString* const out_failureId) {

  RefPtr<GLContext> gl;

  gl = CreateHeadless({CreateContextFlags::REQUIRE_COMPAT_PROFILE}, 
    out_failureId);

  // Init the offscreen with the updated offscreen caps.
  if (!gl || !gl->IsOffscreenSizeAllowed(size)) {
    return nullptr;
  }

  UniquePtr<GLScreenBuffer> newScreen = GLScreenBuffer::Create(gl, size);
  if ((!newScreen) || (!newScreen->Resize(size))) {
    return nullptr;
  }

  // This will rebind to 0 (Screen) if needed when
  // it falls out of scope.
  ScopedBindFramebuffer autoFB(gl);

  gl->mScreen = std::move(newScreen);

  if (!gl->MakeCurrent()) {
    return nullptr;
  }
  gl->fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, 0);
  gl->fScissor(0, 0, size.width, size.height);
  gl->fViewport(0, 0, size.width, size.height);

  return gl.forget();
}
Now this compacted version is rather theoretical at this stage: I've just manually combined everything by inspection. It's easy to make mistakes with this kind of thing, so what I want to do now is try it out to see whether it works.
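
One way to gain some confidence in a hand-merged method, short of running it on a device, is to compare it against the original split version on a mock. The sketch below does that with stand-in types. FakeContext, FakeScreen, the 4096 size limit and the SameOutcome() helper are all assumptions made up for illustration, and none of the real GL calls are modelled. It only checks that the merged control flow returns null in the same cases as the chained one.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-ins for illustration only; not the Gecko types.
struct IntSize {
  int width;
  int height;
};

struct FakeScreen {
  IntSize size{0, 0};
  bool Resize(IntSize s) {
    if (s.width <= 0 || s.height <= 0) return false;
    size = s;
    return true;
  }
};

struct FakeContext {
  int maxDimension = 4096;  // Assumed limit, chosen arbitrarily.
  std::unique_ptr<FakeScreen> screen;
  bool IsOffscreenSizeAllowed(IntSize s) const {
    return s.width <= maxDimension && s.height <= maxDimension;
  }
};

// The original shape: three functions chained together.
bool CreateScreenBuffer(FakeContext& gl, IntSize size) {
  if (!gl.IsOffscreenSizeAllowed(size)) return false;
  std::unique_ptr<FakeScreen> newScreen(new FakeScreen());
  if (!newScreen->Resize(size)) return false;
  gl.screen = std::move(newScreen);
  return true;
}

bool InitOffscreen(FakeContext& gl, IntSize size) {
  return CreateScreenBuffer(gl, size);
}

std::unique_ptr<FakeContext> CreateOffscreenSplit(IntSize size) {
  std::unique_ptr<FakeContext> gl(new FakeContext());
  if (!InitOffscreen(*gl, size)) return nullptr;
  return gl;
}

// The rolled-up shape: the same steps inlined into one function.
std::unique_ptr<FakeContext> CreateOffscreenMerged(IntSize size) {
  std::unique_ptr<FakeContext> gl(new FakeContext());
  if (!gl->IsOffscreenSizeAllowed(size)) return nullptr;
  std::unique_ptr<FakeScreen> newScreen(new FakeScreen());
  if (!newScreen->Resize(size)) return nullptr;
  gl->screen = std::move(newScreen);
  return gl;
}

// True when both versions fail together, or both succeed with the same
// screen size.
bool SameOutcome(IntSize size) {
  std::unique_ptr<FakeContext> a = CreateOffscreenSplit(size);
  std::unique_ptr<FakeContext> b = CreateOffscreenMerged(size);
  if (!a || !b) return !a && !b;
  return a->screen->size.width == b->screen->size.width &&
         a->screen->size.height == b->screen->size.height;
}
```

A mock like this can only catch mistakes in the merged control flow. Anything that depends on the real EGL state still needs testing on the device.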

I've done a partial build and hooked everything together. Now when I run this I get interesting results. Running the browser gives good results: rendering works and the WebGL is functional as well.

Running the WebView is a bit more mixed: there's no longer a crash (yay!) but rendering is broken (boo!). The errors generated look really serious given all the wailing and gnashing of teeth coming from the debug output, but in practice they're not blocking execution or triggering a crash. Here's the console output:
Created LOG for EmbedLiteLayerManager
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create 
    EGLConfig! (t=0.830283) [GFX1-]: Failed to create EGLConfig!
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create 
    EGLConfig! (t=0.830283) |[1][GFX1-]: Failed to create EGLConfig! (
    t=0.831443) [GFX1-]: Failed to create EGLConfig!
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create 
    EGLConfig! (t=0.830283) |[1][GFX1-]: Failed to create EGLConfig! (
    t=0.831443) |[2][GFX1-]: [OPENGL] Failed to init compositor with reason: 
    FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=0.832053) [GFX1-]: [OPENGL] Failed 
    to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT
=============== Preparing offscreen rendering context ===============
The important part of this output is the error reason of FEATURE_FAILURE_OPENGL_CREATE_CONTEXT. That suggests that our rolled-up method is returning null when it should be returning a GLContext instance.

The easiest way to find out why it's doing this will be to step through the code, but I can't do that using a binary from a partial build. I'll need to run a full build first.

So I've set the build going. With any luck it'll be completed by morning.

Despite this error I'm excited by these developments. The WebGL remains intact and the WebView isn't crashing. I feel like I'm homing in on where the problem is, and that makes me hopeful that we'll be able to reach a solution soon.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
