flypig.co.uk

18 Jun 2024 : Day 262
I'm continuing to look into WebGL today. Yesterday I made a change to the GLLibraryEGL::Init() method to restore a call to GLLibraryEGL::CreateDisplay(). Debugging was getting tricky due to the debug source getting out of sync with the binary, which prevented gdb from tying addresses to source lines properly. So I kicked off a build overnight.

Unfortunately the build hasn't finished yet this morning, so I'm working on the theory, rather than the practice, in the meantime. One thing I notice is that the CreateDisplay() method now gets called twice. It's possible this is intentional, but I'm not convinced, so I want to explore it a little further.

The first time it gets called is in the new location, inside the Init() method. This is as expected; it's the new call I just added yesterday:
Thread 37 "Compositor" hit Breakpoint 3, mozilla::gl::GLLibraryEGL::
    CreateDisplay (this=this@entry=0x7ed81a21e0, 
    forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7f2e4e7f60, aDisplay=aDisplay@entry=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:752
752     ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp: No such file or directory.
(gdb) bt
#0  mozilla::gl::GLLibraryEGL::CreateDisplay (this=this@entry=0x7ed81a21e0, 
    forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7f2e4e7f60, aDisplay=aDisplay@entry=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:752
#1  0x0000007ff28c09e0 in mozilla::gl::GLLibraryEGL::Init (
    this=this@entry=0x7ed81a21e0, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7f2e4e7f60, aDisplay=aDisplay@entry=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:504
#2  0x0000007ff28c191c in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7ed80049a0, aSurface=0x5555988610, 
    aDisplay=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1177
#3  0x0000007ff4e20f4c in mozilla::embedlite::nsWindow::GetGLContext (
    this=this@entry=0x7fc8ab77c0)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:405
#4  0x0000007ff4e21118 in mozilla::embedlite::nsWindow::GetNativeData (
    this=0x7fc8ab77c0, aDataType=12)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:173
#5  0x0000007ff293ac00 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7ed8002f10)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:232
#6  0x0000007ff29504d4 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7ed8002f10, out_failureReason=0x7f2e4e8510)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:393
#7  0x0000007ff2a66244 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7fc8a7f630, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#8  0x0000007ff2a712c0 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7fc8a7f630, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#9  0x0000007ff2a713f0 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7fc8a7f630, 
    aBackendHints=..., aId=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#10 0x0000007ff4e08008 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7fc8a7f630, aBackendHints=..., 
    aId=...)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:80
#11 0x0000007ff24022f0 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7fc8a7f630, msg__=...) at 
    PCompositorBridgeParent.cpp:1285
[...]
#26 0x0000007fefbac89c in ?? () from /lib64/libc.so.6
(gdb) 
But it hits a second time as well. It's possible this is when it would have been called without the addition I just made, but I'm not certain. This time it's happening inside GLLibraryEGL::DefaultDisplay(). This is odd. I'll come on to why it's odd in a moment, but first here are the relevant parts of the backtrace:
Thread 37 "Compositor" hit Breakpoint 3, mozilla::gl::GLLibraryEGL::
    CreateDisplay (this=this@entry=0x7ed81a21e0, 
    forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7f2e4e7f60, aDisplay=aDisplay@entry=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:752
752     in ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp
(gdb) bt
#0  mozilla::gl::GLLibraryEGL::CreateDisplay (this=this@entry=0x7ed81a21e0, 
    forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7f2e4e7f60, aDisplay=aDisplay@entry=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:752
#1  0x0000007ff28c1554 in mozilla::gl::GLLibraryEGL::DefaultDisplay (
    this=this@entry=0x7ed81a21e0, 
    out_failureId=out_failureId@entry=0x7f2e4e7f60, 
    aDisplay=aDisplay@entry=0x1) at ${PROJECT}/gecko-dev/gfx/gl/
    GLLibraryEGL.cpp:745
#2  0x0000007ff28c19a4 in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7ed80049a0, aSurface=0x5555988610, 
    aDisplay=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1183
#3  0x0000007ff4e20f4c in mozilla::embedlite::nsWindow::GetGLContext (
    this=this@entry=0x7fc8ab77c0)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:405
#4  0x0000007ff4e21118 in mozilla::embedlite::nsWindow::GetNativeData (
    this=0x7fc8ab77c0, aDataType=12)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:173
#5  0x0000007ff293ac00 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7ed8002f10)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:232
#6  0x0000007ff29504d4 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7ed8002f10, out_failureReason=0x7f2e4e8510)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:393
#7  0x0000007ff2a66244 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7fc8a7f630, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#8  0x0000007ff2a712c0 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7fc8a7f630, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#9  0x0000007ff2a713f0 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7fc8a7f630, 
    aBackendHints=..., aId=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#10 0x0000007ff4e08008 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7fc8a7f630, aBackendHints=..., 
    aId=...)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:80
#11 0x0000007ff24022f0 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7fc8a7f630, msg__=...) at 
    PCompositorBridgeParent.cpp:1285
#12 0x0000007ff2446804 in mozilla::layers::PCompositorManagerParent::
    OnMessageReceived (this=<optimized out>, msg__=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:
    675
[...]
#26 0x0000007fefbac89c in ?? () from /lib64/libc.so.6
(gdb) 
If this is where it would have been called previously, why is the call here unexpected? It's because of the code directly before the call to CreateDisplay(). Here's the call that's made in the Init() method:
  std::shared_ptr<EglDisplay> defaultDisplay = CreateDisplay(forceAccel, 
    out_failureId, aDisplay);
  if (!defaultDisplay) {
    return false;
  }
  mDefaultDisplay = defaultDisplay;
Notice how mDefaultDisplay is set directly after the call returns. There's no call to Init() to be found in the second backtrace, so we can assume that the first call to CreateDisplay() safely completed before the second call was made. But now the code that appears inside DefaultDisplay() looks like this:
std::shared_ptr<EglDisplay> GLLibraryEGL::DefaultDisplay(
    nsACString* const out_failureId, EGLDisplay aDisplay) {
  auto ret = mDefaultDisplay.lock();
  if (ret) return ret;

  ret = CreateDisplay(false, out_failureId, aDisplay);
  mDefaultDisplay = ret;
  return ret;
}
Given the previous chunk of code, by this time mDefaultDisplay should be non-null and the if (ret) return ret line should therefore return immediately. In other words, it should return before the call to CreateDisplay().
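To make that expectation concrete, here's a minimal sketch of the same caching pattern outside of Gecko. The Display and Library types, the createCalls counter and the CreateCallsWhileOwnerHeld() helper are all hypothetical stand-ins I've made up for illustration; only the shape of DefaultDisplay() is taken from the code above. It assumes that someone is still holding a shared_ptr to the display when the second request comes in.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for mozilla::gl::EglDisplay.
struct Display {};

// Hypothetical stand-in for GLLibraryEGL, reduced to the caching logic.
class Library {
 public:
  // Same shape as DefaultDisplay(): reuse the cached display if it is still
  // alive, otherwise create a new one and cache it weakly.
  std::shared_ptr<Display> DefaultDisplay() {
    auto ret = mDefaultDisplay.lock();
    if (ret) return ret;

    ret = CreateDisplay();
    mDefaultDisplay = ret;
    return ret;
  }

  int createCalls = 0;  // counts how often CreateDisplay() actually runs

 private:
  std::shared_ptr<Display> CreateDisplay() {
    ++createCalls;
    return std::make_shared<Display>();
  }

  std::weak_ptr<Display> mDefaultDisplay;
};

// Requests the display twice while keeping the first result alive and
// returns how many times CreateDisplay() ran.
int CreateCallsWhileOwnerHeld() {
  Library lib;
  std::shared_ptr<Display> first = lib.DefaultDisplay();   // creates
  std::shared_ptr<Display> second = lib.DefaultDisplay();  // early return
  assert(first == second);  // both refer to the same display object
  return lib.createCalls;
}
```

Calling CreateCallsWhileOwnerHeld() from a main() function exercises the early return path: because first is still in scope, the weak pointer can be locked and the creation step isn't repeated.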

There is another possibility, which is that the first call to CreateDisplay() is returning a null pointer. Initially I thought I'd need line-by-line debugging to check this, but it turns out there's another way:
    out_failureId=out_failureId@entry=0x7f16603f60, aDisplay=aDisplay@entry=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:752
752     ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp: No such file or directory.
(gdb) p mDefaultDisplay
$2 = std::weak_ptr<mozilla::gl::EglDisplay> (empty) = {get() = 0x0}
(gdb) c
Continuing.

Thread 37 "Compositor" hit Breakpoint 3, mozilla::gl::GLLibraryEGL::
    CreateDisplay (this=this@entry=0x7ed81a20a0, 
    forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7f16603f60, aDisplay=aDisplay@entry=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:752
752     in ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp
(gdb) p mDefaultDisplay
$3 = std::weak_ptr<mozilla::gl::EglDisplay> (expired, weak count 1) = {get() = 
    0x7ed819b120}
(gdb) 
This little sequence initially looks odd. The second time CreateDisplay() is called the value of mDefaultDisplay is non-null. But if that's the case, why isn't DefaultDisplay() returning early?

Assuming for the time being that the first call isn't returning null, that might imply that there are two different instances of GLLibraryEGL in operation. I thought that maybe this might be a consequence of there being WebGL content on the page, but when I checked using a page without any such content, I got exactly the same result.

But taking another look at that debug output above, the reason is right there after all. It's because in the second case the pointer (which is a weak pointer) has already expired, meaning that the last shared pointer owning the display has gone away and its strong reference count has reached zero. Consequently the call to lock it in DefaultDisplay() will be returning null.
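Here's a minimal sketch of how a weak pointer can end up expired straight after being set. It isn't the Gecko code: Display, Library and InitLeavesExpiredCache() are hypothetical names, and I'm assuming that the local shared_ptr inside Init() is the only strong reference to the display, which I haven't yet confirmed for the real CreateDisplay().

```cpp
#include <memory>

// Hypothetical stand-in for mozilla::gl::EglDisplay.
struct Display {};

// Hypothetical stand-in for GLLibraryEGL, reduced to the weak pointer member.
class Library {
 public:
  // Same shape as the snippet from Init(): the display is held in a local
  // shared_ptr and then stored in a weak_ptr member.
  bool Init() {
    std::shared_ptr<Display> defaultDisplay = std::make_shared<Display>();
    if (!defaultDisplay) {
      return false;
    }
    mDefaultDisplay = defaultDisplay;
    return true;
  }  // defaultDisplay is destroyed here, taking the only strong reference

  bool CachedDisplayExpired() const { return mDefaultDisplay.expired(); }
  bool LockSucceeds() const { return mDefaultDisplay.lock() != nullptr; }

 private:
  std::weak_ptr<Display> mDefaultDisplay;
};

// Returns true if, after Init(), the cached weak pointer has expired and
// can no longer be locked.
bool InitLeavesExpiredCache() {
  Library lib;
  if (!lib.Init()) {
    return false;
  }
  return lib.CachedDisplayExpired() && !lib.LockSucceeds();
}
```

Under that assumption InitLeavesExpiredCache() returns true: a weak_ptr doesn't keep its target alive, so once the local variable goes out of scope there's nothing left for lock() to return.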

That still seems a little odd if I'm honest. I'd have expected it not to expire until either the window or the browser is shut.

Now I don't know for sure that this is wrong, but it certainly feels wrong. Why create a display object and then immediately drop it?

I want to investigate this further, but it's time for my work day to start so I'll have to pause until this evening. This is no bad thing: it'll give time for the build to complete, by which time I'll have access to line-by-line debugging and the git history, both of which I anticipate being super-helpful in getting to the bottom of this. Until later then.

[...]

The build took a lot longer than I'd anticipated, too long to do any debugging today. But I have at least been able to transfer over and install the new packages. That means debugging will be much easier tomorrow and I should be able to find out why this weak pointer is getting released before there's a proper chance to make use of it.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
