flypig.co.uk

26 Feb 2024 : Day 168 #
Overnight the build I started yesterday successfully finished. That, in itself, is a bit of a surprise (no stupid syntax errors in my code!). This morning I've copied over the packages and installed them, and now I'm on the train ready to debug.

I optimistically run the app without the debugger. The window appears again. There's no rendering, just a white screen, but there's also no immediate crash and no obvious errors in the debug output.

After running for around twenty seconds, the app crashes.
$ time harbour-webview 
[D] unknown:0 - QML debugging is enabled. Only use this in a safe environment.
[D] main:30 - WebView Example
[D] main:44 - Using default start URL:  "https://www.flypig.co.uk/search/"
[D] main:47 - Opening webview
[D] unknown:0 - Using Wayland-EGL
library "libutils.so" not found
[...]
JSComp: UserAgentOverrideHelper.js loaded
UserAgentOverrideHelper app-startup
CONSOLE message:
[JavaScript Error: "Unexpected event profile-after-change"
    {file: "resource://gre/modules/URLQueryStrippingListService.jsm" line: 228}]
observe@resource://gre/modules/URLQueryStrippingListService.jsm:228:12

Created LOG for EmbedPrefs
Created LOG for EmbedLiteLayerManager
Command terminated by signal 11
real    0m 20.82s
user    0m 0.87s
sys     0m 0.23s
This is quite unexpected behaviour if I'm honest. Something is causing it to crash after a prolonged period ("prolonged" meaning from the perspective of computation, rather than from the perspective of the user).
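For reference, the "signal 11" in the time output is SIGSEGV on Linux, the same segmentation fault that gdb reports once it's attached. The small C++ sketch below checks that mapping using the POSIX strsignal() call. It assumes a Linux system. The exact description text depends on the C library, so the sketch only checks that a description exists rather than what it says.

```cpp
#include <cassert>
#include <csignal>
#include <string.h>  // strsignal() is POSIX and declared here
#include <string>

// Returns the C library's description of a signal number, such as the
// "signal 11" that `time` reported. The wording varies between libcs.
std::string DescribeSignal(int signum) {
  const char* text = strsignal(signum);
  return text != nullptr ? std::string(text) : std::string("unknown signal");
}

// True if the signal number is SIGSEGV, which has the value 11 on Linux.
bool IsSegfault(int signum) {
  return signum == SIGSEGV;
}

// Quick self-check of the mapping on a Linux system.
void CheckSignalMapping() {
  assert(IsSegfault(11));
  assert(!DescribeSignal(SIGSEGV).empty());
}
```

On a glibc-based system, DescribeSignal(11) should return "Segmentation fault". Other C libraries may word it differently.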

That was without the debugger; I'd better try it with the debugger to find out why it's crashing.
$ gdb harbour-webview 
GNU gdb (GDB) Mer (8.2.1+git9)
[...]
(gdb) r
Starting program: /usr/bin/harbour-webview 
[...]
Thread 37 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 18684]
mozilla::gl::SwapChain::Resize (this=0x0, size=...)
    at gfx/gl/GLScreenBuffer.cpp:134
134           mFactory->CreateShared(size);
(gdb) bt
#0  mozilla::gl::SwapChain::Resize (this=0x0, size=...)
    at gfx/gl/GLScreenBuffer.cpp:134
#1  0x0000007ff110dc14 in mozilla::gl::GLContext::ResizeScreenBuffer
    (this=this@entry=0x7edc19ee40, size=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#2  0x0000007ff119b8d4 in mozilla::layers::CompositorOGL::CreateContext
    (this=this@entry=0x7edc002f10)
    at gfx/layers/opengl/CompositorOGL.cpp:264
#3  0x0000007ff11b0ea8 in mozilla::layers::CompositorOGL::Initialize
    (this=0x7edc002f10, out_failureReason=0x7f17aac520)
    at gfx/layers/opengl/CompositorOGL.cpp:394
#4  0x0000007ff12c68e8 in mozilla::layers::CompositorBridgeParent::NewCompositor
    (this=this@entry=0x7fc4b7b450, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#5  0x0000007ff12d1964 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7fc4b7b450, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#6  0x0000007ff12d1a94 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7fc4b7b450,
    aBackendHints=..., aId=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#7  0x0000007ff36682b8 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7fc4b7b450, aBackendHints=..., 
    aId=...)
    at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:80
#8  0x0000007ff0c65ad0 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7fc4b7b450, msg__=...)
    at PCompositorBridgeParent.cpp:1285
#9  0x0000007ff0ca9fe4 in mozilla::layers::PCompositorManagerParent::
    OnMessageReceived (this=<optimized out>, msg__=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/
    ProtocolUtils.h:675
#10 0x0000007ff0bc985c in mozilla::ipc::MessageChannel::DispatchAsyncMessage
    (this=this@entry=0x7fc4d82fb8, aProxy=aProxy@entry=0x7edc002aa0, aMsg=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/
    ProtocolUtils.h:675
[...]
#23 0x0000007ff6a0489c in ?? () from /lib64/libc.so.6
(gdb) 
As before, it runs for around twenty seconds, then crashes. The line that's causing the crash is this one:
bool SwapChain::Resize(const gfx::IntSize& size) {
  UniquePtr<SharedSurface> newBack =
      mFactory->CreateShared(size);
[...]
}
And the reason isn't that mFactory is null; it's that this (meaning the SwapChain instance) is null. But when I try to inspect the memory with the debugger to confirm that it's null, I start getting strange errors:
(gdb) p mFactory
Cannot access memory at address 0x8
(gdb) frame 1
#1  0x0000007ff110dc14 in mozilla::gl::GLContext::ResizeScreenBuffer
    (this=this@entry=0x7edc19ee40, size=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
290     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:
    No such file or directory.
(gdb) p mSwapChain
Cannot access memory at address 0x7edc19f838
(gdb) p this
$1 = (mozilla::gl::GLContext * const) 0x7edc19ee40
(gdb) frame 2
#2  0x0000007ff119b8d4 in mozilla::layers::CompositorOGL::CreateContext
    (this=this@entry=0x7edc002f10)
    at gfx/layers/opengl/CompositorOGL.cpp:264
264       bool success = context->ResizeScreenBuffer(mSurfaceSize);
(gdb) p context
$2 = {mRawPtr = 0x7edc19ee40}
(gdb) p context->mRawPtr
Attempt to take address of value not located in memory.
(gdb) p context->mRawPtr->mSwapChain
Attempt to take address of value not located in memory.
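The "Cannot access memory at address 0x8" message is what you'd expect when this is null. To read a data member, the program adds that member's offset to this. If this is 0, the address being read is simply the offset. The sketch below illustrates the arithmetic using a hypothetical SwapChainLike struct. It is not the real Gecko class. I haven't checked the actual layout, so the sketch simply assumes that something pointer-sized precedes mFactory, which is what the 0x8 on this 64-bit device suggests. Addresses are computed with integers rather than by actually dereferencing a null pointer, because a real dereference would be undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for SwapChain, NOT the real Gecko layout. It only
// assumes one pointer-sized member precedes mFactory, matching the 0x8 above.
struct SwapChainLike {
  void* mSomethingFirst;  // offset 0
  void* mFactory;         // offset sizeof(void*), i.e. 8 on a 64-bit target
};

// Address that this->mFactory would be read from for a given value of `this`.
// Computed with integer arithmetic so nothing is ever dereferenced.
std::uintptr_t MemberAddress(std::uintptr_t thisValue) {
  return thisValue + offsetof(SwapChainLike, mFactory);
}

// Quick check: with a null `this`, the member address equals the offset.
void CheckNullThis() {
  assert(MemberAddress(0) == sizeof(void*));
}
```

On a 64-bit build, MemberAddress(0) gives 8, which matches the address gdb complained about.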
I wonder whether this is being caused by a memory leak that quickly gets out of hand. Placing a breakpoint on GLContext::ResizeScreenBuffer() shows that it's not due to repeated calls to this method: it gets called only once, at which point there's an immediate segfault.
(gdb) b GLContext::ResizeScreenBuffer
Breakpoint 1 at 0x7ff110dbdc: file gfx/gl/GLContext.cpp, line 1885.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview
[...]
Thread 37 "Compositor" hit Breakpoint 1, mozilla::gl::GLContext::
    ResizeScreenBuffer (this=this@entry=0x7ed419ee40, size=...)
    at gfx/gl/GLContext.cpp:1885
1885    bool GLContext::ResizeScreenBuffer(const gfx::IntSize& size) {
(gdb) c
Continuing.

Thread 37 "Compositor" received signal SIGSEGV, Segmentation fault.
mozilla::gl::SwapChain::Resize (this=0x0, size=...)
    at gfx/gl/GLScreenBuffer.cpp:134
134           mFactory->CreateShared(size);
(gdb)                            
I'm curious to know what's happening after twenty seconds that would cause this. Looking more carefully at the backtrace for the crash above, it's strange that an attempt is being made to create the compositor. Shouldn't that have already been created? I wonder if this delay is related to network connectivity.

As usual, I'm attempting this debugging on the train, but my development phone has no Internet connectivity here. So perhaps it's waiting for a connection before creating the compositor? Maybe the connection attempt fails after twenty seconds, at which point the compositor is created and the library segfaults.

This seems plausible, even if it doesn't quite explain the peculiar nature of the debugging that followed, where I couldn't access any of the variables.

Let's assume this is the case, back up a bit, and try to capture some state before the crash happens. If the crash is causing memory corruption, that might explain the lack of accessible variables. And if that's the case, then catching execution before the memory gets messed up should allow us to get a clearer picture.
(gdb) b CompositorOGL::CreateContext
Breakpoint 2 at 0x7ff119b764: file gfx/layers/opengl/CompositorOGL.cpp,
    line 227.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview 
[...]
We're coming in to London now, so time to pause and rapidly pack up my stuff before we pull in to the station!

[...]

You'll be pleased to hear I made it off the train safely and with all my belongings. It was touch-and-go for a few seconds there though. I'm now travelling in the opposite direction on (I hope) the adjacent tracks. Time to return to that debugging.

I'm happy to discover that, despite my having literally pulled the plug on my phone mid-debug, the debugger is still in exactly the state I left it once I reattach the cable and restore my GNU Screen session. Linux is great!

And now we have a bit more luck with the captured backtrace:
Thread 37 "Compositor" hit Breakpoint 2, mozilla::layers::CompositorOGL::
    CreateContext (this=this@entry=0x7edc002ed0)
    at gfx/layers/opengl/CompositorOGL.cpp:227
227     already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() {
(gdb) p context
$3 = <optimized out>
(gdb) p mSwapChain
No symbol "mSwapChain" in current context.
(gdb) p context
$4 = <optimized out>
(gdb) bt
#0  mozilla::layers::CompositorOGL::CreateContext (this=this@entry=0x7edc002ed0)
    at gfx/layers/opengl/CompositorOGL.cpp:227
#1  0x0000007ff11b0ea8 in mozilla::layers::CompositorOGL::Initialize
    (this=0x7edc002ed0, out_failureReason=0x7f17a6b520)
    at gfx/layers/opengl/CompositorOGL.cpp:394
#2  0x0000007ff12c68e8 in mozilla::layers::CompositorBridgeParent::NewCompositor
    (this=this@entry=0x7fc4beb0e0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#3  0x0000007ff12d1964 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7fc4beb0e0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#4  0x0000007ff12d1a94 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7fc4beb0e0,
    aBackendHints=..., aId=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#5  0x0000007ff36682b8 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7fc4beb0e0, aBackendHints=..., aId=...)
    at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:80
#6  0x0000007ff0c65ad0 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7fc4beb0e0, msg__=...)
    at PCompositorBridgeParent.cpp:1285
[...]
#21 0x0000007ff6a0489c in ?? () from /lib64/libc.so.6
(gdb) n
[New LWP 32378]
231       nsIWidget* widget = mWidget->RealWidget();
(gdb) n
[New LWP 32389]
[LWP 7850 exited]
232       void* widgetOpenGLContext =
(gdb) n
[New LWP 32476]
[LWP 32389 exited]
234       if (widgetOpenGLContext) {
(gdb) n
248       if (!context && gfxEnv::LayersPreferOffscreen()) {
(gdb) n
249         nsCString discardFailureId;
(gdb) n
250         context = GLContextProvider::CreateHeadless(
(gdb) n
252         if (!context->CreateOffscreenDefaultFb(mSurfaceSize)) {
(gdb) n
249         nsCString discardFailureId;
(gdb) n
257       if (!context) {
(gdb) n
264       bool success = context->ResizeScreenBuffer(mSurfaceSize);
(gdb) p context
$7 = {mRawPtr = 0x7edc19ee40}
(gdb) p context.mRawPtr
$8 = (mozilla::gl::GLContext *) 0x7edc19ee40
(gdb) p context.mRawPtr.mSwapChain
$9 = {
  mTuple = {<mozilla::detail::CompactPairHelper<mozilla::gl::SwapChain*, mozilla::DefaultDelete<mozilla::gl::SwapChain>, (mozilla::detail::StorageType)1, (mozilla::detail::StorageType)0>> = {<mozilla::DefaultDelete<mozilla::gl::SwapChain>> = {<No data fields>}, mFirstA = 0x0}, <No data fields>}}
(gdb) p context.mRawPtr.mSwapChain.mTuple.mFirstA
$10 = (mozilla::gl::SwapChain *) 0x0
(gdb)
We can conclude that the SwapChain hasn't been created yet. This means that the new code I added, which is the code that's crashing, is being called too early. That's not quite what I was expecting. To double-check the ordering, I've added a breakpoint to EmbedLiteCompositorBridgeParent::PrepareOffscreen(), which is where the SwapChain is created.
(gdb) b EmbedLiteCompositorBridgeParent::PrepareOffscreen
Breakpoint 3 at 0x7ff366810c: file mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp, line 104.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview 
[...]
Thread 36 "Compositor" hit Breakpoint 2, mozilla::layers::CompositorOGL::
    CreateContext (this=this@entry=0x7ed8002da0)
    at gfx/layers/opengl/CompositorOGL.cpp:227
227     already_AddRefed<mozilla::gl::GLContext> CompositorOGL::
    CreateContext() {
(gdb) 
This confirms it: the CreateContext() call is happening before the PrepareOffscreen() call. I'll need to think about this again then.
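To make the ordering problem concrete, here's a heavily simplified C++ model of the two code paths. The names echo the real ones (GLContext, SwapChain, ResizeScreenBuffer(), PrepareOffscreen()), but the bodies are hypothetical. The model also uses std::unique_ptr in place of Gecko's UniquePtr; in the real code, the UniquePtr's stored pointer (mTuple.mFirstA in the gdb output) was 0x0. The null check in ResizeScreenBuffer() is one possible defensive option and not necessarily the fix I'll end up with. It turns the segfault into a failure that the caller can handle, but it doesn't answer the real question of why the context is created before the swap chain exists.

```cpp
#include <cassert>
#include <memory>

// Minimal size type standing in for gfx::IntSize.
struct IntSize {
  int width;
  int height;
};

// Hypothetical simplified swap chain: it only records its size.
struct SwapChain {
  IntSize size{};
  bool Resize(const IntSize& newSize) {
    size = newSize;
    return true;
  }
};

// Hypothetical simplified GL context. As in the gdb output, mSwapChain
// starts out null and is only filled in later.
struct GLContext {
  std::unique_ptr<SwapChain> mSwapChain;

  // Guarded resize: calling Resize() through a null mSwapChain is what
  // crashed above, so report failure instead of dereferencing it.
  bool ResizeScreenBuffer(const IntSize& size) {
    if (!mSwapChain) {
      return false;
    }
    return mSwapChain->Resize(size);
  }
};

// Stand-in for EmbedLiteCompositorBridgeParent::PrepareOffscreen(), the
// point where the real code creates the swap chain.
void PrepareOffscreen(GLContext& context) {
  if (!context.mSwapChain) {
    context.mSwapChain = std::make_unique<SwapChain>();
  }
}

// The ordering seen in the debugger: resizing before PrepareOffscreen()
// fails cleanly, then succeeds once the swap chain exists.
void DemonstrateOrdering() {
  GLContext context;
  assert(!context.ResizeScreenBuffer({640, 480}));
  PrepareOffscreen(context);
  assert(context.ResizeScreenBuffer({640, 480}));
  assert(context.mSwapChain->size.width == 640);
}
```

Whether a guard like this is appropriate depends on whether the compositor can sensibly carry on without a swap chain at that point. It may simply hide the ordering problem, and that ordering problem is what actually needs fixing.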

The train is now coming in to Cambridge. I'm not taking any chances this time and will be packing up with plenty of time to spare! Sadly that's going to have to be it for today, but I'll pick this up again tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
