flypig.co.uk

List items

Items from the current list are shown below.

Blog

11 Mar 2024 : Day 182 #
We reached an important milestone yesterday when our WebView-augmented app finally ran without crashing for the first time. There's still nothing being rendered to the screen, but preventing the app from crashing feels like an important waypoint on the journey towards the result we want.

But crashes have their uses. In particular, when an app crashes the debugger provides a backtrace which can be the bedrock of an investigation; a place to start and fall back to in case things spiral out of control. Now we're presented with something more nebulous: somewhere in the program there are one or many bugs, or differences in execution, between the ESR 78 and ESR 91 rendering pipelines that we need to find and rewire. We've been here before, with the original browser rendering pipeline. That time it took a few weeks before I managed to find the root cause. I'm not expecting this to be any easier.

The first thing to check is whether the render loop is actually being called. Here we have something to go on in the form of the GLScreenBuffer::Swap() method. This should be called every frame in order to move the image on the back buffer onto the front buffer. We can use the debugger to see whether it's being called.
(gdb) b GLScreenBuffer::Swap
Breakpoint 1 at 0x7ff1106f8c: file ${PROJECT}/gecko-dev/gfx/gl/
    GLScreenBuffer.cpp, line 506.
(gdb) c
Continuing.
[LWP 1224 exited]
[LWP 1218 exited]	
[New LWP 1735]
[LWP 1255 exited]
No hits. In one sense this is bad: something is broken. On the other hand, it's also good: we already knew something was broken, this at least gives us something concrete to fix. So the next step is to see whether it's also actually being called on ESR 78. It could be that I've misunderstood how this rendering pipeline is supposed to work.
(gdb) b GLScreenBuffer::Swap
Breakpoint 1 at 0x7fb8e672b8: file obj-build-mer-qt-xr/dist/include/mozilla/
    UniquePtr.h, line 287.
(gdb) c
Continuing.
[LWP 20447 exited]
[LWP 20440 exited]
[LWP 20511 exited]
[LWP 20460 exited]
Also no hit! Ah... that is until I touch the screen. Then suddenly this:
[New LWP 20679]	
[New LWP 20680]
[New LWP 20681]
[New LWP 20682]
[Switching to LWP 20444]

Thread 36 "Compositor" hit Breakpoint 1, mozilla::gl::GLScreenBuffer::Swap
    (this=this@entry=0x7eac003c00, size=...)
    at obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:287
287     obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:
    No such file or directory.
(gdb) 
We should get a backtrace from ESR 78 to compare against.
(gdb) bt
#0  mozilla::gl::GLScreenBuffer::Swap (this=this@entry=0x7eac003c00, size=...)
    at obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:287
#1  0x0000007fbb2e2d8c in mozilla::gl::GLScreenBuffer::PublishFrame
    (size=..., this=0x7eac003c00)
    at obj-build-mer-qt-xr/dist/include/GLScreenBuffer.h:171
#2  mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    PresentOffscreenSurface (this=0x7f8c99d3b0)
    at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:183
#3  0x0000007fbb2f8600 in mozilla::embedlite::nsWindow::PostRender
    (this=0x7f8c87db30, aContext=<optimized out>)
    at mobile/sailfishos/embedshared/nsWindow.cpp:395
#4  0x0000007fb8fbff4c in mozilla::layers::LayerManagerComposite::Render
    (this=this@entry=0x7eac1988c0, aInvalidRegion=..., aOpaqueRegion=...)
    at obj-build-mer-qt-xr/dist/include/mozilla/TimeStamp.h:452
#5  0x0000007fb8fc02c4 in mozilla::layers::LayerManagerComposite::
    UpdateAndRender (this=this@entry=0x7eac1988c0)
    at gfx/layers/composite/LayerManagerComposite.cpp:647
#6  0x0000007fb8fc0514 in mozilla::layers::LayerManagerComposite::
    EndTransaction (aFlags=mozilla::layers::LayerManager::END_DEFAULT,
    aTimeStamp=..., this=0x7eac1988c0) at gfx/layers/composite/
    LayerManagerComposite.cpp:566
#7  mozilla::layers::LayerManagerComposite::EndTransaction (this=0x7eac1988c0,
    aTimeStamp=..., aFlags=mozilla::layers::LayerManager::END_DEFAULT)
    at gfx/layers/composite/LayerManagerComposite.cpp:536
#8  0x0000007fb8fe7f9c in mozilla::layers::CompositorBridgeParent::
    CompositeToTarget (this=0x7f8c99d3b0, aId=..., aTarget=0x0,
    aRect=<optimized out>)
    at obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#9  0x0000007fbb2e288c in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7f8c99d3b0, aId=...)
    at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:159
#10 0x0000007fb8fc7988 in mozilla::layers::CompositorVsyncScheduler::Composite
    (this=0x7f8cbacc40, aId=..., aVsyncTimestamp=...)
    at gfx/layers/ipc/CompositorVsyncScheduler.cpp:249
#11 0x0000007fb8fc5ff0 in mozilla::detail::RunnableMethodArguments
    <mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>,
    mozilla::TimeStamp>::applyImpl<mozilla::layers::CompositorVsyncScheduler,
    void (mozilla::layers::CompositorVsyncScheduler::*)(mozilla::layers::
    BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp), 
    StoreCopyPassByConstLRef<mozilla::layers::BaseTransactionId<mozilla::
    VsyncIdType> >, StoreCopyPassByConstLRef<mozilla::TimeStamp>, 0ul, 1ul>
    (args=..., m=<optimized out>, o=<optimized out>)
    at obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:925
[...]
#24 0x0000007fbe70d89c in ?? () from /lib64/libc.so.6
(gdb) 
Going back to ESR 91, it turns out the method isn't missing after all, it just needs a bit of prodding to get it to be called. So on touching the screen I get the same result. We should compare the backtraces. Here's what we get from ESR 91:
Thread 38 "Compositor" hit Breakpoint 1, mozilla::gl::GLScreenBuffer::Swap
    (this=this@entry=0x5555643950, size=...)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:506
506     ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp: No such file or directory.
(gdb) bt
#0  mozilla::gl::GLScreenBuffer::Swap (this=this@entry=0x5555643950, size=...)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:506
#1  0x0000007ff36667a8 in mozilla::gl::GLScreenBuffer::PublishFrame
    (size=..., this=0x5555643950)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/GLScreenBuffer.h:229
#2  mozilla::embedlite::EmbedLiteCompositorBridgeParent::PresentOffscreenSurface
    (this=0x7fc49fde60)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:191
#3  0x0000007ff367fc0c in mozilla::embedlite::nsWindow::PostRender
    (this=0x7fc49fc290, aContext=<optimized out>)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:248
#4  0x0000007ff2a64ec0 in mozilla::widget::InProcessCompositorWidget::PostRender
    (this=0x7fc4b8df00, aContext=0x7f1f970848)
    at ${PROJECT}/gecko-dev/widget/InProcessCompositorWidget.cpp:60
#5  0x0000007ff128f9f4 in mozilla::layers::LayerManagerComposite::Render
    (this=this@entry=0x7ed41bb450, aInvalidRegion=..., aOpaqueRegion=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/Compositor.h:575
#6  0x0000007ff128fe70 in mozilla::layers::LayerManagerComposite::
    UpdateAndRender (this=this@entry=0x7ed41bb450)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/LayerManagerComposite.cpp:657
#7  0x0000007ff1290220 in mozilla::layers::LayerManagerComposite::EndTransaction
    (this=this@entry=0x7ed41bb450, aTimeStamp=..., 
    aFlags=aFlags@entry=mozilla::layers::LayerManager::END_DEFAULT)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/LayerManagerComposite.cpp:572
#8  0x0000007ff12d19bc in mozilla::layers::CompositorBridgeParent::
    CompositeToTarget (this=0x7fc49fde60, aId=..., aTarget=0x0,
    aRect=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#9  0x0000007ff12b7104 in mozilla::layers::CompositorVsyncScheduler::Composite
    (this=0x7fc429cac0, aVsyncEvent=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorVsyncScheduler.cpp:256
#10 0x0000007ff12af57c in mozilla::detail::RunnableMethodArguments<mozilla::
    VsyncEvent>::applyImpl<mozilla::layers::CompositorVsyncScheduler, void
    (mozilla::layers::CompositorVsyncScheduler::*)(mozilla::VsyncEvent const&),
    StoreCopyPassByConstLRef<mozilla::VsyncEvent>, 0ul> (args=...,
    m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:887
[...]
#22 0x0000007ff6a0489c in ?? () from /lib64/libc.so.6
(gdb) 
Both are largely similar, both are initially triggered from the vsync scheduler CompositorVsyncScheduler(), which makes sense for a render update pipeline. But there are some differences too. In between the vsync scheduler and the layer manager's EndTransaction() call we have this on ESR 78:
#7  LayerManagerComposite::EndTransaction, LayerManagerComposite.cpp:536
#8  CompositorBridgeParent::CompositeToTarget, mozilla/RefPtr.h:313
#9  EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget,
    EmbedLiteCompositorBridgeParent.cpp:159
#10 CompositorVsyncScheduler::Composite, CompositorVsyncScheduler.cpp:249
Whereas on ESR 91 we have this:
#7  LayerManagerComposite::EndTransaction, LayerManagerComposite.cpp:572
#8  CompositorBridgeParent::CompositeToTarget, RefPtr.h:313
#9  CompositorVsyncScheduler::Composite, CompositorVsyncScheduler.cpp:256
It could be that this is a completely benign change, either due to us hitting a breakpoint at a slightly different point in the cycle, or due to upstream changes that are unrelated to the rendering issues we're experiencing. But I'm honing in on it because I remember there being differences in the way the target is set up and this immediately looks suspicious to me. Especially suspicious is the fact that EmbedLiteCompositorBridgeParent has been written out of the ESR 91 execution flow. That's Sailfish-specific code, so that could well indicate a problem.

So let's try and find out why. In ESR 78 the code that's being called from CompositorVsyncScheduler::Composite() is the following:
    // Tell the owner to do a composite
    mVsyncSchedulerOwner->CompositeToDefaultTarget(aId);
    mVsyncNotificationsSkipped = 0;
In ESR 91 we have a strange addition to this.
    // Tell the owner to do a composite
    mVsyncSchedulerOwner->CompositeToTarget(aVsyncEvent.mId, nullptr, nullptr);
    mVsyncSchedulerOwner->CompositeToDefaultTarget(aVsyncEvent.mId);

    mVsyncNotificationsSkipped = 0;
The extra line spacing makes this look very intentional, but the real question I'd like to know the answer to is: "was this change done by upstream or by me?". If it was upstream then it's almost certainly intentional. If it was me, well, then it could well be a mistake. We can find out, as always, using git.
 git blame gfx/layers/ipc/CompositorVsyncScheduler.cpp -L 255,259
Blaming lines:   1% (5/370), done.
7a2ef4343bb1d (Kartikaya Gupta       2018-02-01 16:28:53 -0500 255)
    // Tell the owner to do a composite
97287dc1b1d82 (Markus Stange         2020-07-18 05:17:39 +0000 256)
    mVsyncSchedulerOwner->CompositeToTarget(aVsyncEvent.mId, nullptr, nullptr);
0acadeba1ac39 (David Llewellyn-Jones 2023-08-28 14:55:57 +0100 257)
    mVsyncSchedulerOwner->CompositeToDefaultTarget(aVsyncEvent.mId);
7a2ef4343bb1d (Kartikaya Gupta       2018-02-01 16:28:53 -0500 258)
133e28473a0f8 (Sotaro Ikeda          2016-11-18 02:37:04 -0800 259)
    mVsyncNotificationsSkipped = 0;
Well, that's interesting. There is a line inserted by me, but it's not the line I was expecting. It looks very much like I added the line and intended to remove the line before, but forgot. I'm going to rectify this.
$ git diff gfx/layers/ipc/CompositorVsyncScheduler.cpp
diff --git a/gfx/layers/ipc/CompositorVsyncScheduler.cpp
           b/gfx/layers/ipc/CompositorVsyncScheduler.cpp
index 2e8e58a2c46b..3abe24ceeeea 100644
--- a/gfx/layers/ipc/CompositorVsyncScheduler.cpp
+++ b/gfx/layers/ipc/CompositorVsyncScheduler.cpp
@@ -253,9 +253,7 @@ void CompositorVsyncScheduler::Composite
    (const VsyncEvent& aVsyncEvent) {
     mLastComposeTime = SampleTime::FromVsync(aVsyncEvent.mTime);
 
     // Tell the owner to do a composite
-    mVsyncSchedulerOwner->CompositeToTarget(aVsyncEvent.mId, nullptr, nullptr);
     mVsyncSchedulerOwner->CompositeToDefaultTarget(aVsyncEvent.mId);
-
     mVsyncNotificationsSkipped = 0;
 
     TimeDuration compositeFrameTotal = TimeStamp::Now() - aVsyncEvent.mTime;
Line removed. Now to build and see what happens when we try to run it.
(gdb) b GLScreenBuffer::Swap
Breakpoint 1 at 0x7ff1106f8c: file ${PROJECT}/gecko-dev/gfx/gl/
    GLScreenBuffer.cpp, line 506.
(gdb) c
Continuing.
[LWP 22665 exited]
[New LWP 22740]
[Switching to LWP 22660]

Thread 36 "Compositor" hit Breakpoint 1, mozilla::gl::GLScreenBuffer::Swap
    (this=this@entry=0x5555642d50, size=...)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:506
506     ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:
    No such file or directory.
(gdb) bt
#0  mozilla::gl::GLScreenBuffer::Swap (this=this@entry=0x5555642d50, size=...)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:506
#1  0x0000007ff3666788 in mozilla::gl::GLScreenBuffer::PublishFrame
    (size=..., this=0x5555642d50)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/GLScreenBuffer.h:229
#2  mozilla::embedlite::EmbedLiteCompositorBridgeParent::PresentOffscreenSurface
    (this=0x7fc49fde10)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:191
#3  0x0000007ff367fbec in mozilla::embedlite::nsWindow::PostRender
    (this=0x7fc49fc240, aContext=<optimized out>)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:248
#4  0x0000007ff2a64ea4 in mozilla::widget::InProcessCompositorWidget::PostRender
    (this=0x7fc4b88770, aContext=0x7f177f87e8)
    at ${PROJECT}/gecko-dev/widget/InProcessCompositorWidget.cpp:60
#5  0x0000007ff128f9f4 in mozilla::layers::LayerManagerComposite::Render
    (this=this@entry=0x7edc1bb450, aInvalidRegion=..., aOpaqueRegion=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/
    Compositor.h:575
#6  0x0000007ff128fe70 in mozilla::layers::LayerManagerComposite::
    UpdateAndRender (this=this@entry=0x7edc1bb450)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/LayerManagerComposite.cpp:657
#7  0x0000007ff1290220 in mozilla::layers::LayerManagerComposite::EndTransaction
    (this=this@entry=0x7edc1bb450, aTimeStamp=..., 
    aFlags=aFlags@entry=mozilla::layers::LayerManager::END_DEFAULT)
    at ${PROJECT}/gecko-dev/gfx/layers/composite/LayerManagerComposite.cpp:572
#8  0x0000007ff12d19a0 in mozilla::layers::CompositorBridgeParent::
    CompositeToTarget (this=0x7fc49fde10, aId=..., aTarget=0x0,
    aRect=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#9  0x0000007ff3666488 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7fc49fde10, aId=...)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:165
#10 0x0000007ff12b70fc in mozilla::layers::CompositorVsyncScheduler::Composite
    (this=0x7fc4b82ef0, aVsyncEvent=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorVsyncScheduler.cpp:256
#11 0x0000007ff12af57c in mozilla::detail::RunnableMethodArguments<mozilla::
    VsyncEvent>::applyImpl<mozilla::layers::CompositorVsyncScheduler, void
    (mozilla::layers::CompositorVsyncScheduler::*)(mozilla::VsyncEvent const&),
    StoreCopyPassByConstLRef<mozilla::VsyncEvent>, 0ul> (args=...,
    m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:887
[...]
#23 0x0000007ff6a0489c in ?? () from /lib64/libc.so.6
(gdb) 
Let's strip out the part we're interested in.
#7  LayerManagerComposite::EndTransaction, LayerManagerComposite.cpp:572
#8  CompositorBridgeParent::CompositeToTarget, RefPtr.h:313
#9  EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget,
    EmbedLiteCompositorBridgeParent.cpp:165
#10 CompositorVsyncScheduler::Composite, CompositorVsyncScheduler.cpp:256
If we compare this with the previous backtrace from ESR 78 we can see that's now aligning fully. Unfortunately, despite fixing this issue, it doesn't give us a working render: the screen is still just plain white. But it will be one step on the way to fixing things fully.

I'm going to leave it there for today. Tomorrow we'll look further into the differences between ESR 78 and ESR 91.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.

Comments

Uncover Disqus comments