flypig.co.uk

Personal Blog

View the blog index.

RSS feed Click the icon for the blog RSS feed.

Blog

5 most recent items

21 May 2024 : Day 239 #
I got a bit carried away yesterday delving into the reasons why QuickMozView::updatePaintNode() was failing to fire on ESR 91. My conclusion was that it came down to the mIsPainted flag, part of the QMozViewPrivate class and which was remaining stubbornly set to false when running ESR 91.

While in this state the invalidTexture flag inside updatePaintNode() never gets set to false since it's defined like this:
    const bool invalidTexture = !mComposited
            || !d->mIsPainted
            || !d->mViewInitialized
            || !d->mHasCompositor
            || !d->mContext->registeredWindow()
            || !d->mMozWindow;
Notice how if any single value in this list is set to false then the combined value of invalidTexture will be set to true. The consequence of this is that the following clause is always executed:
    if (!mTexture && invalidTexture) {
        QSGSimpleRectNode *node = static_cast<QSGSimpleRectNode *>(oldNode);
        if (!node) {
            node = new QSGSimpleRectNode;
        }
        node->setColor(d->mBackgroundColor);
        node->setRect(boundingRect);

        return node;
    }
The return at the end of this means the updatePaintNode() method returns early. This means that the code beyond the clause never gets executed and crucially this section is never entered:
    if (!node) {
        QMozExtTexture * const texture = new QMozExtTexture;
        mTexture = texture;

        connect(texture, &QMozExtTexture::getPlatformImage, d->mMozWindow, 
            &QMozWindow::getPlatformImage, Qt::DirectConnection);

        node = new MozExtMaterialNode;

        node->setTexture(mTexture);
    }
Here we can see both mTexture and the MozExtMaterialNode object being created. That's what we need for the rendering to proceed. So if this portion of code is never executed, we'll never get the rendered page to appear.

That all sounds very plausible and comprehensible, but it begs the question: "why is mIsPainted never being set to true?". That's what I plan to find out over the coming days.

Unsurprisingly we can see the initial value it's set to in the QMozViewPrivate constructor, where it's initially set to false:
QMozViewPrivate::QMozViewPrivate(IMozQViewIface *aViewIface, QObject *publicPtr)
    : mViewIface(aViewIface)
[...]
    , mIsPainted(false)
[...]
It's also set to false after a call to QMozViewPrivate::reset():
QMozViewPrivate::reset()
{
    if (mIsPainted) {
        mIsPainted = false;
        mViewIface->firstPaint(-1, -1);
    }
[...]
But the important part is where it gets set to true and that only happens in one place; here in the OnFirstPaint() method:
void QMozViewPrivate::OnFirstPaint(int32_t aX, int32_t aY)
{
    mIsPainted = true;
    mViewIface->firstPaint(aX, aY);
}
This method doesn't get explicitly called anywhere in the QtMozEmbed code, so to find out where it's actually getting used I'm going to drop to the debugger on my ESR 78 device.
(gdb) break QMozViewPrivate::OnFirstPaint
Breakpoint 13 at 0x7fbfbf56b8: file qmozview_p.cpp, line 1113.
(gdb) r
[...]
Thread 1 &quot;harbour-webview&quot; hit Breakpoint 13, QMozViewPrivate::
    OnFirstPaint (this=0x55557b08f0, aX=0, aY=0) at qmozview_p.cpp:1113
1113        mIsPainted = true;
(gdb) bt                         
#0  QMozViewPrivate::OnFirstPaint (this=0x55557b08f0, aX=0, aY=0) at 
    qmozview_p.cpp:1113
#1  0x0000007fbb2f5d90 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#2  0x0000007fb8932f50 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#3  0x0000007fb8920238 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#4  0x0000007fb88428cc in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#5  0x0000007fb88482d8 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#6  0x0000007fb8849d58 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#7  0x0000007fb8809d48 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#8  0x0000007fb880e740 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#9  0x0000007fbfbed924 in MessagePumpQt::HandleDispatch (this=0x555578ef30) at 
    qmessagepump.cpp:63
#10 0x0000007fbfbeda9c in MessagePumpQt::event (this=<optimized out>, 
    e=<optimized out>) at qmessagepump.cpp:51
#11 0x0000007fbebe1144 in QCoreApplication::notify(QObject*, QEvent*) () from /
    usr/lib64/libQt5Core.so.5
#12 0x0000007fbebe12e8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (
    ) from /usr/lib64/libQt5Core.so.5
#13 0x0000007fbebe36b8 in QCoreApplicationPrivate::sendPostedEvents(QObject*, 
    int, QThreadData*) () from /usr/lib64/libQt5Core.so.5
[...]
#21 0x0000005555557bcc in main ()
(gdb) 
Well that's an unhelpful mess. But it is at least getting called from inside the Gecko code and that in itself is both helpful to know and a good thing (I was concerned it might be called from inside Qt which would have made it considerably harder to track down; the fact it's coming from Gecko means we can skip a layer of Qt indirection and get closer to the source of the problem more quickly).

But as you can see the Gecko debug symbols are messed up: we can see eight calls to methods inside libxul in a row, but it's not telling us the names of the methods.

So I'm going to reinstall the packages, including the debug packages, to get the symbols back. Before the gap I was using a custom build with extra debug output and exporting of textures, but I don't need any of that any more so I can just reinstall the packages from the official repositories. Much easier and quicker than rebuilding the packages locally and installing them that way:
# zypper install --force xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla 
    xulrunner-qt5-debuginfo-78.15.1+git33.2-1.21.1.jolla xul
runner-qt5-debugsource-78.15.1+git33.2-1.21.1.jolla 
    xulrunner-qt5-misc-78.15.1+git33.2-1.21.1.jolla
Loading repository data...
Reading installed packages...
Forcing installation of 'xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64' 
    from repository 'jolla'.
Forcing installation of 
    'xulrunner-qt5-debuginfo-78.15.1+git33.2-1.21.1.jolla.aarch64' from 
    repository 'jolla'.
Forcing installation of 
    'xulrunner-qt5-debugsource-78.15.1+git33.2-1.21.1.jolla.aarch64' from 
    repository 'jolla'.
Forcing installation of 
    'xulrunner-qt5-misc-78.15.1+git33.2-1.21.1.jolla.aarch64' from repository 
    'jolla'.
[...]
2 packages to downgrade, 2 to reinstall, 2  to change vendor.
Overall download size: 758.8 MiB. Already cached: 0 B. After the operation, 2.4 
    GiB will be freed.
Continue? [y/n/v/...? shows all options] (y): y
[...]
Okay, with the debug symbols now properly aligned we can try getting the backtrace again, but hopefully this time with a bit more helpful detail.
Thread 1 &quot;harbour-webview&quot; hit Breakpoint 1, QMozViewPrivate::
    OnFirstPaint (this=0x55555cb830, aX=0, aY=0) at qmozview_p.cpp:1113
1113        mIsPainted = true;
(gdb) bt
#0  QMozViewPrivate::OnFirstPaint (this=0x55555cb830, aX=0, aY=0) at 
    qmozview_p.cpp:1113
#1  0x0000007fbb2f5d90 in mozilla::embedlite::EmbedLiteViewParent::
    RecvOnFirstPaint (this=0x55555d9d70, aX=@0x7fffffea18: 0, aY=@0x7fffffea28: 
    0)
    at mobile/sailfishos/embedshared/EmbedLiteViewParent.cpp:237
#2  0x0000007fb8932f50 in mozilla::embedlite::PEmbedLiteViewParent::
    OnMessageReceived (this=0x55555d9d70, msg__=...) at 
    PEmbedLiteViewParent.cpp:1600
#3  0x0000007fb8920238 in mozilla::embedlite::PEmbedLiteAppParent::
    OnMessageReceived (this=<optimized out>, msg__=...)
    at obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:866
#4  0x0000007fb88428cc in mozilla::ipc::MessageChannel::DispatchAsyncMessage (
    this=this@entry=0x7f8c888578, aProxy=aProxy@entry=0x555560d1d0, aMsg=...)
    at obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:866
#5  0x0000007fb88482d8 in mozilla::ipc::MessageChannel::DispatchMessage (
    this=0x7f8c888578, aMsg=...)
    at ipc/glue/MessageChannel.cpp:2100
#6  0x0000007fb8849b50 in mozilla::ipc::MessageChannel::RunMessage (
    this=<optimized out>, aTask=...)
    at ipc/glue/MessageChannel.cpp:1959
#7  0x0000007fb8849d58 in mozilla::ipc::MessageChannel::MessageTask::Run (
    this=0x7f8e127650)
    at obj-build-mer-qt-xr/dist/include/mozilla/ipc/MessageChannel.h:610
#8  0x0000007fb8809d48 in MessageLoop::RunTask (aTask=..., this=0x55557c5200)
    at ipc/chromium/src/base/message_loop.cc:487
#9  MessageLoop::DeferOrRunPendingTask (pending_task=..., this=0x55557c5200)
    at ipc/chromium/src/base/message_loop.cc:478
#10 MessageLoop::DeferOrRunPendingTask (this=0x55557c5200, pending_task=...)
    at ipc/chromium/src/base/message_loop.cc:476
#11 0x0000007fb880e740 in MessageLoop::DoWork (this=<optimized out>)
    at ipc/chromium/src/base/message_loop.cc:551
#12 MessageLoop::DoWork (this=0x55557c5200) at ipc/chromium/src/base/
    message_loop.cc:530
#13 0x0000007fbfbed924 in MessagePumpQt::HandleDispatch (this=0x55557a6f40) at 
    qmessagepump.cpp:63
#14 0x0000007fbfbeda9c in MessagePumpQt::event (this=<optimized out>, 
    e=<optimized out>) at qmessagepump.cpp:51
#15 0x0000007fbebe1144 in QCoreApplication::notify(QObject*, QEvent*) () from /
    usr/lib64/libQt5Core.so.5
#16 0x0000007fbebe12e8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (
    ) from /usr/lib64/libQt5Core.so.5
#17 0x0000007fbebe36b8 in QCoreApplicationPrivate::sendPostedEvents(QObject*, 
    int, QThreadData*) () from /usr/lib64/libQt5Core.so.5
[...]
#25 0x0000005555557bcc in main ()
(gdb) 
That's much better and gives us a much clearer idea of where OnFirstPaint() is getting called. But as we can see from this it's actually being triggered by receipt of a message. The next step will therefore be to find out where the OnFirstPaint message is being sent from.

For completeness I also performed the same checks on ESR 91. You'll not be surprised to hear that the QMozViewPrivate::OnFirstPaint() method is never called in the newer build.

The ESR 78 backtrace tells us what should be happening on ESR 91. So our task now is to find out why it's not. This takes us a step forwards and gives us something to look into further; we'll pick this avenue of investigation up again tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
20 May 2024 : Day 238 #
It's been an exciting and packed day of Jolla and Sailfish OS news today with the announcement of not one but two new Jolla devices, plus some significant changes to the way Sailfish OS will be developed and funded in the future. This isn't the place to dwell on these topics, but I am excited for the future of the operating system. I didn't want it to go without saying, but I do want to focus on Gecko here, so let's head straight in to development.

Yesterday I made what could be an important discovery. The QMozExtTexture::updateTexture() method is being called on ESR 78, but not on ESR 91. If I can track down the underlying reason for this, then it may well help getting the offscreen rendering back up and running again.

The next step is to find out at which point the failure is happening and try to understand why. Before we get into it I want to warn you that this is rather a long post, mostly containing debugging output which is dull at the best of times. But it does lead to some useful conclusions, so if you're interest to follow but don't want to have to read through all of the details, don't feel bad if you want to skip straight to the end!

If you're still reading this then I'll assume you're in it for the long haul. Don't say I didn't warn you though! So the best place to start this investigation is with the backtrace of the call. We saw this yesterday and it looked like this:
#0  QMozExtTexture::updateTexture (this=0x7f90024e30) at qmozexttexture.cpp:64
#1  0x0000007fbfc04f18 in MozMaterialNode::preprocess (this=0x7f9001ef60) at 
    qmozextmaterialnode.cpp:81
#2  0x0000007fbf8f0c44 in QSGRenderer::preprocess() () from /usr/lib64/
    libQt5Quick.so.5
#3  0x0000007fbf8f0e5c in QSGRenderer::renderScene(QSGBindable const&) () from /
    usr/lib64/libQt5Quick.so.5
#4  0x0000007fbf8f14c0 in QSGRenderer::renderScene(unsigned int) () from /usr/
    lib64/libQt5Quick.so.5
#5  0x0000007fbf8ffe58 in QSGRenderContext::renderNextFrame(QSGRenderer*, 
    unsigned int) () from /usr/lib64/libQt5Quick.so.5
#6  0x0000007fbf9429d0 in QQuickWindowPrivate::renderSceneGraph(QSize const&) (
    ) from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fbf916c04 in ?? () from /usr/lib64/libQt5Quick.so.5
#8  0x0000007fbf91cc10 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fbea290e8 in ?? () from /usr/lib64/libQt5Core.so.5
#10 0x0000007fb74b0a4c in start_thread (arg=0x7fffffe9bf) at pthread_create.c:
    479
#11 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
This backtrace is, of course, from the ESR 78 build. There is no equivalent backtrace for the ESR 91 build because the QMozExtTexture::updateTexture() method never gets called in ESR 91. But maybe some of the other methods in the backtrace are being called. Let's find out.

To establish this I'm putting breakpoints on each of the functions in the backtrace and then running the ESR 91 build. If one of them hits I move to one higher up the stack (that is, the next line up that has a smaller number). Eventually one will fail to hit and at that point I'll know where the failure is happening. So here goes.
Thread 68 &quot;QSGRenderThread&quot; hit Breakpoint 4, 0x0000007ff7b6e984 in 
    QQuickWindowPrivate::renderSceneGraph(QSize const&)@plt ()
   from /usr/lib64/libQt5Quick.so.5
(gdb) delete break 4
(gdb) b QSGRenderContext::renderNextFrame
Breakpoint 5 at 0x7fe82e4474 (2 locations)
(gdb) c
Continuing.

Thread 68 &quot;QSGRenderThread&quot; hit Breakpoint 5, 0x0000007fe82e4474 in 
    QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int)@plt ()
   from /usr/lib64/qt5/plugins/scenegraph/libcustomcontext.so
(gdb) delete break 5
(gdb) b QSGRenderer::preprocess
Breakpoint 6 at 0x7ff7be7a50
(gdb) c
Continuing.

Thread 68 &quot;QSGRenderThread&quot; hit Breakpoint 6, 0x0000007ff7be7a50 in 
    QSGRenderer::preprocess() () from /usr/lib64/libQt5Quick.so.5
(gdb) delete break 6
(gdb) b MozMaterialNode::preprocess
Breakpoint 7 at 0x7ff7ef899c
(gdb) c
Continuing.
frame008.data: Colour before: (0, 0, 0, 255), 1
frame009.data: Colour before: (0, 0, 0, 255), 1
frame010.data: Colour before: (0, 0, 0, 255), 1
frame011.data: Colour before: (0, 0, 0, 255), 1
[...]
As we can see from the above, the QQuickWindowPrivate::renderSceneGraph() method does get called. So does the QSGRenderContext::renderNextFrame() method. but the QSGRenderer::preprocess() method? That never gets called.

That means the failure is happening inside the QSGRenderContext::renderNextFrame() method. This call isn't part of the Gecko code, or indeed part of the QtMozEmbed code, but rather part of the Qt stack. In particular, part of the "Qt Scene Graph" stack (hence the "QSG" prefix of the method).

I'm not super familiar with how the Qt Scene Graph does it's thing, but there is plenty of good documentation out there. So I've done some reading around to get a better idea.

One part sprang out at me, since it specifically relates to the method that isn't getting called. The explanation that follows is taken from the Qt documentation:
 
void QSGNode::preprocess()

Override this function to do processing on the node before it is rendered.

Preprocessing needs to be explicitly enabled by setting the flag QSGNode::UsePreprocess. The flag needs to be set before the node is added to the scene graph and will cause the preprocess() function to be called for every frame the node is rendered.

So maybe the problem is that the QSGNode::UsePreprocess flag is never being set? A quick scan of the QtMozEmbed code throws up the following bit of code for controlling the flag, in the qmozextmaterialnode.cpp file:
MozMaterialNode::MozMaterialNode()
{
    setFlag(UsePreprocess);

    setGeometry(&m_geometry);
}
So, we should try to establish whether this is getting called or not. Running with some breakpoints shows that it is indeed getting called on the ESR 78 build.
(gdb) break MozMaterialNode::MozMaterialNode
Breakpoint 2 at 0x7fbfbdadc4 (2 locations)
(gdb) r
[...]
Thread 11 &quot;QSGRenderThread&quot; hit Breakpoint 2, MozMaterialNode::
    MozMaterialNode (this=this@entry=0x7f0001ef60) at qmozextmaterialnode.cpp:27
27      MozMaterialNode::MozMaterialNode()
(gdb) n
619     /usr/include/qt5/QtCore/qrect.h: No such file or directory.
(gdb) 
27      MozMaterialNode::MozMaterialNode()
(gdb) 
31          setGeometry(&m_geometry);
(gdb) 
That's no big surprise, since if it were not being called the other parts of the render pipeline would also be failing and they're not. How about on ESR 91 though? We know the render pipeline isn't working there and that could be because this call isn't happening. Let's find out by adding the same breakpoint on the ESR 91 build and executing it just as we did for ESR 78:
(gdb) break MozMaterialNode::MozMaterialNode
Breakpoint 8 at 0x7ff7ece9e4 (2 locations)
(gdb) r
[...]
frame014.data: Colour before: (8, 0, 152, 255), 1
frame015.data: Colour before: (1, 0, 255, 255), 1
frame016.data: Colour before: (223, 5, 0, 255), 1
frame017.data: Colour before: (71, 94, 162, 255), 1
^C
Thread 1 &quot;harbour-webview&quot; received signal SIGINT, Interrupt.
0x0000007ff69f8740 in poll () from /lib64/libc.so.6
(gdb) info break
Num     Type           Disp Enb Address            What
8       breakpoint     keep y   <MULTIPLE>         
8.1                         y     0x0000007ff7ece9e4 <MozMaterialNode::
    MozMaterialNode()@plt+4>
8.2                         y     0x0000007ff7ef9028 <MozMaterialNode::
    MozMaterialNode()+16>
(gdb) 
Here the breakpoint doesn't get called. You can see that I double-checked that the breakpoint is set correctly and it is. So this is looking very much like a smoking gun.

We'll use the same approach we used before to try to figure out why this isn't getting called. We take the backtrace of the ESR 78 build where it is getting called and work our way up through it until we find out what's failing on the ESR 91 side. Here's the backtrace, taken from ESR 78:
#0  MozMaterialNode::MozMaterialNode (this=this@entry=0x7f0001ef60) at 
    qmozextmaterialnode.cpp:31
#1  0x0000007fbfc05834 in MozExtMaterialNode::MozExtMaterialNode (
    this=0x7f0001ef60) at qmozextmaterialnode.cpp:376
#2  0x0000007fbfc03ea4 in QuickMozView::updatePaintNode (this=0x55558433a0, 
    oldNode=<optimized out>)
    at /usr/include/xulrunner-qt5-78.15.1/mozilla/cxxalloc.h:33
#3  0x0000007fbf941c50 in QQuickWindowPrivate::updateDirtyNode(QQuickItem*) () 
    from /usr/lib64/libQt5Quick.so.5
#4  0x0000007fbf94214c in QQuickWindowPrivate::updateDirtyNodes() () from /usr/
    lib64/libQt5Quick.so.5
#5  0x0000007fbf943270 in QQuickWindowPrivate::syncSceneGraph() () from /usr/
    lib64/libQt5Quick.so.5
#6  0x0000007fbf9164a8 in ?? () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fbf917134 in ?? () from /usr/lib64/libQt5Quick.so.5
#8  0x0000007fbf91cc10 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fbea290e8 in ?? () from /usr/lib64/libQt5Core.so.5
#10 0x0000007fb74b0a4c in start_thread (arg=0x7fffffe9bf) at pthread_create.c:
    479
#11 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
It's important to note that the call at the top to MozMaterialNode::MozMaterialNode() — which is the one we're interested in — isn't a normal call, it's a constructor. It's not being called directly, but rather as a side-effect of a call to the MozExtMaterialNode class constructor. This inherits from the MozMaterialNode class as we can see in this code copied from qozextmaterialnode.cpp:
class MozExtMaterialNode : public MozMaterialNode
{
public:
    MozExtMaterialNode();
    ~MozExtMaterialNode();
[...]
As an aside, there's also a MozRgbMaterialNode class, but it looks like we're not using that:
class MozRgbMaterialNode : public MozMaterialNode
{
public:
    MozRgbMaterialNode();
    ~MozRgbMaterialNode();
[...]
A quick check confirms that it's definitely only the MozExtMaterialNode class in use for us here:
(gdb) break MozRgbMaterialNode
Breakpoint 3 at 0x7fbfc056a0: file qmozextmaterialnode.cpp, line 176.
(gdb) b MozExtMaterialNode
Breakpoint 4 at 0x7fbfbdc174 (2 locations)
(gdb) r
[...]
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 4, 0x0000007fbfbdc174 in 
    MozExtMaterialNode::MozExtMaterialNode()@plt () from /usr/lib64/
    libqt5embedwidget.so.1
(gdb)                            
Another check of the backtrace and an examination of the code shows that the place where this is constructed is inside the QuickMozView::updatePaintNode() method; part of QtMozEmbed. We're going to be spending a bit of time inside this code, so it's worth taking a look for yourself at the source. But for reference, here's the full source for the method we're dealing with:
QSGNode *
QuickMozView::updatePaintNode(QSGNode *oldNode, UpdatePaintNodeData *)
{
    // If the dimensions are entirely invalid return no node.
    if (width() <= 0 || height() <= 0) {
        delete oldNode;

        delete mTexture;
        mTexture = nullptr;

        return nullptr;
    }

    const bool invalidTexture = !mComposited
            || !d->mIsPainted
            || !d->mViewInitialized
            || !d->mHasCompositor
            || !d->mContext->registeredWindow()
            || !d->mMozWindow;

    if (mTexture && invalidTexture) {
        delete oldNode;
        oldNode = nullptr;

        delete mTexture;
        mTexture = nullptr;
    }

    QRectF boundingRect(d->renderingOffset(), d->mSize);

    if (!mTexture && invalidTexture) {
        QSGSimpleRectNode *node = static_cast<QSGSimpleRectNode *>(oldNode);
        if (!node) {
            node = new QSGSimpleRectNode;
        }
        node->setColor(d->mBackgroundColor);
        node->setRect(boundingRect);

        return node;
    }

    if (!mTexture) {
        delete oldNode;
        oldNode = nullptr;
    }

    MozMaterialNode *node = static_cast<MozMaterialNode *>(oldNode);

    if (!node) {
#if defined(QT_OPENGL_ES_2)
        QMozExtTexture * const texture = new QMozExtTexture;
        mTexture = texture;

        connect(texture, &QMozExtTexture::getPlatformImage, d->mMozWindow, 
    &QMozWindow::getPlatformImage, Qt::DirectConnection);

        node = new MozExtMaterialNode;
#else
#warning &quot;Implement me for non ES2 platform&quot;
//        node = new MozRgbMaterialNode;
        return nullptr;
#endif
Note the call there to new MozExtMaterialNode. That's the line we're really interested in. On ESR 78 when we step through the code we initially get an execution flow that enters the (!mTexture && invalidTexture) clause. Once we're in there we know the method is going to return early and the MozExtMaterialNode constructor isn't going to get called until we re-enter the method.

After a few cycles through like this, there's suddenly a change. While the mTexture value remains unset, the invalidTexture value flips to false which means the clause gets skipped. We then end up inside the (!node) section, which is where things get interesting. At that point the texture is created, as is the MozExtMaterialNode object and now things are properly set up for rendering.

Let's break this down a bit further. Here's what the initial few cycles look like:
(gdb) break QuickMozView::updatePaintNode
Breakpoint 8 at 0x7faf06f764 (2 locations)
(gdb) r
[...]
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 8, QuickMozView::
    updatePaintNode (this=0x55558454e0, oldNode=0x0) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) break quickmozview.cpp:214
Breakpoint 9 at 0x7fbfc03fac: file quickmozview.cpp, line 214.
(gdb) c
Continuing.
[...]
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 9, QuickMozView::
    updatePaintNode (this=0x55558454e0, oldNode=0x7f0801eab0) at 
    quickmozview.cpp:216
216         if (!node) {
(gdb) 
Stepping through this section more carefully we can see why the invalidTexture flag is set to true in all these cases:
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 12, QuickMozView::
    updatePaintNode (this=0x5555842660, oldNode=0x7f0800c400) at 
    quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) p mComposited
$23 = true
(gdb) p d->mIsPainted
$24 = false
(gdb) p d->mViewInitialized
$25 = true
(gdb) p d->mHasCompositor
$26 = true
(gdb) p d->mContext->registeredWindow()
$27 = (QMozWindow *) 0x555560ce60
(gdb) p d->mMozWindow
$28 = {wp = {d = 0x5555bbe8e0, value = 0x555560ce60}}
(gdb) 
Crucially here, we can see that p d->mIsPainted is set to false. While any of the items shown here are set to false the invalidTexture flag will be set to true. But then after three or four cycles like this the d->mIsPainted flag suddenly flips to true at which point everything else flows through as we want.
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 12, QuickMozView::
    updatePaintNode (this=0x5555842660, oldNode=0x7f0800c400) at 
    quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) p d->mIsPainted
$37 = true
(gdb) n
181         const bool invalidTexture = !mComposited
(gdb) 
186                 || !d->mMozWindow;
(gdb) 
196         QRectF boundingRect(d->renderingOffset(), d->mSize);
(gdb) 
198         if (!mTexture && invalidTexture) {
(gdb) 
210             delete oldNode;
(gdb) 
218             QMozExtTexture * const texture = new QMozExtTexture;
(gdb) 
219             mTexture = texture;
(gdb) 
221             connect(texture, &QMozExtTexture::getPlatformImage, 
    d->mMozWindow, &QMozWindow::getPlatformImage, Qt::DirectConnection);
(gdb) p mTexture
$38 = (QSGTexture *) 0x7f08024070
(gdb) n
223             node = new MozExtMaterialNode;
(gdb) 
230             node->setTexture(mTexture);
(gdb) 
Nice! But this never happens on ESR 91. In the case of ESR 91 the d->mIsPainted flag remains set to false throughout. Consequently we never get to the point where the texture or MozExtMaterialNode object are created on ESR 91:
(gdb) break QuickMozView::updatePaintNode
Breakpoint 9 at 0x7fe790b764 (2 locations)
(gdb) r
[...]
Thread 9 &quot;QSGRenderThread&quot; hit Breakpoint 1, QuickMozView::
    updatePaintNode (this=0x555586cf70, oldNode=0x0) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {

(gdb) break quickmozview.cpp:223
Breakpoint 11 at 0x7fbfc03ea4: file quickmozview.cpp, line 230.
(gdb) r
[...]
We end up just getting this same sequence again and again and again:
Thread 9 &quot;QSGRenderThread&quot; hit Breakpoint 7, QuickMozView::
    updatePaintNode (this=0x555586d0d0, oldNode=0x7fc800c640) at 
    quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) n
181         const bool invalidTexture = !mComposited
(gdb) p mComposited
$7 = true
(gdb) p d->mIsPainted
$8 = false
(gdb) n
188         if (mTexture && invalidTexture) {
(gdb) p mTexture
$9 = (QSGTexture *) 0x0
(gdb) n
196         QRectF boundingRect(d->renderingOffset(), d->mSize);
(gdb) 
198         if (!mTexture && invalidTexture) {
(gdb) 
200             if (!node) {
(gdb) 
203             node->setColor(d->mBackgroundColor);
(gdb) 
204             node->setRect(boundingRect);
(gdb) 
206             return node;
(gdb) c
Clearly the question we have to answer is "why is d->mIsPainted never set to true?". If we can answer this, who knows, we might even be on the right path to solving the rendering problem.

This has been a rather long investigation today already, so I'm going to leave it there for now. But tomorrow I'm going to come back to this in order to try to answer this question. This feels like it could be a very fruitful line of enquiry!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
19 May 2024 : Day 237 #
I was easing myself gently back into Gecko development yesterday with a quick look over some of the work that others in the Sailfish community have been doing while I've been having a break, related to deciphering the messed up textures that are coming out of the render surface.

Today I'm back looking at the code in earnest. And it's been a fruitful process. My starting point has been to take a look not at the Gecko code, but from the other end: starting with QtMozEmbed and sailfish-components-webview. Taking things easy, I'm just browsing through the code, trying to find where the Surface, as used by Gecko, connects with the texture rendered to the screen by the WebView. If I can figure that out I'll feel like I'm on solid ground.

It looks like a key piece of the puzzle happens in QMozExtTexture::updateTexture(). For example, on ESR 78, when rendering is taking place, the method gets hit often. So often that it appears to be every frame:
Thread 9 &quot;QSGRenderThread&quot; hit Breakpoint 1, QMozExtTexture::
    updateTexture (this=0x7f90024e30) at qmozexttexture.cpp:64
64      {
(gdb) bt
#0  QMozExtTexture::updateTexture (this=0x7f90024e30) at qmozexttexture.cpp:64
#1  0x0000007fbfc04f18 in MozMaterialNode::preprocess (this=0x7f9001ef60) at 
    qmozextmaterialnode.cpp:81
#2  0x0000007fbf8f0c44 in QSGRenderer::preprocess() () from /usr/lib64/
    libQt5Quick.so.5
#3  0x0000007fbf8f0e5c in QSGRenderer::renderScene(QSGBindable const&) () from /
    usr/lib64/libQt5Quick.so.5
#4  0x0000007fbf8f14c0 in QSGRenderer::renderScene(unsigned int) () from /usr/
    lib64/libQt5Quick.so.5
#5  0x0000007fbf8ffe58 in QSGRenderContext::renderNextFrame(QSGRenderer*, 
    unsigned int) () from /usr/lib64/libQt5Quick.so.5
#6  0x0000007fbf9429d0 in QQuickWindowPrivate::renderSceneGraph(QSize const&) (
    ) from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fbf916c04 in ?? () from /usr/lib64/libQt5Quick.so.5
#8  0x0000007fbf91cc10 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fbea290e8 in ?? () from /usr/lib64/libQt5Core.so.5
#10 0x0000007fb74b0a4c in start_thread (arg=0x7fffffe9bf) at pthread_create.c:
    479
#11 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
This contrasts with the ESR 91 build, where the same breakpoint is never hit. I placed a breakpoint on all of the updateTexture() methods just to be sure:
Num Disp Enb Address       What
3   keep y   <MULTIPLE>         
3.1      y   0x07ff7b64ba4 <QSGDistanceFieldGlyphCache::updateTexture()@plt+4>
3.2      y   0x07ff7b64d64 <QSGDefaultPainterNode::updateTexture()@plt+4>
3.3      y   0x07ff7bf071c <QSGDefaultPainterNode::updateTexture()+20>
3.4      y   0x07ff7bf5230 <QSGDistanceFieldGlyphCache::updateTexture()+24>
3.5      y   0x07ff7c1babc <QSGDefaultLayer::updateTexture()+12>
3.6      y   0x07ff7ef846c <QMozExtTexture::updateTexture()+20>
The fact there are hits for ESR 78, but not for ESR 91, is definitely of interest.

This isn't something that's come up before. Until now I've focused almost exclusively on getting the texture rendered to at the other end. But it looks now like the QtMozEmbed code is failing somewhere; the render update isn't being called. Just to be clear though, this isn't due to any change that I've made in QtMozEmbed. At least that seems unlikely at any rate, given I've not made an notable changes to these parts of the code.

More likely, some part of the initialisation process in the Gecko code is failing, or not happening as it should. Something like that could well lead to a situation where the render update doesn't get called. For example, it could be that the texture is never set as part of the Qt Scene Graph. But that's speculation, it could be all sorts of things.

I'm rather excited by this. Maybe this is the key problem we've been searching for? But I don't want to go too far today while I'm still getting myself up to speed with things. So I'll continue looking in to this further tomorrow.

Also worth mentioning is that tomorrow is Jolla Love Day 2. I'm excited to find out what Jolla have in store for us and while I won't be able to attend in-person, I'll definitely be there both online and in spirit!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
18 May 2024 : Day 236 #
It's been two weeks since I last published an entry in my Gecko dev diary series. The gap was necessary for me to fit various other things into my life which just weren't going to be compatible with me working on the Gecko code, specifically my attendance of the — HPC/AI Days — conference in Durham for a week, followed by preparations for a couple of Pint of Science events I was involved with during this last week. This was my first involvement with Pint of Science and I have to admit it was a lot of fun.

The break from Gecko offered me some healthy respite, but I promised to start back up again today and here we are. I've not yet got back in to the rhythm of Gecko development yet, so my first few entries may be somewhat relaxed. I'll be taking it gently to begin with.

One important reason for this is the problem I'm trying to solve. As I left things a fortnight ago I'm still trying to figure out why the offscreen rendering pipeline is broken. Since starting on trying to fix this problem I've come a long way, but I'm still not there. More importantly, by this point it's become a problem without a clear solution. That means I no longer have a clear path forwards.

The daily rhythm of these posts is important for keeping me focused on the task, but it can also be a distraction. It means chopping the work into day-sized chunks. That can be a hindrance when what the task really needs is a deep analysis, during which there may not be so much to write about. This work is easiest to write about when I'm making code changes. When I'm not making changes to the code, things can be a bit too intangible.

That's where I am today.

Before I get in to the actual work I want to first talk about confusing textures and the amazing work of the Sailfish community in their attempts to disentangle them. If you've been following these diary entries for some time you'll know that before the gap I spent a fair amount of time trying to extract texture image data from the render surface used by the offline renderer.

The result of all this work ended up being a collection of raw dumps of pixel data which, after attempting to convert them to something visible, looked like this.
 
The corrupted, swizzled image data from before the gap. After two weeks setting around unchanged it still just looks like fuzzy pixels to me.

I tried all sorts to get the images into better shape but without any luck. But the courageous Sailfish community took up the baton and continued the work while I was taking a break from it.

There were great contributions from across the community, but I want to especially highlight the efforts and ideas of Tone (tortoisedoc), Adam Pigg (piggz), Ville Nummela (vige, my ex-colleague from Jolla), Mark Washeim (poetaster), thigg, attah, remote and kan_ibal, all of whom provided genuinely useful input.
 
Three screenshots: a texture with some ghostly images shown on it, a zoomed in texture with a few bright pixels and a window showing a similar texture with some metadata in a desktop window

Here you can see some of the efforts to decipher the code. On the left is the result of poetaster having processed the image using ImageMagick. As poetaster explains:
 
I thought it might be an offset issue, but after tooling about in gimp, I'm not so sure. What 'should' they look like. The RGB data does seem to be RBG, the RGBA, I'm not sure. You said unsigned?... bits of sailfish logo can also be recognized. It does look unsigned judging from the results of the convert operation.

In the centre is a close up rendered by piggz using PixelViewer.
 
There is definitely almost an image there! For this I used PixelViewer, which allows to quickly try many different possible formats, this one is RGBA_8888... This is making a great puzzle... just need to figure out how to put the bits together.

When Adam talks about bits I'm not sure whether he's talking about computer bits or puzzle pieces, but either way I agree it should just be a matter of putting them together int the right way.

Finally on the right we can see the image also rendered using PixelViewer this time by kan_ibal in combination with a Python script.
 
There are blocks 8x8 pixels. It reminds me a jpeg compression and looks like a stage of quantization and DCT.

Again, this all sounds very plausible to me, alghouth the image still isn't popping out just yet. Thank you to everyone for your input on this. Unfortunately while none of these resulted in a definitive conclusion, it's all very helpful input. If anyone else would like to give this a go, feel free to take a look at the textures and pass them through your favourite raw image processing pipeline. I'd love to get to the bottom of this.

I'm going to leave it there for today. But tomorrow I'll be starting to look at the Gecko code again in earnest.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
3 May 2024 : Day 235 #
As discussed at the start of the week, this is going to be my last developer diary post for a little bit. But I want to make absolutely clear that this is a temporary pause. I'm heading to the HPC Days Conference in Durham next week where I'll be giving a talk on Matching AI Research to HPC Resource. I'm expecting it to be a packed schedule and, as is often the case with this kind of event, I'm not expecting to be able to fit in my usual gecko work alongside this. So there will be a pause, but I'll be back on the 18th May to restart from where I'm leaving things today.

During the pause I'm hoping write up some of the blog posts that I've put on hold while working on the gecko engine, so things may not be completely silent around here. But the key message I want to get across is that I'm not abandoning gecko. Not at all. I'll be back to it in no time.

Nevertheless, as I write this I'm still frustrated and stumped by my lack of progress with the offscreen rendering. So I'm also hoping that a break will give me the chance to come up with some alternative approaches. Over the last couple of days I attempted to capture the contents of the surface used to transfer the offscreen texture onscreen, but the results were inconclusive at best.

On the forums I received helpful advice from Tone (tortoisedoc). Tone highlighted the various areas where texture decoding can go awry:
 
two things can go wrong (assuming the data is correct in the texture):
  • alignment of data (start, stride)
  • format of pixel (ARGB / RGBA etc)
Which one is the texture you are look at configured with?

Is the image the same the browser would show? Which image are you displaying in the embedded view?


These are all great points and great questions. Although I know what sort of image I'm asking for (either RGB or RGBA in UNSIGNED_BYTE format) the results don't seem to be matching that. It's made more complex by the fact that the outline of the image is there (which suggests things like start and stride are correct) but still many of the pixels are just blank. It's somewhat baffling and I think the reason might be more to do with an image that doesn't exist rather than an image which is being generated in the wrong format.

But I could be wrong and I'll continue to consider these possibilities and try to figure out the underlying reason. I really appreciate the input and as always it gives me more go to on.

But today I've shifted focus just briefly by giving the code a last review before the pause: sifting through the code again to try to spot differences.

There were many places I thought there might be differences, such as the fact that the it wasn't clear to me that the GLContext->mDesc.isOffscreen flag was being set. But stepping through the application in the debugger showed it was set after all. So no fix to be had there.

The only difference I can see — and it's too small to make a difference I'm sure — is that the Surface capabilities are set at slightly different places. It's possible that in the gap between where it's set in ESR 78 and the place it's set slightly later in ESR 91, something makes use of it and a difference results.

I'm not convinced to be honest, but to avoid even the slightest doubt I've updated the code so that the values are now set explicitly:
GLContext::GLContext(const GLContextDesc& desc, GLContext* sharedContext,
                     bool useTLSIsCurrent)
    : mDesc(desc),
      mUseTLSIsCurrent(ShouldUseTLSIsCurrent(useTLSIsCurrent)),
      mDebugFlags(ChooseDebugFlags(mDesc.flags)),
      mSharedContext(sharedContext),
      mWorkAroundDriverBugs(
          StaticPrefs::gfx_work_around_driver_bugs_AtStartup()) {
  mCaps.any = true;
  mCaps.color = true;
[...]
}

[...]

bool GLContext::InitImpl() {
[...]
  // TODO: Remove SurfaceCaps::any.
  if (mCaps.any) {
    mCaps.any = false;
    mCaps.color = true;
    mCaps.alpha = false;
  }
[...]
The updated code is built and right now transferring over to my development phone. As I say though, this doesn't look especially promising to me. I'm not holding my breath for a successful render.

And I was right: no improvements after this change. Argh. How frustrating.

I'll continue looking through the fine details of the code. There has to be a difference in here that I'm missing. And while it doesn't make for stimulating diary entries, just sitting and sifting through the code seems like an essential task nonetheless.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.

If you've got this far, then not only do you have my respect and thanks, but I'd also urge you to return on the 18th May when I'll be picking things up from where I've left them today.
Comment