flypig.co.uk

Gecko-dev Diary

Starting in August 2023 I'll be upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91. This page catalogues my progress.

Latest code changes are in the gecko-dev sailfishos-esr91 branch.

There is an index of all posts in case you want to jump to a particular day.

Gecko RSS feed Click the icon for the Gecko-dev Diary RSS feed.

Gecko

5 most recent items

20 May 2024 : Day 238 #
It's been an exciting and packed day of Jolla and Sailfish OS news today with the announcement of not one but two new Jolla devices, plus some significant changes to the way Sailfish OS will be developed and funded in the future. This isn't the place to dwell on these topics, but I am excited for the future of the operating system. I didn't want it to go without saying, but I do want to focus on Gecko here, so let's head straight in to development.

Yesterday I made what could be an important discovery. The QMozExtTexture::updateTexture() method is being called on ESR 78, but not on ESR 91. If I can track down the underlying reason for this, then it may well help getting the offscreen rendering back up and running again.

The next step is to find out at which point the failure is happening and try to understand why. Before we get into it I want to warn you that this is rather a long post, mostly containing debugging output which is dull at the best of times. But it does lead to some useful conclusions, so if you're interest to follow but don't want to have to read through all of the details, don't feel bad if you want to skip straight to the end!

If you're still reading this then I'll assume you're in it for the long haul. Don't say I didn't warn you though! So the best place to start this investigation is with the backtrace of the call. We saw this yesterday and it looked like this:
#0  QMozExtTexture::updateTexture (this=0x7f90024e30) at qmozexttexture.cpp:64
#1  0x0000007fbfc04f18 in MozMaterialNode::preprocess (this=0x7f9001ef60) at 
    qmozextmaterialnode.cpp:81
#2  0x0000007fbf8f0c44 in QSGRenderer::preprocess() () from /usr/lib64/
    libQt5Quick.so.5
#3  0x0000007fbf8f0e5c in QSGRenderer::renderScene(QSGBindable const&) () from /
    usr/lib64/libQt5Quick.so.5
#4  0x0000007fbf8f14c0 in QSGRenderer::renderScene(unsigned int) () from /usr/
    lib64/libQt5Quick.so.5
#5  0x0000007fbf8ffe58 in QSGRenderContext::renderNextFrame(QSGRenderer*, 
    unsigned int) () from /usr/lib64/libQt5Quick.so.5
#6  0x0000007fbf9429d0 in QQuickWindowPrivate::renderSceneGraph(QSize const&) (
    ) from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fbf916c04 in ?? () from /usr/lib64/libQt5Quick.so.5
#8  0x0000007fbf91cc10 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fbea290e8 in ?? () from /usr/lib64/libQt5Core.so.5
#10 0x0000007fb74b0a4c in start_thread (arg=0x7fffffe9bf) at pthread_create.c:
    479
#11 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
This backtrace is, of course, from the ESR 78 build. There is no equivalent backtrace for the ESR 91 build because the QMozExtTexture::updateTexture() method never gets called in ESR 91. But maybe some of the other methods in the backtrace are being called. Let's find out.

To establish this I'm putting breakpoints on each of the functions in the backtrace and then running the ESR 91 build. If one of them hits I move to one higher up the stack (that is, the next line up that has a smaller number). Eventually one will fail to hit and at that point I'll know where the failure is happening. So here goes.
Thread 68 "QSGRenderThread" hit Breakpoint 4, 0x0000007ff7b6e984 in 
    QQuickWindowPrivate::renderSceneGraph(QSize const&)@plt ()
   from /usr/lib64/libQt5Quick.so.5
(gdb) delete break 4
(gdb) b QSGRenderContext::renderNextFrame
Breakpoint 5 at 0x7fe82e4474 (2 locations)
(gdb) c
Continuing.

Thread 68 "QSGRenderThread" hit Breakpoint 5, 0x0000007fe82e4474 in 
    QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int)@plt ()
   from /usr/lib64/qt5/plugins/scenegraph/libcustomcontext.so
(gdb) delete break 5
(gdb) b QSGRenderer::preprocess
Breakpoint 6 at 0x7ff7be7a50
(gdb) c
Continuing.

Thread 68 "QSGRenderThread" hit Breakpoint 6, 0x0000007ff7be7a50 in 
    QSGRenderer::preprocess() () from /usr/lib64/libQt5Quick.so.5
(gdb) delete break 6
(gdb) b MozMaterialNode::preprocess
Breakpoint 7 at 0x7ff7ef899c
(gdb) c
Continuing.
frame008.data: Colour before: (0, 0, 0, 255), 1
frame009.data: Colour before: (0, 0, 0, 255), 1
frame010.data: Colour before: (0, 0, 0, 255), 1
frame011.data: Colour before: (0, 0, 0, 255), 1
[...]
As we can see from the above, the QQuickWindowPrivate::renderSceneGraph() method does get called. So does the QSGRenderContext::renderNextFrame() method. but the QSGRenderer::preprocess() method? That never gets called.

That means the failure is happening inside the QSGRenderContext::renderNextFrame() method. This call isn't part of the Gecko code, or indeed part of the QtMozEmbed code, but rather part of the Qt stack. In particular, part of the "Qt Scene Graph" stack (hence the "QSG" prefix of the method).

I'm not super familiar with how the Qt Scene Graph does it's thing, but there is plenty of good documentation out there. So I've done some reading around to get a better idea.

One part sprang out at me, since it specifically relates to the method that isn't getting called. The explanation that follows is taken from the Qt documentation:
 
void QSGNode::preprocess()

Override this function to do processing on the node before it is rendered.

Preprocessing needs to be explicitly enabled by setting the flag QSGNode::UsePreprocess. The flag needs to be set before the node is added to the scene graph and will cause the preprocess() function to be called for every frame the node is rendered.

So maybe the problem is that the QSGNode::UsePreprocess flag is never being set? A quick scan of the QtMozEmbed code throws up the following bit of code for controlling the flag, in the qmozextmaterialnode.cpp file:
MozMaterialNode::MozMaterialNode()
{
    setFlag(UsePreprocess);

    setGeometry(&m_geometry);
}
So, we should try to establish whether this is getting called or not. Running with some breakpoints shows that it is indeed getting called on the ESR 78 build.
(gdb) break MozMaterialNode::MozMaterialNode
Breakpoint 2 at 0x7fbfbdadc4 (2 locations)
(gdb) r
[...]
Thread 11 "QSGRenderThread" hit Breakpoint 2, MozMaterialNode::
    MozMaterialNode (this=this@entry=0x7f0001ef60) at qmozextmaterialnode.cpp:27
27      MozMaterialNode::MozMaterialNode()
(gdb) n
619     /usr/include/qt5/QtCore/qrect.h: No such file or directory.
(gdb) 
27      MozMaterialNode::MozMaterialNode()
(gdb) 
31          setGeometry(&m_geometry);
(gdb) 
That's no big surprise, since if it were not being called the other parts of the render pipeline would also be failing and they're not. How about on ESR 91 though? We know the render pipeline isn't working there and that could be because this call isn't happening. Let's find out by adding the same breakpoint on the ESR 91 build and executing it just as we did for ESR 78:
(gdb) break MozMaterialNode::MozMaterialNode
Breakpoint 8 at 0x7ff7ece9e4 (2 locations)
(gdb) r
[...]
frame014.data: Colour before: (8, 0, 152, 255), 1
frame015.data: Colour before: (1, 0, 255, 255), 1
frame016.data: Colour before: (223, 5, 0, 255), 1
frame017.data: Colour before: (71, 94, 162, 255), 1
^C
Thread 1 "harbour-webview" received signal SIGINT, Interrupt.
0x0000007ff69f8740 in poll () from /lib64/libc.so.6
(gdb) info break
Num     Type           Disp Enb Address            What
8       breakpoint     keep y   <MULTIPLE>         
8.1                         y     0x0000007ff7ece9e4 <MozMaterialNode::
    MozMaterialNode()@plt+4>
8.2                         y     0x0000007ff7ef9028 <MozMaterialNode::
    MozMaterialNode()+16>
(gdb) 
Here the breakpoint doesn't get called. You can see that I double-checked that the breakpoint is set correctly and it is. So this is looking very much like a smoking gun.

We'll use the same approach we used before to try to figure out why this isn't getting called. We take the backtrace of the ESR 78 build where it is getting called and work our way up through it until we find out what's failing on the ESR 91 side. Here's the backtrace, taken from ESR 78:
#0  MozMaterialNode::MozMaterialNode (this=this@entry=0x7f0001ef60) at 
    qmozextmaterialnode.cpp:31
#1  0x0000007fbfc05834 in MozExtMaterialNode::MozExtMaterialNode (
    this=0x7f0001ef60) at qmozextmaterialnode.cpp:376
#2  0x0000007fbfc03ea4 in QuickMozView::updatePaintNode (this=0x55558433a0, 
    oldNode=<optimized out>)
    at /usr/include/xulrunner-qt5-78.15.1/mozilla/cxxalloc.h:33
#3  0x0000007fbf941c50 in QQuickWindowPrivate::updateDirtyNode(QQuickItem*) () 
    from /usr/lib64/libQt5Quick.so.5
#4  0x0000007fbf94214c in QQuickWindowPrivate::updateDirtyNodes() () from /usr/
    lib64/libQt5Quick.so.5
#5  0x0000007fbf943270 in QQuickWindowPrivate::syncSceneGraph() () from /usr/
    lib64/libQt5Quick.so.5
#6  0x0000007fbf9164a8 in ?? () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fbf917134 in ?? () from /usr/lib64/libQt5Quick.so.5
#8  0x0000007fbf91cc10 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fbea290e8 in ?? () from /usr/lib64/libQt5Core.so.5
#10 0x0000007fb74b0a4c in start_thread (arg=0x7fffffe9bf) at pthread_create.c:
    479
#11 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
It's important to note that the call at the top to MozMaterialNode::MozMaterialNode() — which is the one we're interested in — isn't a normal call, it's a constructor. It's not being called directly, but rather as a side-effect of a call to the MozExtMaterialNode class constructor. This inherits from the MozMaterialNode class as we can see in this code copied from qozextmaterialnode.cpp:
class MozExtMaterialNode : public MozMaterialNode
{
public:
    MozExtMaterialNode();
    ~MozExtMaterialNode();
[...]
As an aside, there's also a MozRgbMaterialNode class, but it looks like we're not using that:
class MozRgbMaterialNode : public MozMaterialNode
{
public:
    MozRgbMaterialNode();
    ~MozRgbMaterialNode();
[...]
A quick check confirms that it's definitely only the MozExtMaterialNode class in use for us here:
(gdb) break MozRgbMaterialNode
Breakpoint 3 at 0x7fbfc056a0: file qmozextmaterialnode.cpp, line 176.
(gdb) b MozExtMaterialNode
Breakpoint 4 at 0x7fbfbdc174 (2 locations)
(gdb) r
[...]
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 4, 0x0000007fbfbdc174 in 
    MozExtMaterialNode::MozExtMaterialNode()@plt () from /usr/lib64/
    libqt5embedwidget.so.1
(gdb)                            
Another check of the backtrace and an examination of the code shows that the place where this is constructed is inside the QuickMozView::updatePaintNode() method; part of QtMozEmbed. We're going to be spending a bit of time inside this code, so it's worth taking a look for yourself at the source. But for reference, here's the full source for the method we're dealing with:
QSGNode *
QuickMozView::updatePaintNode(QSGNode *oldNode, UpdatePaintNodeData *)
{
    // If the dimensions are entirely invalid return no node.
    if (width() <= 0 || height() <= 0) {
        delete oldNode;

        delete mTexture;
        mTexture = nullptr;

        return nullptr;
    }

    const bool invalidTexture = !mComposited
            || !d->mIsPainted
            || !d->mViewInitialized
            || !d->mHasCompositor
            || !d->mContext->registeredWindow()
            || !d->mMozWindow;

    if (mTexture && invalidTexture) {
        delete oldNode;
        oldNode = nullptr;

        delete mTexture;
        mTexture = nullptr;
    }

    QRectF boundingRect(d->renderingOffset(), d->mSize);

    if (!mTexture && invalidTexture) {
        QSGSimpleRectNode *node = static_cast<QSGSimpleRectNode *>(oldNode);
        if (!node) {
            node = new QSGSimpleRectNode;
        }
        node->setColor(d->mBackgroundColor);
        node->setRect(boundingRect);

        return node;
    }

    if (!mTexture) {
        delete oldNode;
        oldNode = nullptr;
    }

    MozMaterialNode *node = static_cast<MozMaterialNode *>(oldNode);

    if (!node) {
#if defined(QT_OPENGL_ES_2)
        QMozExtTexture * const texture = new QMozExtTexture;
        mTexture = texture;

        connect(texture, &QMozExtTexture::getPlatformImage, d->mMozWindow, 
    &QMozWindow::getPlatformImage, Qt::DirectConnection);

        node = new MozExtMaterialNode;
#else
#warning &quot;Implement me for non ES2 platform&quot;
//        node = new MozRgbMaterialNode;
        return nullptr;
#endif
Note the call there to new MozExtMaterialNode. That's the line we're really interested in. On ESR 78 when we step through the code we initially get an execution flow that enters the (!mTexture && invalidTexture) clause. Once we're in there we know the method is going to return early and the MozExtMaterialNode constructor isn't going to get called until we re-enter the method.

After a few cycles through like this, there's suddenly a change. While the mTexture value remains unset, the invalidTexture value flips to false which means the clause gets skipped. We then end up inside the (!node) section, which is where things get interesting. At that point the texture is created, as is the MozExtMaterialNode object and now things are properly set up for rendering.

Let's break this down a bit further. Here's what the initial few cycles look like:
(gdb) break QuickMozView::updatePaintNode
Breakpoint 8 at 0x7faf06f764 (2 locations)
(gdb) r
[...]
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 8, QuickMozView::
    updatePaintNode (this=0x55558454e0, oldNode=0x0) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) break quickmozview.cpp:214
Breakpoint 9 at 0x7fbfc03fac: file quickmozview.cpp, line 214.
(gdb) c
Continuing.
[...]
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 9, QuickMozView::
    updatePaintNode (this=0x55558454e0, oldNode=0x7f0801eab0) at 
    quickmozview.cpp:216
216         if (!node) {
(gdb) 
Stepping through this section more carefully we can see why the invalidTexture flag is set to true in all these cases:
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 12, QuickMozView::
    updatePaintNode (this=0x5555842660, oldNode=0x7f0800c400) at 
    quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) p mComposited
$23 = true
(gdb) p d->mIsPainted
$24 = false
(gdb) p d->mViewInitialized
$25 = true
(gdb) p d->mHasCompositor
$26 = true
(gdb) p d->mContext->registeredWindow()
$27 = (QMozWindow *) 0x555560ce60
(gdb) p d->mMozWindow
$28 = {wp = {d = 0x5555bbe8e0, value = 0x555560ce60}}
(gdb) 
Crucially here, we can see that p d->mIsPainted is set to false. While any of the items shown here are set to false the invalidTexture flag will be set to true. But then after three or four cycles like this the d->mIsPainted flag suddenly flips to true at which point everything else flows through as we want.
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 12, QuickMozView::
    updatePaintNode (this=0x5555842660, oldNode=0x7f0800c400) at 
    quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) p d->mIsPainted
$37 = true
(gdb) n
181         const bool invalidTexture = !mComposited
(gdb) 
186                 || !d->mMozWindow;
(gdb) 
196         QRectF boundingRect(d->renderingOffset(), d->mSize);
(gdb) 
198         if (!mTexture && invalidTexture) {
(gdb) 
210             delete oldNode;
(gdb) 
218             QMozExtTexture * const texture = new QMozExtTexture;
(gdb) 
219             mTexture = texture;
(gdb) 
221             connect(texture, &QMozExtTexture::getPlatformImage, 
    d->mMozWindow, &QMozWindow::getPlatformImage, Qt::DirectConnection);
(gdb) p mTexture
$38 = (QSGTexture *) 0x7f08024070
(gdb) n
223             node = new MozExtMaterialNode;
(gdb) 
230             node->setTexture(mTexture);
(gdb) 
Nice! But this never happens on ESR 91. In the case of ESR 91 the d->mIsPainted flag remains set to false throughout. Consequently we never get to the point where the texture or MozExtMaterialNode object are created on ESR 91:
(gdb) break QuickMozView::updatePaintNode
Breakpoint 9 at 0x7fe790b764 (2 locations)
(gdb) r
[...]
Thread 9 &quot;QSGRenderThread&quot; hit Breakpoint 1, QuickMozView::
    updatePaintNode (this=0x555586cf70, oldNode=0x0) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {

(gdb) break quickmozview.cpp:223
Breakpoint 11 at 0x7fbfc03ea4: file quickmozview.cpp, line 230.
(gdb) r
[...]
We end up just getting this same sequence again and again and again:
Thread 9 &quot;QSGRenderThread&quot; hit Breakpoint 7, QuickMozView::
    updatePaintNode (this=0x555586d0d0, oldNode=0x7fc800c640) at 
    quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) n
181         const bool invalidTexture = !mComposited
(gdb) p mComposited
$7 = true
(gdb) p d->mIsPainted
$8 = false
(gdb) n
188         if (mTexture && invalidTexture) {
(gdb) p mTexture
$9 = (QSGTexture *) 0x0
(gdb) n
196         QRectF boundingRect(d->renderingOffset(), d->mSize);
(gdb) 
198         if (!mTexture && invalidTexture) {
(gdb) 
200             if (!node) {
(gdb) 
203             node->setColor(d->mBackgroundColor);
(gdb) 
204             node->setRect(boundingRect);
(gdb) 
206             return node;
(gdb) c
Clearly the question we have to answer is "why is d->mIsPainted never set to true?". If we can answer this, who knows, we might even be on the right path to solving the rendering problem.

This has been a rather long investigation today already, so I'm going to leave it there for now. But tomorrow I'm going to come back to this in order to try to answer this question. This feels like it could be a very fruitful line of enquiry!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.

Comments

Uncover Disqus comments