Gecko-dev Diary
Between August 2023 and September 2024 I upgraded the Sailfish OS browser from Gecko version ESR 78 to ESR 91, writing a daily blog as I went along. This page catalogues my progress, alongside the other browser-related topics I've looked into since.
Latest code changes are in the gecko-dev sailfishos-esr91 branch.
There is an index of all posts in case you want to jump to a particular day.
20 May 2024 : Day 238
It's been an exciting and packed day of Jolla and Sailfish OS news today with the announcement of not one but two new Jolla devices, plus some significant changes to the way Sailfish OS will be developed and funded in the future. This isn't the place to dwell on these topics, but I am excited for the future of the operating system. I didn't want it to go without saying, but I do want to focus on Gecko here, so let's head straight into development.
Yesterday I made what could be an important discovery. The QMozExtTexture::updateTexture() method is being called on ESR 78, but not on ESR 91. If I can track down the underlying reason for this, then it may well help getting the offscreen rendering back up and running again.
The next step is to find out at which point the failure is happening and try to understand why. Before we get into it I want to warn you that this is rather a long post, mostly containing debugging output which is dull at the best of times. But it does lead to some useful conclusions, so if you're interested in following along but don't want to read through all of the details, don't feel bad if you want to skip straight to the end!
If you're still reading this then I'll assume you're in it for the long haul. Don't say I didn't warn you though! So the best place to start this investigation is with the backtrace of the call. We saw this yesterday and it looked like this:

#0  QMozExtTexture::updateTexture (this=0x7f90024e30) at qmozexttexture.cpp:64
#1  0x0000007fbfc04f18 in MozMaterialNode::preprocess (this=0x7f9001ef60) at qmozextmaterialnode.cpp:81
#2  0x0000007fbf8f0c44 in QSGRenderer::preprocess() () from /usr/lib64/libQt5Quick.so.5
#3  0x0000007fbf8f0e5c in QSGRenderer::renderScene(QSGBindable const&) () from /usr/lib64/libQt5Quick.so.5
#4  0x0000007fbf8f14c0 in QSGRenderer::renderScene(unsigned int) () from /usr/lib64/libQt5Quick.so.5
#5  0x0000007fbf8ffe58 in QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int) () from /usr/lib64/libQt5Quick.so.5
#6  0x0000007fbf9429d0 in QQuickWindowPrivate::renderSceneGraph(QSize const&) () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fbf916c04 in ?? () from /usr/lib64/libQt5Quick.so.5
#8  0x0000007fbf91cc10 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fbea290e8 in ?? () from /usr/lib64/libQt5Core.so.5
#10 0x0000007fb74b0a4c in start_thread (arg=0x7fffffe9bf) at pthread_create.c:479
#11 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

This backtrace is, of course, from the ESR 78 build. There is no equivalent backtrace for the ESR 91 build because the QMozExtTexture::updateTexture() method never gets called in ESR 91. But maybe some of the other methods in the backtrace are being called. Let's find out.
To establish this I'm putting breakpoints on each of the functions in the backtrace and then running the ESR 91 build. If one of them hits I move to one higher up the stack (that is, the next line up that has a smaller number). Eventually one will fail to hit and at that point I'll know where the failure is happening. So here goes.

Thread 68 "QSGRenderThread" hit Breakpoint 4, 0x0000007ff7b6e984 in QQuickWindowPrivate::renderSceneGraph(QSize const&)@plt () from /usr/lib64/libQt5Quick.so.5
(gdb) delete break 4
(gdb) b QSGRenderContext::renderNextFrame
Breakpoint 5 at 0x7fe82e4474 (2 locations)
(gdb) c
Continuing.
Thread 68 "QSGRenderThread" hit Breakpoint 5, 0x0000007fe82e4474 in QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int)@plt () from /usr/lib64/qt5/plugins/scenegraph/libcustomcontext.so
(gdb) delete break 5
(gdb) b QSGRenderer::preprocess
Breakpoint 6 at 0x7ff7be7a50
(gdb) c
Continuing.
Thread 68 "QSGRenderThread" hit Breakpoint 6, 0x0000007ff7be7a50 in QSGRenderer::preprocess() () from /usr/lib64/libQt5Quick.so.5
(gdb) delete break 6
(gdb) b MozMaterialNode::preprocess
Breakpoint 7 at 0x7ff7ef899c
(gdb) c
Continuing.
frame008.data: Colour before: (0, 0, 0, 255), 1
frame009.data: Colour before: (0, 0, 0, 255), 1
frame010.data: Colour before: (0, 0, 0, 255), 1
frame011.data: Colour before: (0, 0, 0, 255), 1
[...]

As we can see from the above, the QQuickWindowPrivate::renderSceneGraph() method does get called. So do the QSGRenderContext::renderNextFrame() and QSGRenderer::preprocess() methods. But the MozMaterialNode::preprocess() method? The breakpoint on that one never hits.
That means the failure is happening inside the QSGRenderer::preprocess() method. This call isn't part of the Gecko code, or indeed part of the QtMozEmbed code, but rather part of the Qt stack. In particular, part of the "Qt Scene Graph" stack (hence the "QSG" prefix of the method).
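The walk up the backtrace described above is really just a linear search, and can be sketched in a few lines. This is only an illustration of the procedure, not something gdb provides: the function name firstUnreachedFrame and its parameters are made up for the example, and the caller is assumed to supply the frame names (outermost first) along with the set of breakpoints that actually fired.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Frames are listed outermost first, i.e. starting from the highest-numbered
// frame of interest in the known-good backtrace and moving towards frame #0.
// Returns the index of the first frame whose breakpoint never fired, or -1 if
// every frame was reached. (Hypothetical helper, for illustration only.)
inline int firstUnreachedFrame(const std::vector<std::string> &outerToInner,
                               const std::set<std::string> &breakpointsHit)
{
    for (std::size_t i = 0; i < outerToInner.size(); ++i) {
        if (breakpointsHit.count(outerToInner[i]) == 0)
            return static_cast<int>(i);
    }
    return -1;
}

// Small usage example with placeholder frame names.
inline void exampleWalk()
{
    const std::vector<std::string> frames{"outer", "middle", "inner"};
    const std::set<std::string> hit{"outer", "middle"};
    // "inner" is the first frame that was never reached, so the fault lies
    // inside "middle", the last frame that was.
    assert(firstUnreachedFrame(frames, hit) == 2);
}
```

The frame just before the returned index is the last one known to execute, so that's the function to look inside.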
I'm not super familiar with how the Qt Scene Graph does its thing, but there is plenty of good documentation out there. So I've done some reading around to get a better idea.
One part sprang out at me, since it specifically relates to the method that isn't getting called. The explanation that follows is taken from the Qt documentation:
void QSGNode::preprocess()
Override this function to do processing on the node before it is rendered.
Preprocessing needs to be explicitly enabled by setting the flag QSGNode::UsePreprocess. The flag needs to be set before the node is added to the scene graph and will cause the preprocess() function to be called for every frame the node is rendered.
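To make the opt-in behaviour from the documentation concrete, here's a minimal model of it in plain C++. To be clear, this is not Qt's implementation: Node, CountingNode and preprocessPass are hypothetical stand-ins for QSGNode, a node subclass and the renderer's preprocess step, and the only thing being modelled is that preprocess() gets called solely on nodes that set the flag.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for QSGNode; only the flag mechanism is modelled.
class Node {
public:
    enum Flag { UsePreprocess = 0x1 };
    virtual ~Node() = default;
    void setFlag(Flag f) { m_flags |= static_cast<unsigned>(f); }
    bool testFlag(Flag f) const { return (m_flags & static_cast<unsigned>(f)) != 0; }
    virtual void preprocess() {}
private:
    unsigned m_flags = 0;
};

// A node that counts how often its preprocess() override is invoked.
class CountingNode : public Node {
public:
    explicit CountingNode(bool optIn) { if (optIn) setFlag(UsePreprocess); }
    void preprocess() override { ++calls; }
    int calls = 0;
};

// Hypothetical stand-in for the renderer's per-frame preprocess step:
// nodes without the flag are skipped entirely.
inline void preprocessPass(const std::vector<Node *> &nodes)
{
    for (Node *node : nodes) {
        if (node->testFlag(Node::UsePreprocess))
            node->preprocess();
    }
}

inline void examplePass()
{
    CountingNode optedIn(true);
    CountingNode plain(false);
    std::vector<Node *> nodes{&optedIn, &plain};
    preprocessPass(nodes);
    assert(optedIn.calls == 1);   // flag set, so preprocess() ran
    assert(plain.calls == 0);     // no flag, so it was skipped
}
```

In this model the pass itself can run on every frame and still never reach a particular override, either because the node didn't set the flag or because the node was never created in the first place.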
So maybe the problem is that the QSGNode::UsePreprocess flag is never being set? A quick scan of the QtMozEmbed code throws up the following bit of code for controlling the flag, in the qmozextmaterialnode.cpp file:

MozMaterialNode::MozMaterialNode()
{
    setFlag(UsePreprocess);
    setGeometry(&m_geometry);
}

So, we should try to establish whether this is getting called or not. Running with some breakpoints shows that it is indeed getting called on the ESR 78 build.

(gdb) break MozMaterialNode::MozMaterialNode
Breakpoint 2 at 0x7fbfbdadc4 (2 locations)
(gdb) r
[...]
Thread 11 "QSGRenderThread" hit Breakpoint 2, MozMaterialNode::MozMaterialNode (this=this@entry=0x7f0001ef60) at qmozextmaterialnode.cpp:27
27      MozMaterialNode::MozMaterialNode()
(gdb) n
619     /usr/include/qt5/QtCore/qrect.h: No such file or directory.
(gdb)
27      MozMaterialNode::MozMaterialNode()
(gdb)
31          setGeometry(&m_geometry);
(gdb)

That's no big surprise, since if it were not being called the other parts of the render pipeline would also be failing and they're not. How about on ESR 91 though? We know the render pipeline isn't working there and that could be because this call isn't happening. Let's find out by adding the same breakpoint on the ESR 91 build and executing it just as we did for ESR 78:

(gdb) break MozMaterialNode::MozMaterialNode
Breakpoint 8 at 0x7ff7ece9e4 (2 locations)
(gdb) r
[...]
frame014.data: Colour before: (8, 0, 152, 255), 1
frame015.data: Colour before: (1, 0, 255, 255), 1
frame016.data: Colour before: (223, 5, 0, 255), 1
frame017.data: Colour before: (71, 94, 162, 255), 1
^C
Thread 1 "harbour-webview" received signal SIGINT, Interrupt.
0x0000007ff69f8740 in poll () from /lib64/libc.so.6
(gdb) info break
Num     Type           Disp Enb Address            What
8       breakpoint     keep y   <MULTIPLE>
8.1                         y   0x0000007ff7ece9e4 <MozMaterialNode::MozMaterialNode()@plt+4>
8.2                         y   0x0000007ff7ef9028 <MozMaterialNode::MozMaterialNode()+16>
(gdb)

Here the breakpoint never gets hit. You can see that I double-checked that the breakpoint is set correctly and it is. So this is looking very much like a smoking gun.
We'll use the same approach we used before to try to figure out why this isn't getting called. We take the backtrace of the ESR 78 build where it is getting called and work our way up through it until we find out what's failing on the ESR 91 side. Here's the backtrace, taken from ESR 78:

#0  MozMaterialNode::MozMaterialNode (this=this@entry=0x7f0001ef60) at qmozextmaterialnode.cpp:31
#1  0x0000007fbfc05834 in MozExtMaterialNode::MozExtMaterialNode (this=0x7f0001ef60) at qmozextmaterialnode.cpp:376
#2  0x0000007fbfc03ea4 in QuickMozView::updatePaintNode (this=0x55558433a0, oldNode=<optimized out>) at /usr/include/xulrunner-qt5-78.15.1/mozilla/cxxalloc.h:33
#3  0x0000007fbf941c50 in QQuickWindowPrivate::updateDirtyNode(QQuickItem*) () from /usr/lib64/libQt5Quick.so.5
#4  0x0000007fbf94214c in QQuickWindowPrivate::updateDirtyNodes() () from /usr/lib64/libQt5Quick.so.5
#5  0x0000007fbf943270 in QQuickWindowPrivate::syncSceneGraph() () from /usr/lib64/libQt5Quick.so.5
#6  0x0000007fbf9164a8 in ?? () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fbf917134 in ?? () from /usr/lib64/libQt5Quick.so.5
#8  0x0000007fbf91cc10 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fbea290e8 in ?? () from /usr/lib64/libQt5Core.so.5
#10 0x0000007fb74b0a4c in start_thread (arg=0x7fffffe9bf) at pthread_create.c:479
#11 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

It's important to note that the call at the top to MozMaterialNode::MozMaterialNode(), which is the one we're interested in, isn't a normal call: it's a constructor. It's not being called directly, but rather as a side-effect of a call to the MozExtMaterialNode class constructor. This inherits from the MozMaterialNode class as we can see in this code copied from qmozextmaterialnode.cpp:

class MozExtMaterialNode : public MozMaterialNode
{
public:
    MozExtMaterialNode();
    ~MozExtMaterialNode();
[...]

As an aside, there's also a MozRgbMaterialNode class, but it looks like we're not using that:

class MozRgbMaterialNode : public MozMaterialNode
{
public:
    MozRgbMaterialNode();
    ~MozRgbMaterialNode();
[...]

A quick check confirms that it's definitely only the MozExtMaterialNode class in use for us here:

(gdb) break MozRgbMaterialNode
Breakpoint 3 at 0x7fbfc056a0: file qmozextmaterialnode.cpp, line 176.
(gdb) b MozExtMaterialNode
Breakpoint 4 at 0x7fbfbdc174 (2 locations)
(gdb) r
[...]
Thread 10 "QSGRenderThread" hit Breakpoint 4, 0x0000007fbfbdc174 in MozExtMaterialNode::MozExtMaterialNode()@plt () from /usr/lib64/libqt5embedwidget.so.1
(gdb)

Another check of the backtrace and an examination of the code shows that the place where this is constructed is inside the QuickMozView::updatePaintNode() method, which is part of QtMozEmbed. We're going to be spending a bit of time inside this code, so it's worth taking a look for yourself at the source. But for reference, here's the source for the part of the method we're dealing with:

QSGNode *
QuickMozView::updatePaintNode(QSGNode *oldNode, UpdatePaintNodeData *)
{
    // If the dimensions are entirely invalid return no node.
    if (width() <= 0 || height() <= 0) {
        delete oldNode;

        delete mTexture;
        mTexture = nullptr;

        return nullptr;
    }

    const bool invalidTexture = !mComposited
            || !d->mIsPainted
            || !d->mViewInitialized
            || !d->mHasCompositor
            || !d->mContext->registeredWindow()
            || !d->mMozWindow;

    if (mTexture && invalidTexture) {
        delete oldNode;
        oldNode = nullptr;

        delete mTexture;
        mTexture = nullptr;
    }

    QRectF boundingRect(d->renderingOffset(), d->mSize);

    if (!mTexture && invalidTexture) {
        QSGSimpleRectNode *node = static_cast<QSGSimpleRectNode *>(oldNode);
        if (!node) {
            node = new QSGSimpleRectNode;
        }
        node->setColor(d->mBackgroundColor);
        node->setRect(boundingRect);

        return node;
    }

    if (!mTexture) {
        delete oldNode;
        oldNode = nullptr;
    }

    MozMaterialNode *node = static_cast<MozMaterialNode *>(oldNode);
    if (!node) {
#if defined(QT_OPENGL_ES_2)
        QMozExtTexture * const texture = new QMozExtTexture;
        mTexture = texture;

        connect(texture, &QMozExtTexture::getPlatformImage, d->mMozWindow, &QMozWindow::getPlatformImage, Qt::DirectConnection);

        node = new MozExtMaterialNode;
#else
#warning "Implement me for non ES2 platform"
//        node = new MozRgbMaterialNode;
        return nullptr;
#endif
[...]

Note the call there to new MozExtMaterialNode. That's the line we're really interested in. On ESR 78 when we step through the code we initially get an execution flow that enters the (!mTexture && invalidTexture) clause. Once we're in there we know the method is going to return early and the MozExtMaterialNode constructor isn't going to get called until we re-enter the method.
After a few cycles through like this, there's suddenly a change. While the mTexture value remains unset, the invalidTexture value flips to false which means the clause gets skipped. We then end up inside the (!node) section, which is where things get interesting. At that point the texture is created, as is the MozExtMaterialNode object and now things are properly set up for rendering.
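The branching described above can be boiled down to a small decision function, which makes it easier to see how a single flag controls which path is taken. This is a simplified model rather than the QtMozEmbed code: ViewState, PaintPath and choosePath are names invented for the sketch, pointers are reduced to booleans, and the ReuseMaterialNode outcome is my reading of what happens when both a texture and an old node already exist.

```cpp
#include <cassert>

// Hypothetical plain-data mirror of the six conditions that feed
// invalidTexture; pointer checks are reduced to "is it set?".
struct ViewState {
    bool composited;
    bool isPainted;
    bool viewInitialized;
    bool hasCompositor;
    bool hasRegisteredWindow;
    bool hasMozWindow;
};

// The outcomes the method can reach (names invented for this sketch).
enum class PaintPath { NoNode, PlaceholderRect, CreateMaterialNode, ReuseMaterialNode };

// True if any one of the six conditions is unmet.
inline bool invalidTexture(const ViewState &s)
{
    return !s.composited || !s.isPainted || !s.viewInitialized
        || !s.hasCompositor || !s.hasRegisteredWindow || !s.hasMozWindow;
}

inline PaintPath choosePath(const ViewState &s, bool hasSize, bool hasTexture, bool hasOldNode)
{
    if (!hasSize)
        return PaintPath::NoNode;
    // Any existing texture is dropped first when the state is invalid, so an
    // invalid state always ends in the early return with the plain rectangle.
    if (invalidTexture(s))
        return PaintPath::PlaceholderRect;
    // Without a texture the old node is discarded, so a new texture and
    // material node get created.
    if (!hasTexture || !hasOldNode)
        return PaintPath::CreateMaterialNode;
    return PaintPath::ReuseMaterialNode;
}

inline void exampleFlip()
{
    ViewState s{true, false, true, true, true, true};   // only isPainted unmet
    assert(choosePath(s, true, false, true) == PaintPath::PlaceholderRect);
    s.isPainted = true;                                  // the flag flips
    assert(choosePath(s, true, false, true) == PaintPath::CreateMaterialNode);
}
```

As long as any one of the six conditions is false, the function can only ever return the placeholder path, so the texture and material node never come into existence.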
Let's break this down a bit further. Here's what the initial few cycles look like:

(gdb) break QuickMozView::updatePaintNode
Breakpoint 8 at 0x7faf06f764 (2 locations)
(gdb) r
[...]
Thread 10 "QSGRenderThread" hit Breakpoint 8, QuickMozView::updatePaintNode (this=0x55558454e0, oldNode=0x0) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) break quickmozview.cpp:214
Breakpoint 9 at 0x7fbfc03fac: file quickmozview.cpp, line 214.
(gdb) c
Continuing.
[...]
Thread 10 "QSGRenderThread" hit Breakpoint 9, QuickMozView::updatePaintNode (this=0x55558454e0, oldNode=0x7f0801eab0) at quickmozview.cpp:216
216         if (!node) {
(gdb)

Stepping through this section more carefully we can see why the invalidTexture flag is set to true in all these cases:

Thread 10 "QSGRenderThread" hit Breakpoint 12, QuickMozView::updatePaintNode (this=0x5555842660, oldNode=0x7f0800c400) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) p mComposited
$23 = true
(gdb) p d->mIsPainted
$24 = false
(gdb) p d->mViewInitialized
$25 = true
(gdb) p d->mHasCompositor
$26 = true
(gdb) p d->mContext->registeredWindow()
$27 = (QMozWindow *) 0x555560ce60
(gdb) p d->mMozWindow
$28 = {wp = {d = 0x5555bbe8e0, value = 0x555560ce60}}
(gdb)

Crucially here, we can see that d->mIsPainted is set to false. While any of the items shown here are set to false the invalidTexture flag will be set to true. But then after three or four cycles like this the d->mIsPainted flag suddenly flips to true at which point everything else flows through as we want.

Thread 10 "QSGRenderThread" hit Breakpoint 12, QuickMozView::updatePaintNode (this=0x5555842660, oldNode=0x7f0800c400) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) p d->mIsPainted
$37 = true
(gdb) n
181         const bool invalidTexture = !mComposited
(gdb)
186                 || !d->mMozWindow;
(gdb)
196         QRectF boundingRect(d->renderingOffset(), d->mSize);
(gdb)
198         if (!mTexture && invalidTexture) {
(gdb)
210             delete oldNode;
(gdb)
218             QMozExtTexture * const texture = new QMozExtTexture;
(gdb)
219             mTexture = texture;
(gdb)
221             connect(texture, &QMozExtTexture::getPlatformImage, d->mMozWindow, &QMozWindow::getPlatformImage, Qt::DirectConnection);
(gdb) p mTexture
$38 = (QSGTexture *) 0x7f08024070
(gdb) n
223             node = new MozExtMaterialNode;
(gdb)
230         node->setTexture(mTexture);
(gdb)

Nice! But this never happens on ESR 91. In the case of ESR 91 the d->mIsPainted flag remains set to false throughout. Consequently we never get to the point where the texture or MozExtMaterialNode object are created on ESR 91:

(gdb) break QuickMozView::updatePaintNode
Breakpoint 9 at 0x7fe790b764 (2 locations)
(gdb) r
[...]
Thread 9 "QSGRenderThread" hit Breakpoint 1, QuickMozView::updatePaintNode (this=0x555586cf70, oldNode=0x0) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) break quickmozview.cpp:223
Breakpoint 11 at 0x7fbfc03ea4: file quickmozview.cpp, line 230.
(gdb) r
[...]

We end up just getting this same sequence again and again and again:

Thread 9 "QSGRenderThread" hit Breakpoint 7, QuickMozView::updatePaintNode (this=0x555586d0d0, oldNode=0x7fc800c640) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) n
181         const bool invalidTexture = !mComposited
(gdb) p mComposited
$7 = true
(gdb) p d->mIsPainted
$8 = false
(gdb) n
188         if (mTexture && invalidTexture) {
(gdb) p mTexture
$9 = (QSGTexture *) 0x0
(gdb) n
196         QRectF boundingRect(d->renderingOffset(), d->mSize);
(gdb)
198         if (!mTexture && invalidTexture) {
(gdb)
200             if (!node) {
(gdb)
203             node->setColor(d->mBackgroundColor);
(gdb)
204             node->setRect(boundingRect);
(gdb)
206             return node;
(gdb) c

Clearly the question we have to answer is "why is d->mIsPainted never set to true?". If we can answer this, who knows, we might even be on the right path to solving the rendering problem.
This has been a rather long investigation today already, so I'm going to leave it there for now. But tomorrow I'm going to come back to this in order to try to answer this question. This feels like it could be a very fruitful line of enquiry!
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.