flypig.co.uk

List items

Items from the current list are shown below.

Gecko

All items from May 2024

24 May 2024 : Day 242 #
It's funny how even when asleep I'm capable of making stupid mistakes. Yesterday I applied patch 0044 in order to restore the OnFirstPaint() trigger functionality. This is critical for getting the offscreen rendering pipeline to work. But as is the case after applying a patch I needed to rebuild the packages before they could be tested on my phone.

I kicked the build off to run overnight and then — d'oh! — promptly sent my laptop to sleep. It doesn't take a genius to realise that the build isn't going to run while the laptop is sleeping.

So here I am this morning without packages to test. At least my laptop is now awake and the build is continuing. But this has lost me the opportunity to do some development this morning.

Instead I've spent the time ordering myself a Jolla C2, the new Sailfish OS phone announced at Jolla Love Day 2 this last Monday. It's an interesting move by Jolla. For years now members of the community have been encouraging Jolla to offer Sailfish OS pre-installed. On the other hand, many in the community are also keen to see Sailfish OS offered on a very specific selection of handsets, be they PinePHones, Fairphones or brands that are more widely available in Europe, especially those known for being more "open".

My own opinion is that this is a good move by Jolla. There is no perfect phone and every choice has its benefits and drawbacks. Going with Sony phones all those years ago was controversial, but Sony hardware has (in my opinion) been both excellent and offering good value for the hardware. Nice materials, unassuming design and unexpectedly solid features (with headphones jacks, SD card slots, notification lights, and excellent screens to boot). Even more importantly Sony retained their commitment to openness over the long term, when other manufacturers have long since lost interest. On the other hand PinePhone and Fairphone have credentials that are really appealing for Sailfish users, who tend to prize openness and ethical considerations over price and features alone.

The new Reeder phone offers something a little different to either of these, which is the opportunity for Jolla to work hand-in-hand with the manufacturer. Jolla has a good track record doing this and the results historically have been very good. The new Reeder phone might not be open in the same way that the Sony or PinePhone are, but it's likely Jolla will have been given access to much more of what they need, including support, in order to build a really solid software experience for the hardware.

I'm speculating of course and only time will tell, but given we're still expecting Sailfish OS releases for the Xperia 10 IV and Xperia 10 V, and given I already have one of the latter, these are exciting times as far as I'm concerned.

Alright, let's get back to development. The build has carried on for a while but eventually failed with an error. Here's the debug output generated from the build process.
813:10.10 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.cpp: In member 
    function ‘void nsRefreshDriver::Tick(mozilla::VsyncId, mozilla::TimeStamp, 
    nsRefreshDriver::IsExtraTick)’:
813:10.10 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.cpp:2430:41: error: 
    ‘mNextTransactionId’ was not declared in this scope
813:10.10        transactionId.AppendInt((uint64_t)mNextTransactionId);
813:10.10                                          ^~~~~~~~~~~~~~~~~~
813:10.12 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.cpp:2430:41: note: 
    suggested alternative: ‘GetTransactionId’
813:10.13        transactionId.AppendInt((uint64_t)mNextTransactionId);
813:10.13                                          ^~~~~~~~~~~~~~~~~~
813:10.13                                          GetTransactionId
813:14.49 make[4]: *** [${PROJECT}/gecko-dev/config/rules.mk:694: 
    nsRefreshDriver.o] Error 1
The error is telling us that mNextTransactionIddoesn't exist and that's correct because it was removed by the patch. But there's still an instance remaining that the patch didn't capture. To get this through I've had to make one small change. This code looks to be related to profiling so probably isn't really needed for our purposes anyway, but we do need it to compile.
$ git diff layout/base/nsRefreshDriver.cpp
diff --git a/layout/base/nsRefreshDriver.cpp b/layout/base/nsRefreshDriver.cpp
index 33c90bb1f324..bb43dea053ea 100644
--- a/layout/base/nsRefreshDriver.cpp
+++ b/layout/base/nsRefreshDriver.cpp
@@ -2427,7 +2427,7 @@ void nsRefreshDriver::Tick(VsyncId aId, TimeStamp 
    aNowTime,
     nsCString transactionId;
     if (profiler_can_accept_markers()) {
       transactionId.AppendLiteral("Transaction ID: ");
-      transactionId.AppendInt((uint64_t)mNextTransactionId);
+      transactionId.AppendInt((uint64_t)mPendingTransaction);
     }
     AUTO_PROFILER_MARKER_TEXT(
         "ViewManagerFlush", GRAPHICS,
This is a similar change to that made elsewhere in the code, so should hopefully be pretty benign.

As the build is still taking a while I've decide I can't let all this potential development time go to waste. So I've decided to take the plunge and perform a test while the build continues. The current issue my most recent changes are aimed at addressing is the fact that the mIsPainted variable in the QMozViewPrivate class never gets set to true. So I'm wondering what will happen if I forcefully set it to true at a suitable time using the debugger.

If I'm honest I'm a little nervous about performing this test. If it turns out to have no effect then it'll leave me scratching my head yet again. I really want this rendering to work.

But I've plucked up the courage and am going to give it a go. First set the breakpoint on the method where the test for this flag is being made.
(gdb) b QuickMozView::updatePaintNode
Function "QuickMozView::updatePaintNode" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (QuickMozView::updatePaintNode) pending.
(gdb) r
Starting program: /usr/bin/harbour-webview
[...]
Now run the program through to the first time this breakpoint is hit. I don't want to flip the flag here because the rendering pipeline and textures aren't fully set up yet, so I'm going to wait for a later hit of the breakpoint.
Thread 9 "QSGRenderThread" hit Breakpoint 1, QuickMozView::
    updatePaintNode (this=0x555586dc20, oldNode=0x0) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) c
Continuing.
[...]
=============== Preparing offscreen rendering context ===============
[...]
As we can see the rendering context has now been configured. Following this the breakpoint is hit again. Now is the time to change the flag from true to false. Here goes:
Thread 9 &quot;QSGRenderThread&quot; hit Breakpoint 1, QuickMozView::
    updatePaintNode (this=0x555586dc20, oldNode=0x7fc800c640) at 
    quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) p d->mIsPainted
$1 = false
(gdb) set d->mIsPainted = true
(gdb) p d->mIsPainted
$2 = true
(gdb) disable break
(gdb) c
Continuing.
[...]
Finally I disabled the breakpoint and set the program running again. And the result? I'm astonished to find that the result is a fully rendered Sailfish OS website!
 
A phone showing the sailfishos.org website next to a laptop on a table with the debugger running in a console window.

Scrolling around, selecting links and interacting with the site all seem to work nicely. This is really encouraging. This means that once this patch is properly applied, that should be the final piece of the puzzle needed to get the WebView working again.
 
Three screenshots of the WebView rendering: the sailfish.org site at the top; the same site scrolled down a bit; and shop.jolla.com with the new Jolla C2 on display

I'm quite excited by this!

Unfortunately this is a debugging hack so we're not there yet. And the build has failed again. At least now, motivated by my excitement, I have plenty of energy to fix the build.

Here's the latest output from the failed build:
302:48.25 In file included from Unified_cpp_layout_base2.cpp:20:
302:48.25 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp: In member 
    function ‘void nsPresContext::DetachPresShell()’:
302:48.25 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp:843:15: error: 
    ‘class nsRootPresContext’ has no member named 
    ‘CancelApplyPluginGeometryTimer’; did you mean ‘mApplyPluginGeometryTimer’?
302:48.25      thisRoot->CancelApplyPluginGeometryTimer();
302:48.25                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
302:48.25                mApplyPluginGeometryTimer
302:48.40 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp: In destructor 
    ‘virtual nsRootPresContext::~nsRootPresContext()’:
302:48.40 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp:2683:3: error: 
    ‘CancelApplyPluginGeometryTimer’ was not declared in this scope
302:48.40    CancelApplyPluginGeometryTimer();
302:48.40    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
302:48.47 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp:2683:3: note: 
    suggested alternative:  mApplyPluginGeometryTimer’
302:48.47    CancelApplyPluginGeometryTimer();
302:48.47    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
302:48.47    mApplyPluginGeometryTimer
302:48.51 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp: In member 
    function ‘void nsRootPresContext::ComputePluginGeometryUpdates(nsIFrame*, 
    nsDisplayListBuilder*, nsDisplayList*)’:
302:48.51 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp:2712:21: error: 
    ‘nsPluginFrame’ does not name a type; did you mean ‘nsPageFrame’?
302:48.51          static_cast<nsPluginFrame*>(iter.Get()->GetKey(
    )->GetPrimaryFrame());
302:48.51                      ^~~~~~~~~~~~~
302:48.51                      nsPageFrame
These are actually rather odd errors. For example nsPageFrame has been added to the code even though it's not part of the upstream change. It must have been pulled in due to some changes applied later.

I'm not very happy with this, it's all a bit messy. So I've decided to revert my attempt to automatically merge in the patch and go back to a manual approach instead.

This will require a bit of a rework, which unfortunately I don't have time to do tonight. But I'll start on it in the morning and with any luck may be able to get it done tomorrow. Which means that's it for today. But we're definitely getting there!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
23 May 2024 : Day 241 #
It's a good job I stopped working on this last night when I did. On returning to it this morning with a clearer head it didn't take long to identify that the changes needed to revert the D5005 upstream code changes had already been wrapped up into a patch — patch 0044 to be precise — that I'd not yet applied.

If the patch can be applied cleanly, it could save me a lot of time. So first step is to do a dry run of applying the patch. This won't actually make any changes to the code, but it will tell me whether the patch is going to apply without incident if I attempt it properly.
$ patch --dry-run -d gecko-dev -p1 < rpm/
    0044-Revert-Bug-1445570-Remove-EnsureEventualAfterPaint-t.patch 
checking file dom/base/nsDOMWindowUtils.cpp
Hunk #1 succeeded at 407 (offset 51 lines).
checking file layout/base/nsDocumentViewer.cpp
Hunk #1 succeeded at 1573 (offset -44 lines).
Hunk #2 succeeded at 1718 (offset -45 lines).
Hunk #3 succeeded at 3649 (offset -254 lines).
checking file layout/base/nsPresContext.cpp
Hunk #1 succeeded at 314 (offset 3 lines).
Hunk #2 FAILED at 804.
Hunk #3 succeeded at 2021 (offset 66 lines).
Hunk #4 FAILED at 2039.
Hunk #5 FAILED at 2121.
Hunk #6 FAILED at 2613.
Hunk #7 succeeded at 2716 with fuzz 2 (offset -132 lines).
4 out of 7 hunks FAILED
checking file layout/base/nsPresContext.h
Hunk #1 succeeded at 355 (offset -5 lines).
Hunk #2 succeeded at 912 (offset -4 lines).
Hunk #3 FAILED at 1363.
Hunk #4 FAILED at 1455.
2 out of 4 hunks FAILED
checking file layout/base/nsRefreshDriver.cpp
Hunk #1 FAILED at 1166.
Hunk #2 FAILED at 1239.
Hunk #3 FAILED at 2337.
Hunk #4 FAILED at 2396.
Hunk #5 FAILED at 2404.
5 out of 5 hunks FAILED
checking file layout/base/nsRefreshDriver.h
Hunk #1 FAILED at 507.
1 out of 1 hunk FAILED
checking file layout/style/test/test_restyles_in_smil_animation.html
There are quite a few nice succeeded lines there. But also a disturbing number of FAILED lines as well. That's FAILED in all caps, just in case the word itself doesn't sufficiently convey the damage being done.

The advice Raine would give me in this situation is to apply a three-way merge using git. So that's what I'm going to try.
$ git am --3way ../rpm/
    0044-Revert-Bug-1445570-Remove-EnsureEventualAfterPaint-t.patch
Applying: Revert &quot;Bug 1445570 - Remove EnsureEventualAfterPaint timer. 
    r=tnikkel&quot;
Using index info to reconstruct a base tree...
M       dom/base/nsDOMWindowUtils.cpp
M       layout/base/nsDocumentViewer.cpp
M       layout/base/nsPresContext.cpp
M       layout/base/nsPresContext.h
M       layout/base/nsRefreshDriver.cpp
M       layout/base/nsRefreshDriver.h
Falling back to patching base and 3-way merge...
Auto-merging layout/base/nsRefreshDriver.h
CONFLICT (content): Merge conflict in layout/base/nsRefreshDriver.h
Auto-merging layout/base/nsRefreshDriver.cpp
CONFLICT (content): Merge conflict in layout/base/nsRefreshDriver.cpp
Auto-merging layout/base/nsPresContext.h
CONFLICT (content): Merge conflict in layout/base/nsPresContext.h
Auto-merging layout/base/nsPresContext.cpp
CONFLICT (content): Merge conflict in layout/base/nsPresContext.cpp
Auto-merging layout/base/nsDocumentViewer.cpp
Auto-merging dom/base/nsDOMWindowUtils.cpp
error: Failed to merge in the changes.
Patch failed at 0001 Revert &quot;Bug 1445570 - Remove EnsureEventualAfterPaint 
    timer. r=tnikkel&quot;
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run &quot;git am --continue&quot;.
If you prefer to skip this patch, run &quot;git am --skip&quot; instead.
To restore the original branch and stop patching, run &quot;git am 
    --abort&quot;.
This has done a better job than the patch did, but there are still some problems which require manual intervention. But after lots of manual changes, comparing the source code against the patch file, eventually I'm able to get it there.
git am --continue
It's still the morning and before I start work. It would be nice to kick off the build before I have to move on to my proper work day. That way there's an outside chance that the build will be completed and ready to test by this evening. So although I'd normally read carefully through the changes to check whether they look sound, I'm just going to kick off the build and hope for the best. Here we go!
sfdk build -d -p --with git_workaround
Sadly as the build trundles on into my work day it hits an error in the code.
27:10.41 In file included from ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    mozilla/dom/DocumentTimeline.h:16,
27:10.41                  from ${PROJECT}/gecko-dev/dom/animation/Animation.cpp:
    15,
27:10.41                  from Unified_cpp_dom_animation0.cpp:2:
27:10.41 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h: In member function 
    ‘bool nsRefreshDriver::AtPendingTransactionLimit()’:
27:10.41 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h:514:12: error: 
    ‘mPendingTransactions’ was not declared in this scope
27:10.41      return mPendingTransactions.Length() == 2;
27:10.42             ^~~~~~~~~~~~~~~~~~~~
27:10.42 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h:514:12: note: 
    suggested alternative: ‘mPendingTransaction’
27:10.42      return mPendingTransactions.Length() == 2;
27:10.42             ^~~~~~~~~~~~~~~~~~~~
27:10.42             mPendingTransaction
27:10.42 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h: In member function 
    ‘bool nsRefreshDriver::TooManyPendingTransactions()’:
27:10.42 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h:517:12: error: 
    ‘mPendingTransactions’ was not declared in this scope
27:10.42      return mPendingTransactions.Length() >= 2;
27:10.43             ^~~~~~~~~~~~~~~~~~~~
27:10.43 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h:517:12: note: 
    suggested alternative: ‘mPendingTransaction’
27:10.43      return mPendingTransactions.Length() >= 2;
27:10.43             ^~~~~~~~~~~~~~~~~~~~
27:10.43             mPendingTransaction
27:24.10 make[4]: *** [${PROJECT}/gecko-dev/config/rules.mk:694: 
    Unified_cpp_dom_animation0.o] Error 1
27:24.10 make[3]: *** [${PROJECT}/gecko-dev/config/recurse.mk:72: dom/animation/
    target-objects] Error 2
These errors are clearly all caused by the patches I applied. Unfortunately I can't fix them while I'm at work, which means looking in to them in the evening. If I can get them done today there's a chance I'll be able to run the build overnight, but let's see.

The error is about use of mPendingTransactions in the nsRefreshDriver class. It's true that the member variable isn't defined there. Should it be?

It's not defined in the code in the upstream changeset on either side of the diff. Nor does it appear in the patch that we applied. Which is a little odd, except that there is reference in all of those to a mPendingTransaction member variable. Note the subtle difference: mPendingTransaction vs. mPendingTransactions (singular vs. plural).

And looking more carefully at the patch we can see the reason. Here are the changes that the patch tried to apply:
diff --git a/layout/base/nsRefreshDriver.h b/layout/base/nsRefreshDriver.h
index 07feab12e079..deec935f25f4 100644
--- a/layout/base/nsRefreshDriver.h
+++ b/layout/base/nsRefreshDriver.h
@@ -507,12 +507,7 @@ class nsRefreshDriver final : public mozilla::layers::
    TransactionIdAllocator,
   RefPtr<nsRefreshDriver> mRootRefresh;
 
   // The most recently allocated transaction id.
-  TransactionId mNextTransactionId;
-  // This number is mCompletedTransaction + (pending transaction count).
-  // When we revoke a transaction id, we revert this number (since it's
-  // no longer outstanding), but not mNextTransactionId (since we don't
-  // want to reuse the number).
-  TransactionId mOutstandingTransactionId;
+  TransactionId mPendingTransaction;
   // The most recently completed transaction id.
   TransactionId mCompletedTransaction;
But this failed and I made a manual intervention; one which I thought looked pretty similar, but which in fact included a crucial destructive change:
diff --git a/layout/base/nsRefreshDriver.h b/layout/base/nsRefreshDriver.h
index 29c595ba5ba8..b9a0fa4bb2ed 100644
--- a/layout/base/nsRefreshDriver.h
+++ b/layout/base/nsRefreshDriver.h
@@ -528,8 +528,9 @@ class nsRefreshDriver final : public mozilla::layers::
    TransactionIdAllocator,
   RefPtr<nsRefreshDriver> mRootRefresh;
 
   // The most recently allocated transaction id.
-  TransactionId mNextTransactionId;
-  AutoTArray<TransactionId, 3> mPendingTransactions;
+  TransactionId mPendingTransaction;
+  // The most recently completed transaction id.
+  TransactionId mCompletedTransaction;
As you can see, my change removed the mPendingTransactions (plural) variable. The good news is that after applying the patch this mPendingTransactions variable is only used in two places:
  bool AtPendingTransactionLimit() {
    return mPendingTransactions.Length() == 2;
  }
  bool TooManyPendingTransactions() {
    return mPendingTransactions.Length() >= 2;
  }
Neither of these two methods is used anywhere, so it should be safe to remove them. On the other hand, it seems that by applying the patch I've trampled over rather more changes than I'd anticipated. So while removing these two methods to get the build to pass is a simple solution, I'm also going to need to spend some time checking the other changes.

But, having worked through the patch and read through it side-by-side with the changes made to the ESR 91 code, it all looks sensible, albeit rather messy. It looks like the ESR 91 code now matches the code that would have been expected had the patch been applied to ESR 78 as a base.

I've therefore set the build running again. If it builds that'll be a good sign. The good news is also that once it's built, due to the work I've been doing over the last few days, it should be really easy to check whether this change has had the desired effect. All I need to do is check whether the OnFirstPaint() method is being called. If it is, then that's a good sign things have worked out.

Since it's building now I can't do much else until it completes, so I'm going to call it a night. By the morning it should be done and we can continue with some testing.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
22 May 2024 : Day 240 #
We're continuing to dig deeper into the first paint trigger code today. I've spent a couple of days digging around this already now and it's turning out to be a surprisingly complex investigation. I'm hoping the reward will be worth it, and to be honest, I think it will be: WebView rendering isn't going to work without getting to the bottom of this.

Yesterday we saw that the crucial OnFirstPaint() call was being made from EmbedLiteViewParent::RecvOnFirstPaint(). That's because the QMozViewPrivate class implements the EmbedLiteViewListener interface like this:
class QMozViewPrivate : public QObject,
    public mozilla::embedlite::EmbedLiteViewListener
{
[...]
This listener is set up to get called by EmbedLiteViewParent on receipt of the OnFirstPaint message:
mozilla::ipc::IPCResult EmbedLiteViewParent::RecvOnFirstPaint(const int32_t &aX,
                                                              const int32_t &aY)
{
  LOGT();
  NS_ENSURE_TRUE(mView && !mViewAPIDestroyed, IPC_OK());

  mView->GetListener()->OnFirstPaint(aX, aY);
  return IPC_OK();
}
This is called at the far end of a channel after receipt of a message sent potentially across a thread or process boundary. Sending things across process boundaries like this messes up the backtraces but are otherwise a pretty commonplace in Gecko. We just need to find out where the message is being sent from. Searching the code for where the Send happens is usually the way to go with this, and here it's no different. These send and receive messages are auto-generated from the interface files, so they always follow a strict pattern. A search for a call to SendOnFirstPaint() therefore throws up the following in the ESR 78 code:
NS_IMETHODIMP
EmbedLiteViewChild::OnFirstPaint(int32_t aX, int32_t aY)
{
  if (mDOMWindow) {
    nsCOMPtr<nsIDocShell> docShell = mDOMWindow->GetDocShell();
    if (docShell) {
      RefPtr<PresShell> presShell = docShell->GetPresShell();
      if (presShell) {
        nscolor bgcolor = presShell->GetCanvasBackground();
        Unused << SendSetBackgroundColor(bgcolor);
      }
    }
  }

  return SendOnFirstPaint(aX, aY) ? NS_OK : NS_ERROR_FAILURE;
}
We want to find out where this is called and for that we can use the debugger. Using it gives us the following backtrace.
Thread 7 &quot;GeckoWorkerThre&quot; hit Breakpoint 2, non-virtual thunk to 
    mozilla::embedlite::EmbedLiteViewChild::OnFirstPaint(int, int) ()
    at mobile/sailfishos/embedshared/EmbedLiteViewChild.h:57
57        NS_DECL_NSIEMBEDBROWSERCHROMELISTENER
(gdb) bt
#0  non-virtual thunk to mozilla::embedlite::EmbedLiteViewChild::OnFirstPaint(
    int, int) ()
    at mobile/sailfishos/embedshared/EmbedLiteViewChild.h:57
#1  0x0000007fbb306f38 in WebBrowserChrome::HandleEvent (this=0x7f8ccba130, 
    aEvent=0x7f8dd57a60)
    at mobile/sailfishos/utils/WebBrowserChrome.cpp:443
#2  0x0000007fb9d4b4a0 in mozilla::EventListenerManager::HandleEventSubType (
    this=this@entry=0x7f8ccbd3d0, aListener=<optimized out>, 
    aListener@entry=0x7f8cdb7d78, aDOMEvent=0x7f8dd57a60, 
    aCurrentTarget=<optimized out>, aCurrentTarget@entry=0x7eac138da0)
    at dom/events/EventListenerManager.cpp:1087
#3  0x0000007fb9d4e984 in mozilla::EventListenerManager::HandleEventInternal (
    this=0x7f8ccbd3d0, aPresContext=0x7f8cf82940, aEvent=0x7f8d5c7e80, 
    aDOMEvent=0x7fa7972858, aCurrentTarget=<optimized out>, 
    aEventStatus=<optimized out>, aItemInShadowTree=<optimized out>)
    at dom/events/EventListenerManager.cpp:1279
#4  0x0000007fb9d4f604 in mozilla::EventTargetChainItem::HandleEventTargetChain 
    (aChain=..., aVisitor=..., aCallback=aCallback@entry=0x0, aCd=...)
    at dom/events/EventDispatcher.cpp:558
#5  0x0000007fb9d50eb8 in mozilla::EventDispatcher::Dispatch (
    aTarget=<optimized out>, aPresContext=0x7f8cf82940, aEvent=0x7f8d5c7e80, 
    aDOMEvent=0x7f8dd57a60, aEventStatus=<optimized out>, aCallback=0x0, 
    aTargets=<optimized out>)
    at dom/events/EventDispatcher.cpp:1055
#6  0x0000007fba7f8f50 in nsPresContext::FireDOMPaintEvent (this=0x7f8cf82940, 
    aList=aList@entry=0x7ecc05ae78, aTransactionId=..., aTimeStamp=...)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:847
#7  0x0000007fba7f9090 in DelayedFireDOMPaintEvent::Run (this=0x7ecc05ae50)
    at obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#8  0x0000007fb917d1a8 in nsContentUtils::RemoveScriptBlocker ()
    at dom/base/nsContentUtils.cpp:5407
#9  0x0000007fb917d218 in nsContentUtils::RemoveScriptBlocker ()
    at dom/base/nsContentUtils.cpp:5386
#10 0x0000007fba7fd020 in nsAutoScriptBlocker::~nsAutoScriptBlocker (
    this=<synthetic pointer>, __in_chrg=<optimized out>)
    at dom/base/nsContentUtils.h:3461
#11 nsRootPresContext::<lambda()>::operator() (__closure=0x7f8dd1cfe8)
    at layout/base/nsPresContext.cpp:2825
#12 mozilla::layers::GenericNamedTimerCallback<nsRootPresContext::
    EnsureEventualDidPaintEvent(nsPresContext::TransactionId)::<lambda()> >::
    Notify(nsITimer *) (this=0x7f8dd1cfd0) at obj-build-mer-qt-xr/dist/include/
    mozilla/layers/APZThreadUtils.h:81
#13 0x0000007fb84653e0 in nsTimerImpl::Fire (this=0x7f8cecfb50, aGeneration=1)
    at obj-build-mer-qt-xr/dist/include/mozilla/TimeStamp.h:136
#14 0x0000007fb84654c0 in nsTimerEvent::Run (this=0x7ed4001ab0)
    at obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#15 0x0000007fb845e4c4 in nsThread::ProcessNextEvent (aResult=0x7fa7972d77, 
    aMayWait=<optimized out>, this=0x7f8c02f610)
    at xpcom/threads/nsThread.cpp:1211
[...]
#32 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) 
That's a long backtrace but the magic appears to be happening inside a call to DelayedFireDOMPaintEvent() and we can check for this by searching through the code. It gets set up in nsPresContext::NotifyDidPaintForSubtree() inside nsPresContext.cpp where we have the following code, which is basically the same in both ESR 78 and ESR 91. This is a long piece of code, but the important parts are the two calls to DelayedFireDOMPaintEvent() which you can see around the middle and towards the end of the method:
void nsPresContext::NotifyDidPaintForSubtree(
    TransactionId aTransactionId, const mozilla::TimeStamp& aTimeStamp) {
  if (mFirstContentfulPaintTransactionId && !mHadContentfulPaintComposite) {
    if (aTransactionId >= *mFirstContentfulPaintTransactionId) {
      mHadContentfulPaintComposite = true;
      RefPtr<nsDOMNavigationTiming> timing = mDocument->GetNavigationTiming();
      if (timing && !aTimeStamp.IsNull()) {
        timing->NotifyContentfulPaintForRootContentDocument(aTimeStamp);
      }
    }
  }

  if (IsRoot()) {
    static_cast<nsRootPresContext*>(this)->CancelDidPaintTimers(aTransactionId);

    if (mTransactions.IsEmpty()) {
      return;
    }
  }

  if (!PresShell()->IsVisible() && mTransactions.IsEmpty()) {
    return;
  }

  // Non-root prescontexts fire MozAfterPaint to all their descendants
  // unconditionally, even if no invalidations have been collected. This is
  // because we don't want to eat the cost of collecting invalidations for
  // every subdocument (which would require putting every subdocument in its
  // own layer).

  bool sent = false;
  uint32_t i = 0;
  while (i < mTransactions.Length()) {
    if (mTransactions[i].mTransactionId <= aTransactionId) {
      if (!mTransactions[i].mInvalidations.IsEmpty()) {
        nsCOMPtr<nsIRunnable> ev = new DelayedFireDOMPaintEvent(
            this, &mTransactions[i].mInvalidations,
            mTransactions[i].mTransactionId, aTimeStamp);
        nsContentUtils::AddScriptRunner(ev);
        sent = true;
      }
      mTransactions.RemoveElementAt(i);
    } else {
      i++;
    }
  }

  if (!sent) {
    nsTArray<nsRect> dummy;
    nsCOMPtr<nsIRunnable> ev =
        new DelayedFireDOMPaintEvent(this, &dummy, aTransactionId, aTimeStamp);
    nsContentUtils::AddScriptRunner(ev);
  }
As anticipated there's no hit for this method on ESR 91, but on ESR 78 we can get a backtrace:
(gdb) break DelayedFireDOMPaintEvent::DelayedFireDOMPaintEvent
Breakpoint 8 at 0x7fba7fcc8c: DelayedFireDOMPaintEvent::
    DelayedFireDOMPaintEvent. (2 locations)
(gdb) r
[...]
Thread 7 &quot;GeckoWorkerThre&quot; hit Breakpoint 8, 0x0000007fba7fcc8c in 
    DelayedFireDOMPaintEvent::DelayedFireDOMPaintEvent (aTimeStamp=..., 
    aTransactionId=..., aList=<optimized out>, aPresContext=<optimized out>, 
    this=<optimized out>)
    at layout/base/nsPresContext.h:173
173         return mPresShell;
(gdb) bt
#0  0x0000007fba7fcc8c in DelayedFireDOMPaintEvent::DelayedFireDOMPaintEvent (
    aTimeStamp=..., aTransactionId=..., aList=<optimized out>, 
    aPresContext=<optimized out>, this=<optimized out>)
    at layout/base/nsPresContext.h:173
#1  nsPresContext::NotifyDidPaintForSubtree (this=0x7f8ce6dda0, 
    aTransactionId=..., aTimeStamp=...)
    at layout/base/nsPresContext.cpp:2082
#2  0x0000007fba7fd01c in nsRootPresContext::<lambda()>::operator() (
    __closure=0x7f8dc1af58)
    at obj-build-mer-qt-xr/dist/include/mozilla/TimeStamp.h:400
#3  mozilla::layers::GenericNamedTimerCallback<nsRootPresContext::
    EnsureEventualDidPaintEvent(nsPresContext::TransactionId)::<lambda()> >::
    Notify(nsITimer *) (this=0x7f8dc1af40) at obj-build-mer-qt-xr/dist/include/
    mozilla/layers/APZThreadUtils.h:81
#4  0x0000007fb84653e0 in nsTimerImpl::Fire (this=0x7f8c72bfd0, aGeneration=1)
    at obj-build-mer-qt-xr/dist/include/mozilla/TimeStamp.h:136
#5  0x0000007fb84654c0 in nsTimerEvent::Run (this=0x7ecc001ab0)
    at obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#6  0x0000007fb845e4c4 in nsThread::ProcessNextEvent (aResult=0x7fa7972d77, 
    aMayWait=<optimized out>, this=0x7f8c02f610)
    at xpcom/threads/nsThread.cpp:1211
#7  nsThread::ProcessNextEvent (this=0x7f8c02f610, aMayWait=<optimized out>, 
    aResult=0x7fa7972d77)
    at xpcom/threads/nsThread.cpp:1047
#8  0x0000007fb845cdcc in NS_ProcessPendingEvents (
    aThread=aThread@entry=0x7f8c02f610, aTimeout=10)
    at xpcom/threads/nsThreadUtils.cpp:449
#9  0x0000007fba63c430 in nsBaseAppShell::NativeEventCallback (
    this=0x7f8c3e6ac0)
    at widget/nsBaseAppShell.cpp:87
#10 0x0000007fba64f0bc in nsAppShell::event (this=<optimized out>, e=<optimized 
    out>)
    at widget/qt/nsAppShell.cpp:75
[...]
#23 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) 
Frustratingly it's called within a lambda function, which is messing up the backtrace yet again. So we'll have to resort to digging through the code again. There we can see that the NotifyDidPaintForSubtree() method gets called in a lambda function that's triggered inside itself. But this can't be the one we're after because we'd have seen the hit on the containing method first.

But there's another case where a lambda function gets configured to then call nsPresContext::NotifyDidPaintForSubtree() and that's in the following method from nsPresContext.cpp (this is the ESR 78 version):
void
nsRootPresContext::EnsureEventualDidPaintEvent(TransactionId aTransactionId)
{
  for (NotifyDidPaintTimer& t : mNotifyDidPaintTimers) {
    if (t.mTransactionId == aTransactionId) {
      return;
    }
  }

  nsCOMPtr<nsITimer> timer;
  RefPtr<nsRootPresContext> self = this;
  nsresult rv = NS_NewTimerWithCallback(
    getter_AddRefs(timer),
    NewNamedTimerCallback([self, aTransactionId](){
      nsAutoScriptBlocker blockScripts;
      self->NotifyDidPaintForSubtree(aTransactionId);
     }, &quot;NotifyDidPaintForSubtree&quot;), 100, nsITimer::TYPE_ONE_SHOT,
    Document()->EventTargetFor(TaskCategory::Other));

  if (NS_SUCCEEDED(rv)) {
    NotifyDidPaintTimer* t = mNotifyDidPaintTimers.AppendElement();
    t->mTransactionId = aTransactionId;
    t->mTimer = timer;
  }
}
That's the only other case I can see, so we could check to find out how this EnsureEventualDidPaintEvent() method gets called. However as I was in the process of configuring the debugger I noticed that this method doesn't exist in ESR 91. That's unexpected. In ESR 78 it's called from nsPresContext::NotifyInvalidation() like this:
void nsPresContext::NotifyInvalidation(TransactionId aTransactionId,
                                       const nsRect& aRect) {
  MOZ_ASSERT(GetContainerWeak(), &quot;Invalidation in detached pres 
    context&quot;);

  // If there is no paint event listener, then we don't need to fire
  // the asynchronous event. We don't even need to record invalidation.
  // MayHavePaintEventListener is pretty cheap and we could make it
  // even cheaper by providing a more efficient
  // nsPIDOMWindow::GetListenerManager.

  nsPresContext* pc;
  for (pc = this; pc; pc = pc->GetParentPresContext()) {
    TransactionInvalidations* transaction =
        pc->GetInvalidations(aTransactionId);
    if (transaction) {
      break;
    } else {
      transaction = pc->mTransactions.AppendElement();
      transaction->mTransactionId = aTransactionId;
    }
  }
  if (!pc) {
    nsRootPresContext* rpc = GetRootPresContext();
    if (rpc) {
      rpc->EnsureEventualDidPaintEvent(aTransactionId);
    }
  }

  TransactionInvalidations* transaction = GetInvalidations(aTransactionId);
  MOZ_ASSERT(transaction);
  transaction->mInvalidations.AppendElement(aRect);
}
But as we can see, the call to EnsureEventualDidPaintEvent() has been removed in the ESR 91 version of the same method:
void nsPresContext::NotifyInvalidation(TransactionId aTransactionId,
                                       const nsRect& aRect) {
  MOZ_ASSERT(GetContainerWeak(), &quot;Invalidation in detached pres 
    context&quot;);

  // If there is no paint event listener, then we don't need to fire
  // the asynchronous event. We don't even need to record invalidation.
  // MayHavePaintEventListener is pretty cheap and we could make it
  // even cheaper by providing a more efficient
  // nsPIDOMWindow::GetListenerManager.

  nsPresContext* pc;
  for (pc = this; pc; pc = pc->GetParentPresContext()) {
    TransactionInvalidations* transaction =
        pc->GetInvalidations(aTransactionId);
    if (transaction) {
      break;
    } else {
      transaction = pc->mTransactions.AppendElement();
      transaction->mTransactionId = aTransactionId;
    }
  }

  TransactionInvalidations* transaction = GetInvalidations(aTransactionId);
  MOZ_ASSERT(transaction);
  transaction->mInvalidations.AppendElement(aRect);
}
I wasn't expecting that. For completeness, the backtrace from ESR 78 confirms that this is indeed the correct method to be looking at, at least on the ESR 78 side:
Thread 7 &quot;GeckoWorkerThre&quot; hit Breakpoint 9, nsRootPresContext::
    EnsureEventualDidPaintEvent (this=0x7f8ce70330, 
    aTransactionId=aTransactionId@entry=...)
    at layout/base/nsPresContext.cpp:2813
2813    {
(gdb) bt
#0  nsRootPresContext::EnsureEventualDidPaintEvent (this=0x7f8ce70330, 
    aTransactionId=aTransactionId@entry=...)
    at layout/base/nsPresContext.cpp:2813
#1  0x0000007fba7fb480 in nsPresContext::NotifyInvalidation (
    this=this@entry=0x7f8ce70330, aTransactionId=..., aRect=...)
    at layout/base/nsPresContext.cpp:1964
#2  0x0000007fba7fb5d8 in nsPresContext::NotifyInvalidation (
    this=this@entry=0x7f8ce70330, aTransactionId=..., aRect=...)
    at layout/base/nsPresContext.cpp:1927
#3  0x0000007fbaa4dbe4 in nsDisplayList::PaintRoot (
    this=this@entry=0x7fa7971f48, aBuilder=aBuilder@entry=0x7fa7970170, 
    aCtx=aCtx@entry=0x0, 
    aFlags=aFlags@entry=13) at layout/painting/nsDisplayList.cpp:2526
#4  0x0000007fba7d656c in nsLayoutUtils::PaintFrame (
    aRenderingContext=aRenderingContext@entry=0x0, 
    aFrame=aFrame@entry=0x7f8dba5c40, aDirtyRegion=..., 
    aBackstop=aBackstop@entry=4294967295, 
    aBuilderMode=aBuilderMode@entry=nsDisplayListBuilderMode::Painting, 
    aFlags=<optimized out>, 
    aFlags@entry=(nsLayoutUtils::PaintFrameFlags::WidgetLayers | nsLayoutUtils::
    PaintFrameFlags::ExistingTransaction | nsLayoutUtils::PaintFrameFlags::
    NoComposite)) at layout/base/nsLayoutUtils.cpp:4168
#5  0x0000007fba79a258 in mozilla::PresShell::Paint (
    this=this@entry=0x7f8d2dd790, aViewToPaint=aViewToPaint@entry=0x7f8d210390, 
    aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at layout/base/PresShell.cpp:6254
#6  0x0000007fba6110ac in nsViewManager::ProcessPendingUpdatesPaint (
    this=this@entry=0x7f8d210df0, aWidget=aWidget@entry=0x7f8d210e60)
    at obj-build-mer-qt-xr/dist/include/nsTArray.h:554
#7  0x0000007fba6113c0 in nsViewManager::ProcessPendingUpdatesForView (
    this=this@entry=0x7f8d210df0, aView=<optimized out>, 
    aFlushDirtyRegion=aFlushDirtyRegion@entry=true) at view/nsViewManager.cpp:
    395
#8  0x0000007fba611b50 in nsViewManager::ProcessPendingUpdates (
    this=0x7f8d210df0)
    at view/nsViewManager.cpp:1018
#9  nsViewManager::ProcessPendingUpdates (this=<optimized out>)
    at view/nsViewManager.cpp:1004
#10 0x0000007fba611c0c in nsViewManager::WillPaintWindow (
    this=this@entry=0x7f8d210df0, aWidget=0x7f8d210e60)
    at view/nsViewManager.cpp:664
#11 0x0000007fba611c88 in nsView::WillPaintWindow (this=<optimized out>, 
    aWidget=<optimized out>)
    at view/nsView.cpp:1048
#12 0x0000007fbb2f6344 in mozilla::embedlite::PuppetWidgetBase::Invalidate (
    aRect=..., this=0x7f8d210e60)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:268
#13 mozilla::embedlite::PuppetWidgetBase::Invalidate (this=0x7f8d210e60, 
    aRect=...)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:253
#14 0x0000007fbb2ef768 in mozilla::embedlite::EmbedLitePuppetWidget::Show (
    this=0x7f8d210e60, aState=<optimized out>)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:97
#15 0x0000007fba7f03e8 in nsDocumentViewer::Show (this=0x7f8d15df90)
    at layout/base/nsDocumentViewer.cpp:2132
#16 0x0000007fba7f8b7c in nsPresContext::EnsureVisible (this=0x7f8ce70330)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:857
#17 0x0000007fba79c114 in mozilla::PresShell::UnsuppressAndInvalidate (
    this=0x7f8d2dd790)
    at obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#18 0x0000007fba7a1638 in mozilla::PresShell::ProcessReflowCommands (
    this=this@entry=0x7f8d2dd790, aInterruptible=aInterruptible@entry=false)
    at layout/base/PresShell.cpp:9712
#19 0x0000007fba7a0630 in mozilla::PresShell::DoFlushPendingNotifications (
    this=this@entry=0x7f8d2dd790, aFlush=...)
    at layout/base/PresShell.cpp:4209
[...]
#33 0x0000007fba772840 in mozilla::VsyncRefreshDriverTimer::
    RefreshDriverVsyncObserver::NotifyVsync (this=0x7f8cc9de00, aVsync=...)
    at layout/base/nsRefreshDriver.cpp:644
[...]
#53 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb)
Alright, let's rewind back to the top again. We've now reached the point where we know how the OnFirstPaint() method is getting called in ESR 78. And we can see why it's not happening the same way in ESR 91, viz. that the code to call it has been removed from the NotifyInvalidation() method. The next step is to find out why it's been removed and what should be triggering the OnFirstPaint() instead (assuming there is still something).

The best way to investigate this further will be to look a the Bugzilla ticket and associated changeset where this change was made upstream. We've searched for these kinds of changes many times before and now is no different. Here's the log for the change:
$ git log -S EnsureEventualDidPaintEvent -1 -- layout/base/nsPresContext.cpp
commit 1deccd7ac14706ad1849343cfb2b93df191a1c42
Author: Mantaroh Yoshinaga <mantaroh@gmail.com>
Date:   Thu Sep 6 02:21:39 2018 +0000

    Bug 1445570 - Remove EnsureEventualAfterPaint timer. r=tnikkel
    
    MozReview-Commit-ID: C7WICJ5Q0ES
    
    Differential Revision: https://phabricator.services.mozilla.com/D5005
    
    --HG--
    extra : moz-landing-system : lando
After reading through Bug 1445570 and the associated code changes my conclusion is that unravelling this is going to be too complicated for my brain in its current state, so best to let it sink in overnight. I'll pick this up again tomorrow where we'll try to find out what's replacing the trigger that we need in the updated ESR 91 code.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
21 May 2024 : Day 239 #
I got a bit carried away yesterday delving into the reasons why QuickMozView::updatePaintNode() was failing to fire on ESR 91. My conclusion was that it came down to the mIsPainted flag, part of the QMozViewPrivate class and which was remaining stubbornly set to false when running ESR 91.

While in this state the invalidTexture flag inside updatePaintNode() never gets set to false since it's defined like this:
    const bool invalidTexture = !mComposited
            || !d->mIsPainted
            || !d->mViewInitialized
            || !d->mHasCompositor
            || !d->mContext->registeredWindow()
            || !d->mMozWindow;
Notice how if any single value in this list is set to false then the combined value of invalidTexture will be set to true. The consequence of this is that the following clause is always executed:
    if (!mTexture && invalidTexture) {
        QSGSimpleRectNode *node = static_cast<QSGSimpleRectNode *>(oldNode);
        if (!node) {
            node = new QSGSimpleRectNode;
        }
        node->setColor(d->mBackgroundColor);
        node->setRect(boundingRect);

        return node;
    }
The return at the end of this means the updatePaintNode() method returns early. This means that the code beyond the clause never gets executed and crucially this section is never entered:
    if (!node) {
        QMozExtTexture * const texture = new QMozExtTexture;
        mTexture = texture;

        connect(texture, &QMozExtTexture::getPlatformImage, d->mMozWindow, 
            &QMozWindow::getPlatformImage, Qt::DirectConnection);

        node = new MozExtMaterialNode;

        node->setTexture(mTexture);
    }
Here we can see both mTexture and the MozExtMaterialNode object being created. That's what we need for the rendering to proceed. So if this portion of code is never executed, we'll never get the rendered page to appear.

That all sounds very plausible and comprehensible, but it begs the question: "why is mIsPainted never being set to true?". That's what I plan to find out over the coming days.

Unsurprisingly we can see the initial value it's set to in the QMozViewPrivate constructor, where it's initially set to false:
QMozViewPrivate::QMozViewPrivate(IMozQViewIface *aViewIface, QObject *publicPtr)
    : mViewIface(aViewIface)
[...]
    , mIsPainted(false)
[...]
It's also set to false after a call to QMozViewPrivate::reset():
QMozViewPrivate::reset()
{
    if (mIsPainted) {
        mIsPainted = false;
        mViewIface->firstPaint(-1, -1);
    }
[...]
But the important part is where it gets set to true and that only happens in one place; here in the OnFirstPaint() method:
void QMozViewPrivate::OnFirstPaint(int32_t aX, int32_t aY)
{
    mIsPainted = true;
    mViewIface->firstPaint(aX, aY);
}
This method doesn't get explicitly called anywhere in the QtMozEmbed code, so to find out where it's actually getting used I'm going to drop to the debugger on my ESR 78 device.
(gdb) break QMozViewPrivate::OnFirstPaint
Breakpoint 13 at 0x7fbfbf56b8: file qmozview_p.cpp, line 1113.
(gdb) r
[...]
Thread 1 &quot;harbour-webview&quot; hit Breakpoint 13, QMozViewPrivate::
    OnFirstPaint (this=0x55557b08f0, aX=0, aY=0) at qmozview_p.cpp:1113
1113        mIsPainted = true;
(gdb) bt                         
#0  QMozViewPrivate::OnFirstPaint (this=0x55557b08f0, aX=0, aY=0) at 
    qmozview_p.cpp:1113
#1  0x0000007fbb2f5d90 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#2  0x0000007fb8932f50 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#3  0x0000007fb8920238 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#4  0x0000007fb88428cc in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#5  0x0000007fb88482d8 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#6  0x0000007fb8849d58 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#7  0x0000007fb8809d48 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#8  0x0000007fb880e740 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#9  0x0000007fbfbed924 in MessagePumpQt::HandleDispatch (this=0x555578ef30) at 
    qmessagepump.cpp:63
#10 0x0000007fbfbeda9c in MessagePumpQt::event (this=<optimized out>, 
    e=<optimized out>) at qmessagepump.cpp:51
#11 0x0000007fbebe1144 in QCoreApplication::notify(QObject*, QEvent*) () from /
    usr/lib64/libQt5Core.so.5
#12 0x0000007fbebe12e8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (
    ) from /usr/lib64/libQt5Core.so.5
#13 0x0000007fbebe36b8 in QCoreApplicationPrivate::sendPostedEvents(QObject*, 
    int, QThreadData*) () from /usr/lib64/libQt5Core.so.5
[...]
#21 0x0000005555557bcc in main ()
(gdb) 
Well that's an unhelpful mess. But it is at least getting called from inside the Gecko code and that in itself is both helpful to know and a good thing (I was concerned it might be called from inside Qt which would have made it considerably harder to track down; the fact it's coming from Gecko means we can skip a layer of Qt indirection and get closer to the source of the problem more quickly).

But as you can see the Gecko debug symbols are messed up: we can see eight calls to methods inside libxul in a row, but it's not telling us the names of the methods.

So I'm going to reinstall the packages, including the debug packages, to get the symbols back. Before the gap I was using a custom build with extra debug output and exporting of textures, but I don't need any of that any more so I can just reinstall the packages from the official repositories. Much easier and quicker than rebuilding the packages locally and installing them that way:
# zypper install --force xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla 
    xulrunner-qt5-debuginfo-78.15.1+git33.2-1.21.1.jolla xul
runner-qt5-debugsource-78.15.1+git33.2-1.21.1.jolla 
    xulrunner-qt5-misc-78.15.1+git33.2-1.21.1.jolla
Loading repository data...
Reading installed packages...
Forcing installation of 'xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64' 
    from repository 'jolla'.
Forcing installation of 
    'xulrunner-qt5-debuginfo-78.15.1+git33.2-1.21.1.jolla.aarch64' from 
    repository 'jolla'.
Forcing installation of 
    'xulrunner-qt5-debugsource-78.15.1+git33.2-1.21.1.jolla.aarch64' from 
    repository 'jolla'.
Forcing installation of 
    'xulrunner-qt5-misc-78.15.1+git33.2-1.21.1.jolla.aarch64' from repository 
    'jolla'.
[...]
2 packages to downgrade, 2 to reinstall, 2  to change vendor.
Overall download size: 758.8 MiB. Already cached: 0 B. After the operation, 2.4 
    GiB will be freed.
Continue? [y/n/v/...? shows all options] (y): y
[...]
Okay, with the debug symbols now properly aligned we can try getting the backtrace again, but hopefully this time with a bit more helpful detail.
Thread 1 &quot;harbour-webview&quot; hit Breakpoint 1, QMozViewPrivate::
    OnFirstPaint (this=0x55555cb830, aX=0, aY=0) at qmozview_p.cpp:1113
1113        mIsPainted = true;
(gdb) bt
#0  QMozViewPrivate::OnFirstPaint (this=0x55555cb830, aX=0, aY=0) at 
    qmozview_p.cpp:1113
#1  0x0000007fbb2f5d90 in mozilla::embedlite::EmbedLiteViewParent::
    RecvOnFirstPaint (this=0x55555d9d70, aX=@0x7fffffea18: 0, aY=@0x7fffffea28: 
    0)
    at mobile/sailfishos/embedshared/EmbedLiteViewParent.cpp:237
#2  0x0000007fb8932f50 in mozilla::embedlite::PEmbedLiteViewParent::
    OnMessageReceived (this=0x55555d9d70, msg__=...) at 
    PEmbedLiteViewParent.cpp:1600
#3  0x0000007fb8920238 in mozilla::embedlite::PEmbedLiteAppParent::
    OnMessageReceived (this=<optimized out>, msg__=...)
    at obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:866
#4  0x0000007fb88428cc in mozilla::ipc::MessageChannel::DispatchAsyncMessage (
    this=this@entry=0x7f8c888578, aProxy=aProxy@entry=0x555560d1d0, aMsg=...)
    at obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:866
#5  0x0000007fb88482d8 in mozilla::ipc::MessageChannel::DispatchMessage (
    this=0x7f8c888578, aMsg=...)
    at ipc/glue/MessageChannel.cpp:2100
#6  0x0000007fb8849b50 in mozilla::ipc::MessageChannel::RunMessage (
    this=<optimized out>, aTask=...)
    at ipc/glue/MessageChannel.cpp:1959
#7  0x0000007fb8849d58 in mozilla::ipc::MessageChannel::MessageTask::Run (
    this=0x7f8e127650)
    at obj-build-mer-qt-xr/dist/include/mozilla/ipc/MessageChannel.h:610
#8  0x0000007fb8809d48 in MessageLoop::RunTask (aTask=..., this=0x55557c5200)
    at ipc/chromium/src/base/message_loop.cc:487
#9  MessageLoop::DeferOrRunPendingTask (pending_task=..., this=0x55557c5200)
    at ipc/chromium/src/base/message_loop.cc:478
#10 MessageLoop::DeferOrRunPendingTask (this=0x55557c5200, pending_task=...)
    at ipc/chromium/src/base/message_loop.cc:476
#11 0x0000007fb880e740 in MessageLoop::DoWork (this=<optimized out>)
    at ipc/chromium/src/base/message_loop.cc:551
#12 MessageLoop::DoWork (this=0x55557c5200) at ipc/chromium/src/base/
    message_loop.cc:530
#13 0x0000007fbfbed924 in MessagePumpQt::HandleDispatch (this=0x55557a6f40) at 
    qmessagepump.cpp:63
#14 0x0000007fbfbeda9c in MessagePumpQt::event (this=<optimized out>, 
    e=<optimized out>) at qmessagepump.cpp:51
#15 0x0000007fbebe1144 in QCoreApplication::notify(QObject*, QEvent*) () from /
    usr/lib64/libQt5Core.so.5
#16 0x0000007fbebe12e8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (
    ) from /usr/lib64/libQt5Core.so.5
#17 0x0000007fbebe36b8 in QCoreApplicationPrivate::sendPostedEvents(QObject*, 
    int, QThreadData*) () from /usr/lib64/libQt5Core.so.5
[...]
#25 0x0000005555557bcc in main ()
(gdb) 
That's much better and gives us a much clearer idea of where OnFirstPaint() is getting called. But as we can see from this it's actually being triggered by receipt of a message. The next step will therefore be to find out where the OnFirstPaint message is being sent from.

For completeness I also performed the same checks on ESR 91. You'll not be surprised to hear that the QMozViewPrivate::OnFirstPaint() method is never called in the newer build.

The ESR 78 backtrace tells us what should be happening on ESR 91. So our task now is to find out why it's not. This takes us a step forwards and gives us something to look into further; we'll pick this avenue of investigation up again tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
20 May 2024 : Day 238 #
It's been an exciting and packed day of Jolla and Sailfish OS news today with the announcement of not one but two new Jolla devices, plus some significant changes to the way Sailfish OS will be developed and funded in the future. This isn't the place to dwell on these topics, but I am excited for the future of the operating system. I didn't want it to go without saying, but I do want to focus on Gecko here, so let's head straight in to development.

Yesterday I made what could be an important discovery. The QMozExtTexture::updateTexture() method is being called on ESR 78, but not on ESR 91. If I can track down the underlying reason for this, then it may well help getting the offscreen rendering back up and running again.

The next step is to find out at which point the failure is happening and try to understand why. Before we get into it I want to warn you that this is rather a long post, mostly containing debugging output which is dull at the best of times. But it does lead to some useful conclusions, so if you're interest to follow but don't want to have to read through all of the details, don't feel bad if you want to skip straight to the end!

If you're still reading this then I'll assume you're in it for the long haul. Don't say I didn't warn you though! So the best place to start this investigation is with the backtrace of the call. We saw this yesterday and it looked like this:
#0  QMozExtTexture::updateTexture (this=0x7f90024e30) at qmozexttexture.cpp:64
#1  0x0000007fbfc04f18 in MozMaterialNode::preprocess (this=0x7f9001ef60) at 
    qmozextmaterialnode.cpp:81
#2  0x0000007fbf8f0c44 in QSGRenderer::preprocess() () from /usr/lib64/
    libQt5Quick.so.5
#3  0x0000007fbf8f0e5c in QSGRenderer::renderScene(QSGBindable const&) () from /
    usr/lib64/libQt5Quick.so.5
#4  0x0000007fbf8f14c0 in QSGRenderer::renderScene(unsigned int) () from /usr/
    lib64/libQt5Quick.so.5
#5  0x0000007fbf8ffe58 in QSGRenderContext::renderNextFrame(QSGRenderer*, 
    unsigned int) () from /usr/lib64/libQt5Quick.so.5
#6  0x0000007fbf9429d0 in QQuickWindowPrivate::renderSceneGraph(QSize const&) (
    ) from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fbf916c04 in ?? () from /usr/lib64/libQt5Quick.so.5
#8  0x0000007fbf91cc10 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fbea290e8 in ?? () from /usr/lib64/libQt5Core.so.5
#10 0x0000007fb74b0a4c in start_thread (arg=0x7fffffe9bf) at pthread_create.c:
    479
#11 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
This backtrace is, of course, from the ESR 78 build. There is no equivalent backtrace for the ESR 91 build because the QMozExtTexture::updateTexture() method never gets called in ESR 91. But maybe some of the other methods in the backtrace are being called. Let's find out.

To establish this I'm putting breakpoints on each of the functions in the backtrace and then running the ESR 91 build. If one of them hits I move to one higher up the stack (that is, the next line up that has a smaller number). Eventually one will fail to hit and at that point I'll know where the failure is happening. So here goes.
Thread 68 &quot;QSGRenderThread&quot; hit Breakpoint 4, 0x0000007ff7b6e984 in 
    QQuickWindowPrivate::renderSceneGraph(QSize const&)@plt ()
   from /usr/lib64/libQt5Quick.so.5
(gdb) delete break 4
(gdb) b QSGRenderContext::renderNextFrame
Breakpoint 5 at 0x7fe82e4474 (2 locations)
(gdb) c
Continuing.

Thread 68 &quot;QSGRenderThread&quot; hit Breakpoint 5, 0x0000007fe82e4474 in 
    QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int)@plt ()
   from /usr/lib64/qt5/plugins/scenegraph/libcustomcontext.so
(gdb) delete break 5
(gdb) b QSGRenderer::preprocess
Breakpoint 6 at 0x7ff7be7a50
(gdb) c
Continuing.

Thread 68 &quot;QSGRenderThread&quot; hit Breakpoint 6, 0x0000007ff7be7a50 in 
    QSGRenderer::preprocess() () from /usr/lib64/libQt5Quick.so.5
(gdb) delete break 6
(gdb) b MozMaterialNode::preprocess
Breakpoint 7 at 0x7ff7ef899c
(gdb) c
Continuing.
frame008.data: Colour before: (0, 0, 0, 255), 1
frame009.data: Colour before: (0, 0, 0, 255), 1
frame010.data: Colour before: (0, 0, 0, 255), 1
frame011.data: Colour before: (0, 0, 0, 255), 1
[...]
As we can see from the above, the QQuickWindowPrivate::renderSceneGraph() method does get called. So does the QSGRenderContext::renderNextFrame() method. but the QSGRenderer::preprocess() method? That never gets called.

That means the failure is happening inside the QSGRenderContext::renderNextFrame() method. This call isn't part of the Gecko code, or indeed part of the QtMozEmbed code, but rather part of the Qt stack. In particular, part of the "Qt Scene Graph" stack (hence the "QSG" prefix of the method).

I'm not super familiar with how the Qt Scene Graph does it's thing, but there is plenty of good documentation out there. So I've done some reading around to get a better idea.

One part sprang out at me, since it specifically relates to the method that isn't getting called. The explanation that follows is taken from the Qt documentation:
 
void QSGNode::preprocess()

Override this function to do processing on the node before it is rendered.

Preprocessing needs to be explicitly enabled by setting the flag QSGNode::UsePreprocess. The flag needs to be set before the node is added to the scene graph and will cause the preprocess() function to be called for every frame the node is rendered.

So maybe the problem is that the QSGNode::UsePreprocess flag is never being set? A quick scan of the QtMozEmbed code throws up the following bit of code for controlling the flag, in the qmozextmaterialnode.cpp file:
MozMaterialNode::MozMaterialNode()
{
    setFlag(UsePreprocess);

    setGeometry(&m_geometry);
}
So, we should try to establish whether this is getting called or not. Running with some breakpoints shows that it is indeed getting called on the ESR 78 build.
(gdb) break MozMaterialNode::MozMaterialNode
Breakpoint 2 at 0x7fbfbdadc4 (2 locations)
(gdb) r
[...]
Thread 11 &quot;QSGRenderThread&quot; hit Breakpoint 2, MozMaterialNode::
    MozMaterialNode (this=this@entry=0x7f0001ef60) at qmozextmaterialnode.cpp:27
27      MozMaterialNode::MozMaterialNode()
(gdb) n
619     /usr/include/qt5/QtCore/qrect.h: No such file or directory.
(gdb) 
27      MozMaterialNode::MozMaterialNode()
(gdb) 
31          setGeometry(&m_geometry);
(gdb) 
That's no big surprise, since if it were not being called the other parts of the render pipeline would also be failing and they're not. How about on ESR 91 though? We know the render pipeline isn't working there and that could be because this call isn't happening. Let's find out by adding the same breakpoint on the ESR 91 build and executing it just as we did for ESR 78:
(gdb) break MozMaterialNode::MozMaterialNode
Breakpoint 8 at 0x7ff7ece9e4 (2 locations)
(gdb) r
[...]
frame014.data: Colour before: (8, 0, 152, 255), 1
frame015.data: Colour before: (1, 0, 255, 255), 1
frame016.data: Colour before: (223, 5, 0, 255), 1
frame017.data: Colour before: (71, 94, 162, 255), 1
^C
Thread 1 &quot;harbour-webview&quot; received signal SIGINT, Interrupt.
0x0000007ff69f8740 in poll () from /lib64/libc.so.6
(gdb) info break
Num     Type           Disp Enb Address            What
8       breakpoint     keep y   <MULTIPLE>         
8.1                         y     0x0000007ff7ece9e4 <MozMaterialNode::
    MozMaterialNode()@plt+4>
8.2                         y     0x0000007ff7ef9028 <MozMaterialNode::
    MozMaterialNode()+16>
(gdb) 
Here the breakpoint doesn't get called. You can see that I double-checked that the breakpoint is set correctly and it is. So this is looking very much like a smoking gun.

We'll use the same approach we used before to try to figure out why this isn't getting called. We take the backtrace of the ESR 78 build where it is getting called and work our way up through it until we find out what's failing on the ESR 91 side. Here's the backtrace, taken from ESR 78:
#0  MozMaterialNode::MozMaterialNode (this=this@entry=0x7f0001ef60) at 
    qmozextmaterialnode.cpp:31
#1  0x0000007fbfc05834 in MozExtMaterialNode::MozExtMaterialNode (
    this=0x7f0001ef60) at qmozextmaterialnode.cpp:376
#2  0x0000007fbfc03ea4 in QuickMozView::updatePaintNode (this=0x55558433a0, 
    oldNode=<optimized out>)
    at /usr/include/xulrunner-qt5-78.15.1/mozilla/cxxalloc.h:33
#3  0x0000007fbf941c50 in QQuickWindowPrivate::updateDirtyNode(QQuickItem*) () 
    from /usr/lib64/libQt5Quick.so.5
#4  0x0000007fbf94214c in QQuickWindowPrivate::updateDirtyNodes() () from /usr/
    lib64/libQt5Quick.so.5
#5  0x0000007fbf943270 in QQuickWindowPrivate::syncSceneGraph() () from /usr/
    lib64/libQt5Quick.so.5
#6  0x0000007fbf9164a8 in ?? () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fbf917134 in ?? () from /usr/lib64/libQt5Quick.so.5
#8  0x0000007fbf91cc10 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fbea290e8 in ?? () from /usr/lib64/libQt5Core.so.5
#10 0x0000007fb74b0a4c in start_thread (arg=0x7fffffe9bf) at pthread_create.c:
    479
#11 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
It's important to note that the call at the top to MozMaterialNode::MozMaterialNode() — which is the one we're interested in — isn't a normal call, it's a constructor. It's not being called directly, but rather as a side-effect of a call to the MozExtMaterialNode class constructor. This inherits from the MozMaterialNode class as we can see in this code copied from qozextmaterialnode.cpp:
class MozExtMaterialNode : public MozMaterialNode
{
public:
    MozExtMaterialNode();
    ~MozExtMaterialNode();
[...]
As an aside, there's also a MozRgbMaterialNode class, but it looks like we're not using that:
class MozRgbMaterialNode : public MozMaterialNode
{
public:
    MozRgbMaterialNode();
    ~MozRgbMaterialNode();
[...]
A quick check confirms that it's definitely only the MozExtMaterialNode class in use for us here:
(gdb) break MozRgbMaterialNode
Breakpoint 3 at 0x7fbfc056a0: file qmozextmaterialnode.cpp, line 176.
(gdb) b MozExtMaterialNode
Breakpoint 4 at 0x7fbfbdc174 (2 locations)
(gdb) r
[...]
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 4, 0x0000007fbfbdc174 in 
    MozExtMaterialNode::MozExtMaterialNode()@plt () from /usr/lib64/
    libqt5embedwidget.so.1
(gdb)                            
Another check of the backtrace and an examination of the code shows that the place where this is constructed is inside the QuickMozView::updatePaintNode() method; part of QtMozEmbed. We're going to be spending a bit of time inside this code, so it's worth taking a look for yourself at the source. But for reference, here's the full source for the method we're dealing with:
QSGNode *
QuickMozView::updatePaintNode(QSGNode *oldNode, UpdatePaintNodeData *)
{
    // If the dimensions are entirely invalid return no node.
    if (width() <= 0 || height() <= 0) {
        delete oldNode;

        delete mTexture;
        mTexture = nullptr;

        return nullptr;
    }

    const bool invalidTexture = !mComposited
            || !d->mIsPainted
            || !d->mViewInitialized
            || !d->mHasCompositor
            || !d->mContext->registeredWindow()
            || !d->mMozWindow;

    if (mTexture && invalidTexture) {
        delete oldNode;
        oldNode = nullptr;

        delete mTexture;
        mTexture = nullptr;
    }

    QRectF boundingRect(d->renderingOffset(), d->mSize);

    if (!mTexture && invalidTexture) {
        QSGSimpleRectNode *node = static_cast<QSGSimpleRectNode *>(oldNode);
        if (!node) {
            node = new QSGSimpleRectNode;
        }
        node->setColor(d->mBackgroundColor);
        node->setRect(boundingRect);

        return node;
    }

    if (!mTexture) {
        delete oldNode;
        oldNode = nullptr;
    }

    MozMaterialNode *node = static_cast<MozMaterialNode *>(oldNode);

    if (!node) {
#if defined(QT_OPENGL_ES_2)
        QMozExtTexture * const texture = new QMozExtTexture;
        mTexture = texture;

        connect(texture, &QMozExtTexture::getPlatformImage, d->mMozWindow, 
    &QMozWindow::getPlatformImage, Qt::DirectConnection);

        node = new MozExtMaterialNode;
#else
#warning &quot;Implement me for non ES2 platform&quot;
//        node = new MozRgbMaterialNode;
        return nullptr;
#endif
Note the call there to new MozExtMaterialNode. That's the line we're really interested in. On ESR 78 when we step through the code we initially get an execution flow that enters the (!mTexture && invalidTexture) clause. Once we're in there we know the method is going to return early and the MozExtMaterialNode constructor isn't going to get called until we re-enter the method.

After a few cycles through like this, there's suddenly a change. While the mTexture value remains unset, the invalidTexture value flips to false which means the clause gets skipped. We then end up inside the (!node) section, which is where things get interesting. At that point the texture is created, as is the MozExtMaterialNode object and now things are properly set up for rendering.

Let's break this down a bit further. Here's what the initial few cycles look like:
(gdb) break QuickMozView::updatePaintNode
Breakpoint 8 at 0x7faf06f764 (2 locations)
(gdb) r
[...]
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 8, QuickMozView::
    updatePaintNode (this=0x55558454e0, oldNode=0x0) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) break quickmozview.cpp:214
Breakpoint 9 at 0x7fbfc03fac: file quickmozview.cpp, line 214.
(gdb) c
Continuing.
[...]
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 9, QuickMozView::
    updatePaintNode (this=0x55558454e0, oldNode=0x7f0801eab0) at 
    quickmozview.cpp:216
216         if (!node) {
(gdb) 
Stepping through this section more carefully we can see why the invalidTexture flag is set to true in all these cases:
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 12, QuickMozView::
    updatePaintNode (this=0x5555842660, oldNode=0x7f0800c400) at 
    quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) p mComposited
$23 = true
(gdb) p d->mIsPainted
$24 = false
(gdb) p d->mViewInitialized
$25 = true
(gdb) p d->mHasCompositor
$26 = true
(gdb) p d->mContext->registeredWindow()
$27 = (QMozWindow *) 0x555560ce60
(gdb) p d->mMozWindow
$28 = {wp = {d = 0x5555bbe8e0, value = 0x555560ce60}}
(gdb) 
Crucially here, we can see that p d->mIsPainted is set to false. While any of the items shown here are set to false the invalidTexture flag will be set to true. But then after three or four cycles like this the d->mIsPainted flag suddenly flips to true at which point everything else flows through as we want.
Thread 10 &quot;QSGRenderThread&quot; hit Breakpoint 12, QuickMozView::
    updatePaintNode (this=0x5555842660, oldNode=0x7f0800c400) at 
    quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) p d->mIsPainted
$37 = true
(gdb) n
181         const bool invalidTexture = !mComposited
(gdb) 
186                 || !d->mMozWindow;
(gdb) 
196         QRectF boundingRect(d->renderingOffset(), d->mSize);
(gdb) 
198         if (!mTexture && invalidTexture) {
(gdb) 
210             delete oldNode;
(gdb) 
218             QMozExtTexture * const texture = new QMozExtTexture;
(gdb) 
219             mTexture = texture;
(gdb) 
221             connect(texture, &QMozExtTexture::getPlatformImage, 
    d->mMozWindow, &QMozWindow::getPlatformImage, Qt::DirectConnection);
(gdb) p mTexture
$38 = (QSGTexture *) 0x7f08024070
(gdb) n
223             node = new MozExtMaterialNode;
(gdb) 
230             node->setTexture(mTexture);
(gdb) 
Nice! But this never happens on ESR 91. In the case of ESR 91 the d->mIsPainted flag remains set to false throughout. Consequently we never get to the point where the texture or MozExtMaterialNode object are created on ESR 91:
(gdb) break QuickMozView::updatePaintNode
Breakpoint 9 at 0x7fe790b764 (2 locations)
(gdb) r
[...]
Thread 9 &quot;QSGRenderThread&quot; hit Breakpoint 1, QuickMozView::
    updatePaintNode (this=0x555586cf70, oldNode=0x0) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {

(gdb) break quickmozview.cpp:223
Breakpoint 11 at 0x7fbfc03ea4: file quickmozview.cpp, line 230.
(gdb) r
[...]
We end up just getting this same sequence again and again and again:
Thread 9 &quot;QSGRenderThread&quot; hit Breakpoint 7, QuickMozView::
    updatePaintNode (this=0x555586d0d0, oldNode=0x7fc800c640) at 
    quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) n
181         const bool invalidTexture = !mComposited
(gdb) p mComposited
$7 = true
(gdb) p d->mIsPainted
$8 = false
(gdb) n
188         if (mTexture && invalidTexture) {
(gdb) p mTexture
$9 = (QSGTexture *) 0x0
(gdb) n
196         QRectF boundingRect(d->renderingOffset(), d->mSize);
(gdb) 
198         if (!mTexture && invalidTexture) {
(gdb) 
200             if (!node) {
(gdb) 
203             node->setColor(d->mBackgroundColor);
(gdb) 
204             node->setRect(boundingRect);
(gdb) 
206             return node;
(gdb) c
Clearly the question we have to answer is "why is d->mIsPainted never set to true?". If we can answer this, who knows, we might even be on the right path to solving the rendering problem.

This has been a rather long investigation today already, so I'm going to leave it there for now. But tomorrow I'm going to come back to this in order to try to answer this question. This feels like it could be a very fruitful line of enquiry!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
19 May 2024 : Day 237 #
I was easing myself gently back into Gecko development yesterday with a quick look over some of the work that others in the Sailfish community have been doing while I've been having a break, related to deciphering the messed up textures that are coming out of the render surface.

Today I'm back looking at the code in earnest. And it's been a fruitful process. My starting point has been to take a look not at the Gecko code, but from the other end: starting with QtMozEmbed and sailfish-components-webview. Taking things easy, I'm just browsing through the code, trying to find where the Surface, as used by Gecko, connects with the texture rendered to the screen by the WebView. If I can figure that out I'll feel like I'm on solid ground.

It looks like a key piece of the puzzle happens in QMozExtTexture::updateTexture(). For example, on ESR 78, when rendering is taking place, the method gets hit often. So often that it appears to be every frame:
Thread 9 &quot;QSGRenderThread&quot; hit Breakpoint 1, QMozExtTexture::
    updateTexture (this=0x7f90024e30) at qmozexttexture.cpp:64
64      {
(gdb) bt
#0  QMozExtTexture::updateTexture (this=0x7f90024e30) at qmozexttexture.cpp:64
#1  0x0000007fbfc04f18 in MozMaterialNode::preprocess (this=0x7f9001ef60) at 
    qmozextmaterialnode.cpp:81
#2  0x0000007fbf8f0c44 in QSGRenderer::preprocess() () from /usr/lib64/
    libQt5Quick.so.5
#3  0x0000007fbf8f0e5c in QSGRenderer::renderScene(QSGBindable const&) () from /
    usr/lib64/libQt5Quick.so.5
#4  0x0000007fbf8f14c0 in QSGRenderer::renderScene(unsigned int) () from /usr/
    lib64/libQt5Quick.so.5
#5  0x0000007fbf8ffe58 in QSGRenderContext::renderNextFrame(QSGRenderer*, 
    unsigned int) () from /usr/lib64/libQt5Quick.so.5
#6  0x0000007fbf9429d0 in QQuickWindowPrivate::renderSceneGraph(QSize const&) (
    ) from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fbf916c04 in ?? () from /usr/lib64/libQt5Quick.so.5
#8  0x0000007fbf91cc10 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fbea290e8 in ?? () from /usr/lib64/libQt5Core.so.5
#10 0x0000007fb74b0a4c in start_thread (arg=0x7fffffe9bf) at pthread_create.c:
    479
#11 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
This contrasts with the ESR 91 build, where the same breakpoint is never hit. I placed a breakpoint on all of the updateTexture() methods just to be sure:
Num Disp Enb Address       What
3   keep y   <MULTIPLE>         
3.1      y   0x07ff7b64ba4 <QSGDistanceFieldGlyphCache::updateTexture()@plt+4>
3.2      y   0x07ff7b64d64 <QSGDefaultPainterNode::updateTexture()@plt+4>
3.3      y   0x07ff7bf071c <QSGDefaultPainterNode::updateTexture()+20>
3.4      y   0x07ff7bf5230 <QSGDistanceFieldGlyphCache::updateTexture()+24>
3.5      y   0x07ff7c1babc <QSGDefaultLayer::updateTexture()+12>
3.6      y   0x07ff7ef846c <QMozExtTexture::updateTexture()+20>
The fact there are hits for ESR 78, but not for ESR 91, is definitely of interest.

This isn't something that's come up before. Until now I've focused almost exclusively on getting the texture rendered to at the other end. But it looks now like the QtMozEmbed code is failing somewhere; the render update isn't being called. Just to be clear though, this isn't due to any change that I've made in QtMozEmbed. At least that seems unlikely at any rate, given I've not made an notable changes to these parts of the code.

More likely, some part of the initialisation process in the Gecko code is failing, or not happening as it should. Something like that could well lead to a situation where the render update doesn't get called. For example, it could be that the texture is never set as part of the Qt Scene Graph. But that's speculation, it could be all sorts of things.

I'm rather excited by this. Maybe this is the key problem we've been searching for? But I don't want to go too far today while I'm still getting myself up to speed with things. So I'll continue looking in to this further tomorrow.

Also worth mentioning is that tomorrow is Jolla Love Day 2. I'm excited to find out what Jolla have in store for us and while I won't be able to attend in-person, I'll definitely be there both online and in spirit!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
18 May 2024 : Day 236 #
It's been two weeks since I last published an entry in my Gecko dev diary series. The gap was necessary for me to fit various other things into my life which just weren't going to be compatible with me working on the Gecko code, specifically my attendance of the — HPC/AI Days — conference in Durham for a week, followed by preparations for a couple of Pint of Science events I was involved with during this last week. This was my first involvement with Pint of Science and I have to admit it was a lot of fun.

The break from Gecko offered me some healthy respite, but I promised to start back up again today and here we are. I've not yet got back in to the rhythm of Gecko development yet, so my first few entries may be somewhat relaxed. I'll be taking it gently to begin with.

One important reason for this is the problem I'm trying to solve. As I left things a fortnight ago I'm still trying to figure out why the offscreen rendering pipeline is broken. Since starting on trying to fix this problem I've come a long way, but I'm still not there. More importantly, by this point it's become a problem without a clear solution. That means I no longer have a clear path forwards.

The daily rhythm of these posts is important for keeping me focused on the task, but it can also be a distraction. It means chopping the work into day-sized chunks. That can be a hindrance when what the task really needs is a deep analysis, during which there may not be so much to write about. This work is easiest to write about when I'm making code changes. When I'm not making changes to the code, things can be a bit too intangible.

That's where I am today.

Before I get in to the actual work I want to first talk about confusing textures and the amazing work of the Sailfish community in their attempts to disentangle them. If you've been following these diary entries for some time you'll know that before the gap I spent a fair amount of time trying to extract texture image data from the render surface used by the offline renderer.

The result of all this work ended up being a collection of raw dumps of pixel data which, after attempting to convert them to something visible, looked like this.
 
The corrupted, swizzled image data from before the gap. After two weeks setting around unchanged it still just looks like fuzzy pixels to me.

I tried all sorts to get the images into better shape but without any luck. But the courageous Sailfish community took up the baton and continued the work while I was taking a break from it.

There were great contributions from across the community, but I want to especially highlight the efforts and ideas of Tone (tortoisedoc), Adam Pigg (piggz), Ville Nummela (vige, my ex-colleague from Jolla), Mark Washeim (poetaster), thigg, attah, remote and kan_ibal, all of whom provided genuinely useful input.
 
Three screenshots: a texture with some ghostly images shown on it, a zoomed in texture with a few bright pixels and a window showing a similar texture with some metadata in a desktop window

Here you can see some of the efforts to decipher the code. On the left is the result of poetaster having processed the image using ImageMagick. As poetaster explains:
 
I thought it might be an offset issue, but after tooling about in gimp, I'm not so sure. What 'should' they look like. The RGB data does seem to be RBG, the RGBA, I'm not sure. You said unsigned?... bits of sailfish logo can also be recognized. It does look unsigned judging from the results of the convert operation.

In the centre is a close up rendered by piggz using PixelViewer.
 
There is definitely almost an image there! For this I used PixelViewer, which allows to quickly try many different possible formats, this one is RGBA_8888... This is making a great puzzle... just need to figure out how to put the bits together.

When Adam talks about bits I'm not sure whether he's talking about computer bits or puzzle pieces, but either way I agree it should just be a matter of putting them together int the right way.

Finally on the right we can see the image also rendered using PixelViewer this time by kan_ibal in combination with a Python script.
 
There are blocks 8x8 pixels. It reminds me a jpeg compression and looks like a stage of quantization and DCT.

Again, this all sounds very plausible to me, alghouth the image still isn't popping out just yet. Thank you to everyone for your input on this. Unfortunately while none of these resulted in a definitive conclusion, it's all very helpful input. If anyone else would like to give this a go, feel free to take a look at the textures and pass them through your favourite raw image processing pipeline. I'd love to get to the bottom of this.

I'm going to leave it there for today. But tomorrow I'll be starting to look at the Gecko code again in earnest.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
18 May 2024 : Gecko dev diary #
I'm back to writing my daily Gecko dev diary again. Staring with a recap of things that have been happening over the last two weeks while I was taking a break from it. There will be updates daily.
6 May 2024 : Gecko dev diary pause #
As I explained in my latest post, I'm pausing my Gecko dev diary for two weeks while I attend Durham HPC/AI Days 2024 and sort a few other things out in my life afterwards. I'll be back up and writing daily again from Saturday 18th May.
3 May 2024 : Day 235 #
As discussed at the start of the week, this is going to be my last developer diary post for a little bit. But I want to make absolutely clear that this is a temporary pause. I'm heading to the HPC Days Conference in Durham next week where I'll be giving a talk on Matching AI Research to HPC Resource. I'm expecting it to be a packed schedule and, as is often the case with this kind of event, I'm not expecting to be able to fit in my usual gecko work alongside this. So there will be a pause, but I'll be back on the 18th May to restart from where I'm leaving things today.

During the pause I'm hoping write up some of the blog posts that I've put on hold while working on the gecko engine, so things may not be completely silent around here. But the key message I want to get across is that I'm not abandoning gecko. Not at all. I'll be back to it in no time.

Nevertheless, as I write this I'm still frustrated and stumped by my lack of progress with the offscreen rendering. So I'm also hoping that a break will give me the chance to come up with some alternative approaches. Over the last couple of days I attempted to capture the contents of the surface used to transfer the offscreen texture onscreen, but the results were inconclusive at best.

On the forums I received helpful advice from Tone (tortoisedoc). Tone highlighted the various areas where texture decoding can go awry:
 
two things can go wrong (assuming the data is correct in the texture):
  • alignment of data (start, stride)
  • format of pixel (ARGB / RGBA etc)
Which one is the texture you are look at configured with?

Is the image the same the browser would show? Which image are you displaying in the embedded view?


These are all great points and great questions. Although I know what sort of image I'm asking for (either RGB or RGBA in UNSIGNED_BYTE format) the results don't seem to be matching that. It's made more complex by the fact that the outline of the image is there (which suggests things like start and stride are correct) but still many of the pixels are just blank. It's somewhat baffling and I think the reason might be more to do with an image that doesn't exist rather than an image which is being generated in the wrong format.

But I could be wrong and I'll continue to consider these possibilities and try to figure out the underlying reason. I really appreciate the input and as always it gives me more go to on.

But today I've shifted focus just briefly by giving the code a last review before the pause: sifting through the code again to try to spot differences.

There were many places I thought there might be differences, such as the fact that the it wasn't clear to me that the GLContext->mDesc.isOffscreen flag was being set. But stepping through the application in the debugger showed it was set after all. So no fix to be had there.

The only difference I can see — and it's too small to make a difference I'm sure — is that the Surface capabilities are set at slightly different places. It's possible that in the gap between where it's set in ESR 78 and the place it's set slightly later in ESR 91, something makes use of it and a difference results.

I'm not convinced to be honest, but to avoid even the slightest doubt I've updated the code so that the values are now set explicitly:
GLContext::GLContext(const GLContextDesc& desc, GLContext* sharedContext,
                     bool useTLSIsCurrent)
    : mDesc(desc),
      mUseTLSIsCurrent(ShouldUseTLSIsCurrent(useTLSIsCurrent)),
      mDebugFlags(ChooseDebugFlags(mDesc.flags)),
      mSharedContext(sharedContext),
      mWorkAroundDriverBugs(
          StaticPrefs::gfx_work_around_driver_bugs_AtStartup()) {
  mCaps.any = true;
  mCaps.color = true;
[...]
}

[...]

bool GLContext::InitImpl() {
[...]
  // TODO: Remove SurfaceCaps::any.
  if (mCaps.any) {
    mCaps.any = false;
    mCaps.color = true;
    mCaps.alpha = false;
  }
[...]
The updated code is built and right now transferring over to my development phone. As I say though, this doesn't look especially promising to me. I'm not holding my breath for a successful render.

And I was right: no improvements after this change. Argh. How frustrating.

I'll continue looking through the fine details of the code. There has to be a difference in here that I'm missing. And while it doesn't make for stimulating diary entries, just sitting and sifting through the code seems like an essential task nonetheless.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.

If you've got this far, then not only do you have my respect and thanks, but I'd also urge you to return on the 18th May when I'll be picking things up from where I've left them today.
Comment
2 May 2024 : Day 234 #
Yesterday I added code to the build to try to capture the image from the surface texture that's supposed to be used for rendering to the screen. I got poor results with it though: the colours are wrong and the images look corrupted somehow.

So today I've been playing around with the data to try to figure out why. One possibility that sprang to mind is that potentially the data needs swizzling. Swizzling is the act of rearranging pixels, components or bits within the components. One possibility is that the data is being read little-endian (or big-endian) but GIMP is interpreting it as big-endian (or little-endian). So reversing the endianness might help.

In order to check this I've written a simple Python script that allows me to rearrange bits and bytes within the image in a simplified way. I just have to run the image through the script:
#!/bin/python3
import sys

def swizzle(byte):
    result = 0
    for pos in range(8):
        result |= ((byte & (1 << pos)) >> pos) << (7 - pos)
    return result

def convert(filein, fileout):
    with open(filein, &quot;rb&quot;) as fin:
        data = fin.read(-1)
    with open(fileout, &quot;wb&quot;) as fout:
        for pos in range(0, len(data), 4):
            r1, g1, b1, a1 = data[pos:pos + 4]

            # Rearrange the bits and bytes
            r2 = swizzle(r1)
            g2 = swizzle(g1)
            b2 = swizzle(b1)

            fout.write(r2.to_bytes(1, 'little'))
            fout.write(g2.to_bytes(1, 'little'))
            fout.write(b2.to_bytes(1, 'little'))

if len(sys.argv) != 3:
    print(&quot;Please provide input and output filenames&quot;)
else:
    filein = sys.argv[1]
    fileout = sys.argv[2]
    print(f&quot;Converting from: {filein}&quot;)
    print(f&quot;Converting to: {fileout}&quot;)
    convert(filein, fileout)
    print(f&quot;Done&quot;)
After passing the images through this script and rearranging the data in as many ways as I can think of, I'm still left with a very messy result. The two new versions I created are one with just the alpha channel removed; and a second which also reverses the bits in each byte. Here are the three variants I'm left with:
 
  1. frame060-00.data: Original file (RGBA).
  2. frame060-01.data: Alpha removed (RGB).
  3. frame060-02.data: Alpha removed and swizzled (RGB).


The version without the alpha channel was created using a change to the gecko code by setting the texture format to RGB. I did this because I was concerned all of the zeroed-out values (which appear as black pixels in the image) might have been somehow related to this. But as you can see, the result is identical to generating an RGBA texture and then using the Python script to remove the alpha channel.

It's hard to see the difference between the first two versions and the third, which has the bit direction reversed. The colours are actually different, but because black remains black even after swizzling, the overall darkness of the image remains.
 
The swizzled image data still looks corrupted.

As you can see we get very similar results irrespective of these changes. When we look at the data using a hex editor we can see why. The majority of the entries are zero, which is why we're getting such a preponderance of black in the images:
$ hexdump -vC -s 0x1e0 -n 0x090 frame060-00.data | less
000001e0  ff 03 04 ff 0f 00 00 ff  00 01 00 ff 00 00 f0 ff  |................|
000001f0  00 00 00 ff 00 00 00 ff  00 00 00 ff 00 00 00 ff  |................|
00000200  ff 03 04 ff 0f 00 00 ff  80 00 00 ff 00 00 d0 ff  |................|
00000210  00 00 00 ff 00 00 00 ff  00 00 00 ff 00 00 00 ff  |................|
00000220  00 00 00 ff 00 00 00 ff  00 00 00 ff 00 00 00 ff  |................|
00000230  00 00 00 ff 00 00 00 ff  00 00 00 ff 00 00 00 ff  |................|
00000240  ff 03 04 ff 0f 00 00 ff  00 01 00 ff 00 00 e0 ff  |................|
00000250  00 00 00 ff 00 00 00 ff  00 00 00 ff 00 00 00 ff  |................|
00000260  00 00 00 ff 00 00 00 ff  00 00 00 ff 00 00 00 ff  |................|
00000270
$ hexdump -vC -s 0x1e0 -n 0x090 frame060-01.data | less
000001e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001f0  00 00 00 00 00 00 00 00  ff 03 04 0f 00 00 00 01  |................|
00000200  00 00 00 d0 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000210  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000220  00 00 00 00 00 00 00 00  ff 03 04 0f 00 00 80 00  |................|
00000230  00 00 00 d0 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000240  ff 03 04 0f 00 00 00 01  00 00 00 e0 00 00 00 00  |................|
00000250  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000260  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000270
$ hexdump -vC -s 0x1e0 -n 0x090 frame060-02.data | less
000001e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001f0  00 00 00 00 00 00 00 00  ff c0 20 f0 00 00 00 80  |.......... .....|
00000200  00 00 00 0b 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000210  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000220  00 00 00 00 00 00 00 00  ff c0 20 f0 00 00 01 00  |.......... .....|
00000230  00 00 00 0b 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000240  ff c0 20 f0 00 00 00 80  00 00 00 07 00 00 00 00  |.. .............|
00000250  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000260  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000270
Inverting the colours doesn't help: we just end up with a predominance of white. It looks more like only certain points in the texture are being exported.

Swizzling then doesn't appear to be the answer. Another possibility, I thought, might be that I'm using the wrong parameters for the call to raw_fReadPixels() when actually requesting the data. Maybe these parameters have to match the underlying texture?

To test this out I've tried to determine the values that are used when the surface textures are created. I used the debugger for this. But I thought I'd also take the opportunity to check that we have the same values for ESR 78 and ESR 91. So first off, this is what I get when I check ESR 78:
Thread 37 &quot;Compositor&quot; hit Breakpoint 3, mozilla::gl::
    SharedSurface_Basic::Create (gl=0x7eac109140, formats=..., size=..., 
    hasAlpha=false)
    at gfx/gl/SharedSurfaceGL.cpp:24
24          bool hasAlpha) {
(gdb) n
25        UniquePtr<SharedSurface_Basic> ret;
(gdb) p formats
$1 = (const mozilla::gl::GLFormats &) @0x7eac00595c: {color_texInternalFormat = 
    6407, color_texFormat = 6407, color_texType = 5121, 
  color_rbFormat = 32849, depthStencil = 35056, depth = 33190, stencil = 36168}
(gdb) p size
$2 = (const mozilla::gfx::IntSize &) @0x7eac003564: {<mozilla::gfx::
    BaseSize<int, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> >> = 
    {{{
        width = 1080, height = 2520}, components = {1080, 2520}}}, <mozilla::
    gfx::UnknownUnits> = {<No data fields>}, <No data fields>}
(gdb) p hasAlpha
$3 = false
(gdb) p gl
$4 = (mozilla::gl::GLContext *) 0x7eac109140
(gdb) 
And here's the check for ESR 91. The details all looks identical to me:
Thread 37 &quot;Compositor&quot; hit Breakpoint 3, mozilla::gl::
    SharedSurface_Basic::Create (gl=0x7ee019aa50, formats=..., size=..., 
    hasAlpha=false)
    at gfx/gl/SharedSurfaceGL.cpp:59
59          bool hasAlpha) {
(gdb) n
60        UniquePtr<SharedSurface_Basic> ret;
(gdb) p formats
$4 = (const mozilla::gl::GLFormats &) @0x7ee00044e4: {color_texInternalFormat = 
    6407, color_texFormat = 6407, color_texType = 5121, 
  color_rbFormat = 32849, depthStencil = 35056, depth = 33190, stencil = 36168}
(gdb) p size
$5 = (const mozilla::gfx::IntSize &) @0x7f1f92effc: {<mozilla::gfx::
    BaseSize<int, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> >> = 
    {{{
        width = 1080, height = 2520}, components = {1080, 2520}}}, <mozilla::
    gfx::UnknownUnits> = {<No data fields>}, <No data fields>}
(gdb) p hasAlpha
$6 = false
(gdb) p gl
$7 = (mozilla::gl::GLContext *) 0x7ee019aa50
(gdb) 
The actual format is specified in GLContext::ChooseGLFormats(). Here are the values taken from the debugger:
{
    color_texInternalFormat = 6407,
    color_texFormat = 6407,
    color_texType = 5121, 
    color_rbFormat = 32849,
    depthStencil = 35056,
    depth = 33190,
    stencil = 36168
}
Checking these against the appropriate enums, these values are equivalent to the following:
{
    color_texInternalFormat = LOCAL_GL_RGB,
    color_texFormat = LOCAL_GL_RGB,
    color_texType = LOCAL_GL_UNSIGNED_BYTE, 
    color_rbFormat = LOCAL_GL_RGB8,
    depthStencil = LOCAL_GL_DEPTH24_STENCIL8,
    depth = LOCAL_GL_DEPTH_COMPONENT24,
    stencil = LOCAL_GL_STENCIL_INDEX8
}
I've used these values for the parameters to ReadPixles(), including removing the alpha channel. But sadly the results are practically identical. Here's the new output generated during the capture for reference:
frame000.data: Colour before: (0, 0, 0), 1
frame001.data: Colour before: (0, 0, 0), 1
frame002.data: Colour before: (0, 0, 0), 1
frame003.data: Colour before: (0, 0, 0), 1
frame004.data: Colour before: (208, 175, 205), 1
frame005.data: Colour before: (39, 0, 0), 1
frame006.data: Colour before: (67, 115, 196), 1
frame007.data: Colour before: (0, 0, 0), 1
frame008.data: Colour before: (0, 0, 0), 1
frame009.data: Colour before: (157, 149, 42), 1
frame010.data: Colour before: (0, 0, 0), 1
frame011.data: Colour before: (0, 32, 12), 1
frame012.data: Colour before: (71, 118, 198), 1
frame013.data: Colour before: (0, 0, 0), 1
frame014.data: Colour before: (0, 0, 0), 1
This is all interesting stuff, but it doesn't seem to get us any closer to what we were hoping for, which is supposed to be a copy of the image that ought to be shown on the screen. It's all been a bit of an unproductive diversion. I'll have to try to make more progress on that tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
1 May 2024 : Day 233 #
Before I get in to my diary entry today I want to add a reminder that I'll not be posting diaries next week, or the week after. Next week I'll be attending a conference and I need a bit of time to sort out a few other things in my wider life. So this will give me the chance to do that. But this is only a temporary gap; I'll be back straight after to continue where I left off.

With that out of the way, let's get on with the entry for today. If you've read any of my diary entries over the last few days you'll know I've been trying to extract something useful from the GLScreenBuffer::Swap() method. I added some code in to the method to read off pixel data from the render surface which generated some plausible looking output.

But I wasn't totally convinced: it looked to me like there were too many zero entries that I couldn't explain

So today I've been trying to do a couple of things. First I tried to get something to compare against; second I tried to make the code that captures the colour at a point a little more general by also saving out the entire image buffer to disk.

Let's tackle these in order. The obvious way to get something to compare against is to add the same code to the ESR 78 library and try running that. And this is exactly what I've been doing. The surrounding code in ESR 91 is almost identical to that in ESR 78, so it's a pretty straightforward task to just copy over the code changes from ESR 91 to ESR 78.

Having done this, built the library and deployed it to my phone, here's what's output when I now execute the browser:
=============== Preparing offscreen rendering context ===============
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (0, 0, 0, 255), 1
[...]
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (45, 91, 86, 255), 1
Colour after: (45, 91, 86, 255), 1
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (128, 13, 160, 255), 1
Colour after: (128, 13, 160, 255), 1
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
Colour before: (0, 0, 0, 255), 1
Colour after: (0, 0, 0, 255), 1
[...]
There are a couple of lines of interesting output here, but most of them just show black pixels and nothing else. That's not ideal. To be clear, this is output from a WebView app that is fully working. So I'd have expected to see a lot more colour data coming out in the debug output.

This has left me more confused than enlightened, so I've gone on to implement the second idea as well: exporting the image data to file so I can check what the image actually looks like.

Once again the code for this is pretty straightforward. All I'm doing is taking a copy of the entire surface, rather than just one pixel of it. This is provided as a contiguous block of memory: a raw buffer with the pixel values stored in it. So I'm just dumping this out to a file. To avoid completely thrashing my phone I've set it up to output an image only once in every ten renders. The filenames increment each time an image is exported, so I should capture several steps as the page completes is render.

Here's the code I'm using for this, added to the GLScreenBuffer::Swap() method. This is hacky code at best, but it's also temporary, so I'm not too concerned about the style here. Something quick and dirty is fine, as long as... well, as long as it works!
  static int count = 0;
  static int filecount = 0;
  size_t bufferSize;
  uint8_t* buf;
  bool result;
  int xpos;
  int ypos;
  int pos;
  volatile char red;
  volatile char green;
  volatile char blue;
  volatile char alpha;

  if (count % 10 == 0) {
    //bool GLScreenBuffer::ReadPixels(GLint x, GLint y, GLsizei width, GLsizei 
    height, GLenum format, GLenum type, GLvoid* pixels)
    bufferSize = sizeof(char) * size.width * size.height * 4;
    buf = static_cast&lt;uint8_t*&gt;(calloc(sizeof(uint8_t), bufferSize));
    result = ReadPixels(0, 0, size.width, size.height, LOCAL_GL_RGBA, 
    LOCAL_GL_UNSIGNED_BYTE, buf);

    xpos = size.width / 2;
    ypos = size.height / 2;
    pos = (xpos + (size.width * ypos)) * 4;

    red = buf[pos];
    green = buf[pos + 1];
    blue = buf[pos + 2];
    alpha = buf[pos + 3];

    printf_stderr(&quot;Colour before: (%d, %d, %d, %d), %d\n&quot;, red, 
    green, blue, alpha, result);

    #define FORMAT &quot;/home/defaultuser/Documents/Development/gecko/
    frame%03d.dat&quot;

    // Export out the pixel data
    int const len = 61 + 10;
    char filename[61 + 10];
    snprintf(filename, len, FORMAT, filecount);
    FILE *fh = fopen(filename, &quot;w&quot;);
    fwrite(buf, sizeof(char), bufferSize, fh);
    fclose(fh);
    free(buf);
    filecount += 1;
  }
  count += 1;
After building and running the code I get some sensible looking output. As you can see there are sixteen frames generated before I quit the application. The first five look empty, but eventually some colours start coming through.
frame000.data: Colour before: (0, 0, 0, 255), 1
frame001.data: Colour before: (0, 0, 0, 255), 1
frame002.data: Colour before: (0, 0, 0, 255), 1
frame003.data: Colour before: (0, 0, 0, 255), 1
frame004.data: Colour before: (0, 0, 0, 255), 1
frame005.data: Colour before: (187, 125, 127, 255), 1
frame006.data: Colour before: (67, 115, 196, 255), 1
frame007.data: Colour before: (18, 0, 240, 255), 1
frame008.data: Colour before: (162, 225, 0, 255), 1
frame009.data: Colour before: (128, 202, 255, 255), 1
frame010.data: Colour before: (0, 0, 0, 255), 1
frame011.data: Colour before: (240, 255, 255, 255), 1
frame012.data: Colour before: (255, 159, 66, 255), 1
frame013.data: Colour before: (68, 115, 196, 255), 1
frame014.data: Colour before: (0, 0, 0, 255), 1
frame015.data: Colour before: (0, 192, 7, 255), 1
The textures captured from the execution; the colours are wrong and the image looks corrupted.

After copying the frame data from my phone over to my laptop I'm able to load them into GIMP (the GNU Image Manipulation Package) using the raw plugin. This is activated for files with a .data extension and allows the data to be loaded in as if it were pure pixel data without any header or metadata. Because there's no header you have to specify certain parameters manually, such as the width, height and format of the image data.

I always forget exactly what the dimensions of the screens on my development Xperia 10 II devices are, but thankfully GSMArena is only a few clicks away to check:
 
Resolution: 1080 x 2520 pixels, 21:9 ratio (~457 ppi density)

Adding the dimensions into the raw data loader seems to do the trick.

The code I added to gecko requested a texture format of RGBA, so that's what I need to use when loading the data in. Sadly the results are not what I had hoped. There are clearly some data related to the render and it's worth noting that the buffer where these are stored is initialised to contain zeroes each frame, so the data is real, not just artefacts from the memory or previous render.

But most of the pixels are black, the colours are wrong and the images seem to be mangled in very strange ways as you can see in the screenshots.

It's too late for me to figure this out tonight so I'll have a think about it overnight and come back to it in the morning.

As always, if you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment