flypig.co.uk

Personal Blog

View the blog index.

RSS feed Click the icon for the blog RSS feed.

Blog

5 most recent items

25 May 2024 : Day 243 #
It was nice to get such appreciative comments after my last post, thank you for those who got in touch saying they were interested to hear my thoughts on the Jolla C2. I'm glad something good came out of my stupidity in sending my laptop to sleep mid-build! A special mention has to go to Leif-Jöran Olsson (ljo) for his amazing poetry in describing the situation!
 
The 3-way merged double oh double four patch brought the light to webview rendering. Manually reapplying might bring the build in shape. Community squared arrived, squared in joy. Stoically stochastic interventions might as well bring the young ones to reappear. The gecko's nest will be filled with seedlings to feast on. Congrats!

This really made my day; thank you ljo! And I needed it to because I'm just not happy with how all this patching, or more precisely reverting, has been going. As ljo aludes to in his poem, yesterday I applied a three-way merge of the Sailfish patch 0044. This patch was designed to revert upstream commit 1deccd7ac14706ad, but it's intended to be applied on top of the ESR 78 codebse. Things have changed with ESR 91, so the patch no longer applies directly.

For reference, here's the description that goes alongside patch 0044.
commit c3d0a8fb29cbb2998c238a2c85801059ed636070 (HEAD -> 
    FIREFOX_ESR_91_9_X_RELBRANCH_patches)
Author: Andrew den Exter <andrew.den.exter@qinetic.com.au>
Date:   Mon Sep 6 09:52:04 2021 +0000

    Revert &quot;Bug 1445570 - Remove EnsureEventualAfterPaint timer. 
    r=tnikkel&quot;

    This reverts commit 1deccd7ac14706ad1849343cfb2b93df191a1c42.
And here's the description of the commit that the patch is aiming to revert.
$ git log -1 --grep &quot;Bug 1445570&quot;
commit 1deccd7ac14706ad1849343cfb2b93df191a1c42
Author: Mantaroh Yoshinaga <mantaroh@gmail.com>
Date:   Thu Sep 6 02:21:39 2018 +0000

    Bug 1445570 - Remove EnsureEventualAfterPaint timer. r=tnikkel
    
    MozReview-Commit-ID: C7WICJ5Q0ES
    
    Differential Revision: https://phabricator.services.mozilla.com/D5005
    
    --HG--
    extra : moz-landing-system : lando
I'm copying these out here because I think they're going to be helpful as we continue onwards.

Since patching has been so unsuccessful, rather than trying to apply the Sailfish patch for ESR 78, why don't I go directly to the source and attempt to revert commit 1deccd7ac14706ad instead? Maybe that will bring better results.

It seems like a good plan, but in practice when I try to do this I get a bunch of conflicts.
$ git revert 1deccd7ac14706ad1849343cfb2b93df191a1c42
Auto-merging dom/base/nsDOMWindowUtils.cpp
CONFLICT (content): Merge conflict in dom/base/nsDOMWindowUtils.cpp
Auto-merging layout/base/nsPresContext.cpp
CONFLICT (content): Merge conflict in layout/base/nsPresContext.cpp
Auto-merging layout/base/nsPresContext.h
CONFLICT (content): Merge conflict in layout/base/nsPresContext.h
Auto-merging layout/base/nsRefreshDriver.cpp
CONFLICT (content): Merge conflict in layout/base/nsRefreshDriver.cpp
Auto-merging layout/base/nsRefreshDriver.h
CONFLICT (content): Merge conflict in layout/base/nsRefreshDriver.h
Auto-merging layout/reftests/text/reftest.list
CONFLICT (content): Merge conflict in layout/reftests/text/reftest.list
Auto-merging layout/style/test/test_restyles_in_smil_animation.html
error: could not revert 1deccd7ac147... Bug 1445570 - Remove 
    EnsureEventualAfterPaint timer. r=tnikkel
hint: After resolving the conflicts, mark them with
hint: &quot;git add/rm <pathspec>&quot;, then run
hint: &quot;git revert --continue&quot;.
hint: You can instead skip this commit with &quot;git revert --skip&quot;.
hint: To abort and get back to the state before &quot;git revert&quot;,
hint: run &quot;git revert --abort&quot;.
I don't much enjoy conflicts so I wonder if there's a different revert strategy that I can use to make this go more smoothly. Checking the git-revert documentation does indeed throw up some promising suggestions.
$ man git-revert
[...]
       --strategy=<strategy>
           Use the given merge strategy. Should only be used once. See the 
    MERGE STRATEGIES section in git-merge(1) for details.
This suggests checking the MERGE STRATEGIES section of the git-merge man pages. But that section is rather long, so rather than copying it all out here, I recommend to check out the online documentation instead. According to this page, alongside the standard ort ("Ostensibly Recursive’s Twin") strategy — which is what I used here — there's also a so-called recursive strategy. The ort approach didn't go great, so lets try the alternative.
$ git revert --strategy=recursive 1deccd7ac14706ad1849343cfb2b93df191a1c42
[...]
Unfortunately that gives me similarly poor results. What I really want is a strategy that goes back through the history, removes the commit to be reverted, applies all subsequent commits on top of that, then generates a diff of the result against the current codebase. It's possible this is exactly what these strategies are doing already, but I'm not smart enough to figure that out from the documentation.

Actually, after more reflection, where the documentation talks about common ancestors, I think it's this that it's refering to. Nevertheless it didn't give good results, so I want to try a different approach.

Rather than applying the patch, or reverting the commit, there is a third option. And that's to figure out how NotifyInvalidation() ought to be called in the existing ESR 91 codebase. If there is such a way then it's possible this patch could be skipped entirely, which would have the nice side-effect of making future maintenance easier.

I'll explore this third option a little further over the next few days (I say "few days" because it's going to take at least one full build, so this isn't a task that can be completed in a day).

The first thing I notice is that the code that's been removed isn't the only place that nsPresContext::NotifyDidPaintForSubtree() gets called. It also gets called from inside the nsAutoNotifyDidPaint class destructor:
class nsAutoNotifyDidPaint {
 public:
  nsAutoNotifyDidPaint(PresShell* aShell, PaintFlags aFlags)
      : mShell(aShell), mFlags(aFlags) {}
  ~nsAutoNotifyDidPaint() {
    if (!!(mFlags & PaintFlags::PaintComposite)) {
      mShell->GetPresContext()->NotifyDidPaintForSubtree();
    }
  }

 private:
  PresShell* mShell;
  PaintFlags mFlags;
};
This destructor does get called as a result of in instance being created on the stack in the PresShell::Paint() method. At the end of the method the instance is destroyed and the destructor called.

However, in this case the NotifyDidPaintForSubtree() method doesn't get called because !!(mFlags & PaintFlags::PaintComposite) evaluates to false.

There's also a call inside nsView::DidCompositeWindow() but this never gets called. In this case the method is part of an implementation of the nsIWidgetListener interface. We do have some instances of this interfaces in use, most notably inherited from nsBaseWidget into EmbedLitePuppetWidget:
class EmbedLitePuppetWidget : public PuppetWidgetBase
{
[...]
class PuppetWidgetBase : public nsBaseWidget
{
[...]
class nsBaseWidget : public nsIWidget, public nsSupportsWeakReference {
[...]
  nsIWidgetListener* mWidgetListener;
  nsIWidgetListener* mAttachedWidgetListener;
[...]
However whilst these are used in the EmbedLitePuppetWidget code they don't seem to be set there. The mAttachedWidgetListener variable gets set by nsView but that's a direction we already explored. The mWidgetListener variable is also set by nsView in addition to AppWindow. The AppWindow class is never used, but although the nsView class is instantiated we already know that nsView::DidCompositeWindow() is never called. I'm now not entirely sure why it doesn't get called. But it doesn't.

None of this is directly helping us to solve the problem, but it does allow us to explore possibilties and rule things out. This is all going to require a bit more thought. I'll continue with this line of investigation tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
24 May 2024 : Day 242 #
It's funny how even when asleep I'm capable of making stupid mistakes. Yesterday I applied patch 0044 in order to restore the OnFirstPaint() trigger functionality. This is critical for getting the offscreen rendering pipeline to work. But as is the case after applying a patch I needed to rebuild the packages before they could be tested on my phone.

I kicked the build off to run overnight and then — d'oh! — promptly sent my laptop to sleep. It doesn't take a genius to realise that the build isn't going to run while the laptop is sleeping.

So here I am this morning without packages to test. At least my laptop is now awake and the build is continuing. But this has lost me the opportunity to do some development this morning.

Instead I've spent the time ordering myself a Jolla C2, the new Sailfish OS phone announced at Jolla Love Day 2 this last Monday. It's an interesting move by Jolla. For years now members of the community have been encouraging Jolla to offer Sailfish OS pre-installed. On the other hand, many in the community are also keen to see Sailfish OS offered on a very specific selection of handsets, be they PinePHones, Fairphones or brands that are more widely available in Europe, especially those known for being more "open".

My own opinion is that this is a good move by Jolla. There is no perfect phone and every choice has its benefits and drawbacks. Going with Sony phones all those years ago was controversial, but Sony hardware has (in my opinion) been both excellent and offering good value for the hardware. Nice materials, unassuming design and unexpectedly solid features (with headphones jacks, SD card slots, notification lights, and excellent screens to boot). Even more importantly Sony retained their commitment to openness over the long term, when other manufacturers have long since lost interest. On the other hand PinePhone and Fairphone have credentials that are really appealing for Sailfish users, who tend to prize openness and ethical considerations over price and features alone.

The new Reeder phone offers something a little different to either of these, which is the opportunity for Jolla to work hand-in-hand with the manufacturer. Jolla has a good track record doing this and the results historically have been very good. The new Reeder phone might not be open in the same way that the Sony or PinePhone are, but it's likely Jolla will have been given access to much more of what they need, including support, in order to build a really solid software experience for the hardware.

I'm speculating of course and only time will tell, but given we're still expecting Sailfish OS releases for the Xperia 10 IV and Xperia 10 V, and given I already have one of the latter, these are exciting times as far as I'm concerned.

Alright, let's get back to development. The build has carried on for a while but eventually failed with an error. Here's the debug output generated from the build process.
813:10.10 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.cpp: In member 
    function ‘void nsRefreshDriver::Tick(mozilla::VsyncId, mozilla::TimeStamp, 
    nsRefreshDriver::IsExtraTick)’:
813:10.10 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.cpp:2430:41: error: 
    ‘mNextTransactionId’ was not declared in this scope
813:10.10        transactionId.AppendInt((uint64_t)mNextTransactionId);
813:10.10                                          ^~~~~~~~~~~~~~~~~~
813:10.12 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.cpp:2430:41: note: 
    suggested alternative: ‘GetTransactionId’
813:10.13        transactionId.AppendInt((uint64_t)mNextTransactionId);
813:10.13                                          ^~~~~~~~~~~~~~~~~~
813:10.13                                          GetTransactionId
813:14.49 make[4]: *** [${PROJECT}/gecko-dev/config/rules.mk:694: 
    nsRefreshDriver.o] Error 1
The error is telling us that mNextTransactionIddoesn't exist and that's correct because it was removed by the patch. But there's still an instance remaining that the patch didn't capture. To get this through I've had to make one small change. This code looks to be related to profiling so probably isn't really needed for our purposes anyway, but we do need it to compile.
$ git diff layout/base/nsRefreshDriver.cpp
diff --git a/layout/base/nsRefreshDriver.cpp b/layout/base/nsRefreshDriver.cpp
index 33c90bb1f324..bb43dea053ea 100644
--- a/layout/base/nsRefreshDriver.cpp
+++ b/layout/base/nsRefreshDriver.cpp
@@ -2427,7 +2427,7 @@ void nsRefreshDriver::Tick(VsyncId aId, TimeStamp 
    aNowTime,
     nsCString transactionId;
     if (profiler_can_accept_markers()) {
       transactionId.AppendLiteral(&quot;Transaction ID: &quot;);
-      transactionId.AppendInt((uint64_t)mNextTransactionId);
+      transactionId.AppendInt((uint64_t)mPendingTransaction);
     }
     AUTO_PROFILER_MARKER_TEXT(
         &quot;ViewManagerFlush&quot;, GRAPHICS,
This is a similar change to that made elsewhere in the code, so should hopefully be pretty benign.

As the build is still taking a while I've decide I can't let all this potential development time go to waste. So I've decided to take the plunge and perform a test while the build continues. The current issue my most recent changes are aimed at addressing is the fact that the mIsPainted variable in the QMozViewPrivate class never gets set to true. So I'm wondering what will happen if I forcefully set it to true at a suitable time using the debugger.

If I'm honest I'm a little nervous about performing this test. If it turns out to have no effect then it'll leave me scratching my head yet again. I really want this rendering to work.

But I've plucked up the courage and am going to give it a go. First set the breakpoint on the method where the test for this flag is being made.
(gdb) b QuickMozView::updatePaintNode
Function &quot;QuickMozView::updatePaintNode&quot; not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (QuickMozView::updatePaintNode) pending.
(gdb) r
Starting program: /usr/bin/harbour-webview
[...]
Now run the program through to the first time this breakpoint is hit. I don't want to flip the flag here because the rendering pipeline and textures aren't fully set up yet, so I'm going to wait for a later hit of the breakpoint.
Thread 9 &quot;QSGRenderThread&quot; hit Breakpoint 1, QuickMozView::
    updatePaintNode (this=0x555586dc20, oldNode=0x0) at quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) c
Continuing.
[...]
=============== Preparing offscreen rendering context ===============
[...]
As we can see the rendering context has now been configured. Following this the breakpoint is hit again. Now is the time to change the flag from true to false. Here goes:
Thread 9 &quot;QSGRenderThread&quot; hit Breakpoint 1, QuickMozView::
    updatePaintNode (this=0x555586dc20, oldNode=0x7fc800c640) at 
    quickmozview.cpp:172
172         if (width() <= 0 || height() <= 0) {
(gdb) p d->mIsPainted
$1 = false
(gdb) set d->mIsPainted = true
(gdb) p d->mIsPainted
$2 = true
(gdb) disable break
(gdb) c
Continuing.
[...]
Finally I disabled the breakpoint and set the program running again. And the result? I'm astonished to find that the result is a fully rendered Sailfish OS website!
 
A phone showing the sailfishos.org website next to a laptop on a table with the debugger running in a console window.

Scrolling around, selecting links and interacting with the site all seem to work nicely. This is really encouraging. This means that once this patch is properly applied, that should be the final piece of the puzzle needed to get the WebView working again.
 
Three screenshots of the WebView rendering: the sailfish.org site at the top; the same site scrolled down a bit; and shop.jolla.com with the new Jolla C2 on display

I'm quite excited by this!

Unfortunately this is a debugging hack so we're not there yet. And the build has failed again. At least now, motivated by my excitement, I have plenty of energy to fix the build.

Here's the latest output from the failed build:
302:48.25 In file included from Unified_cpp_layout_base2.cpp:20:
302:48.25 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp: In member 
    function ‘void nsPresContext::DetachPresShell()’:
302:48.25 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp:843:15: error: 
    ‘class nsRootPresContext’ has no member named 
    ‘CancelApplyPluginGeometryTimer’; did you mean ‘mApplyPluginGeometryTimer’?
302:48.25      thisRoot->CancelApplyPluginGeometryTimer();
302:48.25                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
302:48.25                mApplyPluginGeometryTimer
302:48.40 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp: In destructor 
    ‘virtual nsRootPresContext::~nsRootPresContext()’:
302:48.40 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp:2683:3: error: 
    ‘CancelApplyPluginGeometryTimer’ was not declared in this scope
302:48.40    CancelApplyPluginGeometryTimer();
302:48.40    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
302:48.47 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp:2683:3: note: 
    suggested alternative:  mApplyPluginGeometryTimer’
302:48.47    CancelApplyPluginGeometryTimer();
302:48.47    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
302:48.47    mApplyPluginGeometryTimer
302:48.51 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp: In member 
    function ‘void nsRootPresContext::ComputePluginGeometryUpdates(nsIFrame*, 
    nsDisplayListBuilder*, nsDisplayList*)’:
302:48.51 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp:2712:21: error: 
    ‘nsPluginFrame’ does not name a type; did you mean ‘nsPageFrame’?
302:48.51          static_cast<nsPluginFrame*>(iter.Get()->GetKey(
    )->GetPrimaryFrame());
302:48.51                      ^~~~~~~~~~~~~
302:48.51                      nsPageFrame
These are actually rather odd errors. For example nsPageFrame has been added to the code even though it's not part of the upstream change. It must have been pulled in due to some changes applied later.

I'm not very happy with this, it's all a bit messy. So I've decided to revert my attempt to automatically merge in the patch and go back to a manual approach instead.

This will require a bit of a rework, which unfortunately I don't have time to do tonight. But I'll start on it in the morning and with any luck may be able to get it done tomorrow. Which means that's it for today. But we're definitely getting there!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
23 May 2024 : Day 241 #
It's a good job I stopped working on this last night when I did. On returning to it this morning with a clearer head it didn't take long to identify that the changes needed to revert the D5005 upstream code changes had already been wrapped up into a patch — patch 0044 to be precise — that I'd not yet applied.

If the patch can be applied cleanly, it could save me a lot of time. So first step is to do a dry run of applying the patch. This won't actually make any changes to the code, but it will tell me whether the patch is going to apply without incident if I attempt it properly.
$ patch --dry-run -d gecko-dev -p1 < rpm/
    0044-Revert-Bug-1445570-Remove-EnsureEventualAfterPaint-t.patch 
checking file dom/base/nsDOMWindowUtils.cpp
Hunk #1 succeeded at 407 (offset 51 lines).
checking file layout/base/nsDocumentViewer.cpp
Hunk #1 succeeded at 1573 (offset -44 lines).
Hunk #2 succeeded at 1718 (offset -45 lines).
Hunk #3 succeeded at 3649 (offset -254 lines).
checking file layout/base/nsPresContext.cpp
Hunk #1 succeeded at 314 (offset 3 lines).
Hunk #2 FAILED at 804.
Hunk #3 succeeded at 2021 (offset 66 lines).
Hunk #4 FAILED at 2039.
Hunk #5 FAILED at 2121.
Hunk #6 FAILED at 2613.
Hunk #7 succeeded at 2716 with fuzz 2 (offset -132 lines).
4 out of 7 hunks FAILED
checking file layout/base/nsPresContext.h
Hunk #1 succeeded at 355 (offset -5 lines).
Hunk #2 succeeded at 912 (offset -4 lines).
Hunk #3 FAILED at 1363.
Hunk #4 FAILED at 1455.
2 out of 4 hunks FAILED
checking file layout/base/nsRefreshDriver.cpp
Hunk #1 FAILED at 1166.
Hunk #2 FAILED at 1239.
Hunk #3 FAILED at 2337.
Hunk #4 FAILED at 2396.
Hunk #5 FAILED at 2404.
5 out of 5 hunks FAILED
checking file layout/base/nsRefreshDriver.h
Hunk #1 FAILED at 507.
1 out of 1 hunk FAILED
checking file layout/style/test/test_restyles_in_smil_animation.html
There are quite a few nice succeeded lines there. But also a disturbing number of FAILED lines as well. That's FAILED in all caps, just in case the word itself doesn't sufficiently convey the damage being done.

The advice Raine would give me in this situation is to apply a three-way merge using git. So that's what I'm going to try.
$ git am --3way ../rpm/
    0044-Revert-Bug-1445570-Remove-EnsureEventualAfterPaint-t.patch
Applying: Revert &quot;Bug 1445570 - Remove EnsureEventualAfterPaint timer. 
    r=tnikkel&quot;
Using index info to reconstruct a base tree...
M       dom/base/nsDOMWindowUtils.cpp
M       layout/base/nsDocumentViewer.cpp
M       layout/base/nsPresContext.cpp
M       layout/base/nsPresContext.h
M       layout/base/nsRefreshDriver.cpp
M       layout/base/nsRefreshDriver.h
Falling back to patching base and 3-way merge...
Auto-merging layout/base/nsRefreshDriver.h
CONFLICT (content): Merge conflict in layout/base/nsRefreshDriver.h
Auto-merging layout/base/nsRefreshDriver.cpp
CONFLICT (content): Merge conflict in layout/base/nsRefreshDriver.cpp
Auto-merging layout/base/nsPresContext.h
CONFLICT (content): Merge conflict in layout/base/nsPresContext.h
Auto-merging layout/base/nsPresContext.cpp
CONFLICT (content): Merge conflict in layout/base/nsPresContext.cpp
Auto-merging layout/base/nsDocumentViewer.cpp
Auto-merging dom/base/nsDOMWindowUtils.cpp
error: Failed to merge in the changes.
Patch failed at 0001 Revert &quot;Bug 1445570 - Remove EnsureEventualAfterPaint 
    timer. r=tnikkel&quot;
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run &quot;git am --continue&quot;.
If you prefer to skip this patch, run &quot;git am --skip&quot; instead.
To restore the original branch and stop patching, run &quot;git am 
    --abort&quot;.
This has done a better job than the patch did, but there are still some problems which require manual intervention. But after lots of manual changes, comparing the source code against the patch file, eventually I'm able to get it there.
git am --continue
It's still the morning and before I start work. It would be nice to kick off the build before I have to move on to my proper work day. That way there's an outside chance that the build will be completed and ready to test by this evening. So although I'd normally read carefully through the changes to check whether they look sound, I'm just going to kick off the build and hope for the best. Here we go!
sfdk build -d -p --with git_workaround
Sadly as the build trundles on into my work day it hits an error in the code.
27:10.41 In file included from ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    mozilla/dom/DocumentTimeline.h:16,
27:10.41                  from ${PROJECT}/gecko-dev/dom/animation/Animation.cpp:
    15,
27:10.41                  from Unified_cpp_dom_animation0.cpp:2:
27:10.41 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h: In member function 
    ‘bool nsRefreshDriver::AtPendingTransactionLimit()’:
27:10.41 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h:514:12: error: 
    ‘mPendingTransactions’ was not declared in this scope
27:10.41      return mPendingTransactions.Length() == 2;
27:10.42             ^~~~~~~~~~~~~~~~~~~~
27:10.42 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h:514:12: note: 
    suggested alternative: ‘mPendingTransaction’
27:10.42      return mPendingTransactions.Length() == 2;
27:10.42             ^~~~~~~~~~~~~~~~~~~~
27:10.42             mPendingTransaction
27:10.42 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h: In member function 
    ‘bool nsRefreshDriver::TooManyPendingTransactions()’:
27:10.42 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h:517:12: error: 
    ‘mPendingTransactions’ was not declared in this scope
27:10.42      return mPendingTransactions.Length() >= 2;
27:10.43             ^~~~~~~~~~~~~~~~~~~~
27:10.43 ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.h:517:12: note: 
    suggested alternative: ‘mPendingTransaction’
27:10.43      return mPendingTransactions.Length() >= 2;
27:10.43             ^~~~~~~~~~~~~~~~~~~~
27:10.43             mPendingTransaction
27:24.10 make[4]: *** [${PROJECT}/gecko-dev/config/rules.mk:694: 
    Unified_cpp_dom_animation0.o] Error 1
27:24.10 make[3]: *** [${PROJECT}/gecko-dev/config/recurse.mk:72: dom/animation/
    target-objects] Error 2
These errors are clearly all caused by the patches I applied. Unfortunately I can't fix them while I'm at work, which means looking in to them in the evening. If I can get them done today there's a chance I'll be able to run the build overnight, but let's see.

The error is about use of mPendingTransactions in the nsRefreshDriver class. It's true that the member variable isn't defined there. Should it be?

It's not defined in the code in the upstream changeset on either side of the diff. Nor does it appear in the patch that we applied. Which is a little odd, except that there is reference in all of those to a mPendingTransaction member variable. Note the subtle difference: mPendingTransaction vs. mPendingTransactions (singular vs. plural).

And looking more carefully at the patch we can see the reason. Here are the changes that the patch tried to apply:
diff --git a/layout/base/nsRefreshDriver.h b/layout/base/nsRefreshDriver.h
index 07feab12e079..deec935f25f4 100644
--- a/layout/base/nsRefreshDriver.h
+++ b/layout/base/nsRefreshDriver.h
@@ -507,12 +507,7 @@ class nsRefreshDriver final : public mozilla::layers::
    TransactionIdAllocator,
   RefPtr<nsRefreshDriver> mRootRefresh;
 
   // The most recently allocated transaction id.
-  TransactionId mNextTransactionId;
-  // This number is mCompletedTransaction + (pending transaction count).
-  // When we revoke a transaction id, we revert this number (since it's
-  // no longer outstanding), but not mNextTransactionId (since we don't
-  // want to reuse the number).
-  TransactionId mOutstandingTransactionId;
+  TransactionId mPendingTransaction;
   // The most recently completed transaction id.
   TransactionId mCompletedTransaction;
But this failed and I made a manual intervention; one which I thought looked pretty similar, but which in fact included a crucial destructive change:
diff --git a/layout/base/nsRefreshDriver.h b/layout/base/nsRefreshDriver.h
index 29c595ba5ba8..b9a0fa4bb2ed 100644
--- a/layout/base/nsRefreshDriver.h
+++ b/layout/base/nsRefreshDriver.h
@@ -528,8 +528,9 @@ class nsRefreshDriver final : public mozilla::layers::
    TransactionIdAllocator,
   RefPtr<nsRefreshDriver> mRootRefresh;
 
   // The most recently allocated transaction id.
-  TransactionId mNextTransactionId;
-  AutoTArray<TransactionId, 3> mPendingTransactions;
+  TransactionId mPendingTransaction;
+  // The most recently completed transaction id.
+  TransactionId mCompletedTransaction;
As you can see, my change removed the mPendingTransactions (plural) variable. The good news is that after applying the patch this mPendingTransactions variable is only used in two places:
  bool AtPendingTransactionLimit() {
    return mPendingTransactions.Length() == 2;
  }
  bool TooManyPendingTransactions() {
    return mPendingTransactions.Length() >= 2;
  }
Neither of these two methods is used anywhere, so it should be safe to remove them. On the other hand, it seems that by applying the patch I've trampled over rather more changes than I'd anticipated. So while removing these two methods to get the build to pass is a simple solution, I'm also going to need to spend some time checking the other changes.

But, having worked through the patch and read through it side-by-side with the changes made to the ESR 91 code, it all looks sensible, albeit rather messy. It looks like the ESR 91 code now matches the code that would have been expected had the patch been applied to ESR 78 as a base.

I've therefore set the build running again. If it builds that'll be a good sign. The good news is also that once it's built, due to the work I've been doing over the last few days, it should be really easy to check whether this change has had the desired effect. All I need to do is check whether the OnFirstPaint() method is being called. If it is, then that's a good sign things have worked out.

Since it's building now I can't do much else until it completes, so I'm going to call it a night. By the morning it should be done and we can continue with some testing.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
22 May 2024 : Day 240 #
We're continuing to dig deeper into the first paint trigger code today. I've spent a couple of days digging around this already now and it's turning out to be a surprisingly complex investigation. I'm hoping the reward will be worth it, and to be honest, I think it will be: WebView rendering isn't going to work without getting to the bottom of this.

Yesterday we saw that the crucial OnFirstPaint() call was being made from EmbedLiteViewParent::RecvOnFirstPaint(). That's because the QMozViewPrivate class implements the EmbedLiteViewListener interface like this:
class QMozViewPrivate : public QObject,
    public mozilla::embedlite::EmbedLiteViewListener
{
[...]
This listener is set up to get called by EmbedLiteViewParent on receipt of the OnFirstPaint message:
mozilla::ipc::IPCResult EmbedLiteViewParent::RecvOnFirstPaint(const int32_t &aX,
                                                              const int32_t &aY)
{
  LOGT();
  NS_ENSURE_TRUE(mView && !mViewAPIDestroyed, IPC_OK());

  mView->GetListener()->OnFirstPaint(aX, aY);
  return IPC_OK();
}
This is called at the far end of a channel after receipt of a message sent potentially across a thread or process boundary. Sending things across process boundaries like this messes up the backtraces but are otherwise a pretty commonplace in Gecko. We just need to find out where the message is being sent from. Searching the code for where the Send happens is usually the way to go with this, and here it's no different. These send and receive messages are auto-generated from the interface files, so they always follow a strict pattern. A search for a call to SendOnFirstPaint() therefore throws up the following in the ESR 78 code:
NS_IMETHODIMP
EmbedLiteViewChild::OnFirstPaint(int32_t aX, int32_t aY)
{
  if (mDOMWindow) {
    nsCOMPtr<nsIDocShell> docShell = mDOMWindow->GetDocShell();
    if (docShell) {
      RefPtr<PresShell> presShell = docShell->GetPresShell();
      if (presShell) {
        nscolor bgcolor = presShell->GetCanvasBackground();
        Unused << SendSetBackgroundColor(bgcolor);
      }
    }
  }

  return SendOnFirstPaint(aX, aY) ? NS_OK : NS_ERROR_FAILURE;
}
We want to find out where this is called and for that we can use the debugger. Using it gives us the following backtrace.
Thread 7 &quot;GeckoWorkerThre&quot; hit Breakpoint 2, non-virtual thunk to 
    mozilla::embedlite::EmbedLiteViewChild::OnFirstPaint(int, int) ()
    at mobile/sailfishos/embedshared/EmbedLiteViewChild.h:57
57        NS_DECL_NSIEMBEDBROWSERCHROMELISTENER
(gdb) bt
#0  non-virtual thunk to mozilla::embedlite::EmbedLiteViewChild::OnFirstPaint(
    int, int) ()
    at mobile/sailfishos/embedshared/EmbedLiteViewChild.h:57
#1  0x0000007fbb306f38 in WebBrowserChrome::HandleEvent (this=0x7f8ccba130, 
    aEvent=0x7f8dd57a60)
    at mobile/sailfishos/utils/WebBrowserChrome.cpp:443
#2  0x0000007fb9d4b4a0 in mozilla::EventListenerManager::HandleEventSubType (
    this=this@entry=0x7f8ccbd3d0, aListener=<optimized out>, 
    aListener@entry=0x7f8cdb7d78, aDOMEvent=0x7f8dd57a60, 
    aCurrentTarget=<optimized out>, aCurrentTarget@entry=0x7eac138da0)
    at dom/events/EventListenerManager.cpp:1087
#3  0x0000007fb9d4e984 in mozilla::EventListenerManager::HandleEventInternal (
    this=0x7f8ccbd3d0, aPresContext=0x7f8cf82940, aEvent=0x7f8d5c7e80, 
    aDOMEvent=0x7fa7972858, aCurrentTarget=<optimized out>, 
    aEventStatus=<optimized out>, aItemInShadowTree=<optimized out>)
    at dom/events/EventListenerManager.cpp:1279
#4  0x0000007fb9d4f604 in mozilla::EventTargetChainItem::HandleEventTargetChain 
    (aChain=..., aVisitor=..., aCallback=aCallback@entry=0x0, aCd=...)
    at dom/events/EventDispatcher.cpp:558
#5  0x0000007fb9d50eb8 in mozilla::EventDispatcher::Dispatch (
    aTarget=<optimized out>, aPresContext=0x7f8cf82940, aEvent=0x7f8d5c7e80, 
    aDOMEvent=0x7f8dd57a60, aEventStatus=<optimized out>, aCallback=0x0, 
    aTargets=<optimized out>)
    at dom/events/EventDispatcher.cpp:1055
#6  0x0000007fba7f8f50 in nsPresContext::FireDOMPaintEvent (this=0x7f8cf82940, 
    aList=aList@entry=0x7ecc05ae78, aTransactionId=..., aTimeStamp=...)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:847
#7  0x0000007fba7f9090 in DelayedFireDOMPaintEvent::Run (this=0x7ecc05ae50)
    at obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#8  0x0000007fb917d1a8 in nsContentUtils::RemoveScriptBlocker ()
    at dom/base/nsContentUtils.cpp:5407
#9  0x0000007fb917d218 in nsContentUtils::RemoveScriptBlocker ()
    at dom/base/nsContentUtils.cpp:5386
#10 0x0000007fba7fd020 in nsAutoScriptBlocker::~nsAutoScriptBlocker (
    this=<synthetic pointer>, __in_chrg=<optimized out>)
    at dom/base/nsContentUtils.h:3461
#11 nsRootPresContext::<lambda()>::operator() (__closure=0x7f8dd1cfe8)
    at layout/base/nsPresContext.cpp:2825
#12 mozilla::layers::GenericNamedTimerCallback<nsRootPresContext::
    EnsureEventualDidPaintEvent(nsPresContext::TransactionId)::<lambda()> >::
    Notify(nsITimer *) (this=0x7f8dd1cfd0) at obj-build-mer-qt-xr/dist/include/
    mozilla/layers/APZThreadUtils.h:81
#13 0x0000007fb84653e0 in nsTimerImpl::Fire (this=0x7f8cecfb50, aGeneration=1)
    at obj-build-mer-qt-xr/dist/include/mozilla/TimeStamp.h:136
#14 0x0000007fb84654c0 in nsTimerEvent::Run (this=0x7ed4001ab0)
    at obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#15 0x0000007fb845e4c4 in nsThread::ProcessNextEvent (aResult=0x7fa7972d77, 
    aMayWait=<optimized out>, this=0x7f8c02f610)
    at xpcom/threads/nsThread.cpp:1211
[...]
#32 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) 
That's a long backtrace but the magic appears to be happening inside a call to DelayedFireDOMPaintEvent() and we can check for this by searching through the code. It gets set up in nsPresContext::NotifyDidPaintForSubtree() inside nsPresContext.cpp where we have the following code, which is basically the same in both ESR 78 and ESR 91. This is a long piece of code, but the important parts are the two calls to DelayedFireDOMPaintEvent() which you can see around the middle and towards the end of the method:
void nsPresContext::NotifyDidPaintForSubtree(
    TransactionId aTransactionId, const mozilla::TimeStamp& aTimeStamp) {
  if (mFirstContentfulPaintTransactionId && !mHadContentfulPaintComposite) {
    if (aTransactionId >= *mFirstContentfulPaintTransactionId) {
      mHadContentfulPaintComposite = true;
      RefPtr<nsDOMNavigationTiming> timing = mDocument->GetNavigationTiming();
      if (timing && !aTimeStamp.IsNull()) {
        timing->NotifyContentfulPaintForRootContentDocument(aTimeStamp);
      }
    }
  }

  if (IsRoot()) {
    static_cast<nsRootPresContext*>(this)->CancelDidPaintTimers(aTransactionId);

    if (mTransactions.IsEmpty()) {
      return;
    }
  }

  if (!PresShell()->IsVisible() && mTransactions.IsEmpty()) {
    return;
  }

  // Non-root prescontexts fire MozAfterPaint to all their descendants
  // unconditionally, even if no invalidations have been collected. This is
  // because we don't want to eat the cost of collecting invalidations for
  // every subdocument (which would require putting every subdocument in its
  // own layer).

  bool sent = false;
  uint32_t i = 0;
  while (i < mTransactions.Length()) {
    if (mTransactions[i].mTransactionId <= aTransactionId) {
      if (!mTransactions[i].mInvalidations.IsEmpty()) {
        nsCOMPtr<nsIRunnable> ev = new DelayedFireDOMPaintEvent(
            this, &mTransactions[i].mInvalidations,
            mTransactions[i].mTransactionId, aTimeStamp);
        nsContentUtils::AddScriptRunner(ev);
        sent = true;
      }
      mTransactions.RemoveElementAt(i);
    } else {
      i++;
    }
  }

  if (!sent) {
    nsTArray<nsRect> dummy;
    nsCOMPtr<nsIRunnable> ev =
        new DelayedFireDOMPaintEvent(this, &dummy, aTransactionId, aTimeStamp);
    nsContentUtils::AddScriptRunner(ev);
  }
As anticipated there's no hit for this method on ESR 91, but on ESR 78 we can get a backtrace:
(gdb) break DelayedFireDOMPaintEvent::DelayedFireDOMPaintEvent
Breakpoint 8 at 0x7fba7fcc8c: DelayedFireDOMPaintEvent::
    DelayedFireDOMPaintEvent. (2 locations)
(gdb) r
[...]
Thread 7 &quot;GeckoWorkerThre&quot; hit Breakpoint 8, 0x0000007fba7fcc8c in 
    DelayedFireDOMPaintEvent::DelayedFireDOMPaintEvent (aTimeStamp=..., 
    aTransactionId=..., aList=<optimized out>, aPresContext=<optimized out>, 
    this=<optimized out>)
    at layout/base/nsPresContext.h:173
173         return mPresShell;
(gdb) bt
#0  0x0000007fba7fcc8c in DelayedFireDOMPaintEvent::DelayedFireDOMPaintEvent (
    aTimeStamp=..., aTransactionId=..., aList=<optimized out>, 
    aPresContext=<optimized out>, this=<optimized out>)
    at layout/base/nsPresContext.h:173
#1  nsPresContext::NotifyDidPaintForSubtree (this=0x7f8ce6dda0, 
    aTransactionId=..., aTimeStamp=...)
    at layout/base/nsPresContext.cpp:2082
#2  0x0000007fba7fd01c in nsRootPresContext::<lambda()>::operator() (
    __closure=0x7f8dc1af58)
    at obj-build-mer-qt-xr/dist/include/mozilla/TimeStamp.h:400
#3  mozilla::layers::GenericNamedTimerCallback<nsRootPresContext::
    EnsureEventualDidPaintEvent(nsPresContext::TransactionId)::<lambda()> >::
    Notify(nsITimer *) (this=0x7f8dc1af40) at obj-build-mer-qt-xr/dist/include/
    mozilla/layers/APZThreadUtils.h:81
#4  0x0000007fb84653e0 in nsTimerImpl::Fire (this=0x7f8c72bfd0, aGeneration=1)
    at obj-build-mer-qt-xr/dist/include/mozilla/TimeStamp.h:136
#5  0x0000007fb84654c0 in nsTimerEvent::Run (this=0x7ecc001ab0)
    at obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#6  0x0000007fb845e4c4 in nsThread::ProcessNextEvent (aResult=0x7fa7972d77, 
    aMayWait=<optimized out>, this=0x7f8c02f610)
    at xpcom/threads/nsThread.cpp:1211
#7  nsThread::ProcessNextEvent (this=0x7f8c02f610, aMayWait=<optimized out>, 
    aResult=0x7fa7972d77)
    at xpcom/threads/nsThread.cpp:1047
#8  0x0000007fb845cdcc in NS_ProcessPendingEvents (
    aThread=aThread@entry=0x7f8c02f610, aTimeout=10)
    at xpcom/threads/nsThreadUtils.cpp:449
#9  0x0000007fba63c430 in nsBaseAppShell::NativeEventCallback (
    this=0x7f8c3e6ac0)
    at widget/nsBaseAppShell.cpp:87
#10 0x0000007fba64f0bc in nsAppShell::event (this=<optimized out>, e=<optimized 
    out>)
    at widget/qt/nsAppShell.cpp:75
[...]
#23 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) 
Frustratingly it's called within a lambda function, which is messing up the backtrace yet again. So we'll have to resort to digging through the code again. There we can see that the NotifyDidPaintForSubtree() method gets called in a lambda function that's triggered inside itself. But this can't be the one we're after because we'd have seen the hit on the containing method first.

But there's another case where a lambda function gets configured to then call nsPresContext::NotifyDidPaintForSubtree() and that's in the following method from nsPresContext.cpp (this is the ESR 78 version):
void
nsRootPresContext::EnsureEventualDidPaintEvent(TransactionId aTransactionId)
{
  for (NotifyDidPaintTimer& t : mNotifyDidPaintTimers) {
    if (t.mTransactionId == aTransactionId) {
      return;
    }
  }

  nsCOMPtr<nsITimer> timer;
  RefPtr<nsRootPresContext> self = this;
  nsresult rv = NS_NewTimerWithCallback(
    getter_AddRefs(timer),
    NewNamedTimerCallback([self, aTransactionId](){
      nsAutoScriptBlocker blockScripts;
      self->NotifyDidPaintForSubtree(aTransactionId);
     }, &quot;NotifyDidPaintForSubtree&quot;), 100, nsITimer::TYPE_ONE_SHOT,
    Document()->EventTargetFor(TaskCategory::Other));

  if (NS_SUCCEEDED(rv)) {
    NotifyDidPaintTimer* t = mNotifyDidPaintTimers.AppendElement();
    t->mTransactionId = aTransactionId;
    t->mTimer = timer;
  }
}
That's the only other case I can see, so we could check to find out how this EnsureEventualDidPaintEvent() method gets called. However as I was in the process of configuring the debugger I noticed that this method doesn't exist in ESR 91. That's unexpected. In ESR 78 it's called from nsPresContext::NotifyInvalidation() like this:
void nsPresContext::NotifyInvalidation(TransactionId aTransactionId,
                                       const nsRect& aRect) {
  MOZ_ASSERT(GetContainerWeak(), &quot;Invalidation in detached pres 
    context&quot;);

  // If there is no paint event listener, then we don't need to fire
  // the asynchronous event. We don't even need to record invalidation.
  // MayHavePaintEventListener is pretty cheap and we could make it
  // even cheaper by providing a more efficient
  // nsPIDOMWindow::GetListenerManager.

  nsPresContext* pc;
  for (pc = this; pc; pc = pc->GetParentPresContext()) {
    TransactionInvalidations* transaction =
        pc->GetInvalidations(aTransactionId);
    if (transaction) {
      break;
    } else {
      transaction = pc->mTransactions.AppendElement();
      transaction->mTransactionId = aTransactionId;
    }
  }
  if (!pc) {
    nsRootPresContext* rpc = GetRootPresContext();
    if (rpc) {
      rpc->EnsureEventualDidPaintEvent(aTransactionId);
    }
  }

  TransactionInvalidations* transaction = GetInvalidations(aTransactionId);
  MOZ_ASSERT(transaction);
  transaction->mInvalidations.AppendElement(aRect);
}
But as we can see, the call to EnsureEventualDidPaintEvent() has been removed in the ESR 91 version of the same method:
void nsPresContext::NotifyInvalidation(TransactionId aTransactionId,
                                       const nsRect& aRect) {
  MOZ_ASSERT(GetContainerWeak(), &quot;Invalidation in detached pres 
    context&quot;);

  // If there is no paint event listener, then we don't need to fire
  // the asynchronous event. We don't even need to record invalidation.
  // MayHavePaintEventListener is pretty cheap and we could make it
  // even cheaper by providing a more efficient
  // nsPIDOMWindow::GetListenerManager.

  nsPresContext* pc;
  for (pc = this; pc; pc = pc->GetParentPresContext()) {
    TransactionInvalidations* transaction =
        pc->GetInvalidations(aTransactionId);
    if (transaction) {
      break;
    } else {
      transaction = pc->mTransactions.AppendElement();
      transaction->mTransactionId = aTransactionId;
    }
  }

  TransactionInvalidations* transaction = GetInvalidations(aTransactionId);
  MOZ_ASSERT(transaction);
  transaction->mInvalidations.AppendElement(aRect);
}
I wasn't expecting that. For completeness, the backtrace from ESR 78 confirms that this is indeed the correct method to be looking at, at least on the ESR 78 side:
Thread 7 &quot;GeckoWorkerThre&quot; hit Breakpoint 9, nsRootPresContext::
    EnsureEventualDidPaintEvent (this=0x7f8ce70330, 
    aTransactionId=aTransactionId@entry=...)
    at layout/base/nsPresContext.cpp:2813
2813    {
(gdb) bt
#0  nsRootPresContext::EnsureEventualDidPaintEvent (this=0x7f8ce70330, 
    aTransactionId=aTransactionId@entry=...)
    at layout/base/nsPresContext.cpp:2813
#1  0x0000007fba7fb480 in nsPresContext::NotifyInvalidation (
    this=this@entry=0x7f8ce70330, aTransactionId=..., aRect=...)
    at layout/base/nsPresContext.cpp:1964
#2  0x0000007fba7fb5d8 in nsPresContext::NotifyInvalidation (
    this=this@entry=0x7f8ce70330, aTransactionId=..., aRect=...)
    at layout/base/nsPresContext.cpp:1927
#3  0x0000007fbaa4dbe4 in nsDisplayList::PaintRoot (
    this=this@entry=0x7fa7971f48, aBuilder=aBuilder@entry=0x7fa7970170, 
    aCtx=aCtx@entry=0x0, 
    aFlags=aFlags@entry=13) at layout/painting/nsDisplayList.cpp:2526
#4  0x0000007fba7d656c in nsLayoutUtils::PaintFrame (
    aRenderingContext=aRenderingContext@entry=0x0, 
    aFrame=aFrame@entry=0x7f8dba5c40, aDirtyRegion=..., 
    aBackstop=aBackstop@entry=4294967295, 
    aBuilderMode=aBuilderMode@entry=nsDisplayListBuilderMode::Painting, 
    aFlags=<optimized out>, 
    aFlags@entry=(nsLayoutUtils::PaintFrameFlags::WidgetLayers | nsLayoutUtils::
    PaintFrameFlags::ExistingTransaction | nsLayoutUtils::PaintFrameFlags::
    NoComposite)) at layout/base/nsLayoutUtils.cpp:4168
#5  0x0000007fba79a258 in mozilla::PresShell::Paint (
    this=this@entry=0x7f8d2dd790, aViewToPaint=aViewToPaint@entry=0x7f8d210390, 
    aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at layout/base/PresShell.cpp:6254
#6  0x0000007fba6110ac in nsViewManager::ProcessPendingUpdatesPaint (
    this=this@entry=0x7f8d210df0, aWidget=aWidget@entry=0x7f8d210e60)
    at obj-build-mer-qt-xr/dist/include/nsTArray.h:554
#7  0x0000007fba6113c0 in nsViewManager::ProcessPendingUpdatesForView (
    this=this@entry=0x7f8d210df0, aView=<optimized out>, 
    aFlushDirtyRegion=aFlushDirtyRegion@entry=true) at view/nsViewManager.cpp:
    395
#8  0x0000007fba611b50 in nsViewManager::ProcessPendingUpdates (
    this=0x7f8d210df0)
    at view/nsViewManager.cpp:1018
#9  nsViewManager::ProcessPendingUpdates (this=<optimized out>)
    at view/nsViewManager.cpp:1004
#10 0x0000007fba611c0c in nsViewManager::WillPaintWindow (
    this=this@entry=0x7f8d210df0, aWidget=0x7f8d210e60)
    at view/nsViewManager.cpp:664
#11 0x0000007fba611c88 in nsView::WillPaintWindow (this=<optimized out>, 
    aWidget=<optimized out>)
    at view/nsView.cpp:1048
#12 0x0000007fbb2f6344 in mozilla::embedlite::PuppetWidgetBase::Invalidate (
    aRect=..., this=0x7f8d210e60)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:268
#13 mozilla::embedlite::PuppetWidgetBase::Invalidate (this=0x7f8d210e60, 
    aRect=...)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:253
#14 0x0000007fbb2ef768 in mozilla::embedlite::EmbedLitePuppetWidget::Show (
    this=0x7f8d210e60, aState=<optimized out>)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:97
#15 0x0000007fba7f03e8 in nsDocumentViewer::Show (this=0x7f8d15df90)
    at layout/base/nsDocumentViewer.cpp:2132
#16 0x0000007fba7f8b7c in nsPresContext::EnsureVisible (this=0x7f8ce70330)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:857
#17 0x0000007fba79c114 in mozilla::PresShell::UnsuppressAndInvalidate (
    this=0x7f8d2dd790)
    at obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#18 0x0000007fba7a1638 in mozilla::PresShell::ProcessReflowCommands (
    this=this@entry=0x7f8d2dd790, aInterruptible=aInterruptible@entry=false)
    at layout/base/PresShell.cpp:9712
#19 0x0000007fba7a0630 in mozilla::PresShell::DoFlushPendingNotifications (
    this=this@entry=0x7f8d2dd790, aFlush=...)
    at layout/base/PresShell.cpp:4209
[...]
#33 0x0000007fba772840 in mozilla::VsyncRefreshDriverTimer::
    RefreshDriverVsyncObserver::NotifyVsync (this=0x7f8cc9de00, aVsync=...)
    at layout/base/nsRefreshDriver.cpp:644
[...]
#53 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb)
Alright, let's rewind back to the top again. We've now reached the point where we know how the OnFirstPaint() method is getting called in ESR 78. And we can see why it's not happening the same way in ESR 91, viz. that the code to call it has been removed from the NotifyInvalidation() method. The next step is to find out why it's been removed and what should be triggering the OnFirstPaint() instead (assuming there is still something).

The best way to investigate this further will be to look a the Bugzilla ticket and associated changeset where this change was made upstream. We've searched for these kinds of changes many times before and now is no different. Here's the log for the change:
$ git log -S EnsureEventualDidPaintEvent -1 -- layout/base/nsPresContext.cpp
commit 1deccd7ac14706ad1849343cfb2b93df191a1c42
Author: Mantaroh Yoshinaga <mantaroh@gmail.com>
Date:   Thu Sep 6 02:21:39 2018 +0000

    Bug 1445570 - Remove EnsureEventualAfterPaint timer. r=tnikkel
    
    MozReview-Commit-ID: C7WICJ5Q0ES
    
    Differential Revision: https://phabricator.services.mozilla.com/D5005
    
    --HG--
    extra : moz-landing-system : lando
After reading through Bug 1445570 and the associated code changes my conclusion is that unravelling this is going to be too complicated for my brain in its current state, so best to let it sink in overnight. I'll pick this up again tomorrow where we'll try to find out what's replacing the trigger that we need in the updated ESR 91 code.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
21 May 2024 : Day 239 #
I got a bit carried away yesterday delving into the reasons why QuickMozView::updatePaintNode() was failing to fire on ESR 91. My conclusion was that it came down to the mIsPainted flag, part of the QMozViewPrivate class and which was remaining stubbornly set to false when running ESR 91.

While in this state the invalidTexture flag inside updatePaintNode() never gets set to false since it's defined like this:
    const bool invalidTexture = !mComposited
            || !d->mIsPainted
            || !d->mViewInitialized
            || !d->mHasCompositor
            || !d->mContext->registeredWindow()
            || !d->mMozWindow;
Notice how if any single value in this list is set to false then the combined value of invalidTexture will be set to true. The consequence of this is that the following clause is always executed:
    if (!mTexture && invalidTexture) {
        QSGSimpleRectNode *node = static_cast<QSGSimpleRectNode *>(oldNode);
        if (!node) {
            node = new QSGSimpleRectNode;
        }
        node->setColor(d->mBackgroundColor);
        node->setRect(boundingRect);

        return node;
    }
The return at the end of this means the updatePaintNode() method returns early. This means that the code beyond the clause never gets executed and crucially this section is never entered:
    if (!node) {
        QMozExtTexture * const texture = new QMozExtTexture;
        mTexture = texture;

        connect(texture, &QMozExtTexture::getPlatformImage, d->mMozWindow, 
            &QMozWindow::getPlatformImage, Qt::DirectConnection);

        node = new MozExtMaterialNode;

        node->setTexture(mTexture);
    }
Here we can see both mTexture and the MozExtMaterialNode object being created. That's what we need for the rendering to proceed. So if this portion of code is never executed, we'll never get the rendered page to appear.

That all sounds very plausible and comprehensible, but it begs the question: "why is mIsPainted never being set to true?". That's what I plan to find out over the coming days.

Unsurprisingly we can see the initial value it's set to in the QMozViewPrivate constructor, where it's initially set to false:
QMozViewPrivate::QMozViewPrivate(IMozQViewIface *aViewIface, QObject *publicPtr)
    : mViewIface(aViewIface)
[...]
    , mIsPainted(false)
[...]
It's also set to false after a call to QMozViewPrivate::reset():
QMozViewPrivate::reset()
{
    if (mIsPainted) {
        mIsPainted = false;
        mViewIface->firstPaint(-1, -1);
    }
[...]
But the important part is where it gets set to true and that only happens in one place; here in the OnFirstPaint() method:
void QMozViewPrivate::OnFirstPaint(int32_t aX, int32_t aY)
{
    mIsPainted = true;
    mViewIface->firstPaint(aX, aY);
}
This method doesn't get explicitly called anywhere in the QtMozEmbed code, so to find out where it's actually getting used I'm going to drop to the debugger on my ESR 78 device.
(gdb) break QMozViewPrivate::OnFirstPaint
Breakpoint 13 at 0x7fbfbf56b8: file qmozview_p.cpp, line 1113.
(gdb) r
[...]
Thread 1 &quot;harbour-webview&quot; hit Breakpoint 13, QMozViewPrivate::
    OnFirstPaint (this=0x55557b08f0, aX=0, aY=0) at qmozview_p.cpp:1113
1113        mIsPainted = true;
(gdb) bt                         
#0  QMozViewPrivate::OnFirstPaint (this=0x55557b08f0, aX=0, aY=0) at 
    qmozview_p.cpp:1113
#1  0x0000007fbb2f5d90 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#2  0x0000007fb8932f50 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#3  0x0000007fb8920238 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#4  0x0000007fb88428cc in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#5  0x0000007fb88482d8 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#6  0x0000007fb8849d58 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#7  0x0000007fb8809d48 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#8  0x0000007fb880e740 in ?? () from /usr/lib64/xulrunner-qt5-78.15.1/libxul.so
#9  0x0000007fbfbed924 in MessagePumpQt::HandleDispatch (this=0x555578ef30) at 
    qmessagepump.cpp:63
#10 0x0000007fbfbeda9c in MessagePumpQt::event (this=<optimized out>, 
    e=<optimized out>) at qmessagepump.cpp:51
#11 0x0000007fbebe1144 in QCoreApplication::notify(QObject*, QEvent*) () from /
    usr/lib64/libQt5Core.so.5
#12 0x0000007fbebe12e8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (
    ) from /usr/lib64/libQt5Core.so.5
#13 0x0000007fbebe36b8 in QCoreApplicationPrivate::sendPostedEvents(QObject*, 
    int, QThreadData*) () from /usr/lib64/libQt5Core.so.5
[...]
#21 0x0000005555557bcc in main ()
(gdb) 
Well that's an unhelpful mess. But it is at least getting called from inside the Gecko code and that in itself is both helpful to know and a good thing (I was concerned it might be called from inside Qt which would have made it considerably harder to track down; the fact it's coming from Gecko means we can skip a layer of Qt indirection and get closer to the source of the problem more quickly).

But as you can see the Gecko debug symbols are messed up: we can see eight calls to methods inside libxul in a row, but it's not telling us the names of the methods.

So I'm going to reinstall the packages, including the debug packages, to get the symbols back. Before the gap I was using a custom build with extra debug output and exporting of textures, but I don't need any of that any more so I can just reinstall the packages from the official repositories. Much easier and quicker than rebuilding the packages locally and installing them that way:
# zypper install --force xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla 
    xulrunner-qt5-debuginfo-78.15.1+git33.2-1.21.1.jolla xul
runner-qt5-debugsource-78.15.1+git33.2-1.21.1.jolla 
    xulrunner-qt5-misc-78.15.1+git33.2-1.21.1.jolla
Loading repository data...
Reading installed packages...
Forcing installation of 'xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64' 
    from repository 'jolla'.
Forcing installation of 
    'xulrunner-qt5-debuginfo-78.15.1+git33.2-1.21.1.jolla.aarch64' from 
    repository 'jolla'.
Forcing installation of 
    'xulrunner-qt5-debugsource-78.15.1+git33.2-1.21.1.jolla.aarch64' from 
    repository 'jolla'.
Forcing installation of 
    'xulrunner-qt5-misc-78.15.1+git33.2-1.21.1.jolla.aarch64' from repository 
    'jolla'.
[...]
2 packages to downgrade, 2 to reinstall, 2  to change vendor.
Overall download size: 758.8 MiB. Already cached: 0 B. After the operation, 2.4 
    GiB will be freed.
Continue? [y/n/v/...? shows all options] (y): y
[...]
Okay, with the debug symbols now properly aligned we can try getting the backtrace again, but hopefully this time with a bit more helpful detail.
Thread 1 &quot;harbour-webview&quot; hit Breakpoint 1, QMozViewPrivate::
    OnFirstPaint (this=0x55555cb830, aX=0, aY=0) at qmozview_p.cpp:1113
1113        mIsPainted = true;
(gdb) bt
#0  QMozViewPrivate::OnFirstPaint (this=0x55555cb830, aX=0, aY=0) at 
    qmozview_p.cpp:1113
#1  0x0000007fbb2f5d90 in mozilla::embedlite::EmbedLiteViewParent::
    RecvOnFirstPaint (this=0x55555d9d70, aX=@0x7fffffea18: 0, aY=@0x7fffffea28: 
    0)
    at mobile/sailfishos/embedshared/EmbedLiteViewParent.cpp:237
#2  0x0000007fb8932f50 in mozilla::embedlite::PEmbedLiteViewParent::
    OnMessageReceived (this=0x55555d9d70, msg__=...) at 
    PEmbedLiteViewParent.cpp:1600
#3  0x0000007fb8920238 in mozilla::embedlite::PEmbedLiteAppParent::
    OnMessageReceived (this=<optimized out>, msg__=...)
    at obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:866
#4  0x0000007fb88428cc in mozilla::ipc::MessageChannel::DispatchAsyncMessage (
    this=this@entry=0x7f8c888578, aProxy=aProxy@entry=0x555560d1d0, aMsg=...)
    at obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:866
#5  0x0000007fb88482d8 in mozilla::ipc::MessageChannel::DispatchMessage (
    this=0x7f8c888578, aMsg=...)
    at ipc/glue/MessageChannel.cpp:2100
#6  0x0000007fb8849b50 in mozilla::ipc::MessageChannel::RunMessage (
    this=<optimized out>, aTask=...)
    at ipc/glue/MessageChannel.cpp:1959
#7  0x0000007fb8849d58 in mozilla::ipc::MessageChannel::MessageTask::Run (
    this=0x7f8e127650)
    at obj-build-mer-qt-xr/dist/include/mozilla/ipc/MessageChannel.h:610
#8  0x0000007fb8809d48 in MessageLoop::RunTask (aTask=..., this=0x55557c5200)
    at ipc/chromium/src/base/message_loop.cc:487
#9  MessageLoop::DeferOrRunPendingTask (pending_task=..., this=0x55557c5200)
    at ipc/chromium/src/base/message_loop.cc:478
#10 MessageLoop::DeferOrRunPendingTask (this=0x55557c5200, pending_task=...)
    at ipc/chromium/src/base/message_loop.cc:476
#11 0x0000007fb880e740 in MessageLoop::DoWork (this=<optimized out>)
    at ipc/chromium/src/base/message_loop.cc:551
#12 MessageLoop::DoWork (this=0x55557c5200) at ipc/chromium/src/base/
    message_loop.cc:530
#13 0x0000007fbfbed924 in MessagePumpQt::HandleDispatch (this=0x55557a6f40) at 
    qmessagepump.cpp:63
#14 0x0000007fbfbeda9c in MessagePumpQt::event (this=<optimized out>, 
    e=<optimized out>) at qmessagepump.cpp:51
#15 0x0000007fbebe1144 in QCoreApplication::notify(QObject*, QEvent*) () from /
    usr/lib64/libQt5Core.so.5
#16 0x0000007fbebe12e8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (
    ) from /usr/lib64/libQt5Core.so.5
#17 0x0000007fbebe36b8 in QCoreApplicationPrivate::sendPostedEvents(QObject*, 
    int, QThreadData*) () from /usr/lib64/libQt5Core.so.5
[...]
#25 0x0000005555557bcc in main ()
(gdb) 
That's much better and gives us a much clearer idea of where OnFirstPaint() is getting called. But as we can see from this it's actually being triggered by receipt of a message. The next step will therefore be to find out where the OnFirstPaint message is being sent from.

For completeness I also performed the same checks on ESR 91. You'll not be surprised to hear that the QMozViewPrivate::OnFirstPaint() method is never called in the newer build.

The ESR 78 backtrace tells us what should be happening on ESR 91. So our task now is to find out why it's not. This takes us a step forwards and gives us something to look into further; we'll pick this avenue of investigation up again tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment