flypig.co.uk

Gecko

All items from August 2024

31 Aug 2024 : Day 336
After spending the last few days arranging commits I'm happy to have two rather exciting tasks to perform today. The first is to convert all of the commits into patches. Once I've done that I'll be able to do a complete build from beginning to end, including the "prepare" step that applies the patches. If that goes through I can create a pull request into Jolla's repositories. I'm not expecting this to be a final release version, but I'm hoping it will provide a suitably stable base to apply any other changes on top of.

The second thing I have to do is test out the new libhybris packages that Frajo has kindly built and provided me with. He actually sent them to me a couple of days back but I've been so wrapped up with the commits that I didn't yet have a chance to test them.

The new packages came about following a discussion on GitHub started by Affe Null (affenull2345) and then contributed to by Mal (mlehtima) and Frajo (krnlyng). Affe is a secret porter, in that they've developed their own private Samsung Galaxy A40 port. Pretty cool! But Affe also hit an issue when testing ESR 91, in that it triggered the same Wayland EGL dynamic library loading crash in spite of the workaround being present in qtmozembed.

Some brilliant investigation by the three of them led to the tweaking of libhybris to avoid the crash. But, even more exciting for me, is that if I'm reading the discussion correctly this may also have fixed the "cover" crash that's been troubling me for so long.

As far as I'm concerned this cover crash is currently the most serious unresolved issue with the ESR 91 build. Not having a cover is fine, but sadly it doesn't just affect the browser: it affects WebView apps as well. That means that apps that change their covers, such as the email app, also seem to crash if they've previously been displaying a WebView. The workaround I added to qtmozembed won't work for these apps because it has to be executed when the app starts up. That works for the browser and "pure" WebView apps, but apps like the email app don't initialise the WebView until much later on in their execution. A proper fix is needed for these.

The symptom for the email app is that it'll crash if you move to the homescreen and then try to move back into the app after having read an email. This is horrible of course and it absolutely needs to be fixed.

Now all these things may not be related, but if they are, then it would be great to have them fixed. And maybe, just maybe, these new libhybris packages will do the trick.

But first things first: patches. Well nearly patches at any rate. It turns out that in order to generate the patches I first need to sort out the commits in the gecko-dev repository. That's different to the gecko-dev-mirror repository. It's the latter that I need to generate patches for and all of the commits there have been arranged. The gecko-dev repository contains the gecko-dev-mirror repository as a submodule. The patches are stored in the gecko-dev repository where they're applied to the submodule at the start of a build.

But the new commits I've made in the gecko-dev repository are still in a bit of a messy state. For example, every few commits I've created a separate commit updating the submodule. That was necessary for me to do a rebuild, but I don't really want to keep them all in the tree. My plan is to merge the submodule updates into a single commit which, eventually, will be entirely removed.

Here are all of the submodule update commits:
$ git log --oneline HEAD...9cb2ba8396d2~ -- gecko-dev/
f270bad5788e (HEAD -> sailfishos-esr91) Update gecko-dev mirror
5a68997c8450 Update gecko-dev mirror
515537b61413 Update gecko-dev mirror
08f9959dc235 Update gecko-dev mirror
14a2700021e5 Update gecko-dev mirror
67927c0423df Update gecko-dev mirror
2fea2e212b35 Update gecko-dev mirror
09260531aceb Update gecko-dev mirror
d30cd18ee613 Update gecko-dev mirror
5b9c330998db Restore GLScreenBuffer and TextureImageEGL
96d960d6a455 Update gecko-dev mirror
f803db211dfc Update gecko-dev mirror
720b4f6574e7 Convert AddBoolVarCache variables to static prefs
638ae4117394 Update gecko-dev mirror
f659dbc79934 Update gecko-dev mirror
2df29a103c6f (tag: sailfishos/91.9.1+git1) Update gecko-dev mirror
9d155ca9394f Update gecko-dev mirror
98ac90c53a12 Update gecko-dev mirror
6ecb2bad664b Update gecko-dev mirror
46fc26b5752c Update gecko-dev mirror
4d365de6c475 Update gecko-dev mirror
72638c7a9872 Update gecko-dev mirror
04cd443a0a5f Update gecko-dev submodule
75c281796b78 Update gecko-dev mirror
e4cea9f962a0 Rename nsIdleService to nsUserIdleService
d53c33b45ca6 Update submodule to incorporate recent changes
12cef49647fa Update gecko-dev mirror
1603f42a923d Add freetype2 include directory
91c0e1a7eeee Update gecko-dev submodule
There are quite a few of them. This list makes it look like they're all consecutive, but they're not; it's just that the commits in between have been skipped.
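To illustrate why a path-limited log can look like an unbroken run of commits, here's a minimal sketch in a throwaway repository. The `sub/` directory and the file names are hypothetical stand-ins for the submodule path; nothing here touches the real repositories.

```shell
#!/bin/sh
# Sketch only: a scratch repository where sub/ stands in for the submodule path.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir sub

# Alternate commits between the watched path and an unrelated file.
for i in 1 2 3; do
    echo "$i" > sub/state
    git add sub/state
    git commit -q -m "Update sub ($i)"
    echo "$i" > other
    git add other
    git commit -q -m "Unrelated change ($i)"
done

# Every commit on the branch...
all_commits=$(git log --oneline | wc -l)
# ...versus only those touching sub/, which are listed with no visible gaps.
sub_commits=$(git log --oneline -- sub/ | wc -l)
echo "total: $all_commits, touching sub/: $sub_commits"
```

Half of the commits are silently dropped from the second listing, which is the same effect as in the log above.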

My first attempt to consolidate all of these submodule commits into a single commit involved first trying to shift them all so that they are all consecutive. I used a git rebase -i to do this. Unfortunately this approach failed. Why? Because while all of these commits update the submodule, some of them do other things as well.

To resolve this I've decided to split any commit that both updates the submodule and makes other changes into two commits: one to update the submodule, the other to apply the other changes.

My process for doing this is as follows:
  1. Execute git rebase -i 9cb2ba8396d2~. The commit always stays the same because it's the last commit before I made changes.
  2. Set the commit that needs splitting to edit.
  3. Execute git reset HEAD~ to reset the changes.
  4. Add the changes that don't relate to the submodule using git add.
  5. Commit these changes using git commit with the same commit message as the commit being edited.
  6. Add the submodule change using git add gecko-dev.
  7. Commit these changes using git commit -m "Update gecko-dev mirror".
  8. Continue the rebase with git rebase --continue.
  9. At this point it'll complain about a conflict, so reset the submodule with git reset <commit> -- gecko-dev where <commit> is the hash of the conflicting commit.
  10. Finally git rebase --continue.
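The core of the split (steps 3 to 7) can be sketched on the tip commit of a throwaway repository. This is an illustration under stated assumptions rather than the exact session: `pointer.txt` is a hypothetical stand-in for the submodule entry and `build.conf` for the unrelated change. For an older commit the same commands would be run while the rebase is stopped at the `edit` step.

```shell
#!/bin/sh
# Sketch only: pointer.txt stands in for the submodule entry and build.conf
# for the unrelated change; both names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo v1 > pointer.txt
echo a > build.conf
git add pointer.txt build.conf
git commit -q -m "Initial state"

# A mixed commit: bumps the pointer and changes something else too.
echo v2 > pointer.txt
echo b > build.conf
git commit -q -a -m "Add freetype2 include directory"

tree_before=$(git rev-parse 'HEAD^{tree}')
message=$(git log -1 --format=%B)

# Step 3: undo the commit but keep its changes in the working tree.
git reset -q HEAD~
# Steps 4-5: commit the non-pointer change under the original message.
git add build.conf
git commit -q -m "$message"
# Steps 6-7: commit the pointer bump on its own.
git add pointer.txt
git commit -q -m "Update gecko-dev mirror"

tree_after=$(git rev-parse 'HEAD^{tree}')
git --no-pager log --oneline
```

The tree hash is recorded before and after so it's easy to confirm that only the history changed, not the contents.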
That's quite a sequence, and it's as laborious as it sounds. But there are only four commits that need splitting this way, so it could be worse. Now this is done I can corral all of the submodule update commits so they're consecutive. It doesn't really matter where the final single commit lives in the sequence because eventually the submodule is going to need to be reset back to ESR 91 HEAD. The patches that go on top of it will take it to the same place as this commit. But for the time being this commit needs to stay.
git rebase -i 9cb2ba8396d2~
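For reference, the squash of consecutive commits can also be scripted rather than edited by hand. The sketch below is an assumption-laden illustration in a scratch repository, not the session used here: it relies on git's `GIT_SEQUENCE_EDITOR` environment variable to rewrite the todo list, and uses `fixup` (a squash that discards the later messages) so that no message editor is needed. The file name `pointer.txt` is hypothetical.

```shell
#!/bin/sh
# Sketch only: a scratch repository with three consecutive "Update gecko-dev
# mirror" commits that get folded into one without opening an editor.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 0 > pointer.txt
git add pointer.txt
git commit -q -m "Base"
base=$(git rev-parse HEAD)

for i in 1 2 3; do
    echo "$i" > pointer.txt
    git commit -q -a -m "Update gecko-dev mirror"
done
tree_before=$(git rev-parse 'HEAD^{tree}')

# A stand-in for editing the todo list by hand: keep the first "pick" and
# turn every later one into "fixup".
todo_editor=$(mktemp)
cat > "$todo_editor" <<'EOF'
awk 'NR > 1 && $1 == "pick" { $1 = "fixup" } { print }' "$1" > "$1.tmp"
mv "$1.tmp" "$1"
EOF

GIT_SEQUENCE_EDITOR="sh '$todo_editor'" GIT_EDITOR=true git rebase -q -i "$base"

tree_after=$(git rev-parse 'HEAD^{tree}')
git --no-pager log --oneline
```

This only works cleanly when the commits being folded together are already adjacent, which is why the splitting and reordering has to happen first.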
After squashing all of the submodule commits together apart from the first and last I'm left with this rather ominous looking list of commits:
$ git log --oneline HEAD...9cb2ba8396d2~~
ab9cf7698875 (HEAD -> sailfishos-esr91) Update gecko-dev mirror
8725b48c1cf1 Set MOZ_USE_NATIVE_POPUP_WINDOWS as an environment variable
54b7a119188e Update patches
209beaa1221c Restore GLScreenBuffer and TextureImageEGL
f366c03302b6 Ensure the Session History is initialised.
f573fce46416 Support LOAD_FLAGS_FROM_EXTERNAL flag
5797043901a7 Introduce window hidden flag and expose isForPrinting through it
bb9c40a96b42 Convert AddBoolVarCache variables to static prefs
9ffa340e2d21 Default keyword.enabled to true
f8f4df84b0a3 Add appinfo annotations
dffb82220d0a [sailfishos][embedlite] Fix AsyncPanZoomController fling state...
2b4c467f0d42 Disable patch "Fix 32-bit builds"
2db5567ddbcd Fix 32-bit builds
b693fa1db02c Use system python executable
bc78ac45646d Add Qt5Widgets build dependency
54c54a463e86 Help ensure the window is active
a1f85adebda5 Set correct submodule path
3ce0728ceb34 Set compositor preferences correctly
5179dd071280 Build with only a single process
7967765674f7 Fix specfile versions
08230a27decd Hide SetIsActive() aActiveId parameter
cbf00e47117f Add missing Fission methods to nsIXULRuntime
3bc37a85f5b7 Add missing GetChromeOuterWindowID() implementation
b51363671814 Update WebBrowserChrome to align with upstream
293dbb9e8810 Move active flag handling explicitly to BrowsingContext
a19eb4d211de Have the main thread double-check APZ's consumable state
e12784b8e2f5 Apply refactoring from upstream
7dfb178cf4dd Reintroduce NS_APP_DISTRIBUTION_SEARCH_DIR_LIST
8ec4aa2fd581 Use mMessageListeners.GetOrInsertNew()
6dbb1073118a Switch APZEnentResult::mStatus for GetStatus()
d3f9a43a17a1 Move mShouldSendWebProgressEventsToParent activation
1a575c80c4a9 Remove NS_LITERAL_CSTRING usage
4f3891a7a746 Switch CSSRect to ZoomTarget
1807b74962b0 Rename nsBaseHashtable::Put to InsertOrUpdate
0caddae65dc2 Add aActionId parameter to SetIsActive
1479b47c142c Add user interaction parameter to GoForward and GoBack
b42962a118c6 Remove AffectPrivateSessionLifetime call
207faaaf245c Get outerWindowID from docShell rather than DOMWindowUtils
9f9aae313e1a Switch use of MOZ_MUST_USE for [[nodiscard]]
a23bd2c76713 Specify the BrowsingContextGroup
f68fadb53c6d Make `IMEState::Enabled` an enum class
a853d8864c7b Remove plugin support from layout
4bb3ecf92186 Avoid unnecessary array copies in NotifyLayerTransforms and...
6de2af56afe2 Support the GetFrameUniformity API in content processes
12069f25adc4 Add flag to CompositorOptions to allow SW-WR on a per widget basis
9d2582254e30 Use a message router for primary child process channels
add5a1974d00 Switch GLScreenBuffer for SwapChain
cf26a8ce1fbd Add _WITH_DELETE_ON_EVENT_TARGET macros to nsISupportsImpl
2b0b1ef5c57a Remove all AddBoolVarCache usage
cdb168d9f5a3 Remove use of NS_LITERAL_STRING
d8b7845b563c Expose nsIThread to nsClipboard
05431db1eb64 Remove ChromeOuterWindowID from MessageManager override
b9dec65db14a Switch use of MOZ_MUST_USE for [[nodiscard]]
7f64138b89e5 Switch from using nsDataHashtable to nsTHashMap
1594ad5e797a Rename nsIdleService to nsUserIdleService
fec0b0d4789c Reduce optimisation to O1
3460285ab906 Add freetype2 include directory
75cb7986205c Update gecko-dev submodule
696c3801cdbd Update EmbedLite IPDL syntax
0eacc60a27ee Switch from disabling marionette to disabling webdriver
9e3dc13725a9 Update spec for ESR 91.9.0
902d36af8868 Enable dialog element
1ae9e714a3b6 (tag: sailfishos/78.15.1+git35) Merge pull request #153 from...
880b89ac0f8d [sailfishos][gecko] Backport support for ffmpeg 5.0. JB#60117...
It's ominous, but all of the changes individually actually look quite sane now. So I think I'm going to say this is where things should be. So now it's time to generate the patches from the commits in gecko-dev-mirror. First off I need to actually generate all of the patches. I'll keep a backup of the ESR 78 patches as well, but I don't plan to commit them to the repository. During the last upgrade they were kept, so maybe I'll need to commit them too in due course. Nevertheless, generating the patches turns out to be pretty straightforward:
$ cd rpm
$ mkdir esr78-patches
$ mv *.patch esr78-patches/
$ cd ../gecko-dev/
$ git format-patch -o ../rpm -N --no-signature --zero-commit 78d17b06b04f
$ git rebase -i 9cb2ba8396d2~
The next step is to copy all of the patch filenames into the spec file so that the rpm builder knows what to apply. Then I can create a commit that contains all of the new patches:
$ cd ../rpm
$ git add *.patch
$ git commit
$ cd ../gecko-dev
$ git checkout FIREFOX_ESR_91_9_X_RELBRANCH
$ cd ..
$ git commit -m "Update gecko-dev mirror"
$ git rebase -i 9cb2ba8396d2~
$ git clean -xdf
Notice that I've also created a commit that resets the submodule to the head of FIREFOX_ESR_91_9_X_RELBRANCH. I've committed this change to the repository as well. The patches should bring FIREFOX_ESR_91_9_X_RELBRANCH all the way up to the same commit as the HEAD of FIREFOX_ESR_91_9_X_RELBRANCH_sfos.
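Copying the patch filenames into the spec file is mechanical, so it can be scripted. The following is a minimal sketch that assumes the spec lists patches as numbered `PatchN:` tags, which is standard RPM syntax but isn't confirmed for this particular spec file. The helper name `patch_tags` and the two placeholder filenames are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes the spec file lists patches as numbered "PatchN:" tags.
set -e

# Print one "PatchN: <filename>" line per patch file in the given directory.
patch_tags() {
    n=0
    for f in "$1"/*.patch; do
        [ -e "$f" ] || continue
        n=$((n + 1))
        printf 'Patch%d: %s\n' "$n" "$(basename "$f")"
    done
}

# Demo with two empty placeholder files in a scratch directory.
dir=$(mktemp -d)
: > "$dir/0001-first-change.patch"
: > "$dir/0002-second-change.patch"
patch_tags "$dir"
```

Because git format-patch zero-pads its numeric prefixes, the shell's sorted glob expansion keeps the tags in the order the patches need to be applied.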

Now I'm ready to do the test build.
sfdk build -d -p --with git_workaround
Rather excitingly, all of the patches apply one on top of the other without any glitches.
[...]
+ unset DISPLAY
+ /bin/cat ${PROJECT}/rpm/
    0001-sailfishos-gecko-Add-symlink-to-embedlite.-JB-52893.patch
+ patch -p1 -s --fuzz=2 --no-backup-if-mismatch
[546bceac6ff5] *sfdk-prepare* Add symlink to embedlite. JB#52893
 1 file changed, 1 insertion(+)
+ /bin/cat ${PROJECT}/rpm/0002-sailfishos-qt-Bring-back-Qt-layer.-JB-50505.patch
+ patch -p1 -s --fuzz=2 --no-backup-if-mismatch
[4aa4f2c0af0c] *sfdk-prepare* Bring back Qt layer. JB#50505
 109 files changed, 4780 insertions(+), 89 deletions(-)
[...]
+ /bin/cat ${PROJECT}/rpm/
    0078-sailfishos-embedlite-egl-Fix-mesa-egl-display-and-bu.patch
+ patch -p1 -s --fuzz=2 --no-backup-if-mismatch
[70128c619c7e] *sfdk-prepare* Fix mesa egl display and buffer initialisation
 1 file changed, 228 insertions(+), 7 deletions(-)
+ echo 'Target is aarch64-unknown-linux-gnu'
Target is aarch64-unknown-linux-gnu
[...]
Originally I'd planned to sort out the patches this morning, run the build during the day and then test out Frajo's new libhybris packages this evening. But unfortunately I didn't quite get everything done this morning, which means I couldn't run the build during the day, which means that the build will now have to run overnight tonight instead.

That's fine, but it means unfortunately I won't have time to test out Frajo's packages today. It'll be something to look forward to in the morning instead.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
30 Aug 2024 : Day 335
I'm back to corralling commits again this morning. Yesterday I got as far as commit 86 in the list. That means there are still 40 to go and I'd like to get them done before the end of the weekend.

The process is the same as yesterday. First, list all of the relevant commits:
$ git log --oneline HEAD...78d17b06b04f~
Next, work through the logs and diffs a few commits at a time to figure out what needs doing to them. Do they need rewording? Do they need the attribution fixing? Do they need merging with some other commit? Once that's determined and I have a few lined up in my head, I can then perform an interactive rebase to perform all of these changes:
git rebase -i 78d17b06b04f
The commit 78d17b06b04f is set to the HEAD of ESR 91 before I started making changes, so this commit is always the base of everything I need to do. During the rebase I select the commits to edit, reword or squash and then manually perform appropriate actions to get everything into the correct state. For example, to set the attribution I might use this:
git commit --amend --author="<AUTHOR>" --date="<DATE>"
Most of the commits need to be amended so that their titles include the topic prefix [sailfishos][gecko] as typically I left these out during development.
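As a concrete illustration of re-attributing a commit and adding the topic prefix in one go, here's a minimal sketch in a scratch repository. The author, date and file name are made up for the example, and it amends the tip commit directly; during a rebase the same command would be run at an `edit` stop.

```shell
#!/bin/sh
# Sketch only: scratch repository; the author, date and file name are made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo x > file
git add file
git commit -q -m "Enable dconf"

# Re-attribute the commit and prepend the topic prefix, keeping the rest of
# the existing message intact.
git commit -q --amend \
    --author="Original Author <author@example.com>" \
    --date="2021-01-02T03:04:05+00:00" \
    -m "[sailfishos][gecko] $(git log -1 --format=%B)"

git --no-pager log -1 --format='%an <%ae>%n%ad%n%s' --date=iso-strict
```

Passing the message with `-m` avoids opening an editor; the committer fields still record who did the amending.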

And that's pretty much it! The good news is that following this cycle I've managed to complete the remaining 41 commits. Here's what I started with (I've left the last commit in from yesterday so it's clear how they hook together):
5fdfd3eb0500 (HEAD) Set gfx.webrender.force-disabled to default to true
3c72121dd859 [sailfishos][embedlite][egl] Fix mesa egl display and buffer...
16f8accb9c47 Fix `unstable-name-collisions` warning by using fully qualified...
3aa670a3cc6a Fix build failure due to rust-lang/rust PR 99413
60d354ad440c Properly fill out videoframe buffer in GeckoCameraVideoDecoder
edb9deacd049 Create GLLibraryEGL as a singleton in GLContextProviderEGL
5f814f20a01f Combined with patdh 0089 - Add a video decoder based on...
33133f24782c Ensure audio continues when screen is locked. Contributes to...
631eb9c9daaf Bug 1758948 [FFmpeg] Use AVFrame::pts instead of...
61617321d2dd Bug 1761471 [FFmpeg 5.0] Get frame color range and color space...
ac7f5ffbb2fa Bug 1750760 Update audio and video decoders to ffmpeg 5.0...
f5d4bf0de62e Bug 1750760 Open libavcodec.so.59 library and bind ffmpeg 5.0...
1e3291df5048 Bug 1750760 Create ffmpeg59 module for ffmpeg5.0 r=alwu
6d52cc7e9c32 Fix audio underruns for fullduplex mode. JB#55461
874b633f01f1 Add a video decoder based on gecko-camera. JB#56755
dbdf4a057c57 Fix video hardware accelaration not being used on first...
d7912ed1ceec Force use of mobile video controls. JB#55484 OMP#JOLLA-371
c4e35d97756e Force recycling of gmp-droid instances. JB#51730
e72aff45bbf1 Prioritize GMP plugins over all others, and support decoding...
f2f7a716ae1b Set embedlite.azpc.json.doubletap static pref to default to...
9859c4cc05f4 Make fullscreen enabling work as used to with pref ...
b2e11c4108c0 Set security.external_protocol_requires_permission to default...
379f31eef47a Fix content action integration to work. Fixes JB#51235
95da0e408f53 Use libcontentaction for custom scheme uri handling. JB#47892
a9bc82c90f8e Update hash for mapped_hyph
fad2d997cea7 [sailfishos][gecko] Add support for prefers-color-scheme
7caca291fbac [sailfishos][gecko] Allow LoginManagerPrompter to find its...
b67d3cf6a2d3 Convert panic into early return in Hyphenator
891bd6933db7 Get ContentFrameMessageManager via nsIDocShellTreeOwner...
941e612af842 Drop AudioPlayback messages if no embedder element is defined
36cb2567f3c2 [sailfishos][webrtc] Implement video capture module. JB#53982
ffc93a057267 [sailfishos][webrtc] Implement video capture module. JB#53982
2e65e420780a Enable GMP for encoding/decoding. JB#53982
9a481753fd3b Regenerate moz.build files.
c1b06c57549c Switch vp8_encoder_simulcast_proxy for encoder_simulcast_proxy
534ffad75341 Disable desktop sharing feature on SFOS. JB#53756
7bda060a012f [sailfishos][webrtc] Regenerate moz.build files. JB#53756
71ea0764f5ef [sailfishos][webrtc] Update GN build files for WebRTC. JB#53756
7ea10b57b465 Adapt build configuration for SailfishOS. JB#53756
1765dd0460a9 Restore GLScreenBuffer and TextureImageEGL
d3ba4df29a32 (temp) Restore NotifyDidPaint event and timers
f55057391ac0 Prevent errors from DownloadPrompter
These have reduced down by 10 to make 31 new commits to replace them. The new commits look like this:
24f9ec2df3d3 (HEAD) [sailfishos][embedlite][egl] Fix mesa egl display and...
7e5553ee4b22 [sailfishos][gecko] Fix `unstable-name-collisions` warning by...
b08a73b0d2f1 [sailfishos][gecko] Fix build failure due to rust-lang/rust PR...
3705ec69aafa [sailfishos][gecko] Ensure audio continues when screen is...
5cc32e715fac [sailfishos][gecko] Bug 1758948 [FFmpeg] Use AVFrame::pts...
370917e54b8e [sailfishos][gecko] Bug 1761471 [FFmpeg 5.0] Get frame color...
56e2ee73233f [sailfishos][gecko] Bug 1750760 Update audio and video decoders...
d4a651ccd7be [sailfishos][gecko] Bug 1750760 Open libavcodec.so.59 library...
65f0141c466a [sailfishos][gecko] Bug 1750760 Create ffmpeg59 module for...
b4c938ac9240 [sailfishos][gecko] Fix audio underruns for fullduplex mode...
37bd01779d83 [sailfishos][gecko] Add a video decoder based on gecko-camera...
4c4f0626452d [sailfishos][gecko] Fix video hardware accelaration not being...
570ef20869fd [sailfishos][gecko] Force use of mobile video controls...
7a46a215d649 [sailfishos][gecko] Force recycling of gmp-droid instances...
a5faa0c8aabe [sailfishos][gecko] Prioritize GMP plugins over all others, and...
22f531b9a58b [sailfishos][gecko] Make fullscreen enabling work as used to...
974010cd0fda [sailfishos][gecko] Fix content action integration to work...
ea2a8a393a1f [sailfishos][gecko] Update hash for mapped_hyph
a62d17ace5cb [sailfishos][gecko] Add support for prefers-color-scheme...
83477c046211 [sailfishos][gecko] Allow LoginManagerPrompter to find its...
e7209e460314 [sailfishos][gecko] Convert panic into early return in Hyphenator
d46228ac0fc2 [sailfishos][gecko] Get ContentFrameMessageManager via...
cd66aada3508 [sailfishos][gecko] Drop AudioPlayback messages if no embedder...
18a71b573840 [sailfishos][webrtc] Regenerate moz.build files. JB#53756
4e7332543828 [sailfishos][webrtc] Implement video capture module. JB#53982
1a290ea1db22 [sailfishos][gecko] Enable GMP for encoding/decoding. JB#53982
f10b5c2cf0cf [sailfishos][gecko] Disable desktop sharing feature on SFOS...
c961f8a04412 [sailfishos][webrtc] Update GN build files for WebRTC. JB#53756
d69d8be0fea4 [sailfishos][gecko] Adapt build configuration for SailfishOS...
36d30b7f7566 [sailfishos][gecko] Restore NotifyDidPaint event and timers
b1406101475c [sailfishos][gecko] Prevent errors from DownloadPrompter
In addition to these I was also able to squash together all of the GLScreenBuffer changes into just a couple of commits. The end result is a total of 78 commits, down from the original 126:
$ git log --oneline HEAD...78d17b06b04f | wc -l
78
That's pretty good going. That means there will be 78 patches in the rpm directory. This won't be all of them as I've intentionally skipped some patches that may yet need to be added. But this brings things to a pretty solid state and there's a reasonable chance that the total patch count at the end of all this will be less than 100.

Before I turn these into patches there's a consistency check I want to do. As I mentioned a couple of days back, this process of rewording, re-attributing, reordering and restructuring the commits should have left the final source tree in exactly the same state as before. Nothing should have changed apart from the git history.

But there's plenty of scope for something to have gone wrong. For example, deleting a line of the interactive rebase list can result in an entire commit being lost. If I did that accidentally, it'll affect the final result. So I need to figure out a way to perform a diff on the entire codebase: what I had previously vs. what I have now.

I don't really want to do a diff though, just a hash of the files should be sufficient. Here's what I've devised for this:
$ find . -type f -exec sha256sum {} \; | sha256sum -
Short but sweet. This will cycle through all files in the directory and any subdirectories and take the SHA256 checksum of each file. The list of these checksums is then piped into the same SHA256 algorithm to give a final checksum that represents all files and filenames. It's like a super-simple single-layer Merkle tree.

My plan is to run this on the current FIREFOX_ESR_91_9_X_RELBRANCH_sfos branch with all of my newly minted commits. I'll record the result and then switch to the FIREFOX_ESR_91_9_X_RELBRANCH_patches branch before running the command again. When this outputs a checksum I'll be able to compare the two in order to determine whether or not anything has changed. I've not cleaned the repository and I don't plan to, because although the checksum will pick up all of the untracked files as well, none of these should change when I switch branch, so these shouldn't affect whether or not the final results differ.

If even a single bit has changed between the two versions, it'll show up in the checksum. It'll be a miracle if both give the same result. Let's try it.
$ git checkout FIREFOX_ESR_91_9_X_RELBRANCH_sfos
Already on 'FIREFOX_ESR_91_9_X_RELBRANCH_sfos'
$ find . -type f -exec sha256sum {} \; | sha256sum -
2f8460ce00ef4304058421a444d1d5a08f58cc2f006cf009ee3fa6f7e5cc50b5  -
$ git checkout FIREFOX_ESR_91_9_X_RELBRANCH_patches
Switched to branch 'FIREFOX_ESR_91_9_X_RELBRANCH_patches'
Your branch is up-to-date with 'origin/FIREFOX_ESR_91_9_X_RELBRANCH_patches'.
$ find . -type f -exec sha256sum {} \; | sha256sum -
2f8460ce00ef4304058421a444d1d5a08f58cc2f006cf009ee3fa6f7e5cc50b5  -
And they're identical! Superb. That puts things in a good place. The next step will be to turn the 78 new commits into patches using git format-patch and commit the result to the gecko-dev repository in order to create a pull request.
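One caveat with the checksum command is that the order in which find lists files depends on the filesystem, so the digest is only comparable when the traversal order happens to match, as it did here within a single checkout. A minimal sketch of a variant that sorts the list first and skips any `.git` directory is shown below. The function name `tree_digest` and the scratch directories are hypothetical, and this isn't the command that produced the results above.

```shell
#!/bin/sh
# Sketch only: a variant of the checksum that sorts the per-file digests and
# skips any .git directory, so the result doesn't depend on traversal order.
set -e

tree_digest() {
    (
        cd "$1"
        find . -type f ! -path '*/.git/*' -exec sha256sum {} + \
            | LC_ALL=C sort \
            | sha256sum
    )
}

# Two scratch directories with the same contents, created in different orders.
a=$(mktemp -d)
b=$(mktemp -d)
mkdir "$a/src" "$b/src"
echo one > "$a/src/one.txt"
echo two > "$a/two.txt"
echo two > "$b/two.txt"
echo one > "$b/src/one.txt"

digest_a=$(tree_digest "$a")
digest_b=$(tree_digest "$b")

# Change a single file and hash again.
echo changed > "$b/two.txt"
digest_c=$(tree_digest "$b")

echo "a: $digest_a"
echo "b: $digest_b"
echo "c: $digest_c"
```

The first two digests should match and the third should differ, which is the property needed for comparing two branches.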

That's not a huge task, but it'll have to wait until tomorrow; I've done enough for today.

The comments and issues from the testing have come flooding in on the Sailfish OS Forum and that's great. What's more there's even been effort started to fix some of the remaining issues. This is really where I was hoping things would go and it bodes well for the future, so I'm very excited to see how everything progresses. I've not had a chance to join the discussion as I've just been concentrating on these patches for the last few days.

29 Aug 2024 : Day 334
I'm continuing the process of patch rationalisation today. Yesterday I got through 24% of the patches, so I'm expecting this task to run up until the weekend. For a bit of context on this, if I were working full-time on the project I'd expect to spend a day doing this, but because I'm dipping in and out, it's inevitably taking a bit longer.

Nevertheless I've made good progress today. The following are the patches I've worked through:
f55057391ac0 Prevent errors from DownloadPrompter
eab04b8c0d80 Enable dconf
c6ea49286566 Disable SessionStore functionality
e035d6ff3a78 Add embedlite static prefs
5ae7644719f8 Allow file scheme when loading OpenSearch providers
1fe7c56c7363 Supress URLQueryStrippingListService.jsm error
80fa227aee92 [sailfishos][gecko] Fix gfxPlatform::AsyncPanZoomEnabled...
b58093d23fb2 Add patch to fix 32-bit builds
ad1b74f33c3d Drop swap_buffers_with_damage extension support...
249be020a9d3 Update EglDisplay pathway
cef201e30159 Make PresShell::SetIsActive() public
009c0fc0134c Avoid crash on theme change
49148b0b591a [PATCH] [sailfishos][egl] Do not create CreateFallbackSurface...
9882c3c821a9 Create EmbedLiteCompositorBridgeParent in...
78c9c07e4ffe Update GetBoostrap return value
f9505b8e3cc8 Split namespace into two blocks
bdb5a929ce64 Introduce EmbedInitGlue to the mozglue. JB#50788
2f6cdae7162d Fixup for f4c2a96b3363e7e64bd728126b5a1d9c720dea7b
aaed7ec58c1d Revert "Bug 1333826 - Remove a few references from .mk files..."
7ec6ebc30d76 Revert "Bug 1333826 - Remove the make-sdk build target, the..."
bf9598fdc451 Revert "Bug 1333826 - Remove SDK_FILES, SDK_LIBRARY, and..."
5f2f418800dc Revert "Bug 1427455 - Remove unused variables from..."
ab0a86414349 Revert "Bug 445128 - Stop putting the version number in the..."
a3dcc0d45a5a Force to build mozglue and xpcomglue static libraries
f05364dbad53 Allow gen_last_modified.py to complete
a0206e85d4ec Add support for aarch64 to elfhack. JB#57563
1257cda3f006 Restore nsAppShell.h
bb66c816d2e5 Revert "Bug 1567888 - remove unneeded QT-related rules and..."
4f87cfb7621c Revert "Remove reference to moc_nsAppShell.cpp"
efab5ee79c4d Revert "Remove build reference to moc_nsNativeAppSupportQt.cpp"
3733c9668313 Add Qt nsNativeBasicThemeQt class
b8069777e723 Fix nsLookAndFeel code
c31b0b87e992 Add missing GfxInfo methods
2fd26a23b173 Add gfxQtPlatform to method implementations
a7d148ce50d1 Add dom/system/qt directory to build
42c6aeccffab Fixup dbddd968a5eb Update nsLookAndFeel
cccc969f3668 Remove reference to moc_nsAppShell.cpp
52f8c2eb9c70 Fixup e31ed7aaa670 Update Qt implementaiton of nsPrintSettings
f8b62e39b45e Add missing methos to GfxInfo
e31ed7aaa670 Update Qt implementaiton of nsPrintSettings
dbddd968a5eb Update nsLookAndFeel
cea87136481f Rename MediaControlKeysEventSource to MediaControlKeySource
ff3f6467a771 Remove nsPrinterEnumeratorQt
87f95304afe3 Update ProcInfo
4626f98315b8 Remove build reference to moc_nsNativeAppSupportQt.cpp
24181a26e795 Fixup for 38be5c5c7302 [PATCH] Revert "Bug 1611386"
5dd03fbeaba4 Fix embedlite building. JB#50505
cea66a6b7aa0 Revert "Bug 1494175 - Remove unimplemented nsIWebBrowserChrome..."
51e18eb9b16e Restore GLContextProviderEGL::CreateWrappingExisting()
2642f0d8d1bc Remove NS_LITERAL_CSTRING usage
b7f6b44b8074 Revert "Bug 1706051 - Remove some IPC messages that are unused..."
7a9fa03f27b3 Hackish fix for preferences usage in Parent process (part 1)
f536dbf9f6f8 Switch GLScreenBuffer for SwapChain
48bbc655f6e3 Revert "Bug 1676576 - Remove unused functions of..."
0acadeba1ac3 Allow compositor specializations to override the composite...
6889dbfaabe5 Make it possible to extend CompositorBridgeParent
5a88215c32d6 Work around upstream membarrier changes
I've consolidated these 57 patches down into the following 36 patches:
c1b4d42cc6bd [sailfishos][gecko] Prevent errors from DownloadPrompter
e5d73eba9939 [sailfishos][gecko] Enable dconf
3c091af6fb4c [sailfishos][gecko] Disable SessionStore functionality
69cc843f6b5d [sailfishos][gecko] Add embedlite static prefs
76df2956569a [sailfishos][gecko] Allow file scheme when loading OpenSearch...
2a397ab1febd [sailfishos][gecko] Supress URLQueryStrippingListService.jsm error
be2dd5f7f549 [sailfishos][gecko] Fix gfxPlatform::AsyncPanZoomEnabled for...
6e1b47607843 [sailfishos][gecko] Add patch to fix 32-bit builds
f0199909b71f [sailfishos][egl] Drop swap_buffers_with_damage extension...
7ea1768c2aeb [sailfishos][gecko] Make PresShell::SetIsActive() public
d05693baba5f [sailfishos][egl] Do not create CreateFallbackSurface...
f6c1cf32effe [sailfishos][gecko] Create EmbedLiteCompositorBridgeParent in...
ce922a93b50a [sailfishos][gecko] Split namespace into two blocks
881758877c5d [sailfishos][gecko] Introduce EmbedInitGlue to the mozglue...
9e9430751d8b [sailfishos][gecko] Revert "Bug 1333826 - Remove a few..."
73ffb3a64c50 [sailfishos][gecko] Revert "Bug 1333826 - Remove the make-sdk..."
bffc049cab51 [sailfishos][gecko] Revert "Bug 1333826 - Remove SDK_FILES,..."
d27da57a5840 [sailfishos][gecko] Revert "Bug 1427455 - Remove unused..."
a1902a2f4a8b [sailfishos][gecko] Revert "Bug 445128 - Stop putting the..."
f67aeb2da728 [sailfishos][gecko] Force to build mozglue and xpcomglue static...
097ce48f8019 [sailfishos][gecko] Allow gen_last_modified.py to complete
73cb3c320c43 [sailfishos][gecko] Add support for aarch64 to elfhack. JB#57563
e184424c5a1d [sailfishos][gecko] Restore nsAppShell.h
d9d4edaae30a [sailfishos][gecko] Revert "Bug 1567888 - remove unneeded..."
aa5bf20f2893 [sailfishos][gecko] Update ProcInfo
1b7857d792bd [sailfishos][gecko] Fix embedlite building. JB#50505
4823c745e163 [sailfishos][gecko] Revert "Bug 1494175 - Remove unimplemented..."
e548b4bb2a87 [sailfishos][gecko] Remove NS_LITERAL_CSTRING usage
1106e80173ea [sailfishos][gecko] Revert "Bug 1706051 - Remove some IPC..."
4a89fad38f8c [sailfishos][gecko] Hackish fix for preferences usage in Parent...
9d5a1fc090d4 Update EglDisplay pathway
70af371328eb Restore GLContextProviderEGL::CreateWrappingExisting()
12cae7270918 Switch GLScreenBuffer for SwapChain
cba3155a76e3 [sailfishos][gecko] Revert "Bug 1676576 - Remove unused..."
5c198b5f1ae2 [sailfishos][gecko] Allow compositor specializations to...
0dd9598d9874 [sailfishos][gecko] Work around upstream membarrier changes
Apart from merging together patches that are related I've also ensured existing patches that carried over from ESR 78 have had their details (title, description, author, date) transferred over in full.

You'll notice there are three patches that don't have a topic prefix of [sailfishos][gecko] added. That's because these three patches relate to the rendering pipeline. I'm planning to combine these together, but want to first batch them up so I can clearly see how many there are and what files they touch. With any luck, combining these should offer a nice opportunity to tidy things up as well.

I've not checked, but the plan with all of these changes is to ensure the final repository contents stay identical to what we have now. Moving, merging, rewording and re-attributing commits shouldn't have any impact on the final contents. The commit hashes will change, partly due to changes in the descriptions (which are included in the hash) and partly as a result of consolidating patches. But it's the final result that needs to stay the same so that should all be fine.
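The distinction between commit hashes and final contents can be made concrete, because git records a separate tree hash for the contents of each commit. The sketch below is an illustration in a scratch repository with hypothetical branch names `messy` and `tidy`: two branches reach the same contents through different histories, so their commit hashes differ while their tree hashes match. It only covers tracked files.

```shell
#!/bin/sh
# Sketch only: scratch repository; the branch and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo base > file
git add file
git commit -q -m "Base"
base=$(git rev-parse HEAD)

# Branch "messy": the change arrives as two small commits.
git checkout -q -b messy
echo one >> file
git commit -q -a -m "wip 1"
echo two >> file
git commit -q -a -m "wip 2"

# Branch "tidy": the same end result in a single reworded commit.
git checkout -q -b tidy "$base"
printf 'one\ntwo\n' >> file
git commit -q -a -m "[sailfishos][gecko] Tidy change"

messy_commit=$(git rev-parse messy)
tidy_commit=$(git rev-parse tidy)
messy_tree=$(git rev-parse 'messy^{tree}')
tidy_tree=$(git rev-parse 'tidy^{tree}')
echo "commits: $messy_commit vs $tidy_commit"
echo "trees:   $messy_tree vs $tidy_tree"
```

Running `git diff --quiet messy tidy` gives the same answer through its exit status: zero when the two trees are identical.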

In summary, I've now worked through 86 of the original 126 commits. That's 68% of them. There are now just 92 patches in total which is already lower than the 98 patches needed to build ESR 78 and it's likely to go down further. The fewer the patches the easier the browser will be to maintain in the future, so this is a good sign.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
28 Aug 2024 : Day 333 #
I'm really happy with the response I got from yesterday's "release" of the browser packages for testing. It's great to see users taking the trouble to install, test and post comments or issues about it. It's important to get all of these problems recorded, even if it's going to take me a while to work through them all. Thank you to everyone who's provided feedback.

Quite a few comments noted that I've only made packages available for aarch64 devices. That is the case I'm afraid and, right now, I'm not in an immediate position to release arm32 packages. I've not tested the build process for arm32 recently and I expect it to require a bit of work. I know how patient everyone has been and I know this will be disappointing to hear for some of you. I can only apologise and emphasise that I would dearly love to have the time to get this done as well. I will do, but I can't quite yet.

There have also been some unsuccessful attempts to run the binaries on native devices such as the PinePhone. It's a shame to hear it's not working there, but admittedly not a surprise. Native builds have slightly different rendering requirements and I don't have a device to test on, so this part of the update was performed in the dark. Once other things have been ironed out I'll return to this.

Finally, a quick word about timelines. At some point in the near future it's no longer going to make sense for me to continue writing these daily diary entries. I expect that to happen once all of the patches have been organised and I've finalised pull requests into the Jolla repositories. Although that's not far off, it looks like it may go beyond this weekend.

Next week I'll be attending a conference so will have to pause gecko development while I'm away. Given current progress I won't have everything done by then, so I'll have seven days of hiatus, but will return to daily blogging after that, on the 9th September.

With all of the responses to the release today I have lots still to work on. But today I'm moving into "tidying-up" mode. What does that mean? Well, up until now for the browser development I've been adding all of my changes as commits to a long-lived branch called FIREFOX_ESR_91_9_X_RELBRANCH_patches in my own mirror of the upstream gecko repository. When I say "upstream" I actually mean Mozilla upstream, via the Sailfish OS mirror of the repository.

However, we don't want to have a completely new fork of the gecko engine for Sailfish OS. If we did it'd make it much harder to benefit from the ongoing development of gecko by Mozilla. Instead Jolla completely mirrors Mozilla's repository and applies all of its changes on top of the upstream code as patches.

Currently my changes are commits, not patches. So the "tidying-up" process is going to involve me converting them into patches.

There's a git command to do this: git format-patch. If it were as simple as just running this command I'd be laughing. But in practice I've not always been conscious of the need to create patches when committing changes. That was intentional: getting gecko working has been the priority. But it means there are far too many commits, many of which can and should be combined in order to turn them into a smaller set of patches.

Let's make some comparisons. The ESR 78 build of Gecko had 98 patches. We can see this by looking inside the rpm directory. The patches are numbered sequentially which makes them really easy to count. Here are the patches from ESR 78:
$ push gecko-dev-project/gecko-dev/gecko-dev
$ ls -1 *.patch | wc -l
98
$ pop
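Because the patch files carry a numeric prefix, a gap in the sequence is also easy to spot. Here's a minimal sketch of such a check, assuming file names of the form 0001-some-title.patch (the shape git format-patch produces). The function name check_patch_sequence is hypothetical, not part of any existing tooling:

```shell
#!/bin/sh
# Count the patches in a directory and flag any gap in their numeric
# prefixes. Assumes names such as 0001-some-title.patch.
# check_patch_sequence is a hypothetical helper.
check_patch_sequence() {
    dir="$1"
    expected=1
    count=0
    result=0
    for f in "$dir"/[0-9]*.patch; do
        [ -e "$f" ] || break          # glob matched nothing
        name=${f##*/}
        prefix=${name%%-*}
        case "$prefix" in
            *[!0-9]*) echo "skipping $name"; continue ;;
        esac
        # Strip leading zeros so the prefix isn't treated as octal
        number=$(echo "$prefix" | sed 's/^0*//')
        number=${number:-0}
        if [ "$number" -ne "$expected" ]; then
            echo "unexpected prefix $prefix (expected $expected)"
            result=1
        fi
        expected=$((number + 1))
        count=$((count + 1))
    done
    echo "$count patches"
    return "$result"
}
```

Called as check_patch_sequence . from inside the patch directory, it prints the total and returns a non-zero status if a number is skipped.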
Let's compare that with the number of commits I've made on the FIREFOX_ESR_91_9_X_RELBRANCH_patches since I branched it from the upstream FIREFOX_ESR_91_9_X_RELBRANCH branch:
$ push gecko-dev-esr91/gecko-dev/gecko-dev
$ git log --oneline HEAD...$(git log -1 \
    --pretty="format:%H" FIREFOX_ESR_91_9_X_RELBRANCH) | wc -l
126
$ pop
So that's 126 commits compared to 98 patches. The smaller the number the better and these two numbers are closer than I had feared. It's not necessary to get back to exactly the same number of patches. But I do think it should be close and, realistically, it ought to be less than 100.

Ideally with patches you'd avoid having one patch overlay a change made by an earlier patch. Not only is any such overlay strictly unnecessary, it also makes patches harder to deal with because it means they can't be reordered so easily. There are situations where it can make sense, for example for the sake of ensuring patches capture atomic changes. But this is something I also need to figure out: how to combine the commits in a way that minimises overlap.

Finally I also want to try to align my commits with the ESR 78 patches. For example, in many cases I was able to apply a patch from ESR 78 unmodified. In these cases the original patch and author should be retained. Even in cases where I had to make substantial manual changes, it's still important to keep the original author and description.

Most of this is administration, but important administration nonetheless.

So how to proceed? My plan is to make a separate branch FIREFOX_ESR_91_9_X_RELBRANCH_sfos that starts as a copy of the branch I've been doing all my work on. I can then safely rebase the new branch starting from when it diverged from FIREFOX_ESR_91_9_X_RELBRANCH. I'll do my best to combine patches that need combining, update their descriptions and authors to match the ESR 78 patches and finally reorder patches to align with ESR 78 if that turns out to be straightforward.

I'll be left with a branch that contains all of the new consolidated commits. The HEAD of the branch should contain exactly the same content as the HEAD of my current development branch. Only the commits in between will have changed. Once I have that I can then use git format-patch to generate all of the patches from it.
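The claim that both branch heads hold the same content can be checked mechanically by comparing the tree objects the two commits point to. Here's a sketch of how that might look. The helper name same_tree is hypothetical, and it needs to be run from inside the repository:

```shell
#!/bin/sh
# Succeed when two revisions point at identical trees, meaning their
# file contents match exactly even though their histories differ.
# Returns 2 if either revision can't be resolved.
# same_tree is a hypothetical helper.
same_tree() {
    first=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
    second=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
    [ "$first" = "$second" ]
}
```

For example, same_tree FIREFOX_ESR_91_9_X_RELBRANCH_patches FIREFOX_ESR_91_9_X_RELBRANCH_sfos && echo identical would confirm the rebase hasn't changed the end result.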

I'll then create a new branch in the gecko-dev repository (which, take care, is different from the gecko-dev-mirror repository!), copy all of the patches into it and then check everything is good by performing a full, clean rebuild including the prepare step.

That sounds like a plan. Let's see how I get on.

First up I'll create a new branch:
$ git checkout FIREFOX_ESR_91_9_X_RELBRANCH_patches
Already on 'FIREFOX_ESR_91_9_X_RELBRANCH_patches'
Your branch is up-to-date with 'origin/FIREFOX_ESR_91_9_X_RELBRANCH_patches'.
$ git checkout -b FIREFOX_ESR_91_9_X_RELBRANCH_sfos
Switched to a new branch 'FIREFOX_ESR_91_9_X_RELBRANCH_sfos'
Now I need to start editing the commits, but in a way that doesn't change the final result. The key commit is 78d17b06b04f since that's the most recent commit before I started making changes. It means I can easily list all of the relevant commits using the following command:
$ git log --oneline HEAD...78d17b06b04f~
I'm not going to list all 126 of the new commits here, but since I'll be starting at the oldest commits and working forwards, it makes sense for me to list some of the oldest:
6889dbfaabe5 Make it possible to extend CompositorBridgeParent
5a88215c32d6 Work around upstream membarrier changes
8a6d40752f22 Ensure PGIOChannel methods are included in Necko
0ca273a6a639 Rename nsIdleService to nsUserIdleService
44eb2761c0b7 Backport Embed MessageLoop contructor back (sha1 eb2dcea271970)
9ac31b7d5097 Disable MOC code generation for message_pump_qt
8c36088e5b72 Build mSize into fontlist::Face::InitData
f83d943479b4 Patch glslopt to build on arm
f64b6cf84dc6 Reduce optimisation when building swgl
dc0c1d441db8 Drop debuginfo for Rust components
41f1be11db66 Fix font method signatures
073ba69f3fe4 Use QRegion::united()
15f3e54524f9 Remove NS_LITERAL_CSTRING macro usage
f7abb2cf1c66 Remove nsDataHashtable.h include
c77bb39fd6bc Ensure QWidget header can be found
6729baec49bc Register GfxInfo service with components.conf
2ab356e4e31a [PATCH] [sailfishos][components] Cleanup static components...
9d28582c540a [PATCH] [sailfishos][ipc] Whitelist sync messages of EmbedLite...
ff27901dc3ed [PATCH] [sailfishos][compositor] Fix GLContextProvider defines
2da747df9cc7 Ensure PGIOChannel is correctly exposed
9bcea9c706cc Update lib.rs checksum
aa9461129437 Revert cbindgen dependency version workaround
20d07143125c [PATCH] [sailfishos][qt] Provide checkbox/radio renderer for...
cf33677bb09a [PATCH] [gecko][configure] Read rustc host from environment...
1e70489f8c76 Work around build version requirements
6cf2926be653 Adjust NSPR version
38be5c5c7302 [PATCH] Revert "Bug 1611386 - Drop support for..."
57b3e5b766e8 Add --enable-dconf option
1c02a359c368 Add --with-embedlite flag
f4c2a96b3363 [PATCH] [sailfishos][qt] Bring back Qt layer. JB#50505
4a17000a7b9b [PATCH] [sailfishos][gecko] Add symlink to embedlite. JB#52893
78d17b06b04f (upstream/FIREFOX_ESR_91_9_X_RELBRANCH) Bug 1770137 - Part 2...
There are several changes I'll need to make to these:
  1. Remove the [PATCH] prefix from the title. These are added by git when the patch is applied so will need to be removed again.
  2. Add the [sailfishos][gecko] prefix to the commit titles. This is standard for Sailfish OS and allows the changelog to be generated more easily.
  3. Update the commit author. Since I committed these myself they'll almost all have my name attached, but where these have come from the patches, I need to restore the original author details.
  4. Update the commit date. Similarly for commits that have come from existing patches I'll need to adjust the date to match the patch they came from.
  5. Squash patches that are semantically related or touch the same part of the code.
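The first two of these steps are mechanical enough to sketch in shell. The following is illustrative only: it handles the simple cases of stripping [PATCH] and adding the default prefix where no [sailfishos] prefix exists, and the function name normalise_title is hypothetical. Titles that need a different topic prefix would still have to be reworded by hand.

```shell
#!/bin/sh
# Normalise a commit title: drop a leading "[PATCH] " and add the
# "[sailfishos][gecko] " prefix unless the title already starts with
# "[sailfishos]". normalise_title is a hypothetical helper.
normalise_title() {
    title="$1"
    title=${title#"[PATCH] "}
    case "$title" in
        "[sailfishos]"*) ;;                      # keep the existing topic prefix
        *) title="[sailfishos][gecko] $title" ;;
    esac
    printf '%s\n' "$title"
}
```

So normalise_title '[PATCH] [sailfishos][qt] Bring back Qt layer. JB#50505' drops only the [PATCH] part, while an unprefixed title gains [sailfishos][gecko].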
The key tool I'll be using for all of this is git rebase:
$ git rebase -i 78d17b06b04f
This will allow me to reorder the commits, mark them with squash in case I need to merge several together, or mark them with edit to update authors and dates. When performing the latter, the details can be updated using the following commands (with suitable adjustments for the author and date):
$ git commit --amend \
    --author="Raine Makelainen <raine.makelainen@jolla.com>"  \
    --date="Tue, 26 Jan 2021 14:13:31 +0200"
$ git rebase --continue
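The author and date values can be read straight out of the header of the original patch file, since files written by git format-patch start with From:, Date: and Subject: lines. Here's a sketch that pulls them out and prints the matching amend command. Both function names are hypothetical, and names containing non-ASCII characters may be stored in encoded form in the header, which this doesn't decode:

```shell
#!/bin/sh
# Print the value of a header field (e.g. From or Date) from a patch
# file. Only the header block, up to the first blank line, is read.
# patch_header and amend_command are hypothetical helpers.
patch_header() {
    sed -n -e '/^$/q' -e "s/^$2: //p" "$1" | head -n 1
}

# Print (but don't run) the amend command matching a patch's header
amend_command() {
    author=$(patch_header "$1" From)
    date=$(patch_header "$1" Date)
    printf 'git commit --amend --author="%s" --date="%s"\n' "$author" "$date"
}
```

Printing the command rather than running it leaves a chance to check the details before amending.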
Using various combinations of these commands this evening I've been able to work through the oldest 30 commits listed above and convert them into the following 15 commits:
3035e931bc43 Make it possible to extend CompositorBridgeParent
52a4830c18cc [sailfishos][gecko] Work around upstream membarrier changes
0822548619a4 [sailfishos][gecko] Backport Embed MessageLoop contructor back...
b4d09871056a [sailfishos][gecko] Disable MOC code generation for message_pump_qt
ecee0cc29d6e [sailfishos][gecko] Patch glslopt to build on arm
50b47afc55aa [sailfishos][gecko] Reduce Rust build requirements
2a9f943a2b71 [sailfishos][components] Cleanup static components definitions...
87d833da8e65 [sailfishos][ipc] Whitelist sync messages of EmbedLite. JB#50505
980fb1c93b47 [sailfishos][compositor] Fix GLContextProvider defines
31792345c721 [sailfishos][qt] Provide checkbox/radio renderer for Sailfish OS...
4db3eb103c70 [sailfishos][gecko] Read rustc host from environment. JB#53019...
762e87861ca5 [sailfishos][gecko] Fix build version requirements
e74cf5140577 [sailfishos][gecko] Revert "Bug 1611386 - Drop support for..."
6bb1c608d070 [sailfishos][gecko] Fix embedlite building. JB#50505
26d106c8ba10 [sailfishos][qt] Bring back Qt layer. JB#50505
d9e39b342094 [sailfishos][gecko] Add symlink to embedlite. JB#52893
78d17b06b04f (upstream/FIREFOX_ESR_91_9_X_RELBRANCH) Bug 1770137 - Part 2...
That means that the original 126 commits are now down to 112 commits as a result of me merging commits together. It also means I now only have another 96 commits to go! But that's all I have time to work through today. It's a slow process, but all I need to do is continue working through the commits methodically.

27 Aug 2024 : Day 332 #
It looks like I'm beginning to wrap things up. Don't get me wrong, I'm not under any illusion that Sailfish OS browser development will ever be "done", but the objective here is just to get ESR 91 into a usable state. Then I'm hoping Jolla can perform its magic to get from that into a production-ready state.

I still need to arrange the patches and create pull requests, but otherwise the work done over the last few days, including the help I've received from Raine, Frajo and mal at Jolla, means that things have progressed well.

Where does this leave us today? The browser is now working nicely on Sailfish OS 4.6. Actually, I'd go so far as to say I'm really pleased with the result. It feels responsive and usable on the majority of sites I've tested. Under the light load I've subjected them to, the browser and WebView have both been pretty stable. By no means all perfect, but usable.

So I've decided to take the plunge and install it on my daily device. This is no small step for me, because for better or worse I use my phone a lot and the browser has become an essential feature. If I break my phone I'm going to regret it.

Installing it on my daily phone will allow me to write-up the steps needed to install the packages, alongside the steps needed to revert a phone back to its previous state. It'll also be a convenient way to test its functionality.

Several people have also asked to try it out so I'll also be releasing the packages and instructions. It could certainly benefit from the testing, but this will only be for the brave and/or those with a second "disposable" device.

First up I'm going to create a tarball containing all of the packages. I'm just going to include the release packages at this stage to keep things small and tidy.
$ tar -czf \
    gecko-dev-esr91-release.tar.bz2 \
    xulrunner-qt5-91.9.1-1.aarch64.rpm \
    xulrunner-qt5-misc-91.9.1-1.aarch64.rpm \
    qtmozembed-qt5-1.53.9-1.aarch64.rpm \
    sailfish-components-webview-qt5-1.5.21-1.aarch64.rpm \
    sailfish-components-webview-qt5-pickers-1.5.21-1.aarch64.rpm \
    sailfish-components-webview-qt5-popups-1.5.21-1.aarch64.rpm \
    embedlite-components-qt5-1.23.0-1.aarch64.rpm \
    sailfish-browser-2.2.45-1.aarch64.rpm \
    sailfish-browser-settings-2.2.45-1.aarch64.rpm \
    mapplauncherd-booster-browser-0.2.1-1.aarch64.rpm
This produces a neat 37 MiB compressed tarball. Some people might want all of the packages, including the debug symbols, so I'll make a tarball available with everything in as well:
$ tar -czf \
    gecko-dev-esr91-debug.tar.bz2 \
    xulrunner-qt5-91.9.1-1.aarch64.rpm \
    xulrunner-qt5-debuginfo-91.9.1-1.aarch64.rpm \
    xulrunner-qt5-debugsource-91.9.1-1.aarch64.rpm \
    xulrunner-qt5-devel-91.9.1-1.aarch64.rpm \
    xulrunner-qt5-misc-91.9.1-1.aarch64.rpm \
    qtmozembed-qt5-1.53.9-1.aarch64.rpm \
    qtmozembed-qt5-debuginfo-1.53.9-1.aarch64.rpm \
    qtmozembed-qt5-debugsource-1.53.9-1.aarch64.rpm \
    qtmozembed-qt5-devel-1.53.9-1.aarch64.rpm \
    qtmozembed-qt5-tests-1.53.9-1.aarch64.rpm \
    sailfish-components-webview-qt5-1.5.21-1.aarch64.rpm \
    sailfish-components-webview-qt5-debuginfo-1.5.21-1.aarch64.rpm \
    sailfish-components-webview-qt5-debugsource-1.5.21-1.aarch64.rpm \
    sailfish-components-webview-qt5-devel-1.5.21-1.aarch64.rpm \
    sailfish-components-webview-qt5-doc-1.5.21-1.aarch64.rpm \
    sailfish-components-webview-qt5-examples-1.5.21-1.aarch64.rpm \
    sailfish-components-webview-qt5-pickers-1.5.21-1.aarch64.rpm \
    sailfish-components-webview-qt5-popups-1.5.21-1.aarch64.rpm \
    sailfish-components-webview-qt5-tests-1.5.21-1.aarch64.rpm \
    sailfish-components-webview-qt5-ts-devel-1.5.21-1.aarch64.rpm \
    embedlite-components-qt5-1.23.0-1.aarch64.rpm \
    embedlite-components-qt5-debuginfo-1.23.0-1.aarch64.rpm \
    embedlite-components-qt5-debugsource-1.23.0-1.aarch64.rpm \
    sailfish-browser-2.2.45-1.aarch64.rpm \
    sailfish-browser-debuginfo-2.2.45-1.aarch64.rpm \
    sailfish-browser-debugsource-2.2.45-1.aarch64.rpm \
    sailfish-browser-settings-2.2.45-1.aarch64.rpm \
    sailfish-browser-tests-2.2.45-1.aarch64.rpm \
    sailfish-browser-ts-devel-2.2.45-1.aarch64.rpm \
    mapplauncherd-booster-browser-0.2.1-1.aarch64.rpm \
    mapplauncherd-booster-browser-debuginfo-0.2.1-1.aarch64.rpm \
    mapplauncherd-booster-browser-debugsource-0.2.1-1.aarch64.rpm
This debug tarball is quite a bit larger at 670 MiB, but that's okay because I don't expect most people will want or need anything other than the release tarball.
$ ls -lh gecko-dev-esr91*.tar.bz2
-rw-rw-r-- 1 flypig flypig 670M Aug 23 16:54 gecko-dev-esr91-debug.tar.bz2
-rw-rw-r-- 1 flypig flypig  37M Aug 23 16:49 gecko-dev-esr91-release.tar.bz2
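One detail worth noting: the tar commands above use -z, which selects gzip compression whatever extension the output file has (-j would select bzip2). GNU tar detects the compression automatically when reading, so unpacking with tar -xvf works either way. The format a tarball really uses can be checked from its first few bytes. Here's a minimal sketch, with the hypothetical helper name archive_format:

```shell
#!/bin/sh
# Print the compression format of a file by inspecting its leading
# magic bytes: gzip starts 1f 8b, bzip2 starts with ASCII "BZh".
# archive_format is a hypothetical helper.
archive_format() {
    magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b*)  echo gzip ;;
        425a68) echo bzip2 ;;
        *)      echo unknown ;;
    esac
}
```

Running archive_format gecko-dev-esr91-release.tar.bz2 would then report the format actually used inside the file.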
Alright, time to test it out. I've uploaded the tarballs to my website which means I can download them to my phone using curl:
$ curl -O https://www.flypig.co.uk/dnload/dnload/sailfishos/gecko/gecko-dev-esr91-release.tar.bz2
Next it needs unpacking:
$ tar -xvf gecko-dev-esr91-release.tar.bz2
embedlite-components-qt5-1.23.0-1.aarch64.rpm
mapplauncherd-booster-browser-0.2.1-1.aarch64.rpm
qtmozembed-qt5-1.53.9-1.aarch64.rpm
sailfish-browser-2.2.45-1.aarch64.rpm
sailfish-browser-settings-2.2.45-1.aarch64.rpm
sailfish-components-webview-qt5-1.5.21-1.aarch64.rpm
sailfish-components-webview-qt5-pickers-1.5.21-1.aarch64.rpm
sailfish-components-webview-qt5-popups-1.5.21-1.aarch64.rpm
xulrunner-qt5-91.9.1-1.aarch64.rpm
xulrunner-qt5-misc-91.9.1-1.aarch64.rpm
Before moving on I need to close down the browser if it's running and any apps that might be using a WebView. The next few steps are going to be destructive, in that they'll mess up some of the ESR 78 configuration settings. It's therefore crucial that I make a copy of my ESR 78 configuration before the upgrade, so that I can restore it later if I need to switch back.
$ cp -r ~/.local/share/org.sailfishos/browser/ ~/.local/share/org.sailfishos/browser.esr78.bak/
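Since this backup is the only route back to a working ESR 78 profile, it's worth guarding against overwriting it by accident, for example by running the copy a second time after ESR 91 has already changed the profile. Here's a sketch of a more defensive copy. The function name backup_profile is hypothetical and the paths are supplied by the caller:

```shell
#!/bin/sh
# Copy a profile directory to a backup location, refusing to touch a
# backup that already exists. backup_profile is a hypothetical helper.
backup_profile() {
    src="$1"
    dst="$2"
    if [ ! -d "$src" ]; then
        echo "no profile at $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "backup already exists at $dst" >&2
        return 1
    fi
    # -a copies recursively and preserves permissions and timestamps
    cp -a "$src" "$dst"
}
```

It would be called with the same two paths as the cp command above, without the trailing slashes.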
Next I need to clean a few things up, in particular by removing various configuration files that I want the ESR 91 code to regenerate.
$ rm ~/.local/share/org.sailfishos/browser/.mozilla/ua-update.json
$ rm -rf ~/.local/share/org.sailfishos/browser/.mozilla/startupCache
$ rm -rf ~/.local/share/org.sailfishos/browser/.mozilla/prefs.js
$ rm -rf ~/.local/share/org.sailfishos/browser/__PREFS_WRITTEN__
In the final version these steps are likely to be performed by a oneshot script, but creating that is a future task. Now I'm ready to install the packages. This will require root privileges, so I've prefixed the installation command with devel-su:
$ devel-su rpm -U --force \
    xulrunner-qt5-91.*.rpm \
    xulrunner-qt5-misc-91.*.rpm \
    qtmozembed-qt5-1.*.rpm \
    sailfish-components-webview-qt5-1.*.rpm \
    sailfish-components-webview-qt5-pickers-1.*.rpm \
    sailfish-components-webview-qt5-popups-1.*.rpm \
    embedlite-components-qt5-1.*.rpm \
    sailfish-browser-2.*.rpm \
    sailfish-browser-settings-2.*.rpm \
    mapplauncherd-booster-browser-0.*.rpm
Finally we need to kill the browser booster to avoid the old libraries lingering in runtime memory:
$ killall booster-browser
I originally thought this would be enough to get things working, but it turns out this breaks the email app as it's unable to find the libxul.so library. As I discussed yesterday, this is due to the jolla-email binary having an rpath that points to the ESR 78 directory. The simplest way I've found to work around this is to create a symlink to the new directory at the location of the previous one:
$ devel-su ln -s /usr/lib64/xulrunner-qt5-91.9.1/ /usr/lib64/xulrunner-qt5-78.15.1
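A slightly more cautious version of this step would only create the link when nothing real lives at the old location. Here's a sketch under that assumption. The function name link_compat is hypothetical, and on the phone it would need to be run as root:

```shell
#!/bin/sh
# Point an old library path at a new directory, but only when the old
# path is absent or is already a symlink, so a real directory is never
# replaced. link_compat is a hypothetical helper.
link_compat() {
    new="$1"
    old="$2"
    if [ -e "$old" ] && [ ! -L "$old" ]; then
        echo "$old exists and is not a symlink" >&2
        return 1
    fi
    # -f replaces an existing link, -n stops ln following it
    ln -sfn "$new" "$old"
}
```

Because an existing link is simply replaced, running it twice is harmless.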
And with that the browser should be ready to test! It works for me either from the command line, or directly by selecting the icon in the app drawer.

In case that gives a bad result, I also need to figure out the right way to revert back to ESR 78. I'll come on to that, but first, here's a summary of the steps I've taken to complete the installation of ESR 91:
  1. Download the tarball to your phone.
    $ curl -O https://www.flypig.co.uk/dnload/dnload/sailfishos/gecko/gecko-dev-esr91-release.tar.bz2
    
  2. Unpack the tarball:
    $ tar -xvf gecko-dev-esr91-release.tar.bz2
    
  3. Ensure the browser and any apps using the WebView are closed.
  4. Make a copy of your browser profile:
    $ cp -r ~/.local/share/org.sailfishos/browser/ ~/.local/share/org.sailfishos/browser.esr78.bak/
    
  5. Remove settings that need to be updated for ESR 91:
    $ rm ~/.local/share/org.sailfishos/browser/.mozilla/ua-update.json
    $ rm -rf ~/.local/share/org.sailfishos/browser/.mozilla/startupCache
    $ rm -rf ~/.local/share/org.sailfishos/browser/.mozilla/prefs.js
    $ rm -rf ~/.local/share/org.sailfishos/browser/__PREFS_WRITTEN__
    
  6. Install the ESR 91 packages:
    $ devel-su rpm -U --force \
        xulrunner-qt5-91.*.rpm \
        xulrunner-qt5-misc-91.*.rpm \
        qtmozembed-qt5-1.*.rpm \
        sailfish-components-webview-qt5-1.*.rpm \
        sailfish-components-webview-qt5-pickers-1.*.rpm \
        sailfish-components-webview-qt5-popups-1.*.rpm \
        embedlite-components-qt5-1.*.rpm \
        sailfish-browser-2.*.rpm \
        sailfish-browser-settings-2.*.rpm \
        mapplauncherd-booster-browser-0.*.rpm
    
  7. Symlink the old ESR 78 library directory to the new location:
    $ devel-su ln -s /usr/lib64/xulrunner-qt5-91.9.1/ /usr/lib64/xulrunner-qt5-78.15.1
    
  8. Finally kill the browser booster:
    $ killall booster-browser
    
  9. You're good to go!
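Since step 6 uses --force, a partial install is a real risk if one of the files failed to unpack. A small pre-flight check before running rpm could look like the sketch below. The patterns mirror the install command above, and the function name missing_packages is hypothetical:

```shell
#!/bin/sh
# Print the pattern of every release package that has no matching
# file in the given directory. Prints nothing when all ten are there.
# missing_packages is a hypothetical helper.
missing_packages() {
    dir="$1"
    for pattern in \
        'xulrunner-qt5-91.*.rpm' \
        'xulrunner-qt5-misc-91.*.rpm' \
        'qtmozembed-qt5-1.*.rpm' \
        'sailfish-components-webview-qt5-1.*.rpm' \
        'sailfish-components-webview-qt5-pickers-1.*.rpm' \
        'sailfish-components-webview-qt5-popups-1.*.rpm' \
        'embedlite-components-qt5-1.*.rpm' \
        'sailfish-browser-2.*.rpm' \
        'sailfish-browser-settings-2.*.rpm' \
        'mapplauncherd-booster-browser-0.*.rpm'
    do
        # Expand the glob; if nothing matches, the pattern stays as-is
        set -- "$dir"/$pattern
        [ -e "$1" ] || echo "$pattern"
    done
}
```

Running missing_packages . in the directory where the tarball was unpacked should print nothing if it's safe to go ahead.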

For completeness you can use the following to download the much larger debug tarball:
$ curl -O https://www.flypig.co.uk/dnload/dnload/sailfishos/gecko/gecko-dev-esr91-debug.tar.bz2
Having unpacked it and completed the other instructions, you can install all of the debug packages with the following command:
$ devel-su rpm -U --force \
    xulrunner-qt5-91.*.rpm \
    xulrunner-qt5-debuginfo-91.*.rpm \
    xulrunner-qt5-debugsource-91.*.rpm \
    xulrunner-qt5-misc-91.*.rpm \
    qtmozembed-qt5-1.*.rpm \
    qtmozembed-qt5-debuginfo-1.*.rpm \
    qtmozembed-qt5-debugsource-1.*.rpm \
    sailfish-components-webview-qt5-1.*.rpm \
    sailfish-components-webview-qt5-debuginfo-1.*.rpm \
    sailfish-components-webview-qt5-debugsource-1.*.rpm \
    sailfish-components-webview-qt5-pickers-1.*.rpm \
    sailfish-components-webview-qt5-popups-1.*.rpm \
    embedlite-components-qt5-1.*.rpm \
    embedlite-components-qt5-debuginfo-1.*.rpm \
    embedlite-components-qt5-debugsource-1.*.rpm \
    sailfish-browser-2.*.rpm \
    sailfish-browser-debuginfo-2.*.rpm \
    sailfish-browser-debugsource-2.*.rpm \
    sailfish-browser-settings-2.*.rpm \
    mapplauncherd-booster-browser-0.*.rpm \
    mapplauncherd-booster-browser-debuginfo-0.*.rpm \
    mapplauncherd-booster-browser-debugsource-0.*.rpm
Alright, let's move on to the process you need to follow if you want to revert the changes and drop back to ESR 78. Here's the way I did it on my phone. Note that I'm using zypper here which you'll need for this. You can install it with devel-su pkcon install zypper if you don't already have it.
  1. Ensure the browser and any apps using the WebView are closed.
  2. Restore your original ESR 78 profile directory.
    $ mv ~/.local/share/org.sailfishos/browser/ ~/.local/share/org.sailfishos/browser.esr91.bak/
    $ cp -r ~/.local/share/org.sailfishos/browser.esr78.bak/ ~/.local/share/org.sailfishos/browser
    
  3. Remove the symlink:
    $ devel-su rm /usr/lib64/xulrunner-qt5-78.15.1
    
  4. Install all of the original packages from the Jolla repository.
    $ devel-su zypper install --oldpackage \
        xulrunner-qt5-78.15.1+git39-1.19.1.jolla \
        xulrunner-qt5-misc-78.15.1+git39-1.19.1.jolla \
        qtmozembed-qt5-1.53.25-1.22.2.jolla \
        sailfish-components-webview-qt5-1.5.21-1.13.1.jolla \
        sailfish-components-webview-qt5-pickers-1.5.21-1.13.1.jolla \
        sailfish-components-webview-qt5-popups-1.5.21-1.13.1.jolla \
        embedlite-components-qt5-1.24.35-1.26.1.jolla \
        sailfish-browser-2.2.63-1.12.1.jolla \
        sailfish-browser-settings-2.2.63-1.12.1.jolla \
        mapplauncherd-booster-browser-0.2.2-1.1.1.jolla
    
  5. Finally kill the booster if it's running.
    $ killall booster-browser
    
  6. You're back to ESR 78!

When you run the zypper command you'll be presented with a list of packages which it says are going to change vendor and which you'll need to agree to. This is normal; for me it looked something like this:
The following 8 packages are going to be upgraded:
  embedlite-components-qt5 mapplauncherd-booster-browser qtmozembed-qt5 
    sailfish-browser sailfish-browser-settings sailfish-components-webview-qt5
  sailfish-components-webview-qt5-pickers sailfish-components-webview-qt5-popups

The following 2 packages are going to be downgraded:
  xulrunner-qt5 xulrunner-qt5-misc

The following 10 packages are going to change vendor:
  embedlite-components-qt5                  -> meego
  mapplauncherd-booster-browser             -> meego
  qtmozembed-qt5                            -> meego
  sailfish-browser                          -> meego
  sailfish-browser-settings                 -> meego
  sailfish-components-webview-qt5           -> meego
  sailfish-components-webview-qt5-pickers   -> meego
  sailfish-components-webview-qt5-popups    -> meego
  xulrunner-qt5                             -> meego
  xulrunner-qt5-misc                        -> meego

8 packages to upgrade, 2 to downgrade, 10  to change vendor.
Overall download size: 39.5 MiB. Already cached: 0 B. After the operation, 94.8 MiB will be freed.
Continue? [y/n/v/...? shows all options] (y): y
[...]
If you installed the full set of debug packages you'll need to use an even longer command to restore ESR 78. This should revert the lot of them:
$ devel-su zypper install --oldpackage \
    xulrunner-qt5-78.15.1+git39-1.19.1.jolla \
    xulrunner-qt5-debuginfo-78.15.1+git39-1.19.1.jolla \
    xulrunner-qt5-debugsource-78.15.1+git39-1.19.1.jolla \
    xulrunner-qt5-misc-78.15.1+git39-1.19.1.jolla \
    qtmozembed-qt5-1.53.25-1.22.2.jolla \
    qtmozembed-qt5-debuginfo-1.53.25-1.22.2.jolla \
    qtmozembed-qt5-debugsource-1.53.25-1.22.2.jolla \
    sailfish-components-webview-qt5-1.5.21-1.13.1.jolla \
    sailfish-components-webview-qt5-debuginfo-1.5.21-1.13.1.jolla \
    sailfish-components-webview-qt5-debugsource-1.5.21-1.13.1.jolla \
    sailfish-components-webview-qt5-pickers-1.5.21-1.13.1.jolla \
    sailfish-components-webview-qt5-popups-1.5.21-1.13.1.jolla \
    embedlite-components-qt5-1.24.35-1.26.1.jolla \
    embedlite-components-qt5-debuginfo-1.24.35-1.26.1.jolla \
    embedlite-components-qt5-debugsource-1.24.35-1.26.1.jolla \
    sailfish-browser-2.2.63-1.12.1.jolla \
    sailfish-browser-debuginfo-2.2.63-1.12.1.jolla \
    sailfish-browser-debugsource-2.2.63-1.12.1.jolla \
    sailfish-browser-settings-2.2.63-1.12.1.jolla \
    mapplauncherd-booster-browser-0.2.2-1.1.1.jolla \
    mapplauncherd-booster-browser-debuginfo-0.2.2-1.1.1.jolla \
    mapplauncherd-booster-browser-debugsource-0.2.2-1.1.1.jolla
So far I've been doing all of this on my development phone. The next step is for me to install them on my main daily phone. This is an Xperia 10 III (a different one) running Sailfish OS 4.6.

Working through the instructions step-by-step gave the correct results: the browser immediately works fine.

I've created a gecko install page with all of these details summarised on it (there's a link at the top of the main gecko blog page too).

If you want to give this a go I recommend you follow the steps there. If you notice any problems with the installation steps please let me know. However if you find a problem with the browser itself, it would be preferable if you could please create an issue on GitHub about it. That'll be easier for me and others to handle.

Having now installed ESR 91 on my daily phone I'm expecting to find lots of issues and bugs. But in my short time doing real-world testing I've not yet found it to be unusable.

Tomorrow I'm going to start arranging the patches.

26 Aug 2024 : Day 331 #
I did very little coding yesterday and yet it was a very fruitful day of development. After a discussion with Frajo I was left with several useful outcomes:
  1. Frajo and mal have crafted a nice pull request to fix the Wayland dynamic loading issue.
  2. I need to update my workaround so it can be controlled using an environment variable.
  3. The default preferences need fixing for WebView apps.
  4. The reason for the Email app looking for libxul.so in the wrong directory needs investigation.
It's helpful to be clear about these points because it feels like we're reaching some kind of end state. Once these tasks are completed I'm planning to offer a release for others to test out. A little bit of structure helps me focus and take stock. There are only a few more things to do.

So the first task is to add an environment variable to control whether the Wayland EGL dynamic library loading workaround is applied. In practice I hope the workaround can be entirely removed before any proper release, but it's needed in the meantime on the current Sailfish OS 4.6 release, so it needs to be kept for now.

Choosing a suitable name for the environment variable turns out to be the hardest part of the task. I've gone for DISABLE_PLAT_EGL_FIX. When this is left unset the workaround will be applied. When set to 1, the workaround will be skipped.

When Frajo requested this environment variable I forgot to check whether he wants a flag to enable the workaround or a flag to disable it. I've gone with a flag to disable it. So it's possible I may need to flip it around at some later date. But for now, I just need something that does the job.

So I've made the following changes to the previous fix in the qtmozembed code:
$ git diff
diff --git a/src/qmozcontext.cpp b/src/qmozcontext.cpp
index 4284895..57622dd 100644
--- a/src/qmozcontext.cpp
+++ b/src/qmozcontext.cpp
@@ -45,10 +45,11 @@ Q_GLOBAL_STATIC(QMozContextPrivate, mozContextPrivateInstance)
 // to a lack of reference counting in ws_init and ws_Terminate here:
 // https://github.com/libhybris/libhybris/blob/master/hybris/egl/ws.c#L99
 // If this can be fixed in libhybris then the loading here can be removed
+static const bool platformEglWorkaround = !getenv("DISABLE_PLAT_EGL_FIX");
 static void *platformEglHandle = nullptr;
 
 static void platform_egl_workaround_open() {
-  if (platformEglHandle) {
+  if (platformEglHandle || !platformEglWorkaround) {
     // Only load once
     return;
   }
@@ -78,8 +79,9 @@ static void platform_egl_workaround_open() {
 }
 
 static void platform_egl_workaround_close() {
-  if (platformEglHandle) {
+  if (platformEglHandle && platformEglWorkaround) {
     dlclose(platformEglHandle);
+    platformEglHandle = nullptr;
   }
 } 
It's a small change but required a bit of thought. To ensure consistency I need the environment variable to be stored once and then never changed. This constant value must then be used in all places in the code. The above approach is intended to fulfil this. I also thought about integrating the workaround fully into the QMozContextPrivate class. For permanent code this would make sense, but if the plan is to remove this workaround later, then I think it's neater to keep all of the changes minimal and in a single source file.

Having applied these changes and built the packages I then reverted Frajo's workaround on my device and checked that the environment variable does what it's supposed to do. As intended we get a crash when executing the following:
$ DISABLE_PLAT_EGL_FIX=1 harbour-webview
Whereas both of the following commands work successfully:
$ DISABLE_PLAT_EGL_FIX=0 harbour-webview 
$ harbour-webview
So that's the first task completed. Next up is fixing the preferences when running the WebView. The problem seems to be the gfx.webrender.force-disabled preference. When this is set to false the browser or WebView crashes, so it needs to default to true. This can all be tested by changing the setting in the prefs.js file, with an entry like this:
user_pref("gfx.webrender.force-disabled", true);
Some care is needed though because each app that uses the WebView has its own profile, so there's actually a different prefs.js file stored for each WebView app. This is intentional: different apps may need different settings. But for this particular setting, we really only ever want it to be set to true. For all intents and purposes the option may as well be renamed do.not.crash.

In the gecko code the gfx.webrender.force-disabled configuration option is defined in the StaticPrefList.yaml file:
# Also expose a pref to allow users to force-disable WR. This is exposed
# on all channels because WR can be enabled on qualified hardware on all
# channels.
- name: gfx.webrender.force-disabled
  type: bool
  value: false
  mirror: once
When I looked at this previously I decided not to change this default setting and instead to explicitly set the value in WebEngineSettings::initialize() (part of the sailfish-components-webview project). Unfortunately my recent testing has shown that this isn't enough. It works some of the time, but there seems to be a race condition. If the WebEngineSettings is initialised fast enough the setting goes correctly through to the gecko engine and the application runs fine. But on occasion it happens too late and a crash ensues.

My original motivation for doing it this way was to try to avoid an unnecessary patch on the gecko code. Every patch is a potential pain point, so the fewer the better. But since the alternative approach doesn't seem to be working properly in all cases, changing the default looks like it'll be both the simplest and most effective solution.

Consequently I've changed the default value in the StaticPrefList.yaml file like this:
$ git diff
diff --git a/modules/libpref/init/StaticPrefList.yaml b/modules/libpref/init/StaticPrefList.yaml
index 9c8d10557622..96c924f00398 100644
--- a/modules/libpref/init/StaticPrefList.yaml
+++ b/modules/libpref/init/StaticPrefList.yaml
@@ -5067,7 +5067,7 @@
 # channels.
 - name: gfx.webrender.force-disabled
   type: bool
-  value: false
+  value: true
   mirror: once
 
 #ifdef MOZ_WIDGET_GTK
Unfortunately this change requires a rebuild of the gecko library, so I've set it going and it'll run for quite a while. I won't be able to test the change until it's completed. So that gives me a chance to check out the final issue, which is the location where the email app searches for the libxul.so library.

The directory of the library is version-encoded. So for example the ESR 78 directory is /usr/lib64/xulrunner-qt5-78.15.1 whereas on ESR 91 it's /usr/lib64/xulrunner-qt5-91.9.1.

Adding a symlink so that the email app thinks the latter is actually the former allows the app to work. But the obvious question arises as to why there's a hard-coded directory in use at all. Why doesn't the email app just include sailfish-components-webview-qt5 and let that do the work of loading libxul.so?

A bit more investigation shows that the most likely reason for this is that the libxul.so directory is contained in the binary's rpath:
$ objdump -x /usr/bin/jolla-email | grep 'R.*PATH'
  RPATH                /usr/lib64/xulrunner-qt5-78.15.1
This is most likely coming from the following pkg-config output:
$ pkg-config --cflags sailfishwebengine
-I/usr/include/libsailfishwebengine -I/usr/include/qt5/QtCore -I/usr/include/qt5 -I/usr/lib64 -I/usr/include/xulrunner-qt5-91.9.1 -I/usr/include/nspr4
This is happening in the jolla-email code which, unfortunately, is one of Sailfish OS's closed components. So I can only speculate as to how or why this might manifest itself in the source code. Most likely it's pulled in using a PKGCONFIG statement in a qmake *.pro file.

There are tools such as chrpath that can be used to alter the rpath of an executable, but that's a bit extreme. In fact, if my speculation is correct, this issue will be fixed automatically when the jolla-email binary is built against the ESR 91 packages. In the meantime, it's easy to work around it by creating a symlink from one path to the other:
$ ln -s /usr/lib64/xulrunner-qt5-91.9.1/ /usr/lib64/xulrunner-qt5-78.15.1
That can act as a temporary workaround for users wanting to test ESR 91 until the official release (assuming there is one), when it'll no longer be an issue.

Finally, some of you may have noticed this warning has started appearing in the browser's console output ever since switching to Sailfish OS 4.6:
[W] unknown:0 - MeeGo.QOfono QML module name is deprecated and subject for removal. Please adapt code to "import QOfono".
There's a bit of a complex history to this. In Sailfish OS 4.6 the MeeGo.QOfono import changed its name to QOfono. The version of ESR 78 that comes with 4.6 was therefore updated to use the new naming scheme. Pekka made this change on the development branch back in 2023:
$ git log -1 16ef5cdf4
commit 16ef5cdf44c2eafd7d93e17a41927ef5da700c2b
Author: Pekka Vuorela <pekka.vuorela@jolla.com>
Date:   Thu Jan 5 12:09:27 2023 +0200

    [components-webview] Migrate to new qofono import. JB#59690
    
    Also dependency was missing.
However this was before 4.6 was released and Sailfish OS 4.5 required the previous name to be used. Consequently I had to revert this change in order to get the browser to run on my development device:
$ git log -1 d5d10ac14
commit d5d10ac14d9a07db664d208919452559d76aeac9
Author: David Llewellyn-Jones <david.llewellyn-jones@jolla.com>
Date:   Fri Jul 19 07:44:11 2024 +0100

    [sailfish-webview] Migrate away from new ofono import
    
    Commit 16ef5cdf4 switched to the new ofono import, but this causes
    issues for me, possibly becaues I'm on an old SDK. So this commit should
    almost certainly be removed, but I'm including it for completeness.
    
    Reverts 16ef5cdf4.
Now that I'm on 4.6 and the new naming scheme works, I need to revert my revert. Which is exactly what I've now done.
$ git revert d5d10ac14
$ git log -1
commit dd86cfd1306540adcae7021c8cdeb880f433d081 (HEAD -> sailfishos-esr91)
Author: David Llewellyn-Jones <david.llewellyn-jones@jolla.com>
Date:   Mon Aug 26 16:48:27 2024 +0100

    Revert "[sailfish-webview] Migrate away from new ofono import"
    
    This reverts commit d5d10ac14d9a07db664d208919452559d76aeac9.
Before I wrap up, the build has now finished and I'm happy to say that the prefs.js fix seems to have done the trick. The Settings app, email app and various other WebView apps I've tried now all work correctly.

That's all quite a lot to have got through today and, by my reckoning, it brings all of my bug-squashing tasks to a close. But not quite all of my tasks. Tomorrow I'll package everything up and write instructions so that others can test out the packages. After that I'll start arranging the patches.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
25 Aug 2024 : Day 330 #
Over the last few days I've been mopping up the last few outstanding issues with the browser. Unfortunately I wasn't able to get the browser cover hang (which is now a crash) fixed, but the workaround is just to disable it for now. The active-tab preview still works, it's only when there are no tabs that the cover will be different.

As a result of where things are at I was all ready to make packages available for others to install and test. I even installed them on my daily phone with the plan to use them as my only browser to see how I got on.

But during the process of doing this I discovered that the WebView is no longer working with the Settings app or the Email app on Sailfish OS 4.6. These were definitely working previously on 4.5, so my immediate thought was that the workaround for the Wayland EGL dynamic loading issue was failing for these apps.

These are both critical apps for Sailfish OS, so I can't reasonably release packages that are going to break them to this extent, so I've decided to hold back on releasing the packages and the instructions for installing them just for now until this can be resolved.

We'll come back to the issue of the Settings and Email apps a little later. Since then I've been having some really fruitful discussions with Frajo (krnlyng). I'm sure I've explained this before, but Frajo is one of Jolla's hardware adaptation and Android App Support gurus and also a good friend. As a Jolla employee it's safe to assume he really knows his stuff. And it's also the truth. Back in 2017, before Frajo started working for Jolla, he was interviewed for the Jolla blog. It's well worth a read, even if some of the info is now a little outdated.

Frajo went to the trouble of testing the ESR 91 packages a few days back on Sailfish OS 4.6. Up to that point I'd only ever run them on 4.5, so was disappointed to discover that they weren't working on 4.6. Some of this context I've already covered in previous diary entries over the last few days.

To cut to the chase, Frajo got to the bottom of the reason for the failure on 4.6, which turns out to be due to a bug at quite a low level in libhybris. Lower than I would ever typically have to deal with myself. My naive understanding of libhybris is that it provides a conversion interface between glibc and the Android Bionic and Binder interfaces. Amongst other things this allows Linux (and therefore Sailfish OS) to transparently make use of Android drivers.

The issue discovered by Frajo is that libhybris will close the dynamically loaded eglplatform_wayland.so library once it's determined that it's no longer being used by any processes. Unfortunately under certain circumstances (specifically if the display is initialised multiple times in more than one thread) it can happen that libhybris will close the dynamic library even though it's still in use.

Frajo, via Raine, provided the workaround of using LD_PRELOAD that I've been using over the last few days. Since then Frajo, in discussion with mal, developed a fix for libhybris which you can see in the associated pull request to the repository. Here's how Frajo explains the relationship between the changes in this pull request and the use of LD_PRELOAD:
 
One might wonder why the LD_PRELOAD trick works at all because the ws_Terminate function sets the wsmod and ws pointers to 0. But i think i know why (a sequence of function calls and my understanding of what happens):

On a thread T1:
  1. eglInitialize() and subsequently dlopen(eglplatform_wayland)
  2. other egl calls...
On another thread T2 interleaved/simultaneously to T1's 2.
  A. eglInitialize() no dlopen
  B. eglTerminate() -> dlclose(eglplatform_wayland) -> ws and wsmod pointers are 0 which results in sometimes the assert(ws != NULL) firing as can be seen sometimes when running the browser when libhybris debug logs are active.
  C. eglInitialize() -> dlopen(eglplatform_wayland)
     pointers ws and wsmod are valid, but different than before and since the library was unloaded in step B and reloaded in step C there might be some pointers or structures that the step 2 stuff depends on which is now different/invalid.
But when LD_PRELOAD is set to preload the eglplatform_wayland library, the library never gets unloaded so the stuff that 2 depends on is still the same stuff after B/C despite the ws and wsmod pointers getting temporarily set to 0 in B (so there might even still be a case where the LD_PRELOAD trick won't work if the T1 thread manages to call a ws function while the pointers are 0).
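
The sequence Frajo describes can be replayed with a toy model. This is only a sketch under simplifying assumptions: FakeModule, NaiveLoader and CountedLoader are invented names rather than anything in libhybris, the "library" is just a flag with a generation counter, and the two threads are replayed one call at a time rather than actually running concurrently.

```cpp
#include <cassert>

// Toy stand-in for a dlopen-ed module. The generation changes on every
// reload, modelling pointers into the old mapping becoming stale.
struct FakeModule {
  bool loaded = false;
  int generation = 0;
  void load() { loaded = true; ++generation; }
  void unload() { loaded = false; }
};

// No reference counting: any terminate call unloads the module.
struct NaiveLoader {
  FakeModule module;
  void init() { if (!module.loaded) module.load(); }
  void terminate() { if (module.loaded) module.unload(); }
};

// Reference counted: only the final terminate call unloads the module.
struct CountedLoader {
  FakeModule module;
  int refs = 0;
  void init() { if (refs++ == 0) module.load(); }
  void terminate() { if (refs > 0 && --refs == 0) module.unload(); }
};

// Replays T1 step 1 followed by T2 steps A, B and C. Returns true if the
// module generation captured by T1 is still loaded and unchanged afterwards.
template <typename Loader>
bool t1StateSurvives() {
  Loader loader;
  loader.init();                                  // T1 step 1
  const int seenByT1 = loader.module.generation;  // T1 step 2 relies on this
  loader.init();                                  // T2 step A
  loader.terminate();                             // T2 step B
  loader.init();                                  // T2 step C
  return loader.module.loaded && loader.module.generation == seenByT1;
}
```

With NaiveLoader the module T1 started with is unloaded and replaced behind its back, so t1StateSurvives() returns false. With CountedLoader the extra reference keeps the original module in place and it returns true. Preloading the library has a similar effect to holding one extra reference for the lifetime of the process.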

Frajo tested out ESR 91 with his libhybris fix applied and confirmed that this allowed the browser to run, even without the workarounds. Of course I was keen to test this out for myself, but I also had another reason for wanting to test this solution, which is that I was keen to know whether it would fix the issues with the Settings and Email apps as well.

So today Frajo kindly shared RPM packages for libhybris and friends so that I could test the changes in the pull request on my own device. As per his very wise suggestion, I first made a copy of all of the related packages currently installed on my phone just in case something were to go wrong. Here's how I installed the packages provided by Frajo:
$ rpm -U \
    libhybris-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm \
    libhybris-libEGL-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm \
    libhybris-libGLESv2-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm \
    libhybris-libhardware-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm \
    libhybris-libsync-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm \
    libhybris-libwayland-egl-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm
$ systemctl restart --user lipstick
Rather than rebooting the device, restarting lipstick is sufficient to ensure the changes are applied. With this done and with the workaround I added to qtmozembed removed, I had the same experience as Frajo, in that the browser works correctly. That's a great result.

However, I also found that these packages don't fix the issues with the Settings app or the Email app. This motivated me to look into these two crashes a bit further.

It turns out that the Settings app isn't crashing due to the dynamic library loading issue, but because the prefs.js file contains settings that prevent the WebView from working.

After copying across a working prefs.js file from the harbour-webview mozilla profile, the Settings app now works correctly.

The email app suffered from a different, but equally unrelated, issue. It expects the libxul.so library to be found in the /usr/lib64/xulrunner-qt5-78.15.1 directory. This seems to be somehow baked into the email app. But of course the xulrunner packages install things in the /usr/lib64/xulrunner-qt5-91.9.1 directory.

As a workaround for this I've created a symlink so that anything looking for the ESR 78 library will get the ESR 91 library instead:
$ ln -s /usr/lib64/xulrunner-qt5-91.9.1/ /usr/lib64/xulrunner-qt5-78.15.1
Having applied this workaround the email app also works correctly. So the problem with these two apps turns out to be unrelated to the dynamic library loading issue. To be clear though, the libhybris fix is a requirement for them to work, even if it wasn't causing the crash in these cases.

Since I made copies of the original libhybris packages before replacing them with Frajo's versions, I'm also able to restore my phone to its previous state like this:
$ rpm -U --force \
    libhybris-0.0.5.50-1.4.1.jolla.aarch64.rpm \
    libhybris-libEGL-0.0.5.50-1.4.1.jolla.aarch64.rpm \
    libhybris-libGLESv2-0.0.5.50-1.4.1.jolla.aarch64.rpm \
    libhybris-libhardware-0.0.5.50-1.4.1.jolla.aarch64.rpm \
    libhybris-libsync-0.0.5.50-1.4.1.jolla.aarch64.rpm \
    libhybris-libwayland-egl-0.0.5.50-1.4.1.jolla.aarch64.rpm
$ systemctl restart --user lipstick
What's the consequence of all this? Well, first off is the great news that Frajo, mal and Raine have collectively looked at this and Frajo has submitted a pull request that will fix the issue in future versions of Sailfish OS. This means that the workaround I added to qtmozembed will no longer be needed and as such I have a new task to update this new piece of code, as Frajo requests:
 
I saw you've already made a workaround with the platform_egl_workaround function. Since this is a real bug in libhybris could you make sure that the workaround can be easily disabled so we can still verify that the libhybris patch works, maybe via some environment variable?

So I'll make this change over the coming days. But I also have a bit more work to do: first to fix the issue that's causing the Settings app to crash and second to look at the Email app as well.

Once all of this is sorted I'll then be in a position to release packages and instructions for how to install them.

I'm sincerely grateful to Frajo and the Jolla team for the work they've put in to identifying and fixing the underlying issue in libhybris. I've said it before but it bears repeating: I would never have been able to solve this on my own and there really is no substitute for a team of developers who really know their stuff.

24 Aug 2024 : Day 329 #
Yesterday I achieved two semi-wins. The first was applying patch 0038 "Fix mesa egl display and buffer initialisation". The second was implementing a code-only fix to replace LD_PRELOAD. I say semi-wins and that's because neither achieved quite what I had hoped. The patch was supposed to help fix the Wayland EGL crash, which it didn't do. But the patch is needed to allow the browser to work on native devices, so I'm hoping it'll be helpful in achieving that at least. So a semi-win. The code-only fix does remove the need to use LD_PRELOAD which circumvents the Wayland EGL crash, but I'm not convinced it fixes the underlying issue. It's a hackish workaround rather than a real fix. So also a semi-win. I'd love to properly understand what's going on, but I'll have to leave the deep understanding to Frajo and Raine.

There is one outstanding issue that I'm still keen to address. It's been a bit of a thorn that I've not been able to fix, which is the hang occurring when the cover view switches from a gecko screenshot to the "no tabs" cover. At present I have this "no tabs" cover disabled. This is a shame, but as Rob (rob_kouw) highlights on the forum, it's not the end of the world if this can't be fixed. But still, it's annoying that the fix has eluded me.

I thought I'd take another look. Recent changes have caused the behaviour to change. Instead of the browser hanging it now crashes. On the face of it this sounds like a sideways step rather than a move forwards, but for me this is a very positive development. This change is likely due to the dynamic library loading changes that are now in place.

The benefit of having a crash rather than a hang is that we can now get a clean backtrace at the point of failure. Here's what we get:
sailfish-browser: ../src/wayland-client.c:2339: wl_proxy_set_queue: Assertion `proxy->display == queue->display' failed.

Thread 67 "QSGRenderThread" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fb58dd830 (LWP 15988)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50        return ret;
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000007fef285d20 in __GI_abort () at abort.c:79
#2  0x0000007fef294c98 in __assert_fail_base (fmt=0x7fef3a5718 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7fef11d990 "proxy->display == queue->display", file=file@entry=0x7fef11d6d0 "../src/wayland-client.c", 
    line=line@entry=2339, function=function@entry=0x7fef11de80 "wl_proxy_set_queue") at assert.c:92
#3  0x0000007fef294cfc in __GI___assert_fail (assertion=0x7fef11d990 "proxy->display == queue->display", file=0x7fef11d6d0 "../src/wayland-client.c", 
    line=2339, function=0x7fef11de80 "wl_proxy_set_queue") at assert.c:101
#4  0x0000007fef119d18 in wl_proxy_set_queue () from /usr/lib64/libwayland-client.so.0
#5  0x0000007ff7fa27e0 in WaylandNativeWindow::finishSwap() () from /usr/lib64/libhybris/eglplatform_wayland.so
#6  0x0000007feed23dc4 in _my_eglSwapBuffersWithDamageEXT () from /usr/lib64/libEGL.so.1
#7  0x0000007fe7424be0 in ?? () from /usr/lib64/qt5/plugins/wayland-graphics-integration-client/libwayland-egl.so
#8  0x0000007ff09827f8 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007ff098363c in ?? () from /usr/lib64/libQt5Quick.so.5
#10 0x0000007fef7d96a8 in ?? () from /usr/lib64/libQt5Core.so.5
#11 0x0000007fef6b8b98 in start_thread (arg=0x7fb58dd130) at pthread_create.c:479
#12 0x0000007fef3497cc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) 
This is really fascinating stuff. The crash is happening in Wayland, which isn't a total surprise because this is happening in the QML code rather than the browser code. QML should never bring an app down in such an unclean way, so it was always clear that this was going to be quite a low-level failure.

What's more, the crash is happening because of an assertion failure, so it's a controlled crash. That's something that's worth investigating further. Something else that jumps out is that the crash seems to be triggered by a call to _my_eglSwapBuffersWithDamageEXT(). That's also of interest because eglSwapBuffersWithDamageEXT() has been a source of trouble in the past. We actually have a patch — patch 0047 "Drop swap_buffers_with_damage extension support" — that tries to address something similar by disabling the use of eglSwapBuffersWithDamageEXT() in the gecko code.

I applied this patch back on Day 83, but in ESR 91 the way EGL extensions are disabled has changed compared to ESR 78, so this is code I had to make changes to. In particular, the display-related EGL calls (of which this is one) have moved from GLLibraryEGL and into EglDisplay. The split makes sense structurally, but resulted in code being moved around, including the code for disabling extensions, which had to be split into two pieces. Maybe I messed something up when I updated the 0047 patch to overlay these changes?

So I want to make sure that the extensions are properly disabled. There's only one method that makes use of eglSwapBuffersWithDamage() directly and that's the GLContextEGL::SwapBuffers() method. Just to be clear, this isn't what's causing the crash, but I'd like to use it as a means of checking that the call isn't being made here. Once I'm done with this I'll also add a breakpoint to fSwapBuffersWithDamage() but I'm not expecting that to throw up any surprises.

Here's the SwapBuffers() method. As you can see, the call to fSwapBuffersWithDamage() only happens if at least one of the two relevant extensions hasn't been disabled (that's a few too many confusing negatives for a single sentence to bear, but the code should make this clear, even if I've struggled to successfully describe it!).
bool GLContextEGL::SwapBuffers() {
  EGLSurface surface =
      mSurfaceOverride != EGL_NO_SURFACE ? mSurfaceOverride : mSurface;
  if (surface) {
    if ((mEgl->IsExtensionSupported(
             EGLExtension::EXT_swap_buffers_with_damage) ||
         mEgl->IsExtensionSupported(
             EGLExtension::KHR_swap_buffers_with_damage))) {
      std::vector<EGLint> rects;
      for (auto iter = mDamageRegion.RectIter(); !iter.Done(); iter.Next()) {
        const IntRect& r = iter.Get();
        rects.push_back(r.X());
        rects.push_back(r.Y());
        rects.push_back(r.Width());
        rects.push_back(r.Height());
      }
      mDamageRegion.SetEmpty();
      return mEgl->fSwapBuffersWithDamage(surface, rects.data(),
                                          rects.size() / 4);
    }
    return mEgl->fSwapBuffers(surface);
  } else {
    return false;
  }
}
So the plan is to breakpoint on this method. If I get a hit I'll step through to ensure that the fSwapBuffersWithDamage() section is skipped. Assuming it is I can then add a breakpoint inside the block just to make sure it never gets entered. Here are the results of this:
$ gdb sailfish-browser
[...]
(gdb) break GLContextEGL::SwapBuffers
Breakpoint 2 at 0x7ff2358b74: file gfx/gl/GLContextProviderEGL.cpp, line 538.
(gdb) c
Continuing.
[...]
Thread 39 "Compositor" hit Breakpoint 2, mozilla::gl::GLContextEGL::SwapBuffers (this=0x7ed01a8020) at gfx/gl/GLContextProviderEGL.cpp:538
538     bool GLContextEGL::SwapBuffers() {
(gdb) n
539       EGLSurface surface =
(gdb) n
541       if (surface) {
(gdb) n
542         if ((mEgl->IsExtensionSupported(
(gdb) n
558         return mEgl->fSwapBuffers(surface);
Okay, so the block is correctly skipped. That's positive. Now I'm going to add a breakpoint inside the block just to ensure this never changes:
(gdb) b GLContextProviderEGL.cpp:708
Breakpoint 5 at 0x7ff237c934: file gfx/gl/GLContextProviderEGL.cpp, line 709.
(gdb) info break
Num     Type           Disp Enb Address            What
2       breakpoint     keep y   0x0000007ff2358b74 in mozilla::gl::GLContextEGL::SwapBuffers() 
                                                   at gfx/gl/GLContextProviderEGL.cpp:538
        breakpoint already hit 1 time
5       breakpoint     keep y   0x0000007ff237c934 in mozilla::gl::GLContextEGL::CreateGLContext(std::shared_ptr<mozilla::gl::EglDisplay>, mozilla::gl::GLContextDesc const&, void*, void*, bool, nsTSubstring<char>*) at gfx/gl/GLContextProviderEGL.cpp:709
(gdb) delete break 2
(gdb) c
Continuing.
[...]
There are no hits of this new breakpoint. So the extensions are certainly being correctly disabled, which is a good thing. I'm just going to quickly check there are no other rogue calls to fSwapBuffersWithDamage() by adding a breakpoint to it directly. There should be no hits:
(gdb) delete break
(gdb) break fSwapBuffersWithDamage
Breakpoint 6 at 0x7ff2358d3c: file gfx/gl/GLLibraryEGL.h, line 448.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/sailfish-browser 
[...]
I'm able to open a website and browse its pages without any hits. So that's looking good; it means I can rule it out as being the cause of the issue. So I'm going to switch to a different approach and take a look at the Wayland code where the assertion is being triggered. Here's the method in question, taken from the Sailfish mirror of the Wayland repository:
WL_EXPORT void
wl_proxy_set_queue(struct wl_proxy *proxy, struct wl_event_queue *queue)
{
	pthread_mutex_lock(&proxy->display->mutex);

	wl_list_remove(&proxy->queue_link);

	if (queue) {
		assert(proxy->display == queue->display);
		proxy->queue = queue;
	} else {
		proxy->queue = &proxy->display->default_queue;
	}

	wl_list_insert(&proxy->queue->proxy_list, &proxy->queue_link);

	pthread_mutex_unlock(&proxy->display->mutex);
}
The assertion that's failing is the one requiring proxy->display == queue->display to be true. What does this mean? The details aren't entirely clear to me, but the display value being checked is likely an EGL display value. It looks like there are two different values in use when there should be only one.

That fits with some of the things I noticed when trying to fix what I now know to be the Wayland dynamic library issue. I noted that there were displays with value both 0x00 and 0x01. I'm fairly sure it should just be the latter. If that's the case, then getting the code to always use 0x01 would be a worthwhile goal.
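
To make the failing invariant concrete, here's a stripped-down model of the check. The Display, EventQueue and Proxy types and the trySetQueue() function are stand-ins invented for this sketch, not the real libwayland definitions, and the function returns false where the real code would abort.

```cpp
#include <cassert>

// Simplified stand-ins for the Wayland client structures. They carry just
// enough state to express the invariant and nothing more.
struct Display;
struct EventQueue {
  Display *display;
};
struct Display {
  EventQueue defaultQueue;
  Display() { defaultQueue.display = this; }
};
struct Proxy {
  Display *display;
  EventQueue *queue;
};

// Mirrors the logic of wl_proxy_set_queue(), but reports a display mismatch
// by returning false instead of triggering an assertion failure.
bool trySetQueue(Proxy &proxy, EventQueue *queue) {
  if (queue == nullptr) {
    // No queue given: fall back to the default queue of the proxy's display
    proxy.queue = &proxy.display->defaultQueue;
    return true;
  }
  if (proxy.display != queue->display) {
    // This is the case that fires the assertion in the real code
    return false;
  }
  proxy.queue = queue;
  return true;
}
```

In this model a queue created on one display can never be attached to a proxy created on another, which is the situation the backtrace suggests the cover code is somehow ending up in.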

I feel like I've collected enough information about this and I don't want to dwell on it further. I can add what I've learnt to the bug report and return to it at a later date. In terms of priorities, I think this should be low down the list. Getting packages available for others to install should be a higher priority, as should tidying up all these changes to turn them into patches.

So that's my plan for the next couple of days. First is to get some install instructions written up. Second is moving on to "tidying up" mode. These will be the most fruitful next steps.

23 Aug 2024 : Day 328 #
I've been in a buoyant mood all night since Raine and Frajo shared a solution for getting ESR 91 working on Sailfish OS 4.6. It made it quite hard to sleep actually, knowing how close this brings things to the final goal. I'm quite looking forward to getting to the tidying up stage, which will be the final task that I plan to do on all this.

The tidying-up stage will involve turning all of the commits on the FIREFOX_ESR_91_9_X_RELBRANCH_patches branch of the gecko repository into patches that can be applied as part of the RPM build process. That's going to be a fair bit of work, because I'll need to rationalise and combine patches as well as cross-referencing them against the ESR 78 patches, but it should at least be quite mechanical. That makes it an appealing task to end on.

But right now I want to see if there's a way to integrate Frajo's LD_PRELOAD workaround into the gecko code. You'll recall that running the browser with this environment variable set to the eglplatform_wayland.so shared library ensures the library is dynamically loaded at start up and not dropped. This in turn prevents the Wayland crash that I've been trying to figure out for the last few days.
$ LD_PRELOAD=/usr/lib64/libhybris/eglplatform_wayland.so gdb sailfish-browser
[...]
If, like me, you're not yet fully familiar with LD_PRELOAD then there's a nice article on Baeldung Linux.

As soon as I saw Frajo's fix I thought of patch 0038 "Fix mesa egl display and buffer initialisation" which contains code to dynamically load various functions exported from the libwayland-egl.so.1 library. This isn't a patch I applied, because it was too messy to do so earlier and, I had thought, the changes would only be needed for native Sailfish OS ports. That's not to say that it's not important to have the browser working on native ports, but the plan was always to get something working rather than get something perfect. So applying this patch felt like extra complexity that it would be better to avoid.

But maybe the patch is now needed to fix this problem too?

It's an awkward patch to apply because all of the code it's supposed to apply to has changed considerably. The underlying structure is all bound up in the GLScreenBuffer changes that were such a big deal previously. As a result I've had to apply the entire patch manually.

That's okay, but it was a fair bit of work. And I'm not totally certain that I've applied it correctly either. It's hard for me to test it properly, so I'm hoping others may be able to do this for me. The original patch was put together with Adam Pigg and Frajo, and I'm hoping they'll be able to help again with this at some point.

But right now I just need to know the following: After the changes...
  1. ...does the browser still work if I use the LD_PRELOAD workaround?
  2. ...does this help get the browser to work without the workaround?
If the first of these is still the case, then I'll know I've at least applied the patch well enough not to have broken anything. And after performing a partial build to get the updated library on to my phone as quickly as possible, I find that yes, it does still work when the LD_PRELOAD variable is correctly set.

But unfortunately it doesn't seem to have had any beneficial effect when LD_PRELOAD is left unset. So the next step is to check whether any of the changes I've made are actually being executed.

First up I've added breakpoints to the LoadWaylandFunctions() and UnloadWaylandFunctions() methods which were added by this patch. It turns out that, at least on an Xperia 10 III, neither of these breakpoints is hit:
$ LD_PRELOAD=/usr/lib64/libhybris/eglplatform_wayland.so gdb sailfish-browser
[...]
(gdb) break LoadWaylandFunctions
Breakpoint 2 at 0x7ff2330b5c: file ${PROJECT}/gecko-dev/gfx/gl/
    GLContextProviderEGL.cpp, line 208.
(gdb) break UnloadWaylandFunctions
Breakpoint 3 at 0x7ff235de8c: file ${PROJECT}/gecko-dev/gfx/gl/
    GLContextProviderEGL.cpp, line 260.
(gdb) c
Continuing.
[...]
No hits. So maybe we need to add a call to somewhere else that does get executed and, if necessary, add some code there to do something similar. My initial thinking is that the WaylandGLSurface constructor might be a good place. But it turns out this isn't being executed either:
(gdb) break WaylandGLSurface::WaylandGLSurface
Breakpoint 2 at 0x7ff2332fdc: file ${PROJECT}/gecko-dev/gfx/gl/
    GLContextProviderEGL.cpp, line 952.
(gdb) c
Continuing.
[...]
No hits. So I've settled on the GLContextEGL constructor, since I know this gets executed and it feels related to what we're trying to do. So I've added in a call to LoadWaylandFunctions() at the end of the constructor. My reasoning is that if the dynamic library is opened there, maybe it'll have the additional reference count it needs to keep the library open throughout execution.
GLContextEGL::GLContextEGL(const std::shared_ptr<EglDisplay> egl,
                           const GLContextDesc& desc, EGLConfig config,
                           EGLSurface surface, EGLContext context)
    : GLContext(desc, nullptr, false),
      mEgl(egl),
      mConfig(config),
      mContext(context),
      mSurface(surface),
      mFallbackSurface(EGL_NO_SURFACE) {
#ifdef DEBUG
  printf_stderr("Initializing context %p surface %p on display %p\n", mContext,
                mSurface, mEgl->mDisplay);
#endif

#if defined(MOZ_WIDGET_QT)
  LoadWaylandFunctions();
#endif
}
Unfortunately this change doesn't make any discernible difference to the crash, so I've decided to shift to investigation mode again to try to find out where the library is currently being opened, as well as where it's being closed, in the hope this will provide some clarity as to either what needs fixing, or where to put the dynamic library loading code.
$ gdb sailfish-browser
[...]
(gdb) b dlopen
Function "dlopen" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (dlopen) pending.
(gdb) b dlclose
Function "dlclose" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (dlclose) pending.
(gdb) r
Starting program: /usr/bin/sailfish-browser
[...]
Breakpoint 1, __dlopen (file=0x555565fa88 "/usr/lib64/qt5/plugins/
    platforms/libqwayland-generic.so", mode=4097) at dlopen.c:75
75      {
(gdb) c
Continuing.
[...]
Breakpoint 1, __dlopen (file=0x555566f428 "/usr/lib64/qt5/plugins/
    platforminputcontexts/libmaliitplatforminputcontextplugin.so",
    mode=4097) at dlopen.c:75
75      {
(gdb) c
Continuing.
[...]
It turns out there are a great many dynamic libraries opened by the code. I don't know why I was quite so surprised at just how many there are as it makes perfect sense in retrospect, but I was surprised nevertheless. The debug output goes on and on, but all looks similar to the above. So I've cut out everything except the function call and library names.
__dlopen() libqwayland-generic.so
__dlopen() libmaliitplatforminputcontextplugin.so
__dlopen() libcustomcontext.so
__dlopen() libwayland-egl.so
__dlopen() linker/q.so
__dlopen() eglplatform_wayland.so
__dlopen() liblgpllibs.so
__dlopen() libxul.so
__dlopen() file=0x0
rtld_active
__dlopen() file=0x0
__dlopen() libqgif.so
__dlopen() libqico.so
__dlopen() libqjpeg.so
__dlopen() libqsvg.so
__dlopen() libqtiff.so
__dlopen() libqwebp.so
__dlopen() libqtquick2plugin.so
__dlopen() libwindowplugin.so
__dlopen() libsailfishsilicaplugin.so
__dlopen() libsailfishpolicyplugin.so
__dlopen() libsystemsettingsplugin.so
__dlopen() libsystemsettingsplugin.so
__dlopen() libnemosystemsettings.so
__dlopen() libSailfishSilicaBackgroundPlugin.so
__dlopen() libGLESv1_CM.so.1
__dlopen() libGLESv1_CM.so.1
__dlopen() libGLESv1_CM.so.1
__dlopen() libGLESv1_CM.so.1
__dlopen() libGLESv1_CM.so.1
__dlopen() libnemoconfiguration.so
__dlopen() libnemodbus.so
__dlopen() libsailfishwebviewpickersplugin.so
__dlopen() libsailfishwebviewpopupsplugin.so
__dlopen() libsailfishwebviewcontrolsplugin.so
__dlopen() libqmlmozembedpluginqt5.so
__dlopen() libQOfonoQtDeclarative.so
__dlopen() libkeepaliveplugin.so
__dlopen() libsailfishwebengineplugin.so
__dlopen() libConnmanQtDeclarative.so
__dlopen() libnemopolicy.so
__dlopen() libnemoconnectivity.so
__dlopen() libsailfishshareplugin.so
__dlopen() libqtgraphicaleffectsprivate.so
__dlopen() libmodelsplugin.so
__dlopen() libqsqlite.so
__dlopen() libqconnmanbearer.so
__dlopen() libdeclarative_feedback.so
__dlopen() libqtfeedback_libngf.so
__dlopen() libGLESv2.so.2
__dlopen() libsoftokn3.so
__dlopen() libfreeblpriv3.so
__dlopen() libnspr4.so
__dlopen() libnssutil3.so
__dlopen() libnssckbi.so
__dlopen() libEGL.so.1
rtld_active()
__dlopen() libEGL.so
__dlopen() libEGL.so.1
__dlopen() libGL.so
__dlopen() libGL.so.1
__dlopen() libGLESv2.so
__dlopen() libGLESv2.so.2
rtld_active()
__dlopen() eglplatform_wayland.so
__dlopen() ibosclientcerts.so
Even with all the cruft cut out it's still a long list. Here the __dlopen lines are where a dynamic library is opened, whereas the rtld_active() calls are where a library is closed. Unfortunately the debugger doesn't provide clues as to which library is being closed.
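The trimming itself is mechanical, so it could be automated. As a minimal sketch (the helper name library_from_dlopen_line is hypothetical, and it assumes each wrapped gdb line has first been re-joined into a single line), this pulls the quoted path out of a breakpoint line and keeps only the file name:

```cpp
#include <string>

// Extracts the library file name from a re-joined gdb breakpoint line such as
//   Breakpoint 1, __dlopen (file=0x... "/usr/lib64/.../libfoo.so", mode=4097)
// Returns an empty string when the line carries no quoted path (for example
// the file=0x0 case).
std::string library_from_dlopen_line(const std::string& line) {
  const std::string::size_type open = line.find('"');
  if (open == std::string::npos) {
    return "";
  }
  const std::string::size_type close = line.find('"', open + 1);
  if (close == std::string::npos) {
    return "";
  }
  const std::string path = line.substr(open + 1, close - open - 1);
  // Keep only the part after the final slash, if there is one.
  const std::string::size_type slash = path.rfind('/');
  return slash == std::string::npos ? path : path.substr(slash + 1);
}
```

Lines without a quoted path come back empty, so the caller would need to handle the file=0x0 entries separately.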

For a few of the more interesting cases I also captured backtraces. These show us that the first couple of calls to open libwayland-egl.so are all triggered by the Qt code, rather than the gecko code. The eglplatform_wayland.so library is loaded by gecko, but it's loaded first as a result of a call to fGetDisplay() rather than the explicit load that I added. This explicit load does execute, but slightly later on and, in fact, it's the last thing to happen before the crash occurs.
Breakpoint 1, __dlopen (file=0x5555675868 "/usr/lib64/qt5/plugins/
    wayland-graphics-integration-client/libwayland-egl.so", mode=4097) at
    dlopen.c:75
(gdb) bt
#0  __dlopen (file=0x5555675868 "/usr/lib64/qt5/plugins/
    wayland-graphics-integration-client/libwayland-egl.so", mode=4097) at
    dlopen.c:75
#1  0x0000007fef9b6088 in ?? () from /usr/lib64/libQt5Core.so.5
#2  0x0000007fef9af4bc in ?? () from /usr/lib64/libQt5Core.so.5
#3  0x0000007fef9af94c in ?? () from /usr/lib64/libQt5Core.so.5
#4  0x0000007fef9a2820 in QFactoryLoader::instance(int) const () from /usr/
    lib64/libQt5Core.so.5
#5  0x0000007fe770cdb4 in ?? () from /usr/lib64/libQt5WaylandClient.so.5
#6  0x0000007fe76eea78 in QtWaylandClient::QWaylandIntegration::
    initializeClientBufferIntegration() () from /usr/lib64/
    libQt5WaylandClient.so.5
#7  0x0000007fe76eebfc in QtWaylandClient::QWaylandIntegration::
    clientBufferIntegration() const () from /usr/lib64/libQt5WaylandClient.so.5
#8  0x0000007fe76ee598 in QtWaylandClient::QWaylandIntegration::hasCapability(
    QPlatformIntegration::Capability) const ()
   from /usr/lib64/libQt5WaylandClient.so.5
#9  0x0000007ff099ec18 in QSGRenderLoop::instance() () from /usr/lib64/
    libQt5Quick.so.5
#10 0x0000007ff09cf2b4 in QQuickWindowPrivate::init(QQuickWindow*, 
    QQuickRenderControl*) () from /usr/lib64/libQt5Quick.so.5
#11 0x0000007ff0a7262c in QQuickView::QQuickView(QWindow*) () from /usr/lib64/
    libQt5Quick.so.5
#12 0x0000007ff0ca5a80 in MDeclarativeCachePrivate::qQuickView() () from /usr/
    lib64/libmdeclarativecache5.so.0
#13 0x000000555557b31c in main (argc=<optimized out>, argv=0x7ffffff298) at 
    main.cpp:88
[...]
Breakpoint 1, __dlopen (file=0x7fffffe418 "/usr/lib64/libhybris//
    eglplatform_wayland.so", mode=1) at dlopen.c:75
(gdb) bt
#0  __dlopen (file=0x7fffffe418 "/usr/lib64/libhybris//
    eglplatform_wayland.so", mode=1) at dlopen.c:75
#1  0x0000007feeebbb04 in ws_init () from /usr/lib64/libEGL.so.1
#2  0x0000007feeeba478 in ?? () from /usr/lib64/libEGL.so.1
#3  0x0000007fe751f008 in ?? () from /usr/lib64/qt5/plugins/
    wayland-graphics-integration-client/libwayland-egl.so
#4  0x0000007fe76ee8f0 in QtWaylandClient::QWaylandIntegration::
    initializeClientBufferIntegration() () from /usr/lib64/
    libQt5WaylandClient.so.5
#5  0x0000007fe76eebfc in QtWaylandClient::QWaylandIntegration::
    clientBufferIntegration() const () from /usr/lib64/libQt5WaylandClient.so.5
[...]
Thread 41 "Compositor" hit Breakpoint 1, __dlopen (file=0x7fb4f43498
    "/usr/lib64/libhybris//eglplatform_wayland.so", mode=1) at dlopen.c:75
(gdb) bt
#0  __dlopen (file=0x7fb4f43498 "/usr/lib64/libhybris//
    eglplatform_wayland.so", mode=1) at dlopen.c:75
#1  0x0000007feeebbb04 in ws_init () from /usr/lib64/libEGL.so.1
#2  0x0000007feeeba478 in ?? () from /usr/lib64/libEGL.so.1
#3  0x0000007ff236afa0 in mozilla::gl::GLLibraryEGL::fGetDisplay (
    display_id=0x0, this=0x7ee81a3b60)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.h:193
#4  mozilla::gl::GetAndInitDisplay (egl=..., displayType=displayType@entry=0x0, 
    display=display@entry=0x0)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:151
#5  0x0000007ff236b550 in mozilla::gl::GLLibraryEGL::CreateDisplay (
    this=this@entry=0x7ee81a3b60, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7fb4f43fe0, aDisplay=aDisplay@entry=0x0)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:813
#6  0x0000007ff236c6a4 in mozilla::gl::GLLibraryEGL::DefaultDisplay (
    this=0x7ee81a3b60, out_failureId=out_failureId@entry=0x7fb4f43fe0)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:745
#7  0x0000007ff236c7c4 in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7ee80048a0, aSurface=0x5555b9ac50, 
    aDisplay=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h:150
#8  0x0000007ff4d77264 in mozilla::embedlite::nsWindow::GetGLContext (
    this=this@entry=0x7fbc9b8e10)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:405
#9  0x0000007ff4d7741c in mozilla::embedlite::nsWindow::GetNativeData (
    this=0x7fbc9b8e10, aDataType=12)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:173
#10 0x0000007ff23e80f4 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7ee81a32a0)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:232
#11 0x0000007ff23fd31c in mozilla::layers::CompositorOGL::Initialize (
    this=0x7ee81a32a0, out_failureReason=0x7fb4f445a0)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:387
#12 0x0000007ff251cd50 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7fbc9ab8f0, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#13 0x0000007ff2533bb4 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7fbc9ab8f0, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#14 0x0000007ff2533d40 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7fbc9ab8f0, 
    aBackendHints=..., aId=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1546
[...]
Thread 41 "Compositor" hit Breakpoint 1, __dlopen (file=0x7ff62b2bc0
    "eglplatform_wayland.so", mode=1) at dlopen.c:75
(gdb) bt
#0  __dlopen (file=0x7ff62b2bc0 "eglplatform_wayland.so", mode=1) at
    dlopen.c:75
#1  0x0000007ff2353edc in mozilla::gl::GLContextEGL::GLContextEGL (
    this=0x7ee81aaea0, 
    egl=std::shared_ptr<mozilla::gl::EglDisplay> (use count 3, weak count 2) = 
    {...}, desc=..., config=0x0, surface=0x5555b9ac50, context=0x7ee80048a0)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:519
#2  0x0000007ff236c81c in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7ee80048a0, aSurface=0x5555b9ac50, 
    aDisplay=<optimized out>) at ${PROJECT}/gecko-dev/gfx/gl/
    GLContextProviderEGL.cpp:1216
#3  0x0000007ff4d77264 in mozilla::embedlite::nsWindow::GetGLContext (
    this=this@entry=0x7fbc9b8e10)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:405
#4  0x0000007ff4d7741c in mozilla::embedlite::nsWindow::GetNativeData (
    this=0x7fbc9b8e10, aDataType=12)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:173
#5  0x0000007ff23e80f4 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7ee81a32a0)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:232
#6  0x0000007ff23fd31c in mozilla::layers::CompositorOGL::Initialize (
    this=0x7ee81a32a0, out_failureReason=0x7fb4f445a0)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:387
#7  0x0000007ff251cd50 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7fbc9ab8f0, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#8  0x0000007ff2533bb4 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7fbc9ab8f0, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#9  0x0000007ff2533d40 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7fbc9ab8f0, 
    aBackendHints=..., aId=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#10 0x0000007ff4d5e6e4 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7fbc9ab8f0, aBackendHints=..., 
    aId=...)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:80
#11 0x0000007ff1df7634 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7fbc9ab8f0, msg__=...) at 
    PCompositorBridgeParent.cpp:1285
[...]
So I've ended up adding the following code into the GLContextProviderEGL.cpp file. This is in the GLContextEGL constructor and the new parts are the static variables declared just above it and the section surrounded by the MOZ_WIDGET_QT pre-processor condition:
static bool platformEglFunctionsLoaded = false;
static void *platformEglHandle = nullptr;
static void (*_wl_proxy_destroy_platform)(struct wl_proxy *proxy) = nullptr;


GLContextEGL::GLContextEGL(const std::shared_ptr<EglDisplay> egl,
                           const GLContextDesc& desc, EGLConfig config,
                           EGLSurface surface, EGLContext context)
    : GLContext(desc, nullptr, false),
      mEgl(egl),
      mConfig(config),
      mContext(context),
      mSurface(surface),
      mFallbackSurface(EGL_NO_SURFACE) {
#ifdef DEBUG
  printf_stderr("Initializing context %p surface %p on display %p\n", mContext,
                mSurface, mEgl->mDisplay);
#endif

#if defined(MOZ_WIDGET_QT)
  printf_stderr("DLOPEN: checking eglplatform_wayland.so\n");
  if (platformEglFunctionsLoaded) {
    printf_stderr("DLOPEN: already loaded\n");
    return;
  }

  printf_stderr("DLOPEN: loading eglplatform_wayland.so\n");
  platformEglHandle = dlopen("eglplatform_wayland.so",
      RTLD_NOW | RTLD_GLOBAL | RTLD_NODELETE | RTLD_DEEPBIND);
  if (!platformEglHandle) {
    printf_stderr("DLOPEN: error loading eglplatform_wayland.so\n");
  }
  platformEglFunctionsLoaded = true;

  *(void **)(&_wl_proxy_destroy_platform) = dlsym(platformEglHandle,
      "wl_proxy_destroy");
  if (!_wl_proxy_destroy_platform) {
    printf_stderr(
        "DLOPEN: Error loading wl_proxy_destroy from eglplatform_wayland.so\n");
  }
  else {
    printf_stderr(
        "DLOPEN: loaded wl_proxy_destroy from eglplatform_wayland.so\n");
  }

  printf_stderr("DLOPEN: loaded eglplatform_wayland.so\n");
#endif

}
The idea here is that the dynamic library is loaded, with all of the flags that looked relevant applied. A reference to the wl_proxy_destroy() method is pulled from the library and all this is just to try to ensure the library stays loaded. But, unfortunately, with all of this added the crash still occurs. We can see that the code is being executed from the debug output, it's just not having the desired effect:
$ sailfish-browser
[...]
Created LOG for EmbedLiteLayerManager
library "libui_compat_layer.so" not found
DLOPEN: checking eglplatform_wayland.so
DLOPEN: loading eglplatform_wayland.so
DLOPEN: loaded wl_proxy_destroy from eglplatform_wayland.so
DLOPEN: loaded eglplatform_wayland.so
Segmentation fault
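For reference, here's a minimal standalone sketch of what RTLD_NODELETE is documented to guarantee, assuming glibc's dlfcn.h (RTLD_NODELETE and RTLD_NOLOAD are extensions rather than part of the base POSIX interface). The helper name pin_library is hypothetical, and RTLD_DEEPBIND is left out because it plays no part in keeping the library resident. The function opens a library, immediately drops its own reference, then asks the loader whether the library is still in the process:

```cpp
#include <dlfcn.h>

// Opens `name` with RTLD_NODELETE, drops our own reference straight away,
// then uses RTLD_NOLOAD to ask the loader whether the library is still
// resident. Returns false if the library could not be opened at all.
bool pin_library(const char* name) {
  void* handle = dlopen(name, RTLD_NOW | RTLD_GLOBAL | RTLD_NODELETE);
  if (!handle) {
    return false;
  }
  dlclose(handle);  // With RTLD_NODELETE this should not unload the library.

  // RTLD_NOLOAD never loads anything: it only returns a handle if the
  // library is already present in the process.
  void* probe = dlopen(name, RTLD_NOW | RTLD_NOLOAD);
  if (!probe) {
    return false;
  }
  dlclose(probe);
  return true;
}
```

If this behaves as documented for eglplatform_wayland.so too, then the crash persisting suggests that gecko dropping its own reference isn't the whole story, which fits with the timing theory that follows.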
One possibility is that the execution of the code is happening too late to be useful. If Qt has already opened the library, it's possible that this code is having no real effect when it comes to being closed again. To test out this theory I've moved this code to the very top of the main() function in the sailfish-browser code. This is practically the first thing that gets executed, so if anywhere is going to work, it should be here.

Here's what I've added:
static void platform_egl_workaround() {
  static void *platformEglHandle = nullptr;
  if (platformEglHandle) {
    return;
  }

  printf("Pre-loading eglplatform_wayland.so\n");
  platformEglHandle = dlopen("/usr/lib64/libhybris/eglplatform_wayland.so",
                             RTLD_LAZY);
  if (!platformEglHandle) {
    printf("Error pre-loading eglplatform_wayland.so\n");
  }
}
With this added the browser now runs successfully, even without LD_PRELOAD being set. So that's a nice result. But this is definitely not the right place for it. For example, although this fixes things for the browser, the problem persists with the WebView, since this doesn't use the browser's main() function at all.

A bit more testing shows that even if I move it to right at the end of the main() function, just before the execution loop is started, the crash is still avoided:
Q_DECL_EXPORT int main(int argc, char *argv[])
{
[...]
    platform_egl_workaround();

    return app->exec();
}
That means I have a fair bit of leeway here in finding a suitable place. There are a number of options, but I'm gravitating towards the qtmozembed codebase, since this is used by both the browser and WebView and wraps the rendering code. I've therefore added something similar in to the QMozContextPrivate constructor.

Happily this still works: the browser and WebView both work without the need to set the LD_PRELOAD variable. I've cleaned up the code so that the library is searched for, rather than having the hardcoded path, but it's otherwise essentially the same. We can see the debug output when executing with an appropriate QT_LOGGING_RULES setting. See here, where we have the debug output from platform_egl_workaround_open:
$ QT_LOGGING_RULES="org.sailfishos.embedliteext=true" harbour-webview
[D] unknown:0 - Using Wayland-EGL
library "libui_compat_layer.so" not found
library "libutils.so" not found
library "libcutils.so" not found
library "libhardware.so" not found
library "android.hardware.graphics.mapper@2.0.so" not found
library "android.hardware.graphics.mapper@2.1.so" not found
library "android.hardware.graphics.mapper@3.0.so" not found
library "android.hardware.graphics.mapper@4.0.so" not found
library "libc++.so" not found
library "libhidlbase.so" not found
library "libgralloctypes.so" not found
library "android.hardware.graphics.common@1.2.so" not found
library "libion.so" not found
library "libz.so" not found
library "libhidlmemory.so" not found
library "android.hidl.memory@1.0.so" not found
library "vendor.qti.qspmhal@1.0.so" not found
[D] QMozContextPrivate::QMozContextPrivate:102 - Create new Context: 
    0x71a7f7e778 , parent: 0x0 /usr/bin
[D] platform_egl_workaround_open:71 - Pre-loading eglplatform_wayland.so at 
    from "/usr/lib64/libhybris//eglplatform_wayland.so"
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
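To give a flavour of what searching for the library rather than hardcoding the path can look like, here's a minimal sketch. It isn't the code that went into qtmozembed: the function name find_library is hypothetical and the list of candidate directories is an assumption left to the caller.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Returns the first "<dir>/<name>" that exists as a regular file, or an
// empty string if none of the candidate directories contains it.
std::string find_library(const std::vector<std::string>& dirs,
                         const std::string& name) {
  namespace fs = std::filesystem;
  for (const std::string& dir : dirs) {
    const fs::path candidate = fs::path(dir) / name;
    std::error_code ec;  // Non-throwing overload: an error just means "no".
    if (fs::is_regular_file(candidate, ec)) {
      return candidate.string();
    }
  }
  return "";
}
```

A caller might pass something like {"/usr/lib64/libhybris", "/usr/lib/libhybris"} (the second directory is a guess for 32-bit devices) and hand a non-empty result to dlopen().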
I admit that this isn't the nicest of solutions: it's very much a hack. But Raine seems to think that the underlying problem may be in the libhybris code rather than in the gecko code. He thinks it may be due to a lack of reference counting in ws_init and ws_Terminate. If that's the case then the proper fix will need to go there and, once that's deployed, this hack can be removed.
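To make the suspected failure mode concrete, here's a toy model of an initialise/terminate pair with and without reference counting. None of this is libhybris code: the Backend class and its method names are purely illustrative, and whether ws_init and ws_Terminate really behave like the uncounted case is exactly what remains unconfirmed.

```cpp
#include <cassert>

// Toy model of a window-system backend that is loaded by init() and
// unloaded by terminate(). Illustrative only; not libhybris code.
class Backend {
 public:
  explicit Backend(bool refCounted) : mRefCounted(refCounted) {}

  void init() {
    ++mUsers;
    mLoaded = true;
  }

  void terminate() {
    assert(mUsers > 0);
    --mUsers;
    // Without reference counting the first terminate() unloads the backend
    // even though another user may still depend on it.
    if (!mRefCounted || mUsers == 0) {
      mLoaded = false;
    }
  }

  bool loaded() const { return mLoaded; }

 private:
  bool mRefCounted;
  int mUsers = 0;
  bool mLoaded = false;
};

// Two users (think Qt and gecko) initialise the backend and one of them then
// shuts down. Returns whether the backend is still there for the other.
bool survives_first_terminate(bool refCounted) {
  Backend backend(refCounted);
  backend.init();       // first user
  backend.init();       // second user
  backend.terminate();  // first user finishes
  return backend.loaded();
}
```

In the counted case the backend survives the first terminate(); in the uncounted case it's gone while the second user still expects it, which is the kind of premature unload that an extra reference from a pre-load would paper over.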

I'm not so convinced myself. ESR 78 is working fine on Sailfish OS 4.6 and it's unclear to me why it would be unaffected by this if it's a problem in the underlying libhybris code. Nevertheless for now the fix is there and it's working. Maybe this will be something to return to in the future, but for now, that's good enough for me.

And that's also enough work on this for one day. Tomorrow I'll finalise this so that I can then move on to the tidying up phase!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
22 Aug 2024 : Day 327 #
Before getting into development, Leif-Jöran Olsson shared this beautiful "half verse" (his description!) on the topic of travelling to Jersey. It captures my trip back in a few pithy words and I really can't tell you how much I enjoy receiving these. Thank you ljo!
 
Flypig just caught the Jersey(town) ferry, it is not a cold day in August like he said it would be. A case of gone was all he carried ... General wakeup call at 5 am. Might the egl build be ready? Fingers crossed.

Apart from my overnight ferry build, if you've been following along you may also recall that I recently updated one of my dev devices to Sailfish OS 4.6, so that I could debug the failing ESR 91 build on it. Previously I'd been using the device to run ESR 78 for comparison purposes. Because it wasn't my main development device I was using an Xperia 10 II, with my Xperia 10 III as my main dev device running the latest ESR 91 build.

After doing my best using the Xperia 10 II for debugging the ESR 91 code I eventually got fed up of waiting for the debugger. The 10 II isn't a bad device, but it's now getting on a bit (released in 2020) and it just doesn't have the raw power of the Xperia 10 III. This really shows itself when debugging code; I found myself waiting ten minutes or more for gdb to print out a member variable of a class instance.

So this morning I decided to upgrade my Xperia 10 III dev device to Sailfish OS 4.6 as well. This is a bit of a bold move because it means I'll have no devices on which ESR 91 will actually run. It works nicely on Sailfish OS 4.5 but I've yet to find the reason for it failing on 4.6. I really do need the extra oomph of the 10 III though. It'll motivate me (if I wasn't already motivated enough) to get this issue fixed as soon as possible.

And I really do want to get it fixed. This is potentially the last issue I need to deal with before moving on to the "tidying up" phase of this whole process. It'd be a real weight off my shoulders if it were working.

Having performed the upgrade, copied over and installed all of the packages, I'm now ready to actually benefit from that additional debugging power. As I left things yesterday it looked like there was an issue occurring in the CompositorBridgeParent::NewCompositor() method. There were two reasons I felt the problem might be happening there. First, when stepping through the code, the segmentation fault happened mid-execution of the method.

That in itself isn't a guarantee of it being the source of the problem because the crash is happening in a different thread. Still, the two could be related. But there's a second reason as well, and that's that it looks like the compositor isn't being successfully created. If that's true, it would be a prime candidate for a big, potentially browser-crashing, problem that needs fixing.

Here's the code that we're stepping through for reference (abridged to aid clarity):
RefPtr<Compositor> CompositorBridgeParent::NewCompositor(
    const nsTArray<LayersBackend>& aBackendHints) {
  for (size_t i = 0; i < aBackendHints.Length(); ++i) {
    RefPtr<Compositor> compositor;
    if (aBackendHints[i] == LayersBackend::LAYERS_OPENGL) {
      compositor =
          new CompositorOGL(this, mWidget, mEGLSurfaceSize.width,
                            mEGLSurfaceSize.height, mUseExternalSurfaceSize);
    } else if (aBackendHints[i] == LayersBackend::LAYERS_BASIC) {
      compositor = new BasicCompositor(this, mWidget);
    }
    nsCString failureReason;

    const int max_fb_size = 32767;
    const LayoutDeviceIntSize size = mWidget->GetClientSize();
    if (size.width > max_fb_size || size.height > max_fb_size) {
      failureReason = "FEATURE_FAILURE_MAX_FRAMEBUFFER_SIZE";
      return nullptr;
    }

    if (compositor && compositor->Initialize(&failureReason)) {
      if (failureReason.IsEmpty()) {
        failureReason = "SUCCESS";
      }

      // should only report success here
      if (aBackendHints[i] == LayersBackend::LAYERS_OPENGL) {
        Telemetry::Accumulate(Telemetry::OPENGL_COMPOSITING_FAILURE_ID,
                              failureReason);
      }

      return compositor;
    }

    // report any failure reasons here
    if (aBackendHints[i] == LayersBackend::LAYERS_OPENGL) {
      gfxCriticalNote << "[OPENGL] Failed to init compositor with reason: "
                      << failureReason.get();
      Telemetry::Accumulate(Telemetry::OPENGL_COMPOSITING_FAILURE_ID,
                            failureReason);
    }
  }

  return nullptr;
}
When I stepped through this yesterday the program counter jumped from the start of the for loop straight through to the line that writes to gfxCriticalNote for failure reporting. That suggested to me that the code in between was being skipped, most likely because none of the conditions were being met to execute it. In particular, it looked like the call to create a CompositorOGL instance was being skipped.

When I try stepping through the code again today, I realise that this was a misreading of what's happening. The jump is actually due to a compiler optimisation and, in fact, the code is creating a CompositorOGL instance after all.

Here's the step through of the code from this morning. I know these debugging step-throughs are hard to follow on their own, so I've added some annotations using comments as I've gone along to try to explain my thinking.
Thread 38 "Compositor" hit Breakpoint 1, mozilla::layers::
    CompositorBridgeParent::NewCompositor (this=this@entry=0x7fbcbb0190, 
    aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1455
1455        const nsTArray<LayersBackend>& aBackendHints) {
(gdb) n
1456      for (size_t i = 0; i < aBackendHints.Length(); ++i) {
(gdb) p aBackendHints.mHdr->mLength
$7 = 2
(gdb) n
1515          gfxCriticalNote << "[OPENGL] Failed to init compositor with reason: "
(gdb) # Don't be fooled, this is an optimisation
(gdb) n
1458        if (aBackendHints[i] == LayersBackend::LAYERS_OPENGL) {
(gdb) # We're back up top again
(gdb) n
1461                                mEGLSurfaceSize.height, 
    mUseExternalSurfaceSize);
(gdb) # We have a LAYERS_OPENGL situation, which is good
(gdb) n
1476        nsCString failureReason;
(gdb) n
1485        const LayoutDeviceIntSize size = mWidget->GetClientSize();
(gdb) n
1486        if (size.width > max_fb_size || size.height > max_fb_size) {
(gdb) n
1493        if (compositor && compositor->Initialize(&failureReason)) {
(gdb) # We want to go into that Initialize call
(gdb) s
mozilla::layers::CompositorOGL::Initialize (this=0x7ed81a2b90, 
    out_failureReason=0x7fb50ed5a0) at gfx/layers/opengl/CompositorOGL.cpp:378
378     bool CompositorOGL::Initialize(nsCString* const out_failureReason) {
(gdb) n
379       ScopedGfxFeatureReporter reporter("GL Layers");
(gdb) n
385       if (!mGLContext) {
(gdb) p mGLContext
$8 = {mRawPtr = 0x0}
(gdb) n
387         mGLContext = CreateContext();
(gdb) # We're going to head into this too
(gdb) s
mozilla::layers::CompositorOGL::CreateContext (this=this@entry=0x7ed81a2b90) at 
    gfx/layers/opengl/CompositorOGL.cpp:227
227     already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() 
    {
(gdb) n
231       nsIWidget* widget = mWidget->RealWidget();
(gdb) n
232       void* widgetOpenGLContext =
(gdb) n

Thread 1 "sailfish-browse" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7feb174010 (LWP 2399)]
0x0000007fe761e2d0 in wl_display_read_events () from /usr/lib64/
    libwayland-client.so.0
(gdb) 
So while we do see the crash again here, it doesn't seem to be due to a missing compositor. Looking at the debugging above, it looks more likely the call to CompositorOGL::CreateContext() is the problem. Unfortunately, despite stepping through that method and trying my best to pin things down, I've still yet to establish whether this is actually causing the crash.

However, something has happened since that's caused me to change tack. Mid-afternoon I received a message from Raine letting me know that he and Frajo have managed to get the new ESR 91 version working on a Sailfish OS 4.6 device. Here's Raine's comment discussing what Frajo has been working on:
 
Could you try this?
LD_DEBUG=libs LD_PRELOAD=/usr/lib64/libhybris/eglplatform_wayland.so 
    sailfish-browser
Above worked for him. This should keep the eglplatform_wayland.so loaded and prevent unloading. The reason looks to be somewhere where dlunload happens, qtmozembed spots didn’t help but the issue might in libhybris itself.


If this is indeed the problem then this will be a great result. So let's try it on my own device here. When I set LD_DEBUG=libs as Raine suggested there's a huge amount of debug output generated (way more than I can reasonably include here) so I've dropped that part for the sake of brevity. But I get the same overall result whether it's set or not:
$ LD_PRELOAD=/usr/lib64/libhybris/eglplatform_wayland.so sailfish-browser
[D] unknown:0 - Using Wayland-EGL
library "libui_compat_layer.so" not found
library "libutils.so" not found
library "libcutils.so" not found
library "libhardware.so" not found
library "android.hardware.graphics.mapper@2.0.so" not found
library "android.hardware.graphics.mapper@2.1.so" not found
library "android.hardware.graphics.mapper@3.0.so" not found
library "android.hardware.graphics.mapper@4.0.so" not found
library "libc++.so" not found
library "libhidlbase.so" not found
library "libgralloctypes.so" not found
library "android.hardware.graphics.common@1.2.so" not found
library "libion.so" not found
library "libz.so" not found
library "libhidlmemory.so" not found
library "android.hidl.memory@1.0.so" not found
library "vendor.qti.qspmhal@1.0.so" not found
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
Created LOG for EmbedLiteTrace
[W] unknown:0 - MeeGo.QOfono QML module name is deprecated and subject for 
    removal. Please adapt code to "import QOfono".
[D] onCompleted:108 - ViewPlaceholder requires a SilicaFlickable parent
Created LOG for EmbedLite
[D] unknown:0 - Updating services as GetServices returns
[D] unknown:0 - No default route set, services: 19
[D] unknown:0 - Selected service "Moominland" path "/net/connman/
    service/wifi_3c38f400343d_4d6f6f6d696e6c616e64_managed_psk"
Created LOG for EmbedPrefs
Created LOG for EmbedLiteLayerManager
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create 
    EGLConfig! (t=1.32095) [GFX1-]: Failed to create EGLConfig!
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create 
    EGLConfig! (t=1.32095) |[1][GFX1-]: Failed to create EGLConfig! (t=1.32193) 
    [GFX1-]: Failed to create EGLConfig!
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create 
    EGLConfig! (t=1.32095) |[1][GFX1-]: Failed to create EGLConfig! (t=1.32193) 
    |[2][GFX1-]: [OPENGL] Failed to init compositor with reason: 
    FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1.32273) [GFX1-]: [OPENGL] Failed 
    to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT
sailfish-browser: ws.c:150: ws_prepareSwap: Assertion `ws != NULL' failed.
Aborted
Okay, so that's definitely not bringing up a usable browser just yet. However, you may recall that a couple of days back I made the following change to add a condition around the call to eglInitialize():
  if (display == EGL_NO_DISPLAY) {
    if (!lib.fInitialize(display, nullptr, nullptr)) {
      return nullptr;
    }
  }
This change looked like it was giving positive results, but I've never actually had the browser working with it like this. So I really need to try Frajo's fix with the version of the build that Frajo is also using, which is without this change.
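For reference, the idea behind that condition can be sketched in isolation. This is only a minimal illustration, assuming the aim is to leave a display alone when one has already been set up elsewhere. FakeEgl, FakeDisplay, kNoDisplay, AcquireDisplay() and DemoAcquireDisplay() are all hypothetical names invented for the sketch; none of them are the real EGL or gecko APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the EGL types; not the real EGL or gecko API.
using FakeDisplay = int;
constexpr FakeDisplay kNoDisplay = 0;

// Records which calls were made so the behaviour can be inspected.
struct FakeEgl {
  std::vector<std::string> calls;

  FakeDisplay GetDisplay() {
    calls.push_back("GetDisplay");
    return 1;  // pretend the platform hands back display 1
  }

  bool Initialize(FakeDisplay display) {
    calls.push_back("Initialize");
    return display != kNoDisplay;
  }
};

// If the caller already holds a display (for example one that some other
// component set up), reuse it untouched. Only when none is supplied do we
// fetch and initialise one ourselves.
FakeDisplay AcquireDisplay(FakeEgl& egl, FakeDisplay display = kNoDisplay) {
  if (display == kNoDisplay) {
    display = egl.GetDisplay();
    if (!egl.Initialize(display)) {
      return kNoDisplay;
    }
  }
  return display;
}

// Small self-check: passing in an existing display triggers no EGL calls.
void DemoAcquireDisplay() {
  FakeEgl egl;
  assert(AcquireDisplay(egl, 1) == 1);
  assert(egl.calls.empty());
}
```

Calling AcquireDisplay() without a display makes both calls, whereas passing an existing display in makes neither, which is the behaviour the condition was intended to give.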

I've therefore reverted this change and have started building a fresh version of the packages to test out. I'm excited to find out whether this will make the difference. I'm trying not to get too hopeful just yet, but if it does, that would be a real step forwards.

After a lengthy wait for the build to complete it's now late at night, but I'm eager to test out Frajo's fix. I run the command with some trepidation...
$ LD_PRELOAD=/usr/lib64/libhybris/eglplatform_wayland.so gdb sailfish-browser
[...]
And it works! With this LD_PRELOAD variable set, ESR 91 now runs nicely on Sailfish OS 4.6. I'm pretty certain I'd have been scrabbling around with the debugger for weeks without this fix from Frajo. Both Frajo and Raine are top-tier developers, so I'm not surprised they were able to come up with a solution so quickly. It really highlights to me how important it is to have amazing devs like Frajo and Raine working on Sailfish OS.

I feel like we're in touching distance of the finish line now. Although this trick gets the browser to work, it still needs a proper fix of course, but it offers hope that we're not too far away from one. Once this is sorted, and unless something else comes up, this will be the last thing on my list before I move on to tidying-up mode.

So this is an exciting time. It's been over a year now; I'll be happy when I can finally say ESR 91 is ready.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
21 Aug 2024 : Day 326 #
Since yesterday I've been travelling by ferry between Jersey and Great Britain. It's an eight-hour overnight trip, which was thankfully enough time to build a new set of packages ready for me to test the new version this morning. To recap on where things are at, I'm currently trying to get ESR 91 to work on Sailfish OS 4.6. I identified differences between ESR 78 and ESR 91 in the way the display was being set up, which I thought might have been the cause of the problem. In particular, while on ESR 78 the eglInitialize() and eglGetDisplay() EGL methods are called only by the Qt Wayland client code, on ESR 91 they're being called a couple more times by the gecko code as well.

Having updated the code to try to fix this, I'm now going to test the result by placing breakpoints on these two methods of the latest ESR 91 build running on Sailfish OS 4.6. Hopefully the hits will now come only from the Qt Wayland client code and not from the gecko code.
(gdb) b eglInitialize
Breakpoint 2 at 0x7fb6bbb580
(gdb) b eglGetDisplay
Breakpoint 3 at 0x7fb6bbb568
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/sailfish-browser 
[...]
[D] unknown:0 - Using Wayland-EGL

Breakpoint 3, 0x0000007fb6bbb568 in eglGetDisplay () from /usr/lib64/libEGL.so.1
(gdb) bt
#0  0x0000007fb6bbb568 in eglGetDisplay () from /usr/lib64/libEGL.so.1
#1  0x0000007faf220008 in ?? () from /usr/lib64/qt5/plugins/
    wayland-graphics-integration-client/libwayland-egl.so
#2  0x0000007faf3ef8f0 in QtWaylandClient::QWaylandIntegration::
    initializeClientBufferIntegration() () from /usr/lib64/
    libQt5WaylandClient.so.5
#3  0x0000007faf3efbfc in QtWaylandClient::QWaylandIntegration::
    clientBufferIntegration() const () from /usr/lib64/libQt5WaylandClient.so.5
#4  0x0000007faf3ef598 in QtWaylandClient::QWaylandIntegration::hasCapability(
    QPlatformIntegration::Capability) const ()
   from /usr/lib64/libQt5WaylandClient.so.5
#5  0x0000007fb869fc18 in QSGRenderLoop::instance() () from /usr/lib64/
    libQt5Quick.so.5
#6  0x0000007fb86d02b4 in QQuickWindowPrivate::init(QQuickWindow*, 
    QQuickRenderControl*) () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fb877362c in QQuickView::QQuickView(QWindow*) () from /usr/lib64/
    libQt5Quick.so.5
#8  0x0000007fb89a6a80 in MDeclarativeCachePrivate::qQuickView() () from /usr/
    lib64/libmdeclarativecache5.so.0
#9  0x000000555557b31c in main (argc=<optimized out>, argv=0x7ffffff298) at 
    main.cpp:88
(gdb) c
Continuing.
[...]
library "eglSubDriverAndroid.so" not found

Breakpoint 2, 0x0000007fb6bbb580 in eglInitialize () from /usr/lib64/libEGL.so.1
(gdb) bt
#0  0x0000007fb6bbb580 in eglInitialize () from /usr/lib64/libEGL.so.1
#1  0x0000007faf22001c in ?? () from /usr/lib64/qt5/plugins/
    wayland-graphics-integration-client/libwayland-egl.so
#2  0x0000007faf3ef8f0 in QtWaylandClient::QWaylandIntegration::
    initializeClientBufferIntegration() () from /usr/lib64/
    libQt5WaylandClient.so.5
#3  0x0000007faf3efbfc in QtWaylandClient::QWaylandIntegration::
    clientBufferIntegration() const () from /usr/lib64/libQt5WaylandClient.so.5
#4  0x0000007faf3ef598 in QtWaylandClient::QWaylandIntegration::hasCapability(
    QPlatformIntegration::Capability) const ()
   from /usr/lib64/libQt5WaylandClient.so.5
#5  0x0000007fb869fc18 in QSGRenderLoop::instance() () from /usr/lib64/
    libQt5Quick.so.5
#6  0x0000007fb86d02b4 in QQuickWindowPrivate::init(QQuickWindow*, 
    QQuickRenderControl*) () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fb877362c in QQuickView::QQuickView(QWindow*) () from /usr/lib64/
    libQt5Quick.so.5
#8  0x0000007fb89a6a80 in MDeclarativeCachePrivate::qQuickView() () from /usr/
    lib64/libmdeclarativecache5.so.0
#9  0x000000555557b31c in main (argc=<optimized out>, argv=0x7ffffff298) at 
    main.cpp:88
(gdb) c
Continuing.
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
Created LOG for EmbedLiteTrace
[...]
Created LOG for EmbedLiteLayerManager

Thread 38 "Compositor" received signal SIGSEGV, Segmentation fault.
0x0000007faef777e0 in ?? ()
(gdb) bt
#0  0x0000007faef777e0 in ?? ()
#1  0x0000007fac70f310 in ?? ()
#2  0x0000007fad8dafb0 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
And what we're seeing is just that: the two hits are both coming from the Qt Wayland client code, just as we were hoping. That's the good news. The bad news is that the browser still crashes at start-up, so there's still more work to be done.

So once again I want to try to pin down the location of the crash. This would be a lot easier if we could get a decent backtrace at the point the crash occurs, but unfortunately the backtrace is being lost for some reason. So I'm stepping through the code instead. Immediately I find that the crash is happening when Init() is called from the GLLibraryEGL::Create() method, straight after the GLLibraryEGL constructor has run:
Thread 40 "Compositor" hit Breakpoint 5, mozilla::gl::GLLibraryEGL::
    GLLibraryEGL (this=0x7e8c111410) at gfx/gl/GLLibraryEGL.h:113
113     class GLLibraryEGL final {
(gdb) n
mozilla::gl::GLLibraryEGL::Create (
    out_failureId=out_failureId@entry=0x7efaac6fe0, aDisplay=0x1) at gfx/gl/
    GLLibraryEGL.cpp:344
344       RefPtr<GLLibraryEGL> ret = new GLLibraryEGL;
(gdb) n
345       if (!ret->Init(false, out_failureId, aDisplay)) {
(gdb) n
[New Thread 0x7fa6010830 (LWP 12021)]

Thread 40 "Compositor" received signal SIGSEGV, Segmentation fault.
0x0000007faef777e0 in ?? ()
(gdb) 
I'm going to need to step inside that Init() method to find out what's really going on:
(gdb) break GLLibraryEGL::Init
Breakpoint 6 at 0x7fba06bd78: file gfx/gl/GLLibraryEGL.cpp, line 351.
(gdb) c
Continuing.

Thread 40 "Compositor" hit Breakpoint 6, mozilla::gl::GLLibraryEGL::
    Init (this=this@entry=0x7e941114d0, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7efaac6fe0, aDisplay=aDisplay@entry=0x1) 
    at gfx/gl/GLLibraryEGL.cpp:351
351     bool GLLibraryEGL::Init(bool forceAccel, nsACString* const 
    out_failureId, EGLDisplay aDisplay) {
(gdb) n
352       MOZ_RELEASE_ASSERT(!mSymbols.fTerminate);
(gdb) n
354       mozilla::ScopedGfxFeatureReporter reporter("EGL");
(gdb) n
397       if (!mEGLLibrary) {
(gdb) p mEGLLibrary
$1 = (PRLibrary *) 0x0
(gdb) n
398         mEGLLibrary = PR_LoadLibrary("libEGL.so");
(gdb) n
401       if (!mEGLLibrary) {
(gdb) p mEGLLibrary
$2 = (PRLibrary *) 0x0
(gdb) n
402         mEGLLibrary = PR_LoadLibrary("libEGL.so.1");
[...]
(gdb) n
430         mGLLibrary = PR_LoadLibrary(GLES2_LIB2);
(gdb) n
436       if (!mEGLLibrary || !mGLLibrary) {
(gdb) p mEGLLibrary
$3 = (PRLibrary *) 0x7e94111410
(gdb) n
453       SymLoadStruct earlySymbols[] = {SYMBOL(GetDisplay),
(gdb) n
484         const SymbolLoader libLoader(*mEGLLibrary);
(gdb) n
485         if (!libLoader.LoadSymbols(earlySymbols)) {
(gdb) n
494         const char internalFuncName[] =
(gdb) n
496         const auto& internalFunc =
(gdb) n
498         if (internalFunc) {
(gdb) n
504       std::shared_ptr<EglDisplay> defaultDisplay = CreateDisplay(
    forceAccel, out_failureId, aDisplay);
(gdb) n
505       if (!defaultDisplay) {
(gdb) n
508       mDefaultDisplay = defaultDisplay;
(gdb) n
510       InitLibExtensions();
(gdb) p mDefaultDisplay->_M_ptr
$8 = (std::__weak_ptr<mozilla::gl::EglDisplay, (__gnu_cxx::_Lock_policy)2>::
    element_type *) 0x5555c931b0
(gdb) p mDefaultDisplay->_M_ptr.mDisplay
$9 = (const EGLDisplay) 0x1
(gdb) n                            
512       const SymbolLoader pfnLoader(mSymbols.fGetProcAddress);
(gdb) n
514       const auto fnLoadSymbols = [&](const SymLoadStruct* symbols) {
(gdb) n 
523       mIsANGLE = IsExtensionSupported(EGLLibExtension::
    ANGLE_platform_angle);
(gdb) n
527       if (mIsANGLE) {
(gdb) p mIsANGLE
$10 = false
(gdb) n
548         const SymLoadStruct symbols[] = {SYMBOL(
    GetNativeClientBufferANDROID),
(gdb) n
[...]
632         (void)fnLoadSymbols(symbols);
(gdb) n
504       std::shared_ptr<EglDisplay> defaultDisplay = CreateDisplay(
    forceAccel, out_failureId, aDisplay);
(gdb) 

Thread 40 "Compositor" received signal SIGSEGV, Segmentation fault.
0x0000007faef777e0 in ?? ()
(gdb) 
That takes us up to CreateDisplay() at which point the crash happens again. So I need to give it another go, this time stepping inside the CreateDisplay() method. And this time I'm going to step into all of the methods that are being called as well.
(gdb) break GLLibraryEGL::CreateDisplay
Breakpoint 7 at 0x7fba06b6c0: file gfx/gl/GLLibraryEGL.cpp, line 754.
(gdb) r
[...]
Created LOG for EmbedLiteLayerManager

Thread 40 "Compositor" hit Breakpoint 7, mozilla::gl::GLLibraryEGL::
    CreateDisplay (this=this@entry=0x7e94111340, 
    forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7efaac6fe0, 
    aDisplay=aDisplay@entry=0x1) at gfx/gl/GLLibraryEGL.cpp:754
754         EGLDisplay aDisplay) {
(gdb) n
757       if (IsExtensionSupported(EGLLibExtension::ANGLE_platform_angle_d3d)) {
(gdb) n
815         ret = GetAndInitDisplay(*this, nativeDisplay, aDisplay);
(gdb) s
mozilla::gl::GetAndInitDisplay (egl=..., displayType=displayType@entry=0x0, 
    display=display@entry=0x1) at gfx/gl/GLLibraryEGL.cpp:149
149                                                          EGLDisplay display 
    = EGL_NO_DISPLAY) {
(gdb) n
150       if (display == EGL_NO_DISPLAY) {
(gdb) n
154       return EglDisplay::Create(egl, display, false);
(gdb) s
mozilla::gl::EglDisplay::Create (lib=..., display=0x1, 
    isWarp=isWarp@entry=false) at gfx/gl/GLLibraryEGL.cpp:664
664                                                    const bool isWarp) {
(gdb) n
667         const auto itr = lib.mActiveDisplays.find(display);
(gdb) n
676       if (display == EGL_NO_DISPLAY) {
(gdb) n
683       std::call_once(sMesaLeakFlag, MesaMemoryLeakWorkaround);
(gdb) n
686           std::make_shared<EglDisplay>(PrivateUseOnly{}, lib, display, 
    isWarp);
(gdb) n
687       lib.mActiveDisplays.insert({display, ret});
(gdb) n
688       return ret;
(gdb) n
686           std::make_shared<EglDisplay>(PrivateUseOnly{}, lib, display, 
    isWarp);
(gdb) n
mozilla::gl::GLLibraryEGL::CreateDisplay (this=this@entry=0x7e94111340, 
    forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7efaac6fe0, 
    aDisplay=aDisplay@entry=0x1) at gfx/gl/GLLibraryEGL.cpp:815
815         ret = GetAndInitDisplay(*this, nativeDisplay, aDisplay);
(gdb) n
818       if (!ret) {
(gdb) n
826       return ret;
(gdb) n
mozilla::gl::GLLibraryEGL::Init (this=this@entry=0x7e94111340, 
    forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7efaac6fe0, 
    aDisplay=aDisplay@entry=0x1) at gfx/gl/GLLibraryEGL.cpp:505
505       if (!defaultDisplay) {
(gdb) p defaultDisplay._M_ptr.mDisplay
$14 = (const EGLDisplay) 0x1
(gdb) n
508       mDefaultDisplay = defaultDisplay;
(gdb) n
510       InitLibExtensions();
(gdb) n
512       const SymbolLoader pfnLoader(mSymbols.fGetProcAddress);
(gdb) n
514       const auto fnLoadSymbols = [&](const SymLoadStruct* symbols) {
(gdb) n
523       mIsANGLE = IsExtensionSupported(EGLLibExtension::
    ANGLE_platform_angle);
(gdb) n
527       if (mIsANGLE) {
(gdb) n
548         const SymLoadStruct symbols[] = {SYMBOL(
    GetNativeClientBufferANDROID),
(gdb) n
[...]
632         (void)fnLoadSymbols(symbols);
(gdb) n
504       std::shared_ptr<EglDisplay> defaultDisplay = CreateDisplay(
    forceAccel, out_failureId, aDisplay);
(gdb) n

Thread 40 "Compositor" received signal SIGSEGV, Segmentation fault.
0x0000007faef777e0 in ?? ()
(gdb)
This is a bit confusing for me. It seems we start in CreateDisplay() but then it's called again later. Is it being called twice? It turns out not: it's just the way the debugger steps in and out of the methods. When it steps out of a method it displays the name of the method that the program counter just jumped back to, as if it were entering the method from the initial entry point. In other words, the debugger output doesn't seem to distinguish between calling a method, which adds a frame to the stack, and returning to a method, which removes a frame from the stack.

So what we're actually seeing in the debugger output above, but hugely simplified, is something like the following. The list indent is increased when a method is pushed onto the stack and decreased when a method is popped off the stack.
  1. GLLibraryEGL::CreateDisplay() at GLLibraryEGL.cpp:754
    1. GetAndInitDisplay() at GLLibraryEGL.cpp:149
      1. EglDisplay::Create() at GLLibraryEGL.cpp:664
    2. GLLibraryEGL::CreateDisplay() at GLLibraryEGL.cpp:815
    3. GLLibraryEGL::Init() at GLLibraryEGL.cpp:505
  2. GLLibraryEGL::CreateDisplay() at GLLibraryEGL.cpp:504
It's clear from this that CreateDisplay() is only being called once, which is correct. At this point the crash occurs, although it's not quite clear why it happens exactly here. At least it no longer seems to be because multiple EglDisplay instances are being created. In fact, only one is being constructed, as should be the case, which we can confirm using the debugger again. Be warned that we're about to get a really long backtrace, but there's some useful info here so I've left most of it in:
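The difference between entering a method and returning to one can be made explicit with a small tracing helper. What follows is a minimal sketch and nothing more: ScopeTrace is a hypothetical class I've made up for illustration, not something from gecko or gdb, and the three functions are empty stand-ins that only mirror the nesting of CreateDisplay(), GetAndInitDisplay() and EglDisplay::Create() seen in the session above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tracer, not part of gecko or gdb. It records one line when a
// function is entered and another when control returns to its caller, so the
// two events can't be confused with each other.
class ScopeTrace {
 public:
  ScopeTrace(std::vector<std::string>& log, std::size_t& depth,
             const std::string& name)
      : mLog(log), mDepth(depth), mName(name) {
    mLog.push_back(std::string(2 * mDepth, ' ') + "enter " + mName);
    ++mDepth;  // a frame has been pushed
  }

  ~ScopeTrace() {
    assert(mDepth > 0);
    --mDepth;  // the frame has been popped
    mLog.push_back(std::string(2 * mDepth, ' ') + "leave " + mName);
  }

  ScopeTrace(const ScopeTrace&) = delete;
  ScopeTrace& operator=(const ScopeTrace&) = delete;

 private:
  std::vector<std::string>& mLog;
  std::size_t& mDepth;
  std::string mName;
};

// Empty stand-ins that reproduce only the call nesting, not the real logic.
void Create(std::vector<std::string>& log, std::size_t& depth) {
  ScopeTrace trace(log, depth, "EglDisplay::Create");
}

void GetAndInitDisplay(std::vector<std::string>& log, std::size_t& depth) {
  ScopeTrace trace(log, depth, "GetAndInitDisplay");
  Create(log, depth);
}

void CreateDisplay(std::vector<std::string>& log, std::size_t& depth) {
  ScopeTrace trace(log, depth, "CreateDisplay");
  GetAndInitDisplay(log, depth);
}
```

Calling CreateDisplay() with an empty log and a depth of zero yields six lines, with each "enter" matched by a "leave" at the same indent, so a return to CreateDisplay() can never be mistaken for a second call to it.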
Thread 41 "Compositor" hit Breakpoint 8, mozilla::gl::EglDisplay::
    EglDisplay (this=0x5555c8ee40, lib=..., disp=0x1, isWarp=false) at /usr/src/
    debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/gl/GLLibraryEGL.cpp:691
691     EglDisplay::EglDisplay(const PrivateUseOnly&, GLLibraryEGL& lib,
(gdb) bt
#0  mozilla::gl::EglDisplay::EglDisplay (this=0x5555c8ee40, lib=..., disp=0x1, 
    isWarp=false)
    at gfx/gl/GLLibraryEGL.cpp:691
#1  0x0000007fba06b594 in __gnu_cxx::new_allocator<mozilla::gl::EglDisplay>::
    construct<mozilla::gl::EglDisplay, mozilla::gl::EglDisplay::PrivateUseOnly, 
    mozilla::gl::GLLibraryEGL&, void* const&, bool const&> (__p=0x5555c8ee40, 
    this=<optimized out>)
    at /srv/mer/toolings/SailfishOS-4.6.0.11EA/opt/cross/
    aarch64-meego-linux-gnu/include/c++/10.3.1/ext/new_allocator.h:156
#2  std::allocator_traits<std::allocator<mozilla::gl::EglDisplay> >::
    construct<mozilla::gl::EglDisplay, mozilla::gl::EglDisplay::PrivateUseOnly, 
    mozilla::gl::GLLibraryEGL&, void* const&, bool const&> (__p=0x5555c8ee40, 
    __a=...)
    at /srv/mer/toolings/SailfishOS-4.6.0.11EA/opt/cross/
    aarch64-meego-linux-gnu/include/c++/10.3.1/bits/alloc_traits.h:512
#3  std::_Sp_counted_ptr_inplace<mozilla::gl::EglDisplay, std::
    allocator<mozilla::gl::EglDisplay>, (__gnu_cxx::_Lock_policy)2>::
    _Sp_counted_ptr_inplace<mozilla::gl::EglDisplay::PrivateUseOnly, mozilla::
    gl::GLLibraryEGL&, void* const&, bool const&> (__a=..., this=0x5555c8ee30)
    at /srv/mer/toolings/SailfishOS-4.6.0.11EA/opt/cross/
    aarch64-meego-linux-gnu/include/c++/10.3.1/bits/shared_ptr_base.h:551
#4  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<mozilla::gl:
    :EglDisplay, std::allocator<mozilla::gl::EglDisplay>, mozilla::gl::
    EglDisplay::PrivateUseOnly, mozilla::gl::GLLibraryEGL&, void* const&, bool 
    const&> (__a=..., __p=<synthetic pointer>: <optimized out>, this=<synthetic 
    pointer>)
    at /srv/mer/toolings/SailfishOS-4.6.0.11EA/opt/cross/
    aarch64-meego-linux-gnu/include/c++/10.3.1/bits/shared_ptr_base.h:682
#5  std::__shared_ptr<mozilla::gl::EglDisplay, (__gnu_cxx::_Lock_policy)2>::
    __shared_ptr<std::allocator<mozilla::gl::EglDisplay>, mozilla::gl::
    EglDisplay::PrivateUseOnly, mozilla::gl::GLLibraryEGL&, void* const&, bool 
    const&> (__tag=..., this=<synthetic pointer>)
    at /srv/mer/toolings/SailfishOS-4.6.0.11EA/opt/cross/
    aarch64-meego-linux-gnu/include/c++/10.3.1/bits/shared_ptr_base.h:1376
#6  std::shared_ptr<mozilla::gl::EglDisplay>::shared_ptr<std::allocator<mozilla:
    :gl::EglDisplay>, mozilla::gl::EglDisplay::PrivateUseOnly, mozilla::gl::
    GLLibraryEGL&, void* const&, bool const&> (__tag=..., this=<synthetic 
    pointer>)
    at /srv/mer/toolings/SailfishOS-4.6.0.11EA/opt/cross/
    aarch64-meego-linux-gnu/include/c++/10.3.1/bits/shared_ptr.h:408
#7  std::allocate_shared<mozilla::gl::EglDisplay, std::allocator<mozilla::gl::
    EglDisplay>, mozilla::gl::EglDisplay::PrivateUseOnly, mozilla::gl::
    GLLibraryEGL&, void* const&, bool const&> (__a=...)
    at /srv/mer/toolings/SailfishOS-4.6.0.11EA/opt/cross/
    aarch64-meego-linux-gnu/include/c++/10.3.1/bits/shared_ptr.h:862
#8  std::make_shared<mozilla::gl::EglDisplay, mozilla::gl::EglDisplay::
    PrivateUseOnly, mozilla::gl::GLLibraryEGL&, void* const&, bool const&> ()
    at /srv/mer/toolings/SailfishOS-4.6.0.11EA/opt/cross/
    aarch64-meego-linux-gnu/include/c++/10.3.1/bits/shared_ptr.h:878
#9  mozilla::gl::EglDisplay::Create (lib=..., display=<optimized out>, 
    isWarp=isWarp@entry=false)
    at gfx/gl/GLLibraryEGL.cpp:686
#10 0x0000007fba06b674 in mozilla::gl::GetAndInitDisplay (egl=..., 
    displayType=displayType@entry=0x0, display=<optimized out>, 
    display@entry=0x1)
    at gfx/gl/GLLibraryEGL.cpp:154
#11 0x0000007fba06bc58 in mozilla::gl::GLLibraryEGL::CreateDisplay (
    this=this@entry=0x7e901114d0, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7efaa85fe0, aDisplay=aDisplay@entry=0x1)
    at gfx/gl/GLLibraryEGL.cpp:815
#12 0x0000007fba06c0b8 in mozilla::gl::GLLibraryEGL::Init (
    this=this@entry=0x7e901114d0, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7efaa85fe0, aDisplay=aDisplay@entry=0x1)
    at gfx/gl/GLLibraryEGL.cpp:504
#13 0x0000007fba06c8fc in mozilla::gl::GLLibraryEGL::Create (
    out_failureId=out_failureId@entry=0x7efaa85fe0, aDisplay=0x1)
    at gfx/gl/GLLibraryEGL.cpp:345
#14 0x0000007fba06cf34 in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7e90004230, aSurface=0x5555943c30, 
    aDisplay=<optimized out>) at gfx/gl/GLContextProviderEGL.cpp:1008
#15 0x0000007fbca77904 in mozilla::embedlite::nsWindow::GetGLContext (
    this=this@entry=0x7f8cdb6400)
    at mobile/sailfishos/embedshared/nsWindow.cpp:405
#16 0x0000007fbca77abc in mozilla::embedlite::nsWindow::GetNativeData (
    this=0x7f8cdb6400, aDataType=12)
    at mobile/sailfishos/embedshared/nsWindow.cpp:173
#17 0x0000007fba0e890c in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7e90110d80)
    at gfx/layers/opengl/CompositorOGL.cpp:232
#18 0x0000007fba0fdb34 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7e90110d80, out_failureReason=0x7efaa865a0)
    at gfx/layers/opengl/CompositorOGL.cpp:387
#19 0x0000007fba21d568 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7f8caed4d0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#20 0x0000007fba2343cc in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7f8caed4d0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#21 0x0000007fba234558 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7f8caed4d0, 
    aBackendHints=..., aId=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#22 0x0000007fbca5ed84 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f8caed4d0, aBackendHints=...,
    aId=...) at mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:80
#23 0x0000007fb9af84f4 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7f8caed4d0, msg__=...) at 
    PCompositorBridgeParent.cpp:1285
[...]
#38 0x0000007fb706b7cc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) c
Continuing.

Thread 41 "Compositor" received signal SIGSEGV, Segmentation fault.
0x0000007faef777e0 in ?? ()
(gdb)
I need to pin down where the crash is happening. But I also want to check whether the code is still generating the error we saw a couple of days back stating that the "Wayland connection experienced a fatal error (Resource temporarily unavailable)". So I'm going to place a breakpoint on EmbedLiteCompositorBridgeParent::AllocPLayerTransactionParent() — which was called just prior to this message appearing previously — and see whether it appears again.
$ WAYLAND_DEBUG=1 gdb sailfish-browser
[...]
(gdb) break EmbedLiteCompositorBridgeParent::AllocPLayerTransactionParent
Breakpoint 2 at 0x7fbca5ed70: file mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp, line 79.
(gdb) r
The program being debugged has been started already.
[...]
Created LOG for EmbedLiteLayerManager
[ 307796.499]  -> wl_compositor@4.create_surface(new id wl_surface@42)
[ 307796.589]  -> android_wlegl@27.get_server_buffer_handle(new id 
    android_wlegl_server_buffer_handle@43, 1, 1, 1, 768)
[ 307796.634]  -> wl_display@1.sync(new id wl_callback@47)
[ 307798.325] wl_display@1.delete_id(43)
[ 307798.380] wl_display@1.delete_id(47)
[ 307798.408] android_wlegl_server_buffer_handle@43.buffer_ints(array[88])
[ 307798.441] android_wlegl_server_buffer_handle@43.buffer_fd(fd 74)
[ 307798.466] android_wlegl_server_buffer_handle@43.buffer_fd(fd 75)
[ 307798.491] android_wlegl_server_buffer_handle@43.buffer(new id 
    wl_buffer@4278190086, 1, 64)
[ 307798.536] wl_callback@47.done(2107)
[ 307798.561]  -> android_wlegl@27.get_server_buffer_handle(new id 
    android_wlegl_server_buffer_handle@47, 1, 1, 1, 768)
[ 307798.585]  -> wl_display@1.sync(new id wl_callback@48)
[ 307799.811] wl_display@1.delete_id(47)
[ 307800.014] wl_display@1.delete_id(48)
[ 307800.135] android_wlegl_server_buffer_handle@47.buffer_ints(array[88])
[ 307800.170] android_wlegl_server_buffer_handle@47.buffer_fd(fd 73)
[ 307800.216] android_wlegl_server_buffer_handle@47.buffer_fd(fd 74)
[ 307800.260] android_wlegl_server_buffer_handle@47.buffer(new id 
    wl_buffer@4278190087, 1, 64)
[ 307800.318] wl_callback@48.done(2107)
[ 307800.355]  -> android_wlegl@27.get_server_buffer_handle(new id 
    android_wlegl_server_buffer_handle@48, 1, 1, 1, 768)
[ 307800.391]  -> wl_display@1.sync(new id wl_callback@49)
[ 307802.186] wl_display@1.delete_id(48)
[ 307802.215] wl_display@1.delete_id(49)
[ 307802.290] android_wlegl_server_buffer_handle@48.buffer_ints(array[88])
[ 307802.348] android_wlegl_server_buffer_handle@48.buffer_fd(fd 73)
[ 307802.377] android_wlegl_server_buffer_handle@48.buffer_fd(fd 74)
[ 307802.402] android_wlegl_server_buffer_handle@48.buffer(new id 
    wl_buffer@4278190088, 1, 64)
[ 307802.456] wl_callback@49.done(2107)
[ 307803.277]  -> wl_buffer@4278190086.destroy()
[ 307803.364]  -> wl_buffer@4278190088.destroy()
[ 307803.402]  -> wl_buffer@4278190087.destroy()
[ 307803.439]  -> wl_surface@42.destroy()
[ 307807.166] wl_callback@34.done(46206349)
[ 307807.258]  -> wl_buffer@4278190082.destroy()
[ 307807.338]  -> android_wlegl@27.get_server_buffer_handle(new id 
    android_wlegl_server_buffer_handle@34, 1080, 2520, 277, 268436224)
[ 307807.377]  -> wl_display@1.sync(new id wl_callback@49)
[ 307811.278] wl_display@1.delete_id(42)
[ 307811.346] wl_display@1.delete_id(34)
[ 307811.407] wl_display@1.delete_id(49)
[ 307811.430] android_wlegl_server_buffer_handle@34.buffer_ints(array[88])
[ 307811.616] android_wlegl_server_buffer_handle@34.buffer_fd(fd 27)
[ 307811.645] android_wlegl_server_buffer_handle@34.buffer_fd(fd 31)
[ 307811.666] android_wlegl_server_buffer_handle@34.buffer(new id 
    wl_buffer@4278190082, 277, 1088)
[ 307811.721] wl_callback@49.done(2107)
[ 307817.837]  -> wl_surface@20.frame(new id wl_callback@49)
[ 307818.016]  -> wl_surface@20.attach(wl_buffer@4278190082, 0, 0)
[ 307818.048]  -> wl_surface@20.damage(0, 0, 1080, 2520)
[ 307818.073]  -> wl_surface@20.commit()
[ 307818.096]  -> wl_display@1.sync(new id wl_callback@42)
[New Thread 0x7fa6051830 (LWP 16227)]
[New Thread 0x7fa6010830 (LWP 16228)]
[ 307829.250] discarded [unknown]@42.[event 0](0 fd, 12 byte)
[ 307829.501] wl_display@1.delete_id(42)
[Switching to Thread 0x7efaac8830 (LWP 16220)]

Thread 40 "Compositor" hit Breakpoint 2, mozilla::embedlite::
    EmbedLiteCompositorBridgeParent::AllocPLayerTransactionParent (
    this=0x7f8cbdaeb0, aBackendHints=..., aId=...) at mobile/sailfishos/
    embedthread/EmbedLiteCompositorBridgeParent.cpp:79
79      {
(gdb) n
[ 663026.258] wl_display@1.delete_id(49)
[ 663026.351] wl_shell_surface@22.ping(2108)
[ 663026.417]  -> wl_shell_surface@22.pong(2108)
[ 663026.454] wl_keyboard@8.leave(2109, wl_surface@24)
[ 663026.501]  -> wl_display@1.sync(new id wl_callback@42)
[ 663026.536] qt_extended_surface@23.onscreen_visibility(0)
80        PLayerTransactionParent* p =
(gdb) n
[ 675275.026] wl_buffer@4278190081.release()
[ 675275.219] wl_callback@49.done(46239292)
[ 675275.309]  -> wl_buffer@4278190080.destroy()
[ 675275.520]  -> android_wlegl@27.get_server_buffer_handle(new id 
    android_wlegl_server_buffer_handle@49, 1080, 2520, 277, 268436224)
[ 675275.891]  -> wl_display@1.sync(new id wl_callback@50)
[ 675294.078] wl_display@1.delete_id(42)
[ 675294.402] wl_display@1.delete_id(49)
[ 675294.476] wl_display@1.delete_id(50)
[ 675295.213] android_wlegl_server_buffer_handle@49.buffer_ints(array[88])
[ 675295.350] android_wlegl_server_buffer_handle@49.buffer_fd(fd 31)
[ 675295.423] android_wlegl_server_buffer_handle@49.buffer_fd(fd 33)
[ 675295.488] android_wlegl_server_buffer_handle@49.buffer(new id 
    wl_buffer@4278190080, 277, 1088)
[ 675296.112] wl_callback@50.done(2111)
[ 675307.846]  -> wl_surface@20.frame(new id wl_callback@50)
[ 675307.922]  -> wl_surface@20.attach(wl_buffer@4278190080, 0, 0)
[ 675307.962]  -> wl_surface@20.damage(0, 0, 1080, 2520)
[ 675308.240]  -> wl_surface@20.commit()
[ 675308.272]  -> wl_display@1.sync(new id wl_callback@51)

Thread 40 "Compositor" received signal SIGSEGV, Segmentation fault.
0x0000007faef777e0 in ?? ()
(gdb) 
Well, it doesn't appear, and this output is reproducible, so it's not just a fluke. By stepping deeper into this code I eventually hit CompositorBridgeParent::NewCompositor(). It looks like something may be going wrong inside this method. I'd love to continue digging into this now, but after that 5 am start on the ferry this morning, I can feel my brain losing focus. I'll need to come at this with fresh eyes in the morning.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
20 Aug 2024 : Day 325 #
Although I managed to collect quite a bit of data yesterday about the Wayland crash on Sailfish OS 4.6 running ESR 91, I'm still at a loss as to how to fix it. At this point, I think my best bet will be to compare what's happening on ESR 91 around the time of the crash with what's happening on ESR 78.

But that's tricky for a couple of reasons, the most obvious being that, since I upgraded one of my two dev phones to Sailfish OS 4.6, now none of my phones are running the ESR 78 version of the browser. If I want to compare something to it, I'm going to have to reinstall it on one of my phones.

I'm going to start by finding out the ordering and quantity of calls to eglGetDisplay() and eglInitialize() along with their parameters. Since it's already installed, here are the steps running on ESR 91:
(gdb) info break
Num     Type           Disp Enb Address            What
3       breakpoint     keep y   0x0000007fba06b6bc in mozilla::gl::GLLibraryEGL:
    :fGetDisplay(void*) const 
                                                   at gfx/gl/GLLibraryEGL.h:193
4       breakpoint     keep y   0x0000007fba06b5e8 in mozilla::gl::GLLibraryEGL:
    :fInitialize(void*, int*, int*) const 
                                                   at gfx/gl/GLLibraryEGL.h:283
        breakpoint already hit 1 time
(gdb) r
[...]
Thread 37 "Compositor" hit Breakpoint 4, mozilla::gl::GLLibraryEGL::
    fInitialize (minor=0x0, major=0x0, dpy=0x1, this=0x7e9c111510) at gfx/gl/
    GLLibraryEGL.h:283
283         WRAP(fInitialize(dpy, major, minor));
(gdb) c
Continuing.
[1579573.416] wl_shell_surface@22.ping(1286)
[1579573.536]  -> wl_shell_surface@22.pong(1286)

Thread 37 "Compositor" hit Breakpoint 3, mozilla::gl::GLLibraryEGL::
    fGetDisplay (display_id=0x0, this=0x7e9c111510) at gfx/gl/GLLibraryEGL.h:193
193         WRAP(fGetDisplay(display_id));
(gdb) c
Continuing.
[1587345.226] wl_shell_surface@22.ping(1287)
[1587345.347]  -> wl_shell_surface@22.pong(1287)
[1592605.903] wl_shell_surface@22.ping(1288)
[1592606.049]  -> wl_shell_surface@22.pong(1288)
library "libui_compat_layer.so" not found
[1594544.007]  -> wl_display@1.get_registry(new id wl_registry@2)
[1594544.394]  -> wl_display@1.sync(new id wl_callback@3)

Thread 37 "Compositor" hit Breakpoint 4, mozilla::gl::GLLibraryEGL::
    fInitialize (minor=0x0, major=0x0, dpy=0x1, this=0x7e9c111510) at gfx/gl/
    GLLibraryEGL.h:283
283         WRAP(fInitialize(dpy, major, minor));
(gdb) c
Continuing.
[1606304.282] wl_shell_surface@22.ping(1289)
[1606304.417]  -> wl_shell_surface@22.pong(1289)
[New Thread 0x7ef2c64830 (LWP 26341)]
[New Thread 0x7ef2a63830 (LWP 26342)]

Thread 37 "Compositor" received signal SIGSEGV, Segmentation fault.
0x0000007faef77360 in ?? ()
(gdb) 
What we can see is the following sequence of calls in this order and with these parameters:
  1. eglInitialize(minor=0x0, major=0x0, dpy=0x1)
  2. eglGetDisplay(display_id=0x0)
  3. eglInitialize(minor=0x0, major=0x0, dpy=0x1)
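Reading the hit order off a long gdb session by eye is error-prone, so here's a minimal sketch of how it could be pulled out mechanically. The function name breakpointHits() and the idea of saving the session to a string are my own assumptions for illustration; it relies only on gdb's "Breakpoint N, ..." and "hit Breakpoint N, ..." wording as it appears in the log above, and it expects each log line to be unwrapped.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Returns the breakpoint numbers in the order gdb reported them as hit.
// Handles both "Breakpoint N, ..." and "Thread T "name" hit Breakpoint N, ...".
std::vector<int> breakpointHits(const std::string& log) {
  const std::string plain = "Breakpoint ";
  const std::string threaded = " hit Breakpoint ";
  std::vector<int> hits;
  std::istringstream lines(log);
  std::string line;
  while (std::getline(lines, line)) {
    std::size_t digits = 0;
    const std::size_t at = line.find(threaded);
    if (at != std::string::npos) {
      digits = at + threaded.size();
    } else if (line.compare(0, plain.size(), plain) == 0) {
      digits = plain.size();
    } else {
      continue;
    }
    std::size_t end = digits;
    while (end < line.size() &&
           std::isdigit(static_cast<unsigned char>(line[end]))) {
      ++end;
    }
    // "Breakpoint N at 0x..." only confirms that a breakpoint was set; a hit
    // is always followed by a comma.
    if (end == digits || end >= line.size() || line[end] != ',') continue;
    hits.push_back(std::stoi(line.substr(digits, end - digits)));
  }
  return hits;
}

// Usage: the three Compositor-thread hits from the session above, with the
// wrapped lines joined and the argument lists shortened.
void exampleEsr91Order() {
  const std::string log =
      "Thread 37 \"Compositor\" hit Breakpoint 4, "
      "mozilla::gl::GLLibraryEGL::fInitialize (dpy=0x1)\n"
      "Continuing.\n"
      "Thread 37 \"Compositor\" hit Breakpoint 3, "
      "mozilla::gl::GLLibraryEGL::fGetDisplay (display_id=0x0)\n"
      "Continuing.\n"
      "Thread 37 \"Compositor\" hit Breakpoint 4, "
      "mozilla::gl::GLLibraryEGL::fInitialize (dpy=0x1)\n";
  const std::vector<int> expected = {4, 3, 4};
  assert(breakpointHits(log) == expected);
}
```

In this session breakpoint 3 is fGetDisplay() and breakpoint 4 is fInitialize(), so the result 4, 3, 4 corresponds to the initialise, get display, initialise order listed above.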
Let's now compare that against ESR 78. Ordinarily I'd install the ESR 78 packages directly from Jolla's repositories like so:
$ zypper install --oldpackage xulrunner-qt5-78.15.1+git39-1.19.1.jolla \
    xulrunner-qt5-debuginfo-78.15.1+git39-1.19.1.jolla \
    xulrunner-qt5-debugsource-78.15.1+git39-1.19.1.jolla \
    xulrunner-qt5-misc-78.15.1+git39-1.19.1.jolla \
    qtmozembed-qt5-1.53.25-1.22.2.jolla \
    qtmozembed-qt5-debuginfo-1.53.25-1.22.2.jolla \
    qtmozembed-qt5-debugsource-1.53.25-1.22.2.jolla \
    sailfish-components-webview-qt5-1.5.21-1.13.1.jolla \
    sailfish-components-webview-qt5-debuginfo-1.5.21-1.13.1.jolla \
    sailfish-components-webview-qt5-debugsource-1.5.21-1.13.1.jolla \
    sailfish-components-webview-qt5-pickers-1.5.21-1.13.1.jolla \
    sailfish-components-webview-qt5-popups-1.5.21-1.13.1.jolla \
    embedlite-components-qt5-1.24.35-1.26.1.jolla \
    embedlite-components-qt5-debuginfo-1.24.35-1.26.1.jolla \
    embedlite-components-qt5-debugsource-1.24.35-1.26.1.jolla \
    sailfish-browser-2.2.63-1.12.1.jolla \
    sailfish-browser-debuginfo-2.2.63-1.12.1.jolla \
    sailfish-browser-debugsource-2.2.63-1.12.1.jolla \
    sailfish-browser-settings-2.2.63-1.12.1.jolla \
    mapplauncherd-booster-browser-0.2.2-1.1.1.jolla \
    mapplauncherd-booster-browser-debuginfo-0.2.2-1.1.1.jolla \
    mapplauncherd-booster-browser-debugsource-0.2.2-1.1.1.jolla
However, I'm travelling by sea this evening and likely to be out of range of any mobile networks. If I'm going to be switching between ESR 78 and ESR 91 during testing, it'll be convenient to have the packages downloaded locally so I can install them from the file system rather than over the network. So I've downloaded the packages directly from the repos. This turned out to be impossible over my mobile connection, so once again I had to find a café with free customer WiFi to download them over instead.

That turned a stumbling 30 minute marathon into a confident 30 second sprint of a download. Now I've collected all of the packages together and have them saved to my local device I'll be able to reinstall them in future without having to download them again.
$ ls -1
embedlite-components-qt5-1.24.35-1.26.1.jolla.aarch64.rpm
embedlite-components-qt5-debuginfo-1.24.35-1.26.1.jolla.aarch64.rpm
embedlite-components-qt5-debugsource-1.24.35-1.26.1.jolla.aarch64.rpm
mapplauncherd-booster-browser-0.2.2-1.1.1.jolla.aarch64.rpm
mapplauncherd-booster-browser-debuginfo-0.2.2-1.1.1.jolla.aarch64.rpm
mapplauncherd-booster-browser-debugsource-0.2.2-1.1.1.jolla.aarch64.rpm
qtmozembed-qt5-1.53.25-1.22.2.jolla.aarch64.rpm
qtmozembed-qt5-debuginfo-1.53.25-1.22.2.jolla.aarch64.rpm
qtmozembed-qt5-debugsource-1.53.25-1.22.2.jolla.aarch64.rpm
sailfish-browser-2.2.63-1.12.1.jolla.aarch64.rpm
sailfish-browser-debuginfo-2.2.63-1.12.1.jolla.aarch64.rpm
sailfish-browser-debugsource-2.2.63-1.12.1.jolla.aarch64.rpm
sailfish-browser-settings-2.2.63-1.12.1.jolla.aarch64.rpm
sailfish-components-webview-qt5-1.5.21-1.13.1.jolla.aarch64.rpm
sailfish-components-webview-qt5-pickers-1.5.21-1.13.1.jolla.aarch64.rpm
sailfish-components-webview-qt5-popups-1.5.21-1.13.1.jolla.aarch64.rpm
xulrunner-qt5-78.15.1+git39-1.19.1.jolla.aarch64.rpm
xulrunner-qt5-debuginfo-78.15.1+git39-1.19.1.jolla.aarch64.rpm
xulrunner-qt5-debugsource-78.15.1+git39-1.19.1.jolla.aarch64.rpm
xulrunner-qt5-misc-78.15.1+git39-1.19.1.jolla.aarch64.rpm
$ rpm -U --force xulrunner-qt5-78.*.rpm xulrunner-qt5-debuginfo-78.*.rpm \
    xulrunner-qt5-debugsource-78.*.rpm xulrunner-qt5-misc-78.*.rpm \
    qtmozembed-qt5-1.*.rpm qtmozembed-qt5-debuginfo-1.*.rpm \
    qtmozembed-qt5-debugsource-1.*.rpm sailfish-components-webview-qt5-1.*.rpm \
    sailfish-components-webview-qt5-pickers-1.*.rpm \
    sailfish-components-webview-qt5-popups-1.*.rpm \
    embedlite-components-qt5-1.*.rpm embedlite-components-qt5-debuginfo-1.*.rpm \
    embedlite-components-qt5-debugsource-1.*.rpm \
    sailfish-browser-2.*.rpm sailfish-browser-debuginfo-2.*.rpm \
    sailfish-browser-debugsource-2.*.rpm sailfish-browser-settings-2.*.rpm \
    mapplauncherd-booster-browser-0.*.rpm \
    mapplauncherd-booster-browser-debuginfo-0.*.rpm \
    mapplauncherd-booster-browser-debugsource-0.*.rpm
So with the ESR 78 version of the browser installed, let's see where these two methods, eglInitialize() and eglGetDisplay(), get called.
$ WAYLAND_DEBUG=1 gdb sailfish-browser
[...]
(gdb) b eglInitialize
Breakpoint 3 at 0x7fb6dbe580
(gdb) b eglGetDisplay
Breakpoint 4 at 0x7fb6dbe568
(gdb) info break
Num     Type           Disp Enb Address            What
3       breakpoint     keep y   0x0000007fb6dbe580 <eglInitialize+16>
        breakpoint already hit 1 time
4       breakpoint     keep y   0x0000007fb6dbe568 <eglGetDisplay+8>
        breakpoint already hit 1 time
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/sailfish-browser 
[...]
Using host libthread_db library "/lib64/libthread_db.so.1".
[3922558.854]  -> wl_display@1.get_registry(new id wl_registry@2)
[3922559.301]  -> wl_display@1.sync(new id wl_callback@3)
[3922561.417] wl_display@1.delete_id(3)
[3922561.531] wl_registry@2.global(1, "wl_compositor", 3)
[3922561.652]  -> wl_registry@2.bind(1, "wl_compositor", 3, new id 
    [unknown]@4)
[3922561.705] wl_registry@2.global(2, "wl_data_device_manager", 1)
[3922561.819]  -> wl_registry@2.bind(2, "wl_data_device_manager", 1, 
    new id [unknown]@5)
[3922561.858] wl_registry@2.global(3, "wl_shm", 1)
[3922561.881]  -> wl_registry@2.bind(3, "wl_shm", 1, new id 
    [unknown]@6)
[3922561.903] wl_registry@2.global(4, "qt_hardware_integration", 1)
[3922562.071]  -> wl_registry@2.bind(4, "qt_hardware_integration", 1, 
    new id [unknown]@7)
[3922562.144]  -> wl_display@1.sync(new id wl_callback@8)
[3922562.272] wl_registry@2.global(5, "android_wlegl", 2)
[3922562.318] wl_registry@2.global(6, "qt_surface_extension", 1)
[3922562.410]  -> wl_registry@2.bind(6, "qt_surface_extension", 1, 
    new id [unknown]@9)
[3922562.456] wl_registry@2.global(7, "qt_touch_extension", 1)
[3922562.669]  -> wl_registry@2.bind(7, "qt_touch_extension", 1, new 
    id [unknown]@10)
[3922562.743] wl_registry@2.global(8, "qt_windowmanager", 1)
[3922562.813]  -> wl_registry@2.bind(8, "qt_windowmanager", 1, new id 
    [unknown]@11)
[3922562.881] wl_registry@2.global(9, "wl_seat", 3)
[3922563.016]  -> wl_registry@2.bind(9, "wl_seat", 3, new id 
    [unknown]@12)
[3922563.121]  -> wl_data_device_manager@5.get_data_device(new id 
    wl_data_device@13, wl_seat@12)
[3922563.348] wl_registry@2.global(10, "wl_output", 2)
[3922563.539]  -> wl_registry@2.bind(10, "wl_output", 2, new id 
    [unknown]@14)
[3922563.697]  -> wl_display@1.sync(new id wl_callback@15)
[3922563.724] wl_registry@2.global(11, "wl_shell", 1)
[3922563.750] wl_registry@2.global(12, "lipstick_recorder_manager", 1)
[3922563.784] wl_registry@2.global(13, "alien_manager", 1)
[3922563.815] wl_callback@3.done(1851)
[3922567.164] wl_display@1.delete_id(8)
[3922567.322] wl_display@1.delete_id(15)
[3922567.363] qt_hardware_integration@7.client_backend("wayland-egl")
[3922567.750] wl_callback@8.done(1851)
[3922567.787] qt_touch_extension@10.configure(0)
[3922567.811] qt_windowmanager@11.hints(1)
[3922567.828] wl_seat@12.capabilities(7)
[3922568.127]  -> wl_seat@12.get_keyboard(new id wl_keyboard@8)
[3922568.401]  -> wl_seat@12.get_pointer(new id wl_pointer@3)
[3922568.642]  -> wl_compositor@4.create_surface(new id wl_surface@16)
[3922568.791]  -> wl_seat@12.get_touch(new id wl_touch@17)
[3922568.937] wl_output@14.geometry(0, 0, 60, 139, 0, "", 
    "", 0)
[3922568.986] wl_output@14.mode(3, 1080, 2520, 60000)
[3922569.021] wl_output@14.scale(1)
[3922569.063] wl_output@14.done()
[3922569.144] wl_callback@15.done(1851)
[3922569.693]  -> wl_shm@6.create_pool(new id wl_shm_pool@15, fd 5, 4096)
[3922572.771]  -> wl_shm_pool@15.resize(8832)
[3922572.928]  -> wl_shm_pool@15.resize(18624)
[D] unknown:0 - Using Wayland-EGL

Breakpoint 4, 0x0000007fb6dbe568 in eglGetDisplay () from /usr/lib64/libEGL.so.1
(gdb) bt
#0  0x0000007fb6dbe568 in eglGetDisplay () from /usr/lib64/libEGL.so.1
#1  0x0000007faf447008 in ?? () from /usr/lib64/qt5/plugins/
    wayland-graphics-integration-client/libwayland-egl.so
#2  0x0000007faf6168f0 in QtWaylandClient::QWaylandIntegration::
    initializeClientBufferIntegration() () from /usr/lib64/
    libQt5WaylandClient.so.5
#3  0x0000007faf616bfc in QtWaylandClient::QWaylandIntegration::
    clientBufferIntegration() const () from /usr/lib64/libQt5WaylandClient.so.5
#4  0x0000007faf616598 in QtWaylandClient::QWaylandIntegration::hasCapability(
    QPlatformIntegration::Capability) const ()
   from /usr/lib64/libQt5WaylandClient.so.5
#5  0x0000007fb88a2c18 in QSGRenderLoop::instance() () from /usr/lib64/
    libQt5Quick.so.5
#6  0x0000007fb88d32b4 in QQuickWindowPrivate::init(QQuickWindow*, 
    QQuickRenderControl*) () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fb897662c in QQuickView::QQuickView(QWindow*) () from /usr/lib64/
    libQt5Quick.so.5
#8  0x0000007fb8ba9a80 in MDeclarativeCachePrivate::qQuickView() () from /usr/
    lib64/libmdeclarativecache5.so.0
#9  0x000000555557a6ac in main (argc=<optimized out>, argv=0x7ffffff288) at 
    main.cpp:87
(gdb) c
Continuing.
library "libui_compat_layer.so" not found
library "libGLESv2_adreno.so" not found
library "eglSubDriverAndroid.so" not found
[3971689.925]  -> wl_display@1.get_registry(new id wl_registry@18)
[3971690.080]  -> wl_display@1.sync(new id wl_callback@19)

Breakpoint 3, 0x0000007fb6dbe580 in eglInitialize () from /usr/lib64/libEGL.so.1
(gdb) bt
#0  0x0000007fb6dbe580 in eglInitialize () from /usr/lib64/libEGL.so.1
#1  0x0000007faf44701c in ?? () from /usr/lib64/qt5/plugins/
    wayland-graphics-integration-client/libwayland-egl.so
#2  0x0000007faf6168f0 in QtWaylandClient::QWaylandIntegration::
    initializeClientBufferIntegration() () from /usr/lib64/
    libQt5WaylandClient.so.5
#3  0x0000007faf616bfc in QtWaylandClient::QWaylandIntegration::
    clientBufferIntegration() const () from /usr/lib64/libQt5WaylandClient.so.5
#4  0x0000007faf616598 in QtWaylandClient::QWaylandIntegration::hasCapability(
    QPlatformIntegration::Capability) const ()
   from /usr/lib64/libQt5WaylandClient.so.5
#5  0x0000007fb88a2c18 in QSGRenderLoop::instance() () from /usr/lib64/
    libQt5Quick.so.5
#6  0x0000007fb88d32b4 in QQuickWindowPrivate::init(QQuickWindow*, 
    QQuickRenderControl*) () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fb897662c in QQuickView::QQuickView(QWindow*) () from /usr/lib64/
    libQt5Quick.so.5
#8  0x0000007fb8ba9a80 in MDeclarativeCachePrivate::qQuickView() () from /usr/
    lib64/libmdeclarativecache5.so.0
#9  0x000000555557a6ac in main (argc=<optimized out>, argv=0x7ffffff288) at 
    main.cpp:87
(gdb) c
Continuing.
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
Created LOG for EmbedLiteTrace
[...]
There are no more hits after this point, which is more than a little surprising to me. It means that all of these calls are being managed by the Qt Wayland client; none of them are being called directly from the gecko code.

This is in stark contrast to the ESR 91 code, where they're called several times as we saw above. Having pored over this EGL initialisation code for so long now, it's frustrating to find there are still differences. On the other hand I'm pleased that this might be a lead that could solve the serious crash that's happening on Sailfish OS 4.6.

I'm going back to ESR 91 to get the full set of backtraces for all calls to the same two methods now. So, first, I install all of the ESR 91 packages:
$ rpm -U --force xulrunner-qt5-91.*.rpm xulrunner-qt5-debuginfo-91.*.rpm \
    xulrunner-qt5-debugsource-91.*.rpm xulrunner-qt5-misc-91.*.rpm \
    qtmozembed-qt5-1.*.rpm qtmozembed-qt5-debuginfo-1.*.rpm \
    qtmozembed-qt5-debugsource-1.*.rpm sailfish-components-webview-qt5-1.*.rpm \
    sailfish-components-webview-qt5-debuginfo-1.*.rpm \
    sailfish-components-webview-qt5-debugsource-1.*.rpm \
    sailfish-components-webview-qt5-pickers-1.*.rpm \
    sailfish-components-webview-qt5-popups-1.*.rpm \
    embedlite-components-qt5-1.*.rpm embedlite-components-qt5-debuginfo-1.*.rpm \
    embedlite-components-qt5-debugsource-1.*.rpm \
    sailfish-browser-2.*.rpm sailfish-browser-debuginfo-2.*.rpm \
    sailfish-browser-debugsource-2.*.rpm sailfish-browser-settings-2.*.rpm \
    mapplauncherd-booster-browser-0.*.rpm \
    mapplauncherd-booster-browser-debuginfo-0.*.rpm \
    mapplauncherd-booster-browser-debugsource-0.*.rpm harbour-webview-0.*.rpm
Now I'm going to add the same breakpoints and capture the backtraces. Note in the log below that there are five calls in total to the two methods. The first two are made by the Qt Wayland client as on ESR 78. The subsequent three are all made from the gecko code.
$ WAYLAND_DEBUG=1 gdb sailfish-browser
[...]
(gdb) b eglInitialize
Breakpoint 2 at 0x7fb6bbb580
(gdb) b eglGetDisplay
Breakpoint 3 at 0x7fb6bbb568
(gdb) info break
Num     Type           Disp Enb Address            What
2       breakpoint     keep y   0x0000007fb6bbb580 <eglInitialize+16>
3       breakpoint     keep y   0x0000007fb6bbb568 <eglGetDisplay+8>
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/sailfish-browser 
[...]
[ 804848.529] wl_registry@2.global(9, "wl_seat", 3)
[ 804848.907]  -> wl_registry@2.bind(9, "wl_seat", 3, new id 
    [unknown]@12)
[ 804849.019]  -> wl_data_device_manager@5.get_data_device(new id 
    wl_data_device@13, wl_seat@12)
[ 804849.110] wl_registry@2.global(10, "wl_output", 2)
[ 804849.219]  -> wl_registry@2.bind(10, "wl_output", 2, new id 
    [unknown]@14)
[ 804849.291]  -> wl_display@1.sync(new id wl_callback@15)
[ 804849.430] wl_registry@2.global(11, "wl_shell", 1)
[ 804849.471] wl_registry@2.global(12, "lipstick_recorder_manager", 1)
[ 804849.503] wl_registry@2.global(13, "alien_manager", 1)
[ 804849.534] wl_callback@3.done(1891)
[ 804851.429] wl_display@1.delete_id(8)
[ 804851.496] wl_display@1.delete_id(15)
[ 804851.570] qt_hardware_integration@7.client_backend("wayland-egl")
[ 804851.869] wl_callback@8.done(1891)
[ 804851.912] qt_touch_extension@10.configure(0)
[ 804851.940] qt_windowmanager@11.hints(1)
[ 804851.968] wl_seat@12.capabilities(7)
[ 804854.670]  -> wl_seat@12.get_keyboard(new id wl_keyboard@8)
[ 804854.876]  -> wl_seat@12.get_pointer(new id wl_pointer@3)
[ 804855.000]  -> wl_compositor@4.create_surface(new id wl_surface@16)
[ 804855.125]  -> wl_seat@12.get_touch(new id wl_touch@17)
[ 804855.273] wl_output@14.geometry(0, 0, 60, 139, 0, "", 
    "", 0)
[ 804855.471] wl_output@14.mode(3, 1080, 2520, 60000)
[ 804855.514] wl_output@14.scale(1)
[ 804855.541] wl_output@14.done()
[ 804855.634] wl_callback@15.done(1891)
[ 804856.118]  -> wl_shm@6.create_pool(new id wl_shm_pool@15, fd 5, 4096)
[ 804860.545]  -> wl_shm_pool@15.resize(8832)
[ 804860.737]  -> wl_shm_pool@15.resize(18624)
[...]
[D] unknown:0 - Using Wayland-EGL

Breakpoint 3, 0x0000007fb6bbb568 in eglGetDisplay () from /usr/lib64/libEGL.so.1
(gdb) bt
#0  0x0000007fb6bbb568 in eglGetDisplay () from /usr/lib64/libEGL.so.1
#1  0x0000007faf220008 in ?? () from /usr/lib64/qt5/plugins/
    wayland-graphics-integration-client/libwayland-egl.so
#2  0x0000007faf3ef8f0 in QtWaylandClient::QWaylandIntegration::
    initializeClientBufferIntegration() () from /usr/lib64/
    libQt5WaylandClient.so.5
#3  0x0000007faf3efbfc in QtWaylandClient::QWaylandIntegration::
    clientBufferIntegration() const () from /usr/lib64/libQt5WaylandClient.so.5
#4  0x0000007faf3ef598 in QtWaylandClient::QWaylandIntegration::hasCapability(
    QPlatformIntegration::Capability) const ()
   from /usr/lib64/libQt5WaylandClient.so.5
#5  0x0000007fb869fc18 in QSGRenderLoop::instance() () from /usr/lib64/
    libQt5Quick.so.5
#6  0x0000007fb86d02b4 in QQuickWindowPrivate::init(QQuickWindow*, 
    QQuickRenderControl*) () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fb877362c in QQuickView::QQuickView(QWindow*) () from /usr/lib64/
    libQt5Quick.so.5
#8  0x0000007fb89a6a80 in MDeclarativeCachePrivate::qQuickView() () from /usr/
    lib64/libmdeclarativecache5.so.0
#9  0x000000555557b31c in main (argc=<optimized out>, argv=0x7ffffff288) at 
    main.cpp:88
(gdb) c
Continuing.
[...]
[ 870219.641]  -> wl_display@1.get_registry(new id wl_registry@18)
[ 870219.733]  -> wl_display@1.sync(new id wl_callback@19)

Breakpoint 2, 0x0000007fb6bbb580 in eglInitialize () from /usr/lib64/libEGL.so.1
#0  0x0000007fb6bbb580 in eglInitialize () from /usr/lib64/libEGL.so.1
#1  0x0000007faf22001c in ?? () from /usr/lib64/qt5/plugins/
    wayland-graphics-integration-client/libwayland-egl.so
#2  0x0000007faf3ef8f0 in QtWaylandClient::QWaylandIntegration::
    initializeClientBufferIntegration() () from /usr/lib64/
    libQt5WaylandClient.so.5
#3  0x0000007faf3efbfc in QtWaylandClient::QWaylandIntegration::
    clientBufferIntegration() const () from /usr/lib64/libQt5WaylandClient.so.5
#4  0x0000007faf3ef598 in QtWaylandClient::QWaylandIntegration::hasCapability(
    QPlatformIntegration::Capability) const ()
   from /usr/lib64/libQt5WaylandClient.so.5
#5  0x0000007fb869fc18 in QSGRenderLoop::instance() () from /usr/lib64/
    libQt5Quick.so.5
#6  0x0000007fb86d02b4 in QQuickWindowPrivate::init(QQuickWindow*, 
    QQuickRenderControl*) () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fb877362c in QQuickView::QQuickView(QWindow*) () from /usr/lib64/
    libQt5Quick.so.5
#8  0x0000007fb89a6a80 in MDeclarativeCachePrivate::qQuickView() () from /usr/
    lib64/libmdeclarativecache5.so.0
#9  0x000000555557b31c in main (argc=<optimized out>, argv=0x7ffffff288) at 
    main.cpp:88
(gdb) c
Continuing.
[...]
[1148170.148] wl_callback@42.done(1900)
[1148175.058]  -> wl_surface@20.frame(new id wl_callback@42)
[1148175.146]  -> wl_surface@20.attach(wl_buffer@4278190080, 0, 0)
[1148175.182]  -> wl_surface@20.damage(0, 0, 1080, 2520)
[1148175.239]  -> wl_surface@20.commit()
[1148175.270]  -> wl_display@1.sync(new id wl_callback@50)
[1148176.064] discarded [unknown]@50.[event 0](0 fd, 12 byte)
[1148176.128] wl_display@1.delete_id(50)
[1148194.446] wl_display@1.delete_id(42)
[Switching to Thread 0x7efadfe830 (LWP 16236)]

Thread 38 "Compositor" hit Breakpoint 2, 0x0000007fb6bbb580 in 
    eglInitialize () from /usr/lib64/libEGL.so.1
(gdb) bt
#0  0x0000007fb6bbb580 in eglInitialize () from /usr/lib64/libEGL.so.1
#1  0x0000007fba06b5fc in mozilla::gl::GLLibraryEGL::fInitialize (minor=0x0, 
    major=0x0, dpy=0x1, this=0x7e9c111cb0)
    at gfx/gl/GLLibraryEGL.h:283
#2  mozilla::gl::EglDisplay::Create (lib=..., display=<optimized out>, 
    isWarp=isWarp@entry=false)
    at gfx/gl/GLLibraryEGL.cpp:676
#3  0x0000007fba06b690 in mozilla::gl::GetAndInitDisplay (egl=..., 
    displayType=displayType@entry=0x0, display=<optimized out>, 
    display@entry=0x1)
    at gfx/gl/GLLibraryEGL.cpp:154
#4  0x0000007fba06bc74 in mozilla::gl::GLLibraryEGL::CreateDisplay (
    this=this@entry=0x7e9c111cb0, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7efadfcfe0, aDisplay=aDisplay@entry=0x1)
    at gfx/gl/GLLibraryEGL.cpp:813
#5  0x0000007fba06c0d4 in mozilla::gl::GLLibraryEGL::Init (
    this=this@entry=0x7e9c111cb0, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7efadfcfe0, aDisplay=aDisplay@entry=0x1)
    at gfx/gl/GLLibraryEGL.cpp:504
#6  0x0000007fba06c918 in mozilla::gl::GLLibraryEGL::Create (
    out_failureId=out_failureId@entry=0x7efadfcfe0, aDisplay=0x1)
    at gfx/gl/GLLibraryEGL.cpp:345
#7  0x0000007fba06cf50 in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7e9c004230, aSurface=0x7fa000f610, 
    aDisplay=<optimized out>) at gfx/gl/GLContextProviderEGL.cpp:1008
#8  0x0000007fbca77924 in mozilla::embedlite::nsWindow::GetGLContext (
    this=this@entry=0x7f8c78ef00)
    at mobile/sailfishos/embedshared/nsWindow.cpp:405
#9  0x0000007fbca77adc in mozilla::embedlite::nsWindow::GetNativeData (
    this=0x7f8c78ef00, aDataType=12)
    at mobile/sailfishos/embedshared/nsWindow.cpp:173
#10 0x0000007fba0e8928 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7e9c111220)
    at gfx/layers/opengl/CompositorOGL.cpp:232
#11 0x0000007fba0fdb50 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7e9c111220, out_failureReason=0x7efadfd5a0)
    at gfx/layers/opengl/CompositorOGL.cpp:387
#12 0x0000007fba21d584 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7f8c9f23b0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#13 0x0000007fba2343e8 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7f8c9f23b0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#14 0x0000007fba234574 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7f8c9f23b0, 
    aBackendHints=..., aId=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#15 0x0000007fbca5eda4 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f8c9f23b0, aBackendHints=..., 
    aId=...) at mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:80
#16 0x0000007fb9af84f4 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7f8c9f23b0, msg__=...) at 
    PCompositorBridgeParent.cpp:1285
[...]
#31 0x0000007fb706b7cc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) c
Continuing.
[1189632.011] wl_shell_surface@22.ping(1901)
[1189632.195]  -> wl_shell_surface@22.pong(1901)

Thread 38 "Compositor" hit Breakpoint 3, 0x0000007fb6bbb568 in 
    eglGetDisplay () from /usr/lib64/libEGL.so.1
(gdb) bt
#0  0x0000007fb6bbb568 in eglGetDisplay () from /usr/lib64/libEGL.so.1
#1  0x0000007fba06b6c4 in mozilla::gl::GLLibraryEGL::fGetDisplay (
    display_id=0x0, this=0x7e9c111cb0)
    at gfx/gl/GLLibraryEGL.h:193
#2  mozilla::gl::GetAndInitDisplay (egl=..., displayType=displayType@entry=0x0, 
    display=display@entry=0x0)
    at gfx/gl/GLLibraryEGL.cpp:151
#3  0x0000007fba06bc74 in mozilla::gl::GLLibraryEGL::CreateDisplay (
    this=this@entry=0x7e9c111cb0, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7efadfcfe0, aDisplay=aDisplay@entry=0x0)
    at gfx/gl/GLLibraryEGL.cpp:813
#4  0x0000007fba06cd5c in mozilla::gl::GLLibraryEGL::DefaultDisplay (
    this=0x7e9c111cb0, out_failureId=out_failureId@entry=0x7efadfcfe0)
    at gfx/gl/GLLibraryEGL.cpp:745
#5  0x0000007fba06ce7c in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7e9c004230, aSurface=0x7fa000f610, 
    aDisplay=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h:150
#6  0x0000007fbca77924 in mozilla::embedlite::nsWindow::GetGLContext (
    this=this@entry=0x7f8c78ef00)
    at mobile/sailfishos/embedshared/nsWindow.cpp:405
#7  0x0000007fbca77adc in mozilla::embedlite::nsWindow::GetNativeData (
    this=0x7f8c78ef00, aDataType=12)
    at mobile/sailfishos/embedshared/nsWindow.cpp:173
#8  0x0000007fba0e8928 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7e9c111220)
    at gfx/layers/opengl/CompositorOGL.cpp:232
#9  0x0000007fba0fdb50 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7e9c111220, out_failureReason=0x7efadfd5a0)
    at gfx/layers/opengl/CompositorOGL.cpp:387
#10 0x0000007fba21d584 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7f8c9f23b0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#11 0x0000007fba2343e8 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7f8c9f23b0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#12 0x0000007fba234574 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7f8c9f23b0, 
    aBackendHints=..., aId=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#13 0x0000007fbca5eda4 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f8c9f23b0, aBackendHints=..., 
    aId=...) at mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:80
#14 0x0000007fb9af84f4 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7f8c9f23b0, msg__=...) at 
    PCompositorBridgeParent.cpp:1285
[...]
#29 0x0000007fb706b7cc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) c
Continuing.
[1205964.749] wl_shell_surface@22.ping(1902)
[1205965.156]  -> wl_shell_surface@22.pong(1902)
[1211002.805] wl_shell_surface@22.ping(1903)
[1211003.507]  -> wl_shell_surface@22.pong(1903)
library "libui_compat_layer.so" not found
[1213267.127]  -> wl_display@1.get_registry(new id wl_registry@2)
[1213267.719]  -> wl_display@1.sync(new id wl_callback@3)

Thread 38 "Compositor" hit Breakpoint 2, 0x0000007fb6bbb580 in 
    eglInitialize () from /usr/lib64/libEGL.so.1
(gdb) bt
#0  0x0000007fb6bbb580 in eglInitialize () from /usr/lib64/libEGL.so.1
#1  0x0000007fba06b5fc in mozilla::gl::GLLibraryEGL::fInitialize (minor=0x0, 
    major=0x0, dpy=0x1, this=0x7e9c111cb0)
    at gfx/gl/GLLibraryEGL.h:283
#2  mozilla::gl::EglDisplay::Create (lib=..., display=<optimized out>, 
    isWarp=isWarp@entry=false)
    at gfx/gl/GLLibraryEGL.cpp:676
#3  0x0000007fba06b690 in mozilla::gl::GetAndInitDisplay (egl=..., 
    displayType=displayType@entry=0x0, display=<optimized out>, 
    display@entry=0x0)
    at gfx/gl/GLLibraryEGL.cpp:154
#4  0x0000007fba06bc74 in mozilla::gl::GLLibraryEGL::CreateDisplay (
    this=this@entry=0x7e9c111cb0, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7efadfcfe0, aDisplay=aDisplay@entry=0x0)
    at gfx/gl/GLLibraryEGL.cpp:813
#5  0x0000007fba06cd5c in mozilla::gl::GLLibraryEGL::DefaultDisplay (
    this=0x7e9c111cb0, out_failureId=out_failureId@entry=0x7efadfcfe0)
    at gfx/gl/GLLibraryEGL.cpp:745
#6  0x0000007fba06ce7c in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7e9c004230, aSurface=0x7fa000f610, 
    aDisplay=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h:150
#7  0x0000007fbca77924 in mozilla::embedlite::nsWindow::GetGLContext (
    this=this@entry=0x7f8c78ef00)
    at mobile/sailfishos/embedshared/nsWindow.cpp:405
#8  0x0000007fbca77adc in mozilla::embedlite::nsWindow::GetNativeData (
    this=0x7f8c78ef00, aDataType=12)
    at mobile/sailfishos/embedshared/nsWindow.cpp:173
#9  0x0000007fba0e8928 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7e9c111220)
    at gfx/layers/opengl/CompositorOGL.cpp:232
#10 0x0000007fba0fdb50 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7e9c111220, out_failureReason=0x7efadfd5a0)
    at gfx/layers/opengl/CompositorOGL.cpp:387
#11 0x0000007fba21d584 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7f8c9f23b0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#12 0x0000007fba2343e8 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7f8c9f23b0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#13 0x0000007fba234574 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7f8c9f23b0, 
    aBackendHints=..., aId=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#14 0x0000007fbca5eda4 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f8c9f23b0, aBackendHints=..., 
    aId=...) at mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:80
#15 0x0000007fb9af84f4 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7f8c9f23b0, msg__=...) at 
    PCompositorBridgeParent.cpp:1285
[...]
#30 0x0000007fb706b7cc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) c
Continuing.
[1226523.918] wl_shell_surface@22.ping(1904)
[1226524.854]  -> wl_shell_surface@22.pong(1904)
[New Thread 0x7efa5fe830 (LWP 16303)]

Thread 38 "Compositor" received signal SIGSEGV, Segmentation fault.
0x0000007faef77360 in ?? ()
(gdb) bt
#0  0x0000007faef77360 in ?? ()
#1  0x0000007f7598c6b0 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) 
Comparing against the ESR 78 log makes clear that something is going wrong with the ESR 91 execution. The first two calls made by the Qt Wayland client are correct. The three following these, made from inside the gecko code, shouldn't be happening at all.
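The split between Qt and gecko callers can be read straight off frame #1 of each backtrace. As a rough sketch of that bookkeeping, the snippet below tallies the five ESR 91 hits by caller. The names Caller, classifyCaller() and countCallsFrom() are hypothetical, and matching on "libwayland-egl.so" and "mozilla::gl::" is just a heuristic I'm assuming based on the frames shown in the logs; the frame strings are the wrapped log lines joined back together.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Caller { QtWayland, Gecko, Unknown };

// Classify a hit by the text of frame #1, the function that called into
// libEGL. Heuristic only: it matches the library and namespace names that
// appear in the backtraces.
Caller classifyCaller(const std::string& frame1) {
  if (frame1.find("libwayland-egl.so") != std::string::npos) {
    return Caller::QtWayland;
  }
  if (frame1.find("mozilla::gl::") != std::string::npos) {
    return Caller::Gecko;
  }
  return Caller::Unknown;
}

// Count how many of the given frame #1 lines belong to one kind of caller.
int countCallsFrom(Caller who, const std::vector<std::string>& frame1s) {
  int count = 0;
  for (const std::string& frame : frame1s) {
    if (classifyCaller(frame) == who) ++count;
  }
  return count;
}

// Usage: frame #1 of each of the five ESR 91 hits, in order.
void exampleEsr91Callers() {
  const std::string qtPlugin =
      " in ?? () from /usr/lib64/qt5/plugins/"
      "wayland-graphics-integration-client/libwayland-egl.so";
  const std::vector<std::string> frames = {
      "#1  0x0000007faf220008" + qtPlugin,
      "#1  0x0000007faf22001c" + qtPlugin,
      "#1  0x0000007fba06b5fc in mozilla::gl::GLLibraryEGL::fInitialize",
      "#1  0x0000007fba06b6c4 in mozilla::gl::GLLibraryEGL::fGetDisplay",
      "#1  0x0000007fba06b5fc in mozilla::gl::GLLibraryEGL::fInitialize",
  };
  assert(countCallsFrom(Caller::QtWayland, frames) == 2);
  assert(countCallsFrom(Caller::Gecko, frames) == 3);
}
```

Feeding it the two frames from the ESR 78 session instead would give two Qt Wayland calls and none from gecko, which is the difference I'm trying to explain.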

Next I want to find out why these same calls aren't happening when running the ESR 78 code. It seems the display is being returned as 0x01, which is enough to prevent any further display initialisation code being needed:
Thread 37 "Compositor" hit Breakpoint 2, mozilla::gl::
    GetAndInitDisplay (display=0x1, displayType=0x0, egl=...) at gfx/gl/
    GLLibraryEGL.cpp:237
237       if (display == EGL_NO_DISPLAY) {
(gdb) p display
$1 = (EGLDisplay) 0x1
(gdb) bt
#0  mozilla::gl::GetAndInitDisplay (display=0x1, displayType=0x0, egl=...)
    at gfx/gl/GLLibraryEGL.cpp:237
#1  mozilla::gl::GLLibraryEGL::CreateDisplay (this=this@entry=0x7e9c1117f0, 
    forceAccel=forceAccel@entry=false, gfxInfo=..., 
    out_failureId=out_failureId@entry=0x7ee3107388, aDisplay=aDisplay@entry=0x1)
    at gfx/gl/GLLibraryEGL.cpp:826
#2  0x0000007fba4d5174 in mozilla::gl::GLLibraryEGL::DoEnsureInitialized (
    this=0x7e9c1117f0, forceAccel=<optimized out>, 
    out_failureId=out_failureId@entry=0x7ee3107388, aDisplay=0x1)
    at gfx/gl/GLLibraryEGL.cpp:578
#3  0x0000007fba4d5860 in mozilla::gl::GLLibraryEGL::DoEnsureInitialized (
    aDisplay=<optimized out>, out_failureId=0x7ee3107388, 
    forceAccel=<optimized out>, this=<optimized out>) at gfx/gl/
    GLLibraryEGL.cpp:388
#4  0x0000007fba4d598c in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7e9c004db0, aSurface=0x55559410d0, 
    aDisplay=<optimized out>) at gfx/gl/GLContextProviderEGL.cpp:1098
#5  0x0000007fbcc8c910 in mozilla::embedlite::nsWindow::GetGLContext (
    this=0x7f8cc64ab0)
    at mobile/sailfishos/embedshared/nsWindow.cpp:415
#6  0x0000007fba55e0d8 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7e9c003420)
    at gfx/layers/opengl/CompositorOGL.cpp:228
#7  0x0000007fba586ef8 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7e9c003420, out_failureReason=0x7ee31077c0)
    at gfx/layers/opengl/CompositorOGL.cpp:374
#8  0x0000007fba675548 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7f8c779b50, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1534
#9  0x0000007fba6887ec in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=0x7f8c779b50, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1447
#10 0x0000007fba688968 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7f8c779b50, 
    aBackendHints=..., aId=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1587
#11 0x0000007fbcc6aa44 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f8c779b50, aBackendHints=..., 
    aId=...) at mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:78
#12 0x0000007fb9ecbfac in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7f8c779b50, msg__=...) at 
    PCompositorBridgeParent.cpp:1391
[...]
#26 0x0000007fb726e7cc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) 
As we can see here, the display is set to 0x01 and seems to be coming from the call to RequestGLContext() in the following method:
GLContext*
nsWindow::GetGLContext() const
{
  LOGT("this:%p, UseExternalContext:%d", this, sUseExternalGLContext);
  if (sUseExternalGLContext) {
    void* context = nullptr;
    void* surface = nullptr;
    void* display = nullptr;
    if (mWindow && mWindow->GetListener()->RequestGLContext(context, surface, 
    display)) {
      MOZ_ASSERT(context && surface);
      RefPtr<GLContext> mozContext = GLContextProvider::CreateWrappingExisting(
    context, surface, display);
      if (!mozContext || !mozContext->Init()) {
        NS_ERROR("Failed to initialize external GL context!");
        return nullptr;
      }
      return mozContext.forget().take();
    } else {
      NS_ERROR("Embedder wants to use external GL context without actually 
    providing it!");
    }
  }
  return nullptr;
}
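To make the flow of those three pointers a bit clearer, here's a minimal standalone sketch of the same out-parameter pattern. It's not the EmbedLite code: GLContextListener, FixedHandleListener, ExternalHandles and FetchExternalHandles() are all names I've made up for illustration, and the runtime check on context and surface stands in for the MOZ_ASSERT() above. The point is just that the caller starts with null handles and ends up with whatever the embedder writes into them, including the display.

```cpp
#include <cassert>

// Hypothetical listener interface, loosely modelled on the call shape above.
class GLContextListener {
 public:
  virtual ~GLContextListener() = default;
  // Implementations fill in all three handles and return true on success.
  virtual bool RequestGLContext(void*& context, void*& surface,
                                void*& display) = 0;
};

// Illustrative embedder that hands back fixed handle values.
class FixedHandleListener final : public GLContextListener {
 public:
  FixedHandleListener(void* context, void* surface, void* display)
      : mContext(context), mSurface(surface), mDisplay(display) {}

  bool RequestGLContext(void*& context, void*& surface,
                        void*& display) override {
    context = mContext;
    surface = mSurface;
    display = mDisplay;
    return true;
  }

 private:
  void* mContext;
  void* mSurface;
  void* mDisplay;
};

struct ExternalHandles {
  void* context = nullptr;
  void* surface = nullptr;
  void* display = nullptr;
};

// Returns true only if the listener supplied a usable context and surface.
// The display is passed through untouched, whatever value it has.
bool FetchExternalHandles(GLContextListener* listener, ExternalHandles& out) {
  ExternalHandles handles;  // All three start out null.
  if (!listener ||
      !listener->RequestGLContext(handles.context, handles.surface,
                                  handles.display)) {
    return false;
  }
  if (!handles.context || !handles.surface) {
    return false;
  }
  out = handles;
  return true;
}

// Quick self-check: the display value chosen by the embedder (0x1 here)
// is exactly what the caller ends up holding.
void CheckFetchExternalHandles() {
  int context = 0;
  int surface = 0;
  FixedHandleListener listener(&context, &surface,
                               reinterpret_cast<void*>(0x1));
  ExternalHandles handles;
  assert(FetchExternalHandles(&listener, handles));
  assert(handles.display == reinterpret_cast<void*>(0x1));
}
```

In this sketch the caller never inspects the display itself, so whatever the embedder provides is what gets passed further down the stack.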
This then pulls in the display from qtmozembed like so:
Thread 37 "Compositor" hit Breakpoint 3, QMozWindowPrivate::
    RequestGLContext (this=0x5555dc56e0, context=@0x7ee31643d0: 0x0, 
    surface=@0x7ee31643d8: 0x0, display=@0x7ee31643e0: 0x0) at qmozwindow_p.cpp:
    133
133         q.requestGLContext();
(gdb) bt
#0  QMozWindowPrivate::RequestGLContext (this=0x5555dc56e0, 
    context=@0x7ee31643d0: 0x0, surface=@0x7ee31643d8: 0x0, 
    display=@0x7ee31643e0: 0x0)
    at qmozwindow_p.cpp:133
#1  0x0000007fbcc8c8fc in mozilla::embedlite::nsWindow::GetGLContext (
    this=0x7f8ccce770)
    at mobile/sailfishos/embedshared/nsWindow.cpp:413
#2  0x0000007fba55e0d8 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7e9c003420)
    at gfx/layers/opengl/CompositorOGL.cpp:228
#3  0x0000007fba586ef8 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7e9c003420, out_failureReason=0x7ee31647c0)
    at gfx/layers/opengl/CompositorOGL.cpp:374
#4  0x0000007fba675548 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7f8cae31e0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1534
#5  0x0000007fba6887ec in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=0x7f8cae31e0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1447
#6  0x0000007fba688968 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7f8cae31e0, 
    aBackendHints=..., aId=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1587
#7  0x0000007fbcc6aa44 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f8cae31e0, aBackendHints=..., 
    aId=...) at mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:78
#8  0x0000007fb9ecbfac in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7f8cae31e0, msg__=...) at 
    PCompositorBridgeParent.cpp:1391
[...]
#22 0x0000007fb726e7cc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) n
mozilla::embedlite::nsWindow::GetGLContext (this=0x7f8ccce770) at mobile/
    sailfishos/embedshared/nsWindow.cpp:415
415           RefPtr<GLContext> mozContext = GLContextProvider::
    CreateWrappingExisting(context, surface, display);
(gdb) n

Thread 37 "Compositor" hit Breakpoint 2, mozilla::gl::
    GetAndInitDisplay (display=0x1, displayType=0x0, egl=...) at gfx/gl/
    GLLibraryEGL.cpp:237
237       if (display == EGL_NO_DISPLAY) {
(gdb) p display
$3 = (EGLDisplay) 0x1
(gdb) 
Looking back at the debugger logs for ESR 91 we see something quite different. There the display starts off as 0x01, which is correct, but this doesn't prevent a call to eglInitialize() being made, which is incorrect. Looking through the code this can be attributed to a lack of guarding around the initialise call in the EglDisplay::Create() method:
// static
std::shared_ptr<EglDisplay> EglDisplay::Create(GLLibraryEGL& lib,
                                               const EGLDisplay display,
                                               const bool isWarp) {
  // Retrieve the EglDisplay if it already exists
  {
    const auto itr = lib.mActiveDisplays.find(display);
    if (itr != lib.mActiveDisplays.end()) {
      const auto ret = itr->second.lock();
      if (ret) {
        return ret;
      }
    }
  }

  if (!lib.fInitialize(display, nullptr, nullptr)) {
    return nullptr;
  }
[...]
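The lookup at the top of that method is a fairly standard "reuse it if it's still alive" cache built on std::weak_ptr. As a rough standalone sketch of the pattern, and nothing more than that, it might look like the following. DisplayCache, CachedDisplay and DisplayHandle are names I've invented for the example; they're not Gecko or EGL types, and the creation counter is only there to make the behaviour observable.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical stand-ins for illustration only.
using DisplayHandle = std::uintptr_t;

struct CachedDisplay {
  explicit CachedDisplay(DisplayHandle aHandle) : handle(aHandle) {}
  DisplayHandle handle;
};

class DisplayCache {
 public:
  // Return the live object for this handle if one exists, otherwise create
  // a new one and remember it through a weak reference.
  std::shared_ptr<CachedDisplay> GetOrCreate(DisplayHandle handle) {
    const auto itr = mActive.find(handle);
    if (itr != mActive.end()) {
      if (auto existing = itr->second.lock()) {
        return existing;  // Still alive, so no second initialisation.
      }
    }
    auto created = std::make_shared<CachedDisplay>(handle);
    ++mCreateCount;
    mActive[handle] = created;  // Weak: the cache doesn't keep it alive.
    return created;
  }

  int CreateCount() const { return mCreateCount; }

 private:
  std::map<DisplayHandle, std::weak_ptr<CachedDisplay>> mActive;
  int mCreateCount = 0;
};

// Quick self-check of the reuse behaviour.
void CheckDisplayCache() {
  DisplayCache cache;
  auto first = cache.GetOrCreate(0x1);
  auto second = cache.GetOrCreate(0x1);
  assert(first == second);
  assert(cache.CreateCount() == 1);
}
```

The important consequence is that anything after the lookup only runs when there's no live entry for the handle, which is why the unguarded initialisation that follows it matters.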
We can wrap the call to fInitialize() in a condition to stop this happening, like this:
  if (display == EGL_NO_DISPLAY) {
    if (!lib.fInitialize(display, nullptr, nullptr)) {
      return nullptr;
    }
  }
That should fix the first problem, the unwanted call to eglInitialize(). However we then have a call to fGetDisplay() which seems to be happening due to the display value being set to 0x00. Why that's happening I don't know. But right now it's nearly midnight and I'm halfway between Jersey and Great Britain on a ferry. The ferry docks at 6 am and so enforces a 5 am wake-up call on all passengers. So that means it's most definitely time for me to pause for the day and head to bed.

I'm going to leave the build running during the trip and with any luck it'll be completed by the time the ferry docks.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
19 Aug 2024 : Day 324 #
This morning I woke up to find the gecko libraries had successfully built. Which means today it's time to test them. But before I can do that I'll need to also rebuild all of the packages that depend on the new gecko library. This includes qtmozembed, embedlite-components, sailfish-components-webview, sailfish-browser and the browser booster. Thankfully building all of these turns out to be pretty straightforward using the SDK's nice output-prefix option.
$ cd ../qtmozembed/
$ sfdk build -d -p
[...]
$ cd ../embedlite-components
$ sfdk build -d -p
[...]
$ cd ../sailfish-components-webview/
$ sfdk build -d -p
[...]
$ cd ../sailfish-browser/
$ sfdk build -d -p
[...]
$ cd ../mapplauncherd-booster-browser/
$ sfdk build -d -p
[...]
$ cd ../harbour-webview-example/
$ sfdk build -d -p
[...]
This leaves me with the following projects, each of which has several associated packages:
  1. rust-cbindgen
  2. nspr
  3. xulrunner-qt5
  4. qtmozembed-qt5
  5. embedlite-components-qt5
  6. sailfish-components-webview-qt5
  7. sailfish-browser
  8. mapplauncherd-booster-browser
  9. harbour-webview
As I was explaining yesterday, I've upgraded one of my two development phones so that it's now running Sailfish OS 4.6. Having transferred all of the packages over to it as well they now need to be installed. There are, it has to be said, quite a few packages to install.
$ rpm -U --force xulrunner-qt5-91.*.rpm xulrunner-qt5-debuginfo-91.*.rpm \
    xulrunner-qt5-debugsource-91.*.rpm xulrunner-qt5-misc-91.*.rpm \
    qtmozembed-qt5-1.*.rpm qtmozembed-qt5-debuginfo-1.*.rpm \
    qtmozembed-qt5-debugsource-1.*.rpm sailfish-components-webview-qt5-1.*.rpm \
    sailfish-components-webview-qt5-debuginfo-1.*.rpm \
    sailfish-components-webview-qt5-debugsource-1.*.rpm \
    sailfish-components-webview-qt5-pickers-1.*.rpm \
    sailfish-components-webview-qt5-popups-1.*.rpm \
    embedlite-components-qt5-1.*.rpm embedlite-components-qt5-debuginfo-1.*.rpm \
    embedlite-components-qt5-debugsource-1.*.rpm \
    sailfish-browser-2.*.rpm sailfish-browser-debuginfo-2.*.rpm \
    sailfish-browser-debugsource-2.*.rpm sailfish-browser-settings-2.*.rpm \
    mapplauncherd-booster-browser-0.*.rpm \
    mapplauncherd-booster-browser-debuginfo-0.*.rpm \
    mapplauncherd-booster-browser-debugsource-0.*.rpm harbour-webview-0.*.rpm \
    harbour-webview-debuginfo-0.*.rpm harbour-webview-debugsource-0.*.rpm
With those installed it's now, finally, time to test out ESR 91 on Sailfish OS 4.6. Sadly, the results are as expected, in that it fails to get to the first page render before crashing:
$ sailfish-browser 
[D] unknown:0 - Using Wayland-EGL
library "libui_compat_layer.so" not found
library "libGLESv2_adreno.so" not found
library "eglSubDriverAndroid.so" not found
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
Created LOG for EmbedLiteTrace
[W] unknown:0 - MeeGo.QOfono QML module name is deprecated and subject for 
    removal. Please adapt code to "import QOfono".
[D] onCompleted:108 - ViewPlaceholder requires a SilicaFlickable parent
Created LOG for EmbedLite
[D] unknown:0 - Updating services as GetServices returns
[D] unknown:0 - No default route set, services: 3
[D] unknown:0 - Selected service "kolbe" path "/net/connman/
    service/wifi_3c01efa2be4a_6b6f6c6265_managed_psk"
Created LOG for EmbedPrefs
Created LOG for EmbedLiteLayerManager
Segmentation fault (core dumped)
Since I'd already been told by the devs at Jolla that it wasn't working for them, it's not a surprise that it's crashing for me as well. A small part of me had hoped that the fixes I added to address video rendering, which prevented multiple GLLibraryEGL instances from being created, might have helped with this issue too. But sadly that wasn't to be.

Now is probably a good time to note that this diary entry is a messy one today. I've collected lots of logs, but they don't make for good reading. On the other hand, it helps for me to keep a record, so while I'll try to only include the relevant parts, there's still going to be a lot of cruft today.

Running with the debugger doesn't turn out to be especially informative. The backtrace for the crash is nonexistent and there are way too many threads to easily pick out where the actual problem lies:
$ gdb sailfish-browser
[...]
Thread 42 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ef28bd830 (LWP 15325)]
0x0000007faef77360 in ?? ()
(gdb) bt
#0  0x0000007faef77360 in ?? ()
#1  0x0000007f7598c6b0 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) info thread
  Id   Target Id                                         Frame 
  1    Thread 0x7fb2e75010 (LWP 14987) "sailfish-browse"
    0x0000007fb7061e14 in __GI___poll (fds=0x5555c94a40, nfds=5, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  2    Thread 0x7fac607830 (LWP 15281) "QQmlThread"
    0x0000007fb7061e14 in __GI___poll (fds=0x7fa80013d0, nfds=1, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  3    Thread 0x7fac406830 (LWP 15282) "QDBusConnection"
    0x0000007fb7061e14 in __GI___poll (fds=0x7fa000d290, nfds=4, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  4    Thread 0x7fac205830 (LWP 15283) "pool-spawner"    syscall () 
    at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
  5    Thread 0x7fa77fe830 (LWP 15284) "gmain"
    0x0000007fb7061e14 in __GI___poll (fds=0x5555847530, nfds=1, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  6    Thread 0x7fa75fd830 (LWP 15285) "dconf worker"
    0x0000007fb7061e14 in __GI___poll (fds=0x7f90000b80, nfds=1, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  7    Thread 0x7fa73fc830 (LWP 15286) "gdbus"
    0x0000007fb7061e14 in __GI___poll (fds=0x7f94000b80, nfds=3, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  8    Thread 0x7fa6826830 (LWP 15290) "QThread"
    0x0000007fb7061e14 in __GI___poll (fds=0x7f880013d0, nfds=1, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  9    Thread 0x7fa64ec830 (LWP 15291) "QQuickPixmapRea"
    0x0000007fb7061e14 in __GI___poll (fds=0x7f8c00b400, nfds=1, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  10   Thread 0x7fa62eb830 (LWP 15292) "Qt bearer threa"
    0x0000007fb7061e14 in __GI___poll (fds=0x7f80001430, nfds=1, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  11   Thread 0x7fa6027830 (LWP 15293) "GeckoWorkerThre" clone () at 
    ../sysdeps/unix/sysv/linux/aarch64/clone.S:63
  13   Thread 0x7fa5e26830 (LWP 15295) "QSGRenderThread"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x5555ceb87c)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  14   Thread 0x7fa5ac1830 (LWP 15296) "IPC I/O Parent"  syscall () 
    at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:39
  15   Thread 0x7fa4079830 (LWP 15297) "Netlink Monitor"
    0x0000007fb7061e14 in __GI___poll (fds=fds@entry=0x7fa4078be8, 
    nfds=nfds@entry=2, 
    timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:41
  16   Thread 0x7ef83fe830 (LWP 15298) "Socket Thread"
    0x0000007fb7061e14 in __GI___poll (fds=0x7ef83fd800, nfds=2, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  17   Thread 0x7ef3dbe830 (LWP 15299) "TaskCon~read #0"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84030914)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  18   Thread 0x7ef3bbf830 (LWP 15300) "TaskCon~read #1"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84030910)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  19   Thread 0x7ef39c0830 (LWP 15301) "TaskCon~read #2"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84030910)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  20   Thread 0x7ef37c1830 (LWP 15302) "TaskCon~read #3"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84030914)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  21   Thread 0x7ef35c2830 (LWP 15303) "TaskCon~read #4"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84030914)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  22   Thread 0x7ef33c3830 (LWP 15304) "TaskCon~read #5"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84030910)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  23   Thread 0x7ef31c4830 (LWP 15305) "TaskCon~read #6"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84030910)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  24   Thread 0x7ef2fc5830 (LWP 15306) "TaskCon~read #7"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84030910)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  26   Thread 0x7ef80fe830 (LWP 15308) "Timer"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7ef80fdb58, 
    clockid=<optimized out>, expected=0, futex_word=0x7f8403158c) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  27   Thread 0x7ef80bd830 (LWP 15309) "IPDL Background"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8444ed94)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  28   Thread 0x7ef8329830 (LWP 15310) "Cache2 I/O"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8444a184)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  29   Thread 0x7ef807c830 (LWP 15311) "Cookie"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84622e40)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  30   Thread 0x7ef3efe830 (LWP 15312) "Backgro~Pool #1"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7ef3efdb28, 
    clockid=<optimized out>, expected=0, futex_word=0x7f840312ac) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  31   Thread 0x7ef3ebd830 (LWP 15313) "StreamTrans #1"
    0x0000007fb7007330 in _int_free (av=0x7eb4000020, p=0x7eb445b300, 
    have_lock=0) at malloc.c:4184
  32   Thread 0x7ef3e7c830 (LWP 15314) "QuotaManager IO"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7ec4003dd4)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  34   Thread 0x7ef2d64830 (LWP 15316) "ProxyResolution"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84abc064)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  35   Thread 0x7ef2b08830 (LWP 15317) "Worker Launcher"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84b29470)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  36   Thread 0x7ef2ac7830 (LWP 15318) "StreamTrans #2"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7ef2ac6b28, 
    clockid=<optimized out>, expected=0, futex_word=0x7f8420a238) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  37   Thread 0x7ef2839830 (LWP 15320) "IndexedDB #1"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7ec4005704)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  38   Thread 0x7ef2a86830 (LWP 15321) "StreamTrans #3"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7ef2a85b28, 
    clockid=<optimized out>, expected=0, futex_word=0x7f8420a238) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  39   Thread 0x7ef8392830 (LWP 15322) "TRR Background"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f84adbb80)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  40   Thread 0x7ef2a45830 (LWP 15323) "DNS Resolver #1"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7ef2a449e8, 
    clockid=<optimized out>, expected=0, futex_word=0x7f84af875c) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  41   Thread 0x7ef28fe830 (LWP 15324) "Softwar~cThread"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7ef28fdcc8)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
* 42   Thread 0x7ef28bd830 (LWP 15325) "Compositor"
    0x0000007faef77360 in ?? ()
  43   Thread 0x7ef27f8830 (LWP 15326) "ImageIO"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f847a9ab0)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  44   Thread 0x7ef27b7830 (LWP 15327) "DNS Resolver #2"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7ef27b69e8, 
    clockid=<optimized out>, expected=0, futex_word=0x7f84af875c) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  45   Thread 0x7ef2776830 (LWP 15328) "DNS Resolver #3"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7ef27759e8, 
    clockid=<optimized out>, expected=0, futex_word=0x7f84af875c) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  46   Thread 0x7fa5c25830 (LWP 15331) "ImageBridgeChld"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f847b3080)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  47   Thread 0x7ef2735830 (LWP 15332) "Compositor"
    0x0000007fae9fb4f4 in ?? ()
  48   Thread 0x7ef2534830 (LWP 15333) "GeckoWorkerThre" clone () at 
    ../sysdeps/unix/sysv/linux/aarch64/clone.S:63
(gdb) 
To help pin down the problem I've placed a breakpoint on the GLLibraryEGL constructor to find out whether the crash comes before or after. It turns out to be after, which means I can also step through the code from there to try to figure out when it crashes with a bit more accuracy.
Thread 42 "Compositor" hit Breakpoint 1, mozilla::gl::GLLibraryEGL::
    GLLibraryEGL (this=0x7e8c1113d0) at gfx/gl/GLLibraryEGL.h:113
113     class GLLibraryEGL final {
(gdb) n
mozilla::gl::GLLibraryEGL::Create (
    out_failureId=out_failureId@entry=0x7ef28bbfe0, aDisplay=0x1) at gfx/gl/
    GLLibraryEGL.cpp:344
344       RefPtr<GLLibraryEGL> ret = new GLLibraryEGL;
(gdb) n
345       if (!ret->Init(false, out_failureId, aDisplay)) {
(gdb) n
348       return ret;
(gdb) n
mozilla::gl::GLContextProviderEGL::CreateWrappingExisting (
    aContext=0x7e8c004230, aSurface=0x5555c135d0, aDisplay=<optimized out>) at 
    gfx/gl/GLContextProviderEGL.cpp:1008
1008        gDefaultEglLibrary = GLLibraryEGL::Create(&failureId, aDisplay);
(gdb) n
[...]
1005      nsCString failureId;
(gdb) n
mozilla::embedlite::nsWindow::GetGLContext (this=this@entry=0x7f847b9c20) at 
    mobile/sailfishos/embedshared/nsWindow.cpp:406
406           if (!mozContext || !mozContext->Init()) {
(gdb) n
mozilla::layers::CompositorOGL::CreateContext (this=this@entry=0x7e8c110d80) at 
    gfx/layers/opengl/CompositorOGL.cpp:234
234       if (widgetOpenGLContext) {
(gdb) n
79      ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h: No such 
    file or directory.
(gdb) n
mozilla::layers::CompositorOGL::Initialize (this=0x7e8c110d80, 
    out_failureReason=0x7ef28bc5a0) at gfx/layers/opengl/CompositorOGL.cpp:387
387         mGLContext = CreateContext();
(gdb) n
397       if (!mGLContext) {
(gdb) n
[...]
379       ScopedGfxFeatureReporter reporter("GL Layers");
(gdb) n
mozilla::layers::CompositorBridgeParent::NewCompositor (
    this=this@entry=0x7f84c16450, aBackendHints=...) at ${PROJECT}/
    obj-build-mer-qt-xr/dist/include/nsTStringRepr.h:91
1495            failureReason = "SUCCESS";
(gdb) n
[...]
1488          return nullptr;
(gdb) n
mozilla::layers::CompositorBridgeParent::InitializeLayerManager (
    this=this@entry=0x7f84c16450, aBackendHints=...) at gfx/layers/ipc/
    CompositorBridgeParent.cpp:1436
1436      mCompositor = NewCompositor(aBackendHints);
(gdb) n
[...]
1450      MonitorAutoLock lock(*sIndirectLayerTreesLock);
(gdb) n
mozilla::layers::CompositorBridgeParent::AllocPLayerTransactionParent (
    this=this@entry=0x7f84c16450, aBackendHints=..., aId=...) at gfx/layers/ipc/
    CompositorBridgeParent.cpp:1548
1548      if (!mLayerManager) {
(gdb) n
[...]
1561      p->AddIPDLReference();
(gdb) n
mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f84c16450, aBackendHints=..., 
    aId=...) at mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:83
83        EmbedLiteWindowParent *parentWindow = EmbedLiteWindowParent::From(
    mWindowId);
(gdb) n
[...]
88        if (!StaticPrefs::embedlite_compositor_external_gl_context()) {
(gdb) n

mozilla::layers::PCompositorBridgeParent::OnMessageReceived (this=0x7f84c16450, 
    msg__=...) at PCompositorBridgeParent.cpp:1286
1286    PCompositorBridgeParent.cpp: No such file or directory.
(gdb) n
1291    in PCompositorBridgeParent.cpp
(gdb) n
1292    in PCompositorBridgeParent.cpp
(gdb) n
1294    in PCompositorBridgeParent.cpp
(gdb) n
1254    in PCompositorBridgeParent.cpp
(gdb) n
1250    in PCompositorBridgeParent.cpp
(gdb) n
mozilla::layers::PCompositorManagerParent::OnMessageReceived (this=<optimized 
    out>, msg__=...) at PCompositorManagerParent.cpp:199
199     PCompositorManagerParent.cpp: No such file or directory.
(gdb) n
mozilla::ipc::MessageChannel::DispatchAsyncMessage (
    this=this@entry=0x7f84dd3ca8, aProxy=aProxy@entry=0x7e8c10be10, aMsg=...) 
    at ipc/glue/MessageChannel.cpp:2075
2075                                         nestedLevel);
(gdb) n
[...]
2078      MaybeHandleError(rv, aMsg, "DispatchAsyncMessage");
(gdb) n
mozilla::ipc::MessageChannel::DispatchMessage (this=this@entry=0x7f84dd3ca8, 
    aMsg=...) at ipc/glue/MessageChannel.cpp:1992
1992          CxxStackFrame frame(*this, IN_MESSAGE, &aMsg);
(gdb) n
[...]
1971      RefPtr<ActorLifecycleProxy> listenerProxy = 
    mListener->GetLifecycleProxy();
(gdb)
mozilla::ipc::MessageChannel::MessageTask::Run (this=0x5555941070) at ipc/glue/
    MessageChannel.cpp:1877
1877      MonitorAutoLock lock(*mMonitor);
(gdb) n
nsThread::ProcessNextEvent (this=0x7f84d05580, aMayWait=<optimized out>, 
    aResult=0x7ef28bcd07) at xpcom/threads/nsThread.cpp:1158
1158            mPerformanceCounterState.RunnableDidRun(std::move(snapshot.ref(
    )));
(gdb) n
[...]
1120          Maybe<LogRunnable::Run> log;
(gdb) n
NS_ProcessNextEvent (aThread=<optimized out>, aThread@entry=0x7f84d05580, 
    aMayWait=aMayWait@entry=true) at ${PROJECT}/obj-build-mer-qt-xr/dist/
    include/nsError.h:30
(gdb) n
mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0x7e8c001840, 
    aDelegate=0x7ef28bce00) at ipc/glue/MessagePump.cpp:300
300         bool didWork = NS_ProcessNextEvent(thread, false) ? true : false;
(gdb) n
[...]
300         bool didWork = NS_ProcessNextEvent(thread, false) ? true : false;
(gdb)
301         if (!keep_running_) {
(gdb)
interface '(null)' has no event 0
[C] unknown:0 - The Wayland connection experienced a fatal error (Resource 
    temporarily unavailable)
305         didWork |= aDelegate->DoDelayedWork(&delayed_work_time_);
(gdb)
307         if (didWork && delayed_work_time_.is_null()) {
(gdb)
333       bool is_null() const { return ticks_ == 0; }
(gdb)
308           mDelayedWorkTimer->Cancel();
(gdb)
311         if (!keep_running_) {
(gdb)
[Parent 16750, StreamTrans #8] ###!!! ABORT: file ${PROJECT}/gecko-dev/ipc/
    chromium/src/base/thread_local_posix.cc:35
[Parent 16750, StreamTrans #8] ###!!! ABORT: file ${PROJECT}/gecko-dev/ipc/
    chromium/src/base/thread_local_posix.cc:35

Thread 58 "StreamTrans #8" received signal SIGSEGV, Segmentation 
    fault.
[Switching to Thread 0x7ef2768830 (LWP 18288)]
0x0000007fbfb8d1e8 in mozalloc_abort () from /usr/lib64/libqt5embedwidget.so.1
(gdb)
Single stepping until exit from function mozalloc_abort,
which has no line number information.
[New Thread 0x7ef27a9830 (LWP 18290)]
WasmTrapHandler (signum=11, info=0x7ef27663f0, context=0x7ef2766470) at js/src/
    wasm/WasmSignalHandlers.cpp:705
705     static void WasmTrapHandler(int signum, siginfo_t* info, void* context) 
    {
(gdb)

Thread 1 "sailfish-browse" received signal SIGSEGV, Segmentation 
    fault.
[Switching to Thread 0x7fb2e75010 (LWP 16750)]
0x0000007faef777e0 in ?? ()
(gdb)
Cannot find bounds of current function
(gdb)
Stepping through isn't completely enlightening, but it does throw up some potentially useful information. For example, while stepping through the code the following shows up in the debug output:
interface '(null)' has no event 0
[C] unknown:0 - The Wayland connection experienced a fatal error (Resource 
    temporarily unavailable)
A search of the Web for this shows some hits, notably in the Qt libraries, but nothing to fully uncover what's going on here. I figure it's also worth checking with the WebView, which has a substantially different rendering pipeline, to see whether that's working or not. It turns out it's not... the crash happens in pretty much the same place:
$ gdb harbour-webview
[...]
Created LOG for EmbedLiteLayerManager
[New Thread 0x7ebbda5830 (LWP 23770)]
library "libui_compat_layer.so" not found
[New Thread 0x7ebb7fe830 (LWP 23771)]
=============== Preparing offscreen rendering context ===============
[New Thread 0x7ebbaef830 (LWP 23772)]
[New Thread 0x7ebbaae830 (LWP 23773)]
[New Thread 0x7ebb3fe830 (LWP 23774)]

Thread 10 "QSGRenderThread" received signal SIGSEGV, Segmentation 
    fault.
[Switching to Thread 0x7fa73f8830 (LWP 23733)]
0x0000007faf60a360 in ?? ()
(gdb) bt
#0  0x0000007faf60a360 in ?? ()
#1  0x0000007fae0f99f8 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) 
Just in case it's related, I'm also going to install the NSPR packages that I built against, even if they should be the same as the packages already available in Sailfish OS 4.6:
$ rpm -U --force nspr-4.*.rpm nspr-debuginfo-4.*.rpm nspr-debugsource-4.*.rpm
Another thing that could be worth trying is running with the WAYLAND_DEBUG environment variable set to 1. This will print out a huge amount of debug info related to Wayland, so could throw something useful up.
$ WAYLAND_DEBUG=1 sailfish-browser
[3059655.707]  -> wl_display@1.get_registry(new id wl_registry@2)
[3059655.852]  -> wl_display@1.sync(new id wl_callback@3)
[3059657.925] wl_display@1.delete_id(3)
[3059658.067] wl_registry@2.global(1, "wl_compositor", 3)
[3059658.109]  -> wl_registry@2.bind(1, "wl_compositor", 3, new id 
    [unknown]@4)
[3059658.169] wl_registry@2.global(2, "wl_data_device_manager", 1)
[3059658.240]  -> wl_registry@2.bind(2, "wl_data_device_manager", 1, 
    new id [unknown]@5)
[3059658.288] wl_registry@2.global(3, "wl_shm", 1)
[3059658.296]  -> wl_registry@2.bind(3, "wl_shm", 1, new id 
    [unknown]@6)
[3059658.306] wl_registry@2.global(4, "qt_hardware_integration", 1)
[3059658.351]  -> wl_registry@2.bind(4, "qt_hardware_integration", 1, 
    new id [unknown]@7)
[3059658.433]  -> wl_display@1.sync(new id wl_callback@8)
[3059658.511] wl_registry@2.global(5, "android_wlegl", 2)
[3059658.520] wl_registry@2.global(6, "qt_surface_extension", 1)
[3059658.543]  -> wl_registry@2.bind(6, "qt_surface_extension", 1, 
    new id [unknown]@9)
[3059658.552] wl_registry@2.global(7, "qt_touch_extension", 1)
[3059658.581]  -> wl_registry@2.bind(7, "qt_touch_extension", 1, new 
    id [unknown]@10)
[3059658.598] wl_registry@2.global(8, "qt_windowmanager", 1)
[3059658.614]  -> wl_registry@2.bind(8, "qt_windowmanager", 1, new id 
    [unknown]@11)
[3059658.627] wl_registry@2.global(9, "wl_seat", 3)
[3059658.663]  -> wl_registry@2.bind(9, "wl_seat", 3, new id 
    [unknown]@12)
[3059658.689]  -> wl_data_device_manager@5.get_data_device(new id 
    wl_data_device@13, wl_seat@12)
[3059658.776] wl_registry@2.global(10, "wl_output", 2)
[3059658.813]  -> wl_registry@2.bind(10, "wl_output", 2, new id 
    [unknown]@14)
[3059658.837]  -> wl_display@1.sync(new id wl_callback@15)
[...]
Created LOG for EmbedPrefs
[3061814.124]  -> wl_surface@20.set_buffer_transform(0)
[3061814.556]  -> wl_surface@20.commit()
[3061821.476] wl_display@1.delete_id(42)
[3061821.549] wl_buffer@4278190084.release()
[3061821.570] wl_callback@42.done(17852739)
[3061821.612]  -> wl_surface@24.frame(new id wl_callback@42)
[3061821.645]  -> wl_surface@24.attach(wl_buffer@4278190083, 0, 0)
[3061821.690]  -> wl_surface@24.damage(0, 0, 1080, 2520)
[3061821.770]  -> wl_surface@24.commit()
[3061821.787]  -> wl_display@1.sync(new id wl_callback@47)
[3061825.338] discarded [unknown]@47.[event 0](0 fd, 12 byte)
[...]
[3061998.156] wl_callback@51.done(1181)
[3061997.685] wl_buffer@4278190084.release()
[3061998.422] wl_display@1.delete_id(42)
[3061998.473] wl_callback@42.done(17852914)
[3061998.518]  -> wl_surface@24.frame(new id wl_callback@42)
[3061998.559]  -> wl_surface@24.attach(wl_buffer@4278190083, 0, 0)
[3061998.596]  -> wl_surface@24.damage(0, 0, 1080, 2520)
[3061998.631]  -> wl_surface@24.commit()
[3061998.663]  -> wl_display@1.sync(new id wl_callback@51)
[3062000.088] discarded [unknown]@51.[event 0](0 fd, 12 byte)
[3062000.249] wl_display@1.delete_id(51)
[3062015.861] wl_buffer@4278190081.release()
[3062017.381] wl_display@1.delete_id(50)
[3062018.195] wl_callback@50.done(17852935)
[3062018.284]  -> wl_surface@20.frame(new id wl_callback@50)
[3062018.309]  -> wl_surface@20.attach(wl_buffer@4278190080, 0, 0)
[3062018.331]  -> wl_surface@20.damage(0, 0, 1080, 2520)
[3062018.387]  -> wl_surface@20.commit()
[3062018.435]  -> wl_display@1.sync(new id wl_callback@51)
[3062017.374] wl_display@1.delete_id(42)
[3062023.593] discarded [unknown]@51.[event 0](0 fd, 12 byte)
[3062023.788] wl_display@1.delete_id(51)
library "libui_compat_layer.so" not found
Segmentation fault (core dumped)
In practice the Wayland debug output isn't immediately helpful. It's pretty overwhelming actually. From what I can tell it looks cyclical and I can't see any obvious reasons why it would work on one loop but fail on the next. Placing a breakpoint on the AllocPLayerTransactionParent() method and giving it another go doesn't uncover anything else that might be useful for fixing this either:
Created LOG for EmbedLiteLayerManager
[3818873.473]  -> wl_compositor@4.create_surface(new id wl_surface@46)
[3818873.680]  -> android_wlegl@27.get_server_buffer_handle(new id 
    android_wlegl_server_buffer_handle@45, 1, 1, 1, 768)
[3818873.754]  -> wl_display@1.sync(new id wl_callback@47)
[3818875.442] wl_display@1.delete_id(45)
[3818875.533] android_wlegl_server_buffer_handle@45.buffer_ints(array[88])
[3818875.462] wl_display@1.delete_id(47)
[3818875.672] android_wlegl_server_buffer_handle@45.buffer_fd(fd 71)
[3818876.108] android_wlegl_server_buffer_handle@45.buffer_fd(fd 72)
[3818876.130] android_wlegl_server_buffer_handle@45.buffer(new id 
    wl_buffer@4278190086, 1, 64)
[3818876.210] wl_callback@47.done(1189)
[3818876.255]  -> android_wlegl@27.get_server_buffer_handle(new id 
    android_wlegl_server_buffer_handle@47, 1, 1, 1, 768)
[3818876.299]  -> wl_display@1.sync(new id wl_callback@48)
[...]
[3818901.384] wl_display@1.delete_id(46)
[New Thread 0x7fa607e830 (LWP 27221)]
[3818920.750] wl_display@1.delete_id(49)
[Switching to Thread 0x7efacf7830 (LWP 27215)]

Thread 37 "Compositor" hit Breakpoint 1, mozilla::embedlite::
    EmbedLiteCompositorBridgeParent::AllocPLayerTransactionParent (
    this=0x7f8cb1f6e0, aBackendHints=..., aId=...) at mobile/sailfishos/
    embedthread/EmbedLiteCompositorBridgeParent.cpp:79
79      {
(gdb) n
[3829046.000] wl_shell_surface@22.ping(1190)
[3829046.245]  -> wl_shell_surface@22.pong(1190)
[3829054.933] wl_buffer@4278190083.release()
[3829055.084] wl_callback@42.done(18583961)
[3829067.657]  -> wl_surface@24.frame(new id wl_callback@42)
80        PLayerTransactionParent* p =
(gdb) n
[...]
[3833740.282] wl_callback@46.done(1190)
[3833753.149]  -> wl_surface@20.frame(new id wl_callback@46)
[3833753.227]  -> wl_surface@20.attach(wl_buffer@4278190080, 0, 0)
[3833753.246]  -> wl_surface@20.damage(0, 0, 1080, 2520)
[3833753.260]  -> wl_surface@20.commit()
[3833753.272]  -> wl_display@1.sync(new id wl_callback@50)
[3833754.672] discarded [unknown]@50.[event 0](0 fd, 12 byte)
[3833754.801] wl_display@1.delete_id(50)

Thread 1 "sailfish-browse" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb2e75010 (LWP 26774)]
0x0000007faf31f2d0 in wl_display_read_events () from /usr/lib64/
    libwayland-client.so.0
(gdb) bt
#0  0x0000007faf31f2d0 in wl_display_read_events () from /usr/lib64/
    libwayland-client.so.0
#1  0x0000007faf3fbb50 in QtWaylandClient::QWaylandDisplay::flushRequests() () 
    from /usr/lib64/libQt5WaylandClient.so.5
#2  0x0000007fb76e6cdc in QMetaObject::activate(QObject*, int, int, void**) () 
    from /usr/lib64/libQt5Core.so.5
#3  0x0000007fb7761698 in QSocketNotifier::activated(int, QSocketNotifier::
    QPrivateSignal) () from /usr/lib64/libQt5Core.so.5
#4  0x0000007fb76f3cac in QSocketNotifier::event(QEvent*) () from /usr/lib64/
    libQt5Core.so.5
#5  0x0000007fb76bab98 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (
    ) from /usr/lib64/libQt5Core.so.5
#6  0x0000007fb770eb34 in ?? () from /usr/lib64/libQt5Core.so.5
#7  0x0000007fb6dd94dc in ?? () from /usr/lib64/libglib-2.0.so.0
#8  0x0000007fb6ddc648 in ?? () from /usr/lib64/libglib-2.0.so.0
#9  0x0000007fb6ddce34 in g_main_context_iteration () from /usr/lib64/
    libglib-2.0.so.0
#10 0x0000007fb770e67c in QEventDispatcherGlib::processEvents(QFlags<QEventLoop:
    :ProcessEventsFlag>) () from /usr/lib64/libQt5Core.so.5
#11 0x0000007fb76b8f84 in QEventLoop::exec(QFlags<QEventLoop::
    ProcessEventsFlag>) () from /usr/lib64/libQt5Core.so.5
#12 0x0000007fb76c09e4 in QCoreApplication::exec() () from /usr/lib64/
    libQt5Core.so.5
#13 0x000000555557d294 in main (argc=<optimized out>, argv=<optimized out>) at 
    main.cpp:203
(gdb) info thread
  Id   Target Id                                         Frame
* 1    Thread 0x7fb2e75010 (LWP 26774) "sailfish-browse"
    0x0000007faf31f2d0 in wl_display_read_events () from /usr/lib64/
    libwayland-client.so.0
  2    Thread 0x7fac607830 (LWP 26933) "QQmlThread"
    0x0000007fb7061e14 in __GI___poll (fds=0x7fa80013d0, nfds=1, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  3    Thread 0x7fac406830 (LWP 26934) "QDBusConnection"
    0x0000007fb7061e14 in __GI___poll (fds=0x7fa000c700, nfds=4, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  4    Thread 0x7fac205830 (LWP 26935) "pool-spawner"    syscall () 
    at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
  5    Thread 0x7fa77fe830 (LWP 26936) "gmain"
    0x0000007fb7061e14 in __GI___poll (fds=0x5555847210, nfds=1, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  6    Thread 0x7fa75fd830 (LWP 26937) "dconf worker"
    0x0000007fb7061e14 in __GI___poll (fds=0x7f90000b80, nfds=1, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  7    Thread 0x7fa73fc830 (LWP 26938) "gdbus"
    0x0000007fb7061e14 in __GI___poll (fds=0x7f94000b80, nfds=3, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  8    Thread 0x7fa67c2830 (LWP 27156) "QThread"
    0x0000007fb7061e14 in __GI___poll (fds=0x7f880013d0, nfds=1, 
    timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:41
  9    Thread 0x7fa6487830 (LWP 27166) "GeckoWorkerThre"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7eac001c90)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  11   Thread 0x7fa6286830 (LWP 27168) "QSGRenderThread"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x5555c6e73c)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  12   Thread 0x7fa5ea6830 (LWP 27177) "IPC I/O Parent"  syscall () 
    at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:39
  13   Thread 0x7fa44d0830 (LWP 27181) "Netlink Monitor"
    0x0000007fb7061e14 in __GI___poll (fds=fds@entry=0x7fa44cfbe8, 
    nfds=nfds@entry=2,
    timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:41
  14   Thread 0x7fa448f830 (LWP 27182) "Socket Thread"
    0x0000007fb7061e14 in __GI___poll (fds=fds@entry=0x7fa448e800, 
    nfds=nfds@entry=2,
    timeout=timeout@entry=7000) at ../sysdeps/unix/sysv/linux/poll.c:41
  15   Thread 0x7f002fd830 (LWP 27183) "TaskCon~read #0"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c030914)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  16   Thread 0x7efbffe830 (LWP 27184) "TaskCon~read #1"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c030914)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  17   Thread 0x7efbdff830 (LWP 27185) "TaskCon~read #2"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c030914)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  18   Thread 0x7efbc00830 (LWP 27186) "TaskCon~read #3"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c030914)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  19   Thread 0x7efba01830 (LWP 27187) "TaskCon~read #4"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c030910)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  20   Thread 0x7efb802830 (LWP 27188) "TaskCon~read #5"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c030914)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  21   Thread 0x7efb603830 (LWP 27189) "TaskCon~read #6"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c030914)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  22   Thread 0x7efb404830 (LWP 27190) "TaskCon~read #7"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c030914)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  24   Thread 0x7fa43b7830 (LWP 27192) "Timer"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7fa43b6b58,
    clockid=<optimized out>, expected=0, futex_word=0x7f8c031588) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  25   Thread 0x7fa4376830 (LWP 27193) "IPDL Background"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c448b90)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  26   Thread 0x7fa4335830 (LWP 27194) "Cache2 I/O"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c575320)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  27   Thread 0x7fa40fe830 (LWP 27195) "Cookie"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c62b530)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  28   Thread 0x7fa40bd830 (LWP 27196) "StreamTrans #1"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7fa40bcb28,
    clockid=<optimized out>, expected=0, futex_word=0x7f8c2852fc) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  29   Thread 0x7efb0fe830 (LWP 27203) "Backgro~Pool #1"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7efb0fdb28,
    clockid=<optimized out>, expected=0, futex_word=0x7f8c0312a8) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  30   Thread 0x7efb0bd830 (LWP 27204) "ProxyResolution"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c9ffb70)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  31   Thread 0x7efaffa830 (LWP 27205) "Worker Launcher"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c983a20)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  32   Thread 0x7efafb9830 (LWP 27209) "StreamTrans #2"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7efafb8b28,
    clockid=<optimized out>, expected=0, futex_word=0x7f8c2852fc) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  33   Thread 0x7efaf78830 (LWP 27211) "QuotaManager IO"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7ecc003dd4)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  34   Thread 0x7fa43f8830 (LWP 27212) "StreamTrans #3"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7fa43f7b28,
    clockid=<optimized out>, expected=0, futex_word=0x7f8c2852fc) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  35   Thread 0x7efaf37830 (LWP 27213) "DOM Worker"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8cc1be6c)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  36   Thread 0x7efad38830 (LWP 27214) "Softwar~cThread"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7efad37cc8)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  37   Thread 0x7efacf7830 (LWP 27215) "Compositor"
    __GI__dl_debug_state () at dl-debug.c:74
  38   Thread 0x7efaca6830 (LWP 27216) "ImageIO"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8ccc2500)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  39   Thread 0x7efac65830 (LWP 27217) "TRR Background"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8ca94ca0)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  40   Thread 0x7efac24830 (LWP 27218) "DNS Resolver #1"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7efac239e8,
    clockid=<optimized out>, expected=0, futex_word=0x7f8ccc8748) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  41   Thread 0x7efabe3830 (LWP 27219) "IndexedDB #1"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7ecc0059a0)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  42   Thread 0x7efaba2830 (LWP 27220) "StreamTrans #4"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7efaba1b28,
    clockid=<optimized out>, expected=0, futex_word=0x7f8c2852fc) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
  43   Thread 0x7fa607e830 (LWP 27221) "ImageBridgeChld"
    futex_wait_cancelable (private=128, expected=0, futex_word=0x7f8c7736d0)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:80
  44   Thread 0x7fa603d830 (LWP 27232) "SSL Cert #1"
    futex_abstimed_wait_cancelable (private=<optimized out>, 
    abstime=0x7fa603cb28, 
    clockid=<optimized out>, expected=0, futex_word=0x7ef4001a88) at ../sysdeps/
    unix/sysv/linux/futex-internal.h:208
(gdb) thread 11
[Switching to thread 11 (Thread 0x7fa6286830 (LWP 27168))]
#0  futex_wait_cancelable (private=128, expected=0, futex_word=0x5555c6e73c) at 
    ../sysdeps/unix/sysv/linux/futex-internal.h:80
80        int err = lll_futex_timed_wait (futex_word, expected, NULL, private);
(gdb) bt
#0  futex_wait_cancelable (private=128, expected=0, futex_word=0x5555c6e73c) at 
    ../sysdeps/unix/sysv/linux/futex-internal.h:80
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5555c6e6e0, 
    cond=0x5555c6e710) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x5555c6e710, mutex=0x5555c6e6e0) at 
    pthread_cond_wait.c:638
#3  0x0000007fb74fbefc in QWaitCondition::wait(QMutex*, unsigned long) () from /
    usr/lib64/libQt5Core.so.5
#4  0x0000007fb86a354c in ?? () from /usr/lib64/libQt5Quick.so.5
#5  0x0000007fb86a56e8 in ?? () from /usr/lib64/libQt5Quick.so.5
#6  0x0000007fb74fb6a8 in ?? () from /usr/lib64/libQt5Core.so.5
#7  0x0000007fb73dab98 in start_thread (arg=0x7fa6286130) at pthread_create.c:
    479
#8  0x0000007fb706b7cc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb) thread 37
[Switching to thread 37 (Thread 0x7efacf7830 (LWP 27215))]
#0  __GI__dl_debug_state () at dl-debug.c:74
74      }
(gdb) bt
#0  __GI__dl_debug_state () at dl-debug.c:74
#1  0x0000007fbfcd2038 in dl_open_worker (a=a@entry=0x7efacf5180) at dl-open.c:
    292
#2  0x0000007fb70aa2a4 in __GI__dl_catch_exception (exception=0x7efacf5168, 
    operate=0x7fbfcd1f10 <dl_open_worker>, args=0x7efacf5180)
    at dl-error-skeleton.c:196
#3  0x0000007fbfcd1a6c in _dl_open (file=0x7efacf5498 "/usr/lib64/
    libhybris//eglplatform_wayland.so", mode=-2147483647,
    caller_dlopen=0x7fb6bbcb04 <ws_init+260>, nsid=-2, argc=1, 
    argv=0x7ffffff2a8, env=0x5555b50470) at dl-open.c:592
#4  0x0000007fb89c6274 in dlopen_doit (a=a@entry=0x7efacf5428) at dlopen.c:66
#5  0x0000007fb70aa2a4 in __GI__dl_catch_exception (
    exception=exception@entry=0x7efacf53b0, operate=0x7fb89c6210 <dlopen_doit>, 
    args=0x7efacf5428)
    at dl-error-skeleton.c:196
#6  0x0000007fb70aa350 in __GI__dl_catch_error (objname=0x7e98000b90, 
    errstring=0x7e98000b98, mallocedp=0x7e98000b88, operate=<optimized out>,
    args=<optimized out>) at dl-error-skeleton.c:215
#7  0x0000007fb89c6b20 in _dlerror_run (operate=operate@entry=0x7fb89c6210 
    <dlopen_doit>, args=args@entry=0x7efacf5428) at dlerror.c:170
#8  0x0000007fb89c6314 in __dlopen (file=<optimized out>, mode=<optimized out>) 
    at dlopen.c:87
#9  0x0000007fb6bbcb04 in ws_init () from /usr/lib64/libEGL.so.1
#10 0x0000007fb6bbb478 in ?? () from /usr/lib64/libEGL.so.1
#11 0x0000007fba06b6c4 in mozilla::gl::GLLibraryEGL::fGetDisplay (
    display_id=0x0, this=0x7e981119f0)
    at gfx/gl/GLLibraryEGL.h:193
#12 mozilla::gl::GetAndInitDisplay (egl=..., displayType=displayType@entry=0x0, 
    display=display@entry=0x0)
    at gfx/gl/GLLibraryEGL.cpp:151
#13 0x0000007fba06bc74 in mozilla::gl::GLLibraryEGL::CreateDisplay (
    this=this@entry=0x7e981119f0, forceAccel=forceAccel@entry=false,
    out_failureId=out_failureId@entry=0x7efacf5fe0, aDisplay=aDisplay@entry=0x0)
    at gfx/gl/GLLibraryEGL.cpp:813
#14 0x0000007fba06cd5c in mozilla::gl::GLLibraryEGL::DefaultDisplay (
    this=0x7e981119f0, out_failureId=out_failureId@entry=0x7efacf5fe0)
    at gfx/gl/GLLibraryEGL.cpp:745
#15 0x0000007fba06ce7c in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7e98004230, aSurface=0x5555c6df20,
    aDisplay=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h:150
#16 0x0000007fbca77924 in mozilla::embedlite::nsWindow::GetGLContext (
    this=this@entry=0x7f8c61a450)
    at mobile/sailfishos/embedshared/nsWindow.cpp:405
#17 0x0000007fbca77adc in mozilla::embedlite::nsWindow::GetNativeData (
    this=0x7f8c61a450, aDataType=12)
    at mobile/sailfishos/embedshared/nsWindow.cpp:173
#18 0x0000007fba0e8928 in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7e98111220)
    at gfx/layers/opengl/CompositorOGL.cpp:232
#19 0x0000007fba0fdb50 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7e98111220, out_failureReason=0x7efacf65a0)
    at gfx/layers/opengl/CompositorOGL.cpp:387
#20 0x0000007fba21d584 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7f8cb1f6e0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#21 0x0000007fba2343e8 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7f8cb1f6e0, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#22 0x0000007fba234574 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=this@entry=0x7f8cb1f6e0, 
    aBackendHints=..., aId=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#23 0x0000007fbca5eda4 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f8cb1f6e0, aBackendHints=...,
    aId=...) at mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:80
#24 0x0000007fb9af84f4 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7f8cb1f6e0, msg__=...) at 
    PCompositorBridgeParent.cpp:1285
#25 0x0000007fb9b49158 in mozilla::layers::PCompositorManagerParent::
    OnMessageReceived (this=<optimized out>, msg__=...)
    at PCompositorManagerParent.cpp:200
#26 0x0000007fb9a47e24 in mozilla::ipc::MessageChannel::DispatchAsyncMessage (
    this=this@entry=0x7f8c61ad38, aProxy=aProxy@entry=0x7e98107140, aMsg=...)
    at ipc/glue/MessageChannel.cpp:2076
#27 0x0000007fb9a56f48 in mozilla::ipc::MessageChannel::DispatchMessage (
    this=this@entry=0x7f8c61ad38, aMsg=...)
    at ipc/glue/MessageChannel.cpp:2001
#28 0x0000007fb9a58a3c in mozilla::ipc::MessageChannel::RunMessage (
    this=0x7f8c61ad38, aTask=...)
    at ipc/glue/MessageChannel.cpp:1860
#29 0x0000007fb9a58bc4 in mozilla::ipc::MessageChannel::MessageTask::Run (
    this=0x7f8ccd8b90)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/MessageChannel.h:
    588
#30 0x0000007fb962ec98 in nsThread::ProcessNextEvent (this=0x7f8ccac4d0, 
    aMayWait=<optimized out>, aResult=0x7efacf6d07)
    at xpcom/threads/nsThread.cpp:1146
#31 0x0000007fb961ce84 in NS_ProcessNextEvent (aThread=<optimized out>, 
    aThread@entry=0x7f8ccac4d0, aMayWait=aMayWait@entry=true)
    at xpcom/threads/nsThreadUtils.cpp:466
#32 0x0000007fb9a3c57c in mozilla::ipc::MessagePumpForNonMainThreads::Run (
    this=0x7e98001840, aDelegate=0x7efacf6e00)
    at ipc/glue/MessagePump.cpp:330
#33 0x0000007fb99f9884 in MessageLoop::RunInternal (
    this=this@entry=0x7efacf6e00)
    at ipc/chromium/src/base/message_loop.cc:359
#34 0x0000007fb99f9ac4 in MessageLoop::RunHandler (this=0x7efacf6e00)
    at ipc/chromium/src/base/message_loop.cc:352
#35 MessageLoop::Run (this=this@entry=0x7efacf6e00) at ipc/chromium/src/base/
    message_loop.cc:334
#36 0x0000007fb9630730 in nsThread::ThreadFunc (aArg=0x7f8cc32c10) at xpcom/
    threads/nsThread.cpp:392
#37 0x0000007fb8a10bf4 in _pt_root (arg=0x7f8ccac630) at ../../.././nspr/pr/src/
    pthreads/ptthread.c:201
#38 0x0000007fb73dab98 in start_thread (arg=0x7efacf7130) at pthread_create.c:
    479
#39 0x0000007fb706b7cc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/
    clone.S:78
(gdb)
Unfortunately, stepping through the code very carefully, increasing the debug output and picking out relevant backtraces from the many threads running when the crash occurs doesn't seem to have brought me any closer to solving the problem. Maybe it's helped me get more familiar with these pieces of code, but not much else. It looks like I'm going to have to compare against ESR 78 execution again, which is a bit awkward because I've just installed ESR 91 on the phone that was running ESR 78. So I'll likely have to reinstall ESR 78 on at least one of my phones.

With all this in mind, it seems I'm going to have to return to this in the morning. I've spent a lot of the day investigating, without spending any time actually coding and without any hint of a fix yet. So there's going to have to be a lot more work on this before the browser is successfully running on Sailfish OS 4.6.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
18 Aug 2024 : Day 323 #
I spent most of my day yesterday installing Sailfish OS 4.6 on one of my dev phones and adding 4.6.0.11EA-aarch64 to the list of targets supported by the Sailfish SDK installed on my laptop. I eventually got to the point where the dependencies were available and the build was failing on some Rust code. Although it's pretty clear from the context that the reason is down to the Rust version in the SDK being newer than that expected for the build, I wasn't able to figure out how to fix it yesterday. Here's the error again that's causing the build to fail:
25:56.21 error[E0597]: `desc_set` does not live long enough
25:56.21     --> gfx/wgpu/wgpu-core/src/device/mod.rs:1795:26
25:56.21      |
25:56.21 1782 |         let mut desc_set = desc_sets.pop().unwrap();
25:56.21      |             ------------ binding `desc_set` declared here
25:56.21 ...
25:56.21 1795 |                     set: desc_set.raw_mut(),
25:56.21      |                          ^^^^^^^^ borrowed value does not live 
                                                  long enough
25:56.21 ...
25:56.21 1816 |     }
25:56.21      |     -
25:56.21      |     |
25:56.21      |     `desc_set` dropped here while still borrowed
25:56.22      |     borrow might be used here, when `write_map` is dropped and 
                    runs the `Drop` code for type `BTreeMap`
25:56.22      |
25:56.22      = note: values in a scope are dropped in the opposite order they 
                are defined
25:57.25 For more information about this error, try `rustc --explain E0597`.
25:57.26 warning: `wgpu-core` (lib) generated 1 warning
25:57.26 error: could not compile `wgpu-core` (lib) due to previous error; 1 
    warning emitted
This is happening in the wgpu project code, which is external to, but used by, the gecko engine. The Rust compiler is famous for generating high-quality error messages alongside useful tips for how to fix the problems causing them. Here the reason for the error is clear, but the way to fix it is less so.

The problem seems to be that desc_set is borrowed by the DescriptorSetWrite structure, and the compiler can't rule out that this borrow is still in use when write_map is dropped at the end of the method. Since desc_set is declared after write_map it gets dropped first. As it's a mutable borrow, this means there's a risk that the variable may be read from or written to after it's been dropped.

There are typically a few ways to tackle this. Since variables are dropped in the opposite order to their creation, often the error can be removed simply by reordering the variable declarations. On other occasions cloning the variable may be a way to avoid the problem. Finally it can sometimes be solved by adding in suitable lifetime annotations.

In this case, it appears it can be fixed by moving where the desc_set variable is created. If it's created before the write_map variable, then it'll be dropped after write_map rather than before it, so it will still be alive when write_map is dropped.
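To convince myself of the drop order rule, here's a minimal sketch that mirrors the situation. It isn't the wgpu code: Resource and Holder are hypothetical stand-ins for desc_set and write_map, and the log is only there so the order can be observed.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for `desc_set`: records when it gets dropped.
struct Resource<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Resource<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("{} dropped", self.name));
    }
}

// Hypothetical stand-in for `write_map`: it holds borrows and its destructor
// looks at them, so everything it borrows must outlive it.
struct Holder<'a, 'b> {
    borrowed: Vec<&'a Resource<'b>>,
}

impl Drop for Holder<'_, '_> {
    fn drop(&mut self) {
        for resource in &self.borrowed {
            resource
                .log
                .borrow_mut()
                .push(format!("holder still sees {}", resource.name));
        }
    }
}

// Returns the messages logged while the two locals go out of scope.
fn drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // Declared first, so dropped last.
        let desc_set = Resource { name: "desc_set", log: &log };
        // Declared second, so dropped first, while `desc_set` is still alive.
        let mut write_map = Holder { borrowed: Vec::new() };
        write_map.borrowed.push(&desc_set);
        // Swapping the two `let` lines above makes rustc reject this with E0597.
    }
    log.into_inner()
}

fn main() {
    for line in drop_order() {
        println!("{}", line);
    }
}
```

With the declarations in this order the holder's destructor runs first and can safely look at the resource it borrowed, which is the property the reordering is meant to restore.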

There's actually a fix for this in the wgpu repository which is both simple and clear. There's also a little background explaining why this is necessary in the associated PR. I've therefore applied the changes from this patch and set the build running again.

The build quickly gets to the point it was failing before and continues past it. That's a good sign!

But, not much later, it hits a second, different, problem:
 2:28.66    Compiling mp4parse v0.11.5 (https://github.com/mozilla/
    mp4parse-rust?rev=1bb484e96ae724309e3346968e8ffd4c25e61616#1bb484e9)
 2:28.78 error[E0277]: cannot multiply `u64` by `NonZeroU8`
 2:28.78     --> ${PROJECT}/gecko-dev/third_party/rust/mp4parse/src/lib.rs:2318:
    62
 2:28.78      |
 2:28.78 2318 |                 static_assertions::const_assert!(<$lhs>::MAX * 
    <$rhs>::MAX <= <$output>::MAX);
 2:28.78      |                                                              ^ 
    no implementation for `u64 * NonZeroU8`
 2:28.78 ...
 2:28.78 2328 | impl_mul!((U8, std::num::NonZeroU8) => (U16, u16));
 2:28.78      | -------------------------------------------------- in this 
    macro invocation
 2:28.78      |
 2:28.79      = help: the trait `Mul<NonZeroU8>` is not implemented for `u64`
 2:28.79      = help: the following other types implement trait `Mul<Rhs>`:
 2:28.79                <u64 as Mul>
 2:28.79                <u64 as Mul<&u64>>
 2:28.79                <&'a u64 as Mul<u64>>
 2:28.79                <&u64 as Mul<&u64>>
 2:28.79      = note: this error originates in the macro `impl_mul` (in Nightly 
    builds, run with -Z macro-backtrace for more info)
 2:28.79 error[E0277]: cannot multiply `u64` by `NonZeroU8`
 2:28.79     --> ${PROJECT}/gecko-dev/third_party/rust/mp4parse/src/lib.rs:2318:
    62
 2:28.79      |
 2:28.79 2318 |                 static_assertions::const_assert!(<$lhs>::MAX * 
    <$rhs>::MAX <= <$output>::MAX);
 2:28.79      |                                                              ^ 
    no implementation for `u64 * NonZeroU8`
 2:28.79 ...
 2:28.79 2329 | impl_mul!((U32, std::num::NonZeroU8) => (U32MulU8, u64));
 2:28.79      | -------------------------------------------------------- in 
    this macro invocation
 2:28.79      |
 2:28.79      = help: the trait `Mul<NonZeroU8>` is not implemented for `u64`
 2:28.79      = help: the following other types implement trait `Mul<Rhs>`:
 2:28.79                <u64 as Mul>
 2:28.79                <u64 as Mul<&u64>>
 2:28.79                <&'a u64 as Mul<u64>>
 2:28.79                <&u64 as Mul<&u64>>
 2:28.79      = note: this error originates in the macro `impl_mul` (in Nightly 
    builds, run with -Z macro-backtrace for more info)
 2:28.93 For more information about this error, try `rustc --explain E0277`.
 2:28.93 error: could not compile `mp4parse` (lib) due to 2 previous errors
Here we have something else. It seems that the trait that allows u64 and NonZeroU8 values to be multiplied together is missing. Again, we're not the first people to hit this problem and a helpful discussion on the mp4parse-rust issue tracker shows that a solution has already been implemented for this.
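The type mismatch itself is easy to reproduce outside the macro. The following is just a sketch of the general idea, assuming we want a plain u64 result, and isn't necessarily how the upstream mp4parse fix does it. The function name mul_u64_nonzero is made up for illustration.

```rust
use std::num::NonZeroU8;

// Multiplies a `u64` by a `NonZeroU8` by converting the latter to `u64` first.
fn mul_u64_nonzero(lhs: u64, rhs: NonZeroU8) -> u64 {
    // Writing `lhs * rhs` here fails with E0277, because `Mul<NonZeroU8>` is
    // not implemented for `u64`. `get()` returns the wrapped `u8` instead.
    lhs * u64::from(rhs.get())
}

fn main() {
    let rhs = NonZeroU8::new(255).expect("255 is non-zero");
    println!("{}", mul_u64_nonzero(255, rhs));
}
```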

It's a simple change and transferring it over to the gecko code turns out to be straightforward as well. After doing this, however, the build doesn't get as far as it did before, failing with the following error:
 1:58.55 toolkit/library/rust/force-cargo-library-build
 2:01.62 error: the listed checksum of `${PROJECT}/gecko-dev/third_party/rust/
    mp4parse/src/lib.rs` has changed:
 2:01.62 expected: 
    ea3d90a541bd01cbb9f8e25599e77600f2ab011535b23334a433e5bd0d8ac7df
 2:01.62 actual:   
    9176ca468fe94f8988893e8249a7dd781dc641461109b6f08ae33f75b4989e90
 2:01.62 directory sources are not intended to be edited, if modifications are 
    required then it is recommended that `[patch]` is used with a forked copy 
    of the source
 2:01.63 make[4]: *** [${PROJECT}/gecko-dev/config/makefiles/rust.mk:405: 
    force-cargo-library-build] Error 101
This is an issue we've seen multiple times previously. The Rust sources are pinned using checksums in various .cargo-checksum.json files. This means the source can't be changed without also updating the checksums to match. This is a little tiresome, but also important for supporting consistent and reproducible builds. So since I've edited one of the mp4parse source files, I have to update the checksum to match.

The error message helpfully provides both the value stored currently in the checksum file and the value that's being generated from the sources. The two have to match in order for the build to continue.

So I've edited the third_party/rust/mp4parse/.cargo-checksum.json file to switch out the former checksum value with the latter value. I've set the build running again and now we'll have to wait and see what happens.
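I made the edit by hand, but the swap is mechanical enough to sketch in code. This is a minimal illustration rather than a tool I actually used: swap_checksum is a hypothetical helper, and the JSON fragment in main is a made-up example that only assumes the checksum file maps file paths to SHA-256 hex strings. The two hash values are the ones from the error message above.

```rust
// Replaces the stale checksum with the new one. Returns `None` unless the
// stale value appears exactly once, so an unexpected file is left alone.
fn swap_checksum(json: &str, expected: &str, actual: &str) -> Option<String> {
    if json.matches(expected).count() != 1 {
        return None;
    }
    Some(json.replacen(expected, actual, 1))
}

fn main() {
    // "expected" and "actual" values as reported by the build error.
    let expected = "ea3d90a541bd01cbb9f8e25599e77600f2ab011535b23334a433e5bd0d8ac7df";
    let actual = "9176ca468fe94f8988893e8249a7dd781dc641461109b6f08ae33f75b4989e90";

    // Made-up fragment standing in for the contents of the checksum file.
    let json = format!(r#"{{"files":{{"src/lib.rs":"{}"}},"package":null}}"#, expected);

    match swap_checksum(&json, expected, actual) {
        Some(updated) => println!("{}", updated),
        None => eprintln!("stale checksum not found exactly once"),
    }
}
```

In practice the string would be read from and written back to the checksum file, for example with std::fs::read_to_string and std::fs::write.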

And what happens is looking good: the build passes all of the previous errors and now, after running for five hours, it's continuing without error.

There are plenty of places for it still to fail, but this is promising nevertheless. I'll have to wait for the build to complete before I can do any more significant work on this, so I'll likely have to pick this up again in the morning.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
17 Aug 2024 : Day 322 #
It's a bit of a strange one today. I've not managed to spend any time coding and yet, the tasks I've done today are just as important as any coding I might have done. If you've been following along you'll know my current task is getting ESR 91 working on Sailfish OS 4.6. Up until now I've been developing on Sailfish OS 4.5, which was the latest version available when I started this process.

Until now I've chosen not to switch to the newer version to avoid complicating matters. Getting the browser working has been challenging enough without also having to worry about whether the difficulties are down to the gecko code or the underlying build target.

Now that the gecko code is feeling pretty stable, it feels like the right time to update and get it working on 4.6. Not least because, according to reports I've received from the Jolla team, it doesn't currently work on 4.6.

In order to shift to 4.6 I have to perform two upgrades: I have to upgrade my SDK and I have to upgrade my development phone. Both are currently on version 4.5. Upgrading the SDK is less of an issue because it's entirely possible, in fact normal, to install multiple target versions within the same SDK. I already have the aarch64, armv7hl and i486 variants of 3.4.0.24 and 4.5.0.18 installed as targets in my SDK and adding another one isn't going to affect any of those already installed.

On the other hand the target installed on my phone for testing purposes is more complex. I'm typically using three phones. My main daily phone is already running Sailfish OS 4.6, but I prefer not to use it for gecko development because I need it in a usable state at all times. Besides this I have two development phones. One has a debug version of ESR 78 installed (it's my own build, but largely identical to the version available from the Jolla repositories). The other has the latest test build of ESR 91.

I'm apprehensive about upgrading my ESR 91 testing phone because I know it'll cause breakage. So I've decided to upgrade my ESR 78 testing phone instead. This approach has merit: first because it means I can continue testing ESR 78 on it, which I know will work because it comes preinstalled; second because it'll allow me to test ESR 91 on it as well without breaking my other two phones.

That all sounds great, except that I'm currently travelling with a very poor mobile roaming signal.

So I'm performing the upgrade while eating lunch in a café that also offers free customer WiFi.

It's turned out to be a surprisingly effective approach. I now have a dev phone running 4.6, ready to try out ESR 91 just as soon as I've got a completed build.

Upgrading the SDK might be less risky, but it's more effort. For this I'm going to be using my temperamental mobile data connection. The upgrader claims it's a 429.89 MiB download which is well within my allowance, but it's going to take a little time. The installer claims it'll be a 30 minute download, which is a bit of a wait, but manageable.
 
The Sailfish SDK upgrader dialogue running on Linux showing the download progress (308.23 of 429.89 MB - minutes remaining) for 4.6.0.11.

Having installed Sailfish OS SDK 4.6.0.11EA-aarch64 I now have to set things up. This is more complicated than it might sound because, thinking all the way back to the start of this process, it's not just gecko that needs to be built. There's a collection of prerequisites that I'll need to build as well. These will need to be built and installed into a snapshot of the SDK so that gecko can be built against them.

So, first the build engine needs configuring.
$ sfdk config
# ---- command scope ---------
# <clear>

# ---- session scope ---------
# <clear>

# ---- global scope ---------
output-prefix = ~/RPMS
device = kolbe
target = SailfishOS-devel-aarch64

$ sfdk config --global --push target SailfishOS-4.6.0.11EA-aarch64
$ sfdk config --session --push no-fix-version
$ sfdk config --session --push snapshot esr91
$ sfdk config
# ---- command scope ---------
# <clear>

# ---- session scope ---------
no-fix-version
snapshot = esr91

# ---- global scope ---------
target = SailfishOS-4.6.0.11EA-aarch64
output-prefix = ~/RPMS
device = kolbe
These configuration changes are important for a number of reasons. The target setting ensures the build is performed against the correct target. The no-fix-version setting ensures the package version is taken from the spec file rather than the latest git tag. Finally the snapshot setting will make a copy of the build target with the given name so we can keep the packages installed ready for any future builds.
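To illustrate how the three scopes in that output might combine, here's a minimal Python sketch. It assumes the narrower scope wins (command over session over global), which is my reading of the layout rather than something confirmed above; the `lookup()` helper is hypothetical and has nothing to do with sfdk's own code.

```python
from collections import ChainMap

# Values copied from the output above; True marks a flag with no value.
global_scope = {
    "target": "SailfishOS-4.6.0.11EA-aarch64",
    "output-prefix": "~/RPMS",
    "device": "kolbe",
}
session_scope = {"no-fix-version": True, "snapshot": "esr91"}
command_scope = {}

# ChainMap searches its maps left to right, so earlier scopes take priority.
# (Assumption: this mirrors the precedence sfdk applies.)
effective = ChainMap(command_scope, session_scope, global_scope)


def lookup(option):
    """Return the effective value of an option, or None if no scope sets it."""
    return effective.get(option)
```

Under that assumption, `lookup("target")` returns the global value until something narrower sets its own `target`.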

So, let's build!
$ sfdk build -p -d --with git_workaround
[...]
'cbindgen >= 0.19.0' not found in package names. Trying capabilities.
No provider of 'cbindgen >= 0.19.0' found.
'pkgconfig(nspr) >= 4.32.0' not found in package names. Trying capabilities.
No provider of 'pkgconfig(nspr) >= 4.32.0' found.
Obviously this didn't go quite to plan, but it has highlighted that I'll need to build cbindgen and nspr before I can build gecko. So let's create those two packages first. Because we've set up the build engine correctly, they'll be stored in the ~/RPMS directory and then installed from there when the gecko-dev spec file is read and the build engine identifies that they're needed.

One of the great things about having done all this before and having kept a careful record of it all is that I can just refer back to my previous notes to figure out the correct way to do all this.
$ cd rust-cbindgen/
$ sfdk build -d -p
[...]
$ cd ../nspr/
$ sfdk build -d -p
[...]
$ cd ../nspr/
$ mkdir rpm
$ mv *.patch *.spec *.changes rpm
$ sed -i -e 's/"@${SOURCE_DATE_EPOCH}"/"${SOURCE_DATE_EPOCH}"/g' rpm/nspr.spec
$ tar -xvf nspr-4.35.tar.gz --strip-components=1
$ SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
$ git checkout -b update-4.35-temp
$ git add -u .
$ git add nspr rpm
$ git commit -m "temp"
$ sfdk build -d -p
[...]
$ git checkout update-4.35
$ git branch -D update-4.35-temp
$ cd ../gecko-dev-esr91/gecko-dev/
$ sfdk build -p -d --with git_workaround
[...]
This leaves me with a bunch of packages ready to be used as part of the gecko build:
$ ls -1
cbindgen-0.19.0-0.aarch64.rpm
nspr-4.35.0+git1-1.aarch64.rpm
nspr-debuginfo-4.35.0+git1-1.aarch64.rpm
nspr-debugsource-4.35.0+git1-1.aarch64.rpm
nspr-devel-4.35.0+git1-1.aarch64.rpm
rust-cbindgen-debuginfo-0.19.0-0.aarch64.rpm
rust-cbindgen-debugsource-0.19.0-0.aarch64.rpm
Now that I have these I've kicked off the build again. Things get past the initial prepare stage of the rpm build process. That's progress, certainly better than previously, but soon after I had an issue with compilation of one of the Rust packages:
25:56.21 error[E0597]: `desc_set` does not live long enough
25:56.21     --> gfx/wgpu/wgpu-core/src/device/mod.rs:1795:26
25:56.21      |
25:56.21 1782 |         let mut desc_set = desc_sets.pop().unwrap();
25:56.21      |             ------------ binding `desc_set` declared here
25:56.21 ...
25:56.21 1795 |                     set: desc_set.raw_mut(),
25:56.21      |                          ^^^^^^^^ borrowed value does not live 
    long enough
25:56.21 ...
25:56.21 1816 |     }
25:56.21      |     -
25:56.21      |     |
25:56.21      |     `desc_set` dropped here while still borrowed
25:56.22      |     borrow might be used here, when `write_map` is dropped and 
    runs the `Drop` code for type `BTreeMap`
25:56.22      |
25:56.22      = note: values in a scope are dropped in the opposite order they 
    are defined
25:57.25 For more information about this error, try `rustc --explain E0597`.
25:57.26 warning: `wgpu-core` (lib) generated 1 warning
25:57.26 error: could not compile `wgpu-core` (lib) due to previous error; 1 
    warning emitted
This is interesting. The "lifetime" error being thrown up suggests that a variable is potentially being used after it's been freed. Rust can be really picky about this kind of thing. Of course, the interesting thing is that this is upstream code that I've not made any changes to, so why is it failing here? The likely answer is that we're using a version of Rust that's different from the one expected by the upstream build.

Given that the difference is due to the newer build target, we can infer that the problem is down to a Rust version that's too new. That is, either something has changed in the libraries, or the compiler has started approaching lifetimes in a slightly different way. Breakages like this aren't unheard of in the Rust world; the compiler isn't as stable or backwards-compatible as some other languages.

Unfortunately the fix isn't immediately clear to me and, having spent most of the day performing upgrades, it's too late for me now to start trying to figure out a fix. I'll have to return to this tomorrow.

16 Aug 2024 : Day 321 #
I've been working my way through the last few issues with the browser. The latest has been to get DuckDuckGo rendering correctly. Ever since the SecFetch header changes back in January the page has been working; it's just been too wide for the screen. It turns out to be expected behaviour: the server thinks we're a desktop browser and intentionally serves us a wide page. Changing the user agent string so we look like Android fixed the issue.

Yesterday I just hacked around with the ua-update.json file in my profile on my device to get this working. Today I need to make the change permanent by adding it to ua-update.json.in in the sailfish-browser repository.

The reason the issue with the OpenAI community pages came to light is that Raine mentioned it to me. Alongside that page, he also asked about two other pages, discussion.fedoraproject.org and help.nextcloud.com. All three use the latest Discourse forum software and all three share a similar issue, which is that they render too wide for the screen. That's if we send our correct user agent string to them.

If we pretend to be Android on the other hand we get served slightly different code, with the result that they then render really quite cleanly.

Discourse must be doing some kind of user-agent checking on the server and serving pages that differ depending on the browser the server thinks is in use. It's an ugly thing to do, but not at all uncommon.

The problem with Discourse using this approach is especially important when it comes to Sailfish OS because the Sailfish OS Forum uses Discourse for its backend as well. Currently the version is being held back so that it continues to work correctly with ESR 78 on Sailfish OS devices. Once ESR 91 is released, it should also then be safe for Jolla to upgrade the Sailfish OS Forum Discourse version as well.

That's just a bit of background. The task now is to cement all of these fixes by adding them to the ua-update.json.in file in the sailfish-browser repository. Here's what I've added:
  "duckduckgo.com": "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0",
  "openai.com": "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0",
  "fedoraproject.org": "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0",
  "nextcloud.com": "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0"
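As a rough illustration of how per-site overrides like these could be applied, here's a minimal Python sketch. The matching rule (the host itself or any parent domain) is an assumption for the sake of the example, not the browser's actual logic. The names `UA_OVERRIDES`, `DEFAULT_UA` and `user_agent_for()` are hypothetical, and the default string is just a placeholder.

```python
# Hypothetical per-site user agent overrides, mirroring the entries above.
MOBILE_UA = "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0"
UA_OVERRIDES = {
    "duckduckgo.com": MOBILE_UA,
    "openai.com": MOBILE_UA,
    "fedoraproject.org": MOBILE_UA,
    "nextcloud.com": MOBILE_UA,
}

# Placeholder only; not the browser's real default user agent string.
DEFAULT_UA = "default-user-agent"


def user_agent_for(host, overrides=UA_OVERRIDES, default=DEFAULT_UA):
    """Return the override for host or its nearest parent domain, else default."""
    labels = host.lower().rstrip(".").split(".")
    # Try "a.b.c", then "b.c", then "c", stopping at the first match.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in overrides:
            return overrides[candidate]
    return default
```

With this rule a single "openai.com" entry would also cover community.openai.com, which is why the bare domains are enough in the sketch.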
With that out of the way I'm left with the task of getting the browser to work on Sailfish OS 4.6. The reports I've received are that it fails due to a potential double-free on some Wayland-related resource. It's possible the changes made to fix video rendering by avoiding duplicate calls to eglTerminate() may have addressed this. But I have another concern as well, which is that the hang caused by switching to the default cover that we were investigating on Day 296 may share the same underlying cause as well.

I'm going to test this now. Unfortunately when I try activating the default cover I still get the same hang, even after fixing the video crash bug. So there's still something to address here and, since fixing this may be what's needed to get the browser working on Sailfish OS 4.6, I think I'll need to return to it.

I've the same difficulty as before, which is that it's a hang, not a crash. So getting a clean backtrace is proving to be a real challenge. There are also so many threads running, almost all of which appear to be stalled, making it hard to discern which is the one blocking progress:
1  "sailfish-browse" pthread_cond_wait () from /lib64/libpthread.so.0
2  "QQmlThread"      poll () from /lib64/libc.so.6
3  "QDBusConnection" poll () from /lib64/libc.so.6
4  "gmain"           poll () from /lib64/libc.so.6
5  "dconf worker"    poll () from /lib64/libc.so.6
6  "gdbus"           poll () from /lib64/libc.so.6
7  "QThread"         poll () from /lib64/libc.so.6
8  "QQuickPixmapRea" poll () from /lib64/libc.so.6
9  "Qt bearer threa" poll () from /lib64/libc.so.6
10 "GeckoWorkerThre" poll () from /lib64/libc.so.6
12 "QSGRenderThread" pthread_cond_wait () from /lib64/libpthread.so.0
14 "IPC I/O Parent"  syscall () from /lib64/libc.so.6
15 "Netlink Monitor" poll () from /lib64/libc.so.6
16 "Socket Thread"   poll () from /lib64/libc.so.6
18 "TaskCon~read #0" pthread_cond_wait () from /lib64/libpthread.so.0
19 "TaskCon~read #1" pthread_cond_wait () from /lib64/libpthread.so.0
20 "TaskCon~read #2" pthread_cond_wait () from /lib64/libpthread.so.0
21 "TaskCon~read #3" pthread_cond_wait () from /lib64/libpthread.so.0
22 "TaskCon~read #4" pthread_cond_wait () from /lib64/libpthread.so.0
23 "TaskCon~read #5" pthread_cond_wait () from /lib64/libpthread.so.0
24 "TaskCon~read #6" pthread_cond_wait () from /lib64/libpthread.so.0
25 "TaskCon~read #7" pthread_cond_wait () from /lib64/libpthread.so.0
27 "Timer"           pthread_cond_timedwait () from /lib64/libpthread.so.0
28 "IPDL Background" pthread_cond_wait () from /lib64/libpthread.so.0
29 "Cache2 I/O"      pthread_cond_wait () from /lib64/libpthread.so.0
30 "Cookie"          pthread_cond_wait () from /lib64/libpthread.so.0
32 "StreamTrans #1"  pthread_cond_timedwait () from /lib64/libpthread.so.0
35 "Worker Launcher" pthread_cond_wait () from /lib64/libpthread.so.0
36 "QuotaManager IO" pthread_cond_wait () from /lib64/libpthread.so.0
38 "Softwar~cThread" pthread_cond_wait () from /lib64/libpthread.so.0
39 "Compositor"      pthread_cond_wait () from /lib64/libpthread.so.0
40 "ImageIO"         pthread_cond_wait () from /lib64/libpthread.so.0
41 "ImageBridgeChld" pthread_cond_wait () from /lib64/libpthread.so.0
44 "Permission"      pthread_cond_wait () from /lib64/libpthread.so.0
45 "TRR Background"  pthread_cond_wait () from /lib64/libpthread.so.0
46 "URL Classifier"  pthread_cond_wait () from /lib64/libpthread.so.0
47 "DNS Resolver #1" pthread_cond_timedwait () from /lib64/libpthread.so.0
48 "DNS Resolver #2" pthread_cond_timedwait () from /lib64/libpthread.so.0
49 "DNS Resolver #3" pthread_cond_timedwait () from /lib64/libpthread.so.0
50 "ProxyResolution" pthread_cond_wait () from /lib64/libpthread.so.0
51 "DNS Resolver #4" pthread_cond_timedwait () from /lib64/libpthread.so.0
52 "DNS Resolver #5" pthread_cond_timedwait () from /lib64/libpthread.so.0
56 "HTML5 Parser"    pthread_cond_wait () from /lib64/libpthread.so.0
57 "localStorage DB" pthread_cond_wait () from /lib64/libpthread.so.0
58 "StyleThread#0"   pthread_cond_wait () from /lib64/libpthread.so.0
59 "StyleThread#1"   pthread_cond_wait () from /lib64/libpthread.so.0
60 "StyleThread#2"   pthread_cond_wait () from /lib64/libpthread.so.0
61 "StyleThread#3"   pthread_cond_wait () from /lib64/libpthread.so.0
62 "StyleThread#4"   pthread_cond_wait () from /lib64/libpthread.so.0
63 "StyleThread#5"   pthread_cond_wait () from /lib64/libpthread.so.0
64 "Compositor"      ?? ()
65 "Compositor"      ?? ()
68 "Backgro~Pool #3" pthread_cond_timedwait () from /lib64/libpthread.so.0
71 "mozStorage #1"   pthread_cond_wait () from /lib64/libpthread.so.0
72 "mozStorage #2"   pthread_cond_wait () from /lib64/libpthread.so.0
73 "BgIOThr~Pool #1" pthread_cond_timedwait () from /lib64/libpthread.so.0
74 "QSGRenderThread" poll () from /lib64/libc.so.6
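With this many threads, one way to narrow things down is to group them by the function they're sitting in and look at the outliers first. Here's a minimal Python sketch of that idea. It assumes the summary has been pasted in as plain text in the form shown above; the `group_by_wait()` helper and its regular expression are my own, not anything provided by gdb.

```python
import re
from collections import defaultdict

# A few lines from the thread summary, pasted as plain text.
SAMPLE = '''
1  "sailfish-browse" pthread_cond_wait () from /lib64/libpthread.so.0
2  "QQmlThread"      poll () from /lib64/libc.so.6
10 "GeckoWorkerThre" poll () from /lib64/libc.so.6
14 "IPC I/O Parent"  syscall () from /lib64/libc.so.6
39 "Compositor"      pthread_cond_wait () from /lib64/libpthread.so.0
64 "Compositor"      ?? ()
'''

# Thread id, quoted thread name, then the innermost function name.
LINE = re.compile(r'^\s*(\d+)\s+"([^"]*)"\s+(\S+)\s+\(\)')


def group_by_wait(text):
    """Map each innermost function to the list of (id, name) threads in it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            thread_id, name, function = match.groups()
            groups[function].append((int(thread_id), name))
    return dict(groups)


if __name__ == "__main__":
    # Print the rarest states first, since those are the most distinctive.
    for function, threads in sorted(group_by_wait(SAMPLE).items(),
                                    key=lambda item: len(item[1])):
        print(function, threads)
```

This won't identify the blocked thread on its own, but it does make the handful of threads that aren't in a plain poll() or pthread_cond_wait() stand out.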
So I'm having to make some guesses here, educated or otherwise. My guess is that the issue is something to do with rendering, so I've placed breakpoints on a few crucial methods that participate in the rendering process:
(gdb) break Paint
Breakpoint 1 at 0x7ff23bc57c: Paint. (87 locations)
(gdb) break OnFirstPaint
Breakpoint 2 at 0x7ff4c3fcf4: OnFirstPaint. (4 locations)
(gdb) break SchedulePaint
Breakpoint 3 at 0x7ff3ba92e8: SchedulePaint. (3 locations)
I've disabled all of these breakpoints and run the browser up until I'm about to switch to Private browsing mode. It's at this point that the blank cover is attached and the browser hangs. I've now re-enabled the breakpoints and pressed the button to switch browser mode.

The hang happens, but none of the breakpoints hit. Not immediately at least. After maybe a few seconds the SchedulePaint() method is called:
Thread 10 "GeckoWorkerThre" hit Breakpoint 3, nsIFrame::SchedulePaint 
    (this=this@entry=0x7e5c02db20, aType=aType@entry=nsIFrame::PAINT_DEFAULT, 
    aFrameChanged=aFrameChanged@entry=true)
    at ${PROJECT}/gecko-dev/layout/generic/nsIFrame.cpp:7415
7415    ${PROJECT}/gecko-dev/layout/generic/nsIFrame.cpp: No such file or 
    directory.
(gdb) bt
#0  nsIFrame::SchedulePaint (this=this@entry=0x7e5c02db20, 
    aType=aType@entry=nsIFrame::PAINT_DEFAULT, 
    aFrameChanged=aFrameChanged@entry=true)
    at ${PROJECT}/gecko-dev/layout/generic/nsIFrame.cpp:7415
#1  0x0000007ff41029cc in nsIFrame::SetParent (this=this@entry=0x7e5c02db20, 
    aParent=aParent@entry=0x7e5c02da80)
    at ${PROJECT}/gecko-dev/layout/generic/nsIFrame.cpp:11047
#2  0x0000007ff4102a34 in nsFrameList::ApplySetParent (
    this=this@entry=0x7fde775e08, aParent=aParent@entry=0x7e5c02da80)
    at ${PROJECT}/gecko-dev/layout/generic/nsFrameList.cpp:280
#3  0x0000007ff4102a70 in nsFrameList::InsertFrames (
    this=this@entry=0x7e5c02db08, aParent=aParent@entry=0x7e5c02da80, 
    aPrevSibling=aPrevSibling@entry=0x0, aFrameList=...)
    at ${PROJECT}/gecko-dev/layout/generic/nsFrameList.cpp:130
#4  0x0000007ff4092abc in nsContainerFrame::InsertFrames (this=0x7e5c02da80, 
    aListID=mozilla::layout::kPrincipalList, aPrevFrame=0x0, 
    aPrevFrameLine=<optimized out>, aFrameList=...)
    at ${PROJECT}/gecko-dev/layout/generic/nsContainerFrame.cpp:144
#5  0x0000007ff402e164 in nsFrameManager::InsertFrames (
    this=this@entry=0x7fb9189c40, aParentFrame=aParentFrame@entry=0x7e5c02da80, 
    aListID=aListID@entry=mozilla::layout::kPrincipalList, 
    aPrevFrame=aPrevFrame@entry=0x0, aFrameList=...)
    at ${PROJECT}/gecko-dev/layout/base/nsFrameManager.cpp:89
#6  0x0000007ff4045a0c in nsCSSFrameConstructor::AppendFramesToParent (
    this=this@entry=0x7fb9189c40, aState=..., aParentFrame=0x7e5c02da80, 
    aFrameList=..., aPrevSibling=aPrevSibling@entry=0x0, 
    aIsRecursiveCall=aIsRecursiveCall@entry=false)
    at ${PROJECT}/gecko-dev/layout/base/nsCSSFrameConstructor.cpp:5933
#7  0x0000007ff404f1f0 in nsCSSFrameConstructor::ContentAppended (
    this=this@entry=0x7fb9189c40, 
    aFirstNewContent=aFirstNewContent@entry=0x7fb8d0ef00, 
    aInsertionKind=aInsertionKind@entry=nsCSSFrameConstructor::InsertionKind::
    Sync)
    at ${PROJECT}/gecko-dev/layout/base/nsCSSFrameConstructor.cpp:6819
#8  0x0000007ff401c718 in mozilla::RestyleManager::ProcessRestyledFrames (
    this=this@entry=0x7fb9189d30, aChangeList=...)
    at ${PROJECT}/gecko-dev/layout/base/RestyleManager.cpp:1402
#9  0x0000007ff401d364 in mozilla::RestyleManager::DoProcessPendingRestyles (
    this=0x7fb9189d30, aFlags=aFlags@entry=mozilla::ServoTraversalFlags::Empty)
    at ${PROJECT}/gecko-dev/layout/base/RestyleManager.cpp:3051
#10 0x0000007ff401d8cc in mozilla::RestyleManager::ProcessPendingRestyles (
    this=<optimized out>)
    at ${PROJECT}/gecko-dev/layout/base/RestyleManager.cpp:3130
#11 0x0000007ff401df7c in mozilla::PresShell::DoFlushPendingNotifications (
    this=0x7fb918d9c0, aFlush=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#12 0x0000007ff3fd8758 in mozilla::PresShell::FlushPendingNotifications (
    this=this@entry=0x7fb918d9c0, aType=..., aType@entry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/PresShell.h:1414
#13 0x0000007ff3fe4850 in nsRefreshDriver::Tick (this=0x7fb916b650, aId=..., 
    aNowTime=..., 
    aIsExtraTick=aIsExtraTick@entry=nsRefreshDriver::IsExtraTick::No)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/FlushType.h:61
#14 0x0000007ff3fe5abc in nsRefreshDriver::<lambda()>::operator() (
    __closure=0x7eec002780)
    at ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.cpp:234
#15 mozilla::detail::RunnableFunction<nsRefreshDriver::EnsureTimerStarted(
    nsRefreshDriver::EnsureTimerStartedFlags)::<lambda()> >::Run(void) (
    this=0x7eec002770) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    nsThreadUtils.h:532
#16 0x0000007ff19948d4 in mozilla::RunnableTask::Run (this=0x7f24092130)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
[...]
#38 0x0000007fef53689c in ?? () from /lib64/libc.so.6
(gdb) 
This is followed by a further seven or so calls to SchedulePaint() in quick succession. After disabling the SchedulePaint() breakpoint this is followed by multiple hits to the Paint() method. Finally I disable the Paint() breakpoint and now there's no more obvious activity occurring.

I'm not sure what this all tells me. I think it may be that the render code is the wrong place to be looking, but I'm not fully certain. What I am certain about is that fixing this issue is going to take a whole lot of trial, error and luck.

Unfortunately that's all I can manage for today, but I'll return to exploring this issue again tomorrow.

15 Aug 2024 : Day 320 #
I've been enjoying some good progress over the last few days, first fixing a crash bug during video playback and then fixing the discolouration that's been plaguing video playback. That's boosted my confidence and given me the momentum I need to try to tackle the WebRTC video discolouration issue.

The problem with video decoding turned out to be that the data frames were being initialised without a valid colour space value. Having set this value the colours returned to their expected state. So I'm thinking maybe there's something similar going on with the WebRTC video decoding? After all, both are making use of the gecko-camera code.

In the gecko code, however, they take different routes. The video decoding is handled by the GeckoCameraVideoDecoder.cpp source file in the dom/media/platforms/gecko-camera/ directory. The WebRTC decoding is handled by code in device_info_sfos.cc and video_capture_sfos.cc in the third_party/libwebrtc/webrtc/modules/video_capture/sfos/ directory.

On the face of it they don't share functionality and unfortunately there's no code similar to the video decoding pipeline in the WebRTC code. The closest I can find to the colour space fix in the WebRTC code is where the video type is set in the FillCapabilities() method in device_info_sfos.cc:
                VideoCaptureCapability vcaps;
                vcaps.width = cap.width;
                vcaps.height = cap.height;
                vcaps.maxFPS = cap.fps;
                vcaps.videoType = VideoType::kI420;
                _captureCapabilities.push_back(vcaps);
Could it be that the video type should be set to something different? The list of possible values is summarised in the Mozilla docs with the canonical list found in the common_types.h file looking like this:
enum class VideoType {
  kUnknown,
  kI420,
  kIYUV,
  kRGB24,
  kABGR,
  kARGB,
  kARGB4444,
  kRGB565,
  kARGB1555,
  kYUY2,
  kYV12,
  kUYVY,
  kMJPEG,
  kNV21,
  kNV12,
  kBGRA,
};
As we can see from the code snippet from inside FillCapabilities() earlier, currently the video type is being set to I420. Here's the description for this from the docs:
 
"I420": Also known as Planar YUV 4:2:0, this format is composed of three distinct planes, one plane of luma and two planes of chroma, denoted Y, U and V, and present in this order. The U and V planes are sub-sampled horizontally and vertically by a factor of 2 compared to the Y plane. Each sample in this format is 8 bits.

This looks to me to be essentially the same as the Y'CbCr format that the video decoder was using. This is interesting because it highlights that the three channels aren't all the same size: the two chroma channels are sub-sampled by a factor of two in both directions (so giving a quarter of the area). Given this, it's hard to imagine that the data isn't being correctly interpreted as I420 data at the other end. If it were interpreted as RGB data, say, all three channels would then need to be the same size and this would create the most almighty mess if it wasn't.
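To make the difference in channel sizes concrete, here's a minimal Python sketch that computes the plane sizes for an I420 frame. It assumes 8-bit samples, no row padding (stride equal to width) and chroma dimensions rounded up for odd sizes. The function names are mine, purely for illustration.

```python
def i420_plane_sizes(width, height):
    """Return (y, u, v) plane sizes in bytes for a tightly packed I420 frame.

    Assumes 8 bits per sample and no row padding (stride equal to width).
    """
    # Chroma is sub-sampled by two in each direction, rounding up for odd sizes.
    chroma_width = (width + 1) // 2
    chroma_height = (height + 1) // 2
    y_size = width * height
    chroma_size = chroma_width * chroma_height
    return y_size, chroma_size, chroma_size


def rgb24_size(width, height):
    """Size in bytes of the same frame as three full-resolution 8-bit channels."""
    return width * height * 3


if __name__ == "__main__":
    y, u, v = i420_plane_sizes(640, 480)
    print(y, u, v, y + u + v, rgb24_size(640, 480))
```

For a 640 by 480 frame this gives 307200 bytes of luma and 76800 bytes for each chroma plane, 460800 bytes in total, which is half of the 921600 bytes the same frame would need as RGB24. A mismatch that large would be hard to miss.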

After looking more through the code I can't see anything wrong on the gecko side. As Frajo (krnlyng) explains on the GitHub ticket the issue is probably related to code in gecko-camera that's incorrectly initialising a DroidMediaBufferYCbCr structure. There are two instances of this structure used in the gecko-camera code and I can see one being correctly passed to droid_media_buffer_lock_ycbcr() before being used. The other, however, isn't.

It could be that passing the other instance through the same method would give positive results. But right now, just as Frajo hypothesised, this very much looks like it's a problem in gecko-camera rather than in the gecko code itself. And that means I'm still going to put it to one side for the time being. Maybe I'll come back to this later.

That means I need to move on to the next task in my list, which is figuring out why some sites aren't correctly fitting to the screen width. There are two examples of sites where I've seen this happen. The most obvious is DuckDuckGo. This happens to be the site I use most frequently for searching the Web, so from my perspective it'd be really nice if it were working correctly with the browser.

Unfortunately that's not currently the case. It renders, but extends over roughly three times the width of the screen, as you can see in these screenshots.
 
DuckDuckGo rendered on a phone showing results from the search term 'trieste'; the three images (left, centre and right) show it extending across three widths of the screen.

We see something similar happening with websites that use Discourse for their forum backend. The header bar appears to extend across two screen widths, even though the topic items correctly extend across just a single screen. It's usable but looks like a bit of a mess.
 
Screenshots from a phone of community.openai.com; the two images left and right, show the main forum page extends across two widths of the screen.

We go to some lengths in the browser code to set the correct dimensions, dots-per-inch values and scaling so that pages render correctly, especially if they're designed to be responsive. But there are a myriad ways that a browser can query the dimensions of a page, so my working hypothesis is that one or more of the values the browser is returning to the JavaScript on the page are incorrect. That might explain why, for example, the DuckDuckGo input field is the correct width across a single page, but the search results are not. Maybe the first is using one API to determine the width and the other using something different?

To try to test this I've put together a test page that displays all of the ways I can think of to query relevant dimensions. It turns out there are quite a few! Unfortunately, comparing the results from the ESR 78 build and the ESR 91 build I'm getting identical results. I've even worked through all of the tests on QuirksMode with no apparent difference between the two versions of the browser. Either I'm missing a critical JavaScript API from the ones I'm testing, or there's something else going on here.
 
Screenshots showing the test page on ESR 78 and ESR 91. Each lists various JavaScript queries (e.g. window.innerWidth), with identical numerical results in both screenshots.

But when displaying the test page using Firefox on my laptop in Responsive Design Mode (which allows you to change the dimensions of the page), I accidentally discover something significant: when rendering DuckDuckGo on desktop Firefox with the screen width reduced, it actually spans three screen widths as well.
 
Screenshots from desktop Firefox but with the screen width reduced, showing DuckDuckGo spanning three screen widths.

Playing around with the browser's user agent string shows that this is all intentional and happening server-side. By changing the user-agent to an Android version string suddenly things start working correctly, with the search results spanning just a single page. It seems a little odd, but presumably when DuckDuckGo thinks a desktop browser is in use, it intentionally uses a wider page size. So I've added the following two lines to the ua-update.json:
  "duckduckgo.com": "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0",
  "openai.com": "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0"
With this applied, now both DuckDuckGo and the OpenAI forum that uses Discourse are working correctly, filling just a single width of the screen:
 
Two screenshots; on the left DuckDuckGo and on the right community.openai.com. Both exactly span a single width of the screen.

So that's resolved one of the last few things on my list to fix. The next is getting the browser to run on Sailfish OS 4.6. My concern is that getting this working is going to be challenging, but that's something for tomorrow.

14 Aug 2024 : Day 319 #
I'm moving on today from crashing video to discolouration of video. Yesterday I was, I think, able to fix the crash by ensuring only a single instance of GLLibraryEGL is created during execution of the browser. Today the problem is different: the question today is why is the browser confusing green, blue and red channels with luma, blue difference chroma and red difference chroma channels?

On Mastodon Vlad G (vlagged) came up with the nice suggestion of amending the environment variables used for configuring gecko-camera. Vlad has had previous experience with browser video playback issues from earlier versions of Sailfish OS and has also always been a big supporter of this gecko upgrade process, so I'm keen to give the suggestion a try. Here's Vlad's comment:
 
I don't have a recent official device, but what is GECKO_CAMERA_DROID_FORCE_MEDIA_BUFFER and GECKO_CAMERA_DROID_NO_MEDIA_BUFFER on your system? The first can force enabling a newer codepath in droidmedia if you set it to =1, the latter can force disabling that. Of course, there is some default value if none of the "force"/"no" envs are set, maybe autodetected. It is worth trying with both settings IMO. Add them to e.g. /var/lib/environment/nemo/90-quirks.conf to make them go everywhere.

We can see the code that reads in these environment variables in the gecko-camera source on GitHub, although we have to dig a little deeper to determine what they actually do.

As Vlad points out, each of the two environment variables can take one of three different values: unset, 0 or 1, giving a total of nine potential variations. I tested all nine of them, but unfortunately all give the same results: the video discolouration remains.
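The nine variations are just the product of three states for each of the two variables. Here's a minimal Python sketch that enumerates them. The variable names come from Vlad's comment; treating "unset" as simply omitting the variable is my assumption, and the `combinations()` helper is hypothetical.

```python
from itertools import product

VARIABLES = (
    "GECKO_CAMERA_DROID_FORCE_MEDIA_BUFFER",
    "GECKO_CAMERA_DROID_NO_MEDIA_BUFFER",
)
# None stands for "leave the variable unset".
VALUES = (None, "0", "1")


def combinations():
    """Return every assignment of the two variables as a list of dicts."""
    result = []
    for values in product(VALUES, repeat=len(VARIABLES)):
        # Only include variables that are actually set in this combination.
        env = {name: value for name, value in zip(VARIABLES, values)
               if value is not None}
        result.append(env)
    return result


if __name__ == "__main__":
    for env in combinations():
        print(" ".join(f"{k}={v}" for k, v in env.items()) or "(both unset)")
```

Each dictionary could be merged into a copy of the environment before launching the browser, which makes it harder to skip a combination by accident.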

Nevertheless, the suggestion led me to the gecko-camera code. There have also been some useful comments from Frajo (krnlyng) and Björn (Thaodan) on the related GitHub issue. These point out that the Y'CbCr buffer may not be being initialised correctly:
 
There is a DroidMediaBufferYCbCr ycbcrTemplate which gets set up and passed to DroidCameraYCbCrFrame::map() and eventually used to describe the buffer layout but it seems some parts of it are wrong. [...]
The ycbcrTemplate is set up here which seems to not match the underlying buffer.


These comments relate to WebRTC which follows a different execution flow. Nevertheless they made me think about the way the buffer is being set up in the gecko code. The crucial work seems to happen inside the GeckoCameraVideoDecoder::onDecodedYCbCrFrame() method in the GeckoCameraVideoDecoder.cpp source file. There we see the following code:
  VideoData::YCbCrBuffer buffer;
  // Y plane.
  buffer.mPlanes[0].mData = const_cast<uint8_t*>(frame->y);
  buffer.mPlanes[0].mStride = frame->yStride;
  buffer.mPlanes[0].mWidth = frame->width;
  buffer.mPlanes[0].mHeight = frame->height;
  buffer.mPlanes[0].mSkip = 0;
  // Cb plane.
  buffer.mPlanes[1].mData = const_cast<uint8_t*>(frame->cb);
  buffer.mPlanes[1].mStride = frame->cStride;
  buffer.mPlanes[1].mWidth = (frame->width + 1) / 2;
  buffer.mPlanes[1].mHeight = (frame->height + 1) / 2;
  buffer.mPlanes[1].mSkip = frame->chromaStep - 1;
  // Cr plane.
  buffer.mPlanes[2].mData = const_cast<uint8_t*>(frame->cr);
  buffer.mPlanes[2].mStride = frame->cStride;
  buffer.mPlanes[2].mWidth = (frame->width + 1) / 2;
  buffer.mPlanes[2].mHeight = (frame->height + 1) / 2;
  buffer.mPlanes[2].mSkip = frame->chromaStep - 1;
This takes all of the input channels, structures them for the underlying video renderer and passes them on to be rendered. If we look at the definition of the VideoData::YCbCrBuffer structure that's being filled out, we find there are a few fields that are being left unset. Here's the structure as it's defined in MediaData.h:
  // YCbCr data obtained from decoding the video. The index's are:
  //   0 = Y
  //   1 = Cb
  //   2 = Cr
  struct YCbCrBuffer {
    struct Plane {
      uint8_t* mData;
      uint32_t mWidth;
      uint32_t mHeight;
      uint32_t mStride;
      uint32_t mSkip;
    };

    Plane mPlanes[3];
    YUVColorSpace mYUVColorSpace = YUVColorSpace::Identity;
    ColorDepth mColorDepth = ColorDepth::COLOR_8;
    ColorRange mColorRange = ColorRange::LIMITED;
  };
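As an aside on the per-plane fields, the following sketch shows how the stride and skip values can be used to address a single sample, which is why the decoder sets the chroma skip to chromaStep - 1. It uses a simplified stand-in for the Plane structure; the `sampleAt()` and `chromaSize()` helpers are hypothetical and not part of gecko. It assumes the skip is the number of bytes to step over after each sample, so interleaved Cb and Cr data has a skip of 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the Plane structure shown above.
struct Plane {
  const uint8_t* data;
  uint32_t width;
  uint32_t height;
  uint32_t stride;  // bytes from the start of one row to the next
  uint32_t skip;    // bytes to step over after each sample
};

// Read the sample at column x, row y of a plane.
uint8_t sampleAt(const Plane& plane, uint32_t x, uint32_t y) {
  assert(x < plane.width && y < plane.height);
  const size_t rowOffset = static_cast<size_t>(y) * plane.stride;
  const size_t columnOffset = static_cast<size_t>(x) * (plane.skip + 1);
  return plane.data[rowOffset + columnOffset];
}

// Chroma dimension for 4:2:0 subsampling, rounding up for odd sizes.
uint32_t chromaSize(uint32_t lumaSize) {
  return (lumaSize + 1) / 2;
}
```

With an interleaved buffer laid out as Cb, Cr, Cb, Cr, the Cb plane starts at the first byte and the Cr plane at the second. Both use a skip of 1, so each steps over the other channel's samples.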
In the decoder code we can see the mPlanes substructure being filled out nicely, but the other three fields, mYUVColorSpace, mColorDepth and mColorRange are simply being left as their default values.

Perhaps we'd have better results if these were set to something. Looking at other decoders for other platforms and services, it seems that at least the mYUVColorSpace and mColorRange are routinely being filled out by the decoder.

But what values to give them? To try to understand better I thought it could be helpful to look at the descriptions of the commits that added them. It turns out the three remaining fields were each added at a different time:
$ git blame dom/media/MediaData.h -L 431,448
Blaming lines:   2% (18/696), done.
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 431)
   // YCbCr data obtained from decoding the video. The index's are:
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 432)
   //   0 = Y
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 433)
   //   1 = Cb
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 434)
   //   2 = Cr
804b8b8883ba2 dom/media/MediaData.h     (Sylvestre Ledru   2018-11-19 435)
   struct YCbCrBuffer {
804b8b8883ba2 dom/media/MediaData.h     (Sylvestre Ledru   2018-11-19 436)
     struct Plane {
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 437)
       uint8_t* mData;
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 438)
       uint32_t mWidth;
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 439)
       uint32_t mHeight;
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 440)
       uint32_t mStride;
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 441)
       uint32_t mSkip;
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 442)
     };
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 443)
 
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 444)
     Plane mPlanes[3];
d517313a4029f dom/media/MediaData.h     (Jeff Gilbert      2021-03-19 445)
     YUVColorSpace mYUVColorSpace = YUVColorSpace::Identity;
b10364a15f063 dom/media/MediaData.h     (Jean-Yves Avenard 2018-09-25 446)
     ColorDepth mColorDepth = ColorDepth::COLOR_8;
bb8acbc1f9c3e dom/media/MediaData.h     (Jean-Yves Avenard 2019-07-26 447)
     ColorRange mColorRange = ColorRange::LIMITED;
3e4230f663e4e content/media/MediaData.h (Ben Kelly         2014-02-03 448)
   };
Here are the three relevant commits. First the commit that added the mYUVColorSpace variable:
$ git log -1 d517313a4029f
commit d517313a4029f626220a7cdc2a6a80461ee37490
Author: Jeff Gilbert <jgilbert@mozilla.com>
Date:   Fri Mar 19 00:58:23 2021 +0000

    Bug 1697670 - Remove gfx::YUVColorSpace::UNKNOWN. r=mstange
    
    Replace with Maybe<YUVColorSpace> where still needed.
    
    Differential Revision: https://phabricator.services.mozilla.com/D107938
It looks like setting this value to the default might make a good choice. To do this I've followed the approach used in some other decoders which seem to use the width and height as parameters. Next we have the code change that added the mColorDepth variable. This seems to be more straightforward, and in our case I'm fairly certain we're using 8-bit colour.
$ git log -1 b10364a15f063
commit b10364a15f06371361a0e404c7f4a0d32f7c81d2
Author: Jean-Yves Avenard <jyavenard@mozilla.com>
Date:   Tue Sep 25 20:44:55 2018 +0000

    Bug 1493198 - P2. Use enum for describing color depth. r=mattwoodrow
    
    Depends on D6662
    
    Differential Revision: https://phabricator.services.mozilla.com/D6663
    
    --HG--
    extra : moz-landing-system : lando
Finally the change that introduced mColorRange:
$ git log -1 bb8acbc1f9c3e
commit bb8acbc1f9c3eee5a6a282e356ce863bc4ddae7d
Author: Jean-Yves Avenard <jyavenard@mozilla.com>
Date:   Fri Jul 26 08:45:37 2019 +0000

    Bug 1543359 - P1. Add mColorRange info to YCbCrBufferData. r=bryce
    
    Differential Revision: https://phabricator.services.mozilla.com/D27210
    
    --HG--
    extra : moz-landing-system : lando
Looking at this last commit doesn't really give me an answer, so again I'm going to try sticking to the default. The result of all this is that I've added the following three lines to the onDecodedYCbCrFrame() method to update the structure before it gets passed on to the renderer:
  buffer.mYUVColorSpace = DefaultColorSpace({frame->width, frame->height});
  buffer.mColorDepth = gfx::ColorDepth::COLOR_8;
  buffer.mColorRange = gfx::ColorRange::LIMITED;
On testing this out I find that... hooray!... the video now looks perfect. The colours are no longer mangled and the quality looks pretty good to me. Unfortunately the feed coming from the actual camera when using WebRTC isn't fixed, but video playback is now working nicely.
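To illustrate why the colour space and range matter so much, here's a rough sketch comparing a proper Y'CbCr to RGB conversion with simply passing the three channels through untouched. It assumes 8-bit limited-range data and the commonly quoted BT.601 coefficients. The function names are hypothetical, and the channel order used by `passThrough()` is an arbitrary choice for the sketch rather than a claim about what the renderer does internally.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Rgb {
  uint8_t r, g, b;
};

// Clamp to the 0..255 range and round to the nearest integer.
static uint8_t clampToByte(double value) {
  const double clamped = std::max(0.0, std::min(255.0, value));
  return static_cast<uint8_t>(std::lround(clamped));
}

// Convert one 8-bit limited-range Y'CbCr sample to RGB using the
// commonly quoted BT.601 coefficients (an assumption of this sketch).
Rgb convertBt601Limited(uint8_t y, uint8_t cb, uint8_t cr) {
  const double luma = 1.164 * (y - 16);
  const double cbOffset = cb - 128.0;
  const double crOffset = cr - 128.0;
  return {clampToByte(luma + 1.596 * crOffset),
          clampToByte(luma - 0.392 * cbOffset - 0.813 * crOffset),
          clampToByte(luma + 2.017 * cbOffset)};
}

// No conversion at all: the three channels are used directly as colour
// channels, which is roughly what an identity mapping amounts to.
Rgb passThrough(uint8_t y, uint8_t cb, uint8_t cr) {
  return {y, cb, cr};
}
```

Limited-range black is stored as Y' = 16 with both chroma channels at 128. The conversion maps that to RGB (0, 0, 0), whereas passing the values through gives (16, 128, 128), which is nowhere near black. That's the kind of discolouration you'd expect when the conversion is skipped.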

Although the camera feed discolouration persists, my aim here was to fix the audio and video playback. Both are now working correctly and I'm happy with the results. So I'm ticking these two items off the testing list.

That means the list is complete and I can now move on to other tasks. I have two significant remaining tasks. One is to get the page width interpreted correctly. The second is to get the browser working on Sailfish OS 4.6. I don't yet know where to start with either but I feel like I've already done enough for today, so I'll leave these as tasks for tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
13 Aug 2024 : Day 318 #
During the last 318 days of posting these diary entries I've found myself enjoying a variety of different modes of transport. Mostly trains, but also buses, cars and planes. So far buses have been the most awkward by far. Today I'm trying something new: developing on a ferry. I have to say, it's at the other extreme, being by far the most comfortable yet. Even better than developing on a train: calmer and with more space. At least, that'll be true until we leave cellphone range and I can no longer access the Internet; at that point I may not find it so comfortable!

Yesterday I was looking into two issues, both related to video. First is the discolouration issue, which it turns out is due to the Y'CbCr channels being interpreted as RGB channels. The second is a call to eglTerminate() which is causing the browser to crash during video playback.

Last night and this morning I've been repeatedly playing the YouTube video on the Jolla test page. This is the same video that triggered the crash with the backtrace yesterday. I built a new version of the browser that has the call to eglTerminate() replaced by a call to send some debug output to the console instead:
EglDisplay::~EglDisplay() {
  printf_stderr("CRASH: EglDisplay destructor\n");
  //fTerminate();
  mLib->mActiveDisplays.erase(mDisplay);
}
During my testing since making this change I've yet to experience a crash. The debug output appears on the first playthrough, so far up to three times, but no more after that:
Created LOG for EmbedLite
Created LOG for EmbedPrefs
Created LOG for EmbedLiteLayerManager
CRASH: EglDisplay destructor
library "libandroidicu.so" needed or dlopened by "/system/lib64/
    libmedia.so" is not accessible for the namespace "(default)"
library "/apex/com.android.vndk.v30/lib64/hw/
    android.hidl.memory@1.0-impl.so" needed or dlopened by "/usr/
    libexec/droid-hybris/system/lib64/libvndksupport.so" is not accessible 
    for the namespace "sphal"
CRASH: EglDisplay destructor
CRASH: EglDisplay destructor
I don't really want to remove the call to eglTerminate() completely as this will likely result in a resource leak. The documentation for the function states the following:
 
Name: eglTerminate — terminate an EGL display connection
C Specification: EGLBoolean eglTerminate(EGLDisplay display);
Parameters: display Specifies the EGL display connection to terminate.
Description: eglTerminate releases resources associated with an EGL display connection. Termination marks all EGL resources associated with the EGL display connection for deletion. If contexts or surfaces associated with display is current to any thread, they are not released until they are no longer current as a result of eglMakeCurrent.
Terminating an already terminated EGL display connection has no effect. A terminated display may be re-initialized by calling eglInitialize again.
Errors: EGL_FALSE is returned if eglTerminate fails, EGL_TRUE otherwise.
EGL_BAD_DISPLAY is generated if display is not an EGL display connection.
See Also: eglInitialize, eglMakeCurrent.

The purpose of the function definitely seems to relate to releasing resources once they're no longer needed. The obvious questions are: what's the value of display being passed in and what's the return value coming back? To find these out I've added a line of debug output to the constructor to check the display value at creation time:
EglDisplay::EglDisplay(const PrivateUseOnly&, GLLibraryEGL& lib,
                       const EGLDisplay disp, const bool isWarp)
    : mLib(&lib), mDisplay(disp), mIsWARP(isWarp) {
  printf_stderr("EGL: constructor, display: %d\n", mDisplay);
[...]
}
I've also updated the debug output in the destructor so that we can check the display value going in to the eglTerminate() call and the return value coming out:
EglDisplay::~EglDisplay() {
  printf_stderr("EGL: destructor, display: %d\n", mDisplay);
  EGLBoolean result = fTerminate();
  printf_stderr("EGL: eglTerminate return: %d\n", result);
  mLib->mActiveDisplays.erase(mDisplay);
}
The result is unexpected in a number of ways. The first time I run it I get the output below. What's notable here is first that there's a repeated cycle of construction and destruction calls. Each time I refresh the page the video loads and, after a short pause of between one and two seconds the context is constructed and then almost immediately destructed. Second is that two copies of the context seem to be constructed with the same display value of 1, with multiple contexts being active at the same time and using the same display:
[...]
Created LOG for EmbedLite
Created LOG for EmbedPrefs
Created LOG for EmbedLiteLayerManager

EGL: constructor, display: 1
EGL: destructor, display: 1
EGL: eglTerminate return: 1

EGL: constructor, display: 1
[D] unknown:0 - AMBIENCE: received embedliteviewcreated
library "libandroidicu.so" needed or dlopened by "/system/lib64/
    libmedia.so" is not accessible for the namespace "(default)"
library "/apex/com.android.vndk.v30/lib64/hw/
    android.hidl.memory@1.0-impl.so" needed or dlopened by "/usr/
    libexec/droid-hybris/system/lib64/libvndksupport.so" is not accessible 
    for the namespace "sphal"

EGL: constructor, display: 1
EGL: destructor, display: 1
EGL: eglTerminate return: 1

EGL: constructor, display: 1
EGL: destructor, display: 1
EGL: eglTerminate return: 1

EGL: constructor, display: 1
EGL: destructor, display: 1
EGL: eglTerminate return: 1

EGL: constructor, display: 1
EGL: destructor, display: 1
EGL: eglTerminate return: 1
[...]
For most of the cycles we can see the context gets destructed before the next is constructed, but on the fourth line prefixed with EGL a context is constructed that's not being immediately destructed. Then on the fifth EGL line there's a new context constructed using the same display value.
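Spotting these overlaps by eye gets tedious, so here's a small sketch of how the check could be automated. It relies only on the exact debug strings printed by the constructor and destructor above; the `findOverlappingConstructions()` helper is hypothetical and exists purely for illustration.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Scan console output and return the positions (1-based, counting only
// lines that start with "EGL:") at which a display is constructed while
// an earlier instance with the same display value is still alive.
std::vector<int> findOverlappingConstructions(const std::string& log) {
  const std::string ctor = "EGL: constructor, display: ";
  const std::string dtor = "EGL: destructor, display: ";
  std::map<std::string, int> live;  // display value -> live instances
  std::vector<int> overlaps;
  std::istringstream stream(log);
  std::string line;
  int eglLine = 0;
  while (std::getline(stream, line)) {
    if (line.rfind("EGL:", 0) != 0) continue;  // ignore unrelated output
    ++eglLine;
    if (line.rfind(ctor, 0) == 0) {
      int& count = live[line.substr(ctor.size())];
      if (count > 0) overlaps.push_back(eglLine);
      ++count;
    } else if (line.rfind(dtor, 0) == 0) {
      int& count = live[line.substr(dtor.size())];
      if (count > 0) --count;
    }
  }
  return overlaps;
}
```

Run against the output above, this would flag the fifth EGL line. It would also flag every constructor after it, because the instance created on the fourth EGL line is never destructed within the excerpt.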

That doesn't look quite right to me. It might make more sense if a different display value were being used, but that's not what we're seeing. At least when I run this there are no crashes. After restarting I get similar output, but this time there's also a crash on the fourth runthrough of the video:
[...]
Created LOG for EmbedLite
Created LOG for EmbedPrefs
Created LOG for EmbedLiteLayerManager

EGL: constructor, display: 1
EGL: destructor, display: 1
EGL: eglTerminate return: 1

EGL: constructor, display: 1
[D] unknown:0 - AMBIENCE: received embedliteviewcreated
library "libandroidicu.so" needed or dlopened by "/system/lib64/
    libmedia.so" is not accessible for the namespace "(default)"
library "/apex/com.android.vndk.v30/lib64/hw/
    android.hidl.memory@1.0-impl.so" needed or dlopened by "/usr/
    libexec/droid-hybris/system/lib64/libvndksupport.so" is not accessible 
    for the namespace "sphal"

EGL: constructor, display: 1
EGL: destructor, display: 1
EGL: eglTerminate return: 1

EGL: constructor, display: 1
EGL: destructor, display: 1
EGL: eglTerminate return: 1

EGL: constructor, display: 1
EGL: destructor, display: 1
Segmentation fault
Once again the crash happens as a result of the call to eglTerminate(), which means that the return value from the method is never output to the console.

To try to figure out what's going on I've placed some breakpoints on the EglDisplay constructor. I'm curious to know how they're getting created and why we're not getting different values each time. This is what we get for the very first occurrence of a call to the constructor:
Thread 39 "Compositor" hit Breakpoint 1, mozilla::gl::EglDisplay::
    EglDisplay (this=0x7ed019b090, lib=..., disp=0x1, isWarp=false)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:689
689     ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp: No such file or directory.
(gdb) bt
#0  mozilla::gl::EglDisplay::EglDisplay (this=0x7ed019b090, lib=..., disp=0x1, 
    isWarp=false)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:689
#1  0x0000007ff2397378 in __gnu_cxx::new_allocator<mozilla::gl::EglDisplay>::
    construct<mozilla::gl::EglDisplay, mozilla::gl::EglDisplay::PrivateUseOnly, 
    mozilla::gl::GLLibraryEGL&, void* const&, bool const&> (__p=0x7ed019b090, 
    this=<optimized out>)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/new:169
#2  std::allocator_traits<std::allocator<mozilla::gl::EglDisplay> >::
    construct<mozilla::gl::EglDisplay, mozilla::gl::EglDisplay::PrivateUseOnly, 
    mozilla::gl::GLLibraryEGL&, void* const&, bool const&> (__p=0x7ed019b090, 
    __a=...)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/alloc_traits.h:475
#3  std::_Sp_counted_ptr_inplace<mozilla::gl::EglDisplay, std::
    allocator<mozilla::gl::EglDisplay>, (__gnu_cxx::_Lock_policy)2>::
    _Sp_counted_ptr_inplace<mozilla::gl::EglDisplay::PrivateUseOnly, mozilla::
    gl::GLLibraryEGL&, void* const&, bool const&> (__a=..., this=0x7ed019b080)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr_base.h:545
#4  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<mozilla::gl:
    :EglDisplay, std::allocator<mozilla::gl::EglDisplay>, mozilla::gl::
    EglDisplay::PrivateUseOnly, mozilla::gl::GLLibraryEGL&, void* const&, bool 
    const&> (__a=..., __p=<synthetic pointer>: <optimized out>, this=<synthetic 
    pointer>)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr_base.h:677
#5  std::__shared_ptr<mozilla::gl::EglDisplay, (__gnu_cxx::_Lock_policy)2>::
    __shared_ptr<std::allocator<mozilla::gl::EglDisplay>, mozilla::gl::
    EglDisplay::PrivateUseOnly, mozilla::gl::GLLibraryEGL&, void* const&, bool 
    const&> (__tag=..., this=<synthetic pointer>)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr_base.h:1342
#6  std::shared_ptr<mozilla::gl::EglDisplay>::shared_ptr<std::allocator<mozilla:
    :gl::EglDisplay>, mozilla::gl::EglDisplay::PrivateUseOnly, mozilla::gl::
    GLLibraryEGL&, void* const&, bool const&> (__tag=..., this=<synthetic 
    pointer>)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr.h:359
#7  std::allocate_shared<mozilla::gl::EglDisplay, std::allocator<mozilla::gl::
    EglDisplay>, mozilla::gl::EglDisplay::PrivateUseOnly, mozilla::gl::
    GLLibraryEGL&, void* const&, bool const&> (__a=...)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr.h:706
#8  std::make_shared<mozilla::gl::EglDisplay, mozilla::gl::EglDisplay::
    PrivateUseOnly, mozilla::gl::GLLibraryEGL&, void* const&, bool const&> ()
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr.h:722
#9  mozilla::gl::EglDisplay::Create (lib=..., display=<optimized out>, 
    isWarp=isWarp@entry=false)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:684
#10 0x0000007ff23974c4 in mozilla::gl::GetAndInitDisplay (egl=..., 
    displayType=displayType@entry=0x0, display=<optimized out>, 
    display@entry=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:154
#11 0x0000007ff2397a34 in mozilla::gl::GLLibraryEGL::CreateDisplay (
    this=this@entry=0x7ed01a2660, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7f2d90af50, aDisplay=aDisplay@entry=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:817
#12 0x0000007ff2397e1c in mozilla::gl::GLLibraryEGL::Init (
    this=this@entry=0x7ed01a2660, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7f2d90af50, aDisplay=aDisplay@entry=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:504
#13 0x0000007ff2398b48 in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7ed00042f0, aSurface=0x5555985ee0, 
    aDisplay=0x1)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1008
#14 0x0000007ff4c589ac in mozilla::embedlite::nsWindow::GetGLContext (
    this=this@entry=0x7fb8b7b940)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:405
#15 0x0000007ff4c58b78 in mozilla::embedlite::nsWindow::GetNativeData (
    this=0x7fb8b7b940, aDataType=12)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:173
#16 0x0000007ff24120ac in mozilla::layers::CompositorOGL::CreateContext (
    this=this@entry=0x7ed01a22a0)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:232
#17 0x0000007ff2427964 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7ed01a22a0, out_failureReason=0x7f2d90b510)
    at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:387
#18 0x0000007ff253d6f4 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7fb8bdec20, aBackendHints=...)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1493
There are a lot of uninteresting allocator calls here. We don't get to the interesting bit until frame 13, from which point we have the following sequence of calls:
  1. GLContextProviderEGL::CreateWrappingExisting()
  2. nsWindow::GetGLContext()
  3. nsWindow::GetNativeData()
  4. CompositorOGL::CreateContext()
  5. CompositorOGL::Initialize()
  6. CompositorBridgeParent::NewCompositor()
The second and third time the context is created we get the same backtrace. But on the fourth occasion we get something different:
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::gl::EglDisplay:
    :EglDisplay (this=0x7fb9ec51e0, lib=..., disp=0x1, isWarp=false)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:689
689     in ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp
(gdb) bt
#0  mozilla::gl::EglDisplay::EglDisplay (this=0x7fb9ec51e0, lib=..., disp=0x1, 
    isWarp=false)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:689
#1  0x0000007ff2397378 in __gnu_cxx::new_allocator<mozilla::gl::EglDisplay>::
    construct<mozilla::gl::EglDisplay, mozilla::gl::EglDisplay::PrivateUseOnly, 
    mozilla::gl::GLLibraryEGL&, void* const&, bool const&> (__p=0x7fb9ec51e0, 
    this=<optimized out>)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/new:169
#2  std::allocator_traits<std::allocator<mozilla::gl::EglDisplay> >::
    construct<mozilla::gl::EglDisplay, mozilla::gl::EglDisplay::PrivateUseOnly, 
    mozilla::gl::GLLibraryEGL&, void* const&, bool const&> (__p=0x7fb9ec51e0, 
    __a=...)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/alloc_traits.h:475
#3  std::_Sp_counted_ptr_inplace<mozilla::gl::EglDisplay, std::
    allocator<mozilla::gl::EglDisplay>, (__gnu_cxx::_Lock_policy)2>::
    _Sp_counted_ptr_inplace<mozilla::gl::EglDisplay::PrivateUseOnly, mozilla::
    gl::GLLibraryEGL&, void* const&, bool const&> (__a=..., this=0x7fb9ec51d0)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr_base.h:545
#4  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<mozilla::gl:
    :EglDisplay, std::allocator<mozilla::gl::EglDisplay>, mozilla::gl::
    EglDisplay::PrivateUseOnly, mozilla::gl::GLLibraryEGL&, void* const&, bool 
    const&> (__a=..., __p=<synthetic pointer>: <optimized out>, this=<synthetic 
    pointer>)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr_base.h:677
#5  std::__shared_ptr<mozilla::gl::EglDisplay, (__gnu_cxx::_Lock_policy)2>::
    __shared_ptr<std::allocator<mozilla::gl::EglDisplay>, mozilla::gl::
    EglDisplay::PrivateUseOnly, mozilla::gl::GLLibraryEGL&, void* const&, bool 
    const&> (__tag=..., this=<synthetic pointer>)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr_base.h:1342
#6  std::shared_ptr<mozilla::gl::EglDisplay>::shared_ptr<std::allocator<mozilla:
    :gl::EglDisplay>, mozilla::gl::EglDisplay::PrivateUseOnly, mozilla::gl::
    GLLibraryEGL&, void* const&, bool const&> (__tag=..., this=<synthetic 
    pointer>)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr.h:359
#7  std::allocate_shared<mozilla::gl::EglDisplay, std::allocator<mozilla::gl::
    EglDisplay>, mozilla::gl::EglDisplay::PrivateUseOnly, mozilla::gl::
    GLLibraryEGL&, void* const&, bool const&> (__a=...)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr.h:706
#8  std::make_shared<mozilla::gl::EglDisplay, mozilla::gl::EglDisplay::
    PrivateUseOnly, mozilla::gl::GLLibraryEGL&, void* const&, bool const&> ()
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/bits/shared_ptr.h:722
#9  mozilla::gl::EglDisplay::Create (lib=..., display=<optimized out>, 
    isWarp=isWarp@entry=false)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:684
#10 0x0000007ff23974c4 in mozilla::gl::GetAndInitDisplay (egl=..., 
    displayType=displayType@entry=0x0, display=<optimized out>, 
    display@entry=0x0)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:154
#11 0x0000007ff2397a34 in mozilla::gl::GLLibraryEGL::CreateDisplay (
    this=this@entry=0x7fb99b9f30, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7fde768ac8, aDisplay=aDisplay@entry=0x0)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:817
#12 0x0000007ff2397e1c in mozilla::gl::GLLibraryEGL::Init (
    this=this@entry=0x7fb99b9f30, forceAccel=forceAccel@entry=false, 
    out_failureId=out_failureId@entry=0x7fde768ac8, aDisplay=aDisplay@entry=0x0)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:504
#13 0x0000007ff2398664 in mozilla::gl::GLLibraryEGL::Create (
    out_failureId=out_failureId@entry=0x7fde768ac8)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:345
#14 0x0000007ff23987bc in mozilla::gl::DefaultEglLibrary (
    out_failureId=out_failureId@entry=0x7fde768ac8)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1307
#15 0x0000007ff23a9ec8 in mozilla::gl::DefaultEglDisplay (
    out_failureId=0x7fde768ac8)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextEGL.h:29
#16 mozilla::gl::GLContextProviderEGL::CreateHeadless (desc=..., 
    desc@entry=<error reading variable: value has been optimized out>, 
    out_failureId=0x7fde768ac8, out_failureId@entry=<error reading variable: 
    value has been optimized out>)
    at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1248
The interesting parts of this distinct backtrace start at frame 12:
  1. GLLibraryEGL::Init()
  2. GLLibraryEGL::Create()
  3. DefaultEglLibrary()
  4. DefaultEglDisplay()
  5. GLContextProviderEGL::CreateHeadless()
From that point onward we get a mixture of both backtraces. Why is the fact they're different relevant? Well, I have a suspicion that the underlying problem may be that there are two different contexts being created and working entirely independently. If the contexts are different, then each instance will have its own list containing display values that are already in use. Hence the values used for display end up overlapping, rather than always being distinct. Then when an attempt is made to terminate the same display twice... boom!

That's my hypothesis anyway. It turns out I'm nearly right, but not quite. After stepping through the methods seen in the two backtraces above, it becomes clear that it's not the context that's being duplicated, but rather the GLLibraryEGL. The reason is that in all the mess of trying to figure out how to get the WebView working alongside WebGL, I ended up with two different ways to create the library.

We can see the result of this by adding a breakpoint to EglDisplay::Create(). A pointer to the library that requested it is passed in as a parameter, so we can query this parameter to check that only one library is in use at any one time.
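To make the failure mode concrete, here's a toy model of two library instances that each keep their own record of active displays. It's only a sketch: `FakeDriver` and `FakeLibrary` are hypothetical stand-ins and share nothing with the real GLLibraryEGL beyond the idea of a per-instance list of active displays.

```cpp
#include <cstdint>
#include <set>

using NativeDisplay = std::uintptr_t;

// Stand-in for the EGL driver: records every terminate request it gets.
struct FakeDriver {
  std::multiset<NativeDisplay> terminated;
  void terminate(NativeDisplay display) { terminated.insert(display); }
};

// Stand-in for the library wrapper: each instance tracks only the
// displays that were opened through it.
class FakeLibrary {
 public:
  explicit FakeLibrary(FakeDriver& driver) : mDriver(driver) {}

  // Returns true if this instance wasn't already tracking the display.
  bool open(NativeDisplay display) { return mActive.insert(display).second; }

  // Stop tracking the display and ask the driver to terminate it.
  void close(NativeDisplay display) {
    if (mActive.erase(display) > 0) mDriver.terminate(display);
  }

 private:
  FakeDriver& mDriver;
  std::set<NativeDisplay> mActive;
};
```

If two `FakeLibrary` instances both open display 1, neither knows about the other, so closing both sends two terminate requests for the same native display. A single shared instance would have spotted the duplicate when the display was opened the second time.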

The following two cases are from the same execution of ESR 91. Here's the first hit. Notice that the lib parameter is pointing to memory location 0x7ed01a21b0:
Thread 39 "Compositor" hit Breakpoint 2, mozilla::gl::EglDisplay::
    Create (lib=..., display=0x1, isWarp=isWarp@entry=false)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:664
664     ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp: No such file or directory.
(gdb) p lib
$1 = (mozilla::gl::GLLibraryEGL &) @0x7ed01a21b0: {mRefCnt = {static 
    isThreadSafe = true, mValue = {<std::__atomic_base<unsigned long>> = {
        static _S_alignment = 8, _M_i = 1}, static is_always_lock_free = 
    true}}, mEGLLibrary = 0x7ed01a23e0, mGLLibrary = 0x7ed01a2470, mIsANGLE = 
    false, 
  mAvailableExtensions = std::bitset, mDefaultDisplay = std::weak_ptr<mozilla::
    gl::EglDisplay> (empty) = {get() = 0x0}, 
  mActiveDisplays = std::unordered_map with 0 elements, mSymbols = 
    {fGetProcAddress = 0x7fef07cdc0 <eglGetProcAddress>, 
    fGetDisplay = 0x7fef07d440 <eglGetDisplay>, fGetPlatformDisplay = 0x0, 
    fTerminate = 0x7fef07d480 <eglTerminate>, 
    fGetCurrentSurface = 0x7fef07e010 <eglGetCurrentSurface>, 
    fGetCurrentContext = 0x7fef07dfb0 <eglGetCurrentContext>, 
    fMakeCurrent = 0x7fef07df18 <eglMakeCurrent>, fDestroyContext = 
    0x7fef07de90 <eglDestroyContext>, fCreateContext = 0x7fef07cac8 
    <eglCreateContext>, 
    fDestroySurface = 0x7fef07d0b8 <eglDestroySurface>, fCreateWindowSurface = 
    0x7fef07d500 <eglCreateWindowSurface>, 
    fCreatePbufferSurface = 0x7fef07d940 <eglCreatePbufferSurface>, 
    fCreatePbufferFromClientBuffer = 0x7fef07dc20 
    <eglCreatePbufferFromClientBuffer>, 
    fCreatePixmapSurface = 0x7fef07ca30 <eglCreatePixmapSurface>, fBindAPI = 
    0x7fef07da80 <eglBindAPI>, fInitialize = 0x7fef07d618 <eglInitialize>, 
    fChooseConfig = 0x7fef07d7e0 <eglChooseConfig>, fGetError = 0x7fef07c9a0 
    <eglGetError>, fGetConfigAttrib = 0x7fef07d898 <eglGetConfigAttrib>, 
    fGetConfigs = 0x7fef07d738 <eglGetConfigs>, fWaitNative = 0x7fef07e1f8 
    <eglWaitNative>, fSwapBuffers = 0x7fef07d248 <eglSwapBuffers>, 
    fCopyBuffers = 0x7fef07e278 <eglCopyBuffers>, fQueryString = 0x7fef07d6b0 
    <eglQueryString>, fQueryContext = 0x7fef07e0f0 <eglQueryContext>, 
    fBindTexImage = 0x7fef07dd60 <eglBindTexImage>, fReleaseTexImage = 
    0x7fef07ddf8 <eglReleaseTexImage>, fSwapInterval = 0x7fef07cc18 
    <eglSwapInterval>, 
    fCreateImageKHR = 0x0, fDestroyImageKHR = 0x0, fQuerySurface = 0x7fef07d9d8 
    <eglQuerySurface>, fQuerySurfacePointerANGLE = 0x0, fCreateSyncKHR = 0x0, 
    fDestroySyncKHR = 0x0, fClientWaitSyncKHR = 0x0, fGetSyncAttribKHR = 0x0, 
    fWaitSyncKHR = 0x0, fDupNativeFenceFDANDROID = 0x0, fCreateStreamKHR = 0x0, 
    fDestroyStreamKHR = 0x0, fQueryStreamKHR = 0x0, 
    fStreamConsumerGLTextureExternalKHR = 0x0, fStreamConsumerAcquireKHR = 0x0, 
    fStreamConsumerReleaseKHR = 0x0, fQueryDisplayAttribEXT = 0x0, 
    fQueryDeviceAttribEXT = 0x0, fStreamConsumerGLTextureExternalAttribsNV = 
    0x0, 
    fCreateStreamProducerD3DTextureANGLE = 0x0, fStreamPostD3DTextureANGLE = 
    0x0, fCreateDeviceANGLE = 0x0, fReleaseDeviceANGLE = 0x0, 
    fSwapBuffersWithDamage = 0x0, fSetDamageRegion = 0x0, 
    fGetNativeClientBufferANDROID = 0x0}}
(gdb) 
For the next case of this call the lib parameter is pointing elsewhere, to memory location 0x7fb9350e50:
Thread 10 "GeckoWorkerThre" hit Breakpoint 2, mozilla::gl::EglDisplay:
    :Create (lib=..., display=0x1, isWarp=isWarp@entry=false)
    at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:664
664     in ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp
(gdb) p lib
$5 = (mozilla::gl::GLLibraryEGL &) @0x7fb9350e50: {mRefCnt = {static 
    isThreadSafe = true, mValue = {<std::__atomic_base<unsigned long>> = {
        static _S_alignment = 8, _M_i = 1}, static is_always_lock_free = 
    true}}, mEGLLibrary = 0x7ed01a23e0, mGLLibrary = 0x7ed01a2470, mIsANGLE = 
    false,
  mAvailableExtensions = std::bitset, mDefaultDisplay = std::weak_ptr<mozilla::
    gl::EglDisplay> (empty) = {get() = 0x0},
  mActiveDisplays = std::unordered_map with 0 elements, mSymbols = 
    {fGetProcAddress = 0x7fef07cdc0 <eglGetProcAddress>,
    fGetDisplay = 0x7fef07d440 <eglGetDisplay>, fGetPlatformDisplay = 0x0, 
    fTerminate = 0x7fef07d480 <eglTerminate>,
    fGetCurrentSurface = 0x7fef07e010 <eglGetCurrentSurface>, 
    fGetCurrentContext = 0x7fef07dfb0 <eglGetCurrentContext>,
    fMakeCurrent = 0x7fef07df18 <eglMakeCurrent>, fDestroyContext = 
    0x7fef07de90 <eglDestroyContext>, fCreateContext = 0x7fef07cac8 
    <eglCreateContext>,
    fDestroySurface = 0x7fef07d0b8 <eglDestroySurface>, fCreateWindowSurface = 
    0x7fef07d500 <eglCreateWindowSurface>,
    fCreatePbufferSurface = 0x7fef07d940 <eglCreatePbufferSurface>, 
    fCreatePbufferFromClientBuffer = 0x7fef07dc20 
    <eglCreatePbufferFromClientBuffer>,
    fCreatePixmapSurface = 0x7fef07ca30 <eglCreatePixmapSurface>, fBindAPI = 
    0x7fef07da80 <eglBindAPI>, fInitialize = 0x7fef07d618 <eglInitialize>,
    fChooseConfig = 0x7fef07d7e0 <eglChooseConfig>, fGetError = 0x7fef07c9a0 
    <eglGetError>, fGetConfigAttrib = 0x7fef07d898 <eglGetConfigAttrib>,
    fGetConfigs = 0x7fef07d738 <eglGetConfigs>, fWaitNative = 0x7fef07e1f8 
    <eglWaitNative>, fSwapBuffers = 0x7fef07d248 <eglSwapBuffers>,
    fCopyBuffers = 0x7fef07e278 <eglCopyBuffers>, fQueryString = 0x7fef07d6b0 
    <eglQueryString>, fQueryContext = 0x7fef07e0f0 <eglQueryContext>,
    fBindTexImage = 0x7fef07dd60 <eglBindTexImage>, fReleaseTexImage = 
    0x7fef07ddf8 <eglReleaseTexImage>, fSwapInterval = 0x7fef07cc18 
    <eglSwapInterval>,
    fCreateImageKHR = 0x0, fDestroyImageKHR = 0x0, fQuerySurface = 0x7fef07d9d8 
    <eglQuerySurface>, fQuerySurfacePointerANGLE = 0x0, fCreateSyncKHR = 0x0,
    fDestroySyncKHR = 0x0, fClientWaitSyncKHR = 0x0, fGetSyncAttribKHR = 0x0, 
    fWaitSyncKHR = 0x0, fDupNativeFenceFDANDROID = 0x0, fCreateStreamKHR = 0x0,
    fDestroyStreamKHR = 0x0, fQueryStreamKHR = 0x0, 
    fStreamConsumerGLTextureExternalKHR = 0x0, fStreamConsumerAcquireKHR = 0x0,
    fStreamConsumerReleaseKHR = 0x0, fQueryDisplayAttribEXT = 0x0, 
    fQueryDeviceAttribEXT = 0x0, fStreamConsumerGLTextureExternalAttribsNV = 
    0x0,
    fCreateStreamProducerD3DTextureANGLE = 0x0, fStreamPostD3DTextureANGLE = 
    0x0, fCreateDeviceANGLE = 0x0, fReleaseDeviceANGLE = 0x0,
    fSwapBuffersWithDamage = 0x0, fSetDamageRegion = 0x0, 
    fGetNativeClientBufferANDROID = 0x0}}
(gdb)
As we continue through the execution we find that both of these two GLLibraryEGL instances are in use throughout. Both of the ways to create an instance of GLLibraryEGL are in GLContextProviderEGL. The first happens when a call is made to GLContextProviderEGL::CreateWrappingExisting(). It's clear from the code that a new copy of the library will be created each time this method is called.

The second happens when there's a call to DefaultEglLibrary(). In this case there's an interesting twist, because DefaultEglLibrary() stores the result in gDefaultEglLibrary which is a static variable. A new instance is constructed only if this is set to null, essentially making GLLibraryEGL a singleton when created via this route.

Since they're both in the same file, the solution I've come up with is to make CreateWrappingExisting() use gDefaultEglLibrary as well. That way, whichever is called first will create the canonical instance of the library. Any subsequent calls will reuse the same instance.
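
To make the idea concrete, here's a minimal sketch of two creation routes sharing one lazily created instance. It uses standard C++ only, and the names EglLibrarySketch, DefaultLibrarySketch() and WrappingExistingLibrarySketch() are hypothetical stand-ins rather than the real Gecko code. Thread safety, which real code would need to handle, is left out for brevity.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for GLLibraryEGL; not the real Gecko class.
struct EglLibrarySketch {
  int id = 0;
};

// Plays the role of gDefaultEglLibrary: a single shared instance.
static std::shared_ptr<EglLibrarySketch> gSharedLibrary;
static int gInstancesCreated = 0;

// Creates the shared instance on first use and returns it on every later call.
std::shared_ptr<EglLibrarySketch> DefaultLibrarySketch() {
  if (!gSharedLibrary) {
    gSharedLibrary = std::make_shared<EglLibrarySketch>();
    gSharedLibrary->id = ++gInstancesCreated;
  }
  return gSharedLibrary;
}

// The second route no longer builds its own copy; it reuses the shared one.
std::shared_ptr<EglLibrarySketch> WrappingExistingLibrarySketch() {
  return DefaultLibrarySketch();
}

// Whichever route is called first creates the instance; the other reuses it.
void DemoSharedInstance() {
  auto fromWrapping = WrappingExistingLibrarySketch();
  auto fromDefault = DefaultLibrarySketch();
  assert(fromWrapping == fromDefault);
  assert(gInstancesCreated == 1);
}
```

With only one library instance around, there's only one place for displays to be tracked, which is the property the change relies on.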

Testing this out with the browser I get good results. It's clear that eglTerminate() is no longer being called partway through the video; in fact, there's now no construction or destruction of a new display at all while the video is playing. I've not yet managed to trigger a crash, but will have to use the browser a bit more before I feel more confident about this.

Just as importantly, I've tested the browser and the WebView app with both standard browsing and pages that contain WebGL. So far the results have been stable as well.

If this really has prevented a crash then this will be a great result. I have a suspicion that something similar was happening on ESR 78, which would periodically crash when videos were playing. This was problematic on many pages with embedded videos. Often it would appear that the browser was crashing at random. If this fixes these crashes, the browser will be far more enjoyable to use in general.

None of these changes will fix the discolouration issue and I won't have time to do more work on that today. But I will certainly return to it tomorrow with a fresh pair of eyes.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
12 Aug 2024 : Day 317 #
The build I started yesterday has gone through successfully, which is a nice step forwards. This is with all the audio and video patches applied and I'm hoping I'll now have better luck using the audio and video test pages that were causing us trouble back on Day 299.

Here's what I wrote about it back then:
 
For audio testing the results are also unfortunately a fail. I'm testing using BBC Sounds which works fine on ESR 78. But on my ESR 91 build we don't get any audio, just an error message that states "This content doesn't seem to be working". Disappointing.

I get the same with the BBC iPlayer for video: it works on ESR 78 but not ESR 91. When using Jolla's video test page I get the same experience. On YouTube as well.

So that's rather a litany of failures. Now when I test out these various video and audio pages I get a very different experience. BBC Sounds works nicely and I can listen to radio programs, both live and historical. On Jolla's video test pages, BBC iPlayer and YouTube the videos all play, which is definitely a big improvement. But there is a catch.

Now with the videos I'm seeing discolouration similar to that we saw on Day 290 when working on the WebRTC video. The colours seem to have channels switched or be colour-shifted in a way that means you can clearly see the video, just with all the wrong colours.

Comparing the original with the version showing in ESR 91, this doesn't seem to be a straight channel switch as we were seeing before. In fact I've not yet been able to figure out what the conversion is that's happening here. Any ideas?
 

I'm in two minds now as to whether or not I should tick off the audio and video items in the testing list. There's already an issue for tackling the discolouration so it might make sense for future work on this to fall under that.

On the other hand, it'd be nice to get this all resolved as soon as possible. My plan is to do a bit more testing and to spend some time today trying to figure out whether there's an obvious solution or missing change that I still need to make. If I can't figure something out quickly, I'll reconsider ticking these items off. But I'm going to leave them unticked for now.

So, today it's more audio and video testing.

The first thing I try out is just a few runs of the YouTube video on the Jolla test page. Everything seems to be going well, with several successful playthroughs, but then there's an unexpected crash. Because I'm running using the debugger I'm able to capture a backtrace for it. Note I've cut out several batches of frames for clarity:
Thread 10 "GeckoWorkerThre" received signal SIGSEGV, Segmentation 
    fault.
[Switching to LWP 8326]
0x0000007fe7747e90 in wl_proxy_destroy () from /usr/lib64/libwayland-client.so.0
(gdb) bt
#0  0x0000007fe7747e90 in wl_proxy_destroy () from /usr/lib64/
    libwayland-client.so.0
#1  0x0000007fe7496fdc in waylandws_Terminate () from /usr/lib64/libhybris//
    eglplatform_wayland.so
#2  0x0000007fef07d4bc in eglTerminate () from /usr/lib64/libEGL.so.1
#3  0x0000007ff23963ec in mozilla::gl::GLLibraryEGL::fTerminate (
    display=<optimized out>, this=<optimized out>)
    at gfx/gl/GLLibraryEGL.h:234
#4  mozilla::gl::EglDisplay::fTerminate (this=0x7fb938c020)
    at gfx/gl/GLLibraryEGL.h:639
#5  mozilla::gl::EglDisplay::~EglDisplay (this=0x7fb938c020, 
    __in_chrg=<optimized out>)
    at gfx/gl/GLLibraryEGL.cpp:734
#6  0x0000007ff23964a4 in __gnu_cxx::new_allocator<mozilla::gl::EglDisplay>::
    destroy<mozilla::gl::EglDisplay> (__p=<optimized out>, this=<optimized out>)
    at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/
    include/c++/8.3.0/ext/new_allocator.h:140
[...]
#19 RefPtr<mozilla::gl::GLContext>::operator=(decltype(nullptr)) (
    this=0x7fbbc4b8e0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:168
#20 mozilla::WebGLContext::DestroyResourcesAndContext (
    this=this@entry=0x7fbbc4b8c0)
    at dom/canvas/WebGLContext.cpp:217  
#21 0x0000007ff31fd2ec in mozilla::WebGLContext::~WebGLContext (
    this=0x7fbbc4b8c0, __in_chrg=<optimized out>)
    at dom/canvas/WebGLContext.cpp:152  
#22 0x0000007ff31fd9bc in mozilla::WebGLContext::~WebGLContext (
    this=0x7fbbc4b8c0, __in_chrg=<optimized out>)
    at dom/canvas/WebGLContext.cpp:152  
#23 0x0000007ff31b09e4 in mozilla::detail::RefCounted<mozilla::VRefCounted, (
    mozilla::detail::RefCountAtomicity)1>::Release (this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefCounted.h:240
#24 mozilla::RefPtrTraits<mozilla::WebGLContext>::Release (aPtr=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:50
#25 RefPtr<mozilla::WebGLContext>::ConstRemovingRefPtrTraits<mozilla::
    WebGLContext>::Release (aPtr=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:381
[...]
#42 0x0000007ff19163c8 in nsCycleCollector::CollectWhite (
    this=this@entry=0x7fb803f7b0)
    at xpcom/base/nsCycleCollector.cpp:3081
#43 0x0000007ff191cd68 in nsCycleCollector::Collect (this=0x7fb803f7b0, 
    aCCType=aCCType@entry=SliceCC, aBudget=...,
    aManualListener=aManualListener@entry=0x0, 
    aPreferShorterSlices=aPreferShorterSlices@entry=false)
    at xpcom/base/nsCycleCollector.cpp:3435
#44 0x0000007ff191cfdc in nsCycleCollector_collectSlice (budget=..., 
    aPreferShorterSlices=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#45 0x0000007ff28a14f0 in nsJSContext::RunCycleCollectorSlice (
    aDeadline=aDeadline@entry=...)
    at dom/base/nsJSEnvironment.cpp:1406
#46 0x0000007ff28a21a0 in mozilla::CCGCScheduler::CCRunnerFired (aDeadline=...)
    at dom/base/nsJSEnvironment.cpp:1543
[...]
#72 0x0000007fef53689c in ?? () from /lib64/libc.so.6
(gdb)
The crash appears to be triggered by a call to EglDisplay::fTerminate() but there are other questions to ask about what's going on here. For example, further down the stack we see that the underlying cause is the destruction of EglDisplay which itself is being caused by the destruction of GLContextEGL. Could this be down to an incorrect reference count? Or maybe it's intentionally being deleted at this point. It's not clear why it would be though.
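
The cascade in the backtrace can be illustrated with a small sketch. This is standard C++ using std::shared_ptr in place of Gecko's RefPtr, and DisplaySketch and ContextSketch are hypothetical stand-ins, so treat it as an illustration of the mechanism rather than the real ownership model. It shows how releasing the last reference to a context runs the context destructor and then, because the context held the last reference to the display, the display destructor where the terminate call lives.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Records the order in which the destructors run.
static std::vector<std::string> gLog;

// Hypothetical stand-in for EglDisplay; its destructor is where terminate would happen.
struct DisplaySketch {
  ~DisplaySketch() { gLog.push_back("display destroyed -> terminate"); }
};

// Hypothetical stand-in for a GL context that keeps its display alive.
struct ContextSketch {
  std::shared_ptr<DisplaySketch> display;
  explicit ContextSketch(std::shared_ptr<DisplaySketch> d)
      : display(std::move(d)) {}
  ~ContextSketch() { gLog.push_back("context destroyed"); }
};

void ReleaseLastContextReference() {
  auto display = std::make_shared<DisplaySketch>();
  auto context = std::make_shared<ContextSketch>(display);
  display.reset();  // the context now holds the only reference to the display
  context.reset();  // last reference dropped: context goes first, then display
  assert(gLog.size() == 2);
}
```

If something else was still meant to be using the same underlying display at this point, a missing reference would produce exactly this kind of premature teardown.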

To try to answer this I thought it might help to know where the context is being created. So I've added a breakpoint to CreateHeadless() which appears to be where this is happening.

Once the page with the video has opened the breakpoint hits a minute or two after the video has started playing. I've captured the backtrace, which looks like this:
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::gl::
    GLContextProviderEGL::CreateHeadless (desc=..., 
    desc@entry=<error reading variable: value has been optimized out>, 
    out_failureId=0x7fde76f9b8, 
    out_failureId@entry=<error reading variable: value has been optimized out>)
    at gfx/gl/GLContextProviderEGL.cpp:1247
1247        const GLContextCreateDesc& desc, nsACString* const out_failureId) {
(gdb) bt
#0  mozilla::gl::GLContextProviderEGL::CreateHeadless (desc=..., 
    desc@entry=<error reading variable: value has been optimized out>, 
    out_failureId=0x7fde76f9b8, out_failureId@entry=<error reading variable: 
    value has been optimized out>)
    at gfx/gl/GLContextProviderEGL.cpp:1247
#1  0x0000007ff31ec8f4 in mozilla::WebGLContext::<lambda(
    already_AddRefed<mozilla::gl::GLContext> (*)(const mozilla::gl::
    GLContextCreateDesc&, nsACString*), char const*)>::operator()(mozilla::
    WebGLContext::fnCreateT *, const char *) const (
    __closure=__closure@entry=0x7fde76fa70, pfnCreate=<optimized out>, 
    info=info@entry=0x7ff64c2c98 "tryNativeGL")
    at dom/canvas/WebGLContext.cpp:350
#2  0x0000007ff31fdf58 in mozilla::WebGLContext::<lambda()>::operator() (
    __closure=<optimized out>)
    at dom/canvas/WebGLContext.cpp:362
#3  mozilla::WebGLContext::CreateAndInitGL (this=this@entry=0x7fb97555d0, 
    forceEnabled=forceEnabled@entry=true, out_failReasons=<optimized out>, 
    out_failReasons@entry=0x7fde76fb60)
    at dom/canvas/WebGLContext.cpp:371
#4  0x0000007ff31fe37c in mozilla::WebGLContext::<lambda()>::operator() (
    __closure=<optimized out>)
    at dom/canvas/WebGLContext.cpp:514
#5  mozilla::WebGLContext::Create (host=..., desc=..., 
    out=out@entry=0x7fb9fa7e38)
    at dom/canvas/WebGLContext.cpp:562
#6  0x0000007ff31b638c in mozilla::HostWebGLContext::Create (ownerData=..., 
    desc=..., out=out@entry=0x7fb9fa7e38)
    at dom/canvas/HostWebGLContext.cpp:59
#7  0x0000007ff31e5de0 in mozilla::ClientWebGLContext::<lambda()>::operator() (
    __closure=<optimized out>)
    at dom/canvas/ClientWebGLContext.cpp:625
#8  mozilla::ClientWebGLContext::CreateHostContext (
    this=this@entry=0x7fb99e1f80, requestedSize=...)
    at dom/canvas/ClientWebGLContext.cpp:654
[...]
#28 0x0000007fb99c4c41 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) 
After releasing the execution the video continues to play through to the end without incident. No crash and the browser is still responsive. It's not at all clear to me why the context would only be created just part of the way through the video. Very strange.

Assuming for a moment that the construction and destruction of the context is intentional, it's possible the problem is all down to that call to fTerminate(). This calls the EGL function eglTerminate() and it does look like it may be causing problems for Wayland. Maybe this is the bug?

To get a feel for this I've removed the call to fTerminate() from the EglDisplay destructor and replaced it with a call to output a debug string to the console instead:
EglDisplay::~EglDisplay() {
  printf_stderr("CRASH: EglDisplay destructor");
  //fTerminate();
  mLib->mActiveDisplays.erase(mDisplay);
}
My thinking is that if we see the debug output but no crash, that may be an indication that the call to fTerminate() is at the heart of the problem, rather than the crash being caused by badly written code elsewhere. On the other hand if there's still a crash we should be looking elsewhere in the code.

I've built a copy of the library with these changes and transferred it over to my phone. It may take a while for the issue to re-trigger, so I'll continue testing this tomorrow morning.

While the library worked its way through the build I've also spent some time reviewing the video decoder code from earlier. I notice various references to the video colour model that could help explain the video discolouration.

The code in GeckoCameraVideoDecoder.cpp suggests that the decoder is generating Y'CbCr data. The three channels of Y'CbCr are made up of luma (Y'), blue difference chroma (Cb) and red difference chroma (Cr). It appears that one of the benefits of using this colour model is that the chroma channels can have a lower resolution than the luma channel without adversely affecting the resulting appearance. This helps reduce the size of the data that needs to be transferred.
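
To put a rough number on the saving, here's a small sketch comparing buffer sizes. It assumes 8-bit planar data with the chroma planes halved in both directions (commonly called 4:2:0); I haven't confirmed that this is the layout the decoder actually produces, and the function names are purely illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one 8-bit planar Y'CbCr frame, assuming 4:2:0 subsampling:
// a full-resolution luma plane plus two chroma planes at half width and height.
std::size_t YCbCr420Bytes(std::size_t width, std::size_t height) {
  const std::size_t luma = width * height;
  const std::size_t chromaWidth = (width + 1) / 2;    // round up for odd sizes
  const std::size_t chromaHeight = (height + 1) / 2;
  return luma + 2 * chromaWidth * chromaHeight;
}

// Bytes needed for the same frame stored as packed 8-bit RGB.
std::size_t Rgb24Bytes(std::size_t width, std::size_t height) {
  return width * height * 3;
}

// For a 1280x720 frame the subsampled layout needs half the bytes of RGB.
void DemoFrameSizes() {
  assert(YCbCr420Bytes(1280, 720) == 1382400);
  assert(Rgb24Bytes(1280, 720) == 2764800);
}
```

Under that assumption a 1280 by 720 frame takes 1382400 bytes rather than 2764800, so half the data for the same picture dimensions.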

If I split the video image being rendered by the browser into its red, blue and green components and then do the same for the correct image, except split into the Y', Cb and Cr components, I find that the results broadly match up.

Here we can see the results graphically. There are two columns and three rows of components. The left hand column from top-to-bottom shows the Green, Blue and Red components respectively of the image rendered by the browser. The right hand column from top-to-bottom shows the Y', Cb and Cr components respectively taken from the original video:
 
The Blender cartoon parrot shown six times in greyscale. The left column contains Green, Blue and Red channels from the browser render. The right column shows the luma, blue difference chroma and red difference chroma channels from the original video. The images side-by-side look similar.

The blue and red channels from the browser appear to be rendered in a lower resolution, but otherwise look very similar to the Y'CbCr data from the original video.

What can we conclude? It seems that the browser is sending Y'CbCr data to be rendered, but the hardware is rendering it as RGB data. The result is the peculiar colour mixture that we end up seeing.
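
As an illustration of the difference, the sketch below converts a single pixel properly and also "misreads" it in the way the channel comparison suggests (Y' shown as green, Cb as blue, Cr as red). The conversion uses the full-range BT.601 equations, which is an assumption on my part: the video may well use a different matrix or a limited range. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgb {
  std::uint8_t r, g, b;
};

// Rounds to the nearest integer and clamps into the 0..255 range.
static std::uint8_t ClampToByte(double value) {
  return static_cast<std::uint8_t>(
      std::lround(std::min(255.0, std::max(0.0, value))));
}

// Proper conversion, assuming full-range BT.601 coefficients.
Rgb ConvertYCbCrToRgb(std::uint8_t y, std::uint8_t cb, std::uint8_t cr) {
  const double blueDiff = cb - 128.0;
  const double redDiff = cr - 128.0;
  return {ClampToByte(y + 1.402 * redDiff),
          ClampToByte(y - 0.344136 * blueDiff - 0.714136 * redDiff),
          ClampToByte(y + 1.772 * blueDiff)};
}

// What appears to be happening instead: the raw channels are shown directly,
// with Y' landing in green, Cb in blue and Cr in red.
Rgb MisreadYCbCrAsRgb(std::uint8_t y, std::uint8_t cb, std::uint8_t cr) {
  return {cr, y, cb};
}

// A white pixel (maximum luma, neutral chroma) converted both ways.
void DemoWhitePixel() {
  const Rgb correct = ConvertYCbCrToRgb(255, 128, 128);
  const Rgb misread = MisreadYCbCrAsRgb(255, 128, 128);
  assert(correct.r == 255 && correct.g == 255 && correct.b == 255);
  assert(misread.r == 128 && misread.g == 255 && misread.b == 128);
}
```

So a pixel that should be pure white (255, 255, 255) would instead come out as (128, 255, 128) under this mapping.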

It's certainly going to be helpful to know what's going wrong, but finding the place in the code where this is happening and figuring out how to fix it is another story.

Although I've not been able to solve either the crash or the discolouration today, it nevertheless feels like we've covered some useful groundwork. Tomorrow I'll try to pursue both issues further.

11 Aug 2024 : Day 316 #
Last night I was so tired. I was really desperate to get the video patches building and it felt like there were only a few remaining fixes needed before the build would go through. My hope was that if I could get them done I'd have a completed build by morning. But I was just too tired. So here I am now the next day with the changes still to make, but with a lot more focus and much greater chance of getting it right.

The error I got last night looked like this:
116:46.05 ${PROJECT}/gecko-dev/dom/media/platforms/PDMFactory.cpp:607:65: error:
     no matching function for call to ‘mozilla::PDMFactory::StartupPDM(mozilla::
    GeckoCameraDecoderModule, mozilla::StripAtomic<mozilla::Atomic<bool, (
    mozilla::MemoryOrdering)0> >)’
116:46.05                 StaticPrefs::media_gecko_camera_codec_preferred());
116:46.05                                                                  ^
The problem seems to be with the way the decoder module is being created. Previously the standard C++ constructor was used to instantiate the module, but if we take a look at change D54876 we can see that in ESR 91 it's no longer being done this way. The description of the relevant commit isn't super helpful:
$ git log -1 665d2ab6f565f
commit 665d2ab6f565fd446e1786563c94508a1a9f0251
Author: Dan Glastonbury <dan.glastonbury@gmail.com>
Date:   Tue Oct 20 23:24:27 2020 +0000

    Bug 1595994 - P3: Only create PDMs that are supported in the current 
    process. r=kamidphish
    
    Depends on D52797
    
    Differential Revision: https://phabricator.services.mozilla.com/D54876
In practice this means that a templated call to a start-up method is being used instead of the constructor directly. By comparing the code in the AppleDecoderModule.h and AppleDecoderModule.cpp files we can see that this is because a new reference-counted initialiser is being used:
/* static */
already_AddRefed<PlatformDecoderModule> AppleDecoderModule::Create() {
  return MakeAndAddRef<AppleDecoderModule>();
}
Following the example of the other modules there I've made a similar change to the GeckoCameraDecoderModule class. Looking through the other differences between the old and new versions of the Apple decoder module code I can see that there's also another new method that's been added to the class:
  bool Supports(const SupportDecoderParams& aParams,
                DecoderDoctorDiagnostics* aDiagnostics) const override;
So now I'm wondering if I need to add something along these lines to the GeckoCameraDecoderModule class as well. This Supports() method was added in a different commit to the Create() changes and checking the description that goes along with the commit, it looks like this is a change that's specific to the AppleDecoderModule():
$ git log -1 f3562546bb2b1
commit f3562546bb2b1f91957b0c38d9662562980bd939
Author: Jean-Yves Avenard <jyavenard@mozilla.com>
Date:   Thu Aug 13 02:16:19 2020 +0000

    Bug 1657521 - P5. Add VP9 HW decoder support on macOS 11 (Big Sur). r=jolin
    
    To create a VP9 decoder, the VideoToolbox requires a vppC atom similar to 
    how the H264 one requires an avcC one.
    
    That information is typically not available in the webm container and is 
    found in the VP9 bytestream with each keyframe.
    
    In order to minimise the extent of the changes, we move the task of 
    retrieving the vpcC content in the MediaChangeMonitor as it already 
    performs a similar task in order to detect if the format has changed.
    
    The VPXChangeMonitor will now only instantiate a VP9 decoder once a 
    keyframe is seen.
    
    Differential Revision: https://phabricator.services.mozilla.com/D86544
I've therefore decided not to add this Supports() method to our version of the class. However, I do still need to update the PDMFactory class to make use of the new approach to reference-counted instantiation. It seems there are two ways to do this, depending on the order in which the modules should be selected. For example, the code that creates the Apple decoder module just looks like this:
  CreateAndStartupPDM<AppleDecoderModule>();
Contrast this with the code for the Android decoder module:
  if (StaticPrefs::media_android_media_codec_enabled()) {
    StartupPDM(AndroidDecoderModule::Create(),
               StaticPrefs::media_android_media_codec_preferred());
  }
It's not clear which of the two I should be using. The difference seems to be between whether CreateAndStartupPDM() is used, or StartupPDM() is used. So let's look at what these two methods do. The latter looks like this:
bool PDMFactory::StartupPDM(already_AddRefed<PlatformDecoderModule> aPDM,
                            bool aInsertAtBeginning) {
  RefPtr<PlatformDecoderModule> pdm = aPDM;
  if (pdm && NS_SUCCEEDED(pdm->Startup())) {
    if (aInsertAtBeginning) {
      mCurrentPDMs.InsertElementAt(0, pdm);
    } else {
      mCurrentPDMs.AppendElement(pdm);
    }
    return true;
  }
  return false;
}
So in this case the static Create() method is called to create the module directly with the result passed in as aPDM. There's then some code to determine whether or not the module should be inserted at the beginning or end of the module list. In contrast using CreateAndStartupPDM() is marginally cleaner in the calling code, but in practice ends up doing exactly the same thing and calling StartupPDM(). The only difference is that it doesn't then allow control over the ordering in the list, the aInsertAtBeginning variable being always set to the default of false:
  template <typename DECODER_MODULE, typename... ARGS>
  bool CreateAndStartupPDM(ARGS&&... aArgs) {
    return StartupPDM(DECODER_MODULE::Create(std::forward<ARGS>(aArgs)...));
  }
In our case we have a preference media_gecko_camera_codec_preferred() that's intended to control the ordering in the list, so we need to call StartupPDM() directly, as the Android code does, which I think should look something like this:
  if (StaticPrefs::media_gecko_camera_codec_enabled()) {
    StartupPDM(GeckoCameraDecoderModule::Create(),
               StaticPrefs::media_gecko_camera_codec_preferred());
  }
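
The difference between the two routes can be seen in a stripped-down sketch. This uses std::shared_ptr and std::vector in place of Gecko's reference-counted types and arrays, and all of the names ending in Sketch are hypothetical; it's only meant to show how the ordering flag affects the list.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for PlatformDecoderModule.
struct ModuleSketch {
  std::string name;
  explicit ModuleSketch(std::string n) : name(std::move(n)) {}
};

// Each module type offers a static Create(), as the ESR 91 modules do.
struct CameraModuleSketch {
  static std::shared_ptr<ModuleSketch> Create() {
    return std::make_shared<ModuleSketch>("camera");
  }
};

struct AgnosticModuleSketch {
  static std::shared_ptr<ModuleSketch> Create() {
    return std::make_shared<ModuleSketch>("agnostic");
  }
};

class FactorySketch {
 public:
  // Direct route: the caller decides whether the module goes first or last.
  bool StartupPDM(std::shared_ptr<ModuleSketch> pdm,
                  bool insertAtBeginning = false) {
    if (!pdm) {
      return false;
    }
    if (insertAtBeginning) {
      mModules.insert(mModules.begin(), std::move(pdm));
    } else {
      mModules.push_back(std::move(pdm));
    }
    return true;
  }

  // Convenience route: always appends, because the flag keeps its default.
  template <typename MODULE, typename... ARGS>
  bool CreateAndStartupPDM(ARGS&&... args) {
    return StartupPDM(MODULE::Create(std::forward<ARGS>(args)...));
  }

  std::vector<std::string> Names() const {
    std::vector<std::string> names;
    for (const auto& module : mModules) {
      names.push_back(module->name);
    }
    return names;
  }

 private:
  std::vector<std::shared_ptr<ModuleSketch>> mModules;
};

// A preferred module jumps ahead of one that was registered earlier.
void DemoOrdering() {
  FactorySketch factory;
  factory.CreateAndStartupPDM<AgnosticModuleSketch>();
  factory.StartupPDM(CameraModuleSketch::Create(), true);
  assert(factory.Names().front() == "camera");
}
```

Only the direct call lets the "preferred" preference put the module at the front of the list, which is why the convenience template isn't enough here.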
Having made these changes I'm feeling pretty good about things, but on attempting a partial build just of the relevant files to save time, there are several more errors thrown up by the compiler.
${PROJECT}/gecko-dev/dom/media/platforms/gecko-camera/GeckoCameraVideoDecoder.h:
     In member function ‘virtual nsCString mozilla::GeckoCameraVideoDecoder::
    GetDescriptionName() const’:
${PROJECT}/gecko-dev/dom/media/platforms/gecko-camera/GeckoCameraVideoDecoder.h:
    48:12: error: ‘NS_LITERAL_CSTRING’ was not declared in this scope
     return NS_LITERAL_CSTRING("gecko-camera video decoder");
            ^~~~~~~~~~~~~~~~~~
[...]
${PROJECT}/gecko-dev/dom/media/platforms/gecko-camera/
    GeckoCameraVideoDecoder.cpp: In constructor ‘mozilla::
    GeckoCameraVideoDecoder::GeckoCameraVideoDecoder(gecko::codec::
    CodecManager*, const mozilla::CreateDecoderParams&)’:
${PROJECT}/gecko-dev/dom/media/platforms/gecko-camera/
    GeckoCameraVideoDecoder.cpp:31:26: error: ‘const struct mozilla::
    CreateDecoderParams’ has no member named ‘mTaskQueue’
       mTaskQueue(aParams.mTaskQueue),
                          ^~~~~~~~~~
${PROJECT}/gecko-dev/dom/media/platforms/gecko-camera/
    GeckoCameraVideoDecoder.cpp: In member function ‘virtual void mozilla::
    GeckoCameraVideoDecoder::onDecodedYCbCrFrame(const gecko::camera::
    YCbCrFrame*)’:
${PROJECT}/gecko-dev/dom/media/platforms/gecko-camera/
    GeckoCameraVideoDecoder.cpp:161:21: error: ‘struct mozilla::VideoData::
    YCbCrBuffer::Plane’ has no member named ‘mOffset’
   buffer.mPlanes[0].mOffset = 0;
                     ^~~~~~~
[...]
${PROJECT}/gecko-dev/dom/media/platforms/gecko-camera/
    GeckoCameraVideoDecoder.cpp:190:22: error: cannot bind rvalue reference of 
    type ‘RefPtr<mozilla::MediaData>&&’ to lvalue of type ‘RefPtr<mozilla::
    MediaData>’
   mReorderQueue.Push(data);
                      ^~~~
[...]
To fix the first of these I just have to use the new ESR 91 string-literal annotation, which has come up multiple times now. For the mTaskQueue error, checking the changes in other modules it becomes clear that this is no longer passed in as part of the aParams structure. Instead it now needs to be created directly in the constructor:
      mTaskQueue(new TaskQueue(
          GetMediaThreadPool(MediaThreadType::PLATFORM_DECODER), 
    "GeckoCameraVideoDecoder")),
I thought maybe there'd be some change to the destruction code as well, but I don't see anything, or indeed any other consequences from this change, so it's a pretty simple fix. The mOffset error is also a case of the element having been removed from one of the structures. Following the upstream changes that we can see in changeset D78320 it looks like it should be safe just to remove the lines causing the errors entirely.

Finally for the last of the errors, we now have to move the data object rather than copying it, in line with the changes shown in D78249. This leaves us with the following replacement line for adding the data to the queue:
  mReorderQueue.Push(std::move(data));
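
For anyone unfamiliar with this kind of error, here's a minimal sketch of why the original line stopped compiling. It uses std::shared_ptr and std::queue rather than the real Gecko types, and ReorderQueueSketch and FrameSketch are hypothetical names. The key point is that a parameter declared with && only accepts something the caller has explicitly agreed to give up.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for a decoded frame.
struct FrameSketch {
  int timestamp = 0;
};

class ReorderQueueSketch {
 public:
  // Takes an rvalue reference, so a named variable can't be passed as-is.
  void Push(std::shared_ptr<FrameSketch>&& item) {
    mItems.push(std::move(item));
  }
  std::size_t Length() const { return mItems.size(); }

 private:
  std::queue<std::shared_ptr<FrameSketch>> mItems;
};

std::size_t PushOneFrame(ReorderQueueSketch& queue) {
  auto data = std::make_shared<FrameSketch>();
  data->timestamp = 42;
  // queue.Push(data);          // would not compile: data is an lvalue
  queue.Push(std::move(data));  // hands ownership over to the queue
  assert(data == nullptr);      // the moved-from pointer is left empty
  return queue.Length();
}
```

Moving rather than copying also avoids an extra reference count increment and decrement, since ownership simply transfers to the queue.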
With these changes made the partial build goes through, so we should be ready for a successful full build. I've committed the changes and merged them in to the existing commit that applied patch 0089 "Add a video decoder based on gecko-camera", since this is the commit that added all of these Gecko Camera files to the build.

I've set the full build running and am excited to see first whether the build goes through and second whether this improves the audio and video support. With any luck I'll have an answer by morning.

10 Aug 2024 : Day 315 #
I left the build running overnight after having applied an almighty tranche of audio and video patches yesterday. Twelve patches I applied in total. Fairly soon after the build had started, but in the middle of the night by the time I noticed, the build failed due to a Rust checksum error:
13:39.17 error: the listed checksum of `${PROJECT}/gecko-dev/third_party/rust/
    cubeb-sys/libcubeb/src/cubeb_pulse.c` has changed:
13:39.17 expected: 
    c20283e17cb8a893ea5e2588e051112462726e531d5f481169cff06d5254f7ec
13:39.17 actual:   
    d5d869d39e598870c64fbc256948d77e32b5ab69744862fae4b903f589d66878
13:39.17 directory sources are not intended to be edited, if modifications are 
    required then it is recommended that [replace] is used with a forked copy 
    of the source
13:39.18 make[4]: *** [${PROJECT}/gecko-dev/config/makefiles/rust.mk:405: 
    force-cargo-library-build] Error 101
After editing the third_party/rust/cubeb-sys/.cargo-checksum.json to change the checksum I then restarted the build. By the morning the build had finished, but not completed, having hit another — more serious — error further down the line:
96:11.26 In file included from Unified_cpp_dom_media_platforms0.cpp:11:
96:11.26 ${PROJECT}/gecko-dev/dom/media/platforms/PDMFactory.cpp: In member 
    function ‘void mozilla::PDMFactory::CreateDefaultPDMs()’:
96:11.26 ${PROJECT}/gecko-dev/dom/media/platforms/PDMFactory.cpp:606:5: error: 
    ‘m’ was not declared in this scope
96:11.26      m = new GeckoCameraDecoderModule();
96:11.26      ^
The change causing the error was made by applying patch 0089 "Add a video decoder based on gecko-camera". The issue here is that while m was a local variable defined at the top of the PDMFactory::CreatePDMs() method in the ESR 78 code, it's been removed completely from ESR 91. Here it is in the ESR 78 code:
void PDMFactory::CreatePDMs() {
  RefPtr<PlatformDecoderModule> m;
[...]
}
Here's the change as it's been applied by the patcher to the ESR 91 code:
@@ -594,6 +601,12 @@ void PDMFactory::CreateDefaultPDMs() {
                StaticPrefs::media_android_media_codec_preferred());
   }
 #endif
+#ifdef MOZ_EMBEDLITE
+  if (StaticPrefs::media_gecko_camera_codec_enabled()) {
+    m = new GeckoCameraDecoderModule();
+    StartupPDM(m, StaticPrefs::media_gecko_camera_codec_preferred());
+  }
+#endif
 
   CreateAndStartupPDM<AgnosticDecoderModule>();
 }
Although patch 0089 didn't apply cleanly, this particular change wasn't part of the conflicted code, so I'm not totally surprised that I missed it. Nevertheless now it's been highlighted it's clear why this would cause an error given m has disappeared. Luckily we have some related changes in the surrounding code that we can compare against. Take for example the code used to initialise the media codec module on Android. In ESR 78 it looked like this:
  if (StaticPrefs::media_android_media_codec_enabled()) {
    m = new AndroidDecoderModule();
    StartupPDM(m, StaticPrefs::media_android_media_codec_preferred());
  }
On ESR 91 this was changed to the following:
  if (StaticPrefs::media_android_media_codec_enabled()) {
    StartupPDM(AndroidDecoderModule::Create(),
               StaticPrefs::media_android_media_codec_preferred());
  }
There are similar changes for AppleMedia, FFmpeg, FFVPX, QNX and Windows. All that's happening is that, rather than capture the module instance in a temporary variable, it's being passed straight in to the StartupPDM() method as a parameter. I reckon we can do the same thing for our EmbedLite change. So after a bit of reworking the code, I now have this:
@@ -594,6 +601,12 @@ void PDMFactory::CreateDefaultPDMs() {
                StaticPrefs::media_android_media_codec_preferred());
   }
 #endif
+#ifdef MOZ_EMBEDLITE
+  if (StaticPrefs::media_gecko_camera_codec_enabled()) {
+    StartupPDM(GeckoCameraDecoderModule(),
+               StaticPrefs::media_gecko_camera_codec_preferred());
+  }
+#endif
 
   CreateAndStartupPDM<AgnosticDecoderModule>();
 }
Now I just have to do some careful git rebasing to ensure this change gets integrated into the correct patch. To do this I've added the changes to a new commit applied to HEAD. I then perform an interactive rebase that covers both this new patch and the original patch 0089 that I want to integrate my changes into. I then move the new commit to just after patch 0089 and set it to squash downwards on top of it.

The result is, I hope, a nice clean tree and a build that compiles. To find out whether we've achieved the latter I'm going to have to run the build, which will likely take the rest of the day, so I'm going to get that started straight away.

[...]

Unfortunately we reached the end of the day without a successful build. The following error was thrown up, resulting from changes to the way the PlatformDecoderModule subclasses are instantiated and managed.
116:46.05 In file included from Unified_cpp_dom_media_platforms0.cpp:11:
116:46.05 ${PROJECT}/gecko-dev/dom/media/platforms/PDMFactory.cpp: In member 
    function ‘void mozilla::PDMFactory::CreateDefaultPDMs()’:
116:46.05 ${PROJECT}/gecko-dev/dom/media/platforms/PDMFactory.cpp:607:65: error:
     no matching function for call to ‘mozilla::PDMFactory::StartupPDM(mozilla::
    GeckoCameraDecoderModule, mozilla::StripAtomic<mozilla::Atomic<bool, (
    mozilla::MemoryOrdering)0> >)’
116:46.05                 StaticPrefs::media_gecko_camera_codec_preferred());
116:46.05                                                                  ^
Unfortunately it's already late and I'm too tired to figure out how to fix it. I'm going to have to compare changes made between ESR 78 and ESR 91 for other decoder modules similar to ours. That means I'll have to pick this up in the morning to find out exactly how the code needs changing.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
9 Aug 2024 : Day 314 #
Today I'm changing tack but not changing gear. Over the last few days I've been working through issues that have come up when testing using Jolla's test pages. Now I'm going back to the testing list to try to tackle the final two items there: video and audio.

Although audio should — on the face of it — be the easiest, I'm going to start with video because it has a bigger impact and there's a chance that by fixing the video I'll fix the audio as well.

The plan of action is to find the appropriate patches, apply them and test the results. That sounds simple, but it's the application of the patches that worries me most: it's unlikely to be as smooth as I'd like it to be and if I have to start digging into individual lines of code, I'll quickly get out of my depth. So while I'm excited that we're approaching the end-game, I'm also a little nervous.

I think it's also worth mentioning that in a week's time we'll have hit a full year of development in wall-time. That is, I started this task on the 16th August 2023. There have been a few breaks here and there, so by the time we get there I expect we'll be on day 321 rather than day 365. So it won't be a full year of work, but it will be a full rotation round the sun since this all started.
$ time gecko-dev

real    515520m0.002s
user    48984m0.002s
sys     20250m0.001s
Alright, it's time for some development. Today I'm going to attempt to apply video-related patches. Now that I've looked through them I can see there are quite a few. First the patches related to video:
  1. 0045 - "Prioritize GMP plugins over all others, and support decoding video for h264, vp8 & vp9"
  2. 0046 - "Force recycling of gmp-droid instances"
  3. 0049 - "Force use of mobile video controls"
  4. 0085 - "Fix video hardware accelaration not being used on first playback"
  5. 0089 - "Add a video decoder based on gecko-camera"
  6. 0094 - "Bug 1750760 Create ffmpeg59 module for ffmpeg5.0"
  7. 0095 - "Bug 1750760 Open libavcodec.so.59 library and bind ffmpeg 5.0"
  8. 0096 - "Bug 1750760 Update audio and video decoders to ffmpeg 5.0"
  9. 0097 - "Bug 1761471 [FFmpeg 5.0] Get frame color range and color space directly"
  10. 0098 - "Bug 1758948 [FFmpeg] Use AVFrame::pts instead of AVFrame::pkt_pts on ffmpeg 4.x"
There are also a couple of additional patches which relate specifically to audio:
  1. 0066 - "Ensure audio continues when screen is locked"
  2. 0084 - "Fix audio underruns for fullduplex mode"
I honestly don't know how easy (or hard) it's going to be to apply these patches. My suspicion is that I've not touched much of this code to make changes myself, which should reduce the likelihood of conflicts. On the other hand I've no idea the extent of any changes that may have been made to this code upstream. We'll just have to try it and see.

Sadly we're not off to a totally auspicious start.
$ git am --3way \
    ../rpm/0045-sailfishos-gecko-Prioritize-GMP-plugins-over-all-oth.patch
Applying: Prioritize GMP plugins over all others, and support decoding video 
    for h264, vp8 & vp9.
Using index info to reconstruct a base tree...
M       dom/media/platforms/PDMFactory.cpp
M       dom/media/platforms/agnostic/gmp/GMPDecoderModule.cpp
Falling back to patching base and 3-way merge...
Auto-merging dom/media/platforms/agnostic/gmp/GMPDecoderModule.cpp
Auto-merging dom/media/platforms/PDMFactory.cpp
CONFLICT (content): Merge conflict in dom/media/platforms/PDMFactory.cpp
error: Failed to merge in the changes.
Patch failed at 0001 Prioritize GMP plugins over all others, and support 
    decoding video for h264, vp8 & vp9.
[...]
Things have changed quite a bit in the code. What was previously all performed inside a single method like so:
void PDMFactory::CreatePDMs() {
  RefPtr<PlatformDecoderModule> m;
[...]
  if (StaticPrefs::media_rdd_process_enabled() &&
      BrowserTabsRemoteAutostart()) {
    m = new RemoteDecoderModule;
    StartupPDM(m);
  }

  if (StaticPrefs::media_gmp_decoder_enabled()) {
    m = new GMPDecoderModule();
    mGMPPDMFailedToStartup = !StartupPDM(m);
  } else {
    mGMPPDMFailedToStartup = false;
  }
[...]
}
Is now partitioned into multiple sub-methods like so:
void PDMFactory::CreatePDMs() {
[...]
  if (XRE_IsGPUProcess()) {
    CreateGpuPDMs();
  } else if (XRE_IsRDDProcess()) {
    CreateRddPDMs();
  } else if (XRE_IsContentProcess()) {
    CreateContentPDMs();
  } else {
    MOZ_DIAGNOSTIC_ASSERT(
        XRE_IsParentProcess(),
        "PDMFactory is only usable in the Parent/GPU/RDD/Content process");
    CreateDefaultPDMs();
  }
}
Thankfully the code that actually needs changing appears to have been moved into CreateDefaultPDMs() without too many other alterations besides, so I've been able to manually apply what look to me to be equivalent changes. The next three patches fare better:
$ git am --3way \
    ../rpm/0046-sailfishos-gecko-Force-recycling-of-gmp-droid-instan.patch
Applying: Force recycling of gmp-droid instances. JB#51730
Using index info to reconstruct a base tree...
M       dom/media/platforms/agnostic/gmp/GMPVideoDecoder.h
Falling back to patching base and 3-way merge...
Auto-merging dom/media/platforms/agnostic/gmp/GMPVideoDecoder.h
$ git am --3way \
    ../rpm/0049-sailfishos-gecko-Force-use-of-mobile-video-controls..patch
Applying: Force use of mobile video controls. JB#55484 OMP#JOLLA-371
$ git am --3way \
    ../rpm/0085-sailfishos-gecko-dev-Fix-video-hardware-accelaration.patch
Applying: Fix video hardware accelaration not being used on first playback. 
    JB#56630 OMP#JOLLA-568
But after that things get a bit more complex:
$ git am --3way \
    ../rpm/0089-sailfishos-gecko-Add-a-video-decoder-based-on-gecko-.patch
Applying: Add a video decoder based on gecko-camera. JB#56755
Using index info to reconstruct a base tree...
M       dom/media/platforms/PDMFactory.cpp
M       dom/media/platforms/moz.build
M       modules/libpref/init/StaticPrefList.yaml
M       toolkit/moz.configure
Falling back to patching base and 3-way merge...
Auto-merging toolkit/moz.configure
CONFLICT (content): Merge conflict in toolkit/moz.configure
Auto-merging modules/libpref/init/StaticPrefList.yaml
Auto-merging dom/media/platforms/moz.build
CONFLICT (content): Merge conflict in dom/media/platforms/moz.build
Auto-merging dom/media/platforms/PDMFactory.cpp
error: Failed to merge in the changes.
Patch failed at 0001 Add a video decoder based on gecko-camera. JB#56755
Thankfully the two conflicts turn out to be pretty straightforward to fix. The first in toolkit/moz.configure turns out to be something I changed in an earlier commit:
$ git log -1 1c02a359c3687
commit 1c02a359c368750f7b03987d13a4de842db94616
Author: David Llewellyn-Jones <david@flypig.co.uk>
Date:   Wed Aug 9 23:30:20 2023 +0100

    Add --with-embedlite flag
    
    Adds the --with-embedlite flag so that we can use the flag in
    embedding/embedlite/config/mozconfig.merqtxulrunner.
Since the changes I made earlier are identical to the changes in this patch, I can just skip them now. The second conflict in dom/media/platforms/moz.build is due to the classic "single quotes switched for double quotes" in the build files. This has tripped me up a few times before. In ESR 78 strings were delimited in build files using single quotes. In ESR 91 they've all been changed so that they're now delimited using double quotes. This plays havoc with the patcher which can no longer match any lines that happen to contain strings.
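To make the mismatch concrete, here's a minimal sketch of why a context line from an ESR 78 patch no longer matches the ESR 91 source. The two sample lines and the normaliseQuotes() helper are made up for illustration; this isn't how git matches hunks internally, and a naive replacement like this would mangle any string that contains an apostrophe.

```javascript
// Hypothetical helper: treat single and double quotes as equivalent.
function normaliseQuotes(line) {
  return line.replace(/'/g, '"');
}

// Illustrative lines only, not copied from the real moz.build
const patchContext = "    'PDMFactory.cpp',";
const esr91Source = '    "PDMFactory.cpp",';

// An exact comparison fails, which is all the patcher has to go on
console.log(patchContext === esr91Source);

// Once the quote style is normalised the two lines are the same
console.log(normaliseQuotes(patchContext) === normaliseQuotes(esr91Source));
```

The first comparison prints false and the second prints true, which is the whole problem in miniature: the content is unchanged, only the quoting differs.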

Thankfully, fixing it manually is straightforward if a little laborious. So with these two conflicts addressed the patch is now applied. The next one I'm applying is an audio one, simply because I thought it'd be better to apply them in some semblance of the same order. Thankfully it applies okay:
$ git am --3way \
    ../rpm/0084-sailfishos-gecko-Fix-audio-underruns-for-fullduplex-.patch
Applying: Fix audio underruns for fullduplex mode. JB#55461
Using index info to reconstruct a base tree...
A       media/libcubeb/src/cubeb_pulse.c
Falling back to patching base and 3-way merge...
Auto-merging third_party/rust/cubeb-sys/libcubeb/src/cubeb_pulse.c
The next one is not so happy:
$ git am --3way \
    ../rpm/0094-Bug-1750760-Create-ffmpeg59-module-for-ffmpeg5.0-r-a.patch
Applying: Bug 1750760 Create ffmpeg59 module for ffmpeg5.0 r=alwu
Using index info to reconstruct a base tree...
M       dom/media/platforms/ffmpeg/moz.build
M       tools/rewriting/ThirdPartyPaths.txt
${PROJECT}/.git/modules/gecko-dev/rebase-apply/patch:582: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
Falling back to patching base and 3-way merge...
Auto-merging tools/rewriting/ThirdPartyPaths.txt
Auto-merging dom/media/platforms/ffmpeg/moz.build
CONFLICT (content): Merge conflict in dom/media/platforms/ffmpeg/moz.build
error: Failed to merge in the changes.
Patch failed at 0001 Bug 1750760 Create ffmpeg59 module for ffmpeg5.0 r=alwu
It's just the one build file, ffmpeg/moz.build, with a conflict and the fix is so simple that I can't even figure out why the patcher had a problem with it. There seems to be a bit of a pendulum swing with these, with the next now applying fine.
$ git am --3way \
    ../rpm/0095-Bug-1750760-Open-libavcodec.so.59-library-and-bind-f.patch
Applying: Bug 1750760 Open libavcodec.so.59 library and bind ffmpeg 5.0 symbols 
    r=alwu
Using index info to reconstruct a base tree...
M       dom/media/platforms/ffmpeg/FFmpegLibWrapper.cpp
M       dom/media/platforms/ffmpeg/FFmpegRuntimeLinker.cpp
Falling back to patching base and 3-way merge...
Auto-merging dom/media/platforms/ffmpeg/FFmpegRuntimeLinker.cpp
Auto-merging dom/media/platforms/ffmpeg/FFmpegLibWrapper.cpp
But then problems with the one after:
$ git am --3way \
    ../rpm/0096-Bug-1750760-Update-audio-and-video-decoders-to-ffmpe.patch
Applying: Bug 1750760 Update audio and video decoders to ffmpeg 5.0 r=alwu
Using index info to reconstruct a base tree...
M       dom/media/platforms/ffmpeg/FFmpegAudioDecoder.cpp
M       dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp
Falling back to patching base and 3-way merge...
Auto-merging dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp
CONFLICT (content): Merge conflict in dom/media/platforms/ffmpeg/
    FFmpegVideoDecoder.cpp
Auto-merging dom/media/platforms/ffmpeg/FFmpegAudioDecoder.cpp
error: Failed to merge in the changes.
Patch failed at 0001 Bug 1750760 Update audio and video decoders to ffmpeg 5.0 
    r=
This time it's actually managed to get properly confused, but it's still easy to fix with some manual inspection. Now, as if only to prove the pendulum pattern wrong, the next one has issues as well:
$ git am --3way \
    ../rpm/0097-Bug-1761471-FFmpeg-5.0-Get-frame-color-range-and-col.patch
Applying: Bug 1761471 [FFmpeg 5.0] Get frame color range and color space 
    directly r=alwu
Using index info to reconstruct a base tree...
M       dom/media/platforms/ffmpeg/FFmpegLibWrapper.cpp
M       dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp
M       dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h
Falling back to patching base and 3-way merge...
Auto-merging dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h
CONFLICT (content): Merge conflict in dom/media/platforms/ffmpeg/
    FFmpegVideoDecoder.h
Auto-merging dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp
CONFLICT (content): Merge conflict in dom/media/platforms/ffmpeg/
    FFmpegVideoDecoder.cpp
Auto-merging dom/media/platforms/ffmpeg/FFmpegLibWrapper.cpp
error: Failed to merge in the changes.
Patch failed at 0001 Bug 1761471 [FFmpeg 5.0] Get frame color range and color 
    space directly r=alwu
In both cases the problem is simply that a couple of the methods have now been marked as const which prevents the patcher from identifying them as the same line. Easily solved. The next one applies without incident:
$ git am --3way \
    ../rpm/0098-Bug-1758948-FFmpeg-Use-AVFrame-pts-instead-of-AVFram.patch
Applying: Bug 1758948 [FFmpeg] Use AVFrame::pts instead of AVFrame::pkt_pts on 
    ffmpeg 4.x r=alwu
That leaves just one more audio patch to go.
$ git am --3way \
    ../rpm/0066-sailfishos-media-Ensure-audio-continues-when-screen-.patch
Applying: Ensure audio continues when screen is locked. Contributes to JB#51747
Using index info to reconstruct a base tree...
M       dom/html/HTMLMediaElement.cpp
M       dom/media/MediaDecoder.cpp
Falling back to patching base and 3-way merge...
Auto-merging dom/media/MediaDecoder.cpp
Auto-merging dom/html/HTMLMediaElement.cpp
CONFLICT (content): Merge conflict in dom/html/HTMLMediaElement.cpp
error: Failed to merge in the changes.
Patch failed at 0001 Ensure audio continues when screen is locked. Contributes 
    to JB#51747
It's not a totally clean application but the reason turns out to be just some functionality that's been moved out of a method and inlined into a condition. Once again, the fix appears to be pretty clear and clean on manual inspection.

So that's all of them: ten video and two audio patches applied. The next step is to build things and see how that's affected the audio and video functionality of the browser. Because the build files have changed it's going to be a full overnight rebuild, so we'll have to see how it's turned out in the morning.

8 Aug 2024 : Day 313 #
I've moved on to the fourth in my list of tasks related to Jolla's test pages today. Fixing the double-tap action:
  1. Single select widget.
  2. External links.
  3. Full screen.
  4. Double-tap.
Once I've completed this last of the tasks, I'll be able to tick off the "Everything on the browser test page" item in the browser functionality testing issue.

The problem with double-tap can be experienced through the double tap handling and zooming page. There are various elements on the page that should behave in different ways. In order to test them most effectively the "User-scalable" option at the top of the page should be set to "yes". This allows the page to be zoomed in and out.

Then double-tapping on the coloured boxes should cause the browser to zoom in on the element, or back out again depending on how it's currently framed. This part of the functionality is working correctly.

There's a "Single click" button which, when tapped, should light up. When double-tapped the browser should zoom in to its position. This is also working correctly.

Finally the "Double click" button at the end has slightly different properties. When single-tapped it should do nothing. When double-tapped it should light up. So no zooming in this case.

It's this last item which is behaving incorrectly. Currently if you double-tap on the item the page will zoom in on it and there will be no lighting up. So my task today is to figure out why it's acting like that and how to fix it so that double-tapping lights up the button without zooming.

The first thing I want to do is find out where the double-tap-to-zoom functionality lives in the code. In Mozilla parlance this sort of functionality comes under something that's sometimes referred to as "APZ", short for Asynchronous Panning and Zooming. The abbreviation APZC appears in the code a lot, meaning the Asynchronous Panning and Zooming Controller. For a long time I felt quite intimidated by these terms, until I realised they described a user experience rather than a set of technical capabilities. The aim is to make the browser feel responsive.

To find the appropriate APZ code we're going to start our journey in embedhelper.js since this is Sailfish-specific, lives close to the surface in the front-end code and ties in to both the double-tap and tap-to-zoom functionalities, as we can see in this code taken from the file:
  receiveMessage: function receiveMessage(aMessage) {
    switch (aMessage.name) {
[...]
      case "Gesture:DoubleTap": {
        try {
          let [x, y] = [aMessage.json.x, aMessage.json.y];
          this._sendMouseEvent("mousemove", content, x, y);
          this._sendMouseEvent("mousedown", content, x, y, 2);
          this._sendMouseEvent("mouseup",   content, x, y);
        } catch(e) {
          Cu.reportError(e);
        }
        this._cancelTapHighlight();
        break;
      }
[...]
      case "embedui:zoomToRect": {
        if (aMessage.data) {
          let winId = Services.embedlite.getIDByWindow(content);
          // This is a hackish way as zoomToRect does not work if x-value has
          // not changed or viewport has not been scaled (zoom animation).
          // Thus, we're missing animation when viewport has not been scaled.
          let scroll = this._viewportData &&
            this._viewportData.cssCompositedRect.width === aMessage.data.width;

          if (scroll) {
            content.scrollTo(aMessage.data.x, aMessage.data.y);
          } else {
            Services.embedlite.zoomToRect(winId, aMessage.data.x,
              aMessage.data.y, aMessage.data.width, aMessage.data.height);
          }
        }
        break;
      }
      case "embedui:scrollTo": {
        if (aMessage.data) {
            content.scrollTo(aMessage.data.x, aMessage.data.y);
        }
        break;
      }
I've added some code to the top of this function to display the message that's received, like this:
    dump("TAP: Message: " + aMessage.name + "\n");
The idea is that when I tap, long tap, double tap and so on, the actions should appear in the console output. In practice, only single tap seems to do anything:
[...]
[LWP 1025 exited]
[New LWP 7704]
TAP: Message: Gesture:SingleTap
[...]
Nevertheless this still gives me some useful material to go on because the code that's triggering this must also be sending out the Gesture:SingleTap message, so searching for this should bring me closer to the source of the problem. That brings me to the EmbedLiteViewChild::RecvHandleDoubleTap() method, which looks like this (in abridged form):
mozilla::ipc::IPCResult EmbedLiteViewChild::RecvHandleDoubleTap(
    const LayoutDevicePoint &aPoint,
    const Modifiers &aModifiers,
    const ScrollableLayerGuid &aGuid,
    const uint64_t &aInputBlockId)
{
[...]
  // Check whether the element is interested in double clicks
  bool doubleclick = false;
  if (StaticPrefs::embedlite_azpc_json_doubletap()) {
[...]
    if (EventTarget* target = hittest.GetDOMEventTarget()) {
      if (nsCOMPtr<nsIContent> targetContent = do_QueryInterface(target)) {
        // Check if the element or any parent element has a double click handler
        for (Element* element = targetContent->GetAsElementOrParentElement();
             element && !doubleclick; element = element->GetParentElement()) {
          doubleclick = ElementSupportsDoubleClick(element);
        }
      }
    }
  }

  if (nsLayoutUtils::AllowZoomingForDocument(document) && !doubleclick) {
[...]
    if (APZCCallbackHelper::GetOrCreateScrollIdentifiers(
        document->GetDocumentElement(), &presShellId, &viewId)) {
      ZoomToRect(presShellId, viewId, zoomTarget);
    }
  } else {
[...]
    mHelper->DispatchMessageManagerMessage(u"Gesture:DoubleTap"_ns, data);
  }

  return IPC_OK();
}
Since this is C++ code we can step through it using the debugger to find out what's going on:
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    EmbedLiteViewChild::RecvHandleDoubleTap (this=0x7fb8d56d20, aPoint=..., 
    aModifiers=@0x7fde776594: 0, aGuid=..., aInputBlockId=@0x7fde7765a0: 14)
    at mobile/sailfishos/embedshared/EmbedLiteViewChild.cpp:966
966     {
(gdb) n
967       bool ok = false;
(gdb) n
969       NS_ENSURE_TRUE(ok, IPC_OK());
(gdb) n
971       nsIContent* content = nsLayoutUtils::FindContentFor(aGuid.mScrollId);
(gdb) n
972       NS_ENSURE_TRUE(content, IPC_OK());
(gdb) n
974       PresShell* presShell = APZCCallbackHelper::
    GetRootContentDocumentPresShellForContent(content);
(gdb) n
975       NS_ENSURE_TRUE(presShell, IPC_OK());
(gdb) n
977       RefPtr<Document> document = presShell->GetDocument();
(gdb) n
978       NS_ENSURE_TRUE(document && !document->Fullscreen(), IPC_OK());
(gdb) n
980       nsPoint offset;
(gdb) n
981       nsCOMPtr<nsIWidget> widget = mHelper->GetWidget(&offset);
(gdb) n
985       if (StaticPrefs::embedlite_azpc_json_doubletap()) {
(gdb) n
1006            presShell->GetPresContext()->CSSToDevPixelScale());
(gdb) 
What we see is that execution reaches the point where embedlite_azpc_json_doubletap() is called and then jumps straight past the block it guards. In other words embedlite_azpc_json_doubletap() must be returning false. Why is that a problem? Well, looking at the code, if this block is skipped the doubleclick variable stays set to false. Then in the next condition, provided zooming is allowed for the document, ZoomToRect() will always be called instead of the code that dispatches the Gesture:DoubleTap message.

The call to embedlite_azpc_json_doubletap() is returning the value taken by the embedlite.azpc.json.doubletap static preference, so what we need is for this preference to be set to true.
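The branch is easier to see when it's boiled down to its inputs. What follows is only a toy model of the decision in RecvHandleDoubleTap(), written in JavaScript rather than C++; the function name and the three parameter names are hypothetical stand-ins for the real preference check, the ElementSupportsDoubleClick() search and AllowZoomingForDocument().

```javascript
// Simplified model of the double-tap decision. Returns the name of the
// path that would be taken rather than performing any action.
function doubleTapAction({ jsonDoubleTapPref, targetHandlesDoubleClick, zoomAllowed }) {
  // The element is only inspected when the preference is enabled
  let doubleclick = false;
  if (jsonDoubleTapPref) {
    doubleclick = targetHandlesDoubleClick;
  }

  if (zoomAllowed && !doubleclick) {
    return "ZoomToRect";
  }
  return "Gesture:DoubleTap";
}

// Preference off: a button with a double-click handler still gets zoomed
console.log(doubleTapAction({
  jsonDoubleTapPref: false, targetHandlesDoubleClick: true, zoomAllowed: true,
}));

// Preference on: the same button receives the message instead
console.log(doubleTapAction({
  jsonDoubleTapPref: true, targetHandlesDoubleClick: true, zoomAllowed: true,
}));
```

The first call prints ZoomToRect and the second prints Gesture:DoubleTap, which matches the behaviour seen on the test page with the preference unset.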

Before going any further into this I can test it by setting the preference using about:config while the browser is running. Once it's set to true I find that the double-tap and tap-to-zoom both now work as expected, with the former working on the item that supports it, but the latter happening everywhere else.

So now I need to find out why embedlite.azpc.json.doubletap is set incorrectly. You may recall that I had to make changes to the way the preferences were working back on Day 97. That feels like a long time ago now! Back then I converted the preferences into static preferences, which meant stipulating them in the StaticPrefList.yaml file, along with their default values. Here's what I put in for the embedlite.azpc.json.doubletap preference:
-   name: embedlite.azpc.json.doubletap
    type: bool
    value: false
    mirror: always
As you can see, the default value here is false. So the fix should be easy: I can just switch this default value to true. But I'd also like to understand why I got it wrong, and for this I'll need to dig back into my own changes from back in December last year. Here's the git blame for the relevant lines:
$ git blame modules/libpref/init/StaticPrefList.yaml -L 4015,4018
e035d6ff3a785 (David Llewellyn-Jones 2023-12-02 22:53:02 +0000 4015)
     -   name: embedlite.azpc.json.doubletap
e035d6ff3a785 (David Llewellyn-Jones 2023-12-02 22:53:02 +0000 4016)
         type: bool
e035d6ff3a785 (David Llewellyn-Jones 2023-12-02 22:53:02 +0000 4017)
         value: false
e035d6ff3a785 (David Llewellyn-Jones 2023-12-02 22:53:02 +0000 4018)
         mirror: always
If I check the log for the e035d6ff3a785 commit that introduced these lines, here's what it has to say:
$ git log -1 e035d6ff3a785
commit e035d6ff3a78575e6516025fac3b4e377db33133
Author: David Llewellyn-Jones <david@flypig.co.uk>
Date:   Sat Dec 2 22:53:02 2023 +0000

    Add embedlite static prefs
    
    Adds the various embedlite static preferences to StaticPrefList.yaml so
    that these will be availble for use as static pref variables in the C++
    code.
That's all very well, but this still doesn't tell me why I got the default value wrong. What was it set to in ESR 78? Well, the interesting thing is that the default value is actually stipulated twice in the ESR 78 code. First there's the code that sets the preference value in EmbedLiteViewChild.cpp. As we can see, this sets the default value to false, which is what I must have copied over when moving the preferences into StaticPrefList.yaml:
  Preferences::AddBoolVarCache(&sPostAZPCAsJson.doubleTap,
    "embedlite.azpc.json.doubletap", false);
However, it's also set in the embedding.js file, and in this case it's set to true:
// AZPC overrides, see EmbedLiteViewChild.cpp
pref("embedlite.azpc.handle.viewport", true);
pref("embedlite.azpc.handle.singletap", false);
pref("embedlite.azpc.handle.longtap", false);
pref("embedlite.azpc.handle.scroll", true);
pref("embedlite.azpc.json.viewport", true);
pref("embedlite.azpc.json.singletap", true);
pref("embedlite.azpc.json.doubletap", true);
pref("embedlite.azpc.json.longtap", true);
pref("embedlite.azpc.json.scroll", false);
For some reason I must have assumed that the value in EmbedLiteViewChild.cpp would take precedence, whereas in fact it's the other way around. There's another commit that relates to this which is also worth taking note of. This is the commit that lives in the Sailfish part of the gecko code, which has control over the embedding.js file. Here's the commit description:
$ git log -1 -S "embedlite.azpc.json.doubletap" \
    embedding/embedlite/embedding.js
commit 720b4f6574e76abaec66712f8ec6aec59bc5dd1f
Author: David Llewellyn-Jones <david@flypig.co.uk>
Date:   Sat Dec 2 22:54:23 2023 +0000

    Convert AddBoolVarCache variables to static prefs
    
    The VarChache has been removed so all instances of AddBoolVarCache() had
    been removed and turned into fixed boolean values. These changes turn
    them into static prefs so that they are configurable again. Use of
    static preferences ensures that we can still have efficient access.
And here's the change, which highlights how the values were removed from the embedding.js file as well, due to them being moved into StaticPrefList.yaml:
$ git diff 720b4f6574e76a~ 720b4f6574e76a -- embedding/embedlite/embedding.js
diff --git a/embedding/embedlite/embedding.js b/embedding/embedlite/embedding.js
index 7853edece218..8156a1664be3 100644
--- a/embedding/embedlite/embedding.js
+++ b/embedding/embedlite/embedding.js
@@ -108,23 +108,6 @@ pref("media.gstreamer.enable-blacklist", false);
 // Disable X backend on GTK
 pref("gfx.xrender.enabled", false);
 
-// AZPC overrides, see EmbedLiteViewChild.cpp
-pref("embedlite.azpc.handle.viewport", true);
-pref("embedlite.azpc.handle.singletap", false);
-pref("embedlite.azpc.handle.longtap", false);
-pref("embedlite.azpc.handle.scroll", true);
-pref("embedlite.azpc.json.viewport", true);
-pref("embedlite.azpc.json.singletap", true);
-pref("embedlite.azpc.json.doubletap", true);
-pref("embedlite.azpc.json.longtap", true);
-pref("embedlite.azpc.json.scroll", false);
-
-// Make gecko compositor use GL context/surface provided by the application.
-pref("embedlite.compositor.external_gl_context", false);
-// Request the application to create GLContext for the compositor as
-// soon as the top level PuppetWidget is creted for the view. Setting
-// this pref only makes sense when using external compositor gl context.
-pref("embedlite.compositor.request_external_gl_context_early", false);
 pref("extensions.update.enabled", false);
 pref("extensions.systemAddon.update.enabled", false);
So there we have it: mystery solved. There were two conflicting default settings in ESR 78 and I somehow managed to pick the wrong one. But with the correct value set, double-tap is now working correctly.
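Here's a toy model of what went wrong, assuming only what's described above: a value set in a prefs file wins, and the default supplied in code is just a fallback. The effectivePref() function and the use of a Map are illustrative; they aren't Gecko APIs.

```javascript
// Hypothetical lookup: prefer the value from a prefs file, otherwise
// fall back to the default supplied in code.
function effectivePref(prefName, filePrefs, codeDefault) {
  return filePrefs.has(prefName) ? filePrefs.get(prefName) : codeDefault;
}

const doubleTapPref = "embedlite.azpc.json.doubletap";

// ESR 78: embedding.js sets true, the C++ fallback is false
const esr78Value = effectivePref(
  doubleTapPref, new Map([[doubleTapPref, true]]), false);

// ESR 91 after my conversion: the embedding.js entry is gone, so the
// false I copied into StaticPrefList.yaml is all that remains
const esr91Value = effectivePref(doubleTapPref, new Map(), false);

console.log(esr78Value, esr91Value);
```

This prints true false: the same fallback value gives a different effective value once the overriding entry has been removed.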

That means that all four of the functionalities that were broken on the Jolla test pages are now fixed and I can tick off this item in the list of browser functionality tests.

The list is looking pretty solid now. The two remaining tasks I need to tackle are audio and video decoding. I have patches from ESR 78 which, if they apply cleanly, might fix these really easily. But if the patches don't apply, these could be really challenging to fix because I really know very little about the area. But that will be something for me to worry about tomorrow!

7 Aug 2024 : Day 312 #
This morning the packages had completed their build so I'm currently transferring them over to my device ready to test. Yesterday I changed a single preference, the following:
security.external_protocol_requires_permission
Setting its default value in all.js from true to false. This morning I noticed another change I should have made. The lazy getter for this preference also has a default value:
XPCOMUtils.defineLazyPreferenceGetter(
  nsContentDispatchChooser,
  "isPermissionEnabled",
  "security.external_protocol_requires_permission",
  true
);
I should have set this to false as well. Luckily this is in the JavaScript code so I can safely make this change on-device before I do my testing.

The sharp-eyed amongst you may also have noticed that all.js is a JavaScript file as well, so why did I have to do a full rebuild of the packages for that one? That's a really good question with a fascinating answer.

For although all.js has a JavaScript extension, and although the contents look just like JavaScript, it is in fact not. We can see this by visiting the comments at the top of the file where we find the following:
// For the syntax used by this file, consult the comments at the top of
// modules/libpref/parser/src/lib.rs.
Okay, let's take the advice and check out the comments at the top of the libpref library source.
 
This crate implements a prefs file parser.
Pref files have the following grammar. Note that there are slight differences between the grammar for a default prefs files and a user prefs file.
<pref-file>   = <pref>*
<pref>        = <pref-spec> "(" <pref-name> "," <pref-value> <pref-attrs> ")" ";"
<pref-spec>   = "user_pref" | "pref" | "sticky_pref" // in default pref files
<pref-spec>   = "user_pref"                          // in user pref files
<pref-name>   = <string-literal>
<pref-value>  = <string-literal> | "true" | "false" | <int-value>
<int-value>   = <sign>? <int-literal>
<sign>        = "+" | "-"
<int-literal> = [0-9]+ (and cannot be followed by [A-Za-z_])
<string-literal> =
  A single or double-quoted string, with the following escape sequences
  allowed: \", \', \\, \n, \r, \xNN, \uNNNN, where \xNN gives a raw byte
  value that is copied directly into an 8-bit string value, and \uNNNN
  gives a UTF-16 code unit that is converted to UTF-8 before being copied
  into an 8-bit string value. \x00 and \u0000 are disallowed because they
  would cause C++ code handling such strings to misbehave.
<pref-attrs>  = ("," <pref-attr>)*      // in default pref files
              = <empty>                 // in user pref files
<pref-attr>   = "sticky" | "locked"     // default pref files only


The description of this syntax continues a bit further; I've cut it for brevity, but what I've included here gives enough of an idea. This highlights that the file has its own grammar that's separate from the JavaScript grammar, and while it's very similar to JavaScript, it's not exactly the same. For example a preference line in all.js is required to finish with a semicolon, whereas this is optional in JavaScript. Mozilla's documentation confirms the format is in fact different:
 
These files are not JavaScript; the .js suffix is present for historical reasons. They are read by a custom parser within libpref.
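To get a feel for the grammar, here's a rough single-line checker built from the rules quoted above. It's a simplified sketch of my own and not the libpref parser: it ignores comments, escape sequences inside strings and anything spanning multiple lines. The isPrefLine() name is hypothetical.

```javascript
// Building blocks taken from the quoted grammar (simplified)
const str = String.raw`(?:"[^"\\]*"|'[^'\\]*')`;
const value = String.raw`(?:${str}|true|false|[+-]?[0-9]+)`;
const attrs = String.raw`(?:\s*,\s*(?:sticky|locked))*`;

// <pref-spec> "(" <pref-name> "," <pref-value> <pref-attrs> ")" ";"
const PREF_LINE = new RegExp(
  String.raw`^\s*(?:user_pref|pref|sticky_pref)\s*\(\s*${str}\s*,\s*${value}${attrs}\s*\)\s*;\s*$`
);

// Returns true if the line looks like a single well-formed pref entry
function isPrefLine(line) {
  return PREF_LINE.test(line);
}

// Accepted: a complete entry with its terminating semicolon
console.log(isPrefLine('pref("security.external_protocol_requires_permission", false);'));

// Rejected: fine as JavaScript, but the semicolon is missing
console.log(isPrefLine('pref("security.external_protocol_requires_permission", false)'));
```

The first line is accepted and the second rejected, which illustrates the semicolon difference mentioned above.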

This is why I couldn't just change the value on the device for testing this. But that's a bit of an aside and during the time I've had to write about it the packages have copied over to my phone.

I've now installed the new packages, edited the ContentDispatchChooser.jsm file and packaged it up into omni.ja. I've also removed the preference from the prefs.js in the profile. Time to fire up the browser and test it out...

...And it works! Great, that's another thing sorted; time to move on to the next one:
  1. Single select widget.
  2. External links.
  3. Full screen.
  4. Double-tap.
We've been through the first two of these and they're both fixed. Time to move to the third: fullscreen. Fullscreen is useful in a variety of situations, such as when playing videos. The easiest way to test it is using Jolla's test pages. There are five pages related to fullscreen, but I'll probably focus on two: the short page and the long page each containing a fullscreen toggle.

Currently when I press on the "Go to fullscreen" button nothing happens. Neither does the browser go to fullscreen, nor is there any error output to the console. So right now it's all a bit mysterious. Time to investigate.

The first place to find out more is in the JavaScript of the page itself. There we find this function used for toggling the fullscreen mode:
function toggleFullScreen() {
    if (!listenerRegistered) {
        document.onmozfullscreenchange = updateState;
        listenerRegistered = true;
    }

    if (!document.mozFullScreenElement) {
        document.documentElement.mozRequestFullScreen();
    } else {
        if (document.mozFullScreen) {
            document.mozCancelFullScreen();
        }
    }
}
There seem to be three key document-related attributes here:
  1. The onmozfullscreenchange property for being notified when the state changes.
  2. The mozRequestFullScreen() method for entering fullscreen mode.
  3. The mozCancelFullScreen() method for exiting fullscreen mode.
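For comparison, here's a minimal sketch of the same toggle written against the unprefixed names from the Fullscreen API standard (fullscreenElement, requestFullscreen(), exitFullscreen() and onfullscreenchange). This isn't code from the test page. The doc parameter is my own addition so the function can be tried against any Document-like object; in a page you'd pass in document.

```javascript
// Sketch: toggle fullscreen using the standard, unprefixed API.
// `doc` is any Document-like object; `onChange` is an optional listener.
function toggleFullscreen(doc, onChange) {
  // Register the change listener once, mirroring the test page's logic.
  if (onChange && !doc.onfullscreenchange) {
    doc.onfullscreenchange = onChange;
  }

  if (!doc.fullscreenElement) {
    // Nothing is fullscreen yet, so request it for the root element.
    return doc.documentElement.requestFullscreen();
  }
  // Something is already fullscreen, so leave fullscreen mode.
  return doc.exitFullscreen();
}
```

The test page itself uses the moz-prefixed names though, so those are the strings to go looking for.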
That means we have somewhere to start by searching the code for some of these strings. The first place to search should always be the patches and that immediately throws up something that looks relevant: patch 0054 "Make fullscreen enabling work as used to with pref full-screen-api.content-only"

The patch describes itself like this:
 
We don't have chrome from doc shell point of view. This commit sha1 3116f3bf53df offends fullscreen API to work without chrome and shall not make root docShell act as chrome. We previously had "full-screen-api.content-only" pref set to "true".

Okay, that makes sense. The patch itself is really short; all it does is if-def out a small amount of code in nsGlobalWindowOuter.cpp:
+  // We don't have chrome from doc shell point of view. This commit
+  // sha1 3116f3bf53df offends fullscreen API to work without chrome
+  // and shall not make root docShell act as chrome. We previously had
+  // "full-screen-api.content-only" pref set to "true".
+#if 0
   // make sure we don't try to set full screen on a non-chrome window,
   // which might happen in embedding world
   if (mDocShell->ItemType() != nsIDocShellTreeItem::typeChrome)
     return NS_ERROR_FAILURE;
+#endif
Checking the current source code in ESR 91 I can see I've not yet applied this patch. So the obvious thing to do at this point would be to apply it, rebuild and test. On visual inspection I can't see any reason why the patch shouldn't apply cleanly.

And it couldn't have been cleaner:
$ git am --3way \
    ../rpm/0054-sailfishos-gecko-Make-fullscreen-enabling-work-as-us.patch
Applying: Make fullscreen enabling work as used to with pref full-screen-api.content-only. Fixes JB#44129
I've kicked off a build, so now it's just a case of waiting until the build completes. In retrospect I could have run a partial build, but it's just about to be the start of my work day, which means an eight hour stretch when I can happily leave my laptop building. It should be complete by the time I return to it this evening.

[...]

The build completed and — great news — fullscreen now works correctly! It's funny how hard it is to tell in advance whether an issue is going to be easy or hard to fix. The apparently simplest thing can turn out to be a convoluted journey, even if the solution is simple. Something that looks equally challenging if not more so can turn out to be solved by just applying the existing patch. I realise I'm benefiting from someone else's hard work (in this case Raine's) when I do this, but it's a pleasant surprise for it to work so seamlessly.

Alright, that's the third of our four-item issue list completed. Next up is double-tap. If I recall correctly, the double-tap issue relates to double-tap-to-zoom. Zooming is working correctly, but for items that support double-tap we want the double-tap action to take precedence over double-tap-to-zoom. Currently it's the other way around, so this will need fixing.

That'll be my task for tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
6 Aug 2024 : Day 311 #
I'm continuing to try to get external content actions to work today. By that, I mean links that trigger external applications, such as "mailto" links which are supposed to open up an email for editing in an email client, "sms" links that are supposed to open a Message sending application, or "tel" links that are supposed to open the phone application.

On ESR 78 these all work flawlessly, but on ESR 91 they're currently broken. Overnight I ran a build that introduced two previous patches, patch 0056 "Use libcontentaction for custom scheme URI handling" and patch 0062 "Fix content action integration to work". These both make changes that are important for supporting these external content actions.

After installing the new packages this morning and testing the browser I find... no change. The external content links still don't work. But wait! Yesterday I was looking through the code in nsExternalHelperAppService.cpp and ContentDispatchChooser.jsm. The former calls the nsIHandlerInfo handlers while the latter implements them. During my testing I discovered that changes in ESR 91 have introduced a new check for a confirmation prompt before opening such a URL. We don't currently have a Sailfish implementation for these prompts, so the trigger is always failing, causing the handleURI() method to exit early.

After hacking around this check using the changes I made yesterday and trying again, the content action now works!

This is great news, and it's another moment where it feels like I'm stepping into the light after coming out of a dark tunnel. But it's not yet a solution. I still need to figure out whether the right approach is to hack around the failing prompt request, or implement it. Implementing it would be nicer for sure, but it'll really be a question of how much time and effort is involved, how big the changes are to implement it compared to working around it, and whether there are any other downsides to either approach.

So my task for today is to figure this out and implement the solution.

Looking through the code in ContentDispatchChooser.jsm it looks like there may be several ways to tackle this problem. Let's return to the code so I can explain a couple of the ways it could be done. Our end goal is to get to the point where launchWithURI() is called. As I've just established, now that the patches are applied, once launchWithURI() is called everything else works correctly.

There are, in fact, two different ways that launchWithURI() can get called in the flow of execution. All of the following snippets of code will be taken from the handleURI() method in ContentDispatchChooser.jsm. First in this method comes this piece of logic:
    // Skip the dialog if a preferred application is set and the caller has
    // permission.
    if (
      callerHasPermission &&
      !aHandler.alwaysAskBeforeHandling &&
      (aHandler.preferredAction == Ci.nsIHandlerInfo.useHelperApp ||
        aHandler.preferredAction == Ci.nsIHandlerInfo.useSystemDefault)
    ) {
      try {
        aHandler.launchWithURI(aURI, aBrowsingContext);
      } catch (error) {
        // We are not supposed to ask, but when file not found the user most likely
        // uninstalled the application which handles the uri so we will continue
        // by application chooser dialog.
        if (error.result == Cr.NS_ERROR_FILE_NOT_FOUND) {
          aHandler.alwaysAskBeforeHandling = true;
        } else {
          throw error;
        }
      }
    }
Currently this doesn't get executed because one of the conjoined conditions gating it resolves to false. A bit of testing shows that they are in fact all true apart from the callerHasPermission value. If this were set to true the entire condition would resolve to true and we'd enter the gated block, which would immediately make the required call to launchWithURI().
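The gate can be pulled out into a standalone sketch to show how a single false value blocks the whole thing. The canSkipDialog() name and the HandlerInfo object are my own stand-ins for illustration; the real constants live on Ci.nsIHandlerInfo. I've also assumed a preferred action of useSystemDefault for the example handler, since all we know from testing is that the handler side of the condition holds.

```javascript
// Stand-ins for the nsIHandlerInfo constants used in the excerpt. The
// values are placeholders for illustration, not the real constants.
const HandlerInfo = {
  useHelperApp: "useHelperApp",
  useSystemDefault: "useSystemDefault",
};

// Mirrors the shape of the condition: every part joined by && must hold.
function canSkipDialog(callerHasPermission, handler) {
  return (
    callerHasPermission &&
    !handler.alwaysAskBeforeHandling &&
    (handler.preferredAction === HandlerInfo.useHelperApp ||
      handler.preferredAction === HandlerInfo.useSystemDefault)
  );
}

// Assumed handler state: no "always ask", system default preferred.
const handler = {
  alwaysAskBeforeHandling: false,
  preferredAction: HandlerInfo.useSystemDefault,
};

console.log(canSkipDialog(false, handler)); // false: the current situation
console.log(canSkipDialog(true, handler)); // true: launchWithURI() would be reached
```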

So one approach would be to get callerHasPermission set to true. Let's put this on our stack to return to later, so that we can cover the other options. But we will return to this value shortly.

Currently this condition is false and so nothing in the block is executed. So we move to the next part of the code which looks like this:
    let shouldOpenHandler = false;
    try {
      shouldOpenHandler = await this._prompt(
        aHandler,
        aPrincipal,
        callerHasPermission,
        aBrowsingContext
      );
    } catch (error) {
      Cu.reportError(error.message);
    }

    if (!shouldOpenHandler) {
      return;
    }

    // Site was granted permission and user chose to open application.
    // Launch the external handler.
    aHandler.launchWithURI(aURI, aBrowsingContext);
This bit of code immediately calls this._prompt() which returns a value of either true or false. If the value is false then this means the user doesn't want the external app opened and so the execution returns early. If it returns true on the other hand, the early return is skipped and execution continues right to the end of the method, including the launchWithURI() call that we're so interested in at the end.

So the process that happens inside this._prompt() is crucial for this branch of execution. The _prompt() method is pretty simple, but quite long, so rather than copy out the code here I'll just pick out sections and describe what happens instead. There are two main sections. The first checks whether the permission has been granted, as indicated by the aHasPermission variable. If it has it does nothing. Otherwise a call is made to _openDialog() to open a window to request permissions.

The code waits for the dialog to return at which point it reads in the "granted" property to determine whether the permission was granted. If it wasn't the method returns early. If it was the value of the "resetHandlerChoice" property is stored in the resetHandlerChoice variable and the state of shouldOpenHandler is flipped from false to true.
  /**
   * Show permission or/and app chooser prompt.
   * @param {nsIHandlerInfo} aHandler - Info about protocol and handlers.
   * @param {nsIPrincipal} aPrincipal - Principal which triggered the load.
   * @param {boolean} aHasPermission - Whether the caller has permission to
   * open the protocol.
   * @param {BrowsingContext} [aBrowsingContext] - Context associated with the
   * protocol navigation.
   */
  async _prompt(aHandler, aPrincipal, aHasPermission, aBrowsingContext) {
    let shouldOpenHandler = false;
    let resetHandlerChoice = false;

    // If caller does not have permission, prompt the user.
    if (!aHasPermission) {
[...]
      await this._openDialog(
        DIALOG_URL_PERMISSION,
        {
          handler: aHandler,
          principal: aPrincipal,
          browsingContext: aBrowsingContext,
          outArgs,
          canPersistPermission,
          preferredHandlerName: this._getHandlerName(aHandler),
        },
        aBrowsingContext
      );
      if (!outArgs.getProperty("granted")) {
        // User denied request
        return false;
      }

      // Check if user wants to set a new application to handle the protocol.
      resetHandlerChoice = outArgs.getProperty("resetHandlerChoice");
[...]
      shouldOpenHandler = true;
    }
In the second segment we see the resetHandlerChoice coming up again. If it's true a dialog is opened that's intended to be used for choosing the application to use. Sailfish OS has its own mechanism for this, so if we were to implement these dialogues we'd probably want the first but not this second dialog.

In case this second dialog is opened the "openHandler" property is used to overwrite the value in shouldOpenHandler once the dialog has closed.
    // Prompt if the user needs to make a handler choice for the protocol.
    if (aHandler.alwaysAskBeforeHandling || resetHandlerChoice) {
      // User has not set a preferred application to handle this protocol scheme.
      // Open the application chooser dialog
[...]
      await this._openDialog(
        DIALOG_URL_APP_CHOOSER,
        {
          handler: aHandler,
          outArgs,
          usePrivateBrowsing,
          enableButtonDelay: aHasPermission,
        },
        aBrowsingContext
      );

      shouldOpenHandler = outArgs.getProperty("openHandler");
[...]
    }

    return shouldOpenHandler;
  }
The shouldOpenHandler variable is then used as the return value for the method. That means that the value returned by _prompt() depends on how the user responds to the dialog choices presented.
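Here's a condensed sketch of that flow with the dialogues stubbed out, so the effect of each user response is easier to follow. It only mirrors the parts of _prompt() shown in the excerpts; the elided sections are left out. The promptSketch() name and the dialogs object are hypothetical stand-ins for the method and for _openDialog(), and nothing here opens a real window.

```javascript
// Sketch of the _prompt() control flow. `dialogs.permission()` and
// `dialogs.appChooser()` are hypothetical stubs that resolve with the
// properties the real code reads back from outArgs.
async function promptSketch(handler, hasPermission, dialogs) {
  let shouldOpenHandler = false;
  let resetHandlerChoice = false;

  // First section: ask for permission if the caller doesn't have it.
  if (!hasPermission) {
    const answer = await dialogs.permission();
    if (!answer.granted) {
      return false; // user denied the request
    }
    resetHandlerChoice = answer.resetHandlerChoice;
    shouldOpenHandler = true;
  }

  // Second section: the app chooser overrides the earlier decision.
  if (handler.alwaysAskBeforeHandling || resetHandlerChoice) {
    const answer = await dialogs.appChooser();
    shouldOpenHandler = answer.openHandler;
  }

  return shouldOpenHandler;
}
```

Feeding in a permission stub that resolves to { granted: true, resetHandlerChoice: false } gives a result of true without the app chooser ever being consulted, which is the path we'd want on Sailfish OS if the first dialogue were implemented.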

We don't yet have an implementation for these dialogues, but we could consider it. But it would require quite a bit of work, because we'd need to add new functionality to the settings pages so that the values could be amended later. Given this, and since it's not functionality we currently provide, I'm going to leave this implementation for now. It would be fun to implement, but this release really doesn't need any more delay right now.

So instead I'm going to shift focus to the callerHasPermission variable from the earlier code inside the handleURI() method. The status of this variable controls whether the prompt boxes are opened or skipped. We want the dialogues to be skipped, which means we want callerHasPermission to take the value true. So how is the value determined? By the following call:
    let callerHasPermission = this._hasProtocolHandlerPermission(
      aHandler.type,
      aPrincipal
    );
Let's take a look at the _hasProtocolHandlerPermission() method which is determining this. It can be found in the same file:
  /**
   * Test if a given principal has the open-protocol-handler permission for a
   * specific protocol.
   * @param {string} scheme - Scheme of the protocol.
   * @param {nsIPrincipal} aPrincipal - Principal to test for permission.
   * @returns {boolean} - true if permission is set, false otherwise.
   */
  _hasProtocolHandlerPermission(scheme, aPrincipal) {
    // Permission disabled by pref
    if (!nsContentDispatchChooser.isPermissionEnabled) {
      return true;
    }

    // If a handler is set to open externally by default we skip the dialog.
    if (
      Services.prefs.getBoolPref(
        "network.protocol-handler.external." + scheme,
        false
      )
    ) {
      return true;
    }

    if (!aPrincipal) {
      return false;
    }

    if (aPrincipal.isAddonOrExpandedAddonPrincipal) {
      return true;
    }

    let key = this._getSkipProtoDialogPermissionKey(scheme);
    return (
      Services.perms.testPermissionFromPrincipal(aPrincipal, key) ===
      Services.perms.ALLOW_ACTION
    );
  }
There are a few different things happening here, but right at the top of this method we see it can be short-circuited if the value of isPermissionEnabled is set to false. And that value comes from this bit of code, also in the same file:
XPCOMUtils.defineLazyPreferenceGetter(
  nsContentDispatchChooser,
  "isPermissionEnabled",
  "security.external_protocol_requires_permission",
  true
);
This is a lazy getter for the following configuration preference:
security.external_protocol_requires_permission
When this preference is set the isPermissionEnabled variable will be set also. The default value is for it to be set, so we need to change this. Toggling it manually confirms that when unset, the mailto: and sms: links now work correctly. There are quite a few options at our disposal for having this unset. One would be to have it added to the prefs.js file in the profile folder like this:
[...]
user_pref("privacy.purge_trackers.date_in_cookie_database", "0");
user_pref("security.allow_disjointed_external_uri_loads", true);
user_pref("security.external_protocol_requires_permission", false);
[...]
But how to get it into this file? There are multiple options for this as well. For example it could be set in the WebEngineSettings::initialize() method where we set various other preferences. This is generally used for preferences that must be overwritten forcefully each time in order for the browser to run (so where we don't want the user to change them). But there are also preferences there which only get added to the file once when it's created. Another alternative would be to just add it to the prefs.js file that's in the sailfish-browser project. This is used as the base for the profile's prefs.js file when it's created.

The problem with both options is that they'll only take effect when the prefs.js is created for the first time. Users upgrading their browser won't have the setting applied.

Typically we might add a one-shot for this, so that the preference gets set even when it's an upgrade, rather than a fresh install of the browser.

I could patch around the preference in the JavaScript we've just been looking at, but that would not only be ugly, it'd also mean the user couldn't then use the preference to disable external app launching.

Another option would be to flip the default value as it appears in the all.js file of gecko. This would be a convenient way not to have to worry about dealing with one-shots or upgrading preference files.

I've decided to give the last of these a go as the simplest option. It can be improved later if needed. Unfortunately this change will also require a full rebuild to test. So I've set it going and will see how it's got on in the morning.
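To see why flipping the default should be enough, here's a minimal sketch of how the default interacts with the first check in _hasProtocolHandlerPermission(). The makePrefs() store is a hypothetical model in which user values override defaults; it isn't the real preferences service. The permissionCheckSkipped() function mirrors only that first short-circuit, not the rest of the method.

```javascript
const PREF = "security.external_protocol_requires_permission";

// Hypothetical pref store: a user value wins over the shipped default,
// and the caller's fallback is used only if neither is present.
function makePrefs(defaults, userValues = {}) {
  return {
    getBoolPref(name, fallback) {
      if (name in userValues) return userValues[name];
      if (name in defaults) return defaults[name];
      return fallback;
    },
  };
}

// Mirrors only the first check: with the pref off, permission is assumed.
function permissionCheckSkipped(prefs) {
  const isPermissionEnabled = prefs.getBoolPref(PREF, true);
  return !isPermissionEnabled;
}

// Upstream default: the permission prompt is required.
console.log(permissionCheckSkipped(makePrefs({ [PREF]: true }))); // false
// Flipped default: the prompt is skipped for everyone, upgrades included.
console.log(permissionCheckSkipped(makePrefs({ [PREF]: false }))); // true
```

In this model a user who explicitly sets the preference back to true still gets the upstream behaviour, which is the property that would be lost by patching around the check in the JavaScript.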

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
5 Aug 2024 : Day 310 #
The build was running overnight and by the morning had completed its work. Here's a diff of the changes I made yesterday in the hope of fixing the Select widget crash:
$ git diff HEAD~~ HEAD~
diff --git a/embedding/embedlite/config/mozconfig.merqtxulrunner b/embedding/embedlite/config/mozconfig.merqtxulrunner
index 2593d32fd5a9..045996645139 100644
--- a/embedding/embedlite/config/mozconfig.merqtxulrunner
+++ b/embedding/embedlite/config/mozconfig.merqtxulrunner
@@ -3,6 +3,7 @@ export CFLAGS="-O3 -I/usr/include/freetype2"
 export CXXFLAGS="-O3 -I/usr/include/freetype2"
 export MOZ_DEBUG_SYMBOLS=1
 export MOZILLA_OFFICIAL=1
+export MOZ_USE_NATIVE_POPUP_WINDOWS=1
 
 mk_add_options PROFILE_GEN_SCRIPT=@TOPSRCDIR@/build/profile_pageloader.pl
 
diff --git a/embedding/embedlite/confvars.sh b/embedding/embedlite/confvars.sh
index 63a9f9be6a7e..985a6dd55290 100755
--- a/embedding/embedlite/confvars.sh
+++ b/embedding/embedlite/confvars.sh
@@ -16,6 +16,5 @@ MOZ_SERVICES_COMMON=
 MOZ_SERVICES_CRYPTO=
 MOZ_SERVICES_SYNC=
 MOZ_MEDIA_NAVIGATOR=1
-MOZ_USE_NATIVE_POPUP_WINDOWS=1
 MOZ_SERVICES_HEALTHREPORT=
 MOZ_DISABLE_EXPORT_JS=1
This is literally just moving the MOZ_USE_NATIVE_POPUP_WINDOWS variable from one build file to another. The aim is for it to be picked up, not by old-configure as before, but by toolkit/moz.configure following changes upstream. My hope is that with this change the pre-processor define of the same name will now be set again.

To test this out, I've run a partial build with the same inserted errors that I used for testing yesterday. And yes, this time the "correct" error fires:
In file included from Unified_cpp_layout_forms0.cpp:29:
${PROJECT}/gecko-dev/layout/forms/nsComboboxControlFrame.cpp:1615:2: error: #error MOZ_USE_NATIVE_POPUP_WINDOWS=1
 #error MOZ_USE_NATIVE_POPUP_WINDOWS=1
  ^~~~~
That looks good. Moreover testing the browser shows that the crash is now gone. Hooray! That means there'll be time today to work on something else. You may recall that I'm working my way through the following four issues:
  1. Single select widget.
  2. External links.
  3. Full screen.
  4. Double-tap.
With the first now solved that means it's now time to tackle the second: "external links". The issue here is best demonstrated through the use of Jolla's URL scheme handling test page. The links on this page do a variety of things. Some open new pages, but many are intended to open external applications instead. For example, selecting a mailto: link should open the email app. Selecting an sms: link should open the SMS messaging app. And so on.

Currently only the internal page links are working and my task for today is to find out why the others are broken. Although selecting a link to open an external app doesn't work, pressing and holding on one brings up a popup window with the correct option provided, and which can be used to open an external app. So things are at least working there, as these screenshots show.
 
Two screenshots. On the left the test page showing various links, including for sending emails, SMS links and phone links. On the right is the popup window that appears from pressing and holding on an SMS link.

So I'm thinking that looking at files related to the pop-up window might be a good place to start. First up I'm going to look into the Util.js source file from embedlite-components, where we see a bunch of URI helper functions like this:
  /*
   * URIs and schemes
   */

  makeURI: function makeURI(aURL, aOriginCharset, aBaseURI) {
    return Services.io.newURI(aURL, aOriginCharset, aBaseURI);
  },

  makeURLAbsolute: function makeURLAbsolute(base, url) {
    // Note:  makeURI() will throw if url is not a valid URI
    return this.makeURI(url, null, this.makeURI(base)).spec;
  },

  isLocalScheme: function isLocalScheme(aURL) {
    return ((aURL.indexOf("about:") == 0 &&
             aURL != "about:blank" &&
             aURL != "about:empty" &&
             aURL != "about:start") ||
            aURL.indexOf("chrome:") == 0);
  },

  isOpenableScheme: function isShareableScheme(aProtocol) {
    let dontOpen = /^(mailto|javascript|news|snews)$/;
    return (aProtocol && !dontOpen.test(aProtocol));
  },

  isShareableScheme: function isShareableScheme(aProtocol) {
    let dontShare = /^(chrome|about|file|javascript|resource)$/;
    return (aProtocol && !dontShare.test(aProtocol));
  },
[...]
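As a quick illustration of how the two scheme filters in this excerpt behave, here they are as standalone functions. The regular expressions are copied from the excerpt, but the free-standing wrappers are my own stand-ins for the Util methods, and unlike the originals they coerce the result to a boolean.

```javascript
// Regular expressions copied from the Util.js excerpt.
const dontOpen = /^(mailto|javascript|news|snews)$/;
const dontShare = /^(chrome|about|file|javascript|resource)$/;

// Standalone stand-ins for Util.isOpenableScheme and
// Util.isShareableScheme. The protocol is given without a trailing colon.
function isOpenableScheme(protocol) {
  return Boolean(protocol) && !dontOpen.test(protocol);
}

function isShareableScheme(protocol) {
  return Boolean(protocol) && !dontShare.test(protocol);
}

console.log(isOpenableScheme("mailto")); // false
console.log(isOpenableScheme("sms")); // true
console.log(isShareableScheme("file")); // false
```

Notice that mailto is explicitly on the "don't open" list while sms isn't.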
Although a cursory skim of this file makes it look potentially relevant, I can't in fact find any direct link between this and the single-tap action. So I've moved on to the PopupOpener.qml file from embedlite-components. This file is the one that actually opens the popup we see in the screenshot.
        switch (topic) {
        case "Content:ContextMenu": root._openContextMenu(data); break;
        case "embed:alert":         alert(data);    break;
        case "embed:confirm":       confirm(data);  break;
        case "embed:prompt":        prompt(data);   break;
        case "embed:login":         login(data);    break;
        case "embed:auth":          auth(data);     break;
        case "embed:permissions": {
            if (data.title === "geolocation") {
                permissions(data)
            } else {
                // Currently we don't support other permission requests.
                sendAsyncMessage("embedui:permissions", {
                    "allow": false,
                    "checkedDontAsk": false,
                    "id": data.id
                })
            }
            break
        }
        case "embed:webrtcrequest": webrtc(data);   break;
        case "embed:popupblocked":  blocked(data);  break;
        case "embed:select":        selector(data); break;
        }
From the contents of this file, written in QML and JavaScript, we can see the pop up is triggered by the Content:ContextMenu message. This, in turn, is sent from the ContextMenuHandler.js file in embedlite-components:
  /*
   * _processPopupNode - Generate and send a Content:ContextMenu message
   * to browser detailing the underlying content types at this.popupNode.
   * Note the event we receive targets the sub frame (if there is one) of
   * the page.
   */
  _processPopupNode: function _processPopupNode(aPopupNode, aX, aY, aInputSrc) {
    if (!aPopupNode)
      return;

    let { targetWindow: targetWindow,
          offsetX: offsetX,
          offsetY: offsetY } =
      Util.translateToTopLevelWindow(aPopupNode);

    let popupNode = this.popupNode = aPopupNode;
    let imageUrl = "";

    let state = {
      types: [],
      label: "",
      linkURL: "",
      linkTitle: "",
      linkProtocol: null,
      mediaURL: "",
      contentType: "",
      contentDisposition: "",
      string: "",
      visualViewport: {
          offsetLeft: 0,
          offsetTop: 0
      }
    };
[...]

    sendAsyncMessage("Content:ContextMenu", state);
  },
This looks promising, so worth digging a little deeper into. The ContextMenuHandler code is called indirectly by the event handler code in the same file:
  handleEvent: function ch_handleEvent(aEvent) {
    switch (aEvent.type) {
      case "contextmenu":
        this._onContentContextMenu(aEvent);
        break;
      case "pagehide":
        this.reset();
        break;
    }
  },
So now we're looking for the code that sends out the contextmenu event. This turns out to happen in the embedhelper.js file:
  _sendContextMenuEvent: function _sendContextMenuEvent(aElement, aX, aY) {
    let window = aElement.ownerDocument.defaultView;
    try {
      let cwu = window.windowUtils;
      cwu.sendMouseEventToWindow("contextmenu", aX, aY, 2, 1, 0, false);
    } catch(e) {
      Cu.reportError(e);
    }
  },
The embedhelper.js file is another good place to dig further into. Not only is it responsible for opening up the context menu on long-press, it's also responsible for handling touch events on the page. That means it might tell us something about what happens when you select a link, not with a long press, but with a tap:
  receiveMessage: function receiveMessage(aMessage) {
    switch (aMessage.name) {
      case "Gesture:ContextMenuSynth": {
        let [x, y] = [aMessage.json.x, aMessage.json.y];
        let element = this._touchElement;
        this._sendContextMenuEvent(element, x, y);
        break;
      }
      case "Gesture:SingleTap": {
        if (SelectionHandler.isActive) {
            SelectionHandler._onSelectionCopy({xPos: aMessage.json.x, yPos: aMessage.json.y});
        }

        try {
          let [x, y] = [aMessage.json.x, aMessage.json.y];
          this._sendMouseEvent("mousemove", content, x, y);
          this._sendMouseEvent("mousedown", content, x, y);
          this._sendMouseEvent("mouseup",   content, x, y);
        } catch(e) {
          Cu.reportError(e);
        }

        if (this._touchEventDefaultPrevented) {
          this._touchEventDefaultPrevented = false;
        } else {
          let uri = this._getLinkURI(this._touchElement);
          if (uri && (uri instanceof Ci.nsIURI)) {
            try {
              let winId = Services.embedlite.getIDByWindow(content);
              Services.embedlite.sendAsyncMessage(winId, "embed:linkclicked",
                                                  JSON.stringify({
                                                                   "uri": uri.asciiSpec
                                                                 }));
            } catch (e) {
              Logger.warn("embedhelper: sending async message failed", e)
            }
          }
          this._touchElement = null;
        }
        break;
      }
Well, that looked like it was going to help, but after all that digging I can't see where it goes from there. However, as I was digging through all these files I did notice a few names repeatedly appearing that looked relevant. So this hasn't been entirely wasted effort. In particular I notice the phrase "protocol handler" coming up frequently. Searching through the entire project for this I find some more code that looks like it's relevant to what we need. There's even an interface defined in the nsIExternalProtocolService.idl file with the following method signature:
  /**
   * Retrieve the handler for the given protocol.  If neither the application
   * nor the OS knows about a handler for the protocol, the object this method
   * returns will represent a default handler for unknown content.
   *
   * @param aProtocolScheme the scheme from a URL: http, ftp, mailto, etc.
   *
   * Note: aProtocolScheme should not include a trailing colon, which is part
   * of the URI syntax, not part of the scheme itself (i.e. pass "mailto" not
   * "mailto:").
   *
   * @return the handler, if any; otherwise a default handler
   */
  nsIHandlerInfo getProtocolHandlerInfo(in ACString aProtocolScheme);
This is just an abstract interface and what I really want to find is the code that implements it. Searching for this throws up lots of uses, but distinguishing which concrete implementation we want doesn't turn out to be so easy. So I've stuck a breakpoint on it and executed the code to find out. Here's the backtrace from when one of the methods is hit after selecting one of the links on the test page. This is from ESR 91:
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, 
    nsExternalHelperAppService::GetProtocolHandlerInfo (this=0x7fb8228270, 
    aScheme=..., 
    aHandlerInfo=0x7fde7d6530)
    at uriloader/exthandler/nsExternalHelperAppService.cpp:1179
1179        const nsACString& aScheme, nsIHandlerInfo** aHandlerInfo) {
(gdb) bt
#0  nsExternalHelperAppService::GetProtocolHandlerInfo (this=0x7fb8228270, 
    aScheme=..., aHandlerInfo=0x7fde7d6530)
    at uriloader/exthandler/nsExternalHelperAppService.cpp:1179
#1  0x0000007ff225c364 in nsExternalHelperAppService::LoadURI (
    this=0x7fb8228270, aURI=<optimized out>, aTriggeringPrincipal=0x7fb90c2f30, 
    aRedirectPrincipal=0x0, aBrowsingContext=0x7fb82237d0, 
    aTriggeredExternally=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:1363
#2  0x0000007ff493bf90 in nsDocShell::OnLinkClickSync (this=0x7fb8abd300, 
    aContent=0x7fb91454b0, aLoadState=0x7efc013a30, 
    aNoOpenerImplied=<optimized out>, aTriggeringPrincipal=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:289
#3  0x0000007ff493c7e4 in OnLinkClickEvent::Run (this=0x7fb8960710)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#4  0x0000007ff19a3d00 in mozilla::RunnableTask::Run (this=0x555634ebd0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
[...]
#26 0x0000007fef54889c in ?? () from /lib64/libc.so.6
(gdb) 
Although GetProtocolHandlerInfo() is the one with the breakpoint on, to my mind the interesting method is going to be the LoadURI() call that calls it in frame #1. Let's step through the code in this calling method to find out what it's doing; first on ESR 91:
Thread 10 "GeckoWorkerThre" hit Breakpoint 3, 
    nsExternalHelperAppService::LoadURI (this=0x7fb8228270, aURI=0x7fb90776b0,
    aTriggeringPrincipal=0x7fb90c2f30, aRedirectPrincipal=0x0, 
    aBrowsingContext=0x7fb82237d0, aTriggeredExternally=false)
    at uriloader/exthandler/nsExternalHelperAppService.cpp:1014
1014                                        bool aTriggeredExternally) {
(gdb) n
1015      NS_ENSURE_ARG_POINTER(aURI);
(gdb) n
1017      if (XRE_IsContentProcess()) {
(gdb) n
1024      nsCOMPtr<nsIURI> escapedURI;
(gdb) n
1028      nsAutoCString scheme;
(gdb) n
1029      escapedURI->GetScheme(scheme);
(gdb) n
1030      if (scheme.IsEmpty()) return NS_OK;  // must have a scheme
(gdb) n
1033      nsAutoCString externalPref(kExternalProtocolPrefPrefix);
(gdb) n
1034      externalPref += scheme;
(gdb) n
1035      bool allowLoad = false;
(gdb) n
1038        if (NS_FAILED(
(gdb) n
1044      if (!allowLoad) {
(gdb) p allowLoad
$1 = true
(gdb) n
1053      if (aBrowsingContext && aTriggeringPrincipal &&
(gdb) n
1054          !StaticPrefs::security_allow_disjointed_external_uri_loads() &&
(gdb) n
434     in ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/BasePrincipal.h
(gdb) n
1059        RefPtr<BrowsingContext> bc = aBrowsingContext;
(gdb) n
1060        WindowGlobalParent* wgp = bc->Canonical()->GetCurrentWindowGlobal();
(gdb) n
1065        if (bc->IsTop() && !bc->HadOriginalOpener() && wgp) {
(gdb) n
1066          RefPtr<nsIURI> uri = wgp->GetDocumentURI();
(gdb) n
1068              uri && uri->GetSpecOrDefault().EqualsLiteral("about:blank");
(gdb) n
1066          RefPtr<nsIURI> uri = wgp->GetDocumentURI();
(gdb) n
1071        while (!foundAccessibleFrame) {
(gdb) n
1072          if (wgp) {
(gdb) n
1074                aTriggeringPrincipal->Subsumes(wgp->DocumentPrincipal());
(gdb) n
1078          BrowsingContext* parent = bc->GetParent();
(gdb) n
1079          if (!parent) {
(gdb) p parent
$3 = (mozilla::dom::BrowsingContext *) 0x0
(gdb) n
1086        if (!foundAccessibleFrame) {
(gdb) n
1099        if (!foundAccessibleFrame) {
(gdb) n
1059        RefPtr<BrowsingContext> bc = aBrowsingContext;
(gdb) n
1104      nsCOMPtr<nsIHandlerInfo> handler;
(gdb) n
1109          do_CreateInstance("@mozilla.org/content-dispatch-chooser;1", &rv);
(gdb) n
1110      NS_ENSURE_SUCCESS(rv, rv);
(gdb) n
1112      return chooser->HandleURI(
(gdb) n
1109          do_CreateInstance("@mozilla.org/content-dispatch-chooser;1", &rv);
(gdb)
Nothing looks particularly unreasonable here. The code executes all the way to the end of the method and calls HandleURI() at the end. There's no early return and the handler seems to be picked up correctly. Let's compare this against ESR 78 to see if there are any differences:
Thread 10 "GeckoWorkerThre" hit Breakpoint 2, 
    nsExternalHelperAppService::LoadURI (this=0x7f80b49c50, aURI=0x7f80b6bc30, 
    aTriggeringPrincipal=0x7f80ef4ba0, aBrowsingContext=0x7f80bf24c0)
    at uriloader/exthandler/nsExternalHelperAppService.cpp:936
936                                         BrowsingContext* aBrowsingContext) {
(gdb) n
937       NS_ENSURE_ARG_POINTER(aURI);
(gdb) n
939       if (XRE_IsContentProcess()) {
(gdb) n
948       if (spec.Find("%00") != -1) return NS_ERROR_MALFORMED_URI;
(gdb) n
950       spec.ReplaceSubstring("\"", "%22");
(gdb) n
951       spec.ReplaceSubstring("`", "%60");
(gdb) n
953       nsCOMPtr<nsIIOService> ios(do_GetIOService());
(gdb) n
954       nsCOMPtr<nsIURI> uri;
(gdb) n
955       nsresult rv = ios->NewURI(spec, nullptr, nullptr, getter_AddRefs(
    uri));
(gdb) n
959       uri->GetScheme(scheme);
(gdb) n
960       if (scheme.IsEmpty()) return NS_OK;  // must have a scheme
(gdb) n
963       nsAutoCString externalPref(kExternalProtocolPrefPrefix);
(gdb) n
964       externalPref += scheme;
(gdb) n
965       bool allowLoad = false;
(gdb) n
966       if (NS_FAILED(Preferences::GetBool(externalPref.get(), &allowLoad))) {
(gdb) n
968         if (NS_FAILED(
(gdb) n
974       if (!allowLoad) {
(gdb) p allowLoad
$1 = true
(gdb) n
983       if (aBrowsingContext && aTriggeringPrincipal &&
(gdb) n
984           !StaticPrefs::security_allow_disjointed_external_uri_loads() &&
(gdb) n
1013      handler->GetPreferredAction(&preferredAction);
(gdb) n
1015      handler->GetAlwaysAskBeforeHandling(&alwaysAsk);
(gdb) n
1019      if (!alwaysAsk && (preferredAction == nsIHandlerInfo::useHelperApp ||
(gdb) n
1021        rv = handler->LaunchWithURI(uri, aBrowsingContext);
(gdb) n
1025        if (rv != NS_ERROR_FILE_NOT_FOUND) {
(gdb) p rv
$3 = nsresult::NS_OK
(gdb) n
1008      nsCOMPtr<nsIHandlerInfo> handler;
(gdb) 
There are definite differences here, but honestly it's not clear whether these are due to changes in the code between the two versions, or differences in the state. What is interesting, however, is that when stepping over the LaunchWithURI() method on ESR 78 the external Messaging app opens for me to enter an SMS message. That means that we're definitely in the right ballpark of a location here: it must be the LaunchWithURI() call that's opening the external link. This gets called on ESR 78, but it's not getting called on ESR 91.

Despite the differences between the two pieces of code, one area that separates the two execution flows is that on ESR 78 the following condition is false and the block it encloses is skipped:
(aBrowsingContext && aTriggeringPrincipal &&
      !StaticPrefs::security_allow_disjointed_external_uri_loads() &&
      !aTriggeringPrincipal->IsSystemPrincipal())
On ESR 91 the condition is slightly different, but unlike on ESR 78 the condition is true and the associated block is entered into as a result. Here's the condition on ESR 91:
(aBrowsingContext && aTriggeringPrincipal &&
      !StaticPrefs::security_allow_disjointed_external_uri_loads() &&
      // Add-on principals are always allowed:
      !BasePrincipal::Cast(aTriggeringPrincipal)->AddonPolicy() &&
      // As is chrome code:
      !aTriggeringPrincipal->IsSystemPrincipal())
I'm going to try to find out why one is entered and the other isn't. So on ESR 78 I've executed the code right up to just prior to the condition being checked and now have execution paused. I can now try to check the values that make up this condition. Here's the result on ESR 78:
(gdb) p aBrowsingContext
$4 = (mozilla::dom::BrowsingContext *) 0x7f80bf24c0
(gdb) p aTriggeringPrincipal
$5 = (nsIPrincipal *) 0x7f80ef4ba0
(gdb) p StaticPrefs::security_allow_disjointed_external_uri_loads()
No symbol "security_allow_disjointed_external_uri_loads" in namespace 
    "mozilla::StaticPrefs".
(gdb) p aTriggeringPrincipal->IsSystemPrincipal()
Cannot evaluate function -- may be inlined
(gdb) p aTriggeringPrincipal->mKind
There is no member or method named mKind.
(gdb) p ((BasePrincipal*)aTriggeringPrincipal)->mKind
$6 = mozilla::BasePrincipal::eContentPrincipal
(gdb) 
On ESR 78 this is going to convert our condition into the following, which will clearly resolve to false:
(true && true &&
      !true &&
      !false)
On ESR 91 things are a little more complex:
(gdb) p aBrowsingContext
$4 = (mozilla::dom::BrowsingContext *) 0x7fb82237d0
(gdb) p aTriggeringPrincipal
$5 = (nsIPrincipal *) 0x7fb90c2f30
(gdb) p StaticPrefs::security_allow_disjointed_external_uri_loads()
No symbol "security_allow_disjointed_external_uri_loads" in namespace 
    "mozilla::StaticPrefs".
(gdb) p ((ContentPrincipal*)aTriggeringPrincipal)->mAddon
$6 = {<mozilla::detail::MaybeStorage<mozilla::WeakPtr<mozilla::extensions::
    WebExtensionPolicy, (mozilla::detail::WeakPtrDestructorBehavior)0>, false>> 
    = {<mozilla::detail::MaybeStorageBase<mozilla::WeakPtr<mozilla::extensions::
    WebExtensionPolicy, (mozilla::detail::WeakPtrDestructorBehavior)0>, false>> 
    = {
      mStorage = {val = {mRef = {mRawPtr = 0x7fb90b0710}}}}, 
    mIsSome = 1 '\001'}, <mozilla::detail::Maybe_CopyMove_Enabler<mozilla::
    WeakPtr<mozilla::extensions::WebExtensionPolicy, (mozilla::detail::
    WeakPtrDestructorBehavior)0>, false, true, true>> = {<No data fields>}, <No 
    data fields>}
(gdb) p ((ContentPrincipal*)aTriggeringPrincipal)->mAddon.mStorage.val
$7 = {mRef = {mRawPtr = 0x7fb90b0710}}
(gdb) p ((ContentPrincipal*)aTriggeringPrincipal)->mAddon.mIsSome
$8 = 1 '\001'
(gdb) p ((BasePrincipal*)aTriggeringPrincipal)->mKind
$9 = mozilla::BasePrincipal::eContentPrincipal
(gdb) 
Fitting these values into the condition on ESR 91 gives us the following:
(true && true &&
      true &&
      // Add-on principals are always allowed:
      !BasePrincipal::Cast(aTriggeringPrincipal)->AddonPolicy() &&
      // As is chrome code:
      !false)
I wasn't able to cleanly determine the value of AddonPolicy() using the debugger, but since the condition is passing we know that it must return false in order for the overall condition to resolve to true (which it does).
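To make the comparison easier to follow, here's a minimal sketch that plugs the values from the two debugging sessions into the two conditions. The function and field names are hypothetical, chosen just for this illustration. The pref values are the ones assumed in the substituted conditions above (the debugger couldn't read them directly), and hasAddonPolicy is set to false because that's what the ESR 91 flow implies.

```javascript
// ESR 78 form of the condition guarding the frame-accessibility check.
function entersFrameCheckEsr78(v) {
  return v.hasBrowsingContext && v.hasTriggeringPrincipal &&
    !v.allowDisjointedLoads &&
    !v.isSystemPrincipal;
}

// ESR 91 form: the same, plus the add-on policy exemption.
function entersFrameCheckEsr91(v) {
  return v.hasBrowsingContext && v.hasTriggeringPrincipal &&
    !v.allowDisjointedLoads &&
    // Add-on principals are always allowed:
    !v.hasAddonPolicy &&
    // As is chrome code:
    !v.isSystemPrincipal;
}

// Values as substituted in the text (pref values are assumptions).
const esr78Values = {
  hasBrowsingContext: true,
  hasTriggeringPrincipal: true,
  allowDisjointedLoads: true,
  isSystemPrincipal: false,
};
const esr91Values = {
  hasBrowsingContext: true,
  hasTriggeringPrincipal: true,
  allowDisjointedLoads: false,
  hasAddonPolicy: false,
  isSystemPrincipal: false,
};

console.log("ESR 78 enters block:", entersFrameCheckEsr78(esr78Values));
console.log("ESR 91 enters block:", entersFrameCheckEsr91(esr91Values));
```

In this model the only input that differs between the two runs is the pref value, which is enough on its own to flip the result.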

This is a very clear difference, but in practice it doesn't seem to be the relevant part that we need. Even if the block guarded by this condition is entered, the code still ends up executing to the end of the method, where the HandleURI() method is called. And that's the bit that should be triggering the Messaging app to open.

Still, the investigation has led us to the relevant place, that much we know for sure. So we can change tack slightly and take a look at the history for this bit of code. It's changed quite a bit between ESR 78 and ESR 91 and it might be helpful to understand why. Thankfully we have the tools for this in the form of git blame and git diff. First I want to find out the relevant changes. There are quite a few and I have to go back through a bunch of commits that just perform reformatting of the code, but eventually I get to this point:
$ git blame 3e36bc378fca8~ uriloader/exthandler/nsExternalHelperAppService.cpp \
    -L 1110,1119
Blaming lines:   0% (10/3132), done.
ca6da5a72eddc (sdwilsh@shawnwilsher.com 2007-07-25 21:24:25 -0700 1110)   
    nsCOMPtr<nsIHandlerInfo> handler;
ca6da5a72eddc (sdwilsh@shawnwilsher.com 2007-07-25 21:24:25 -0700 1111)
    rv = GetProtocolHandlerInfo(scheme, getter_AddRefs(handler));
ca6da5a72eddc (sdwilsh@shawnwilsher.com 2007-07-25 21:24:25 -0700 1112)   
    NS_ENSURE_SUCCESS(rv, rv);
430d42c69d3d8 (dveditz%cruzio.com       2004-10-25 07:46:01 +0000 1113) 
ca6da5a72eddc (sdwilsh@shawnwilsher.com 2007-07-25 21:24:25 -0700 1114)   
    nsCOMPtr<nsIContentDispatchChooser> chooser =
265e6721798a4 (Sylvestre Ledru          2018-11-30 11:46:48 +0100 1115)       
    do_CreateInstance("@mozilla.org/content-dispatch-chooser;1", &rv);
ca6da5a72eddc (sdwilsh@shawnwilsher.com 2007-07-25 21:24:25 -0700 1116)   
    NS_ENSURE_SUCCESS(rv, rv);
265e6721798a4 (Sylvestre Ledru          2018-11-30 11:46:48 +0100 1117) 
84589d971b4d4 (pbz                      2020-10-29 13:43:46 +0000 1118)   
    return chooser->HandleURI(handler, uri, aTriggeringPrincipal,
8002a3c48cde6 (Gijs Kruitbosch          2021-02-22 19:00:10 +0000 1119)         
                        aBrowsingContext, aTriggeredExternally);
The relevant change is on the penultimate line here: commit 84589d971b4d4. Let's take a look at the diff to the file between the two versions (before and after the commit) in full:
$ git diff 84589d971b4d4~ 84589d971b4d4 uriloader/exthandler/
    nsExternalHelperAppService.cpp
diff --git a/uriloader/exthandler/nsExternalHelperAppService.cpp b/uriloader/
    exthandler/nsExternalHelperAppService.cpp
index 40b7a00bf957..59ace07f55af 100644
--- a/uriloader/exthandler/nsExternalHelperAppService.cpp
+++ b/uriloader/exthandler/nsExternalHelperAppService.cpp
@@ -1047,30 +1047,12 @@ nsExternalHelperAppService::LoadURI(nsIURI* aURI,
   rv = GetProtocolHandlerInfo(scheme, getter_AddRefs(handler));
   NS_ENSURE_SUCCESS(rv, rv);
 
-  nsHandlerInfoAction preferredAction;
-  handler->GetPreferredAction(&preferredAction);
-  bool alwaysAsk = true;
-  handler->GetAlwaysAskBeforeHandling(&alwaysAsk);
-
-  // if we are not supposed to ask, and the preferred action is to use
-  // a helper app or the system default, we just launch the URI.
-  if (!alwaysAsk && (preferredAction == nsIHandlerInfo::useHelperApp ||
-                     preferredAction == nsIHandlerInfo::useSystemDefault)) {
-    rv = handler->LaunchWithURI(uri, aBrowsingContext);
-    // We are not supposed to ask, but when file not found the user most likely
-    // uninstalled the application which handles the uri so we will continue
-    // by application chooser dialog.
-    if (rv != NS_ERROR_FILE_NOT_FOUND) {
-      return rv;
-    }
-  }
-
   nsCOMPtr<nsIContentDispatchChooser> chooser =
       do_CreateInstance("@mozilla.org/content-dispatch-chooser;1", &rv);
   NS_ENSURE_SUCCESS(rv, rv);
 
-  return chooser->Ask(handler, uri, aTriggeringPrincipal, aBrowsingContext,
-                      nsIContentDispatchChooser::REASON_CANNOT_HANDLE);
+  return chooser->HandleURI(handler, uri, aTriggeringPrincipal,
+                            aBrowsingContext);
 }
 
 ///////////////////////////////////////////////////////////////////////////////
    ///////////////////////
Now we're getting somewhere. This is where the code switched from the version in ESR 78 to the one we now see in ESR 91. As this shows, the LaunchWithURI() method that's doing the work on ESR 78 has been removed in the ESR 91 version. What's the reason for this? Let's check the commit message to find out.
$ git log -1 84589d971b4d4
commit 84589d971b4d419492d131a70874e5b92e3d5463
Author: pbz <pbz@mozilla.com>
Date:   Thu Oct 29 13:43:46 2020 +0000

    Bug 1565574 - Added permission required to open external protocol handlers. 
    r=Gijs
    
    - Added pref to toggle permission feature
    - Updated ContentDispatchChooser to check for permission and  manage a 
    multi dialog flow.
    
    Differential Revision: https://phabricator.services.mozilla.com/D92945
So it seems there have been changes to support opening a dialog window, rather than directly opening the URLs in a separate application. I'm not going to paste it here, but it's worth having a look at the full changeset to see what's going on there. It shows how much of the code has been moved out of the C++ file and into the JavaScript ContentDispatchChooser.jsm file. We can even see that the call to launchWithURI() has been moved there.

To find out what's going on I've annotated the code with some lines for generating debug output. I apologise for the lengthy code dump here, but in order to understand how the debug output relates to the execution flow it's essential to see where I've placed the calls to dump() that are generating it. So here it is. This is the code from ESR 91, identical to the original apart from the additional debug output lines that I've added:
  /**
   * Prompt the user to open an external application.
   * If the triggering principal doesn't have permission to open apps for the
   * protocol of aURI, we show a permission prompt first.
   * If the caller has permission and a preferred handler is set, we skip the
   * dialogs and directly open the handler.
   * @param {nsIHandlerInfo} aHandler - Info about protocol and handlers.
   * @param {nsIURI} aURI - URI to be handled.
   * @param {nsIPrincipal} [aPrincipal] - Principal which triggered the load.
   * @param {BrowsingContext} [aBrowsingContext] - Context of the load.
   * @param {bool} [aTriggeredExternally] - Whether the load came from outside
   * this application.
   */  
  async handleURI(
    aHandler,
    aURI,
    aPrincipal,
    aBrowsingContext,
    aTriggeredExternally = false
  ) { 
    let callerHasPermission = this._hasProtocolHandlerPermission(
      aHandler.type,
      aPrincipal
    );
    dump("HANDLE: handleURI type: " + aHandler.type + "\n");

    // Force showing the dialog for links passed from outside the application.
    // This avoids infinite loops, see bug 1678255, bug 1667468, etc.
    if (
      aTriggeredExternally &&
      gPrefs.promptForExternal &&
      // ... unless we intend to open the link with a website or extension:
      !(  
        aHandler.preferredAction == Ci.nsIHandlerInfo.useHelperApp &&
        aHandler.preferredApplicationHandler instanceof Ci.nsIWebHandlerApp
      )   
    ) { 
      dump("HANDLE: always ask\n");
      aHandler.alwaysAskBeforeHandling = true;
    }   

    // Skip the dialog if a preferred application is set and the caller has 
    // permission.
    if (
      callerHasPermission &&
      !aHandler.alwaysAskBeforeHandling &&
      (aHandler.preferredAction == Ci.nsIHandlerInfo.useHelperApp ||
        aHandler.preferredAction == Ci.nsIHandlerInfo.useSystemDefault)
    ) { 
      dump("HANDLE: lanchWithURI internal\n");
      try {
        aHandler.launchWithURI(aURI, aBrowsingContext);
      } catch (error) {
        dump("HANDLE: error\n");
        // We are not supposed to ask, but when file not found the user most 
    likely
        // uninstalled the application which handles the uri so we will continue
        // by application chooser dialog.
        if (error.result == Cr.NS_ERROR_FILE_NOT_FOUND) {
          aHandler.alwaysAskBeforeHandling = true;
        } else {
          throw error;
        }   
      }   
    }   

    // We will show a prompt, record telemetry.
    try {
      dump("HANDLE: show prompt\n");
      ContentDispatchChooserTelemetry.recordTelemetry(
        aHandler.type,
        aBrowsingContext,
        aPrincipal
      );  
    } catch (error) {
      dump("HANDLE: report error show prompt\n");
      Cu.reportError(error);
    }   

    let shouldOpenHandler = false;
    try {
      shouldOpenHandler = await this._prompt(
        aHandler,
        aPrincipal,
        callerHasPermission,
        aBrowsingContext
      );  
    } catch (error) {
      dump("HANDLE: report error prompt\n");
      Cu.reportError(error.message);
    }   

    dump("HANDLE: shouldOpenHandler: " + shouldOpenHandler + "\n");
    if (!shouldOpenHandler) {
      return;
    }

    dump("HANDLE: lanchWithURI end\n");
    // Site was granted permission and user chose to open application.
    // Launch the external handler.
    aHandler.launchWithURI(aURI, aBrowsingContext);
  }
Now when I execute the browser and press on the SMS link on the test page I get the following debug output appearing in the console:
HANDLE: handleURI type: sms
HANDLE: show prompt
HANDLE: report error prompt
HANDLE: shouldOpenHandler: false
This is really useful. It tells us that the code is reaching the condition on shouldOpenHandler. Since this is set to false the method returns early at that point, and hence the launchWithURI() call right at the end of the method is never reached. It looks like the code is calling a handler related to opening a popup, which the Sailfish browser isn't set up to provide yet. Because of this the call to this._prompt() is throwing an exception (that's the "report error prompt" line in the output), which leaves shouldOpenHandler at its initial value of false. We want it to end up as true, and while I'm still testing things I can force it to true to see whether this fixes things. This change will guarantee that the launchWithURI() call at the end of the method is executed.
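The relevant part of the flow can be reduced to a small model. This is only a sketch of the tail of handleURI(), not the real code: handleURITail, prompt and launch are hypothetical stand-ins for the end of the method, this._prompt() and aHandler.launchWithURI() respectively, and the log array replaces the calls to dump().

```javascript
// Simplified model of the end of handleURI(). The names here are
// illustrative stand-ins, not part of ContentDispatchChooser.jsm.
async function handleURITail(prompt, launch, log) {
  let shouldOpenHandler = false;
  try {
    shouldOpenHandler = await prompt();
  } catch (error) {
    // If the prompt throws, the assignment never happens and
    // shouldOpenHandler keeps its initial value of false.
    log.push("report error prompt");
  }

  log.push("shouldOpenHandler: " + shouldOpenHandler);
  if (!shouldOpenHandler) {
    return;
  }

  log.push("lanchWithURI end");
  launch();
}

async function demo() {
  // Case 1: no prompt UI is available, so the prompt throws.
  const failing = [];
  await handleURITail(
    async () => { throw new Error("no prompt available"); },
    () => failing.push("launched"),
    failing
  );

  // Case 2: the prompt result is forced to true for testing.
  const forced = [];
  await handleURITail(
    async () => true,
    () => forced.push("launched"),
    forced
  );
  return { failing, forced };
}

demo().then(({ failing, forced }) => {
  console.log(failing.join("\n"));
  console.log("---");
  console.log(forced.join("\n"));
});
```

In the first case the log stops at the shouldOpenHandler line, matching the output above. With the prompt result forced to true, the model runs through to the end and the final launch call is executed.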

And indeed it is, but rather than opening the SMS app it instead generates the following error:
JavaScript error: resource://gre/modules/ContentDispatchChooser.jsm, line 339: 
    NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (
    NS_ERROR_ILLEGAL_VALUE) [nsIHandlerInfo.launchWithURI]
How to fix this? I still need to fix the prompting issue, but I'm going to put that on hold until I've dealt with this error. The rabbit hole continues to draw me further in, this time with the focus on the aHandler object associated with the launchWithURI() method. Since this is being passed in at the top of the method and so is coming from the C++, it means I can use the debugger to tell us a little more about it. Here's what the handler object provides:
(gdb) ptype (nsIHandlerInfo*)handler.mRawPtr
type = class nsIHandlerInfo : public nsISupports {
  public:
    virtual nsresult GetType(nsACString &);
    virtual nsresult GetDescription(nsAString &);
    virtual nsresult SetDescription(const nsAString &);
    virtual nsresult GetPreferredApplicationHandler(nsIHandlerApp **);
    virtual nsresult SetPreferredApplicationHandler(nsIHandlerApp *);
    virtual nsresult GetPossibleApplicationHandlers(nsIMutableArray **);
    virtual nsresult GetHasDefaultHandler(bool *);
    virtual nsresult GetDefaultDescription(nsAString &);
    virtual nsresult LaunchWithURI(nsIURI *, mozilla::dom::BrowsingContext *);
    virtual nsresult GetPreferredAction(nsHandlerInfoAction *);
    virtual nsresult SetPreferredAction(nsHandlerInfoAction);
    virtual nsresult GetAlwaysAskBeforeHandling(bool *);
    virtual nsresult SetAlwaysAskBeforeHandling(bool);
} *
(gdb) 
Searching for some of these methods in the code throws up the fact that we have a couple of patches related to this. These are patch 0056 "Use libcontentaction for custom scheme uri handling" and patch 0062 "Fix content action integration to work". Both of these are substantial patches which clearly relate to how the external protocol handling is supported. So my thought at this point is that I should try to apply these patches to get the underlying content action code in place before working on the other issues I've discovered. Thankfully, the patches apply cleanly on first attempt, which is unexpected:
$ git am --3way \
    ../rpm/0056-sailfishos-gecko-Use-libcontentaction-for-custom-sch.patch
Applying: Use libcontentaction for custom scheme uri handling. JB#47892
Using index info to reconstruct a base tree...
M       old-configure.in
M       uriloader/exthandler/unix/nsOSHelperAppService.cpp
M       xpcom/io/moz.build
M       xpcom/io/nsLocalFileUnix.cpp
Falling back to patching base and 3-way merge...
Auto-merging xpcom/io/nsLocalFileUnix.cpp
Auto-merging xpcom/io/moz.build
Auto-merging uriloader/exthandler/unix/nsOSHelperAppService.cpp
Auto-merging old-configure.in
$ git am --3way \
    ../rpm/0062-sailfishos-contentaction-Fix-content-action-integrat.patch
Applying: Fix content action integration to work. Fixes JB#51235
Using index info to reconstruct a base tree...
M       uriloader/exthandler/moz.build
M       uriloader/exthandler/unix/nsOSHelperAppService.cpp
M       xpcom/io/nsLocalFileUnix.cpp
Falling back to patching base and 3-way merge...
Auto-merging xpcom/io/nsLocalFileUnix.cpp
Auto-merging uriloader/exthandler/unix/nsOSHelperAppService.cpp
Auto-merging uriloader/exthandler/moz.build
There are significant changes to the C++ code caused by applying these patches, so to test them I'll need to rebuild the library. As always, that means this is the most suitable place to stop for today. It'll take until tomorrow for the rebuild to complete at which point I can run the same tests again to see if anything has changed. That'll be my task for tomorrow morning.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
4 Aug 2024 : Day 309 #
Yesterday turned out to be an unexpectedly fruitful day. I was able to determine the point at which the ESR 78 and ESR 91 execution flows diverged when the user presses the selection widget. ESR 78 goes one way while ESR 91 goes the other. The former works and the latter crashes.

Things often aren't this clear cut, but I'm very much hoping that in this particular case it'll be as simple as this. If this turns out to be true we'll be able to trace the difference back to a change in the code on ESR 91 that should be more like the code on ESR 78.

As we left things yesterday, I'd just stepped through nsComboboxControlFrame::ShowDropDown() and determined that the value for mDelayedShowDropDown is being set to true because the following condition is resolving to false:
(!fm || fm->GetFocusedElement() == GetContent())
So what happens when we step through nsComboboxControlFrame::ShowDropDown() on ESR 78? The answer is: "we don't!" Because the method is never called on ESR 78. So we're going to have to go a bit further down this rabbit hole. Here's the backtrace for the call on ESR 91:
Thread 10 "GeckoWorkerThre" hit Breakpoint 3, nsComboboxControlFrame::
    ShowDropDown (this=0x7fb9103c28, aDoDropDown=aDoDropDown@entry=true)
    at layout/forms/nsComboboxControlFrame.cpp:890
890     void nsComboboxControlFrame::ShowDropDown(bool aDoDropDown) {
(gdb) bt
#0  nsComboboxControlFrame::ShowDropDown (this=0x7fb9103c28, 
    aDoDropDown=aDoDropDown@entry=true)
    at layout/forms/nsComboboxControlFrame.cpp:890
#1  0x0000007ff41a3d94 in nsListControlFrame::MouseDown (this=0x7fb9103d88, 
    aMouseEvent=aMouseEvent@entry=0x7fb91ec9c0)
    at layout/forms/nsListControlFrame.cpp:1718
#2  0x0000007ff41a53bc in nsListEventListener::HandleEvent (this=0x7fb90ffb50, 
    aEvent=0x7fb91ec9c0)
    at layout/forms/nsListControlFrame.cpp:2359
#3  0x0000007ff33d7330 in mozilla::EventListenerManager::HandleEventSubType (
    this=this@entry=0x7fb912ea40, aListener=<optimized out>,
    aListener@entry=0x7fb912eb08, aDOMEvent=0x7fb91ec9c0, 
    aCurrentTarget=<optimized out>, aCurrentTarget@entry=0x7fb90ec570)
    at dom/events/EventListenerManager.cpp:1118
#4  0x0000007ff33da7f8 in mozilla::EventListenerManager::HandleEventInternal (
    this=0x7fb912ea40, aPresContext=0x7fb8fe4a70, aEvent=0x7fde7d4d88,
    aDOMEvent=aDOMEvent@entry=0x7fde7d4810, aCurrentTarget=<optimized out>, 
    aEventStatus=aEventStatus@entry=0x7fde7d4818,
    aItemInShadowTree=<optimized out>)
    at dom/events/EventListenerManager.cpp:1309
#5  0x0000007ff33daf0c in mozilla::EventListenerManager::HandleEvent (
    aItemInShadowTree=<optimized out>, aEventStatus=0x7fde7d4818,
    aCurrentTarget=<optimized out>, aDOMEvent=0x7fde7d4810, aEvent=<optimized 
    out>, aPresContext=<optimized out>, this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/
    EventListenerManager.h:390
#6  mozilla::EventTargetChainItem::HandleEvent (this=this@entry=0x7fb8c9a088, 
    aVisitor=..., aCd=...)
    at dom/events/EventDispatcher.cpp:348
#7  0x0000007ff33db5b0 in mozilla::EventTargetChainItem::HandleEventTargetChain 
    (aChain=..., aVisitor=..., aCallback=aCallback@entry=0x7fde7d4aa8,
    aCd=...)
    at dom/events/EventDispatcher.cpp:550
#8  0x0000007ff33db954 in mozilla::EventTargetChainItem::HandleEventTargetChain 
    (aChain=..., aVisitor=..., aCallback=aCallback@entry=0x7fde7d4aa8,
    aCd=...)
    at dom/events/EventDispatcher.cpp:630
#9  0x0000007ff33dc93c in mozilla::EventDispatcher::Dispatch (
    aTarget=<optimized out>, aPresContext=aPresContext@entry=0x7fb8fe4a70,
    aEvent=aEvent@entry=0x7fde7d4d88, aDOMEvent=aDOMEvent@entry=0x0, 
    aEventStatus=aEventStatus@entry=0x7fde7d4d6c,
    aCallback=aCallback@entry=0x7fde7d4aa8, aTargets=aTargets@entry=0x0)
    at dom/events/EventDispatcher.cpp:1082
#10 0x0000007ff401dccc in mozilla::PresShell::EventHandler::DispatchEventToDOM (
    this=this@entry=0x7fde7d4bd8, aEvent=aEvent@entry=0x7fde7d4d88,
    aEventStatus=aEventStatus@entry=0x7fde7d4d6c, 
    aEventCB=aEventCB@entry=0x7fde7d4aa8)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#11 0x0000007ff401e7a0 in mozilla::PresShell::EventHandler::DispatchEvent (
    this=this@entry=0x7fde7d4bd8,
    aEventStateManager=aEventStateManager@entry=0x7fb8fe4d60, 
    aEvent=aEvent@entry=0x7fde7d4d88, aTouchIsNew=false,
    aEventStatus=aEventStatus@entry=0x7fde7d4d6c, 
    aOverrideClickTarget=aOverrideClickTarget@entry=0x0)
    at layout/base/PresShell.cpp:8245
#12 0x0000007ff401f588 in mozilla::PresShell::EventHandler::
    HandleEventWithCurrentEventInfo (this=this@entry=0x7fde7d4bd8,
    aEvent=aEvent@entry=0x7fde7d4d88, 
    aEventStatus=aEventStatus@entry=0x7fde7d4d6c, 
    aIsHandlingNativeEvent=aIsHandlingNativeEvent@entry=true,
    aOverrideClickTarget=0x0)
    at layout/base/PresShell.cpp:8177
#13 0x0000007ff4023dbc in mozilla::PresShell::EventHandler::
    HandleEventUsingCoordinates (this=this@entry=0x7fde7d4ca8,
    aFrameForPresShell=aFrameForPresShell@entry=0x7fb9102980, 
    aGUIEvent=aGUIEvent@entry=0x7fde7d4d88, 
    aEventStatus=aEventStatus@entry=0x7fde7d4d6c,
    aDontRetargetEvents=aDontRetargetEvents@entry=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#14 0x0000007ff4023fa0 in mozilla::PresShell::EventHandler::HandleEvent (
    this=this@entry=0x7fde7d4ca8,
    aFrameForPresShell=aFrameForPresShell@entry=0x7fb9102980, 
    aGUIEvent=aGUIEvent@entry=0x7fde7d4d88,
    aDontRetargetEvents=aDontRetargetEvents@entry=false, 
    aEventStatus=aEventStatus@entry=0x7fde7d4d6c)
    at layout/base/PresShell.cpp:6898
#15 0x0000007ff40240ec in mozilla::PresShell::HandleEvent (this=0x7fb90d7250, 
    aFrameForPresShell=0x7fb9102980, aGUIEvent=aGUIEvent@entry=0x7fde7d4d88,
    aDontRetargetEvents=aDontRetargetEvents@entry=false, 
    aEventStatus=aEventStatus@entry=0x7fde7d4d6c)
    at layout/base/PresShell.cpp:6841
#16 0x0000007ff270373c in nsContentUtils::SendMouseEvent (
    aPresShell=aPresShell@entry=0x7fb90d7250, aType=..., aX=aX@entry=128.851852,
    aY=aY@entry=98.9074097, aButton=aButton@entry=0, 
    aButtons=aButtons@entry=-1, aClickCount=aClickCount@entry=1, 
    aModifiers=aModifiers@entry=0,
    aIgnoreRootScrollFrame=aIgnoreRootScrollFrame@entry=false, 
    aPressure=aPressure@entry=0, aInputSourceArg=aInputSourceArg@entry=5,
    aIdentifier=aIdentifier@entry=0, aToWindow=aToWindow@entry=true, 
    aPreventDefault=aPreventDefault@entry=0x7fde7d4f57,
    aIsDOMEventSynthesized=<optimized out>, aIsDOMEventSynthesized@entry=true, 
    aIsWidgetEventSynthesized=aIsWidgetEventSynthesized@entry=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsView.h:268
#17 0x0000007ff2716980 in nsDOMWindowUtils::SendMouseEventCommon (
    this=this@entry=0x7fb84bc880, aType=..., aX=aX@entry=128.851852,
    aY=aY@entry=98.9074097, aButton=aButton@entry=0, 
    aClickCount=aClickCount@entry=1, aModifiers=aModifiers@entry=0,
    aIgnoreRootScrollFrame=aIgnoreRootScrollFrame@entry=false, 
    aPressure=aPressure@entry=0, aInputSourceArg=aInputSourceArg@entry=5,
    aPointerId=aPointerId@entry=0, aToWindow=aToWindow@entry=true, 
    aPreventDefault=aPreventDefault@entry=0x0,
    aIsDOMEventSynthesized=aIsDOMEventSynthesized@entry=true, 
    aIsWidgetEventSynthesized=aIsWidgetEventSynthesized@entry=false, 
    aButtons=aButtons@entry=-1)
    at dom/base/nsDOMWindowUtils.cpp:732
#18 0x0000007ff2716ca0 in nsDOMWindowUtils::SendMouseEventToWindow (
    this=0x7fb84bc880, aType=..., aX=128.851852, aY=98.9074097, aButton=0, 
    aClickCount=1,
    aModifiers=0, aIgnoreRootScrollFrame=false, aPressure=0, aInputSourceArg=5, 
    aIsDOMEventSynthesized=<optimized out>, aIsWidgetEventSynthesized=false,
    aButtons=0, aIdentifier=<optimized out>, aOptionalArgCount=3 '\003')
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ProfilerLabels.h:249
[...]
#69 0x0000007fef54889c in ?? () from /lib64/libc.so.6
(gdb)
We need to know why this isn't called on ESR 78. Without having yet checked the code, it would seem that the answer must be in nsListControlFrame::MouseDown() given that this is frame #1 of the backtrace above and that it also gets called on ESR 78:
(gdb) b nsListControlFrame::MouseDown
Breakpoint 14 at 0x7fbc0b3d08: file layout/forms/nsListControlFrame.cpp, line 
    1659.
(gdb) c
Continuing.

Thread 10 "GeckoWorkerThre" hit Breakpoint 14, nsListControlFrame::
    MouseDown (this=0x7f80f671e0, aMouseEvent=aMouseEvent@entry=0x7f803cc1f0)
    at layout/forms/nsListControlFrame.cpp:1659
1659    nsresult nsListControlFrame::MouseDown(dom::Event* aMouseEvent) {
(gdb) 
Once again the code for this method appears to be identical in both versions. It's a bit longer than the previous examples, so I've removed some lines at the start that aren't so relevant. In practice though, we really do need to see all of the remaining lines to get a grip on what's happening.
nsresult nsListControlFrame::MouseDown(dom::Event* aMouseEvent) {
[...]
  if (!IsLeftButton(aMouseEvent)) {
    if (IsInDropDownMode()) {
      if (!IgnoreMouseEventForSelection(aMouseEvent)) {
        aMouseEvent->PreventDefault();
        aMouseEvent->StopPropagation();
      } else {
        return NS_OK;
      }
      return NS_ERROR_FAILURE;  // means consume event
    } else {
      return NS_OK;
    }
  }

  int32_t selectedIndex;
  if (NS_SUCCEEDED(GetIndexFromDOMEvent(aMouseEvent, selectedIndex))) {
    // Handle Like List
    mButtonDown = true;
    CaptureMouseEvents(true);
    AutoWeakFrame weakFrame(this);
    bool change =
        HandleListSelection(aMouseEvent, selectedIndex);  // might destroy us
    if (!weakFrame.IsAlive()) {
      return NS_OK;
    }
    mChangesSinceDragStart = change;
  } else {
    // NOTE: the combo box is responsible for dropping it down
    if (mComboboxFrame) {
      // Ignore the click that occurs on the option element when one is
      // selected from the parent process popup.
      if (mComboboxFrame->IsOpenInParentProcess()) {
        nsCOMPtr<nsIContent> econtent =
            do_QueryInterface(aMouseEvent->GetTarget());
        HTMLOptionElement* option = HTMLOptionElement::FromNodeOrNull(econtent);
        if (option) {
          return NS_OK;
        }
      }

      uint16_t inputSource = mouseEvent->MozInputSource();
      bool isSourceTouchEvent =
          inputSource == MouseEvent_Binding::MOZ_SOURCE_TOUCH;
      if (FireShowDropDownEvent(
              mContent, !mComboboxFrame->IsDroppedDownOrHasParentPopup(),
              isSourceTouchEvent)) {
        return NS_OK;
      }

      if (!IgnoreMouseEventForSelection(aMouseEvent)) {
        return NS_OK;
      }

      if (!nsComboboxControlFrame::ToolkitHasNativePopup()) {
        bool isDroppedDown = mComboboxFrame->IsDroppedDown();
        nsIFrame* comboFrame = do_QueryFrame(mComboboxFrame);
        AutoWeakFrame weakFrame(comboFrame);
        mComboboxFrame->ShowDropDown(!isDroppedDown);
        if (!weakFrame.IsAlive()) return NS_OK;
        if (isDroppedDown) {
          CaptureMouseEvents(false);
        }
      }
    }
  }

  return NS_OK;
}
Stepping through using the debugger, it looks like the return value for IgnoreMouseEventForSelection(aMouseEvent) is what's causing the method to return early so that the ShowDropDown() method never gets called on ESR 78:
Thread 10 "GeckoWorkerThre" hit Breakpoint 14, nsListControlFrame::
    MouseDown (this=0x7f80f671e0, aMouseEvent=aMouseEvent@entry=0x7f803cc1f0)
    at layout/forms/nsListControlFrame.cpp:1659
1659    nsresult nsListControlFrame::MouseDown(dom::Event* aMouseEvent) {
(gdb) n
1662      MouseEvent* mouseEvent = aMouseEvent->AsMouseEvent();
1663      NS_ENSURE_TRUE(mouseEvent, NS_ERROR_FAILURE);
(gdb) n
1665      UpdateInListState(aMouseEvent);
(gdb) n
1668      if (eventStates.HasState(NS_EVENT_STATE_DISABLED)) {
(gdb) n
1675      if (!IsLeftButton(aMouseEvent)) {
(gdb) n
1690      if (NS_SUCCEEDED(GetIndexFromDOMEvent(aMouseEvent, selectedIndex))) {
(gdb) n
1703        if (mComboboxFrame) {
(gdb) n
162       bool IsOpenInParentProcess() { return mIsOpenInParentProcess; }
(gdb) n
1715          uint16_t inputSource = mouseEvent->MozInputSource();
(gdb) n
1718          if (FireShowDropDownEvent(
(gdb) n
1719                  mContent, !mComboboxFrame->IsDroppedDownOrHasParentPopup(
    ),
(gdb) n
1718          if (FireShowDropDownEvent(
(gdb) n
1724          if (!IgnoreMouseEventForSelection(aMouseEvent)) {
(gdb) n
nsListEventListener::HandleEvent (aEvent=0x7f803cc1f0, this=0x7f80f7d490)
    at layout/forms/nsListControlFrame.cpp:2352
2352      nsAutoString eventType;
If we compare this with the flow on ESR 91 we can see that there the method doesn't drop out early in this way:
Thread 10 "GeckoWorkerThre" hit Breakpoint 4, nsListControlFrame::
    MouseDown (this=0x7fb91555d8, aMouseEvent=aMouseEvent@entry=0x7fb9140940)
    at layout/forms/nsListControlFrame.cpp:1645
1645    nsresult nsListControlFrame::MouseDown(dom::Event* aMouseEvent) {
(gdb) n
1648      MouseEvent* mouseEvent = aMouseEvent->AsMouseEvent();
(gdb) n
1649      NS_ENSURE_TRUE(mouseEvent, NS_ERROR_FAILURE);
(gdb) n
1651      UpdateInListState(aMouseEvent);
(gdb) n
1653      EventStates eventStates = mContent->AsElement()->State();
(gdb) n
1661      if (!IsLeftButton(aMouseEvent)) {
(gdb) n
1676      if (NS_SUCCEEDED(GetIndexFromDOMEvent(aMouseEvent, selectedIndex))) {
(gdb) n
1689        if (mComboboxFrame) {
(gdb) n
156       bool IsOpenInParentProcess() { return mIsOpenInParentProcess; }
(gdb) n
1701          uint16_t inputSource = mouseEvent->MozInputSource();
(gdb) n
1702          bool isSourceTouchEvent =
(gdb) n
1704          if (FireShowDropDownEvent(
(gdb) n
1705                  mContent, !mComboboxFrame->IsDroppedDownOrHasParentPopup(
    ),
(gdb) n
1710          if (!IgnoreMouseEventForSelection(aMouseEvent)) {
(gdb) n
1715            bool isDroppedDown = mComboboxFrame->IsDroppedDown();
(gdb) n
1716            nsIFrame* comboFrame = do_QueryFrame(mComboboxFrame);
(gdb) n
1717            AutoWeakFrame weakFrame(comboFrame);
(gdb) n
1718            mComboboxFrame->ShowDropDown(!isDroppedDown);
(gdb) n
5610      bool IsAlive() const { return !!mFrame; }
(gdb) n
1720            if (isDroppedDown) {
(gdb) n
nsListEventListener::HandleEvent (this=0x7fb9138f20, aEvent=0x7fb9140940)
    at layout/forms/nsListControlFrame.cpp:2347
2347      nsAutoString eventType;
Different!

So it looks like IgnoreMouseEventForSelection() is returning different values depending on the version. If that really is what's happening here it would be really helpful to know why. Checking out IgnoreMouseEventForSelection(), the code is, once again, identical on both versions:
bool nsListControlFrame::IgnoreMouseEventForSelection(dom::Event* aEvent) {
  if (!mComboboxFrame) return false;

  // Our DOM listener does get called when the dropdown is not
  // showing, because it listens to events on the SELECT element
  if (!mComboboxFrame->IsDroppedDown()) return true;

  return !mItemSelectionStarted;
}
To understand why this is returning different values it would be convenient if we could step through the method to find out why. Unfortunately the method has been highly optimised on ESR 78 so that it's impossible to step through. Nevertheless we can at least extract the values and figure out what the flow would be ourselves. First on ESR 78:
(gdb) p mComboboxFrame
$8 = (nsComboboxControlFrame *) 0x7f80f67078
(gdb) p mComboboxFrame->mDroppedDown
$9 = false
(gdb) p mItemSelectionStarted
$10 = true
(gdb) 
Then on ESR 91:
(gdb) p mComboboxFrame
$8 = (nsComboboxControlFrame *) 0x7fb9103b18
(gdb) p mComboboxFrame->mDroppedDown
$9 = false
(gdb) p mItemSelectionStarted
$10 = true
(gdb) 
The values are identical, which isn't what I was expecting. Same code, same state, means same return value. Since mComboboxFrame->mDroppedDown is set to false the method will return with a value of true. If we look back at the code for MouseDown() this means that the condition wrapping IgnoreMouseEventForSelection() won't cause an early return on ESR 91. But it won't cause an early return on ESR 78 either. So this was a misunderstanding on my part because of the way the code was optimised on ESR 78. That's because at the end of the method, although it doesn't return early, the code in MouseDown() actually skips straight past the following conditional block:
      if (!nsComboboxControlFrame::ToolkitHasNativePopup()) {
        bool isDroppedDown = mComboboxFrame->IsDroppedDown();
        nsIFrame* comboFrame = do_QueryFrame(mComboboxFrame);
        AutoWeakFrame weakFrame(comboFrame);
        mComboboxFrame->ShowDropDown(!isDroppedDown);
        if (!weakFrame.IsAlive()) return NS_OK;
        if (isDroppedDown) {
          CaptureMouseEvents(false);
        }
      }
Once this has been skipped we immediately reach the end of the MouseDown() method and return. So it's not an early return at all, but just the rest of the code in the method being skipped and us reaching the end of the method.

So we can conclude from our earlier step-throughs that in fact it's this block of code that's being skipped on ESR 78 but not on ESR 91, and that's because nsComboboxControlFrame::ToolkitHasNativePopup() returns different values. On ESR 78 it returns true whereas on ESR 91 it returns false.
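
To make the reasoning concrete, here's a minimal sketch of just the tail end of MouseDown(). It isn't the Gecko code: the names ending in Sketch are hypothetical, the member variables are replaced by a plain struct, and it assumes FireShowDropDownEvent() returned false, as it appears to in both step-throughs.

```cpp
#include <cassert>

// Simplified stand-in for the member state read by
// IgnoreMouseEventForSelection(). Hypothetical type, not part of Gecko.
struct ListStateSketch {
  bool hasComboboxFrame;      // mComboboxFrame != nullptr
  bool droppedDown;           // mComboboxFrame->IsDroppedDown()
  bool itemSelectionStarted;  // mItemSelectionStarted
};

// Same three-step logic as IgnoreMouseEventForSelection() quoted earlier.
bool IgnoreMouseEventForSelectionSketch(const ListStateSketch& s) {
  if (!s.hasComboboxFrame) return false;
  if (!s.droppedDown) return true;
  return !s.itemSelectionStarted;
}

// Models only the end of MouseDown(), inside the `if (mComboboxFrame)` block,
// assuming FireShowDropDownEvent() returned false.
// Returns true if ShowDropDown() would be reached.
bool ReachesShowDropDownSketch(const ListStateSketch& s,
                               bool toolkitHasNativePopup) {
  assert(s.hasComboboxFrame);  // this part only runs with a combobox frame
  if (!IgnoreMouseEventForSelectionSketch(s)) {
    return false;  // the one genuine early return
  }
  // The final block is skipped when the toolkit has native popups.
  return !toolkitHasNativePopup;
}
```

Feeding in the state printed by gdb (combobox frame present, mDroppedDown false, mItemSelectionStarted true) the sketch never takes the early return. Whether ShowDropDown() is reached then depends only on the toolkitHasNativePopup argument.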

I can't show you this with the debugger because the entire method is optimised down into a single value of either true or false. We can see why this optimisation happens if we look at the code. Unfortunately, looking at the code for this method alone also doesn't tell us which value it will return. Here it is:
/* static */
bool nsComboboxControlFrame::ToolkitHasNativePopup() {
#ifdef MOZ_USE_NATIVE_POPUP_WINDOWS
  return true;
#else
  return false;
#endif /* MOZ_USE_NATIVE_POPUP_WINDOWS */
}
The return value of this method, or more precisely the Boolean value that the compiler completely replaces this method with, is determined by the pre-processor depending on whether or not MOZ_USE_NATIVE_POPUP_WINDOWS has been set. The easiest way for me to check which value this represents on ESR 91 is to add some errors in to the code and attempt to compile it. Only the error inside the compiled block will be hit; the error in the other branch will be silently ignored. So here's the amendment I've made to this code:
/* static */
bool nsComboboxControlFrame::ToolkitHasNativePopup() {
#ifdef MOZ_USE_NATIVE_POPUP_WINDOWS
#error MOZ_USE_NATIVE_POPUP_WINDOWS=1
  return true;
#else
#error MOZ_USE_NATIVE_POPUP_WINDOWS=0
  return false;
#endif /* MOZ_USE_NATIVE_POPUP_WINDOWS */
}
Now when I try to compile this on ESR 91 I get the following output.
In file included from Unified_cpp_layout_forms0.cpp:29:
${PROJECT}/gecko-dev/layout/forms/nsComboboxControlFrame.cpp:1618:2: error: 
    #error MOZ_USE_NATIVE_POPUP_WINDOWS=0
 #error MOZ_USE_NATIVE_POPUP_WINDOWS=0
  ^~~~~
In other words, on ESR 91 MOZ_USE_NATIVE_POPUP_WINDOWS hasn't been defined even though it should have been. Clearly this is wrong and some brief testing shows that yes, if the return value from this method is reversed the crash goes away when pressing on a Select widget.
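
As an aside, a less disruptive variation on the #error trick is to surface the macro as a compile-time constant and assert on it. This is a different technique from the one I actually used, shown only as a sketch: SKETCH_USE_NATIVE_POPUP_WINDOWS is a hypothetical stand-in for MOZ_USE_NATIVE_POPUP_WINDOWS and ToolkitHasNativePopupSketch() is not a Gecko function.

```cpp
// SKETCH_USE_NATIVE_POPUP_WINDOWS is a hypothetical macro standing in for
// MOZ_USE_NATIVE_POPUP_WINDOWS. It is deliberately left undefined here.
#ifdef SKETCH_USE_NATIVE_POPUP_WINDOWS
constexpr bool kNativePopupMacroDefined = true;
#else
constexpr bool kNativePopupMacroDefined = false;
#endif

// Same shape as ToolkitHasNativePopup(): the body collapses to a constant
// at compile time, which is why there is nothing to step through in gdb.
constexpr bool ToolkitHasNativePopupSketch() {
  return kNativePopupMacroDefined;
}

// Fires only in the configuration being ruled out, so the build still
// succeeds in the other configuration.
static_assert(!ToolkitHasNativePopupSketch(),
              "SKETCH_USE_NATIVE_POPUP_WINDOWS is unexpectedly defined");
```

Unlike the pair of #error lines, this only breaks the build in one of the two configurations, so the assertion has to be flipped to test for the opposite case.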

That means we've finally reached the answer we need. It's still not entirely clear how this should be fixed, but now I know there is a way to fix it and it's just a matter of finding the most appropriate approach.

Getting to a solution like this feels like coming up for air; coming out of the rabbit hole and seeing the sun. When you're still digging downwards trying to find out what the problem is you have no idea how hard it's going to be to find a solution, or whether you'll even be able to find a solution at all. As we saw with the rendering pipeline, finding the solution can take months. This time it took just a few days, but the relief of knowing that you can solve it... that's the real joy of this kind of development. It's a really good feeling.

But we do still need to figure out what changes to make to solve this and it's all going to come down to getting MOZ_USE_NATIVE_POPUP_WINDOWS defined as the correct value. On ESR 78, we can see that MOZ_USE_NATIVE_POPUP_WINDOWS is set in embedding/embedlite/confvars.sh:
MOZ_APP_BASENAME=xulrunner-qt5
MOZ_APP_NAME=xulrunner-qt5
MOZ_APP_DISPLAYNAME=XULRunner
MOZ_APP_ID={6105197a-d833-4e45-ac65-dfda38830696}
MOZ_UPDATER=0
MOZ_XULRUNNER=1
MOZ_CHROME_FILE_FORMAT=omni
MOZ_APP_VERSION=$MOZILLA_VERSION
MOZ_URL_CLASSIFIER=1
MOZ_SERVICES_COMMON=
MOZ_SERVICES_CRYPTO=
MOZ_SERVICES_SYNC=
MOZ_MEDIA_NAVIGATOR=1
MOZ_USE_NATIVE_POPUP_WINDOWS=1
MOZ_SERVICES_HEALTHREPORT=
MOZ_DISABLE_EXPORT_JS=1
This then gets used by old-configure so that it ends up defined for the pre-processor. The value also appears in confvars.sh in the ESR 91 code, so the obvious question is why this isn't then being set as a pre-processor define on ESR 91. The answer, it turns out, can be found in toolkit/moz.configure, where this bit of code appears in the ESR 91 version:
option(
    env="MOZ_USE_NATIVE_POPUP_WINDOWS",
    default=target_is_android,
    help="Whether to use native popup windows",
)

set_define("MOZ_USE_NATIVE_POPUP_WINDOWS", True, 
    when="MOZ_USE_NATIVE_POPUP_WINDOWS")
We don't see this on ESR 78 and using git blame it's possible to find out exactly what changed, when and why. It turns out to be a little more effort than expected though, because there have been multiple commits since then that just reformat the same code in slightly different ways, without changing its meaning. Eventually, if we work back far enough, we get to the relevant upstream change:
$ git diff f5328d27ba2c2~ f5328d27ba2c2 -- toolkit/moz.configure
diff --git a/toolkit/moz.configure b/toolkit/moz.configure
index 2c4d8fa2445e..a30c6b227481 100644
--- a/toolkit/moz.configure
+++ b/toolkit/moz.configure
@@ -2097,3 +2097,9 @@ set_config('ANDROID_PACKAGE_NAME', android_package_name)
 option(env='MOZ_WINCONSOLE', nargs='?',
        help='Whether we can create a console window.')
 set_define('MOZ_WINCONSOLE', True, when=depends('MOZ_WINCONSOLE')(lambda x: x))
+
+option(env='MOZ_USE_NATIVE_POPUP_WINDOWS', default=target_is_android,
+       help='Whether to use native popup windows')
+
+set_define('MOZ_USE_NATIVE_POPUP_WINDOWS', True,
+           when='MOZ_USE_NATIVE_POPUP_WINDOWS')
Checking the commit log we can see how the change was described:
$ git log -1 f5328d27ba2c2
commit f5328d27ba2c2c591384aab2d6cef323fecdfceb
Author: Ricky Stewart <rstewart@mozilla.com>
Date:   Fri Aug 21 22:48:09 2020 +0000

    Bug 1659756 - Move `MOZ_USE_NATIVE_POPUP_WINDOWS` from `old-configure` to 
    Python `configure` r=geckoview-reviewers,mhentges,agi,froydnj,glandium
    
    Differential Revision: https://phabricator.services.mozilla.com/D87462
So this commit didn't just add some code in to moz.configure, it also removed some from old-configure.in:
$ git diff f5328d27ba2c2~ f5328d27ba2c2 -- old-configure.in
diff --git a/old-configure.in b/old-configure.in
index 696b31c10dff..69d558cb7304 100644
--- a/old-configure.in
+++ b/old-configure.in
@@ -1490,7 +1490,6 @@ MOZ_UNIVERSALCHARDET=1
 MOZ_XUL=1
 MOZ_ZIPWRITER=1
 MOZ_NO_SMART_CARDS=
-MOZ_USE_NATIVE_POPUP_WINDOWS=
 MOZ_EXCLUDE_HYPHENATION_DICTIONARIES=
 MOZ_SANDBOX=1
 MOZ_BINARY_EXTENSIONS=
@@ -1897,10 +1896,6 @@ for extension in $MOZ_EXTENSIONS; do
     fi
 done
 
-if test -n "$MOZ_USE_NATIVE_POPUP_WINDOWS"; then
-  AC_DEFINE(MOZ_USE_NATIVE_POPUP_WINDOWS)
-fi
-
 # Avoid defining MOZ_ENABLE_CAIRO_FT on Windows platforms because
 # "cairo-ft-font.c" includes <dlfcn.h>, which only exists on posix 
    platforms
 if test -n "$MOZ_TREE_FREETYPE" -a "$OS_TARGET" != WINNT; 
    then
Interpreting these changes, it appears the mechanism through which the MOZ_USE_NATIVE_POPUP_WINDOWS define is set has been updated. Previously it could be set using confvars.sh, but I'm a bit confused about how it should be set now. Based on the code found in the moz.configure file, the value is now set via an option. The documentation explains what the various parts of this option statement mean:
 
class mozbuild.configure.options.Option(name=None, env=None, nargs=None, default=None, possible_origins=None, choices=None, category=None, help=None, define_depth=0)
Bases: object
Represents a configure option
A configure option can be a command line flag or an environment variable or both.
  • name is the full command line flag (e.g. --enable-foo).
  • env is the environment variable name (e.g. ENV)
  • nargs is the number of arguments the option may take. It can be a number or the special values ‘?’ (0 or 1), ‘*’ (0 or more), or ‘+’ (1 or more).
  • default can be used to give a default value to the option. When the name of the option starts with ‘--enable-’ or ‘--with-’, the implied default is a NegativeOptionValue (disabled). When it starts with ‘--disable-’ or ‘--without-’, the implied default is an empty PositiveOptionValue (enabled).
  • choices restricts the set of values that can be given to the option.
  • help is the option description for use in the --help output.
  • possible_origins is a tuple of strings that are origins accepted for this option. Example origins are ‘mozconfig’, ‘implied’, and ‘environment’.
  • category is a human-readable string used only for categorizing command-line options when displaying the output of configure --help. If not supplied, the script will attempt to infer an appropriate category based on the name of the file where the option was defined. If supplied it must be in the _ALL_CATEGORIES list above.
  • define_depth should generally only be used by templates that are used to instantiate an option indirectly. Set this to a positive integer to force the script to look into a deeper stack frame when inferring the category.

From this and the fact the option has a parameter of env="MOZ_USE_NATIVE_POPUP_WINDOWS" it would seem that we need to set an environment variable. If set, the set_define() part of this code will then apply, which is explained in the documentation as follows:
 
set_define_impl(name, value, when=None)
Implementation of set_define(). Set the define with the given name to the given value. Both name and value can be references to @depends functions, in which case the result from these functions is used. If the result of either function is None, the define is not set. If the result is False, the define is explicitly undefined (-U).
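
My reading of how the option() and set_define() pair combine can be sketched as a tiny truth function. This is a hypothetical model written from the snippets above, not mozbuild code: NativePopupDefineIsSet() and its parameters are made-up names, and what configure actually counts as the option being enabled is exactly the part I'm unsure about.

```cpp
// Hypothetical model of the moz.configure snippet, not mozbuild code.
//   envEnablesOption: configure sees MOZ_USE_NATIVE_POPUP_WINDOWS as enabled
//   targetIsAndroid:  the value of target_is_android, used as the default
// Returns true if the pre-processor define would end up set.
constexpr bool NativePopupDefineIsSet(bool envEnablesOption,
                                      bool targetIsAndroid) {
  // option(env=..., default=target_is_android): the default applies when
  // the option isn't otherwise enabled. set_define(..., True, when=...)
  // then emits the define only when the option value is truthy.
  return envEnablesOption || targetIsAndroid;
}
```

Under this model, and assuming the Sailfish OS build isn't treated as an Android target, the define only appears if configure sees the option as enabled. That would be consistent with the #error output above.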

So we need to set an environment variable, but it's not clear to me why this isn't the same as adding a key-value pair to confvars.sh. Nevertheless I've now also added it to the mozconfig.merqtxulrunner file:
export MOZ_USE_NATIVE_POPUP_WINDOWS=1
In order to test this I'm going to have to perform a full rebuild. This changes the build values, so I really do need to build everything, which will likely take at least until the end of the day, probably even until the early hours of the morning. So that's all I can reasonably do today. Tomorrow we'll see whether this change has worked or not.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
3 Aug 2024 : Day 308 #
Yesterday I completed step one of my three step programme: find where the problematic instance of EmbedLitePuppetWidget is being created on both ESR 78 and ESR 91.

Today it's step two, which is to find where the mLayerManager for the problematic EmbedLitePuppetWidget instance is being created. Eventually I'll get to digging for this using the debugger, but first I'm going to look through the backtraces from yesterday and see if there are clues in the code that might point to something useful.

There is something notable in amongst those backtraces which comes out from closer inspection. You may recall that I was a little surprised by the fact that there are three instances of EmbedLitePuppetWidget created on ESR 78, but five on ESR 91. That's an unusual difference. The widgets start as windows and form a tree, each related to an active item on the page. The pages are the same, there's only the one window; why the discrepancy?

This is intriguing, so while it might turn out to be completely explainable and unrelated to the issue I'm trying to fix, I do think it's worth spending the time to investigate further.

It seems like it might be relevant though, because the widget that's causing the crash appears to be one of the widgets that's not being created on ESR 78. There's a widget on ESR 78 that relates to the Select item on the page, but the backtrace for the widget on ESR 91 shows that it's being created as a child of another widget. We don't have to go too far through the backtrace to notice this either, just the top two items will do:
#0  mozilla::embedlite::EmbedLitePuppetWidget::EmbedLitePuppetWidget (
    this=0x7fb86664a0, view=0x0)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
#1  0x0000007ff4c4d828 in mozilla::embedlite::EmbedLitePuppetWidget::
    CreateChild (this=0x7fb8aaf500, aRect=..., aInitData=0x7fde7be0c0,
    aForceUseIWidgetParent=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
Notice how the value for this is different between the first and second frames: 0x7fb86664a0 and 0x7fb8aaf500 respectively. That's because one is creating the other, executed using the CreateChild() method which looks like this:
already_AddRefed<nsIWidget>
EmbedLitePuppetWidget::CreateChild(const LayoutDeviceIntRect &aRect,
                                   nsWidgetInitData* aInitData,
                                   bool              aForceUseIWidgetParent)
{
  if (Destroyed()) {
    return nullptr;
  }

  LOGT();
  bool isPopup = IsPopup(aInitData);
  nsCOMPtr<nsIWidget> widget = new EmbedLitePuppetWidget(nullptr);
  nsresult rv = widget->Create(isPopup ? nullptr : this, nullptr, aRect, 
    aInitData);
  return NS_FAILED(rv) ? nullptr : widget.forget();
}
Here's the signature of the widget->Create() call in there:
nsresult PuppetWidgetBase::Create(
    nsIWidget *aParent,
    nsNativeWidget aNativeParent,
    const LayoutDeviceIntRect &aRect,
    nsWidgetInitData *aInitData
)
One of my immediate thoughts when looking at the code is that if the value for isPopup is set to true, then the aParent value going in to this Create() method will be set to null. I wonder if this is the problem: that the new widget has no parent, so when it goes to traverse the tree to find the root where the layer manager is stored, it has nowhere to go?
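
The hypothesis can be pictured with a toy model. This is only a sketch of the idea, not how EmbedLitePuppetWidget is actually implemented: WidgetSketch, LayerManagerSketch, FindLayerManager() and CreateChildSketch() are all hypothetical names, and the only detail taken from the real code is the choice of parent in CreateChild().

```cpp
#include <cassert>

// Hypothetical placeholder for a layer manager.
struct LayerManagerSketch {};

// Hypothetical, heavily simplified widget. Not the real EmbedLitePuppetWidget.
struct WidgetSketch {
  WidgetSketch* parent = nullptr;
  LayerManagerSketch* layerManager = nullptr;
};

// Walks towards the root and returns the first layer manager found,
// or nullptr if the chain of parents ends without one.
LayerManagerSketch* FindLayerManager(const WidgetSketch* widget) {
  for (const WidgetSketch* w = widget; w != nullptr; w = w->parent) {
    if (w->layerManager != nullptr) {
      return w->layerManager;
    }
  }
  return nullptr;
}

// Mirrors the parent choice in CreateChild(): popups get no parent.
WidgetSketch CreateChildSketch(WidgetSketch* self, bool isPopup) {
  assert(self != nullptr);  // a child is always created from an existing widget
  WidgetSketch child;
  child.parent = isPopup ? nullptr : self;
  return child;
}
```

In this model a child created with isPopup set to false finds the root's layer manager, while a popup child has no chain to follow and comes back with nullptr.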

To test out this hypothesis I set a breakpoint on CreateChild(). Sure enough the value of isPopup is set to true, so I've attempted to change the value to false using the debugger. Unfortunately this isn't possible due to the way the underlying code has been optimised (the isPopup variable has been optimised away). So I've ended up stepping into the Create() method and changing the value of aParent from null to the memory location of the parent instead. This was all done on ESR 91, since as I explained, there is no equivalent flow on ESR 78. Here's me changing the value:
mozilla::embedlite::PuppetWidgetBase::Create (this=0x7fb8f94bb0, aParent=0x0, 
    aNativeParent=0x0, aRect=..., aInitData=0x7fde7be0c0)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:46
46        LOGT("Puppet: %p, parent: %p", this, aParent);
(gdb) set var aParent = 0x7fb913f480
(gdb) print aParent
$20 = (nsIWidget *) 0x7fb913f480
(gdb) finish
But after continuing execution after making this change, the browser still crashes as before. Okay, so it's not that. Let's go a bit further back then and find out why this additional EmbedLitePuppetWidget is being created at all. The backtrace for its creation happens as a result of tapping on the Select widget on the screen. At the point when I press the screen, ESR 78 and ESR 91 execution should be in pretty much the same, or at least similar, state. So there should be a point in the ESR 91 backtrace for the creation of the EmbedLitePuppetWidget that exists on both ESR 78 and ESR 91. If we continue up the backtrace towards the PC end the two will eventually diverge. If we can find the point of divergence, there should be something in the code that's causing that divergence to happen.

So here's my plan: while running on ESR 78, place breakpoints on each of the calls from the ESR 91 backtrace. Starting with the call at the PC end, we already know that the breakpoint won't be hit (because the additional EmbedLitePuppetWidget doesn't get created on ESR 78). And indeed that's the case; there is no hit:
Thread 1 "sailfish-browse" received signal SIGINT, Interrupt.
0x0000007fb7352740 in __GI___poll (fds=0x5555c7e8b0, nfds=5, timeout=<optimized 
    out>) at ../sysdeps/unix/sysv/linux/poll.c:41
41        return SYSCALL_CANCEL (ppoll, fds, nfds, timeout_ts_p, NULL, 0);
(gdb) b EmbedLitePuppetWidget::CreateChild
Breakpoint 3 at 0x7fbca925d0: EmbedLitePuppetWidget::CreateChild. (2 locations)
(gdb) c
Continuing.
[...]
So we have to go one further down the ESR 91 backtrace, add a breakpoint, run the code, press the button and see whether it hits. No hit:
Thread 1 "sailfish-browse" received signal SIGINT, Interrupt.
0x0000007fb7352740 in __GI___poll (fds=0x5555c7e8b0, nfds=5, timeout=<optimized 
    out>) at ../sysdeps/unix/sysv/linux/poll.c:41
41        return SYSCALL_CANCEL (ppoll, fds, nfds, timeout_ts_p, NULL, 0);
(gdb) b nsView::CreateWidgetForPopup
Breakpoint 4 at 0x7fbbdb05b0: file view/nsView.cpp, line 587.
(gdb) c
Continuing.
[...]
Do the same again. No hit this time either:
(gdb) b nsComboboxControlFrame::ShowList
Breakpoint 6 at 0x7fbc0b19b0: file layout/forms/nsComboboxControlFrame.cpp, 
    line 324.
(gdb) c
Continuing.
[...]
Until eventually we get to the call to nsComboboxControlFrame::SetFocus() from the ESR 91 backtrace. This time the breakpoint hits:
(gdb) b nsComboboxControlFrame::SetFocus
Breakpoint 8 at 0x7fbc0b4148: file layout/forms/nsComboboxControlFrame.cpp, 
    line 262.
(gdb) c
Continuing.
[W] unknown:0 - QConnmanEngine: Unable to translate the bearer type of the 
    unknown connection type: ""
[W] unknown:0 - QConnmanEngine: Unable to translate the bearer type of the 
    unknown connection type: ""

Thread 10 "GeckoWorkerThre" hit Breakpoint 8, nsComboboxControlFrame::
    SetFocus (this=0x7f8124b078, aOn=true, aRepaint=true)
    at layout/forms/nsComboboxControlFrame.cpp:262
262     void nsComboboxControlFrame::SetFocus(bool aOn, bool aRepaint) {
(gdb) 
Well that is interesting. That suggests that somewhere in the SetFocus() method there's some control flow which goes one way on ESR 78 and the other way on ESR 91. The code is the same for them both and looks like this:
void nsComboboxControlFrame::SetFocus(bool aOn, bool aRepaint) {
  AutoWeakFrame weakFrame(this);
  if (aOn) {
    nsListControlFrame::ComboboxFocusSet();
    sFocused = this;
    if (mDelayedShowDropDown) {
      ShowDropDown(true);  // might destroy us
      if (!weakFrame.IsAlive()) {
        return;
      }
    }
  } else {
    sFocused = nullptr;
    mDelayedShowDropDown = false;
    if (mDroppedDown) {
      mListControlFrame->ComboboxFinish(mDisplayedIndex);  // might destroy us
      if (!weakFrame.IsAlive()) {
        return;
      }
    }
    // May delete |this|.
    mListControlFrame->FireOnInputAndOnChange();
  }

  if (!weakFrame.IsAlive()) {
    return;
  }

  // This is needed on a temporary basis. It causes the focus
  // rect to be drawn. This is much faster than ReResolvingStyle
  // Bug 32920
  InvalidateFrame();
}
The important line here is the call to ShowDropDown() because this is in the ESR 91 backtrace, but isn't being called on ESR 78. It goes on to call ShowList() that we can also see in the ESR 91 backtrace. I'm going to step through the SetFocus() method on ESR 78 to find out why this isn't getting called.
Thread 10 "GeckoWorkerThre" hit Breakpoint 9, nsComboboxControlFrame::
    SetFocus (this=0x7f8124b078, aOn=true, aRepaint=true)
    at layout/forms/nsComboboxControlFrame.cpp:262
262     void nsComboboxControlFrame::SetFocus(bool aOn, bool aRepaint) {
(gdb) n
5078      MOZ_IMPLICIT AutoWeakFrame(nsIFrame* aFrame)
(gdb) n
264       if (aOn) {
(gdb) p aOn
$2 = true
(gdb) n
265         nsListControlFrame::ComboboxFocusSet();
(gdb) n
266         sFocused = this;
(gdb) n
267         if (mDelayedShowDropDown) {
(gdb) p mDelayedShowDropDown
$3 = false
(gdb) n
5105      bool IsAlive() const { return !!mFrame; }
(gdb) n
293       InvalidateFrame();
(gdb) n
263       AutoWeakFrame weakFrame(this);
(gdb) 
As we can clearly see from this execution, the reason is that mDelayedShowDropDown is set to false. In order to go down the branch that calls ShowDropDown() this variable would have to be set to true. Let's do the same step-through on ESR 91:
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, nsComboboxControlFrame::
    SetFocus (this=0x7fb916f718, aOn=true, aRepaint=true)
    at layout/forms/nsComboboxControlFrame.cpp:260
260     void nsComboboxControlFrame::SetFocus(bool aOn, bool aRepaint) {
(gdb) n
261       AutoWeakFrame weakFrame(this);
(gdb) n
262       if (aOn) {
(gdb) p aOn
$1 = true
(gdb) n
263         nsListControlFrame::ComboboxFocusSet();
(gdb) n
264         sFocused = this;
(gdb) n
265         if (mDelayedShowDropDown) {
(gdb) p mDelayedShowDropDown
$2 = true
(gdb) n
266           ShowDropDown(true);  // might destroy us
(gdb) n
5610      bool IsAlive() const { return !!mFrame; }
(gdb) n
291       InvalidateFrame();
(gdb) n
And there it is. The value of mDelayedShowDropDown is true!

This is surprising and also useful material. A search of the full gecko code shows that mDelayedShowDropDown is only ever set inside the nsComboboxControlFrame class (it's protected and not exposed through any other method or referenced in any other files). There is, in fact, only one place in the entire codebase where the value gets set to true and that's in the nsComboboxControlFrame::ShowDropDown() method. It's not a very long method, so it's worth me copying out the full source here:
void nsComboboxControlFrame::ShowDropDown(bool aDoDropDown) {
  MOZ_ASSERT(!XRE_IsContentProcess());
  mDelayedShowDropDown = false;
  EventStates eventStates = mContent->AsElement()->State();
  if (aDoDropDown && eventStates.HasState(NS_EVENT_STATE_DISABLED)) {
    return;
  }

  if (!mDroppedDown && aDoDropDown) {
    nsFocusManager* fm = nsFocusManager::GetFocusManager();
    if (!fm || fm->GetFocusedElement() == GetContent()) {
      DropDownPositionState state = AbsolutelyPositionDropDown();
      if (state == eDropDownPositionFinal) {
        ShowList(aDoDropDown);  // might destroy us
      } else if (state == eDropDownPositionPendingResize) {
        // Delay until after the resize reflow, see nsAsyncResize.
        mDelayedShowDropDown = true;
      }
    } else {
      // Delay until we get focus, see SetFocus().
      mDelayedShowDropDown = true;
    }
  } else if (mDroppedDown && !aDoDropDown) {
    ShowList(aDoDropDown);  // might destroy us
  }
}
The critical parts are the line in the middle that sets mDelayedShowDropDown to true with the explanatory comment stating "Delay until after the resize reflow, see nsAsyncResize" and the identical line towards the end with the comment "Delay until we get focus, see SetFocus()". So we should step through this method to take a look at whether either of these branches that set mDelayedShowDropDown to true are being executed or not. Here's the step through of this method on ESR 91:
Thread 10 "GeckoWorkerThre" hit Breakpoint 2, nsComboboxControlFrame::
    ShowDropDown (this=0x7fb9117528, aDoDropDown=aDoDropDown@entry=true)
    at layout/forms/nsComboboxControlFrame.cpp:890
890     void nsComboboxControlFrame::ShowDropDown(bool aDoDropDown) {
(gdb) n
892       mDelayedShowDropDown = false;
(gdb) p mDelayedShowDropDown
$1 = false
(gdb) n
894       if (aDoDropDown && eventStates.HasState(NS_EVENT_STATE_DISABLED)) {
(gdb) n
898       if (!mDroppedDown && aDoDropDown) {
(gdb) p mDroppedDown
$2 = false
(gdb) p aDoDropDwown
No symbol "aDoDropDwown" in current context.
(gdb) p aDoDropDown
$3 = true
(gdb) n
899         nsFocusManager* fm = nsFocusManager::GetFocusManager();
(gdb) n
900         if (!fm || fm->GetFocusedElement() == GetContent()) {
(gdb) n
827       nsIContent* GetContent() const { return mContent; }
(gdb) n
910           mDelayedShowDropDown = true;
(gdb) p fm
$4 = <optimized out>
And we can see that it is being set to true here, in the second of the two cases. The reason it's going this way is because the condition (!fm || fm->GetFocusedElement() == GetContent()) is false. Unfortunately fm has been optimised out which makes it harder for us to find out exactly which part of the condition is failing.
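
The way ShowDropDown() and SetFocus() hand off to one another can be pictured as a small state machine. This is a toy sketch rather than the Gecko implementation: ComboboxSketch and its fields are hypothetical, hasFocus stands in for the focus manager comparison, SetFocus() is assumed to coincide with the element gaining focus, and the disabled check, the missing focus manager case and the pending resize branch are all left out.

```cpp
#include <cassert>

// Toy model of the deferral between ShowDropDown() and SetFocus().
struct ComboboxSketch {
  bool hasFocus = false;             // stands in for the focus manager check
  bool delayedShowDropDown = false;  // mDelayedShowDropDown
  bool listShown = false;            // toggled where the real code calls ShowList()

  void ShowDropDown(bool doDropDown) {
    delayedShowDropDown = false;
    if (!listShown && doDropDown) {
      if (hasFocus) {
        listShown = true;  // drop down immediately
      } else {
        delayedShowDropDown = true;  // delay until we get focus
      }
    } else if (listShown && !doDropDown) {
      listShown = false;  // roll the list back up
    }
    // Invariant of the model: never both shown and pending.
    assert(!(listShown && delayedShowDropDown));
  }

  void SetFocus(bool on) {
    hasFocus = on;  // simplification: the real method is only notified
    if (on && delayedShowDropDown) {
      ShowDropDown(true);  // completes the deferred request
    }
  }
};
```

Calling ShowDropDown(true) before the element has focus leaves the flag set and the list hidden. A later SetFocus(true) then finishes the job, which is the sequence the ESR 91 step-throughs show.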

This is all rather a lot to absorb, but I'm hoping that all this will turn out to be key to determining what's causing ESR 91 to crash. It's super late now here, so time for me to pause for the day. I'll pick this up again tomorrow morning to find out what the equivalent flow is on ESR 78.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
2 Aug 2024 : Day 307 #
Thinking overnight about the problem I was running up against yesterday, I've hatched some kind of plan. I spent yesterday circling around EmbedLitePuppetWidget, which is the top level element but which, on ESR 91 at least, has no layer manager defined for it. From what I can tell, this isn't a situation which should ever arise. The top level element should create a layer manager if it's asked for one and doesn't otherwise already have one.
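
As a rough illustration of that expectation, here's a minimal sketch of the "create on first request" pattern. The LayerManager and WidgetModel types are hypothetical stand-ins, not the EmbedLite or Gecko classes, and the real GetLayerManager() takes parameters and does considerably more than this. The only point is that after a call the pointer should never be left null.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration only; not the Gecko classes.
struct LayerManager {};

class WidgetModel {
 public:
  // Returns the existing layer manager, creating one on first request.
  LayerManager* GetLayerManager() {
    if (!mLayerManager) {
      mLayerManager = std::make_unique<LayerManager>();
    }
    return mLayerManager.get();
  }

  // True once a layer manager has been created.
  bool HasLayerManager() const { return mLayerManager != nullptr; }

 private:
  std::unique_ptr<LayerManager> mLayerManager;
};
```

With this shape, repeated calls hand back the same object, and a caller never sees a null pointer, which is the behaviour I'd expect from the top level widget.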

So today I want to find out where the EmbedLitePuppetWidget is being created. My plan is then to place a breakpoint on the mLayerManager member variable to see where it gets set on ESR 78. Once I have a backtrace for that, I should be in a better position to figure out why the same thing isn't happening on ESR 91.

First task for the day then: find out where the EmbedLitePuppetWidget is being created. And whether there's more than one of them!

As we progress, if you're reading this, I need to warn you that the entry today is going to be full of very long, impenetrable and not-very-interesting backtraces. For this task it's going to be really important for me to keep track of these, and while they make for terrible reading, they also make for crucial reference material. This diary is both reference material and reading material, and today it's going to be far more of the former than the latter. So apologies in advance.

My advice: skip past the backtraces. Thankfully, as a digital diary, both the cost of keeping the backtraces in and the effort of skipping past them are low.

Here's the breakpoint, with backtrace, of the first case of an EmbedLitePuppetWidget being constructed on ESR 78:
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    EmbedLitePuppetWidget::EmbedLitePuppetWidget (this=0x7f80c09a90, 
    view=0x7f80be5eb8)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
47      EmbedLitePuppetWidget::EmbedLitePuppetWidget(EmbedLiteViewChildIface* 
    view)
(gdb) bt 4
#0  mozilla::embedlite::EmbedLitePuppetWidget::EmbedLitePuppetWidget (
    this=0x7f80c09a90, view=0x7f80be5eb8)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
#1  0x0000007fbca92f5c in mozilla::embedlite::EmbedLiteViewChild::
    InitGeckoWindow (this=0x7f80be5e80, parentId=0, parentBrowsingContext=0x0, 
    isPrivateWindow=<optimized out>, isDesktopMode=false)
    at obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#2  0x0000007fbca84b84 in mozilla::detail::RunnableMethodArguments<unsigned int 
    const, mozilla::dom::BrowsingContext*, bool const, bool const>::
    applyImpl<mozilla::embedlite::EmbedLiteViewChild, void (mozilla::embedlite::
    EmbedLiteViewChild::*)(unsigned int, mozilla::dom::BrowsingContext*, bool, 
    bool), StoreCopyPassByConstLRef<unsigned int const>, 
    StoreRefPtrPassByPtr<mozilla::dom::BrowsingContext>, 
    StoreCopyPassByConstLRef<bool const>, StoreCopyPassByConstLRef<bool const>, 
    0ul, 1ul, 2ul, 3ul> (args=..., m=<optimized out>, o=<optimized out>)
    at xpcom/threads/nsThreadUtils.h:990
#3  mozilla::detail::RunnableMethodArguments<unsigned int const, mozilla::dom::
    BrowsingContext*, bool const, bool const>::apply<mozilla::embedlite::
    EmbedLiteViewChild, void (mozilla::embedlite::EmbedLiteViewChild::*)(
    unsigned int, mozilla::dom::BrowsingContext*, bool, bool)> (m=<optimized 
    out>, 
    o=<optimized out>, this=<optimized out>) at xpcom/threads/nsThreadUtils.h:
    1191
(More stack frames follow...)
It turns out this isn't the only instance though; there are two others as well. Here's the second being created:
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    EmbedLitePuppetWidget::EmbedLitePuppetWidget (this=0x7f80c02fd0, view=0x0)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
47      EmbedLitePuppetWidget::EmbedLitePuppetWidget(EmbedLiteViewChildIface* 
    view)
(gdb) bt 23
#0  mozilla::embedlite::EmbedLitePuppetWidget::EmbedLitePuppetWidget (
    this=0x7f80c02fd0, view=0x0)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
#1  0x0000007fbca92664 in mozilla::embedlite::EmbedLitePuppetWidget::
    CreateChild (aForceUseIWidgetParent=<optimized out>, 
    aInitData=0x7fa69cfde0, 
    aRect=..., this=0x7f80c09a90) at obj-build-mer-qt-xr/dist/include/mozilla/
    cxxalloc.h:33
#2  mozilla::embedlite::EmbedLitePuppetWidget::CreateChild (this=0x7f80c09a90, 
    aRect=..., aInitData=0x7fa69cfde0, aForceUseIWidgetParent=<optimized out>)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:69
#3  0x0000007fbbdb03e8 in nsView::CreateWidgetForParent (this=0x7f8045dd00, 
    aParentWidget=0x7f80c09a90, aWidgetInitData=0x7fa69cfde0, 
    aWidgetInitData@entry=0x0, aEnableDragDrop=true, 
    aResetVisibility=aResetVisibility@entry=false)
    at view/nsView.cpp:574
#4  0x0000007fbbf50d1c in nsDocumentViewer::MakeWindow (
    this=this@entry=0x7f8045da10, aSize=..., 
    aContainerView=aContainerView@entry=0x0)
    at layout/base/nsDocumentViewer.cpp:2353
#5  0x0000007fbbf92438 in nsDocumentViewer::InitInternal (this=0x7f8045da10, 
    aParentWidget=<optimized out>, aState=0x0, aActor=0x0, aBounds=..., 
    aDoCreation=<optimized out>, aNeedMakeCX=<optimized out>, 
    aForceSetNewDocument=<optimized out>)
    at obj-build-mer-qt-xr/dist/include/mozilla/gfx/BaseSize.h:34
#6  0x0000007fbc7c4d38 in nsDocShell::SetupNewViewer (
    this=this@entry=0x7f80c0b160, aNewViewer=aNewViewer@entry=0x7f8045da10, 
    aWindowActor=aWindowActor@entry=0x0) at obj-build-mer-qt-xr/dist/include/
    nsCOMPtr.h:847
#7  0x0000007fbc7ce75c in nsDocShell::Embed (this=this@entry=0x7f80c0b160, 
    aContentViewer=0x7f8045da10, aWindowActor=aWindowActor@entry=0x0)
    at docshell/base/nsDocShell.cpp:5441
#8  0x0000007fbc7cec44 in nsDocShell::CreateAboutBlankContentViewer (
    this=this@entry=0x7f80c0b160, aPrincipal=aPrincipal@entry=0x0, 
    aStoragePrincipal=aStoragePrincipal@entry=0x0, aCSP=<optimized out>, 
    aBaseURI=0x0, aTryToSaveOldPresentation=<optimized out>, 
    aTryToSaveOldPresentation@entry=true, 
    aCheckPermitUnload=aCheckPermitUnload@entry=true, aActor=aActor@entry=0x0)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:847
#9  0x0000007fbc7cf070 in nsDocShell::EnsureContentViewer (
    this=this@entry=0x7f80c0b160)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:847
#10 0x0000007fbc7cff94 in nsDocShell::EnsureContentViewer (this=0x7f80c0b160)
    at docshell/base/nsDocShell.cpp:6264
#11 nsDocShell::GetDocument (this=0x7f80c0b160) at docshell/base/nsDocShell.cpp:
    3041
#12 0x0000007fba95dcf8 in nsPIDOMWindowOuter::MaybeCreateDoc (this=<optimized 
    out>)
    at dom/base/nsGlobalWindowOuter.cpp:7594
#13 0x0000007fba95e17c in non-virtual thunk to nsGlobalWindowOuter::WrapObject(
    JSContext*, JS::Handle<JSObject*>) ()
    at obj-build-mer-qt-xr/dist/include/js/HeapAPI.h:677
#14 0x0000007fba325c60 in XPCConvert::NativeInterface2JSObject (
    cx=cx@entry=0x7f80225e50, d=d@entry=..., aHelper=..., 
    iid=iid@entry=0x7fa69d0710, 
    allowNativeWrapper=allowNativeWrapper@entry=true, pErr=pErr@entry=0x0)
    at obj-build-mer-qt-xr/dist/include/js/RootingAPI.h:596
#15 0x0000007fba326684 in XPCConvert::NativeData2JS (cx=cx@entry=0x7f80225e50, 
    d=d@entry=..., s=s@entry=0x7fa69d0858, type=..., 
    iid=iid@entry=0x7fa69d0710, arrlen=<optimized out>, pErr=pErr@entry=0x0)
    at js/xpconnect/src/XPCConvert.cpp:351
#16 0x0000007fba342b48 in nsXPCWrappedJS::CallMethod (this=<optimized out>, 
    methodIndex=<optimized out>, info=0x7fbdfc0f68 <xpt::detail::
    sMethods+8800>, 
    nativeParams=0x7fa69d0858) at obj-build-mer-qt-xr/dist/include/js/
    RootingAPI.h:1279
#17 0x0000007fb9c13da4 in PrepareAndDispatch (self=0x7f8042ebc0, 
    methodIndex=<optimized out>, args=<optimized out>, gprData=0x7fa69d0920, 
    fprData=0x7fa69d08e0) at xpcom/reflect/xptcall/md/unix/
    xptcstubs_aarch64.cpp:183
#18 0x0000007fb9c140f4 in SharedStub ()
    at xpcom/reflect/xptcall/md/unix/xptcstubs_asm_aarch64.s:38
#19 0x0000007fb9ba4bc0 in nsObserverList::NotifyObservers (this=<optimized 
    out>, aSubject=aSubject@entry=0x7f80bcbf70, 
    aTopic=aTopic@entry=0x7fbe6a3008 "embedliteviewcreated", 
    someData=someData@entry=0x0)
    at xpcom/ds/nsTArray.h:1182
#20 0x0000007fb9ba7ab4 in nsObserverService::NotifyObservers (
    this=0x7f800470e0, aSubject=0x7f80bcbf70, aTopic=0x7fbe6a3008 
    "embedliteviewcreated", 
    aSomeData=0x0) at xpcom/ds/nsObserverService.cpp:288
#21 0x0000007fbca93460 in mozilla::embedlite::EmbedLiteViewChild::
    InitGeckoWindow (this=0x7f80be5e80, parentId=<optimized out>, 
    parentBrowsingContext=<optimized out>, isPrivateWindow=<optimized out>, 
    isDesktopMode=false)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:847
#22 0x0000007fbca84b84 in mozilla::detail::RunnableMethodArguments<unsigned int 
    const, mozilla::dom::BrowsingContext*, bool const, bool const>::
    applyImpl<mozilla::embedlite::EmbedLiteViewChild, void (mozilla::embedlite::
    EmbedLiteViewChild::*)(unsigned int, mozilla::dom::BrowsingContext*, bool, 
    bool), StoreCopyPassByConstLRef<unsigned int const>, 
    StoreRefPtrPassByPtr<mozilla::dom::BrowsingContext>, 
    StoreCopyPassByConstLRef<bool const>, StoreCopyPassByConstLRef<bool const>, 
    0ul, 1ul, 2ul, 3ul> (args=..., m=<optimized out>, o=<optimized out>)
    at xpcom/threads/nsThreadUtils.h:990
(More stack frames follow...)
And here's the third:
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    EmbedLitePuppetWidget::EmbedLitePuppetWidget (this=0x7f81005400, view=0x0)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
47      EmbedLitePuppetWidget::EmbedLitePuppetWidget(EmbedLiteViewChildIface* 
    view)
(gdb) bt 18
#0  mozilla::embedlite::EmbedLitePuppetWidget::EmbedLitePuppetWidget (
    this=0x7f81005400, view=0x0)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
#1  0x0000007fbca92664 in mozilla::embedlite::EmbedLitePuppetWidget::
    CreateChild (aForceUseIWidgetParent=<optimized out>, 
    aInitData=0x7fa69d02a0, 
    aRect=..., this=0x7f80c09a90) at obj-build-mer-qt-xr/dist/include/mozilla/
    cxxalloc.h:33
#2  mozilla::embedlite::EmbedLitePuppetWidget::CreateChild (this=0x7f80c09a90, 
    aRect=..., aInitData=0x7fa69d02a0, aForceUseIWidgetParent=<optimized out>)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:69
#3  0x0000007fbbdb03e8 in nsView::CreateWidgetForParent (this=0x7f80ec1750, 
    aParentWidget=0x7f80c09a90, aWidgetInitData=0x7fa69d02a0, 
    aWidgetInitData@entry=0x0, aEnableDragDrop=true, 
    aResetVisibility=aResetVisibility@entry=false)
    at view/nsView.cpp:574
#4  0x0000007fbbf50d1c in nsDocumentViewer::MakeWindow (
    this=this@entry=0x7f80ee9300, aSize=..., 
    aContainerView=aContainerView@entry=0x0)
    at layout/base/nsDocumentViewer.cpp:2353
#5  0x0000007fbbf92438 in nsDocumentViewer::InitInternal (this=0x7f80ee9300, 
    aParentWidget=<optimized out>, aState=0x0, aActor=0x0, aBounds=..., 
    aDoCreation=<optimized out>, aNeedMakeCX=<optimized out>, 
    aForceSetNewDocument=<optimized out>)
    at obj-build-mer-qt-xr/dist/include/mozilla/gfx/BaseSize.h:34
#6  0x0000007fbc7c4d38 in nsDocShell::SetupNewViewer (
    this=this@entry=0x7f80c0b160, aNewViewer=aNewViewer@entry=0x7f80ee9300, 
    aWindowActor=aWindowActor@entry=0x0) at obj-build-mer-qt-xr/dist/include/
    nsCOMPtr.h:847
#7  0x0000007fbc7ce75c in nsDocShell::Embed (this=this@entry=0x7f80c0b160, 
    aContentViewer=0x7f80ee9300, aWindowActor=aWindowActor@entry=0x0)
    at docshell/base/nsDocShell.cpp:5441
#8  0x0000007fbc7de358 in nsDocShell::CreateContentViewer (this=0x7f80c0b160, 
    aContentType=..., aRequest=0x7f80e162a0, aContentHandler=<optimized out>)
    at docshell/base/nsDocShell.cpp:7662
#9  0x0000007fbc7dee80 in nsDSURIContentListener::DoContent (
    this=this@entry=0x7ea4109b30, aContentType=..., 
    aIsContentPreferred=aIsContentPreferred@entry=false, 
    aRequest=aRequest@entry=0x7f80e162a0, aContentHandler=0x7f80724250, 
    aAbortProcess=aAbortProcess@entry=0x7fa69d08a0)
    at docshell/base/nsDSURIContentListener.cpp:178
#10 0x0000007fba4b0350 in nsDocumentOpenInfo::TryContentListener (
    this=0x7f80724230, aListener=0x7ea4109b30, aChannel=0x7f80e162a0)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:1351
#11 0x0000007fba4b07c8 in nsDocumentOpenInfo::DispatchContent (
    this=this@entry=0x7f80724230, request=request@entry=0x7f80e162a0, 
    aCtxt=aCtxt@entry=0x0)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:847
#12 0x0000007fba4b1398 in nsDocumentOpenInfo::OnStartRequest (
    this=0x7f80724230, request=0x7f80e162a0)
    at uriloader/base/nsURILoader.cpp:190
#13 0x0000007fb9f26b84 in mozilla::net::DocumentLoadListener::<lambda(const 
    mozilla::net::DocumentLoadListener::OnStartRequestParams&)>::operator() (
    __closure=<optimized out>, __closure=<optimized out>, aParams=...)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:847
#14 mozilla::detail::VariantImplementation<unsigned char, 0, mozilla::net::
    DocumentLoadListener::OnStartRequestParams, mozilla::net::
    DocumentLoadListener::OnDataAvailableParams, mozilla::net::
    DocumentLoadListener::OnStopRequestParams, mozilla::net::
    DocumentLoadListener::OnAfterLastPartParams>::matchN<mozilla::
    Variant<mozilla::net::DocumentLoadListener::OnStartRequestParams, mozilla::
    net::DocumentLoadListener::OnDataAvailableParams, mozilla::net::
    DocumentLoadListener::OnStopRequestParams, mozilla::net::
    DocumentLoadListener::OnAfterLastPartParams>, mozilla::net::
    DocumentLoadListener::ResumeSuspendedChannel(nsIStreamListener*)::<lambda(
    const mozilla::net::DocumentLoadListener::OnStartRequestParams&)>, mozilla::
    net::DocumentLoadListener::ResumeSuspendedChannel(nsIStreamListener*)::
    <lambda(const mozilla::net::DocumentLoadListener::OnDataAvailableParams&)>, 
    mozilla::net::DocumentLoadListener::ResumeSuspendedChannel(
    nsIStreamListener*)::<lambda(const mozilla::net::DocumentLoadListener::
    OnStopRequestParams&)>, mozilla::net::DocumentLoadListener::
    ResumeSuspendedChannel(nsIStreamListener*)::<lambda(const mozilla::net::
    DocumentLoadListener::OnAfterLastPartParams&)> > (aMi=..., aV=...)
    at obj-build-mer-qt-xr/dist/include/mozilla/Variant.h:280
#15 mozilla::Variant<mozilla::net::DocumentLoadListener::OnStartRequestParams, 
    mozilla::net::DocumentLoadListener::OnDataAvailableParams, mozilla::net::
    DocumentLoadListener::OnStopRequestParams, mozilla::net::
    DocumentLoadListener::OnAfterLastPartParams>::match<mozilla::net::
    DocumentLoadListener::ResumeSuspendedChannel(nsIStreamListener*)::<lambda(
    const mozilla::net::DocumentLoadListener::OnStartRequestParams&)>, mozilla::
    net::DocumentLoadListener::ResumeSuspendedChannel(nsIStreamListener*)::
    <lambda(const mozilla::net::DocumentLoadListener::OnDataAvailableParams&)>, 
    mozilla::net::DocumentLoadListener::ResumeSuspendedChannel(
    nsIStreamListener*)::<lambda(const mozilla::net::DocumentLoadListener::
    OnStopRequestParams&)>, mozilla::net::DocumentLoadListener::
    ResumeSuspendedChannel(nsIStreamListener*)::<lambda(const mozilla::net::
    DocumentLoadListener::OnAfterLastPartParams&)> > (aM1=..., aM0=..., 
    this=0x7f80f22de8)
    at obj-build-mer-qt-xr/dist/include/mozilla/Variant.h:811
#16 mozilla::net::DocumentLoadListener::ResumeSuspendedChannel (
    this=0x7f80e31960, aListener=0x7f80724230)
    at netwerk/ipc/DocumentLoadListener.cpp:979
#17 0x0000007fb9f27144 in mozilla::net::ParentProcessDocumentChannel::
    OnRedirectVerifyCallback (this=0x7f807beaa0, aResult=nsresult::NS_OK)
    at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:847
(More stack frames follow...)
On ESR 78 there are only these three instances created, but which one is the one we're interested in? To find out I've placed a breakpoint on EmbedLitePuppetWidget::GetLayerManager() and will proceed to press the button on screen that's causing us the trouble.
(gdb) b EmbedLitePuppetWidget::GetLayerManager
Breakpoint 2 at 0x7fbca92838: EmbedLitePuppetWidget::GetLayerManager. (2 
    locations)
(gdb) c
Continuing.

Thread 10 "GeckoWorkerThre" hit Breakpoint 2, mozilla::embedlite::
    EmbedLitePuppetWidget::GetLayerManager (this=0x7f81005400, 
    aShadowManager=0x0, 
    aBackendHint=mozilla::layers::LayersBackend::LAYERS_NONE, 
    aPersistence=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:374
374       if (!mLayerManager) {
(gdb) p mLayerManager
$1 = {mRawPtr = 0x7f8088ffc0}
(gdb) 
What we're interested in here is the value of this, which is equal to 0x7f81005400 on this run. Comparing that with the values for this from the previous constructor calls, we can see that it's the third of the three instances of EmbedLitePuppetWidget that we're interested in:
#0  mozilla::embedlite::EmbedLitePuppetWidget::EmbedLitePuppetWidget (
    this=0x7f81005400, view=0x0)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
Notice also that the mLayerManager member is set to a valid pointer here. Let's run a similar test on ESR 91. There are five EmbedLitePuppetWidget instances being created, although I'm not sure why there are two more for ESR 91 than for ESR 78. I'm just keeping the backtrace for the last one because it turns out that's the one we're interested in.
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    EmbedLitePuppetWidget::EmbedLitePuppetWidget (this=0x7fb8ab8040, 
    view=0x7fb8a7fe78)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
47      EmbedLitePuppetWidget::EmbedLitePuppetWidget(EmbedLiteViewChildIface* 
    view)

Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    EmbedLitePuppetWidget::EmbedLitePuppetWidget (this=0x7fb8aaf500, view=0x0)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
47      EmbedLitePuppetWidget::EmbedLitePuppetWidget(EmbedLiteViewChildIface* 
    view)

Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    EmbedLitePuppetWidget::EmbedLitePuppetWidget (this=0x7fb8d88b70, view=0x0)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
47      EmbedLitePuppetWidget::EmbedLitePuppetWidget(EmbedLiteViewChildIface* 
    view)

Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    EmbedLitePuppetWidget::EmbedLitePuppetWidget (this=0x7fb8aaf500, view=0x0)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
47      EmbedLitePuppetWidget::EmbedLitePuppetWidget(EmbedLiteViewChildIface* 
    view)

Thread 10 "GeckoWorkerThre" hit Breakpoint 1, mozilla::embedlite::
    EmbedLitePuppetWidget::EmbedLitePuppetWidget (this=0x7fb86664a0, view=0x0)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
47      EmbedLitePuppetWidget::EmbedLitePuppetWidget(EmbedLiteViewChildIface* 
    view)
(gdb) bt
#0  mozilla::embedlite::EmbedLitePuppetWidget::EmbedLitePuppetWidget (
    this=0x7fb86664a0, view=0x0)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:47
#1  0x0000007ff4c4d828 in mozilla::embedlite::EmbedLitePuppetWidget::
    CreateChild (this=0x7fb8aaf500, aRect=..., aInitData=0x7fde7be0c0,
    aForceUseIWidgetParent=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#2  0x0000007ff3e3e5c0 in nsView::CreateWidgetForPopup (
    this=this@entry=0x7fb919a840, 
    aWidgetInitData=aWidgetInitData@entry=0x7fde7be0c0,
    aParentWidget=aParentWidget@entry=0x0, 
    aEnableDragDrop=aEnableDragDrop@entry=true, 
    aResetVisibility=aResetVisibility@entry=true)
    at view/nsView.cpp:615
#3  0x0000007ff41a1718 in nsComboboxControlFrame::ShowList (
    this=this@entry=0x7fb9174118, aShowList=aShowList@entry=true)
    at layout/forms/nsComboboxControlFrame.cpp:335
#4  0x0000007ff41a180c in nsComboboxControlFrame::ShowDropDown (
    this=this@entry=0x7fb9174118, aDoDropDown=aDoDropDown@entry=true)
    at layout/forms/nsComboboxControlFrame.cpp:903
#5  0x0000007ff41a4018 in nsComboboxControlFrame::SetFocus (this=0x7fb9174118, 
    aOn=<optimized out>, aRepaint=<optimized out>)
    at layout/forms/nsComboboxControlFrame.cpp:266
#6  0x0000007ff35109f8 in nsGenericHTMLFormElement::PreHandleEvent (
    this=0x7fb8ccf370, aVisitor=...)
    at dom/html/nsGenericHTMLElement.cpp:1922
#7  0x0000007ff33c437c in mozilla::EventTargetChainItem::PreHandleEvent (
    this=0x7fb8c98e48, aVisitor=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:869
#8  0x0000007ff33dc394 in mozilla::EventDispatcher::Dispatch (
    aTarget=<optimized out>, aPresContext=0x7fb9112c60, 
    aEvent=aEvent@entry=0x7fde7be4c0,
    aDOMEvent=aDOMEvent@entry=0x0, aEventStatus=aEventStatus@entry=0x0, 
    aCallback=aCallback@entry=0x0, aTargets=aTargets@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsTArray.h:413
#9  0x0000007ff28ab1f8 in FocusBlurEvent::Run (this=0x7fb8238890)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:289
#10 0x0000007ff26f71cc in nsContentUtils::AddScriptRunner (aRunnable=..., 
    aRunnable@entry=...)
    at dom/base/nsContentUtils.cpp:5763
#11 0x0000007ff26f725c in nsContentUtils::AddScriptRunner (
    aRunnable=aRunnable@entry=0x7fb8238890)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/AlreadyAddRefed.h:48
#12 0x0000007ff28c6700 in nsFocusManager::FireFocusOrBlurEvent (
    this=this@entry=0x7fb80eeae0, aEventMessage=aEventMessage@entry=mozilla::
    eFocus,
    aPresShell=aPresShell@entry=0x7fb9168750, 
    aTarget=aTarget@entry=0x7fb8ccf370, aWindowRaised=aWindowRaised@entry=false,
    aIsRefocus=aIsRefocus@entry=false, aRelatedTarget=aRelatedTarget@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:435
#13 0x0000007ff28c6b14 in nsFocusManager::SendFocusOrBlurEvent (
    this=this@entry=0x7fb80eeae0, aEventMessage=aEventMessage@entry=mozilla::
    eFocus,
    aPresShell=aPresShell@entry=0x7fb9168750, aDocument=0x7fb8e62730, 
    aTarget=aTarget@entry=0x7fb8ccf370, aWindowRaised=aWindowRaised@entry=false,
    aIsRefocus=aIsRefocus@entry=false, aRelatedTarget=aRelatedTarget@entry=0x0)
    at dom/base/nsFocusManager.cpp:2782
#14 0x0000007ff28c95b8 in nsFocusManager::Focus (this=this@entry=0x7fb80eeae0, 
    aWindow=0x7fb8571990, aElement=aElement@entry=0x7fb8ccf370,
    aFlags=aFlags@entry=2101250, aIsNewDocument=<optimized out>, 
    aIsNewDocument@entry=false, aFocusChanged=aFocusChanged@entry=true,
    aWindowRaised=aWindowRaised@entry=false, 
    aAdjustWidget=aAdjustWidget@entry=true, aActionId=5, 
    aBlurredElementInfo=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/Maybe.h:443
#15 0x0000007ff28d3ce8 in nsFocusManager::SetFocusInner (
    this=this@entry=0x7fb80eeae0, aNewContent=aNewContent@entry=0x7fb8ccf370,
    aFlags=aFlags@entry=2101250, aFocusChanged=aFocusChanged@entry=true, 
    aAdjustWidget=aAdjustWidget@entry=true, aActionId=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#16 0x0000007ff28d403c in nsFocusManager::SetFocus (
    this=this@entry=0x7fb80eeae0, aElement=0x7fb8ccf370, aFlags=2101250)
    at dom/base/nsFocusManager.cpp:492
#17 0x0000007ff339a6a8 in mozilla::EventStateManager::PostHandleEvent (
    this=this@entry=0x7fb856e100, aPresContext=aPresContext@entry=0x7fb9112c60,
    aEvent=aEvent@entry=0x7fde7bed88, aTargetFrame=0x0, 
    aStatus=aStatus@entry=0x7fde7bed6c, 
    aOverrideClickTarget=aOverrideClickTarget@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/dom/Element.h:2057
#18 0x0000007ff401e808 in mozilla::PresShell::EventHandler::DispatchEvent (
    this=this@entry=0x7fde7bebd8,
    aEventStateManager=aEventStateManager@entry=0x7fb856e100, 
    aEvent=aEvent@entry=0x7fde7bed88, aTouchIsNew=false,
    aEventStatus=aEventStatus@entry=0x7fde7bed6c, 
    aOverrideClickTarget=aOverrideClickTarget@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:289
#19 0x0000007ff401f588 in mozilla::PresShell::EventHandler::
    HandleEventWithCurrentEventInfo (this=this@entry=0x7fde7bebd8,
    aEvent=aEvent@entry=0x7fde7bed88, 
    aEventStatus=aEventStatus@entry=0x7fde7bed6c, 
    aIsHandlingNativeEvent=aIsHandlingNativeEvent@entry=true,
    aOverrideClickTarget=0x0)
    at layout/base/PresShell.cpp:8177
#20 0x0000007ff4023dbc in mozilla::PresShell::EventHandler::
    HandleEventUsingCoordinates (this=this@entry=0x7fde7beca8,
    aFrameForPresShell=aFrameForPresShell@entry=0x7fb9172e70, 
    aGUIEvent=aGUIEvent@entry=0x7fde7bed88, 
    aEventStatus=aEventStatus@entry=0x7fde7bed6c,
    aDontRetargetEvents=aDontRetargetEvents@entry=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859
#21 0x0000007ff4023fa0 in mozilla::PresShell::EventHandler::HandleEvent (
    this=this@entry=0x7fde7beca8,
    aFrameForPresShell=aFrameForPresShell@entry=0x7fb9172e70, 
    aGUIEvent=aGUIEvent@entry=0x7fde7bed88,
    aDontRetargetEvents=aDontRetargetEvents@entry=false, 
    aEventStatus=aEventStatus@entry=0x7fde7bed6c)
    at layout/base/PresShell.cpp:6898
#22 0x0000007ff40240ec in mozilla::PresShell::HandleEvent (this=0x7fb9168750, 
    aFrameForPresShell=0x7fb9172e70, aGUIEvent=aGUIEvent@entry=0x7fde7bed88,
    aDontRetargetEvents=aDontRetargetEvents@entry=false, 
    aEventStatus=aEventStatus@entry=0x7fde7bed6c)
    at layout/base/PresShell.cpp:6841
#23 0x0000007ff270373c in nsContentUtils::SendMouseEvent (
    aPresShell=aPresShell@entry=0x7fb9168750, aType=..., aX=aX@entry=105.25927,
    aY=aY@entry=132.481491, aButton=aButton@entry=0, 
    aButtons=aButtons@entry=-1, aClickCount=aClickCount@entry=1, 
    aModifiers=aModifiers@entry=0,
    aIgnoreRootScrollFrame=aIgnoreRootScrollFrame@entry=false, 
    aPressure=aPressure@entry=0, aInputSourceArg=aInputSourceArg@entry=5,
    aIdentifier=aIdentifier@entry=0, aToWindow=aToWindow@entry=true, 
    aPreventDefault=aPreventDefault@entry=0x7fde7bef57,
    aIsDOMEventSynthesized=<optimized out>, aIsDOMEventSynthesized@entry=true, 
    aIsWidgetEventSynthesized=aIsWidgetEventSynthesized@entry=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsView.h:268
#24 0x0000007ff2716980 in nsDOMWindowUtils::SendMouseEventCommon (
    this=this@entry=0x7ef0000ff0, aType=..., aX=aX@entry=105.25927, 
    aY=aY@entry=132.481491,
    aButton=aButton@entry=0, aClickCount=aClickCount@entry=1, 
    aModifiers=aModifiers@entry=0, 
    aIgnoreRootScrollFrame=aIgnoreRootScrollFrame@entry=false,
    aPressure=aPressure@entry=0, aInputSourceArg=aInputSourceArg@entry=5, 
    aPointerId=aPointerId@entry=0, aToWindow=aToWindow@entry=true,
    aPreventDefault=aPreventDefault@entry=0x0, 
    aIsDOMEventSynthesized=aIsDOMEventSynthesized@entry=true,
    aIsWidgetEventSynthesized=aIsWidgetEventSynthesized@entry=false, 
    aButtons=aButtons@entry=-1)
    at dom/base/nsDOMWindowUtils.cpp:732
#25 0x0000007ff2716ca0 in nsDOMWindowUtils::SendMouseEventToWindow (
    this=0x7ef0000ff0, aType=..., aX=105.25927, aY=132.481491, aButton=0, 
    aClickCount=1,
    aModifiers=0, aIgnoreRootScrollFrame=false, aPressure=0, aInputSourceArg=5, 
    aIsDOMEventSynthesized=<optimized out>, aIsWidgetEventSynthesized=false,
    aButtons=0, aIdentifier=<optimized out>, aOptionalArgCount=3 '\003')
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ProfilerLabels.h:249
#26 0x0000007ff19ca220 in _NS_InvokeByIndex ()
    at xpcom/reflect/xptcall/md/unix/xptcinvoke_asm_aarch64.S:74
[...]
#47 0x0000007ff4c543b4 in mozilla::embedlite::EmbedLiteViewChild::
    RecvHandleSingleTap (this=0x7fb8a7fe40, aPoint=..., 
    aModifiers=@0x7fde7c0594: 0,
    aGuid=..., aInputBlockId=@0x7fde7c05a0: 1)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#48 0x0000007ff1f06b6c in mozilla::embedlite::PEmbedLiteViewChild::
    OnMessageReceived (this=0x7fb8a7fe40, msg__=...) at PEmbedLiteViewChild.cpp:
    1718
#49 0x0000007ff1ef3844 in mozilla::embedlite::PEmbedLiteAppChild::
    OnMessageReceived (this=<optimized out>, msg__=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:
    675
#50 0x0000007ff1ddfb20 in mozilla::ipc::MessageChannel::DispatchAsyncMessage (
    this=this@entry=0x7fb8b234f8, aProxy=aProxy@entry=0x7fb83a8700, aMsg=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:
    675
#51 0x0000007ff1dee59c in mozilla::ipc::MessageChannel::DispatchMessage (
    this=this@entry=0x7fb8b234f8, aMsg=...)
    at ipc/glue/MessageChannel.cpp:2001
#52 0x0000007ff1def9f4 in mozilla::ipc::MessageChannel::RunMessage (
    this=0x7fb8b234f8, aTask=...)
    at ipc/glue/MessageChannel.cpp:1860
[...]
#76 0x0000007fef54889c in ?? () from /lib64/libc.so.6
From here I've placed a breakpoint on the PresShell::Paint() method so that we can catch the call to GetLayerManager() before the crash happens. Then I've pressed the button, waited for the breakpoint to hit, then switched to a breakpoint on the GetLayerManager() method. Here's what this all gives us:
Thread 10 "GeckoWorkerThre" hit Breakpoint 2, mozilla::embedlite::
    EmbedLitePuppetWidget::GetLayerManager (this=0x7fb86664a0, 
    aShadowManager=0x0,
    aBackendHint=mozilla::layers::LayersBackend::LAYERS_NONE, 
    aPersistence=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:371
371     {
(gdb) n
372       if (!mLayerManager) {
(gdb) p mLayerManager
$15 = {mRawPtr = 0x0}
There, as you can see, is the problematic null pointer of mLayerManager. And here's the backtrace for this call, just before the browser crashes:
(gdb) bt             
#0  mozilla::embedlite::EmbedLitePuppetWidget::GetLayerManager (
    this=0x7fb86664a0, aShadowManager=0x0, 
    aBackendHint=mozilla::layers::LayersBackend::LAYERS_NONE, 
    aPersistence=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:372
#1  0x0000007ff4c57cec in nsIWidget::GetLayerManager (this=0x7fb86664a0)
    at widget/nsIWidget.h:1303
#2  mozilla::embedlite::PuppetWidgetBase::Invalidate (this=0x7fb86664a0, 
    aRect=...)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:261
#3  0x0000007ff4c56e70 in mozilla::embedlite::PuppetWidgetBase::Resize (
    this=0x7fb86664a0, aWidth=1.1986628770828247, aHeight=1.8786317110061646, 
    aRepaint=<optimized out>)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:188
#4  0x0000007ff3e4a89c in nsBaseWidget::ResizeClient (this=0x7fb86664a0, 
    aSize=..., aRepaint=true)
    at widget/nsBaseWidget.cpp:1601
[...]
#17 0x0000007ff270373c in nsContentUtils::SendMouseEvent (
    aPresShell=aPresShell@entry=0x7fb9168750, aType=..., aX=aX@entry=105.25927, 
    aY=aY@entry=132.481491, aButton=aButton@entry=0, 
    aButtons=aButtons@entry=-1, aClickCount=aClickCount@entry=1, 
    aModifiers=aModifiers@entry=0, 
    aIgnoreRootScrollFrame=aIgnoreRootScrollFrame@entry=false, 
    aPressure=aPressure@entry=0, aInputSourceArg=aInputSourceArg@entry=5, 
    aIdentifier=aIdentifier@entry=0, aToWindow=aToWindow@entry=true, 
    aPreventDefault=aPreventDefault@entry=0x7fde7bef57, 
    aIsDOMEventSynthesized=<optimized out>, aIsDOMEventSynthesized@entry=true, 
    aIsWidgetEventSynthesized=aIsWidgetEventSynthesized@entry=false)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsView.h:268
#18 0x0000007ff2716980 in nsDOMWindowUtils::SendMouseEventCommon (
    this=this@entry=0x7ef0000ff0, aType=..., aX=aX@entry=105.25927, 
    aY=aY@entry=132.481491, 
    aButton=aButton@entry=0, aClickCount=aClickCount@entry=1, 
    aModifiers=aModifiers@entry=0, 
    aIgnoreRootScrollFrame=aIgnoreRootScrollFrame@entry=false, 
    aPressure=aPressure@entry=0, aInputSourceArg=aInputSourceArg@entry=5, 
    aPointerId=aPointerId@entry=0, aToWindow=aToWindow@entry=true, 
    aPreventDefault=aPreventDefault@entry=0x0, 
    aIsDOMEventSynthesized=aIsDOMEventSynthesized@entry=true, 
    aIsWidgetEventSynthesized=aIsWidgetEventSynthesized@entry=false, 
    aButtons=aButtons@entry=-1)
    at dom/base/nsDOMWindowUtils.cpp:732
#19 0x0000007ff2716ca0 in nsDOMWindowUtils::SendMouseEventToWindow (
    this=0x7ef0000ff0, aType=..., aX=105.25927, aY=132.481491, aButton=0, 
    aClickCount=1, 
    aModifiers=0, aIgnoreRootScrollFrame=false, aPressure=0, aInputSourceArg=5, 
    aIsDOMEventSynthesized=<optimized out>, aIsWidgetEventSynthesized=false, 
    aButtons=0, aIdentifier=<optimized out>, aOptionalArgCount=3 '\003')
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ProfilerLabels.h:249
#20 0x0000007ff19ca220 in _NS_InvokeByIndex ()
    at xpcom/reflect/xptcall/md/unix/xptcinvoke_asm_aarch64.S:74
#21 0x0000007ff19ca5c8 in NS_InvokeByIndex (that=<optimized out>, 
    methodIndex=<optimized out>, paramCount=<optimized out>, params=<optimized 
    out>)
    at xpcom/reflect/xptcall/md/unix/xptcinvoke_aarch64.cpp:167
#22 0x0000007ff21a8584 in CallMethodHelper::Invoke (this=0x7fde7bf1e8)
    at js/xpconnect/src/XPCWrappedNative.cpp:1644
[...]
#41 0x0000007ff4c543b4 in mozilla::embedlite::EmbedLiteViewChild::
    RecvHandleSingleTap (this=0x7fb8a7fe40, aPoint=..., 
    aModifiers=@0x7fde7c0594: 0, 
    aGuid=..., aInputBlockId=@0x7fde7c05a0: 1)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#42 0x0000007ff1f06b6c in mozilla::embedlite::PEmbedLiteViewChild::
    OnMessageReceived (this=0x7fb8a7fe40, msg__=...) at PEmbedLiteViewChild.cpp:
    1718
#43 0x0000007ff1ef3844 in mozilla::embedlite::PEmbedLiteAppChild::
    OnMessageReceived (this=<optimized out>, msg__=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:
    675
#44 0x0000007ff1ddfb20 in mozilla::ipc::MessageChannel::DispatchAsyncMessage (
    this=this@entry=0x7fb8b234f8, aProxy=aProxy@entry=0x7fb83a8700, aMsg=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/ipc/ProtocolUtils.h:
    675
#45 0x0000007ff1dee59c in mozilla::ipc::MessageChannel::DispatchMessage (
    this=this@entry=0x7fb8b234f8, aMsg=...)
    at ipc/glue/MessageChannel.cpp:2001
#46 0x0000007ff1def9f4 in mozilla::ipc::MessageChannel::RunMessage (
    this=0x7fb8b234f8, aTask=...)
    at ipc/glue/MessageChannel.cpp:1860
[...]
#70 0x0000007fef54889c in ?? () from /lib64/libc.so.6
(gdb) 
By comparing the value of this at the head of the backtrace with the constructors from earlier, we can see that it's the last EmbedLitePuppetWidget instance to be created that's causing the problem, and that this is the one with mLayerManager set to null.

Step one is completed. Step two is to add a breakpoint to the memory location of mLayerManager associated with the third instance of EmbedLitePuppetWidget on ESR 78 and the fifth instance on ESR 91. Confused? I know I am. But I'm going to try to keep a clear head. My theory is that the ESR 78 breakpoint will hit, but the ESR 91 breakpoint won't.

Unfortunately, pulling together all of these breakpoints is time consuming and I've hit the limit of what I can spend on this today, so I'll have to pick this up again in the morning. This is taking longer than expected, but we're inching forwards. I think this one is going to be solvable.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
1 Aug 2024 : Day 306 #
I expected the crash that happens when the user presses on a selection list to be an easy one to fix. It's turning out to be far harder than I'd anticipated. The problem is simply that the mLayerManager member of the top level nsIWidget, which is of type EmbedLitePuppetWidget, is set to null. On ESR 78 it's correctly set to a value, but on ESR 91 that value is missing.

I've put together a sequence of steps for getting to the correct point in the code. It's a little intricate, but it does the trick. The steps are:
  1. Add breakpoints to PresShell::Paint() and PuppetWidgetBase::GetLayerManager().
  2. Disable the breakpoints.
  3. Run the browser and open the test page.
  4. Drop to the debugger and enable the Paint() breakpoint.
  5. Continue execution.
  6. Press on the problem Select widget on the web page.
  7. Wait for the breakpoint to hit.
  8. Disable the Paint() breakpoint and enable the GetLayerManager() breakpoint.
  9. Continue execution and wait for the breakpoint to hit.
At this point the debugger will be at the start of the method where the error is about to hit. Here's a run-through of me performing these steps in practice (with some annotations added):
(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) break PresShell::Paint
Breakpoint 2 at 0x7ff40086f8: file layout/base/PresShell.cpp, line 6229.
(gdb) break PuppetWidgetBase::GetLayerManager
Breakpoint 3 at 0x7ff4c58000: file mobile/sailfishos/embedshared/
    PuppetWidgetBase.cpp, line 419.
(gdb) disable break
(gdb) c
Continuing.
[...]
# Wait for page to load: https://browser.sailfishos.org/tests/testselect.html
^C
Thread 1 "sailfish-browse" received signal SIGINT, Interrupt.
0x0000007fef53e740 in poll () from /lib64/libc.so.6
(gdb) enable break 2
(gdb) c
Continuing.

# Press the selection list button

Thread 10 "GeckoWorkerThre" hit Breakpoint 2, mozilla::PresShell::
    Paint (this=this@entry=0x7fb90f3970, 
    aViewToPaint=aViewToPaint@entry=0x7fb912ae80, 
    aDirtyRegion=..., aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at layout/base/PresShell.cpp:6229
6229                          PaintFlags aFlags) {
(gdb) disable break 2
(gdb) enable break 3
(gdb) c
Continuing.

Thread 10 "GeckoWorkerThre" hit Breakpoint 3, mozilla::embedlite::
    PuppetWidgetBase::GetLayerManager (this=this@entry=0x7fb8c99570, 
    aShadowManager=aShadowManager@entry=0x0, 
    aBackendHint=aBackendHint@entry=mozilla::layers::LayersBackend::
    LAYERS_NONE, 
    aPersistence=aPersistence@entry=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:419
419     {
(gdb) 
Using these steps we can now more easily see the difference between ESR 78 and ESR 91. On ESR 78 we have this:
Thread 10 "GeckoWorkerThre" hit Breakpoint 12, mozilla::embedlite::
    PuppetWidgetBase::GetLayerManager (this=this@entry=0x7f81216bb0, 
    aShadowManager=aShadowManager@entry=0x0, 
    aBackendHint=aBackendHint@entry=mozilla::layers::LayersBackend::
    LAYERS_NONE, 
    aPersistence=aPersistence@entry=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:420
420       if (Destroyed()) {
(gdb) bt 9
#0  mozilla::embedlite::PuppetWidgetBase::GetLayerManager (
    this=this@entry=0x7f81216bb0, aShadowManager=aShadowManager@entry=0x0, 
    aBackendHint=aBackendHint@entry=mozilla::layers::LayersBackend::
    LAYERS_NONE, aPersistence=aPersistence@entry=nsIWidget::
    LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:424
#1  0x0000007fbca92868 in mozilla::embedlite::EmbedLitePuppetWidget::
    GetLayerManager (this=0x7f81216bb0, aShadowManager=0x0, 
    aBackendHint=mozilla::layers::LayersBackend::LAYERS_NONE, 
    aPersistence=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:381
#2  0x0000007fbbf3ce10 in nsIWidget::GetLayerManager (this=<optimized out>)
    at obj-build-mer-qt-xr/dist/include/nsIWidget.h:1266
#3  mozilla::PresShell::Paint (this=this@entry=0x7f81266db0, 
    aViewToPaint=aViewToPaint@entry=0x7f80ecb310, aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at layout/base/PresShell.cpp:6134
#4  0x0000007fbbdb40ac in nsViewManager::ProcessPendingUpdatesPaint (
    this=this@entry=0x7f80714980, aWidget=aWidget@entry=0x7f81216bb0)
    at obj-build-mer-qt-xr/dist/include/nsTArray.h:554
#5  0x0000007fbbdb43c0 in nsViewManager::ProcessPendingUpdatesForView (
    this=this@entry=0x7f80714980, aView=<optimized out>, 
    aFlushDirtyRegion=aFlushDirtyRegion@entry=true) at view/nsViewManager.cpp:
    395
#6  0x0000007fbbdb4b50 in nsViewManager::ProcessPendingUpdates (
    this=0x7f80714980)
    at view/nsViewManager.cpp:1018
#7  nsViewManager::ProcessPendingUpdates (this=this@entry=0x7f80714980)
    at view/nsViewManager.cpp:1004
#8  0x0000007fbbf13e4c in nsRefreshDriver::Tick (this=0x7f812166c0, aId=..., 
    aId@entry=..., aNowTime=aNowTime@entry=...)
    at layout/base/nsRefreshDriver.cpp:2201
(More stack frames follow...)
(gdb) n
530       bool Destroyed() const { return mOnDestroyCalled; }
(gdb) n
424       if (mLayerManager) {
(gdb) p mLayerManager
$22 = {mRawPtr = 0x7f80b6c6a0}
I've included the backtrace in the output here for both versions just to check that we really are in the same place. Here's the same on ESR 91:
Thread 10 "GeckoWorkerThre" hit Breakpoint 3, mozilla::embedlite::
    PuppetWidgetBase::GetLayerManager (this=this@entry=0x7fb8c99570, 
    aShadowManager=aShadowManager@entry=0x0, 
    aBackendHint=aBackendHint@entry=mozilla::layers::LayersBackend::
    LAYERS_NONE, 
    aPersistence=aPersistence@entry=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:419
419     {
(gdb) bt 8
#0  mozilla::embedlite::PuppetWidgetBase::GetLayerManager (
    this=this@entry=0x7fb8c99570, aShadowManager=aShadowManager@entry=0x0, 
    aBackendHint=aBackendHint@entry=mozilla::layers::LayersBackend::
    LAYERS_NONE, aPersistence=aPersistence@entry=nsIWidget::
    LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:424
#1  0x0000007ff4c52d40 in mozilla::embedlite::EmbedLitePuppetWidget::
    GetLayerManager (this=0x7fb8c99570, aShadowManager=0x0, 
    aBackendHint=mozilla::layers::LayersBackend::LAYERS_NONE, 
    aPersistence=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:379
#2  0x0000007ff4008a24 in nsIWidget::GetLayerManager (this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsIWidget.h:1303
#3  mozilla::PresShell::Paint (this=this@entry=0x7fb90f3970, 
    aViewToPaint=aViewToPaint@entry=0x7fb912ae80, aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at layout/base/PresShell.cpp:6274
#4  0x0000007ff3e41068 in nsViewManager::ProcessPendingUpdatesPaint (
    this=this@entry=0x7fb90cdd70, aWidget=aWidget@entry=0x7fb8c99570)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/RectAbsolute.h:43
#5  0x0000007ff3e4141c in nsViewManager::ProcessPendingUpdatesForView (
    this=this@entry=0x7fb90cdd70, aView=<optimized out>, 
    aFlushDirtyRegion=aFlushDirtyRegion@entry=true)
    at view/nsViewManager.cpp:394
#6  0x0000007ff3e41a0c in nsViewManager::ProcessPendingUpdates (
    this=this@entry=0x7fb90cdd70)
    at view/nsViewManager.cpp:972
#7  0x0000007ff3fe94bc in nsRefreshDriver::Tick (this=0x7fb90cd7a0, aId=..., 
    aId@entry=..., aNowTime=aNowTime@entry=..., 
    aIsExtraTick=aIsExtraTick@entry=nsRefreshDriver::IsExtraTick::No)
    at layout/base/nsRefreshDriver.cpp:2477
(More stack frames follow...)
(gdb) n
564       bool Destroyed() const { return mOnDestroyCalled; }
(gdb) n
424       if (mLayerManager) {
(gdb) p mLayerManager
$1 = {mRawPtr = 0x0}
(gdb) 
As we can see from this, in the ESR 78 case we have a valid value for mLayerManager whereas on ESR 91 it's set to null. This leads to a different flow coming out of PuppetWidgetBase::GetLayerManager(). First on ESR 78:
(gdb) disable break
(gdb) finish
Run till exit from #0  mozilla::embedlite::PuppetWidgetBase::GetLayerManager (
    this=this@entry=0x7f81216bb0, aShadowManager=aShadowManager@entry=0x0, 
    aBackendHint=aBackendHint@entry=mozilla::layers::LayersBackend::
    LAYERS_NONE, aPersistence=aPersistence@entry=nsIWidget::
    LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:424
mozilla::embedlite::EmbedLitePuppetWidget::GetLayerManager (this=0x7f81216bb0, 
    aShadowManager=0x0, 
    aBackendHint=mozilla::layers::LayersBackend::LAYERS_NONE, 
    aPersistence=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:382
382       if (lm) {
Value returned is $23 = (mozilla::layers::LayerManager *) 0x7f80b6c6a0
(gdb) p lm
$24 = (nsIWidget::LayerManager *) 0x7f80b6c6a0
This is as things should be. But on ESR 91 the null value for the layer manager results in additional code being executed. Because this is the top level widget no attempt can be made to extract the layer manager from a parent widget and so the method returns null:
(gdb) disable break
(gdb) finish
Run till exit from #0  mozilla::embedlite::PuppetWidgetBase::GetLayerManager (
    this=this@entry=0x7fb8c99570, aShadowManager=aShadowManager@entry=0x0, 
    aBackendHint=aBackendHint@entry=mozilla::layers::LayersBackend::
    LAYERS_NONE, aPersistence=aPersistence@entry=nsIWidget::
    LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:424
[New LWP 22078]
0x0000007ff4c52d40 in mozilla::embedlite::EmbedLitePuppetWidget::
    GetLayerManager (this=0x7fb8c99570, aShadowManager=0x0, 
    aBackendHint=mozilla::layers::LayersBackend::LAYERS_NONE, 
    aPersistence=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/EmbedLitePuppetWidget.cpp:379
379       LayerManager *lm = PuppetWidgetBase::GetLayerManager(aShadowManager, 
    aBackendHint, aPersistence);
Value returned is $2 = (mozilla::layers::LayerManager *) 0x0
(gdb) n
380       if (lm) {
(gdb) p lm
$3 = (nsIWidget::LayerManager *) 0x0
(gdb) n
385       if (EmbedLiteApp::GetInstance()->GetType() == EmbedLiteApp::
    EMBED_INVALID) {
(gdb) n
396       nsIWidget* topWidget = GetTopLevelWidget();
(gdb) n
397       if (topWidget && topWidget != this) {
(gdb) p topWidget
$4 = (nsIWidget *) 0x7fb8c99570
(gdb) p this
$5 = (mozilla::embedlite::EmbedLitePuppetWidget * const) 0x7fb8c99570
(gdb) 
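The flow traced above can be modelled in a few lines. This is only a minimal standalone sketch, not the real Gecko code: the PuppetWidget class, its parent pointer and SetLayerManager() are invented for illustration, and I'm assuming that the topWidget != this branch hands the request over to the top level widget.

```cpp
#include <cassert>

// Stand-in for mozilla::layers::LayerManager; purely illustrative.
struct LayerManager {};

// Simplified, hypothetical model of a puppet widget. The method names mirror
// the ones in the trace, but this is not the real Gecko class.
class PuppetWidget {
public:
  explicit PuppetWidget(PuppetWidget* aParent = nullptr) : mParent(aParent) {}

  void SetLayerManager(LayerManager* aManager) { mLayerManager = aManager; }

  // Walk up the parent chain until we reach the top level widget.
  PuppetWidget* GetTopLevelWidget() {
    PuppetWidget* widget = this;
    while (widget->mParent) {
      widget = widget->mParent;
    }
    return widget;
  }

  LayerManager* GetLayerManager() {
    // Fast path: the widget already has a layer manager (the ESR 78 case).
    if (mLayerManager) {
      return mLayerManager;
    }
    // Fallback (assumed): ask the top level widget, but only if that's a
    // different widget from this one.
    PuppetWidget* topWidget = GetTopLevelWidget();
    assert(topWidget);
    if (topWidget != this) {
      return topWidget->GetLayerManager();
    }
    // Top level widget with nothing set (the ESR 91 case): return null.
    return nullptr;
  }

private:
  PuppetWidget* mParent;
  LayerManager* mLayerManager = nullptr;
};
```

In this model a top level widget with no layer manager set has nowhere left to look, which matches the null value seen on ESR 91, whereas a child widget can still fall back on its top level widget.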
The obvious question arises: "where is the layer manager actually created?". At this point I'm not really sure, but most of the opportunities to actually create the layer manager appear to happen inside nsBaseWidget. By placing breakpoints on all of the locations that trigger the creation of a layer manager, we should be able to find out which is the one creating it and where it's happening.

So here's the outcome of this test on ESR 78:
Thread 10 "GeckoWorkerThre" hit Breakpoint 14, nsBaseWidget::
    CreateCompositor (this=0x7f80b6b4b0, aWidth=1080, aHeight=2520)
    at widget/nsBaseWidget.cpp:1260
1260    void nsBaseWidget::CreateCompositor(int aWidth, int aHeight) {
(gdb) bt 10
#0  nsBaseWidget::CreateCompositor (this=0x7f80b6b4b0, aWidth=1080, 
    aHeight=2520)
    at widget/nsBaseWidget.cpp:1260
#1  0x0000007fbca9a1fc in mozilla::embedlite::nsWindow::GetLayerManager (
    this=0x7f80b6b4b0, aShadowManager=0x0, aBackendHint=<optimized out>, 
    aPersistence=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/nsWindow.cpp:225
#2  0x0000007fbca99318 in nsIWidget::GetLayerManager (this=0x7f80b6b4b0)
    at widget/nsIWidget.h:1266
#3  mozilla::embedlite::PuppetWidgetBase::Invalidate (aRect=..., 
    this=0x7f80b6b4b0)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:261
#4  mozilla::embedlite::PuppetWidgetBase::Invalidate (this=0x7f80b6b4b0, 
    aRect=...)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:253
#5  0x0000007fbca9c1b4 in mozilla::embedlite::PuppetWidgetBase::UpdateBounds (
    this=0x7f80b6b4b0, aRepaint=<optimized out>)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:395
#6  0x0000007fbcaa5950 in mozilla::embedlite::EmbedLiteWindowChild::
    CreateWidget (this=0x7f80b73670)
    at mobile/sailfishos/embedshared/EmbedLiteWindowChild.cpp:198
#7  0x0000007fbca95474 in mozilla::detail::RunnableMethodArguments<>::
    applyImpl<mozilla::embedlite::EmbedLiteWindowChild, void (mozilla::
    embedlite::EmbedLiteWindowChild::*)()>(mozilla::embedlite::
    EmbedLiteWindowChild*, void (mozilla::embedlite::EmbedLiteWindowChild::*)(
    ), mozilla::Tuple<>&, std::integer_sequence<unsigned long>) (args=..., 
    m=<optimized out>, o=<optimized out>)
    at xpcom/threads/nsThreadUtils.h:1188
#8  mozilla::detail::RunnableMethodArguments<>::apply<mozilla::embedlite::
    EmbedLiteWindowChild, void (mozilla::embedlite::EmbedLiteWindowChild::*)()> 
    (
    this=<optimized out>, m=<optimized out>, o=<optimized out>)
    at xpcom/threads/nsThreadUtils.h:1191
#9  mozilla::detail::RunnableMethodImpl<mozilla::embedlite::
    EmbedLiteWindowChild*, void (mozilla::embedlite::EmbedLiteWindowChild::*)(
    ), true, (mozilla::RunnableKind)1>::Run (this=<optimized out>) at xpcom/
    threads/nsThreadUtils.h:1237
(More stack frames follow...)
As we can see, the creation happens in nsBaseWidget::CreateCompositor(). Interestingly on ESR 91 it's also being created in roughly the same place:
Thread 10 "GeckoWorkerThre" hit Breakpoint 2, nsBaseWidget::
    CreateCompositor (this=this@entry=0x7fb8a426a0, aWidth=aWidth@entry=1080, 
    aHeight=aHeight@entry=2520)
    at widget/nsBaseWidget.cpp:1415
1415    void nsBaseWidget::CreateCompositor(int aWidth, int aHeight) {
(gdb) bt 10
#0  nsBaseWidget::CreateCompositor (this=this@entry=0x7fb8a426a0, 
    aWidth=aWidth@entry=1080, aHeight=aHeight@entry=2520)
    at widget/nsBaseWidget.cpp:1415
#1  0x0000007ff4c58208 in mozilla::embedlite::nsWindow::CreateCompositor (
    this=0x7fb8a426a0, aWidth=1080, aHeight=2520)
    at mobile/sailfishos/embedshared/nsWindow.cpp:159
#2  0x0000007ff4c572dc in mozilla::embedlite::nsWindow::CreateCompositor (
    this=0x7fb8a426a0)
    at mobile/sailfishos/embedshared/nsWindow.cpp:152
#3  0x0000007ff4c5a100 in mozilla::embedlite::nsWindow::GetLayerManager (
    this=0x7fb8a426a0, aShadowManager=<optimized out>, aBackendHint=<optimized 
    out>, 
    aPersistence=nsIWidget::LAYER_MANAGER_CURRENT)
    at mobile/sailfishos/embedshared/nsWindow.cpp:214
#4  0x0000007ff4c57d2c in nsIWidget::GetLayerManager (this=0x7fb8a426a0)
    at widget/nsIWidget.h:1303
#5  mozilla::embedlite::PuppetWidgetBase::Invalidate (this=0x7fb8a426a0, 
    aRect=...)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:261
#6  0x0000007ff4c5c650 in mozilla::embedlite::PuppetWidgetBase::UpdateBounds (
    this=0x7fb8a426a0, aRepaint=aRepaint@entry=true)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:395
#7  0x0000007ff4c6583c in mozilla::embedlite::EmbedLiteWindowChild::
    CreateWidget (this=0x7fb8b9c010)
    at xpcom/base/nsCOMPtr.h:851
#8  0x0000007ff4c55da8 in mozilla::detail::RunnableMethodArguments<>::
    applyImpl<mozilla::embedlite::EmbedLiteWindowChild, void (mozilla::
    embedlite::EmbedLiteWindowChild::*)()>(mozilla::embedlite::
    EmbedLiteWindowChild*, void (mozilla::embedlite::EmbedLiteWindowChild::*)(
    ), mozilla::Tuple<>&, std::integer_sequence<unsigned long>) (args=..., 
    m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
#9  mozilla::detail::RunnableMethodArguments<>::apply<mozilla::embedlite::
    EmbedLiteWindowChild, void (mozilla::embedlite::EmbedLiteWindowChild::*)()> 
    (
    m=<optimized out>, o=<optimized out>, this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1154
(More stack frames follow...)
I say "roughly" because the stack trace isn't identical. The calls to nsWindow::CreateCompositor() only appear in the ESR 91 backtrace, which is unexpected. The difference is inside nsWindow::GetLayerManager(). The code is similar on both, and where they end up in nsBaseWidget::CreateCompositor() is the same, but for some reason the backtrace on ESR 91 includes the two hops needed to get there, whereas they're skipped in the ESR 78 backtrace. In both cases the call in nsWindow::GetLayerManager() is to CreateCompositor():
void
nsWindow::CreateCompositor()
{
  LOGT();
  // Compositor should be created only for top level widgets, aka windows.
  MOZ_ASSERT(mWindow);
  LayoutDeviceIntRect size = mWindow->GetSize();
  CreateCompositor(size.width, size.height);
}

void
nsWindow::CreateCompositor(int aWidth, int aHeight)
{
  LOGT();
  nsBaseWidget::CreateCompositor(aWidth, aHeight);
}
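Stripped of the Gecko specifics, the chain amounts to two thin forwarding methods sitting in front of the base class. The following is a minimal standalone sketch rather than the real code: BaseWidget, Window, mCalls and the stored size are all names invented for illustration, and the plain assert is just a stand-in for the MOZ_ASSERT(mWindow) check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for nsBaseWidget; it records what it was asked to do.
class BaseWidget {
public:
  void CreateCompositor(int aWidth, int aHeight) {
    mCalls.push_back("BaseWidget::CreateCompositor");
    mWidth = aWidth;
    mHeight = aHeight;
  }

  std::vector<std::string> mCalls;
  int mWidth = 0;
  int mHeight = 0;
};

// Hypothetical stand-in for the embedlite nsWindow with its two thin hops.
class Window : public BaseWidget {
public:
  Window(int aWidth, int aHeight) : mSizeWidth(aWidth), mSizeHeight(aHeight) {}

  // Hop one: look up the window size, then forward to the two-argument form.
  void CreateCompositor() {
    mCalls.push_back("Window::CreateCompositor()");
    // Stand-in for the sanity check on the underlying window.
    assert(mSizeWidth > 0 && mSizeHeight > 0);
    CreateCompositor(mSizeWidth, mSizeHeight);
  }

  // Hop two: forward straight to the base class implementation.
  void CreateCompositor(int aWidth, int aHeight) {
    mCalls.push_back("Window::CreateCompositor(int, int)");
    BaseWidget::CreateCompositor(aWidth, aHeight);
  }

private:
  int mSizeWidth;
  int mSizeHeight;
};
```

Calling CreateCompositor() on a Window built with a size of 1080 by 2520 records all three calls in order and leaves the base class holding that same size, so whichever frames a debugger chooses to show, the work done at the end of the chain is the same.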
I'll need to look into this further, but it looks to me like this might just be a case of optimisation in the ESR 78 code. Unfortunately I've run out of time and energy for today, so I'll have to continue the investigation tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.