flypig.co.uk

Personal Blog

Blog

5 most recent items

26 Jul 2024 : Day 300
Over the last few days I've been working through the testing tasks in Issue #1053 on the sailfish-browser issue tracker on GitHub. Yesterday I got to the end of the list, with the result that 17 out of 22 tests passed (feels pretty good), but with five tests failing. In addition quite a few of the tests generated error output.

Here are the five failing cases that need looking into:
  1. Video rendering and controls: total fail.
  2. Audio output and controls: partial fail.
  3. Password storage: total fail.
  4. Automatic dark/light theme switching: partial fail.
  5. Everything on the browser test page: partial fail (single select widget, external links, full screen, double-tap).
In addition to the above there were several cases where error output was shown on the screen:
// Print to PDF
JavaScript error: resource://gre/actors/BrowserElementParent.jsm, line 24: 
    TypeError: browser is null
// Exiting the browser
JavaScript error: file:///usr/lib64/mozembedlite/components/
    EmbedLiteChromeManager.js, line 170: TypeError: chromeListener is undefined
// Saving a downloaded file
JavaScript error: resource://gre/modules/pdfjs.js, line 29: 
    NS_ERROR_NOT_AVAILABLE: 
// Login manager
JavaScript error: resource://gre/modules/LoginHelper.jsm, line 1734: TypeError: 
    browser is null
So that's a quick summary of the things that need to be fixed. Today I'm going to start looking at the password storage and login manager, which seem to be the biggest failure right now (some may argue video is more important, but that also falls further outside my area of expertise).

As with much of the other functionality, Jolla has a handy login test page. There's no backend functionality to the page, but it allows you to enter fake credentials which, if things are working, the browser should pick up and offer to store for future reuse.

On ESR 78 this works correctly, but on ESR 91 it fails with the following error message:
JavaScript error: resource://gre/modules/LoginHelper.jsm, line 1734: TypeError: 
    browser is null
An error message is an error message, but what's odd about this one is that the LoginHelper.jsm functionality probably shouldn't be queried at all. To corroborate this I put some debug output inside the getBrowserForPrompt() method on ESR 78 and checked whether it got printed when saving a password. It didn't.

The reason for this quickly becomes apparent when I check the ESR 78 patches. Patch 0082 has the title "Allow LoginManagerPrompter to find its window" and comes with the following description:
 
This patch blocks loading of gecko's LoginManagerAuthPrompter.jsm so that the embedlite-components version can be used instead.
It also patches the nsILoginManagerPrompter interface to allow a reference to the window to be passed through, to allow the embedlite component to understand its context.

Finally it patches ParentChannelListener to pass the correct window object through to the nsILoginManagerAuthPrompter component.


That very first line "so that the embedlite-components version can be used instead" is crucial. Without this patch the upstream login manager will be used. We want our Sailfish-specific login manager to be used instead, which means tweaking the guts of gecko to allow it.

Checking the ESR 91 source clearly shows that this patch hasn't yet been applied, so now would be the time to do this.

Attempting to apply the patch directly fails, but brilliantly, attempting a 3-way merge succeeds:
$ git am ../rpm/0082-sailfishos-gecko-Allow-LoginManagerPrompter-to-find-.patch
Applying: Allow LoginManagerPrompter to find its window. JB#55760, OMP#JOLLA-418
error: patch failed: toolkit/components/passwordmgr/nsILoginManagerPrompter.idl:
    29
error: toolkit/components/passwordmgr/nsILoginManagerPrompter.idl: patch does 
    not apply
Patch failed at 0001 Allow LoginManagerPrompter to find its window. JB#55760, 
    OMP#JOLLA-418
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am 
    --abort".
$ git am --abort
$ git am --3way ../rpm/
    0082-sailfishos-gecko-Allow-LoginManagerPrompter-to-find-.patch
Applying: Allow LoginManagerPrompter to find its window. JB#55760, OMP#JOLLA-418
Using index info to reconstruct a base tree...
M       netwerk/protocol/http/ParentChannelListener.cpp
M       toolkit/components/passwordmgr/LoginManagerParent.jsm
M       toolkit/components/passwordmgr/components.conf
M       toolkit/components/passwordmgr/nsILoginManagerPrompter.idl
Falling back to patching base and 3-way merge...
Auto-merging toolkit/components/passwordmgr/nsILoginManagerPrompter.idl
Auto-merging toolkit/components/passwordmgr/components.conf
Auto-merging toolkit/components/passwordmgr/LoginManagerParent.jsm
Auto-merging netwerk/protocol/http/ParentChannelListener.cpp
Most of the changes here are to the JavaScript, so could potentially be applied dynamically for testing. However there's also a change to an interface IDL file, plus it's the start of my working day so I won't be able to return to this for the next eight hours anyway, so I may as well kick off a build. When I return, if things have gone well, I'll be able to test out this change.

[...]

The build went through successfully, but when I try to use the login manager I now get the following error:
JavaScript error: file:///usr/lib64/mozembedlite/components/
    LoginManagerPrompter.js, line 1530: ReferenceError: ComponentUtils is not 
    defined
JavaScript error: resource://gre/modules/XPCOMUtils.jsm, line 161: 
    NS_ERROR_XPC_GS_RETURNED_FAILURE: ServiceManager::GetService returned 
    failure code:    
For the first of these errors it looks like I'll just need to add the following line to the top of the LoginManagerPrompter.js file (see for example the changes made by Raine in embedlite-components Issue #99):
const { ComponentUtils } = ChromeUtils.import("resource://gre/modules/ComponentUtils.jsm");
The good news is that in addition to making this change in the package source, I can also make it directly on my device for immediate testing.
devel-su vim /usr/lib64/mozembedlite/components/LoginManagerPrompter.js
[...]
With this line added, both errors are now fixed and the login manager prompter is working correctly! That means I've also now finally been able to test clearing of the password data as well. All is working correctly and without error messages.

I'm going to call it a day. Tomorrow I'll look at the failing "Automatic dark/light theme switching".

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
25 Jul 2024 : Day 299
I'm at 15 out of 22 tests this morning with 14 passes and one fail. Next on the list is permissions: "location, pop-ups, cookies, camera, microphone."

First up is location. The permission part of this seems to be working correctly. From Jolla's positioning test page, when I select "Get position" the permissions dialog appears. If I deny access the page announces correctly "User denied geolocation prompt". I can then reset the permission via the little browser lock icon in the address bar.

Now if I allow the geolocation permission rather than denying it, a little "toast" notification appears on the screen to say "Positioning is disabled". On the console I get the following output:
ContentPermissionPrompt.js on message received: top:embedui:permissions, msg:
    {"allow":true,"checkedDontAsk":false,"id":
    "browser.sailfishos.org geolocation"}
[D] unknown:0 - Geoclue client path: "/org/freedesktop/Geoclue/Master/
    client1"
My conclusion is that the positioning permissions dialog is working just fine, even if positioning itself isn't. And in fact, when I try to do the same thing on ESR 78, not only do I get the same "Positioning is disabled" toast, I also get identical output at the console:
ContentPermissionPrompt.js on message received: top:embedui:permissions, msg:
    {"allow":true,"checkedDontAsk":false,"id":
    "browser.sailfishos.org geolocation"}
[D] unknown:0 - Geoclue client path: "/org/freedesktop/Geoclue/Master/
    client1"
This might need a little more investigation, but if the functionality is identical across the two versions, then I'll take that as a good sign.
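
For reference, the page-side half of this test can be sketched in a few lines of JavaScript. This is a minimal sketch and not the code from Jolla's test page: the requestPosition() helper and its report callback are hypothetical names, and the geolocation object is passed in as a parameter so that the snippet can be exercised outside a browser. It relies only on the standard getCurrentPosition() call and on the code and message fields of the error object it hands back.

```javascript
// Hypothetical helper: ask for the position once and report the outcome.
// `geolocation` is expected to behave like navigator.geolocation.
function requestPosition(geolocation, report) {
  geolocation.getCurrentPosition(
    (position) => {
      const { latitude, longitude } = position.coords;
      report("Position: " + latitude + ", " + longitude);
    },
    (error) => {
      // Code 1 is PERMISSION_DENIED in the Geolocation API.
      const denied = error.code === 1;
      report((denied ? "Denied: " : "Failed: ") + error.message);
    }
  );
}

// Stand-in for a browser that always refuses the permission request.
const denyingGeolocation = {
  getCurrentPosition(onSuccess, onError) {
    onError({ code: 1, message: "User denied geolocation prompt" });
  },
};

const messages = [];
requestPosition(denyingGeolocation, (text) => messages.push(text));
console.log(messages[0]);
```

In a real page the call would be requestPosition(navigator.geolocation, ...), with the report callback writing into the document rather than into an array.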

For the popup permission the behaviour is a little odd. If I visit Jolla's pop-up test page and wait a couple of seconds the pop-up permission dialog appears. I can select Deny or Allow but the result appears to be the same either way: the pop-up never opens. If I select to remember my choice, no change is made to the underlying permissions, which I can check using the little padlock in the address bar.

However, if I change the permission from "Deny" to "Allow" via the padlock and select the link to open another pop-up? Well, then the pop-up opens correctly.

I've always been confused by this functionality: if I select to permanently set the state to "Allow", shouldn't all future pop-ups be allowed, at least until I remove the setting? It doesn't feel quite right to me, but it turns out it does at least match the approach from ESR 78.

So while I'm not entirely comfortable with how it's working, given the behaviour matches ESR 78, I'm considering this a pass.

Next up cookies. I'm using another of Jolla's test pages for this, which allows you to set a cookie, then check its value after restarting the browser.

I've checked that the cookie gets successfully set when Allowed and blocked when Denied. I've also checked that if cookies are blocked in general but with an exception for a particular site, then the cookie is nevertheless stored correctly when visiting a page from the site. Everything on ESR 91 works the same as on ESR 78 and it all feels intuitive and correct.

For the camera and microphone I'm using the same page as I did for the WebRTC tests, Mozilla's getUserMedia Test Page. Although the camera still has the same colouring issue from earlier, everything is otherwise good. And specifically the permission dialogs do their job as expected.

So location, pop-ups, cookies, camera and microphone permission dialogs are all working correctly. I've updated the issue on GitHub to reflect this.

Next I'm going to find out what happens when I clear the browser data. There's an option in the settings for this, with various different categories available for clearing: history, cookies, passwords, cache, bookmarks and site permissions.

History, cookies, cache, bookmarks and site permissions all appear to work as expected. Unfortunately I'm not able to test password clearing because the functionality to add passwords is currently broken. But I'll come back to that when it's fixed.

For the dark/light theming, switching between one of the fixed values (light or dark) from the settings page works as expected: the site updates its ambience to match (I'm testing using DuckDuckGo). But switching between light and dark phone ambiences doesn't update the site, even though it successfully updates the user interface elements. So that's going to have to be a fail for now.
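
A page can follow the system theme through the standard prefers-color-scheme media query, which is one plausible place for the ambience change to be getting lost. The sketch below shows only the listening side. The names watchColourScheme() and applyTheme are made up for illustration, and matchMedia is passed in as a parameter so the example doesn't depend on a browser window. Whether DuckDuckGo reacts in script like this or purely in CSS is an assumption I've not verified.

```javascript
// Hypothetical helper: apply the current scheme, then follow later changes.
// `matchMedia` is expected to behave like window.matchMedia.
function watchColourScheme(matchMedia, applyTheme) {
  const query = matchMedia("(prefers-color-scheme: dark)");
  applyTheme(query.matches ? "dark" : "light");
  query.addEventListener("change", (event) => {
    applyTheme(event.matches ? "dark" : "light");
  });
}

// Minimal stand-in for a MediaQueryList, so the sketch runs without a browser.
function makeFakeMatchMedia(initiallyDark) {
  const listeners = [];
  const query = {
    matches: initiallyDark,
    addEventListener(type, listener) {
      if (type === "change") listeners.push(listener);
    },
  };
  return {
    matchMedia: () => query,
    // Simulate the system switching between light and dark.
    setDark(dark) {
      query.matches = dark;
      listeners.forEach((listener) => listener({ matches: dark }));
    },
  };
}

const fake = makeFakeMatchMedia(false);
const applied = [];
watchColourScheme(fake.matchMedia, (theme) => applied.push(theme));
fake.setDark(true);
console.log(applied.join(" -> "));
```

If the change event never reaches the page when the ambience switches, a listener like this would apply the initial theme and then stay silent, which would match what I'm seeing.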

For audio testing the results are also unfortunately a fail. I'm testing using BBC Sounds which works fine on ESR 78. But on my ESR 91 build we don't get any audio, just an error message that states "This content doesn't seem to be working". Disappointing.

I get the same with the BBC iPlayer for video: it works on ESR 78 but not ESR 91. When using Jolla's video test page I get the same experience. On YouTube as well.

This isn't, to be honest, much of a surprise. I've not applied the changes needed to get audio and video working yet. It's not all bad news though. For example the audio and audio controls on Jolla's audio test page are working correctly. So it looks like the problems are down to the available codecs, rather than something more fundamentally broken with the way audio or video works (or doesn't, in this case).
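
One way to check the codec theory from the page side would be to ask a media element what it thinks it can play, using the standard canPlayType() method. The following is just a sketch: probeCodecs() is a hypothetical helper, the MIME strings are illustrative examples rather than the formats the failing sites actually use, and the element is a stand-in object so the snippet can run outside a browser.

```javascript
// Hypothetical helper: ask a media element which MIME types it might play.
// canPlayType() returns "", "maybe" or "probably".
function probeCodecs(mediaElement, mimeTypes) {
  const support = {};
  for (const type of mimeTypes) {
    support[type] = mediaElement.canPlayType(type) || "no";
  }
  return support;
}

// Stand-in element that only claims support for Ogg audio.
const fakeAudio = {
  canPlayType: (type) => (type.startsWith("audio/ogg") ? "probably" : ""),
};

const result = probeCodecs(fakeAudio, [
  'audio/ogg; codecs="vorbis"',
  'audio/mp4; codecs="mp4a.40.2"',
]);
console.log(result);
```

In the browser the first argument would be something like document.createElement("audio"), and comparing the results between ESR 78 and ESR 91 should show whether the set of supported formats really differs.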

The final test is for "Everything on the browser test page". Which is a bit nebulous if I'm honest, but I'll still give it my best shot.

All of the prompt dialogs work fine. The multi-select groups work, but the single select widget actually managed to crash the browser. So that's something to look into.

Text input, radio buttons and checkboxes are all working fine. History (back and forward) works as expected.

Mouse click positioning is looking good.

Interestingly, external links (for example to email or the phone app) are not working. There's no error in the output console either, which won't make fixing the issue any easier. But for now, I just need to identify the fact that this is broken.

The user agent string is good. Window opening and file pickers all work as expected. Localisation, anchors, CSS, Storage, Service Workers are all working.

Full screen doesn't appear to be working. There's also a difference in behaviour when double-tapping. On ESR 78 the double tap goes through, but on ESR 91 it zooms to the enclosing box item instead, as it does with non-selectable items. This will need fixing.

Everything else on the test pages works fine. So while it's not an unambiguous pass, it's not far off.

So that's everything in the list of tests. Seventeen out of twenty-two tests passed. Three were partial failures and two were total failures. Here's the full list:
  1. Video rendering and controls: total fail.
  2. Audio output and controls: partial fail.
  3. Private browsing: pass.
  4. Search on page: pass.
  5. Share link: pass.
  6. Save web page as PDF: pass.
  7. Desktop/mobile view switching: pass.
  8. Bookmarks: pass.
  9. History: pass.
  10. Downloads (including setting save destination): pass.
  11. Configuration using about:config: pass.
  12. Home page functionality: pass.
  13. Search providers: pass.
  14. Close tabs on exit: pass.
  15. Do not track: pass.
  16. JavaScript enable/disable toggle: pass.
  17. Password storage: total fail.
  18. Permissions: location, pop-ups, cookies, camera, microphone: pass.
  19. Clearing the browser data: pass.
  20. Automatic dark/light theme switching: partial fail.
  21. Everything on the browser test page: partial fail (single select widget, external links, full screen, double-tap).
  22. WebRTC audio and video: pass.
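
As a quick sanity check on those totals, the list can be tallied in a few lines of JavaScript. The outcomes are copied from the list in order. Counting item 21 as a partial failure is an assumption, made so that the result agrees with the three partial and two total failures quoted above.

```javascript
// Outcomes in the order of the list above.
// Item 21 is counted as "partial", an assumption made to match the summary.
const outcomes = [
  "fail", "partial", "pass", "pass", "pass", "pass", "pass", "pass",
  "pass", "pass", "pass", "pass", "pass", "pass", "pass", "pass",
  "fail", "pass", "pass", "partial", "partial", "pass",
];

const tally = { pass: 0, partial: 0, fail: 0 };
for (const outcome of outcomes) {
  tally[outcome] += 1;
}
console.log(outcomes.length, tally);
```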
Honestly, I don't think that's looking too bad. Tomorrow I'll start working on the failing cases, the first of which will be the password storage.

23 Jul 2024 : Day 297
Yesterday I said I was going to persevere with the hang caused by switching to a non-null cover. But I had a change of heart overnight. I have a tendency to get obsessed with small bugs like this because I'm desperate to know the reason for them. I know there is an answer to this, so the only thing stopping me from finding out is my own inadequacy. When you frame things like that it's easy to overstate the importance of something and end up prioritising the wrong thing.

But while I'd love to know the reason, I do appreciate there are more important things to be working on. I could lose days of work chasing an answer only to discover that someone smarter and better informed knows how to fix it already. During that time I could have fixed several other easier but more impactful bugs.

So I'm going to pause work on the hanging bug and move on to something else. If no solution appears naturally I'll return to it later.

Consequently I started off this morning by fixing all of the remaining cases where the fromExternal flag was needed in the front-end code. This was complicated somewhat by the fact that while it's needed for some calls to newTab(), there are also others where it's not. Some care was therefore needed.

But I think I got all of the relevant cases and none of the extraneous ones. I've committed and pushed my changes and since then I've had some time to spare today to look at other things.

I've gone on to working on Issue 1053 ("Test browser functionality with ESR 91"). This one issue is comprised of 22 subtasks, each of which involves testing some simple functionality of the browser.

So far I've tested the following functionalities, all of which are working as expected, at least to the extent I've been able to test:
  1. Private browsing.
  2. Search on page.
  3. Share link.
  4. Save web page as PDF.
  5. Desktop/mobile view switching.
  6. Bookmarks.
  7. History.
  8. Downloads (including setting save destination).
Since the functionality works I've ticked all of these off on the issue, which feels like a good start. However in some cases alongside the working functionality there were also some errors showing in the debug output.

Given that the errors aren't blocking the functionality, they can't be too serious, but I'm still keen to both document them here and also fix them if they're as straightforward as I hope they are.

The following error appeared while performing a print to PDF:
JavaScript error: resource://gre/actors/BrowserElementParent.jsm, line 24: 
    TypeError: browser is null
When exiting the browser the following error appears:
JavaScript error: file:///usr/lib64/mozembedlite/components/
    EmbedLiteChromeManager.js, line 170: TypeError: chromeListener is undefined
Finally, when downloading a file to save it out, the following error appears:
JavaScript error: resource://gre/modules/pdfjs.js, line 29: 
    NS_ERROR_NOT_AVAILABLE: 
I'm not going to have time to look into these today, but my plan is to continue testing tomorrow, followed by trying to find simple solutions for each of the errors I encounter as I go through. But that's it for today; there'll be more testing tomorrow.

22 Jul 2024 : Day 296
Today I've been trying, really quite hard actually, to fix the hang that happens when switching to private browsing mode. Having looked at this before, I refreshed my memory yesterday to the point of understanding that it's due to the call to setBrowserCover() that happens when the mode switch is made on the tab screen.

I've got a bit further today, the results of which only make things even more confusing. To explain all this, it'll help to take a look at the setBrowserCover() method, which looks like this:
    function setBrowserCover(model) {
        if (!model || model.count === 0 || !WebUtils.firstUseDone) {
            cover = Qt.resolvedUrl("cover/NoTabsCover.qml")
        } else {
            if (cover != null && window.webView) {
                window.webView.clearSurface()
            }
            cover = null
        }
    }
Let's break this down a bit. The browser manages two tab models, one for normal browsing and the other for private browsing. When switching between the two modes one model is switched out for the other. This setBrowserCover() method is called just after the change in model. So by the time we find ourselves in this method we've already switched the model to the one for private browsing.

Whenever the browser is closed any private browsing state, including the associated tab model, is either destroyed or forgotten. That means that having just opened the browser we know the private browsing tab model will have no tabs and so model.count in the above code will be zero.

That means we're going through the first half of the if statement above. There's only one line of functionality that therefore gets called as a result and that's the following:
            cover = Qt.resolvedUrl("cover/NoTabsCover.qml")
Typically the cover model for the browser will be set to null so that it shows the contents of the current page. If there are no pages open the cover is replaced, as we can see with this line of code, with the cover layout defined in the NoTabsCover.qml file.
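
To make the two branches concrete, here's a plain JavaScript model of the same decision that can be run outside QML. It's only a sketch: chooseCover() and the state object are hypothetical stand-ins for the QML window and its properties, and a bare string replaces the Qt.resolvedUrl() call.

```javascript
// Plain JavaScript model of the branch logic in setBrowserCover().
// `state` stands in for the QML window; it is not part of the real code.
function chooseCover(model, firstUseDone, state) {
  if (!model || model.count === 0 || !firstUseDone) {
    // No tabs (or first use not completed): show the placeholder cover.
    state.cover = "cover/NoTabsCover.qml";
  } else {
    if (state.cover != null && state.webView) {
      state.webView.clearSurface();
    }
    // A null cover means the current page contents are shown instead.
    state.cover = null;
  }
  return state.cover;
}

let cleared = 0;
const state = {
  cover: null,
  webView: { clearSurface: () => { cleared += 1; } },
};

// Fresh private model: no tabs, so the placeholder cover is selected.
const privateCover = chooseCover({ count: 0 }, true, state);
// Back to a model with tabs: the surface is cleared and the cover reset.
const normalCover = chooseCover({ count: 3 }, true, state);
console.log(privateCover, normalCover, cleared);
```

The first call models the switch into an empty private browsing session, which is the null to non-null transition. The second models switching back to a session that has tabs open.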

So far so good. This is exactly what happens when we move to private browsing mode for the very first time. If I comment out the above line there are two consequences:
 
  1. When there are no active web pages the cover just shows a blank background.
  2. There's no hang.


We can conclude that it seems to be the act of setting the cover that's triggering the hang. This feels very strange because there's nothing special or magical about the cover or the way it gets switched in and out. I've tried a whole host of things in an attempt to get a clearer picture.

For example I wondered whether this was related to private browser mode or not, so I added a timer that switched out the cover after a delay of five seconds, irrespective of what's happening at the time. What I found was that this also hangs the browser, even if you just have a static web page open and there's nothing exceptional happening. This suggests that it's not private browsing per se that's causing the problem, but rather the switching of the cover.

Intriguingly, if you do the switch while performing a pan and zoom, there's a crash instead of a hang. This has allowed me to collect the following backtrace:
[D] onTriggered:45 - Set cover: file:///usr/share/sailfish-browser/cover/
    NoTabsCover.qml
[New LWP 2607]
sailfish-browser: ../../../platforms/wayland/wayland_window_common.cpp:256: 
    void WaylandNativeWindow::releaseBuffer(wl_buffer*): Assertion `it != 
    fronted.end()' failed.

Thread 38 "Compositor" received signal SIGABRT, Aborted.
[Switching to LWP 2574]
0x0000007fef49a344 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000007fef49a344 in raise () from /lib64/libc.so.6
#1  0x0000007fef47fce8 in abort () from /lib64/libc.so.6
#2  0x0000007fef48ebd8 in ?? () from /lib64/libc.so.6
#3  0x0000007fef48ec40 in __assert_fail () from /lib64/libc.so.6
#4  0x0000007fe74e3044 in WaylandNativeWindow::releaseBuffer(wl_buffer*) () 
    from /usr/lib64/libhybris//eglplatform_wayland.so
#5  0x0000007fee8fa050 in ?? () from /usr/lib64/libffi.so.8
#6  0x0000007fee8f65f8 in ?? () from /usr/lib64/libffi.so.8
#7  0x0000007fe7795f98 in ?? () from /usr/lib64/libwayland-client.so.0
#8  0x0000007fe7792d80 in ?? () from /usr/lib64/libwayland-client.so.0
#9  0x0000007fe7794038 in wl_display_dispatch_queue_pending () from /usr/lib64/
    libwayland-client.so.0
#10 0x0000007fe74e3204 in WaylandNativeWindow::readQueue(bool) () from /usr/
    lib64/libhybris//eglplatform_wayland.so
#11 0x0000007fe74e23ec in WaylandNativeWindow::finishSwap() () from /usr/lib64/
    libhybris//eglplatform_wayland.so
#12 0x0000007fef090210 in _my_eglSwapBuffersWithDamageEXT () from /usr/lib64/
    libEGL.so.1
#13 0x0000007ff2397110 in mozilla::gl::GLLibraryEGL::fSwapBuffers (
    surface=0x5555991a60, dpy=<optimized out>, this=<optimized out>)
    at gfx/gl/GLLibraryEGL.h:303
#14 mozilla::gl::EglDisplay::fSwapBuffers (surface=0x5555991a60, 
    this=<optimized out>)
    at gfx/gl/GLLibraryEGL.h:694
#15 mozilla::gl::GLContextEGL::SwapBuffers (this=0x7ed41a6e30)
    at gfx/gl/GLContextProviderEGL.cpp:558
#16 0x0000007ff2440e00 in mozilla::layers::CompositorOGL::EndFrame (
    this=0x7ed41a1d70)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#17 0x0000007ff25174dc in mozilla::layers::LayerManagerComposite::Render (
    this=this@entry=0x7ed41a8a70, aInvalidRegion=..., aOpaqueRegion=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#18 0x0000007ff2517728 in mozilla::layers::LayerManagerComposite::
    UpdateAndRender (this=this@entry=0x7ed41a8a70)
    at gfx/layers/composite/LayerManagerComposite.cpp:657
#19 0x0000007ff2517ad8 in mozilla::layers::LayerManagerComposite::
    EndTransaction (this=this@entry=0x7ed41a8a70, aTimeStamp=..., 
    aFlags=aFlags@entry=mozilla::layers::LayerManager::END_DEFAULT)
    at gfx/layers/composite/LayerManagerComposite.cpp:572
#20 0x0000007ff2559274 in mozilla::layers::CompositorBridgeParent::
    CompositeToTarget (this=0x7fb89aba80, aId=..., aTarget=0x0, 
    aRect=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#21 0x0000007ff253e9bc in mozilla::layers::CompositorVsyncScheduler::Composite (
    this=0x7fb8b500e0, aVsyncEvent=...)
    at gfx/layers/ipc/CompositorVsyncScheduler.cpp:256
#22 0x0000007ff2536e34 in mozilla::detail::RunnableMethodArguments<mozilla::
    VsyncEvent>::applyImpl<mozilla::layers::CompositorVsyncScheduler, void (
    mozilla::layers::CompositorVsyncScheduler::*)(mozilla::VsyncEvent const&), 
    StoreCopyPassByConstLRef<mozilla::VsyncEvent>, 0ul> (args=..., m=<optimized 
    out>, 
    o=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    nsThreadUtils.h:887
#23 mozilla::detail::RunnableMethodArguments<mozilla::VsyncEvent>::
    apply<mozilla::layers::CompositorVsyncScheduler, void (mozilla::layers::
    CompositorVsyncScheduler::*)(mozilla::VsyncEvent const&)> (m=<optimized 
    out>, o=<optimized out>, this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1154
#24 mozilla::detail::RunnableMethodImpl<mozilla::layers::
    CompositorVsyncScheduler*, void (mozilla::layers::CompositorVsyncScheduler::
    *)(mozilla::VsyncEvent const&), true, (mozilla::RunnableKind)1, mozilla::
    VsyncEvent>::Run (this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1201
[...]
#34 0x0000007fef54989c in ?? () from /lib64/libc.so.6
(gdb) 
This hints at the possibility that the render buffers may be being swapped in the wrong thread. But my attempts to dig deeper into this haven't as yet thrown up anything that could give more of a hint about what's going on.

I've also added some debug output to the setBrowserCover() method so it now looks like this:
    function setBrowserCover(model) {
        console.log("model: " + model);
        console.log("model.count: " + model.count);
        if (!model || model.count === 0 || !WebUtils.firstUseDone) {
            console.log("Setting cover");
            cover = Qt.resolvedUrl("cover/NoTabsCover.qml")
            console.log("Set cover: " + cover);
        } else {
            console.log("Not setting cover");
            if (cover != null && window.webView) {
                window.webView.clearSurface()
            }
            cover = null
        }
        console.log("Exiting");
    }
When switching to private browsing mode, whether it's done via the menu or the tab list, the following output is triggered:
[D] setBrowserCover:20 - model: PrivateTabModel(0x62bf23d0e0)
[D] setBrowserCover:21 - model.count: 0
[D] setBrowserCover:23 - Setting cover
[D] setBrowserCover:25 - Set cover: file:///usr/share/sailfish-browser/cover/
    NoTabsCover.qml
[D] setBrowserCover:33 - Exiting
Immediately after the last debug print here the browser hangs. I've been trying hard to find some method inside the browser that's executed between the last line of this debug output and the actual hang, but without success. I've been doing this by adding breakpoints to various methods, switching to private browsing and watching to see if any of the breakpoints are hit.

So far without luck. Here are just a few of the methods I've attached breakpoints to and tested this way:
GLContextEGL::SwapBuffers()
GLContextEGL::SetDamage()
GLContextEGL::RenewSurface()
GLScreenBuffer::Swap()
ReadBuffer::Attach()
BeginTransaction()
EndEmptyTransaction()
NeedsPaint()
QOpenGLWebPage::onDrawOverlay()
Many of the breakpoints on these methods are triggered at other points in the browsing process, but if this happens I've just been continuing execution until the point at which I manually switch to private browsing. I get the same output and the same hang as when there's no breakpoint, like this:
Thread 39 "Compositor" hit Breakpoint 1, mozilla::layers::
    LayerManagerComposite::BeginTransaction (this=0x7ed41a8c20, aURL=...)
    at gfx/layers/composite/LayerManagerComposite.cpp:232
232     bool LayerManagerComposite::BeginTransaction(const nsCString& aURL) {
(gdb) c
Continuing.
[D] setBrowserCover:20 - model: PrivateTabModel(0x7fd800da50)
[D] setBrowserCover:21 - model.count: 0
[D] setBrowserCover:23 - Setting cover
[D] setBrowserCover:25 - Set cover: file:///usr/share/sailfish-browser/cover/
    NoTabsCover.qml
[D] setBrowserCover:33 - Exiting
There are a few other things I think it's worth mentioning. The hang happens when the cover is set, but not when it's cleared. If the cover is set right at the start and left as it is (so it's never set to null), everything runs fine. So it very much seems to be the act of switching from null to non-null that causes the problem.

Having not managed to find any methods that are fired between the cover being set and the hang occurring, I got frustrated and went for a walk outside. We have a lake nearby that's beautifully calm at this time of year. The air is warm and still without being oppressive, which makes going for a walk a great way for me to clear my thoughts and come back feeling calmer.

I didn't have any revelations while walking, but I did think about whether I can approach this from a different angle. Rather than trying to find the gecko methods that are causing the problem by seeing if they're being used, what if I were to try to disable gecko functionality in the hope that the hang might suddenly vanish?

If the hang goes away with a particular piece of functionality disabled, then it may indicate some kind of clash between the cover change and the disabled functionality.

So I've tried a whole bunch of things, for example, setting it so that the page is always inactive by forcing the state to be always set to false:
 void QOpenGLWebPage::setActive(bool active)
 {
+    active = false;
     // WebPage is in inactive state until the view is initialized.
     // ::processViewInitialization always forces active state so we
     // can just ignore early activation calls.
     if (!d || !d->mViewInitialized)
         return;
 
     if (d->mActive != active) {
         d->mActive = active;
         d->mView->SetIsActive(d->mActive);
         Q_EMIT activeChanged();
     }
 }
I also tried disabling the initialisation code:
 void QOpenGLWebPage::initialize()
 {
-    d->createView();
 }
Plus a whole bunch of other similar things, from disconnecting various signals to preventing the EGL Display from being initialised. Many of these changes prevented rendering, but none of them prevented the hang.

I don't have an answer for why this is happening, but I'll persevere with it. As with everything computer-related, there is definitely an answer, it's just a case of finding it.

21 Jul 2024 : Day 295
Yesterday, you may recall, I didn't do any coding but rather started looking through some changes from Raine that I needed to incorporate into my local codebase. I'm continuing that today, in the hope of wrapping that up and then moving on to some of the other issues in the issue tracker on GitHub.

So I've reviewed and approved both of Raine's PRs. The first, which I already looked at yesterday, updates the sailfish-components-webview code to support the fromExternal flag. This change is needed to get the code to compile against the changes already made to gecko.

The second switches instances of XPCOMUtils for ComponentUtils since upstream has moved the generateNSGetFactory() method from the former to the latter. I'd already made this change in a few places, but Raine has propagated it across the entire codebase, which is great. After looking through and checking it, I've now approved it.

Those are the two PRs I can see, so I've moved on to the issues. I've marked Issue 1024 ("Restore WebRTC code to ESR 91") as completed. I addressed this issue in some of the most recent changes I made (between days 278 and 286). I've also closed Issue 1020 ("Fix ESR 91 rendering pipeline"). If you've been following along you'll know that this was one of the largest pieces of work I had to do, which I spent a total of 152 days on. First for the main browser rendering between days 51 and 83. Then for the WebView rendering between days 159 and 245. Then following on from this up to Day 277 for the WebGL. So I'm frankly pretty pleased to be able to finally close it.
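
For anyone wanting to check the arithmetic, the 152 days fall out if each range is counted inclusively. The snippet below is a small sketch of that sum. Starting the WebGL phase on Day 246 is an assumption made here to keep the ranges from overlapping, since the text above only gives its end point.

```javascript
// Day ranges for the three phases, counted inclusively. Starting the WebGL
// phase on Day 246 is an assumption; the text only says "up to Day 277".
const phases = [
  { name: "browser rendering", first: 51, last: 83 },
  { name: "WebView rendering", first: 159, last: 245 },
  { name: "WebGL", first: 246, last: 277 },
];

const totalDays = phases.reduce(
  (sum, phase) => sum + (phase.last - phase.first + 1),
  0
);
console.log(totalDays);
```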

To wrap up I've decided to start looking at Issue 1053, which is an epic test issue made up of 22 separate things to check. It looks long, but at this stage I'm hoping I can go through and tick them all off pretty swiftly (we'll have to see whether this actually happens or not!).

The very first thing I try is private browsing mode as it looks like one of the easier things to check.

Immediately I hit a problem. You'll recall that yesterday I closed Issue 1051 ("Fix hang when calling window.setBrowserCover()") thinking it was fixed. Well, I was wrong: the hang is still there after all. So now I'll need to look into this further.

The problem manifests itself when you switch between normal and private browsing. As I discovered first time around, the bug causes a hang rather than a crash. Which means I have to interrupt the execution in order to get a backtrace:
Thread 1 "sailfish-browse" received signal SIGINT, Interrupt.
0x0000007fef866718 in pthread_cond_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000007fef866718 in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x0000007fef979924 in QWaitCondition::wait(QMutex*, unsigned long) () from /
    usr/lib64/libQt5Core.so.5
#2  0x0000007ff0afde08 in ?? () from /usr/lib64/libQt5Quick.so.5
#3  0x0000007ff0b00aa8 in ?? () from /usr/lib64/libQt5Quick.so.5
#4  0x0000007ff0b01270 in ?? () from /usr/lib64/libQt5Quick.so.5
#5  0x0000007ff05503dc in QWindow::event(QEvent*) () from /usr/lib64/
    libQt5Gui.so.5
#6  0x0000007ff0b307b8 in QQuickWindow::event(QEvent*) () from /usr/lib64/
    libQt5Quick.so.5
#7  0x0000007fefb31144 in QCoreApplication::notify(QObject*, QEvent*) () from /
    usr/lib64/libQt5Core.so.5
#8  0x0000007fefb312e8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (
    ) from /usr/lib64/libQt5Core.so.5
#9  0x0000007ff0546488 in QGuiApplicationPrivate::processExposeEvent(
    QWindowSystemInterfacePrivate::ExposeEvent*) () from /usr/lib64/
    libQt5Gui.so.5
#10 0x0000007ff05470b4 in QGuiApplicationPrivate::processWindowSystemEvent(
    QWindowSystemInterfacePrivate::WindowSystemEvent*) ()
   from /usr/lib64/libQt5Gui.so.5
#11 0x0000007ff05256e4 in QWindowSystemInterface::sendWindowSystemEvents(
    QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib64/libQt5Gui.so.5
#12 0x0000007fe7882c4c in ?? () from /usr/lib64/libQt5WaylandClient.so.5
#13 0x0000007fef2cfd34 in g_main_context_dispatch () from /usr/lib64/
    libglib-2.0.so.0
#14 0x0000007fef2cffa0 in ?? () from /usr/lib64/libglib-2.0.so.0
#15 0x0000007fef2d0034 in g_main_context_iteration () from /usr/lib64/
    libglib-2.0.so.0
#16 0x0000007fefb83a90 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop:
    :ProcessEventsFlag>) () from /usr/lib64/libQt5Core.so.5
#17 0x0000007fefb2f608 in QEventLoop::exec(QFlags<QEventLoop::
    ProcessEventsFlag>) () from /usr/lib64/libQt5Core.so.5
#18 0x0000007fefb371d4 in QCoreApplication::exec() () from /usr/lib64/
    libQt5Core.so.5
#19 0x000000555557bf88 in main (argc=<optimized out>, argv=<optimized out>) at 
    main.cpp:203
(gdb) 
Getting a backtrace this way is always unsatisfactory because you don't know whether you stopped on the correct thread, or somewhere totally unrelated in one of the other threads that's whirring away at the same time. Gecko has tens of threads running simultaneously (last time this happened it was 70 in total), so that adds a pretty big dose of uncertainty.

I looked at this originally back on Day 122 and kept a record of the backtraces for all of the threads back then. They weren't, I have to say, super enlightening and I don't plan to repeat the process.

The way I fixed it at the time was by amending BrowserPage.qml to remove the following line:
            window.setBrowserCover(webView.tabModel)
Here's that line in context:
    // Use Connections so that target updates when model changes.
    Connections {
        target: AccessPolicy.browserEnabled && webView && webView.tabModel || 
    null
        ignoreUnknownSignals: true
        // Animate overlay to top if needed.
        onCountChanged: {
            if (webView.tabModel.count === 0) {
                webView.handleModelChanges(false)
            }
            window.setBrowserCover(webView.tabModel)
        }
    }
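The target expression in that snippet relies on JavaScript's short-circuit operators: a chain of && yields the first falsy operand or else the last one, and the trailing || swaps any falsy result for null. Here's a minimal sketch in plain JavaScript, where pickTarget and the stand-in objects are hypothetical and not part of sailfish-browser:

```javascript
// Mirrors: AccessPolicy.browserEnabled && webView && webView.tabModel || null
function pickTarget(browserEnabled, webView) {
  return (browserEnabled && webView && webView.tabModel) || null;
}

// Stand-ins for the real objects, for illustration only.
const tabModel = { count: 2 };
const webView = { tabModel };

console.log(pickTarget(true, webView) === tabModel); // true
console.log(pickTarget(false, webView));             // null
console.log(pickTarget(true, undefined));            // null
console.log(pickTarget(true, {}));                   // null
```

When the result is null the Connections element has no object to connect to, so the onCountChanged handler shouldn't fire at all.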
Today I'm going to comment that line out again, just to see whether this is actually an error with private browsing or not. When I do this there are immediately a couple of further errors coming from the QML:
[W] unknown:130 - file:///usr/share/sailfish-browser/pages/components/
    Overlay.qml:130: Error: Insufficient arguments
Thankfully these QML errors are easier to fix. You may recall that some time back I was having trouble with DuckDuckGo rendering. The issue turned out to be related to the Sec-Fetch-* headers being set incorrectly. You can refer back to what I wrote on Day 140 for some of the background if you're interested. But the crucial point is that I had to add a fromExternal flag to various methods, so that it could be passed on for use with the Sec-Fetch-* headers.

It turns out I'd not added a parameter for this flag to all the places where it's called. To fix this I've added the fromExternal parameter to all of the methods that need it.
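To illustrate the kind of failure involved, here's a rough sketch in plain JavaScript. It isn't the sailfish-browser code: loadTab and its parameters are hypothetical, and the explicit argument count check only imitates the sort of error QML reports when a method is called with too few arguments.

```javascript
// Hypothetical method that gained a trailing fromExternal parameter.
// The length check imitates the "Insufficient arguments" error from QML.
function loadTab(url, title, fromExternal) {
  if (arguments.length < loadTab.length) {
    throw new Error("Insufficient arguments");
  }
  return { url, title, fromExternal };
}

// A call site that was not updated for the new parameter fails.
let message = null;
try {
  loadTab("https://duckduckgo.com", "DuckDuckGo");
} catch (e) {
  message = e.message;
}
console.log(message); // Insufficient arguments

// The updated call site passes the flag through explicitly.
const tab = loadTab("https://duckduckgo.com", "DuckDuckGo", false);
console.log(tab.fromExternal); // false
```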

Following this change the private browsing now seems to work as expected. I still need to fix the hang, but that'll be a task for tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.