Gecko-dev Diary
Starting in August 2023 I'll be upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91. This page catalogues my progress.
Latest code changes are in the gecko-dev sailfishos-esr91 branch.
There is an index of all posts in case you want to jump to a particular day.
22 Jul 2024 : Day 296
Today I've been trying, really quite hard actually, to fix the hang that happens when switching to private browsing mode. Having looked at this before, I refreshed my memory yesterday to the point of understanding that it's due to the call to setBrowserCover() that happens when the mode switch is made on the tab screen.
I've got a bit further today, the results of which only make things even more confusing. To explain all this, it'll help to take a look at the setBrowserCover() method, which looks like this:

    function setBrowserCover(model) {
        if (!model || model.count === 0 || !WebUtils.firstUseDone) {
            cover = Qt.resolvedUrl("cover/NoTabsCover.qml")
        } else {
            if (cover != null && window.webView) {
                window.webView.clearSurface()
            }
            cover = null
        }
    }

Let's break this down a bit. The browser manages two tab models, one for normal browsing and the other for private browsing. When switching between the two modes one model is switched out for the other. This setBrowserCover() method is called just after the change in model. So by the time we find ourselves in this method we've already switched the model to the one for private browsing.
Whenever the browser is closed any private browsing state, including the associated tab model, is either destroyed or forgotten. That means that having just opened the browser we know the private browsing tab model will have no tabs and so model.count in the above code will be zero.
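To make the three conditions easier to see in isolation, here's a minimal sketch of the same decision written as a pure JavaScript function. The name chooseCover() and the NO_TABS_COVER constant are my own illustrations and aren't part of the browser code; the sketch also leaves out the clearSurface() call, since it only models which cover value ends up being selected.

```javascript
// Hypothetical pure-function version of the decision made in setBrowserCover().
// NO_TABS_COVER stands in for Qt.resolvedUrl("cover/NoTabsCover.qml").
const NO_TABS_COVER = "cover/NoTabsCover.qml";

function chooseCover(model, firstUseDone) {
    // Same condition as the original: no model, an empty model, or first
    // use not yet completed all select the static "no tabs" cover.
    if (!model || model.count === 0 || !firstUseDone) {
        return NO_TABS_COVER;
    }
    // Otherwise the cover is null, so the current page is shown instead.
    return null;
}

// A freshly created private tab model has no tabs.
console.log(chooseCover({ count: 0 }, true)); // cover/NoTabsCover.qml
console.log(chooseCover({ count: 3 }, true)); // null
```

With an empty private tab model the function always returns the "no tabs" cover, which matches the reasoning above.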
That means we're going through the first half of the if statement above. There's only one line of functionality that therefore gets called as a result and that's the following:

    cover = Qt.resolvedUrl("cover/NoTabsCover.qml")

Typically the cover model for the browser will be set to null so that it shows the contents of the current page. If there are no pages open the cover is replaced, as we can see with this line of code, with the cover layout defined in the NoTabsCover.qml file.
So far so good. This is exactly what happens when we move to private browsing mode for the very first time. If I comment out the above line there are two consequences:
- When there are no active web pages the cover just shows a blank background.
- There's no hang.
We can conclude that it seems to be the act of setting the cover that's triggering the hang. This feels very strange because there's nothing special or magical about the cover or the way it gets switched in and out. I've tried a whole host of things in an attempt to get a clearer picture.
For example I wondered whether this was related to private browsing mode or not, so I added a timer that switched out the cover after a delay of five seconds, irrespective of what's happening at the time. What I found was that this also hangs the browser, even if you just have a static web page open and there's nothing exceptional happening. This suggests that it's not private browsing per se that's causing the problem, but rather the switching of the cover.
Intriguingly, if you do the switch while performing a pan and zoom, there's a crash instead of a hang. This has allowed me to collect the following backtrace:

    [D] onTriggered:45 - Set cover: file:///usr/share/sailfish-browser/cover/
        NoTabsCover.qml
    [New LWP 2607]
    sailfish-browser: ../../../platforms/wayland/wayland_window_common.cpp:256:
        void WaylandNativeWindow::releaseBuffer(wl_buffer*):
        Assertion `it != fronted.end()' failed.

    Thread 38 "Compositor" received signal SIGABRT, Aborted.
    [Switching to LWP 2574]
    0x0000007fef49a344 in raise () from /lib64/libc.so.6
    (gdb) bt
    #0  0x0000007fef49a344 in raise () from /lib64/libc.so.6
    #1  0x0000007fef47fce8 in abort () from /lib64/libc.so.6
    #2  0x0000007fef48ebd8 in ?? () from /lib64/libc.so.6
    #3  0x0000007fef48ec40 in __assert_fail () from /lib64/libc.so.6
    #4  0x0000007fe74e3044 in WaylandNativeWindow::releaseBuffer(wl_buffer*) ()
        from /usr/lib64/libhybris//eglplatform_wayland.so
    #5  0x0000007fee8fa050 in ?? () from /usr/lib64/libffi.so.8
    #6  0x0000007fee8f65f8 in ?? () from /usr/lib64/libffi.so.8
    #7  0x0000007fe7795f98 in ?? () from /usr/lib64/libwayland-client.so.0
    #8  0x0000007fe7792d80 in ?? () from /usr/lib64/libwayland-client.so.0
    #9  0x0000007fe7794038 in wl_display_dispatch_queue_pending () from /usr/lib64/
        libwayland-client.so.0
    #10 0x0000007fe74e3204 in WaylandNativeWindow::readQueue(bool) () from /usr/
        lib64/libhybris//eglplatform_wayland.so
    #11 0x0000007fe74e23ec in WaylandNativeWindow::finishSwap() () from /usr/lib64/
        libhybris//eglplatform_wayland.so
    #12 0x0000007fef090210 in _my_eglSwapBuffersWithDamageEXT () from /usr/lib64/
        libEGL.so.1
    #13 0x0000007ff2397110 in mozilla::gl::GLLibraryEGL::fSwapBuffers (
        surface=0x5555991a60, dpy=<optimized out>, this=<optimized out>)
        at gfx/gl/GLLibraryEGL.h:303
    #14 mozilla::gl::EglDisplay::fSwapBuffers (surface=0x5555991a60,
        this=<optimized out>)
        at gfx/gl/GLLibraryEGL.h:694
    #15 mozilla::gl::GLContextEGL::SwapBuffers (this=0x7ed41a6e30)
        at gfx/gl/GLContextProviderEGL.cpp:558
    #16 0x0000007ff2440e00 in mozilla::layers::CompositorOGL::EndFrame (
        this=0x7ed41a1d70)
        at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
    #17 0x0000007ff25174dc in mozilla::layers::LayerManagerComposite::Render (
        this=this@entry=0x7ed41a8a70, aInvalidRegion=..., aOpaqueRegion=...)
        at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
    #18 0x0000007ff2517728 in mozilla::layers::LayerManagerComposite::
        UpdateAndRender (this=this@entry=0x7ed41a8a70)
        at gfx/layers/composite/LayerManagerComposite.cpp:657
    #19 0x0000007ff2517ad8 in mozilla::layers::LayerManagerComposite::
        EndTransaction (this=this@entry=0x7ed41a8a70, aTimeStamp=...,
        aFlags=aFlags@entry=mozilla::layers::LayerManager::END_DEFAULT)
        at gfx/layers/composite/LayerManagerComposite.cpp:572
    #20 0x0000007ff2559274 in mozilla::layers::CompositorBridgeParent::
        CompositeToTarget (this=0x7fb89aba80, aId=..., aTarget=0x0,
        aRect=<optimized out>)
        at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
    #21 0x0000007ff253e9bc in mozilla::layers::CompositorVsyncScheduler::Composite (
        this=0x7fb8b500e0, aVsyncEvent=...)
        at gfx/layers/ipc/CompositorVsyncScheduler.cpp:256
    #22 0x0000007ff2536e34 in mozilla::detail::RunnableMethodArguments<mozilla::
        VsyncEvent>::applyImpl<mozilla::layers::CompositorVsyncScheduler, void (
        mozilla::layers::CompositorVsyncScheduler::*)(mozilla::VsyncEvent const&),
        StoreCopyPassByConstLRef<mozilla::VsyncEvent>, 0ul> (args=...,
        m=<optimized out>, o=<optimized out>)
        at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
        nsThreadUtils.h:887
    #23 mozilla::detail::RunnableMethodArguments<mozilla::VsyncEvent>::
        apply<mozilla::layers::CompositorVsyncScheduler, void (mozilla::layers::
        CompositorVsyncScheduler::*)(mozilla::VsyncEvent const&)> (
        m=<optimized out>, o=<optimized out>, this=<optimized out>)
        at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1154
    #24 mozilla::detail::RunnableMethodImpl<mozilla::layers::
        CompositorVsyncScheduler*, void (mozilla::layers::CompositorVsyncScheduler::
        *)(mozilla::VsyncEvent const&), true, (mozilla::RunnableKind)1, mozilla::
        VsyncEvent>::Run (this=<optimized out>)
        at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1201
    [...]
    #34 0x0000007fef54989c in ?? () from /lib64/libc.so.6
    (gdb)

This hints at the possibility that the render buffers may be being swapped in the wrong thread. But my attempts to dig deeper into this haven't as yet thrown up anything that could give more of a hint about what's going on.
I've also added some debug output to the setBrowserCover() method so it now looks like this:

    function setBrowserCover(model) {
        console.log("model: " + model);
        console.log("model.count: " + model.count);
        if (!model || model.count === 0 || !WebUtils.firstUseDone) {
            console.log("Setting cover");
            cover = Qt.resolvedUrl("cover/NoTabsCover.qml")
            console.log("Set cover: " + cover);
        } else {
            console.log("Not setting cover");
            if (cover != null && window.webView) {
                window.webView.clearSurface()
            }
            cover = null
        }
        console.log("Exiting");
    }

When switching to private browsing mode, whether it's done via the menu or the tab list, the following output is triggered:

    [D] setBrowserCover:20 - model: PrivateTabModel(0x62bf23d0e0)
    [D] setBrowserCover:21 - model.count: 0
    [D] setBrowserCover:23 - Setting cover
    [D] setBrowserCover:25 - Set cover: file:///usr/share/sailfish-browser/cover/
        NoTabsCover.qml
    [D] setBrowserCover:33 - Exiting

Immediately after the last debug print here the browser hangs. I've been trying hard to find some method inside the browser that's executed between the last line of this debug output and the actual hang, but without success. I've been doing this by adding breakpoints to various methods, switching to private browsing and watching to see if any of the breakpoints are hit.
So far without luck. Here are just a few of the methods I've attached breakpoints to and tested this way:

    GLContextEGL::SwapBuffers()
    GLContextEGL::SetDamage()
    GLContextEGL::RenewSurface()
    GLScreenBuffer::Swap()
    ReadBuffer::Attach()
    BeginTransaction()
    EndEmptyTransaction()
    NeedsPaint()
    QOpenGLWebPage::onDrawOverlay()

Many of the breakpoints on these methods are triggered at other points in the browsing process, but if this happens I've just been continuing execution until the point at which I manually switch to private browsing. I get the same output and the same hang as when there's no breakpoint, like this:

    Thread 39 "Compositor" hit Breakpoint 1, mozilla::layers::
        LayerManagerComposite::BeginTransaction (this=0x7ed41a8c20, aURL=...)
        at gfx/layers/composite/LayerManagerComposite.cpp:232
    232     bool LayerManagerComposite::BeginTransaction(const nsCString& aURL) {
    (gdb) c
    Continuing.
    [D] setBrowserCover:20 - model: PrivateTabModel(0x7fd800da50)
    [D] setBrowserCover:21 - model.count: 0
    [D] setBrowserCover:23 - Setting cover
    [D] setBrowserCover:25 - Set cover: file:///usr/share/sailfish-browser/cover/
        NoTabsCover.qml
    [D] setBrowserCover:33 - Exiting

There are a few other things I think it's worth mentioning. The hang happens when the cover is set, but not when it's cleared. If the cover is set right at the start and left as it is (so it's never set to null), everything runs fine. So it very much seems to be the act of switching from null to non-null that causes the problem.
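Since the distinction that matters seems to be which way the cover value changes, one option would be to label each change in the debug output. The following is only a sketch: classifyCoverChange() is a hypothetical helper that doesn't exist in the browser code, and the comments just restate the observations above rather than anything confirmed about the cause.

```javascript
// Hypothetical helper, not part of sailfish-browser: labels a change of the
// cover value so that log output shows which kind of transition took place.
function classifyCoverChange(previous, next) {
    const hadCover = previous != null;
    const hasCover = next != null;
    if (!hadCover && hasCover) {
        return "set";       // null -> non-null: the case observed to hang
    }
    if (hadCover && !hasCover) {
        return "cleared";   // non-null -> null: observed to be fine
    }
    if (hadCover && hasCover && previous !== next) {
        return "replaced";  // one cover swapped for a different one
    }
    return "unchanged";     // same value assigned again
}

console.log(classifyCoverChange(null, "cover/NoTabsCover.qml")); // set
console.log(classifyCoverChange("cover/NoTabsCover.qml", null)); // cleared
```

Logging the label alongside the existing "Set cover" output would make it easier to check whether every hang really does follow a "set" transition.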
Having not managed to find any methods that are fired between the cover being set and the hang occurring, I got frustrated and went for a walk outside. We have a lake nearby that's beautifully calm at this time of year. The air is warm and calm without being oppressive, which makes going for a walk a great way for me to clear my thoughts and come back feeling calmer.
I didn't have any revelations while walking, but I did think about whether I can approach this from a different angle. Rather than trying to find the gecko methods that are causing the problem by seeing if they're being used, what if I were to try to disable gecko functionality in the hope that the hang might suddenly vanish?
If the hang goes away with a particular piece of functionality disabled, then it may indicate some kind of clash between the cover change and the disabled functionality.
So I've tried a whole bunch of things, for example, setting it so that the page is always inactive by forcing the state to be always set to false:

     void QOpenGLWebPage::setActive(bool active)
     {
    +    active = false;
         // WebPage is in inactive state until the view is initialized.
         // ::processViewInitialization always forces active state so we
         // can just ignore early activation calls.
         if (!d || !d->mViewInitialized)
             return;

         if (d->mActive != active) {
             d->mActive = active;
             d->mView->SetIsActive(d->mActive);
             Q_EMIT activeChanged();
         }
     }

I also tried disabling the initialisation code:

     void QOpenGLWebPage::initialize()
     {
    -    d->createView();
     }

Plus a whole bunch of other similar things, from disconnecting various signals to preventing the EGL Display from being initialised. Many of these changes prevented rendering, but none of them prevented the hang.
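The elimination approach described above amounts to a simple loop: disable one thing at a time and note whether the hang still reproduces. Here's a rough sketch of that bookkeeping in JavaScript. Both findSuspects() and the stillHangs callback are hypothetical names of my own; in practice each check means rebuilding with the change applied and testing by hand on the device, so this only illustrates the logic.

```javascript
// Hypothetical sketch of the elimination strategy: disable one piece of
// functionality at a time and record whether the hang still reproduces.
// stillHangs is an assumed callback that reports the result of one manual test.
function findSuspects(features, stillHangs) {
    const suspects = [];
    for (const feature of features) {
        if (!stillHangs(feature)) {
            // The hang disappeared with this feature disabled, so it may
            // be clashing with the cover change.
            suspects.push(feature);
        }
    }
    return suspects;
}

// The outcome so far: every change tried still left the hang in place.
const tried = [
    "page active state",
    "view initialisation",
    "EGL display initialisation"
];
console.log(findSuspects(tried, () => true)); // []
```

An empty result is what I have right now; a non-empty one would point to the functionality worth digging into next.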
I don't have an answer for why this is happening, but I'll persevere with it. As with everything computer-related, there is definitely an answer; it's just a case of finding it.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.