flypig.co.uk

List items

Items from the current list are shown below.

Blog

29 Dec 2023 : Day 122 #
Over the last couple of days the plan had been to wrap up creating a QSortFilterProxyModel to hide the "hidden" tabs, stick it in the QML and call it a good 'un. But while doing that I discovered that switching from persistent to private tab mode causes the user interface thread of the browser to hang.

It's turning out to be quite nasty because, since there's no crash, the debugger won't tell us where the problem is. Looking through the code carefully hasn't revealed anything obvious either.

So to try to figure out what's going on I have to wait until the hang occurs, manually break the execution (using Ctrl-c) and then check each of the threads to find out what they're up to.

First of all, let's find out what threads are running.
(gdb) c
Continuing.
[Parent 23887: Unnamed thread 7f88002670]: E/EmbedLite NON_IMPL:
    EmbedLite::virtual nsresult WebBrowserChrome::GetDimensions(uint32_t,
    int32_t*, int32_t*, int32_t*, int32_t*):542 GetView dimensitions
[New LWP 26091]
^C
Thread 1 "sailfish-browse" received signal SIGINT, Interrupt.
0x0000007fb7bcf718 in pthread_cond_wait () from /lib64/libpthread.so.0
(gdb) info thread
  Id   Target Id                   Frame 
* 1    LWP 23887 "sailfish-browse" 0x0000007fb7bcf718 in pthread_cond_wait ()
  2    LWP 24132 "QQmlThread"      0x0000007fb78a8740 in poll ()
  3    LWP 24133 "QDBusConnection" 0x0000007fb78a8740 in poll ()
  4    LWP 24134 "gmain"           0x0000007fb78a8740 in poll ()
  5    LWP 24135 "dconf worker"    0x0000007fb78a8740 in poll ()
  6    LWP 24137 "gdbus"           0x0000007fb78a8740 in poll ()
  7    LWP 24147 "QThread"         0x0000007fb78a8740 in poll ()
  8    LWP 24149 "GeckoWorkerThre" StringMatch (text=0x7f9df3bf40,
                                   pat=0x7f9e0400c0, start=start@entry=0)
                                   at js/src/builtin/String.cpp:1944
  10   LWP 24151 "IPC I/O Parent"  0x0000007fb78ade24 in syscall ()
  11   LWP 24152 "QSGRenderThread" 0x0000007fb7bcf718 in pthread_cond_wait ()
  12   LWP 24153 "Netlink Monitor" 0x0000007fb78a8740 in poll ()
  13   LWP 24154 "Socket Thread"   0x0000007fb78a8740 in poll ()
  15   LWP 24156 "TaskCon~read #0" 0x0000007fb7bcf718 in pthread_cond_wait ()
  16   LWP 24157 "TaskCon~read #1" 0x0000007fb7bcf718 in pthread_cond_wait ()
  17   LWP 24158 "TaskCon~read #2" 0x0000007fb7bcf718 in pthread_cond_wait ()
  18   LWP 24159 "TaskCon~read #3" 0x0000007fb7bcf718 in pthread_cond_wait ()
  19   LWP 24160 "TaskCon~read #4" 0x0000007fb7bcf718 in pthread_cond_wait ()
  20   LWP 24161 "TaskCon~read #5" 0x0000007fb7bcf718 in pthread_cond_wait ()
  21   LWP 24162 "TaskCon~read #6" 0x0000007fb7bcf718 in pthread_cond_wait ()
  22   LWP 24163 "TaskCon~read #7" 0x0000007fb7bcf718 in pthread_cond_wait ()
  24   LWP 24165 "Timer"           0x0000007fb7bcfb80 in pthread_cond_timedwait ()
  25   LWP 24167 "IPDL Background" 0x0000007fb7bcf718 in pthread_cond_wait ()
  26   LWP 24168 "Cache2 I/O"      0x0000007fb7bcf718 in pthread_cond_wait ()
  27   LWP 24169 "Cookie"          0x0000007fb7bcf718 in pthread_cond_wait ()
  32   LWP 24174 "Worker Launcher" 0x0000007fb7bcf718 in pthread_cond_wait ()
  33   LWP 24175 "QuotaManager IO" 0x0000007fb7bcf718 in pthread_cond_wait ()
  35   LWP 24177 "Softwar~cThread" 0x0000007fb7bcfb80 in pthread_cond_timedwait ()
  36   LWP 24178 "Compositor"      0x0000007fb7bcf718 in pthread_cond_wait ()
  37   LWP 24179 "ImageIO"         0x0000007fb7bcf718 in pthread_cond_wait ()
  38   LWP 24181 "DOM Worker"      0x0000007fb7bcf718 in pthread_cond_wait ()
  40   LWP 24183 "ImageBridgeChld" 0x0000007fb7bcf718 in pthread_cond_wait ()
  42   LWP 24185 "Permission"      0x0000007fb7bcf718 in pthread_cond_wait ()
  43   LWP 24186 "TRR Background"  0x0000007fb7bcf718 in pthread_cond_wait ()
  44   LWP 24187 "URL Classifier"  0x0000007fb7bcf718 in pthread_cond_wait ()
  48   LWP 24191 "ProxyResolution" 0x0000007fb7bcf718 in pthread_cond_wait ()
  49   LWP 24193 "mozStorage #1"   0x0000007fb7bcf718 in pthread_cond_wait ()
  50   LWP 24195 "HTML5 Parser"    0x0000007fb7bcf718 in pthread_cond_wait ()
  51   LWP 24196 "localStorage DB" 0x0000007fb7bcf718 in pthread_cond_wait ()
  53   LWP 24198 "StyleThread#0"   0x0000007fb7bcf718 in pthread_cond_wait ()
  54   LWP 24199 "StyleThread#1"   0x0000007fb7bcf718 in pthread_cond_wait ()
  55   LWP 24200 "StyleThread#2"   0x0000007fb7bcf718 in pthread_cond_wait ()
  56   LWP 24202 "StyleThread#3"   0x0000007fb7bcf718 in pthread_cond_wait ()
  57   LWP 24203 "StyleThread#4"   0x0000007fb7bcf718 in pthread_cond_wait ()
  58   LWP 24204 "StyleThread#5"   0x0000007fb7bcf718 in pthread_cond_wait ()
  65   LWP 24293 "mozStorage #2"   0x0000007fb7bcf718 in pthread_cond_wait ()
  66   LWP 24294 "mozStorage #3"   0x0000007fb7bcf718 in pthread_cond_wait ()
  69   LWP 24942 "Backgro~Pool #5" 0x0000007fb7bcfb80 in pthread_cond_timedwait ()
  70   LWP 26091 "QSGRenderThread" 0x0000007fb78a8740 in poll ()
That's rather a lot of threads to check through, but nonetheless it's still likely to be our most fruitful approach right now. We can ask the debugger to print a backtrace for every single thread by calling thread apply all bt.

I won't copy out all of the resulting output here. Instead I'll include just the interesting backtraces and summarise the others.
(gdb) thread apply all bt

Thread 70 (LWP 26091):
#0  0x0000007fb78a8740 in poll () from /lib64/libc.so.6
#1  0x0000007fafb38bfc in ?? () from /usr/lib64/libwayland-client.so.0
#2  0x0000007fafb3a258 in wl_display_dispatch_queue ()
    from /usr/lib64/libwayland-client.so.0
#3  0x0000007faf885204 in WaylandNativeWindow::readQueue(bool) ()
    from /usr/lib64/libhybris//eglplatform_wayland.so
#4  0x0000007faf8843ec in WaylandNativeWindow::finishSwap() ()
    from /usr/lib64/libhybris//eglplatform_wayland.so
#5  0x0000007fb73f9210 in _my_eglSwapBuffersWithDamageEXT ()
    from /usr/lib64/libEGL.so.1
#6  0x0000007fafa4e080 in ?? () from
    /usr/lib64/qt5/plugins/wayland-graphics-integration-client/libwayland-egl.so
#7  0x0000007fb88e5180 in QOpenGLContext::swapBuffers(QSurface*) ()
    from /usr/lib64/libQt5Gui.so.5
#8  0x0000007fb8e64c68 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fb8e6ac10 in ?? () from /usr/lib64/libQt5Quick.so.5
#10 0x0000007fb7ce20e8 in ?? () from /usr/lib64/libQt5Core.so.5
#11 0x0000007fb7bc8a4c in ?? () from /lib64/libpthread.so.0
#12 0x0000007fb78b289c in ?? () from /lib64/libc.so.6

[...]

Thread 11 (LWP 24152):
#0  0x0000007fb7bcf718 in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x0000007fb7ce2924 in QWaitCondition::wait(QMutex*, unsigned long) ()
    from /usr/lib64/libQt5Core.so.5
#2  0x0000007fb8e6a7cc in ?? () from /usr/lib64/libQt5Quick.so.5
#3  0x0000007fb8e6ac60 in ?? () from /usr/lib64/libQt5Quick.so.5
#4  0x0000007fb7ce20e8 in ?? () from /usr/lib64/libQt5Core.so.5
#5  0x0000007fb7bc8a4c in ?? () from /lib64/libpthread.so.0
#6  0x0000007fb78b289c in ?? () from /lib64/libc.so.6

Thread 10 (LWP 24151):
#0  0x0000007fb78ade24 in syscall () from /lib64/libc.so.6
#1  0x0000007fba03cdd0 in epoll_wait (epfd=<optimized out>,
    events=events@entry=0x7f8c0018a0, maxevents=<optimized out>,
    timeout=timeout@entry=-1)
    at ipc/chromium/src/third_party/libevent/epoll_sub.c:62
#2  0x0000007fba03f7a0 in epoll_dispatch (base=0x7f8c0015e0, tv=<optimized out>)
    at ipc/chromium/src/third_party/libevent/epoll.c:462
#3  0x0000007fba041568 in event_base_loop (base=0x7f8c0015e0,
    flags=flags@entry=1) at ipc/chromium/src/third_party/libevent/event.c:1947
#4  0x0000007fba01e248 in base::MessagePumpLibevent::Run (this=0x7f8c001560,
    delegate=0x7f9eeb9de0) at ipc/chromium/src/base/message_pump_libevent.cc:346
#5  0x0000007fba01f5bc in MessageLoop::RunInternal (this=this@entry=0x7f9eeb9de0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#6  0x0000007fba01f800 in MessageLoop::RunHandler (this=0x7f9eeb9de0)
    at ipc/chromium/src/base/message_loop.cc:352
#7  MessageLoop::Run (this=this@entry=0x7f9eeb9de0)
    at ipc/chromium/src/base/message_loop.cc:334
#8  0x0000007fba0336d8 in base::Thread::ThreadMain (this=0x7f88031040)
    at ipc/chromium/src/base/thread.cc:187
#9  0x0000007fba01dba0 in ThreadFunc (closure=<optimized out>)
    at ipc/chromium/src/base/platform_thread_posix.cc:40
#10 0x0000007fb7bc8a4c in ?? () from /lib64/libpthread.so.0
#11 0x0000007fb78b289c in ?? () from /lib64/libc.so.6

Thread 8 (LWP 24149):
#0  StringMatch (text=0x7f9df3bf40, pat=0x7f9e0400c0, start=start@entry=0)
    at js/src/builtin/String.cpp:1944
#1  0x0000007fbcc68b0c in js::str_indexOf (cx=<optimized out>,
    argc=<optimized out>, vp=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/js/CallArgs.h:245
#2  0x0000007f00708994 in ?? ()
#3  0x0000007fbd165538 in js::jit::MaybeEnterJit (cx=0xc021, state=...)
    at js/src/jit/Jit.cpp:207
#4  0x0000007fbd165538 in js::jit::MaybeEnterJit (cx=0x7f881cd720, state=...)
    at js/src/jit/Jit.cpp:207
#5  0x0000007ef4101401 in ?? ()
Backtrace stopped: Cannot access memory at address 0x84041a32f7d6

[...]
(gdb)
Of the other threads 35 are in a pthread_cond_wait() state and three are in a pthread_cond_timedwait() state. Nine are in a poll() state.

That leaves thread 8 and thread 10 in slightly different states, but in practice their backtraces still don't tell us anything particularly interesting.

The one that looks most interesting to me is thread 70. It looks to be a rendering thread, but thread 11 is already there as a rendering thread and during normal operation I'd only expect to see one of these.

Thread 70 is doing something to do with EGL and wayland.

Unfortunately this still doesn't give me anything actionable. So I'm going to try something else.

The hang occurs when we switch to private browsing mode. So let's take a look at the code that happens as a result of pressing the button to do this in the user interface. This can be found in declarativewebcontainer.cpp and looks like this:
void DeclarativeWebContainer::updateMode()
{
    setTabModel((BrowserApp::captivePortal() || m_privateMode)
        ? m_privateTabModel.data() : m_persistentTabModel.data());
    emit tabIdChanged();

    // Reload active tab from new mode
    if (m_model->count() > 0) {
        reload(false);
    } else {
        setWebPage(NULL);
        emit contentItemChanged();
    }
}
By commenting out different bits of the code and rebuilding I should be able to narrow the issue down.

And it turns out that here it's the call to setTabModel() that triggers the hang. Comment this line out and everything continues without hanging. Of course the functionality isn't all working (it doesn't correctly switch to private browsing mode), but this does at least get us somewhere to start. Let's dig deeper into this setTabModel() method. The implementation for this method looks like this:
void DeclarativeWebContainer::setTabModel(DeclarativeTabModel *model)
{
    if (m_model != model) {
        int oldCount = 0;
        if (m_model) {
            disconnect(m_model, 0, 0, 0);
            oldCount = m_model->count();
        }

        m_model = model;
        int newCount = 0;
        if (m_model) {
            connect(m_model.data(), &DeclarativeTabModel::activeTabChanged,
                    this, &DeclarativeWebContainer::onActiveTabChanged);
            connect(m_model.data(), &DeclarativeTabModel::activeTabChanged,
                    this, &DeclarativeWebContainer::tabIdChanged);
            connect(m_model.data(), &DeclarativeTabModel::loadedChanged,
                    this, &DeclarativeWebContainer::initialize);
            connect(m_model.data(), &DeclarativeTabModel::tabClosed,
                    this, &DeclarativeWebContainer::releasePage);
            connect(m_model.data(), &DeclarativeTabModel::newTabRequested,
                    this, &DeclarativeWebContainer::onNewTabRequested);
            newCount = m_model->count();
        }
        emit tabModelChanged();
        if (m_model && oldCount != newCount) {
            emit m_model->countChanged();
        }
    }
}
Again, by commenting out different parts of the code it should be possible to narrow down what's causing the problem. And indeed in this case the only line that consistently causes the hang to happen is the following:
        emit tabModelChanged();
I can leave all of the other lines in and the hang dissipates, but keep this line in, even with the others removed, and the hang returns.

So that means there's something hooked into this signal that's causing the hang. There's only one connection made to this in the C++ code and commenting that out doesn't make any difference. So the problem must be in the QML code.

Unfortunately there are many different bits of code that hang off this signal. I've commented large chunks of code related to this signal out to see whether skipping them prevents the hang. Most seem to have no effect on the outcome. But if I comment out enough stuff the hang no longer occurs.

Now I'm working backwards adding code back in until the hang returns. It's laborious, but at least guaranteed to move things forwards one step at a time.

Eventually after a lot of trial an error I've been able to narrow down the problem to this small snippet of code in BrowserPage.qml:
    // Use Connections so that target updates when model changes.
    Connections {
        target: AccessPolicy.browserEnabled && webView
                && webView.tabModel || null
        ignoreUnknownSignals: true
        // Animate overlay to top if needed.
        onCountChanged: {
            if (webView.tabModel.count === 0) {
                webView.handleModelChanges(false)
            }
            window.setBrowserCover(webView.tabModel)
        }
    }
Comment this code out and everything seems to work okay. At least, everything except the cover preview. In particular, the proxy filter model doesn't seem to be causing any issues.

With a bit more trial and error I'm able to narrow it down even further, to just this line:
            window.setBrowserCover(webView.tabModel)
With this line commented out things are looking up: the print to PDF functionality works nicely; there are no extraneous tabs added during printing; there's just the slightest of flickers when the printing starts; but crucially there are no other obvious issues and no hanging.

I'm really happy with this. It means that the code for getting the print to PDF functionality working is now all there. It'll be important to deal with the hang, but it's actually unrelated to these changes.

Given all this, that leaves me three things to deal with:
  1. Check whether the flicker can be removed by skipping the activation of the hidden window.
  2. Tidy up the QML implementation of the new proxy filter model.
  3. Move the DownloadPDFSaver class from DownloadCore.js to EmbedliteDownloadManager.js.
  4. Record an issue for the setBrowserCover() hang we were just looking at.
That's still quite a lot to do. But at least I'm happy that everything is now under control. What's now clear is that it'll definitely be possible to restore the PDF printing functionality, all I really have to do now is clean up the implementation. Fixing the hang can go in a separate issue.

Today I'm happy: it's been a productive day, we got to the bottom of the hang, and all of the printing changes are coming together into something usable. Now it's time for bed and hopefully a very sound sleep as a result.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.

Comments

Uncover Disqus comments