Gecko-dev Diary
Starting in August 2023 I'll be upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91. This page catalogues my progress.
Latest code changes are in the gecko-dev sailfishos-esr91 branch.
There is an index of all posts in case you want to jump to a particular day.
16 Aug 2024 : Day 321
I've been working my way through the last few issues with the browser. The latest has been to get DuckDuckGo rendering correctly. Ever since the SecFetch header changes back in January the page has been working; it's just been too wide for the screen. It turns out to be expected behaviour: the server thinks we're a desktop browser and intentionally serves us a wide page. Changing the user agent string so we look like Android fixed the issue.
Yesterday I just hacked around with the ua-update.json file in my profile on my device to get this working. Today I need to make the change permanent by adding it to ua-update.json.in in the sailfish-browser repository.
The reason the issue with the OpenAI community pages came to light is that Raine mentioned it to me. Alongside that page, he also asked about two other pages, discussion.fedoraproject.org and help.nextcloud.com. All three use the latest Discourse forum software and all three share a similar issue: they render too wide for the screen, at least if we send them our correct user agent string.
If we pretend to be Android on the other hand we get served slightly different code, with the result that they then render really quite cleanly.
Discourse must be doing some kind of user-agent checking on the server and serving pages that differ depending on the browser the server thinks is in use. It's an ugly thing to do, but not at all uncommon.
The problem with Discourse using this approach is especially important when it comes to Sailfish OS because the Sailfish OS Forum uses Discourse for its backend as well. Currently the version is being held back so that it continues to work correctly with ESR 78 on Sailfish OS devices. Once ESR 91 is released, it should also then be safe for Jolla to upgrade the Sailfish OS Forum Discourse version as well.
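To make the idea of a per-site override a little more concrete, here's a minimal Python sketch of how a table of domains and user agent strings could be consulted for a given host. The domains and the user agent string are the ones from this entry. The matching rule (try the full host, then each parent domain in turn) and the names find_override and OVERRIDES are my own assumptions for illustration; I haven't checked this against how the browser actually performs the lookup.

```python
from typing import Dict, Optional

# User agent string and domains taken from this diary entry.
MOBILE_UA = "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0"
OVERRIDES: Dict[str, str] = {
    "duckduckgo.com": MOBILE_UA,
    "openai.com": MOBILE_UA,
    "fedoraproject.org": MOBILE_UA,
    "nextcloud.com": MOBILE_UA,
}


def find_override(host: str, overrides: Dict[str, str]) -> Optional[str]:
    """Return the override for host or its nearest parent domain, else None.

    Hypothetical helper: the suffix-walking rule is an assumption, not a
    description of the real lookup code.
    """
    labels = host.lower().rstrip(".").split(".")
    # Try the full host first, then drop one leading label at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in overrides:
            return overrides[candidate]
    return None


if __name__ == "__main__":
    for host in ("community.openai.com", "forum.sailfishos.org"):
        print(host, "->", find_override(host, OVERRIDES))
```

Matching on whole labels rather than on a raw string suffix means that an entry for openai.com would cover community.openai.com without also catching an unrelated host that merely happens to end in the same characters.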
That's just a bit of background. The task now is to cement all of these fixes by adding them to the ua-update.json.in file in the sailfish-browser repository. Here's what I've added:

    "duckduckgo.com": "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0",
    "openai.com": "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0",
    "fedoraproject.org": "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0",
    "nextcloud.com": "Mozilla/5.0 (Mobile; rv:91.0) Gecko/91.0 Firefox/91.0"

With that out of the way I'm left with the task of getting the browser to work on Sailfish OS 4.6. The reports I've received are that it fails due to a potential double-free on some Wayland-related resource. It's possible the changes made to fix video rendering by avoiding duplicate calls to eglTerminate() may have addressed this. But I have another concern as well, which is that the hang caused by switching to the default cover that we were investigating on Day 296 may share the same underlying cause.
I'm going to test this now. Unfortunately when I try activating the default cover I still get the same hang, even after fixing the video crash bug. So there's still something to address here and, since fixing this may be what's needed to get the browser working on Sailfish OS 4.6, I think I'll need to return to it.
I've the same difficulty as before, which is that it's a hang, not a crash. So getting a clean backtrace is proving to be a real challenge. There are also so many threads running, almost all of which appear to be stalled, making it hard to discern which is the one blocking progress:
1 "sailfish-browse" pthread_cond_wait () from /lib64/libpthread.so.0
2 "QQmlThread" poll () from /lib64/libc.so.6
3 "QDBusConnection" poll () from /lib64/libc.so.6
4 "gmain" poll () from /lib64/libc.so.6
5 "dconf worker" poll () from /lib64/libc.so.6
6 "gdbus" poll () from /lib64/libc.so.6
7 "QThread" poll () from /lib64/libc.so.6
8 "QQuickPixmapRea" poll () from /lib64/libc.so.6
9 "Qt bearer threa" poll () from /lib64/libc.so.6
10 "GeckoWorkerThre" poll () from /lib64/libc.so.6
12 "QSGRenderThread" pthread_cond_wait () from /lib64/libpthread.so.0
14 "IPC I/O Parent" syscall () from /lib64/libc.so.6
15 "Netlink Monitor" poll () from /lib64/libc.so.6
16 "Socket Thread" poll () from /lib64/libc.so.6
18 "TaskCon~read #0" pthread_cond_wait () from /lib64/libpthread.so.0
19 "TaskCon~read #1" pthread_cond_wait () from /lib64/libpthread.so.0
20 "TaskCon~read #2" pthread_cond_wait () from /lib64/libpthread.so.0
21 "TaskCon~read #3" pthread_cond_wait () from /lib64/libpthread.so.0
22 "TaskCon~read #4" pthread_cond_wait () from /lib64/libpthread.so.0
23 "TaskCon~read #5" pthread_cond_wait () from /lib64/libpthread.so.0
24 "TaskCon~read #6" pthread_cond_wait () from /lib64/libpthread.so.0
25 "TaskCon~read #7" pthread_cond_wait () from /lib64/libpthread.so.0
27 "Timer" pthread_cond_timedwait () from /lib64/libpthread.so.0
28 "IPDL Background" pthread_cond_wait () from /lib64/libpthread.so.0
29 "Cache2 I/O" pthread_cond_wait () from /lib64/libpthread.so.0
30 "Cookie" pthread_cond_wait () from /lib64/libpthread.so.0
32 "StreamTrans #1" pthread_cond_timedwait () from /lib64/libpthread.so.0
35 "Worker Launcher" pthread_cond_wait () from /lib64/libpthread.so.0
36 "QuotaManager IO" pthread_cond_wait () from /lib64/libpthread.so.0
38 "Softwar~cThread" pthread_cond_wait () from /lib64/libpthread.so.0
39 "Compositor" pthread_cond_wait () from /lib64/libpthread.so.0
40 "ImageIO" pthread_cond_wait () from /lib64/libpthread.so.0
41 "ImageBridgeChld" pthread_cond_wait () from /lib64/libpthread.so.0
44 "Permission" pthread_cond_wait () from /lib64/libpthread.so.0
45 "TRR Background" pthread_cond_wait () from /lib64/libpthread.so.0
46 "URL Classifier" pthread_cond_wait () from /lib64/libpthread.so.0
47 "DNS Resolver #1" pthread_cond_timedwait () from /lib64/libpthread.so.0
48 "DNS Resolver #2" pthread_cond_timedwait () from /lib64/libpthread.so.0
49 "DNS Resolver #3" pthread_cond_timedwait () from /lib64/libpthread.so.0
50 "ProxyResolution" pthread_cond_wait () from /lib64/libpthread.so.0
51 "DNS Resolver #4" pthread_cond_timedwait () from /lib64/libpthread.so.0
52 "DNS Resolver #5" pthread_cond_timedwait () from /lib64/libpthread.so.0
56 "HTML5 Parser" pthread_cond_wait () from /lib64/libpthread.so.0
57 "localStorage DB" pthread_cond_wait () from /lib64/libpthread.so.0
58 "StyleThread#0" pthread_cond_wait () from /lib64/libpthread.so.0
59 "StyleThread#1" pthread_cond_wait () from /lib64/libpthread.so.0
60 "StyleThread#2" pthread_cond_wait () from /lib64/libpthread.so.0
61 "StyleThread#3" pthread_cond_wait () from /lib64/libpthread.so.0
62 "StyleThread#4" pthread_cond_wait () from /lib64/libpthread.so.0
63 "StyleThread#5" pthread_cond_wait () from /lib64/libpthread.so.0
64 "Compositor" ?? ()
65 "Compositor" ?? ()
68 "Backgro~Pool #3" pthread_cond_timedwait () from /lib64/libpthread.so.0
71 "mozStorage #1" pthread_cond_wait () from /lib64/libpthread.so.0
72 "mozStorage #2" pthread_cond_wait () from /lib64/libpthread.so.0
73 "BgIOThr~Pool #1" pthread_cond_timedwait () from /lib64/libpthread.so.0
74 "QSGRenderThread" poll () from /lib64/libc.so.6

So I'm having to make some guesses here, educated or otherwise. My guess is that the issue is something to do with rendering, so I've placed breakpoints on a few crucial methods that participate in the rendering process:
(gdb) break Paint
Breakpoint 1 at 0x7ff23bc57c: Paint. (87 locations)
(gdb) break OnFirstPaint
Breakpoint 2 at 0x7ff4c3fcf4: OnFirstPaint. (4 locations)
(gdb) break SchedulePaint
Breakpoint 3 at 0x7ff3ba92e8: SchedulePaint. (3 locations)

I've disabled all of these breakpoints and run the browser up until I'm about to switch to Private browsing mode. It's at this point that the blank cover is attached and the browser hangs. I've now re-enabled the breakpoints and pressed the button to switch browser mode.
The hang happens, but none of the breakpoints hit. Not immediately at least. After maybe a few seconds the SchedulePaint() method is called:
Thread 10 "GeckoWorkerThre" hit Breakpoint 3, nsIFrame::SchedulePaint (this=this@entry=0x7e5c02db20, aType=aType@entry=nsIFrame::PAINT_DEFAULT, aFrameChanged=aFrameChanged@entry=true)
    at ${PROJECT}/gecko-dev/layout/generic/nsIFrame.cpp:7415
7415 ${PROJECT}/gecko-dev/layout/generic/nsIFrame.cpp: No such file or directory.
(gdb) bt
#0 nsIFrame::SchedulePaint (this=this@entry=0x7e5c02db20, aType=aType@entry=nsIFrame::PAINT_DEFAULT, aFrameChanged=aFrameChanged@entry=true)
    at ${PROJECT}/gecko-dev/layout/generic/nsIFrame.cpp:7415
#1 0x0000007ff41029cc in nsIFrame::SetParent (this=this@entry=0x7e5c02db20, aParent=aParent@entry=0x7e5c02da80)
    at ${PROJECT}/gecko-dev/layout/generic/nsIFrame.cpp:11047
#2 0x0000007ff4102a34 in nsFrameList::ApplySetParent (this=this@entry=0x7fde775e08, aParent=aParent@entry=0x7e5c02da80)
    at ${PROJECT}/gecko-dev/layout/generic/nsFrameList.cpp:280
#3 0x0000007ff4102a70 in nsFrameList::InsertFrames (this=this@entry=0x7e5c02db08, aParent=aParent@entry=0x7e5c02da80, aPrevSibling=aPrevSibling@entry=0x0, aFrameList=...)
    at ${PROJECT}/gecko-dev/layout/generic/nsFrameList.cpp:130
#4 0x0000007ff4092abc in nsContainerFrame::InsertFrames (this=0x7e5c02da80, aListID=mozilla::layout::kPrincipalList, aPrevFrame=0x0, aPrevFrameLine=<optimized out>, aFrameList=...)
    at ${PROJECT}/gecko-dev/layout/generic/nsContainerFrame.cpp:144
#5 0x0000007ff402e164 in nsFrameManager::InsertFrames (this=this@entry=0x7fb9189c40, aParentFrame=aParentFrame@entry=0x7e5c02da80, aListID=aListID@entry=mozilla::layout::kPrincipalList, aPrevFrame=aPrevFrame@entry=0x0, aFrameList=...)
    at ${PROJECT}/gecko-dev/layout/base/nsFrameManager.cpp:89
#6 0x0000007ff4045a0c in nsCSSFrameConstructor::AppendFramesToParent (this=this@entry=0x7fb9189c40, aState=..., aParentFrame=0x7e5c02da80, aFrameList=..., aPrevSibling=aPrevSibling@entry=0x0, aIsRecursiveCall=aIsRecursiveCall@entry=false)
    at ${PROJECT}/gecko-dev/layout/base/nsCSSFrameConstructor.cpp:5933
#7 0x0000007ff404f1f0 in nsCSSFrameConstructor::ContentAppended (this=this@entry=0x7fb9189c40, aFirstNewContent=aFirstNewContent@entry=0x7fb8d0ef00, aInsertionKind=aInsertionKind@entry=nsCSSFrameConstructor::InsertionKind::Sync)
    at ${PROJECT}/gecko-dev/layout/base/nsCSSFrameConstructor.cpp:6819
#8 0x0000007ff401c718 in mozilla::RestyleManager::ProcessRestyledFrames (this=this@entry=0x7fb9189d30, aChangeList=...)
    at ${PROJECT}/gecko-dev/layout/base/RestyleManager.cpp:1402
#9 0x0000007ff401d364 in mozilla::RestyleManager::DoProcessPendingRestyles (this=0x7fb9189d30, aFlags=aFlags@entry=mozilla::ServoTraversalFlags::Empty)
    at ${PROJECT}/gecko-dev/layout/base/RestyleManager.cpp:3051
#10 0x0000007ff401d8cc in mozilla::RestyleManager::ProcessPendingRestyles (this=<optimized out>)
    at ${PROJECT}/gecko-dev/layout/base/RestyleManager.cpp:3130
#11 0x0000007ff401df7c in mozilla::PresShell::DoFlushPendingNotifications (this=0x7fb918d9c0, aFlush=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#12 0x0000007ff3fd8758 in mozilla::PresShell::FlushPendingNotifications (this=this@entry=0x7fb918d9c0, aType=..., aType@entry=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/PresShell.h:1414
#13 0x0000007ff3fe4850 in nsRefreshDriver::Tick (this=0x7fb916b650, aId=..., aNowTime=..., aIsExtraTick=aIsExtraTick@entry=nsRefreshDriver::IsExtraTick::No)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/FlushType.h:61
#14 0x0000007ff3fe5abc in nsRefreshDriver::<lambda()>::operator() (__closure=0x7eec002780)
    at ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.cpp:234
#15 mozilla::detail::RunnableFunction<nsRefreshDriver::EnsureTimerStarted(nsRefreshDriver::EnsureTimerStartedFlags)::<lambda()> >::Run(void) (this=0x7eec002770)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:532
#16 0x0000007ff19948d4 in mozilla::RunnableTask::Run (this=0x7f24092130)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
[...]
#38 0x0000007fef53689c in ?? () from /lib64/libc.so.6
(gdb)

This is followed by a further seven or so calls to SchedulePaint() in quick succession. After disabling the SchedulePaint() breakpoint this is followed by multiple hits to the Paint() method. Finally I disable the Paint() breakpoint and now there's no more obvious activity occurring.
I'm not sure what this all tells me. I think it may be that the render code is the wrong place to be looking, but I'm not fully certain. What I am certain about is that fixing this issue is going to take a whole lot of trial, error and luck.
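One thing that might take a little of the luck out of it is to tally what each thread in the listing above is sitting in, so that the odd ones out are easier to spot. Here's a minimal Python sketch that does this for text in the same shape as the thread listing (thread number, quoted name, function name followed by a pair of brackets). The regular expression and the names tally_threads and threads_in are hypothetical helpers of my own, and the sample string is just a few entries copied from the listing; it's an aid for reading the output, not a way of identifying the blocked thread.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches entries such as: 10 "GeckoWorkerThre" poll () from /lib64/libc.so.6
# Groups: thread number, thread name, function the thread is currently in.
THREAD_RE = re.compile(r'(\d+)\s+"([^"]+)"\s+(\S+) \(\)')


def tally_threads(listing: str) -> Counter:
    """Count how many threads are sitting in each function."""
    return Counter(m.group(3) for m in THREAD_RE.finditer(listing))


def threads_in(listing: str, function: str) -> List[Tuple[int, str]]:
    """Return (number, name) for every thread currently in the given function."""
    return [
        (int(m.group(1)), m.group(2))
        for m in THREAD_RE.finditer(listing)
        if m.group(3) == function
    ]


if __name__ == "__main__":
    # A few entries copied from the listing in this entry.
    sample = (
        '1 "sailfish-browse" pthread_cond_wait () from /lib64/libpthread.so.0 '
        '10 "GeckoWorkerThre" poll () from /lib64/libc.so.6 '
        '14 "IPC I/O Parent" syscall () from /lib64/libc.so.6 '
        '64 "Compositor" ?? () '
        '65 "Compositor" ?? ()'
    )
    for function, count in tally_threads(sample).most_common():
        print(count, function)
    print(threads_in(sample, "??"))
```

Run over the full listing, the threads in pthread_cond_wait and poll would dominate the tally, leaving the handful in anything else, such as the two Compositor threads with no symbol information, as candidates for a closer look.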
Unfortunately that's all I can manage for today, but I'll return to exploring this issue again tomorrow.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.