List items
Items from the current list are shown below.
Gecko
29 Jul 2024 : Day 303 #
Today I'm coding from Buxton in Derbyshire in a park called "The Slopes". It's bright and the weather is calm, but also cool enough in the shade to be extremely comfortable, so I'm sat on a park bench making the most of it. Truly, this is the way to live! I've spent most of my life, at least since I was at University in the nineties, trying to code in parks; first by connecting to a Nokia feature phone with a Psion; more recently using laptops with screens I could barely read in the sun and Wifi connections that were as flaky as they were slow. Now a quarter of a century later it finally feels like we've reached the point of park-coding viability. This really is progress.
Yesterday I was able to get the ambience colour-mode switching to a stable state. This meant that when in "follow ambience" mode the colour mode changed depending on whether the phone was set to a light or a dark ambience. When it comes to the browser, only the front end is really in a position to listen for ambience changes, since it has the appropriate hooks into Lipstick and can listen on DBus signals. The underlying gecko engine has none of these advantages. The front-end sends notifications to the gecko engine, which then keeps its own variable representing the state of the ambience updated.
Those signals are now working correctly in general, but there's still a glitch which occurs during initialisation. The initial state of the internal value is currently not set correctly. The front end sends a signal to gecko on initialisation so that in theory the engine knows the state of the ambience. This is being sent from the WegEngineSettings class like this:
Perhaps the surprising thing is that this is apparently a new phenomenon for ESR 91. As far as I'm aware there was no such problem with ESR 78. As I write this I have no idea how the ordering, or speed of initialisation, is handled by gecko. So I guess that's what I'll need to look in to now.
For context, it'll be useful to know where the nsLookAndFeel class is being instantiated. Here's the backtrace from the constructor on ESR 91:
On the sender side, there's a method Msg_PEmbedLiteViewConstructor() that's not used to send the message, but it is used to construct an instance of the message. Chances are this will be called just before the message gets sent. So let's stick a breakpoint on that and see what it gives us.
There's one other thing I'd really like to check. The code that calls notifyColorSchemeChanged() in WebEngineSettings::initialize() looks like this:
All of this is useful context and interesting background, but it's not giving us any answers about how to fix the problem. So I figure it might be useful to find out what's happening on ESR 78 for comparison. I've registered the same set of breakpoints that we saw earlier for ESR 91 and have set the app to run in the debugger. Here's what we get:
When I try the debugging again, but this time with a breakpoint on nsLookAndFeel::Observer::Observe() I find that this doesn't fire on initialisation. So maybe it was actually broken on ESR 78 as well? The fact is, when I try it several times, the initial message setting the colour mode is generally going through just fine:
It's worth adding that during debugging the timings are sent completely off course compared to a usual start-up run. The breakpoints stop the flow of execution, add delays that wouldn't have been there otherwise and generally mess things around. These kinds of race conditions are notoriously susceptible to acting differently during debugging.
Anyway, I'll need to find a new way to fix this. The investigation up until now has revealed a problem and shown that there's no immediately obvious solution. I have an idea about what might work, but that will now have to wait until tomorrow.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Yesterday I was able to get the ambience colour-mode switching to a stable state. This meant that when in "follow ambience" mode the colour mode changed depending on whether the phone was set to a light or a dark ambience. When it comes to the browser, only the front end is really in a position to listen for ambience changes, since it has the appropriate hooks into Lipstick and can listen on DBus signals. The underlying gecko engine has none of these advantages. The front-end sends notifications to the gecko engine, which then keeps its own variable representing the state of the ambience updated.
Those signals are now working correctly in general, but there's still a glitch which occurs during initialisation. The initial state of the internal value is currently not set correctly. The front end sends a signal to gecko on initialisation so that in theory the engine knows the state of the ambience. This is being sent from the WegEngineSettings class like this:
void SailfishOS::WebEngineSettings::notifyColorSchemeChanged() { Silica::Theme *silicaTheme = Silica::Theme::instance(); QString scheme = silicaTheme->colorScheme() == Silica::Theme::LightOnDark ? QStringLiteral("dark") : QStringLiteral("light"); SailfishOS::WebEngine::instance()->notifyObservers(QStringLiteral( "ambience-theme-changed"), scheme); }But despite the intent, this signal isn't being picked up by the engine. When we looked at this a couple of days back we determined that the reason was one of timing. The internal gecko theming classes aren't initialised in time and so never receive the notification. Here's an abridged summary of the sequencing, captured using the debugger (see the full output on Day 301 for the full version):
Thread 1 hit Breakpoint 1, WebEngineSettings::notifyColorSchemeChanged () at webenginesettings.cpp:220 Thread 10 hit Breakpoint 3, nsLookAndFeel::nsLookAndFeel () at nsLookAndFeel.cpp:62 Thread 10 hit Breakpoint 2, nsLookAndFeel::Observer::Observe () at nsLookAndFeel.cpp:81What we see is that the notification is sent from the WebEngineSettings class before the nsLookAndFeel or its nested Observer class have been initialised. The nsLookAndFeel class isn't going to pick up a notification until it's been instantiated so we need to somehow get this ordering reversed.
Perhaps the surprising thing is that this is apparently a new phenomenon for ESR 91. As far as I'm aware there was no such problem with ESR 78. As I write this I have no idea how the ordering, or speed of initialisation, is handled by gecko. So I guess that's what I'll need to look in to now.
For context, it'll be useful to know where the nsLookAndFeel class is being instantiated. Here's the backtrace from the constructor on ESR 91:
Thread 10 "GeckoWorkerThre" hit Breakpoint 2, nsLookAndFeel:: nsLookAndFeel (this=0x7fb8ce1030) at ${PROJECT}/gecko-dev/widget/qt/nsLookAndFeel.cpp:62 62 ${PROJECT}/gecko-dev/widget/qt/nsLookAndFeel.cpp: No such file or directory. (gdb) bt #0 nsLookAndFeel::nsLookAndFeel (this=0x7fb8ce1030) at ${PROJECT}/gecko-dev/widget/qt/nsLookAndFeel.cpp:62 #1 0x0000007ff3e8af14 in nsXPLookAndFeel::GetInstance () at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33 #2 0x0000007ff3e8b090 in mozilla::LookAndFeel::GetColor (aId=aId@entry=mozilla: :StyleSystemColor::WindowBackground, aScheme=aScheme@entry=mozilla::LookAndFeel::ColorScheme::Light, aUseStandins=aUseStandins@entry=mozilla::LookAndFeel::UseStandins::No) at ${PROJECT}/gecko-dev/widget/nsXPLookAndFeel.cpp:996 #3 0x0000007ff4a88ba4 in mozilla::LookAndFeel::Color (aDefault=4278190080, aUseStandins=mozilla::LookAndFeel::UseStandins::No, aScheme=mozilla::LookAndFeel::ColorScheme::Light, aId=mozilla:: StyleSystemColor::WindowBackground) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/LookAndFeel.h:472 #4 nsWebBrowser::Create (aContainerWindow=0x7fb8ce7130, aParentWidget=0x7fb8706e90, aBrowsingContext=aBrowsingContext@entry=0x7fb8ce0200, aInitialWindowChild=aInitialWindowChild@entry=0x0) at ${PROJECT}/gecko-dev/toolkit/components/browser/nsWebBrowser.cpp:133 #5 0x0000007ff4c54634 in mozilla::embedlite::EmbedLiteViewChild:: InitGeckoWindow (this=0x7fb8a803b0, parentId=<optimized out>, parentBrowsingContext=0x0, isPrivateWindow=false, isDesktopMode=false, isHidden=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:859 #6 0x0000007ff4c45bbc in mozilla::detail::RunnableMethodArguments<unsigned int const, mozilla::dom::BrowsingContext*, bool const, bool const, bool const>:: applyImpl<mozilla::embedlite::EmbedLiteViewChild, void (mozilla::embedlite:: EmbedLiteViewChild::*)(unsigned int, mozilla::dom::BrowsingContext*, bool, bool, bool), StoreCopyPassByConstLRef<unsigned int const>, StoreRefPtrPassByPtr<mozilla::dom::BrowsingContext>, StoreCopyPassByConstLRef<bool const>, StoreCopyPassByConstLRef<bool const>, StoreCopyPassByConstLRef<bool const>, 0ul, 1ul, 2ul, 3ul, 4ul> (args=..., m=<optimized out>, o=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:280 #7 mozilla::detail::RunnableMethodArguments<unsigned int const, mozilla::dom:: BrowsingContext*, bool const, bool const, bool const>::apply<mozilla:: embedlite::EmbedLiteViewChild, void (mozilla::embedlite::EmbedLiteViewChild: :*)(unsigned int, mozilla::dom::BrowsingContext*, bool, bool, bool)> ( m=<optimized out>, o=<optimized out>, this=<optimized out>) at ${PROJECT}/gecko-dev/xpcom/threads/nsThreadUtils.h:1154 #8 mozilla::detail::RunnableMethodImpl<mozilla::embedlite:: EmbedLiteViewChild*, void (mozilla::embedlite::EmbedLiteViewChild::*)( unsigned int, mozilla::dom::BrowsingContext*, bool, bool, bool), true, ( mozilla::RunnableKind)0, unsigned int const, mozilla::dom:: BrowsingContext*, bool const, bool const, bool const>::Run (this=<optimized out>) at ${PROJECT}/gecko-dev/xpcom/threads/nsThreadUtils.h:1201 #9 0x0000007ff19a4d80 in mozilla::RunnableTask::Run (this=0x7fb8c6c7e0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313 [...] #31 0x0000007fef54989c in ?? () from /lib64/libc.so.6 (gdb)Perhaps the interesting thing is that this is being triggered from EmbedLiteViewChild::InitGeckoWindow(), which is part of the EmbedLite code. That's our code, not upstream's code, so it's unlikely this will have changed substantially. Let's compare this with the same backtrace using ESR 78:
Thread 10 "GeckoWorkerThre" hit Breakpoint 1, nsLookAndFeel:: nsLookAndFeel (this=0x7f8045f4f0) at widget/qt/nsLookAndFeel.cpp:62 62 nsLookAndFeel::nsLookAndFeel() (gdb) bt #0 nsLookAndFeel::nsLookAndFeel (this=0x7f8045f4f0) at widget/qt/ nsLookAndFeel.cpp:62 #1 0x0000007fbbde7098 in nsXPLookAndFeel::GetInstance () at obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33 #2 0x0000007fbbdeca88 in nsXPLookAndFeel::GetInstance () at widget/nsXPLookAndFeel.cpp:239 #3 mozilla::LookAndFeel::GetColor (aID=aID@entry=mozilla::StyleSystemColor:: WindowBackground, aResult=aResult@entry=0x7f8045eed8) at widget/nsXPLookAndFeel.cpp:1022 #4 0x0000007fbc8ffe3c in nsWebBrowser::Create (aContainerWindow=<optimized out>, aParentWidget=0x7f80bfb860, aBrowsingContext=aBrowsingContext@entry= 0x7f809bd7a0, aInitialWindowChild=aInitialWindowChild@entry=0x0) at toolkit/components/browser/nsWebBrowser.cpp:128 #5 0x0000007fbca930e8 in mozilla::embedlite::EmbedLiteViewChild:: InitGeckoWindow (this=0x7f80be6790, parentId=0, parentBrowsingContext=0x0, isPrivateWindow=<optimized out>, isDesktopMode=false) at obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:847 #6 0x0000007fbca84b84 in mozilla::detail::RunnableMethodArguments<unsigned int const, mozilla::dom::BrowsingContext*, bool const, bool const>:: applyImpl<mozilla::embedlite::EmbedLiteViewChild, void (mozilla::embedlite:: EmbedLiteViewChild::*)(unsigned int, mozilla::dom::BrowsingContext*, bool, bool), StoreCopyPassByConstLRef<unsigned int const>, StoreRefPtrPassByPtr<mozilla::dom::BrowsingContext>, StoreCopyPassByConstLRef<bool const>, StoreCopyPassByConstLRef<bool const>, 0ul, 1ul, 2ul, 3ul> (args=..., m=<optimized out>, o=<optimized out>) at xpcom/threads/nsThreadUtils.h:990 #7 mozilla::detail::RunnableMethodArguments<unsigned int const, mozilla::dom:: BrowsingContext*, bool const, bool const>::apply<mozilla::embedlite:: EmbedLiteViewChild, void (mozilla::embedlite::EmbedLiteViewChild::*)( unsigned int, mozilla::dom::BrowsingContext*, bool, bool)> (m=<optimized out>, o=<optimized out>, this=<optimized out>) at xpcom/threads/nsThreadUtils.h: 1191 #8 mozilla::detail::RunnableMethodImpl<mozilla::embedlite:: EmbedLiteViewChild*, void (mozilla::embedlite::EmbedLiteViewChild::*)( unsigned int, mozilla::dom::BrowsingContext*, bool, bool), true, (mozilla:: RunnableKind)0, unsigned int const, mozilla::dom::BrowsingContext*, bool const, bool const>::Run ( this=<optimized out>) at xpcom/threads/nsThreadUtils.h:1237 #9 0x0000007fb9c014c4 in nsThread::ProcessNextEvent (aResult=0x7fa69e6d67, aMayWait=<optimized out>, this=0x7f8002fd60) at xpcom/threads/nsThread.cpp:1211 [...] #26 0x0000007fb735c89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb)These backtraces are broadly the same. Certainly the closer we get to the PC end of the stack the more similar they get. I'm left feeling a little surprised: why is this no longer working on ESR 91 when it worked fine on ESR 78? It would be useful to know exactly how the sequence works its way through the system and to do that we really need to know what triggers the call to EmbedLiteViewChild::InitGeckoWindow(). A search of the code shows this happens in the EmbedLiteViewChild constructor:
EmbedLiteViewChild::EmbedLiteViewChild(const uint32_t &aWindowId, const uint32_t &aId, const uint32_t &aParentId, mozilla::dom::BrowsingContext *parentBrowsingContext, const bool &isPrivateWindow, const bool &isDesktopMode, const bool &isHidden) [...] { [...] MessageLoop::current()->PostTask(NewRunnableMethod<const uint32_t, mozilla:: dom::BrowsingContext*, const bool, const bool, const bool> ("mozilla::embedlite:: EmbedLiteViewChild::InitGeckoWindow", this, &EmbedLiteViewChild::InitGeckoWindow, aParentId, parentBrowsingContext, isPrivateWindow, isDesktopMode, isHidden)); }So that means we need to find out where this is constructed. Using the debugger we can establish that AllocPEmbedLiteViewChild() is furthest down the usable part of the call stack:
Thread 10 "GeckoWorkerThre" hit Breakpoint 14, mozilla::embedlite:: EmbedLiteViewChild::EmbedLiteViewChild (this=0x7fb8b93e30, aWindowId=@0x7fde7c06dc: 1, aId=@0x7fde7c06e0: 1, aParentId=@0x7fde7c06e4: 0, parentBrowsingContext=0x0, isPrivateWindow=@0x7fde7c06ce: false, isDesktopMode=@0x7fde7c06cf: false, isHidden=@0x7fde7c06d0: false) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/ EmbedLiteViewChild.cpp:71 71 ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/ EmbedLiteViewChild.cpp: No such file or directory. (gdb) bt #0 mozilla::embedlite::EmbedLiteViewChild::EmbedLiteViewChild ( this=0x7fb8b93e30, aWindowId=@0x7fde7c06dc: 1, aId=@0x7fde7c06e0: 1, aParentId=@0x7fde7c06e4: 0, parentBrowsingContext=0x0, isPrivateWindow=@0x7fde7c06ce: false, isDesktopMode=@0x7fde7c06cf: false, isHidden=@0x7fde7c06d0: false) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/ EmbedLiteViewChild.cpp:71 #1 0x0000007ff4c5e850 in mozilla::embedlite::EmbedLiteViewThreadChild:: EmbedLiteViewThreadChild (this=0x7fb8b93e30, windowId=<optimized out>, id=<optimized out>, parentId=<optimized out>, parentBrowsingContext=<optimized out>, isPrivateWindow=<optimized out>, isDesktopMode=<optimized out>, isHidden=<optimized out>) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/ EmbedLiteViewThreadChild.cpp:15 #2 0x0000007ff4c65448 in mozilla::embedlite::EmbedLiteAppThreadChild:: AllocPEmbedLiteViewChild (this=0x7fb8aba0e0, windowId=@0x7fde7c06dc: 1, id=@0x7fde7c06e0: 1, parentId=@0x7fde7c06e4: 0, parentBrowsingContext=<optimized out>, isPrivateWindow=@0x7fde7c06ce: false, isDesktopMode=@0x7fde7c06cf: false, isHidden=@0x7fde7c06d0: false) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33 #3 0x0000007ff1ef4cd4 in mozilla::embedlite::PEmbedLiteAppChild:: OnMessageReceived (this=0x7fb8aba0e0, msg__=...) at PEmbedLiteAppChild.cpp: 529 [...] #30 0x0000007fef54989c in ?? () from /lib64/libc.so.6 (gdb)Everything prior to this relates to the message loop, which isn't much use to us. So we'll need to start looking on the other side of the message loop, where the message is sent. The method we're interested in for this is AllocPEmbedLiteViewChild(), but there's actually no code in any of the gecko source that references this method. It turns out that's because the calling code is autogenerated. Searching the autogenerated code gives us this hit:
$ grep -rIn "AllocPEmbedLiteViewChild" * [...] gecko-dev/obj-build-mer-qt-xr/ipc/ipdl/PEmbedLiteAppChild.cpp:529: PEmbedLiteViewChild* actor = (static_cast<EmbedLiteAppChild*>( this))->AllocPEmbedLiteViewChild(windowId, id, parentId, parentBrowsingContext, isPrivateWindow, isDesktopMode, isHidden);Let's look at this bit of code in more detail.
auto PEmbedLiteAppChild::OnMessageReceived(const Message& msg__) -> PEmbedLiteAppChild::Result { switch (msg__.type()) { case PEmbedLiteApp::Msg_PEmbedLiteViewConstructor__ID: [...] PEmbedLiteViewChild* actor = (static_cast<EmbedLiteAppChild*>( this))->AllocPEmbedLiteViewChild(windowId, id, parentId, parentBrowsingContext, isPrivateWindow, isDesktopMode, isHidden); [...]As we can see, this waits for the Msg_PEmbedLiteViewConstructor__ID message and, when received, calls the AllocPEmbedLiteViewChild() method. We don't want to put a breakpoint on this because this is the code that receives the message. The debugger can't follow the messages, so it'll just give us a stack trace down to the message processing loop. Instead we want a breakpoint on where the message is sent.
On the sender side, there's a method Msg_PEmbedLiteViewConstructor() that's not used to send the message, but it is used to construct an instance of the message. Chances are this will be called just before the message gets sent. So let's stick a breakpoint on that and see what it gives us.
Thread 1 "sailfish-browse" hit Breakpoint 1, mozilla::embedlite:: PEmbedLiteApp::Msg_PEmbedLiteViewConstructor ( routingId=routingId@entry=2147483647) at PEmbedLiteApp.cpp:68 68 PEmbedLiteApp.cpp: No such file or directory. (gdb) bt #0 mozilla::embedlite::PEmbedLiteApp::Msg_PEmbedLiteViewConstructor ( routingId=routingId@entry=2147483647) at PEmbedLiteApp.cpp:68 #1 0x0000007ff1f062e0 in mozilla::embedlite::PEmbedLiteAppParent:: SendPEmbedLiteViewConstructor (this=this@entry=0x7fb81fa7a0, actor=<optimized out>, windowId=@0x7fffffe3b4: 1, id=@0x7ff7e258f0: 1, parentId=@0x7fffffe3ac: 0, parentBrowsingContext=@0x7fffffe3a0: 0, isPrivateWindow=@0x7fffffe3ab: false, isDesktopMode=@0x7fffffe3aa: false, isHidden=@0x7fffffe3a9: false) at PEmbedLiteAppParent.cpp:193 #2 0x0000007ff1f064d0 in mozilla::embedlite::PEmbedLiteAppParent:: SendPEmbedLiteViewConstructor (this=this@entry=0x7fb81fa7a0, windowId=@0x7fffffe3b4: 1, id=@0x7ff7e258f0: 1, parentId=@0x7fffffe3ac: 0, parentBrowsingContext=@0x7fffffe3a0: 0, isPrivateWindow=@0x7fffffe3ab: false, isDesktopMode=@0x7fffffe3aa: false, isHidden=@0x7fffffe3a9: false) at PEmbedLiteAppParent.cpp:170 #3 0x0000007ff4c535f4 in mozilla::embedlite::EmbedLiteApp::CreateView ( this=0x5555884370, aWindow=0x5555d8a9b0, aParent=<optimized out>, aParentBrowsingContext=<optimized out>, aIsPrivateWindow=<optimized out>, isDesktopMode=<optimized out>, isHidden=<optimized out>) at ${PROJECT}/gecko-dev/mobile/sailfishos/EmbedLiteApp.cpp:478 #4 0x0000007ff7e8595c in QMozViewPrivate::createView (this=0x5555af0e50) at qmozview_p.cpp:864 #5 QMozViewPrivate::createView (this=0x5555af0e50) at qmozview_p.cpp:850 #6 0x00000055555d286c in WebPageFactory::createWebPage (this=0x55559e9090, webContainer=0x5555704610, initialTab=...) at ../factories/webpagefactory.cpp:44 #7 0x00000055555ade60 in WebPages::page (this=0x55559e9220, tab=...) at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/ include/c++/8.3.0/bits/atomic_base.h:390 #8 0x0000005555591dcc in DeclarativeWebContainer::activatePage ( this=this@entry=0x5555704610, tab=..., force=force@entry=true, fromExternal=fromExternal@entry=184) at /srv/mer/toolings/SailfishOS-4.5.0.18/opt/cross/aarch64-meego-linux-gnu/ include/c++/8.3.0/bits/atomic_base.h:390 #9 0x0000005555592f64 in DeclarativeWebContainer::loadTab ( this=this@entry=0x5555704610, tab=..., force=force@entry=true, fromExternal=184) at ../core/declarativewebcontainer.cpp:1211 #10 0x0000005555593984 in DeclarativeWebContainer::initialize ( this=0x5555704610) at ../core/declarativewebcontainer.cpp:1053 #11 0x0000007fefb5b204 in QMetaObject::activate(QObject*, int, int, void**) () from /usr/lib64/libQt5Core.so.5 #12 0x0000007fefb5b204 in QMetaObject::activate(QObject*, int, int, void**) () from /usr/lib64/libQt5Core.so.5 #13 0x0000007ff7e73690 in QMozContextPrivate::Initialized ( this=0x7ff7f497e0 <(anonymous namespace)::Q_QGS_mozContextPrivateInstance:: innerFunction()::holder>) at qmozcontext.cpp:133 #14 0x0000007ff4c4c0a4 in mozilla::embedlite::EmbedLiteApp::Initialized ( this=0x5555884370) at ${PROJECT}/gecko-dev/mobile/sailfishos/EmbedLiteApp.cpp:656 #15 0x0000007ff4c5969c in mozilla::embedlite::EmbedLiteAppThreadParent:: RecvInitialized (this=0x7fb81fa7a0) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/ EmbedLiteAppThreadParent.cpp:46 #16 0x0000007ff1f037d4 in mozilla::embedlite::PEmbedLiteAppParent:: OnMessageReceived (this=0x7fb81fa7a0, msg__=...) at PEmbedLiteAppParent.cpp: 721 [...] #37 0x000000555557bf88 in main (argc=<optimized out>, argv=<optimized out>) at main.cpp:203 (gdb)Once again we eventually reach the message loop in the backtrace. But this already tells us a lot of what we need to know. We can see that the call to DeclarativeWebContainer::initialize() includes a subsequent call to EmbedLiteApp::CreateView() and this method looks like this:
EmbedLiteView* EmbedLiteApp::CreateView(EmbedLiteWindow* aWindow, uint32_t aParent, uintptr_t aParentBrowsingContext, bool aIsPrivateWindow, bool isDesktopMode, bool isHidden) { LOGT(); NS_ASSERTION(mState == INITIALIZED, "The app must be up and runnning by now"); static uint32_t sViewCreateID = 0; sViewCreateID++; PEmbedLiteViewParent* viewParent = static_cast<PEmbedLiteViewParent*>( mAppParent->SendPEmbedLiteViewConstructor(aWindow->GetUniqueID(), sViewCreateID, aParent, aParentBrowsingContext, aIsPrivateWindow, isDesktopMode, isHidden)); EmbedLiteView* view = new EmbedLiteView(this, aWindow, viewParent, sViewCreateID); mViews[sViewCreateID] = view; return view; }There's the call that triggers the sending of the message that ends up constructing the class that will fire a runnable that will eventually result in the creation of our nsLookAndFeel class. Let's summarise the process.
- EmbedLiteApp::Initialized() is called.
- This triggers a call to DeclarativeWebContainer::initialize().
- This then sends an AllocPEmbedLiteViewChild message.
- Once the message is received the EmbedLiteViewChild is constructed.
- The constructor posts a task to trigger a call to EmbedLiteViewChild::InitGeckoWindow().
- This is set to run on a different thread, and once running there are a series of direct calls which end up calling nsXPLookAndFeel::GetInstance().
- This causes the creation of the nsLookAndFeel class which creates the observer that we need.
Thread 1 hit Breakpoint 2, DeclarativeWebContainer::initialize () at declarativewebcontainer.cpp:998 Thread 1 hit Breakpoint 2, DeclarativeWebContainer::initialize () at declarativewebcontainer.cpp:998 Thread 1 hit Breakpoint 2, DeclarativeWebContainer::initialize () at declarativewebcontainer.cpp:998 Thread 1 hit Breakpoint 9, EmbedLiteApp::CreateView () at EmbedLiteApp.cpp:472 Thread 1 hit Breakpoint 11, PEmbedLiteAppParent::SendPEmbedLiteViewConstructor () at PEmbedLiteAppParent.cpp:168 Thread 1 hit Breakpoint 11, PEmbedLiteAppParent::SendPEmbedLiteViewConstructor () at PEmbedLiteAppParent.cpp:182 Thread 1 hit Breakpoint 6, PEmbedLiteApp::Msg_PEmbedLiteViewConstructor () at PEmbedLiteApp.cpp:68 Thread 1 hit Breakpoint 3, WebEngineSettings::notifyColorSchemeChanged () at webenginesettings.cpp:220 Thread 10 hit Breakpoint 7, EmbedLiteAppThreadChild::AllocPEmbedLiteViewChild () at EmbedLiteAppThreadChild.cpp:45 Thread 10 hit Breakpoint 10, EmbedLiteViewChild::EmbedLiteViewChild () at EmbedLiteViewChild.cpp:71 Thread 1 hit Breakpoint 2, DeclarativeWebContainer::initialize () at declarativewebcontainer.cpp:998 Thread 10 hit Breakpoint 5, nsLookAndFeel::nsLookAndFeel () at nsLookAndFeel.cpp:62 Thread 1 hit Breakpoint 2, DeclarativeWebContainer::initialize () at declarativewebcontainer.cpp:998This shows us a few things. The call to DeclarativeWebContainer::initialize() happens early. Consequently the sending of Msg_PEmbedLiteViewConstructor happens early as well. But it doesn't trigger the call to AllocPEmbedLiteViewChild() immediately and so the construction of nsLookAndFeel is delayed. In between the two the call to notifyColorSchemeChanged() occurs and the message sent from this is seemingly lost as a result.
There's one other thing I'd really like to check. The code that calls notifyColorSchemeChanged() in WebEngineSettings::initialize() looks like this:
// Notify gecko when the ambience switches between light and dark if (engineSettings->isInitialized()) { engineSettings->notifyColorSchemeChanged(); } else { connect(engineSettings, &QMozEngineSettings::initialized, engineSettings, &SailfishOS::WebEngineSettings:: notifyColorSchemeChanged); }I'm interested to know whether it gets called immediately because QMozEngineSettings is already initialised or whether it has to wait for the initialisation signal to be received. To find this out, I'm going to step through the code of WebEngineSettings::initialize().
Thread 1 "sailfish-browse" hit Breakpoint 12, SailfishOS:: WebEngineSettings::initialize () at webenginesettings.cpp:96 96 if (isInitialized) { (gdb) n 100 SailfishOS::WebEngineSettings *engineSettings = instance(); (gdb) n 103 QString langs; (gdb) 105 if (locale.size() > 1) { (gdb) n 106 langs = QString("%1-%2,%3").arg(locale.at(0)).arg( locale.at(1)).arg(locale.at(0)); (gdb) n 110 engineSettings->setPreference(QStringLiteral( "intl.accept_languages"), (gdb) n 111 QVariant::fromValue<QString>(langs)); (gdb) n 125 connect(engineSettings, &QMozEngineSettings::initialized, (gdb) p engineSettings->isInitialized() $1 = falseSo it's not called immediately but rather called only on receiving the initialisation signal from QMozEngineSettings.
All of this is useful context and interesting background, but it's not giving us any answers about how to fix the problem. So I figure it might be useful to find out what's happening on ESR 78 for comparison. I've registered the same set of breakpoints that we saw earlier for ESR 91 and have set the app to run in the debugger. Here's what we get:
Thread 1 hit Breakpoint 1, DeclarativeWebContainer::initialize () at declarativewebcontainer.cpp:963 Thread 1 hit Breakpoint 1, DeclarativeWebContainer::initialize () at declarativewebcontainer.cpp:963 Thread 1 hit Breakpoint 1, DeclarativeWebContainer::initialize () at declarativewebcontainer.cpp:963 Thread 1 hit Breakpoint 2, EmbedLiteApp::CreateView () at EmbedLiteApp.cpp:472 Thread 1 hit Breakpoint 3, SendPEmbedLiteViewConstructor () at PEmbedLiteAppParent.cpp:162 Thread 1 hit Breakpoint 3, SendPEmbedLiteViewConstructor () at PEmbedLiteAppParent.cpp:174 Thread 1 hit Breakpoint 4, Msg_PEmbedLiteViewConstructor () at ipc_message.h:118 Thread 1 hit Breakpoint 5, notifyColorSchemeChanged () from /usr/lib64/libsailfishwebengine.so.1 Thread 1 hit Breakpoint 5, notifyColorSchemeChanged () from /usr/lib64/libsailfishwebengine.so.1 Thread 10 hit Breakpoint 6, AllocPEmbedLiteViewChild () at EmbedLiteAppThreadChild.cpp:45 Thread 10 hit Breakpoint 7, EmbedLiteViewChild::EmbedLiteViewChild () at EmbedLiteViewChild.cpp:102 Thread 10 hit Breakpoint 8, nsLookAndFeel::nsLookAndFeel () at nsLookAndFeel.cpp:62The odd thing is that when we compare this against the ESR 91 flow, they look pretty similar. Crucially, the call to notifyColorSchemeChanged() happens before the construction of nsLookAndFeel in both cases. Which begs the obvious question of how this ever worked in ESR 78?
When I try the debugging again, but this time with a breakpoint on nsLookAndFeel::Observer::Observe() I find that this doesn't fire on initialisation. So maybe it was actually broken on ESR 78 as well? The fact is, when I try it several times, the initial message setting the colour mode is generally going through just fine:
[(null) 26214: Unnamed thread 0x78c8002670]: I/LookAndFeel Ambience set to darkHowever there were a couple of occasions when the message got lost and as a result the browser came up in light mode when it should have been in dark mode. The vast majority of times it worked fine, but not always. The conclusion I've come to is that, whether on ESR 78 or ESR 91, it's all about timing. On ESR 78 the timing for all but a minority of runs works fine and the correct colour mode is set. But occasionally not. That would be good enough reason to try to fix it, but on ESR 91 the message seems to get missed the majority of the time. Possibly just down to different timings during start up. These are asynchronous processes, so seemingly this can happen.
It's worth adding that during debugging the timings are sent completely off course compared to a usual start-up run. The breakpoints stop the flow of execution, add delays that wouldn't have been there otherwise and generally mess things around. These kinds of race conditions are notoriously susceptible to acting differently during debugging.
Anyway, I'll need to find a new way to fix this. The investigation up until now has revealed a problem and shown that there's no immediately obvious solution. I have an idea about what might work, but that will now have to wait until tomorrow.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comments
Uncover Disqus comments