flypig.co.uk

List items

Items from the current list are shown below.

Gecko

31 Jul 2024 : Day 305 #
Before moving on to today's development I want to say a few things about the ambience switching implementation that I've been working on over the last few days. You may recall that at one point I explained that simply setting the preference to "light" or "dark" in the front end and letting the gecko engine pick up the change that way wasn't going to work, because the gecko engine recognises only two values ("light" and "dark") whereas on Sailfish OS we need three values ("light", "dark" and "follow ambience").

An obvious question that arises is "why not introduce a second setting?". Then the existing ui.systemUsesDarkTheme setting could be used to switch between light and dark modes, with a subsequent setting, ui.systemDarkThemeFollowsAmbience say, used to control whether or not the user interface should update the ui.systemUsesDarkTheme value when the ambience changes.

This preference-based approach has several advantages. First it would avoid the need to patch the gecko engine code, since only the user interface would need to know about the new setting and the existing setting could be left to do what it already does. Plus it would have avoided all of the problems related to race conditions at initialisation, because the preferences are already set up to handle all that correctly.

It sounds too good to be true. So why not do that? The answer is two-fold. First, it feels like the wrong thing to do architecturally. Using settings as a signalling mechanism is ugly, especially when there's literally a proper signalling mechanism already available. We have the same issue on Sailfish OS more generally, where it's very tempting to use dconf values as a signalling mechanism between the front-end and other parts of the code, or between different processes. Adding a dconf value is maybe three lines of code, whereas exposing and consuming a DBus interface — which is the correct way to signal between processes on Sailfish OS — requires more boilerplate than Stevenson's Rocket.

It can make it very tempting then to use preference values, rather than inter-process communication (IPC). But using preferences this way can be wasteful, leaves extra preferences lying around that can be hard to get rid of later, and is also usually less efficient than using the intended approach.

Those are the abstract reasons. The more concrete and practical reason is that using an extra preference would have meant diverging from the ESR 78 implementation. So either the user preference would have to be lost during the upgrade to ESR 91, or we'd have had to add some hard-to-maintain one-shot code to convert it.

So I decided to patch the gecko code instead. Neither option is ideal, but since I'm writing about this whole development process already, it makes sense to explain the possible different approaches and my reasons for choosing one over another.

Okay, on with the coding. Today I'm going to start with the task of addressing issues from the "Everything on the browser test page" task. The test page this is referring to is Jolla's gecko testing suite, which has been built up over many years to capture bugs and issues that the browser has suffered at various times during its life.

The tests are all manual, but their existence makes it far easier to test a whole swathe of different functionalities. When I worked through the tests previously I encountered the following problems.
  1. Single select widget.
  2. External links.
  3. Full screen.
  4. Double-tap.
The first of these — single select widget — refers to the very first item on the selection test page. Tapping on either of the first two widgets ("second" or "Volvo") should open up a Silica page listing several selectable options. Instead the browser just crashes. In contrast the following widgets (the list of fruit and the celestial bodies) work fine and open up their respective pages as intended.

As this is a crash the obvious place to start is by capturing a backtrace using the debugger.
Thread 10 "GeckoWorkerThre" received signal SIGSEGV, Segmentation 
    fault.
[Switching to LWP 15152]
mozilla::PresShell::Paint (this=this@entry=0x7fb90feb60, 
    aViewToPaint=aViewToPaint@entry=0x7eb4007270, aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at ${PROJECT}/gecko-dev/layout/base/PresShell.cpp:6276
6276    ${PROJECT}/gecko-dev/layout/base/PresShell.cpp: No such file or 
    directory.
(gdb) bt
#0  mozilla::PresShell::Paint (this=this@entry=0x7fb90feb60, 
    aViewToPaint=aViewToPaint@entry=0x7eb4007270, aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at ${PROJECT}/gecko-dev/layout/base/PresShell.cpp:6276
#1  0x0000007ff3e410c8 in nsViewManager::ProcessPendingUpdatesPaint (
    this=this@entry=0x7fb8d5d6f0, aWidget=aWidget@entry=0x7fb8cdfa80)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/RectAbsolute.h:43
#2  0x0000007ff3e4147c in nsViewManager::ProcessPendingUpdatesForView (
    this=this@entry=0x7fb8d5d6f0, aView=<optimized out>, 
    aFlushDirtyRegion=aFlushDirtyRegion@entry=true)
    at ${PROJECT}/gecko-dev/view/nsViewManager.cpp:394
#3  0x0000007ff3e41a6c in nsViewManager::ProcessPendingUpdates (
    this=this@entry=0x7fb8d5d6f0)
    at ${PROJECT}/gecko-dev/view/nsViewManager.cpp:972
#4  0x0000007ff3fe94e4 in nsRefreshDriver::Tick (this=0x7fb90dc8b0, aId=..., 
    aNowTime=..., 
    aIsExtraTick=aIsExtraTick@entry=nsRefreshDriver::IsExtraTick::No)
    at ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.cpp:2477
#5  0x0000007ff3fe9e78 in nsRefreshDriver::<lambda()>::operator() (
    __closure=0x7fb8d8f2f0)
    at ${PROJECT}/gecko-dev/layout/base/nsRefreshDriver.cpp:234
#6  mozilla::detail::RunnableFunction<nsRefreshDriver::EnsureTimerStarted(
    nsRefreshDriver::EnsureTimerStartedFlags)::<lambda()> >::Run(void) (
    this=0x7fb8d8f2e0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    nsThreadUtils.h:532
[...]
#29 0x0000007fef54889c in ?? () from /lib64/libc.so.6
Here's the code that causing the crash:
  bool shouldInvalidate = layerManager->NeedsWidgetInvalidation();
It might be helpful to have a bit more context for this line, so here it is again in its natural habitat:
  LayerManager* layerManager = aViewToPaint->GetWidget()->GetLayerManager();
  NS_ASSERTION(layerManager, &quot;Must be in paint event&quot;);
  bool shouldInvalidate = layerManager->NeedsWidgetInvalidation();
The reason for the crash is that layerManager is null:
(gdb) p layerManager
$1 = (mozilla::PresShell::LayerManager *) 0x0
For comparison, the code in ESR 78 looks pretty much identical, but unlike ESR 91 the call to GetLayerManager() is returning a value. There are many different classes that extend nsIWidget and override this method, but in our case it's an instance of EmbedLitePuppetWidget. The code is quite complex but identical on both ESR 78 and ESR 91:
LayerManager *
EmbedLitePuppetWidget::GetLayerManager(PLayerTransactionChild *aShadowManager, 
    LayersBackend aBackendHint, LayerManagerPersistence aPersistence)
{
  if (!mLayerManager) {
    if (!mShutdownObserver || Destroyed()) {
      // We are shutting down, do not try to re-create a LayerManager
      return nullptr;
    }
  }

  LayerManager *lm = PuppetWidgetBase::GetLayerManager(aShadowManager, 
    aBackendHint, aPersistence);
  if (lm) {
    mLayerManager = lm;
    return mLayerManager;
  }

  if (EmbedLiteApp::GetInstance()->GetType() == EmbedLiteApp::EMBED_INVALID) {
    LOGT(&quot;Create Layer Manager for Process View&quot;);

    mLayerManager = new ClientLayerManager(this);
    ShadowLayerForwarder* lf = mLayerManager->AsShadowForwarder();
    if (!lf->HasShadowManager() && aShadowManager) {
      lf->SetShadowManager(aShadowManager);
    }
    return mLayerManager;
  }

  nsIWidget* topWidget = GetTopLevelWidget();
  if (topWidget && topWidget != this) {
      mLayerManager = topWidget->GetLayerManager(aShadowManager, aBackendHint, 
    aPersistence);
      return mLayerManager;
  }
  else {
      return nullptr;
  }
}
I've been stepping carefully through this code on both ESR 78 and ESR 91 all evening. I've tested it when it's first called and also on the call that's happening inside nsPresShell::Paint(). In the latter case there's a divergence between ESR 78 and ESR 91. On ESR 78 PuppetWidgetBase::GetLayerManager() is returning a value. On ESR 91 this returns null and the code fails through to the call to GetTopLevelWidget() towards the end after skipping all the preceding conditions. On ESR 91 the value returned as topWidget matches the value of this and so the condition is skipped and nullptr is returned. The crash follows.

Here's the code for PuppetWidgetBase::GetLayerManager(), identical on both ESR 78 and ESR 91:
LayerManager *
PuppetWidgetBase::GetLayerManager(PLayerTransactionChild *aShadowManager,
                                                LayersBackend aBackendHint,
                                                LayerManagerPersistence 
    aPersistence)
{
  if (Destroyed()) {
    return nullptr;
  }

  if (mLayerManager) {
    // This layer manager might be used for painting outside of DoDraw(), so we 
    need
    // to set the correct rotation on it.
    if (mLayerManager->GetBackendType() == LayersBackend::LAYERS_CLIENT) {
      ClientLayerManager* manager =
          static_cast<ClientLayerManager*>(mLayerManager.get());
      manager->SetDefaultTargetConfiguration(mozilla::layers::BufferMode::
    BUFFER_NONE,
                                             mRotation);
    }
    return mLayerManager;
  }

  // Layer manager can be null here. Sub-class shall handle this.
  return mLayerManager;
}
At this point in time I have no idea why PuppetWidgetBase::GetLayerManager() is returning a value on ESR 78 but not on ESR 91. Unfortunately the answer to this question is going to have to wait until tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.

Comments

Uncover Disqus comments