11 Oct 2023 : Day 56
This morning I woke up feeling much refreshed. That's just as well because it's a bit of a long one today (you have been warned!). I checked the build as soon as I got up but was surprised to find it still running. It had reached the final step of writing out the rpm packages, but wasn't quite there yet. This is either an indication that the build is taking far too long, or that my sleep is too short. Given my bright demeanour this morning, I'm going for the former.

Thankfully by the end of breakfast the build had completed. I'm now scp-ing it over to my phone to test. The changes from yesterday will hopefully have hooked the EmbedLite compositor into the render pipeline. There's a very small chance that it will attempt to render something now. More likely is that this will be only one step on the way to fixing the render. More likely still is that this change will cause the browser to crash. None of these would necessarily be a bad thing.

As the files copy over to my phone I realise I've made a stupid mistake:
The version numbers on those packages are all 78.15.1. Aaargh. I've built the correct code, but with the wrong version number.

How does this happen? It's all down to the way Jolla specifies version numbers, which is baked into the Sailfish SDK. As any Sailfish OS developer will know, in order to package up some code into an RPM package you have to create what's called a spec-file. This specifies all sorts of aspects of the packages, including their names, how to build the code, which files to include and so on. It also specifies the version number of the packages.

Jolla chooses not to use the version number in the spec file. Instead Jolla devs tag commits in the git repository with the version number. When you build a package using the Platform or Application SDK, the build tooling will check the latest tag and use that instead of the value in the spec file. I'm not entirely certain why Jolla does it this way. Maybe it's to avoid having to commit changes to the repo just to set the version number. Maybe it's because having a dynamic version number can be helpful. Whatever the reason, this is how it is.

I don't want to tag gecko with the new version number yet because it's not ready. So I prefer to use the version in the spec file during development. There is a way to get sfdk to honour this by setting the no-fix-version configuration option. Do this and the build chain will use the version number in the spec file.

I usually set this configuration option like this:
sfdk config --session --push no-fix-version
That essentially sets the option until you close the shell you're working in. I've been doing it this way because I only want the option to apply to gecko, not to other things I build. So it's convenient to constrain it just to a single session.

Somehow my session changed. Maybe I opened a new shell or switched my GNU Screen window; probably it was a consequence of me being so tired. Either way, checking this morning I can see the option wasn't set.

So the build used the last tag rather than the version in the spec file. I think I need to fix my processes. So rather than using a session configuration value, in future I'm going to add it to my build command, like this:
sfdk -c no-fix-version build -d -p --with git_workaround
I need the build to be using the correct version number, otherwise it will mess up the dependencies; possibly mess up where the files are installed; potentially other messy things. So after it spent all night building I've just now run this command to do it all over again.
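The choice between the two version sources can be modelled in a few lines. This is only a sketch of the behaviour described above, not sfdk's actual implementation: the struct, the function name and the "fall back to the spec file when there's no tag" rule are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the version choice; none of this is sfdk code.
struct VersionInputs {
  std::string specVersion;  // version written in the spec file
  std::string latestTag;    // most recent git tag, empty if there is none
  bool noFixVersion;        // true when the no-fix-version option is set
};

// Returns the version the packages would carry under this model.
std::string EffectiveVersion(const VersionInputs& in) {
  // With no-fix-version set, the spec file wins. Falling back to the spec
  // file when no tag exists is an assumption, not documented behaviour.
  if (in.noFixVersion || in.latestTag.empty()) {
    return in.specVersion;
  }
  // Otherwise the latest tag overrides the spec file.
  return in.latestTag;
}
```

Under this model, forgetting to set the option with a 78.15.1 tag in the repository gives 78.15.1 packages regardless of what the spec file says, which matches what happened overnight.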


The build completed (much more quickly this time); I've scp-d over the packages to my phone and installed them. Time for another run.
$ EMBED_CONSOLE=1 sailfish-browser 
[D] unknown:0 - Using Wayland-EGL
library "" not found
library "" not found
greHome from GRE_HOME:/usr/bin is not found, in /usr/bin/
Created LOG for EmbedLiteTrace
[W] unknown:0 - Unable to open bookmarks  "/home/defaultuser/.local/share/
[D] onCompleted:105 - ViewPlaceholder requires a SilicaFlickable parent
Created LOG for EmbedLite
JavaScript error: file:///usr/lib64/mozembedlite/components/
  EmbedLiteConsoleListener.js, line 251: TypeError: 
  XPCOMUtils.generateNSGetFactory is not a function
JavaScript error: file:///usr/lib64/mozembedlite/components/
  ContentPermissionManager.js, line 94: TypeError:
  XPCOMUtils.generateNSGetFactory is not a function
JavaScript error: resource://gre/modules/EnterprisePoliciesParent.jsm, line 500:
  TypeError: Services.appinfo is undefined
JavaScript error: resource://gre/modules/AddonManager.jsm, line 1479:
  NS_ERROR_NOT_INITIALIZED: AddonManager is not initialized
JavaScript error: resource://gre/modules/URLQueryStrippingListService.jsm, line 42:
  TypeError: Services.appinfo is undefined
Created LOG for EmbedPrefs
Created LOG for EmbedLiteLayerManager
JSScript: ContextMenuHandler.js loaded
JSScript: SelectionPrototype.js loaded
JSScript: SelectionHandler.js loaded
JSScript: SelectAsyncHelper.js loaded
JSScript: FormAssistant.js loaded
JSScript: InputMethodHandler.js loaded
EmbedHelper init called
Available locales: en-US, fi, ru
Frame script: embedhelper.js loaded
JavaScript error: chrome://embedlite/content/embedhelper.js, line 259: TypeError: sessionHistory is null
Segmentation fault (core dumped)
The application runs for a bit and starts downloading Jolla's Web site. It gets far enough to download and decode the TLS certificate. But as soon as it hits rendering properly it now crashes with a segmentation fault. This doesn't sound great but it might turn out to be good news. It gives us a lead. If we run it through the debugger we can figure out where the crash is occurring.

So having done that, here's the backtrace.
Thread 35 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 9321]
mozilla::gl::SwapChain::OffscreenSize (this=0x0) at
129       return mPresenter->mBackBuffer->mFb->mSize;
(gdb) bt
#0  mozilla::gl::SwapChain::OffscreenSize (this=0x0) at
#1  0x0000007fbcc8149c in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7f8859d7f0, aId=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#2  0x0000007fba8d1bec in mozilla::layers::CompositorVsyncScheduler::
    ForceComposeToTarget (this=0x7f88737560, aTarget=aTarget@entry=0x0, 
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h:82
#3  0x0000007fba8d1c48 in mozilla::layers::CompositorBridgeParent::
    ResumeComposition (this=this@entry=0x7f8859d7f0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#4  0x0000007fba8d1cd4 in mozilla::layers::CompositorBridgeParent::
    ResumeCompositionAndResize (this=0x7f8859d7f0, x=,
    y=, width=, height=) at
#5  0x0000007fba8ca870 in mozilla::detail::RunnableMethodArguments::applyImpl, StoreCopyPassByConstLRef,
    StoreCopyPassByConstLRef, StoreCopyPassByConstLRef, 0ul, 1ul, 2ul,
    3ul> (args=..., m=, o=)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
#17 0x0000007fb79cf89c in ?? () from /lib64/
From this we can see that the crash is happening on line 129 of GLScreenBuffer.cpp, inside the SwapChain::OffscreenSize() method, which is called from EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget().

Crashes are never good of course, but I'm nevertheless quite happy about this. This is part of the code I had to hack around with to get the library to build, so it's not so surprising that it's causing problems. The issue seems to be that the SwapChain instance hasn't been constructed at all; the pointer to it is set to null. That should be possible to fix.

The backtrace doesn't show it, but the call to SwapChain::OffscreenSize() is happening on line 160 of EmbedLiteCompositorBridgeParent.cpp. It's being masked in the backtrace by the UniquePtr. Here's the problem line:
    if (context->GetSwapChain()->OffscreenSize() != mEGLSurfaceSize
      && !context->GetSwapChain()->Resize(mEGLSurfaceSize)) {
Presumably GetSwapChain() is returning null. Let's dig into that. Before I changed it this code used to look like this:
    if (context->OffscreenSize() != mEGLSurfaceSize
      && !context->ResizeOffscreen(mEGLSurfaceSize)) {
Through a roundabout kind of route, I added the context->GetSwapChain() method which looks like this:
  SwapChain* GetSwapChain() const { return mSwapChain.get(); }
The equivalent method used for returning mScreen used to look like this:
  GLScreenBuffer* Screen() const { return mScreen.get(); }
I guess you can see the pattern there. The context->OffscreenSize() method used to look like this:
const gfx::IntSize& GLContext::OffscreenSize() const {
  return mScreen->Size();
}
So in essence what's happened is I've swapped a call (excuse the notational looseness here) to GLContext::mScreen.get()->Size() with a call to GLContext::mSwapChain.get()->OffscreenSize().

What we've drilled down to — and the point I'm rather clumsily trying to make — is that mSwapChain is the new mScreen and at this point in the previous code mScreen contained a live instance. So in the current code mSwapChain should be pointing to a live instance by now as well. But it's not.

Wherever mScreen was being created in the old code, mSwapChain should probably be being created now.
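To make the failure mode concrete, here's a minimal sketch of the same pattern with a null check added. The types are simplified stand-ins that borrow names from the gecko code; CreateSwapChain() and EnsureSurfaceSize() are hypothetical helpers that don't exist in gecko. A guard like this only turns the crash into a clean failure; the swap chain still needs to be constructed somewhere for rendering to work.

```cpp
#include <cassert>
#include <memory>

// Simplified stand-ins; the names mirror gecko but the definitions are
// hypothetical.
struct IntSize {
  int width;
  int height;
  bool operator!=(const IntSize& other) const {
    return width != other.width || height != other.height;
  }
};

class SwapChain {
 public:
  explicit SwapChain(const IntSize& size) : mSize(size) {}
  const IntSize& OffscreenSize() const { return mSize; }
  bool Resize(const IntSize& size) { mSize = size; return true; }
 private:
  IntSize mSize;
};

class GLContext {
 public:
  // Returns null until a swap chain has been created
  SwapChain* GetSwapChain() const { return mSwapChain.get(); }
  // Hypothetical helper standing in for whatever should construct it
  void CreateSwapChain(const IntSize& size) {
    mSwapChain.reset(new SwapChain(size));
  }
 private:
  std::unique_ptr<SwapChain> mSwapChain;
};

// Returns false if there's no swap chain or the resize fails.
bool EnsureSurfaceSize(GLContext* context, const IntSize& wanted) {
  SwapChain* swapChain = context->GetSwapChain();
  if (!swapChain) {
    // Without this check the next line is a call with this=0x0
    return false;
  }
  if (swapChain->OffscreenSize() != wanted && !swapChain->Resize(wanted)) {
    return false;
  }
  return true;
}
```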

Checking the code, right now it looks like mSwapChain is never constructed. The method I'd expect to create it is called from GLContext like this:
  GLContext* context = static_cast(state->mLayerManager->GetCompositor())->gl();
In the previous version mScreen was being created like this:
bool GLContext::CreateScreenBufferImpl(const IntSize& size,
                                       const SurfaceCaps& caps) {
  UniquePtr<GLScreenBuffer> newScreen =
      GLScreenBuffer::Create(this, size, caps);
  if (!newScreen) return false;

  if (!newScreen->Resize(size)) {
    return false;
  }

  // This will rebind to 0 (Screen) if needed when
  // it falls out of scope.
  ScopedBindFramebuffer autoFB(this);

  mScreen = std::move(newScreen);

  return true;
}
Just by looking at the code I can see the call stack for this is something like this:
None of these methods exist anymore and following this back in the ESR 78 code is turning out to be a bit troublesome, so I'm going to fire up a version of the old browser in the debugger to check this properly.
$ gdb sailfish-browser
(gdb) b GLContext::CreateScreenBufferImpl
(gdb) r
Starting program: /usr/bin/sailfish-browser 
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000007ff29237d8 in mozilla::gl::GLContext::
  const&, mozilla::gl::SurfaceCaps const&) 
                                                   at /usr/src/debug/
But the breakpoint is never hit. It's definitely set correctly, it's just that this particular piece of code is never executed.

That's interesting.

Some of the methods higher up the stack are being called.
Thread 39 "Compositor" hit Breakpoint 2, mozilla::layers::
    CompositorBridgeParent::ResumeCompositionAndResize (this=0x7fb89bd230, x=0,
    y=0, width=1080, height=2520)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/
777	  SetEGLSurfaceRect(x, y, width, height);
(gdb) bt
#0  mozilla::layers::CompositorBridgeParent::ResumeCompositionAndResize
    (this=0x7fb89bd230, x=0, y=0, width=1080, height=2520)
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/
#1  0x0000007ff2a662ec in mozilla::detail::RunnableMethodArguments::applyImpl, StoreCopyPassByConstLRef,
    StoreCopyPassByConstLRef, StoreCopyPassByConstLRef, 0ul, 1ul, 2ul,
    3ul> (args=..., m=, o=)
    at /home/abuild/rpmbuild/BUILD/xulrunner-qt5-78.15.1+git33.2/
#2  mozilla::detail::RunnableMethodArguments::apply
    o=, this=)
    at /home/abuild/rpmbuild/BUILD/xulrunner-qt5-78.15.1+git33.2/
#14 0x0000007fef65b89c in ?? () from /lib64/
But I can't get an identical backtrace up any further than this. There are calls to EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget() but these don't go via CompositorBridgeParent::ResumeCompositionAndResize().

It looks like the crucial difference is that context->IsOffscreen() returns a different value in each version. The implementations differ too. In ESR 91:
  bool IsOffscreen() const { return mDesc.isOffscreen; }
And in ESR 78:
  bool IsOffscreen() const { return mIsOffscreen; }
Let's check the value for ESR 78 in the debugger:
(gdb) p context
$1 = (mozilla::gl::GLContext *) 0x7ed41a2840
(gdb) p context->IsOffscreen()
Cannot evaluate function -- may be inlined
(gdb) p context->mIsOffscreen
$3 = false
In ESR 78 this is set when the context is created and never changed:
GLContext::GLContext(CreateContextFlags flags, const SurfaceCaps& caps,
                     GLContext* sharedContext, bool isOffscreen,
                     bool useTLSIsCurrent)
    : mUseTLSIsCurrent(ShouldUseTLSIsCurrent(useTLSIsCurrent)),
          StaticPrefs::gfx_work_around_driver_bugs_AtStartup()) {
  mOwningThreadId = PlatformThread::CurrentId();
This is also true in ESR 91 except in that case the value of mDesc is passed to the GLContext constructor. Here's the constructor being called for ESR 91; notice in this case that mDesc.isOffscreen is set to true:
Thread 35 "Compositor" hit Breakpoint 1, mozilla::gl::GLContext::GLContext
    (this=this@entry=0x7eb0108e40, desc=..., 
    sharedContext=sharedContext@entry=0x0, useTLSIsCurrent=useTLSIsCurrent@entry=false)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/gl/GLContext.cpp:283
283     GLContext::GLContext(const GLContextDesc& desc, GLContext* sharedContext,
(gdb) p desc
$1 = (const mozilla::gl::GLContextDesc &) @0x7ebbf7b0f0: {
     = {
    flags = mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE},
    isOffscreen = true}
(gdb) bt
#0  mozilla::gl::GLContext::GLContext (this=this@entry=0x7eb0108e40, desc=...,
    useTLSIsCurrent=useTLSIsCurrent@entry=false) at /usr/src/debug/
#1  0x0000007fba727f58 in mozilla::gl::GLContextEGL::GLContextEGL
    (this=0x7eb0108e40, egl=std::shared_ptr (use count
    5, weak count 2) = {...}, desc=..., config=0x5555a80fc0, surface=0x7eb0003ce0, 
    context=0x7eb0004bb0) at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/
#2  0x0000007fba74c314 in mozilla::gl::GLContextEGL::CreateGLContext
    (egl=std::shared_ptr (use count 5, weak count 2) =
    {...}, desc=..., config=, config@entry=0x5555a80fc0,
    surface=surface@entry=0x7eb0003ce0, useGles=useGles@entry=true, 
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#27 0x0000007fb79cf89c in ?? () from /lib64/
And here's the same on ESR 78; note the value of isOffscreen is set to false:
Thread 38 "Compositor" hit Breakpoint 2, mozilla::gl::GLContext::GLContext
    (this=this@entry=0x7ed41a2810, flags=mozilla::gl::CreateContextFlags::NONE, 
    caps=..., sharedContext=sharedContext@entry=0x0, isOffscreen=false,
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/gl/
274	GLContext::GLContext(CreateContextFlags flags, const SurfaceCaps& caps,
(gdb) bt
#0  mozilla::gl::GLContext::GLContext (this=this@entry=0x7ed41a2810,
    flags=mozilla::gl::CreateContextFlags::NONE, caps=..., 
    sharedContext=sharedContext@entry=0x0, isOffscreen=false,
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/gl/
#1  0x0000007ff2909ad0 in mozilla::gl::GLContextEGL::GLContextEGL
    (this=0x7ed41a2810, egl=0x7ed41a25c0, flags=, caps=..., 
    isOffscreen=, config=0x0, surface=0x5555ba33a0,
    at /usr/src/debug/xulrunner-qt5-78.15.1+git33.2-1.21.1.jolla.aarch64/gfx/gl/
#2  0x0000007ff29110b8 in mozilla::gl::GLContextProviderEGL::
    CreateWrappingExisting (aContext=0x7ed4004e40, aSurface=0x5555ba33a0, 
    aDisplay=) at /home/abuild/rpmbuild/BUILD/
#26 0x0000007fef65b89c in ?? () from /lib64/
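The two constructor shapes can be boiled down to the following sketch. The class names ContextEsr78 and ContextEsr91 are made up for illustration and everything except the offscreen flag has been stripped out, so this only shows how the flag is fixed at construction time in both versions.

```cpp
#include <cassert>

// Hypothetical, heavily reduced description struct (ESR 91 shape)
struct GLContextDesc {
  bool isOffscreen;
};

// ESR 78 shape: the flag is a constructor parameter stored in a member
class ContextEsr78 {
 public:
  explicit ContextEsr78(bool isOffscreen) : mIsOffscreen(isOffscreen) {}
  bool IsOffscreen() const { return mIsOffscreen; }
 private:
  const bool mIsOffscreen;  // never changes after construction
};

// ESR 91 shape: the flag travels inside the description struct
class ContextEsr91 {
 public:
  explicit ContextEsr91(const GLContextDesc& desc) : mDesc(desc) {}
  bool IsOffscreen() const { return mDesc.isOffscreen; }
 private:
  const GLContextDesc mDesc;  // never changes after construction
};
```

Since neither value can change later, whichever code path constructs the context decides the result of IsOffscreen() for good, which is why the backtraces to the constructors matter.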
The crucial difference appears to be that in CompositorOGL::CreateContext() on ESR 91 GLContextProviderEGL::CreateHeadless() is being called whereas on ESR 78 it's embedlite::nsWindow::GetGLContext.

However there's a crucial part inside this CreateContext() method that's the same in both versions and looks like this:
  void* widgetOpenGLContext =
      widget ? widget->GetNativeData(NS_NATIVE_OPENGL_CONTEXT) : nullptr;
The decision about whether to go headless or not is dependent on whether this returns null or not. We're homing in. The embedlite::nsWindow::GetNativeData() method looks like this:
void *
nsWindow::GetNativeData(uint32_t aDataType)
  LOGT("t:%p, DataType: %i", this, aDataType);
  switch (aDataType) {
      LOGW("aDataType:%i\n", aDataType);
      return (void*)nullptr;
      return GetGLContext();
      LOGW("nsWindow::GetNativeData not implemented for this type");
      NS_WARNING("nsWindow::GetNativeData called with bad value");

  return nullptr;
This is one of the methods explicitly highlighted to me by Raine in our call yesterday. We checked it at the time (I even wrote about it) and decided the correct aDataType was being passed in. When I check it today that's still the case. Inside this method the active ingredient is the call to GetGLContext() which looks like this:
nsWindow::GetGLContext() const
  LOGT("this:%p, UseExternalContext:%d", this, sUseExternalGLContext);
  if (sUseExternalGLContext) {
    void* context = nullptr;
    void* surface = nullptr;
    void* display = nullptr;
    if (mWindow && mWindow->GetListener()->RequestGLContext(context, surface,
      display)) {
      MOZ_ASSERT(context && surface);
      RefPtr mozContext = GLContextProvider::CreateWrappingExisting
                                     (context, surface, display);
      if (!mozContext || !mozContext->Init()) {
        NS_ERROR("Failed to initialize external GL context!");
        return nullptr;
      return mozContext.forget().take();
    } else {
      NS_ERROR("Embedder wants to use external GL context without actually providing it!");
  return nullptr;
When I step through this with ESR 91 I can see that this isn't returning anything useful, in the first instance because sUseExternalGLContext is set to false. If I force the value to be true, the other parts step through okay.
Thread 35 "Compositor" hit Breakpoint 1, mozilla::embedlite::nsWindow::
    GetNativeData (this=0x7f88778610, aDataType=12)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
180     {
(gdb) n
181       LOGT("t:%p, DataType: %i", this, aDataType);
(gdb) n
182       switch (aDataType) {
(gdb) n
189           return GetGLContext();
(gdb) s
mozilla::embedlite::nsWindow::GetGLContext (this=this@entry=0x7f88778610)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/mobile/sailfishos/
413       LOGT("this:%p, UseExternalContext:%d", this, sUseExternalGLContext);
(gdb) n
414       if (sUseExternalGLContext) {
(gdb) p sUseExternalGLContext
$1 = false
(gdb) set sUseExternalGLContext = true
(gdb) p sUseExternalGLContext
$2 = true
(gdb) n
415         void* context = nullptr;
(gdb) n
416         void* surface = nullptr;
(gdb) n
417         void* display = nullptr;
(gdb) n
418         if (mWindow && mWindow->GetListener()->RequestGLContext(context,
                surface, display)) {
(gdb) n
420           RefPtr mozContext = GLContextProvider::
                CreateWrappingExisting(context, surface, display);
(gdb) n
421           if (!mozContext || !mozContext->Init()) {
(gdb) p mozContext
$3 = {mRawPtr = 0x7eb0111680}
(gdb) n
mozilla::layers::CompositorOGL::CreateContext (this=this@entry=0x7eb0002f10)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
234       if (widgetOpenGLContext) {
Having forced this change in the debugger, now when I continue the browser runs without crashing. Still no rendering, but no crashes either.
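The chain of decisions stepped through above can be summarised in a small sketch. The Widget struct, its fields and ChooseContext() are hypothetical stand-ins: only the control flow follows the real code, with the flag modelling sUseExternalGLContext and the branch modelling the widgetOpenGLContext check in CompositorOGL::CreateContext().

```cpp
#include <cassert>

// Hypothetical stand-ins; only the control flow follows the real code.
enum class ContextKind { WrappedExternal, Headless };

struct Widget {
  bool useExternalGLContext;     // models sUseExternalGLContext
  bool embedderProvidesContext;  // models RequestGLContext() succeeding
  int externalContext;           // dummy object standing in for a GL context

  // Models GetNativeData(NS_NATIVE_OPENGL_CONTEXT) calling GetGLContext()
  void* GetNativeOpenGLContext() {
    if (useExternalGLContext && embedderProvidesContext) {
      return &externalContext;
    }
    return nullptr;
  }
};

// Models the branch on widgetOpenGLContext in CreateContext()
ContextKind ChooseContext(Widget* widget) {
  void* widgetOpenGLContext =
      widget ? widget->GetNativeOpenGLContext() : nullptr;
  return widgetOpenGLContext ? ContextKind::WrappedExternal
                             : ContextKind::Headless;
}
```

With the flag false the sketch falls through to the headless branch, mirroring the offscreen context seen on ESR 91; flipping it to true, as done in the debugger, selects the wrapped external context instead.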

Once again it looks like the underlying reason is due to the BoolVarCache config values that need to be changed to static preferences. These are set in sailfish-browser like this:
    // Use external Qt window for rendering content
I need to make sure these values are reflected in the gecko code that I changed. I've updated both embedlite.compositor.external_gl_context and embedlite.compositor.request_external_gl_context_early so that they're true when previously they were set to false.

They're only small changes, but I'll need to perform a rebuild to establish their full effects. Which means that this will be all the changes for today. We'll see whether this has had any positive effect tomorrow.

If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.

