flypig.co.uk

List items

Items from the current list are shown below.

Blog

24 Feb 2024 : Day 166 #
We're back on to looking at the SwapChain code, part of the offscreen rendering pipeline, today. This is all for the purpose of getting the WebView working. Currently opening a WebView will cause the parent app to crash; it's clearly something that needs fixing before ESR 91 will be acceptable for daily use.

We managed to persuade the SwapChain object to be created by force-disabling the embedlite.compositor.external_gl_context static preference. Now we've found that the mBackBuffer is null which also results in a segfault. The mBackBuffer variable is a member of SwapChainPresenter *mPresenter, which is itself contained within the SwapChain class.

Looking through the code in GLScreenBuffer.cpp there are currently only two ways for the mBackBuffer variable to be initialised. Either it gets swapped in as a result of a call to SwapBackBuffer() like this:
std::shared_ptr<SharedSurface> SwapChainPresenter::SwapBackBuffer(
    std::shared_ptr<SharedSurface> back) {
[...]
  auto old = mBackBuffer;
  mBackBuffer = back;
  if (mBackBuffer) {
    mBackBuffer->WaitForBufferOwnership();
    mBackBuffer->ProducerAcquire();
    mBackBuffer->LockProd();
  }
  return old;
}

Or it gets created as a result of a call to the Resize() method like this:
bool SwapChain::Resize(const gfx::IntSize& size) {
  UniquePtr<SharedSurface> newBack =
      mFactory->CreateShared(size);
  if (!newBack) return false;

  if (mPresenter->mBackBuffer) mPresenter->mBackBuffer->ProducerRelease();

  mPresenter->mBackBuffer.reset(newBack.release());

  mPresenter->mBackBuffer->ProducerAcquire();

  return true;
}
We should check whether either of these are being called. It is possible that SwapBackBuffer() is being called but with an empty back parameter, or that the CreateShared() call in Resize() is failing. Either of these would leave us in our current situation. However just as likely is that neither of these are being called and there's not even an attempt being made to initialise mBackBuffer. We need to know!
(gdb) delete break
(gdb) b SwapChainPresenter::SwapBackBuffer
Breakpoint 1 at 0x7ff1109c14: file gfx/gl/GLScreenBuffer.cpp, line 82.
(gdb) b SwapChain::Resize
Breakpoint 2 at 0x7ff110a398: file gfx/gl/GLScreenBuffer.cpp, line 132.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/harbour-webview 
[...]

Thread 37 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 4362]
0x0000007ff110a38c in mozilla::gl::SwapChain::OffscreenSize
    (this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
290     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:
    No such file or directory.
(gdb) frame 1
#1  0x0000007ff3667930 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::
    CompositeToDefaultTarget (this=0x7fc4b3f070, aId=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
290     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:
    No such file or directory.
(gdb) p context->mSwapChain.mTuple.mFirstA->mPresenter->mBackBuffer
$1 = std::shared_ptr<mozilla::gl::SharedSurface> (empty) = {get() = 0x0}
(gdb) 
No breakpoint hits before the segfault, so neither of those methods are being called. Armed with this knowledge we must now turn our thoughts to solutions, and there are multiple potential solutions we could choose: guard against a null access in SwapChain::OffscreenSize(); ensure mBackBuffer is created at the same time as the mPresenter that contains it; find out if the process flow should be calling one of the swap or resize methods prior to this call.

It's now time for my work day, so answers to these questions will have to wait until tonight. Still, this is progress.

[...]

To try to get a handle on where this mBackBuffer ought to be created, I thought it might help to figure out some sequencing. Here's how things seem to be happening:
  1. The SwapChain is created in EmbedLiteCompositorBridgeParent::PrepareOffscreen().
  2. The SwapChainPresenter() is created immediately afterwards.
  3. In EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget() the SwapChain::OffscreenSize() method is called, causing the crash.
  4. Immediately after this SwapChain::Resize() is called, which if done earlier, would prevent the crash.
Here's that sequence of code where the crash is caused:
    if (context->GetSwapChain()->OffscreenSize() != mEGLSurfaceSize
      && !context->GetSwapChain()->Resize(mEGLSurfaceSize)) {
      return;
    }
It's also worth noting, I think, that SwapChain::Acquire() calls SwapBackBuffer() with a non-null back buffer parameter, which if called early enough would also prevent the code from crashing subsequently when OffscreenSize() is read.

Having re-reviewed the original D75055 changeset that introduced the SwapChain, along with the history of the related files that aren't part of the changeset, I'm beginning to get a better picture.

For example, the code that makes the failing call to OffscreenSize() was added by me earlier on in this process:
$ git blame \
    embedding/embedlite/embedthread/EmbedLiteCompositorBridgeParent.cpp \
    -L 153,156
^01b1a0352034 embedthread/EmbedLiteCompositorBridgeParent.cpp
    (Raine Makelainen  2020-07-24 16:25:17 +0300 153)
    MutexAutoLock lock(mRenderMutex);
d59d44a5bccaf embedding/embedlite/embedthread/EmbedLiteCompositorBridgeParent.cpp
    (David Llewellyn-Jones 2023-09-04 09:03:49 +0100 154)
    if (context->GetSwapChain()->OffscreenSize() != mEGLSurfaceSize
d59d44a5bccaf embedding/embedlite/embedthread/EmbedLiteCompositorBridgeParent.cpp
    (David Llewellyn-Jones 2023-09-04 09:03:49 +0100 155)
    && !context->GetSwapChain()->Resize(mEGLSurfaceSize)) {
^01b1a0352034 embedthread/EmbedLiteCompositorBridgeParent.cpp
    (Raine Makelainen      2020-07-24 16:25:17 +0300 156)
    return;
Prior to this the logic was somewhat different:
$ git diff d59d44a5bccaf~ d59d44a5bccaf

@@ -153,7 +157,8 @@ EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget
    (VsyncId aId)
[...] 
   if (context->IsOffscreen()) {
     MutexAutoLock lock(mRenderMutex);
-    if (context->OffscreenSize() != mEGLSurfaceSize && !context->
    ResizeOffscreen(mEGLSurfaceSize)) {
+    if (context->GetSwapChain()->OffscreenSize() != mEGLSurfaceSize
+      && !context->GetSwapChain()->Resize(mEGLSurfaceSize)) {
       return;
     }
   }
The code for returning the offscreen size was also added by me slightly earlier on the same day:
$ git blame gfx/gl/GLScreenBuffer.cpp -L 128,130
f536dbf9f6f8a (David Llewellyn-Jones 2023-09-04 08:52:00 +0100 128)
    const gfx::IntSize& SwapChain::OffscreenSize() const {
f536dbf9f6f8a (David Llewellyn-Jones 2023-09-04 08:52:00 +0100 129)
      return mPresenter->mBackBuffer->mFb->mSize;
f536dbf9f6f8a (David Llewellyn-Jones 2023-09-04 08:52:00 +0100 130)
    }
And as you can see, a lot of code was changed in this commit, especially this code:
diff --git a/gfx/gl/GLScreenBuffer.cpp b/gfx/gl/GLScreenBuffer.cpp
index 0398dd7dc6a2..e71263068777 100644
--- a/gfx/gl/GLScreenBuffer.cpp
+++ b/gfx/gl/GLScreenBuffer.cpp
@@ -116,4 +116,38 @@ SwapChain::~SwapChain() {
   }
 }
 
[...]
+
+const gfx::IntSize& SwapChain::OffscreenSize() const {
+  return mPresenter->mBackBuffer->mFb->mSize;
+}
+
[...]
So much — if not all — of this faulty code is down to me. Back then I was coding without being able to test, so this isn't a huge surprise. But it does also mean I have more scope to control the situation and make changes to the implementation.

Prior to all these changes the OffscreenSize() implementation looked like this:
const gfx::IntSize& GLContext::OffscreenSize() const {
  MOZ_ASSERT(IsOffscreen());
  return mScreen->Size();
}
The mScreen being used here is analogous to the mBackBuffer and is created in this method:
bool GLContext::CreateScreenBufferImpl(const IntSize& size,
                                       const SurfaceCaps& caps) {
  UniquePtr<GLScreenBuffer> newScreen =
      GLScreenBuffer::Create(this, size, caps);
  if (!newScreen) return false;

  if (!newScreen->Resize(size)) {
    return false;
  }

  // This will rebind to 0 (Screen) if needed when
  // it falls out of scope.
  ScopedBindFramebuffer autoFB(this);

  mScreen = std::move(newScreen);

  return true;
}
And the mScreen is created in a method called InitOffscreen().
bool GLContext::InitOffscreen(const gfx::IntSize& size,
                              const SurfaceCaps& caps) {
  if (!CreateScreenBuffer(size, caps)) return false;
[...]
Finally InitOffScreen() is being called in GLContextProviderCGL::CreateOffscreen():
already_AddRefed<GLContext> GLContextProviderCGL::CreateOffscreen(
    const IntSize& size,
    const SurfaceCaps& minCaps,
    CreateContextFlags flags,
    nsACString* const out_failureId) {
  RefPtr<GLContext> gl = CreateHeadless(flags, out_failureId);
  if (!gl) {
    return nullptr;
  }

  if (!gl->InitOffscreen(size, minCaps)) {
    *out_failureId = NS_LITERAL_CSTRING("FEATURE_FAILURE_CGL_INIT");
    return nullptr;
  }

  return gl.forget();
}
It feels like we've nearly come full circle, which would be good because then that would be enough to make a clear decision about how to address this. But it's already late here now and time for me to call it a night, so that decision will have to wait for tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.

Comments

Uncover Disqus comments