flypig.co.uk

List items

Items from the current list are shown below.

Blog

7 Jun 2024 : Day 256 #
It's the big one! A full 2^8 days of development have gone in to this now, which seems like an absurd amount of effort.
 
2^8 in the centre of a bright coloured flash

Unfortunately, while numerically this is very exciting, the actual work I'm doing right now isn't, so there's no big reveal to impress you with. Instead I'm going to continue hacking away at the WebGL bug I discovered a couple of days back.

To elaborate, I'm currently trying to find out why the WebView rendering fix has caused WebGL rendering to fail. Both are types of offscreen rendering, so it's not surprising that one has affected the other, but it's important that both of them are working correctly.

Over the last couple of days I discovered that the problem definitely exists in the latest commit added to the code. I checked that by rolling the repository back one commit, rebuilding and checking that the problem doesn't happen with the slightly older version.

Now I need to find out what has changed in the flow of the code to make the problem appear.

From the earlier backtraces we know that the problem is a call to SharedSurface_Basic::ToSurfaceDescriptor(), which itself is called from WebGLContext::GetFrontBuffer(). Stepping through this method I can see that there's no immediate crashing happening there, and execution continues into ShareableCanvasRenderer::UpdateCompositableClient(). The code being executed there looks like this:
    // First, let's see if we can get a no-copy TextureClient from the canvas.
    auto tc = fnGetExistingTc();
    if (!tc) {
      // Otherwise, snapshot the surface and copy into a TexClient.
      tc = fnMakeTcFromSnapshot();
    }
    if (tc != mFrontBufferFromDesc) {
      mFrontBufferFromDesc = nullptr;
    }
Both fnGetExistingTc() and fnMakeTcFromSnapshot() are lambda functions defined inside the method. But the first of these is where the call to SharedSurface_Basic::ToSurfaceDescriptor() occurs. This is returning null because a call to SharedSurface_Basic::ToSurfaceDescriptor() always returns Nothing().

However, the following call to fnMakeTcFromSnapshot() is returning a value, as we can see in the following debug steps:
(gdb) n
32        return Nothing();
(gdb) n
50      ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/MaybeStorageBase.h: 
    No such file or directory.
(gdb) n
mozilla::ClientWebGLContext::GetFrontBuffer (this=this@entry=0x7fc8b4a4b0, 
    fb=fb@entry=0x0, vr=<optimized out>, vr@entry=false)
    at dom/canvas/ClientWebGLContext.cpp:368
368       const auto notLost = mNotLost;
(gdb) n
mozilla::layers::ShareableCanvasRenderer::<lambda()>::operator() (
    __closure=<synthetic pointer>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/Maybe.h:443
443     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/Maybe.h: No such 
    file or directory.
(gdb) 
149         if (!desc) return nullptr;
(gdb) n
148         const auto desc = webgl->GetFrontBuffer(nullptr);
(gdb) n
mozilla::layers::ShareableCanvasRenderer::UpdateCompositableClient (
    this=0x7fc98a98e0)
    at gfx/layers/ShareableCanvasRenderer.cpp:196
196         if (!tc) {
(gdb) p tc
$8 = {mRawPtr = 0x0}
(gdb) n
198           tc = fnMakeTcFromSnapshot();
(gdb) n
200         if (tc != mFrontBufferFromDesc) {
(gdb) p tc
$9 = {mRawPtr = 0x7fc93bc9a0}
(gdb) 
This will need comparing against what happens in our newer build where the crash occurs. Thinking back, I'm now a little concerned that the sole reason for the crash is this line that I added to SharedSurface_Basic::ToSurfaceDescriptor():
Maybe<layers::SurfaceDescriptor> SharedSurface_Basic::ToSurfaceDescriptor() {
  MOZ_CRASH(&quot;GFX: ToSurfaceDescriptor&quot;);
  return Nothing();
}
Certainly this will cause a crash, but I thought I'd also tested it without this. Now I'm not so sure...

Sadly I didn't keep copies of the newer packages to install back again, but I do have a copy of the libxul.so library from back then. I'm not sure if I'll be able to debug using it, but it's worth a try. If it turns out not to be debuggable I'll just have to do another complete rebuild (although, this time, I'll keep a copy of the current packages so I can reinstall them if I need to do another comparison!).

Sadly I don't get any joy testing the library:
Thread 8 &quot;GeckoWorkerThre&quot; received signal SIGSEGV, Segmentation 
    fault.
0x0000007fe5ee13a8 in ?? ()
(gdb) bt
#0  0x0000007fe5ee13a8 in ?? ()
#1  0x0000007fdf293e08 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) 
I'm going to have to do a rebuild. This means restoring the original branch, then performing the build to create the full set of RPM packages.
$ cd gecko-dev
$ git checkout -b temp
$ git checkout FIREFOX_ESR_91_9_X_RELBRANCH_patches
$ git log --oneline -5
7437a9d17284 (HEAD -> FIREFOX_ESR_91_9_X_RELBRANCH_patches) Restore 
    GLScreenBuffer and TextureImageEGL
d3ba4df29a32 (temp) Restore NotifyDidPaint event and timers
f55057391ac0 Prevent errors from DownloadPrompter
eab04b8c0d80 Enable dconf
c6ea49286566 (origin/FIREFOX_ESR_91_9_X_RELBRANCH_patches) Disable SessionStore 
    functionality
$ cd ..
Before now performing the build I must remove the code that's guaranteed to cause a crash:
Maybe<layers::SurfaceDescriptor> SharedSurface_Basic::ToSurfaceDescriptor() {
  return Nothing();
}
Now to build:
$ sfdk build -d --with git_workaround
[...]
The build won't be ready until the morning at the earliest. So I'm going to pause there and come back to this tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.

Comments

Uncover Disqus comments