List items
Items from the current list are shown below.
Gecko
17 Jun 2024 : Day 261 #
Yesterday I collected a bunch of backtraces. These were all from methods added in the last commit for the purpose of WebView rendering, but which also trigger when rendering WebGL content on the browser. They're the methods I'm really interested in.
The very first of the methods to hit a breakpoint was the CompositorOGL::CreateContext() method. Checking the diff between the previous commit and HEAD, we can see that I made the following changes to this method:
Okay, that's something I can change back easily. So I've added this change on top, effectively reversing the change that I made before:
Having rebuilt, reinstalled and executed these changes, there's no change on the browser side, but for the WebView I now get a segfault crash occurring like this:
This is fine. At present I really only care about getting the browser to work. If the change I just made turns out to be necessary I'll have to return to this and fix it, so I'll need this backtrace, but right now I have to forge on and focus on the WebGL rendering.
I'm now checking the following change:
I've continued stepping through the method, out of the method and up the stack, but so far without yet hitting the crash. I'm going to continue looking at this, but to do so will have to wait until tomorrow now. I'm sure I'll get to the bottom of it and that we're approaching a solution, even if it is now taking longer than I'd have preferred.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
The very first of the methods to hit a breakpoint was the CompositorOGL::CreateContext() method. Checking the diff between the previous commit and HEAD, we can see that I made the following changes to this method:
@@ -247,11 +247,9 @@ already_AddRefed<mozilla::gl::GLContext> CompositorOGL:: CreateContext() { // Allow to create offscreen GL context for main Layer Manager if (!context && gfxEnv::LayersPreferOffscreen()) { nsCString discardFailureId; - context = GLContextProvider::CreateHeadless( - {CreateContextFlags::REQUIRE_COMPAT_PROFILE}, &discardFailureId); - if (!context->CreateOffscreenDefaultFb(mSurfaceSize)) { - context = nullptr; - } + context = GLContextProvider::CreateOffscreen( + mSurfaceSize, CreateContextFlags::REQUIRE_COMPAT_PROFILE, + &discardFailureId); } if (!context) {In other words, I messed around with the generation of the RefPtr<GLContext> context variable so that it gets created using GLContextProvider::CreateOffscreen() rather than GLContextProvider::CreateHeadless() as it was doing before.
Okay, that's something I can change back easily. So I've added this change on top, effectively reversing the change that I made before:
diff --git a/gfx/layers/opengl/CompositorOGL.cpp b/gfx/layers/opengl/ CompositorOGL.cpp index 122709eaf2de..06c84a9ebdaa 100644 --- a/gfx/layers/opengl/CompositorOGL.cpp +++ b/gfx/layers/opengl/CompositorOGL.cpp @@ -247,9 +247,15 @@ already_AddRefed<mozilla::gl::GLContext> CompositorOGL:: CreateContext() { // Allow to create offscreen GL context for main Layer Manager if (!context && gfxEnv::LayersPreferOffscreen()) { nsCString discardFailureId; - context = GLContextProvider::CreateOffscreen( - mSurfaceSize, CreateContextFlags::REQUIRE_COMPAT_PROFILE, - &discardFailureId); + + context = GLContextProvider::CreateHeadless( + {CreateContextFlags::REQUIRE_COMPAT_PROFILE}, &discardFailureId); + if (!context->CreateOffscreenDefaultFb(mSurfaceSize)) { + context = nullptr; + } +// context = GLContextProvider::CreateOffscreen( +// mSurfaceSize, CreateContextFlags::REQUIRE_COMPAT_PROFILE, +// &discardFailureId); } if (!context) {As you can see, this comments out the new call to CreateOffscreen() and replaces it with a call to CreateHeadless() so that it's doing the same as it did before. I've left the removed code in as a comment to make it easier for me to compare, but actually git is keeping track of all this for me already, so I could have safely just deleted that code. I may yet do that!
Having rebuilt, reinstalled and executed these changes, there's no change on the browser side, but for the WebView I now get a segfault crash occurring like this:
Thread 37 "Compositor" received signal SIGSEGV, Segmentation fault. [Switching to LWP 305] 0x0000007ff366470c in mozilla::gl::GLScreenBuffer::Size (this=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290 290 ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h: No such file or directory. (gdb) bt #0 0x0000007ff366470c in mozilla::gl::GLScreenBuffer::Size (this=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290 #1 mozilla::embedlite::EmbedLiteCompositorBridgeParent:: CompositeToDefaultTarget (this=0x7fc4b0a5a0, aId=...) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/ EmbedLiteCompositorBridgeParent.cpp:151 #2 0x0000007ff12b49d8 in mozilla::layers::CompositorVsyncScheduler:: ForceComposeToTarget (this=0x7fc4c98760, aTarget=aTarget@entry=0x0, aRect=aRect@entry=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h: 82 #3 0x0000007ff12b4a34 in mozilla::layers::CompositorBridgeParent:: ResumeComposition (this=this@entry=0x7fc4b0a5a0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313 #4 0x0000007ff12b4ac0 in mozilla::layers::CompositorBridgeParent:: ResumeCompositionAndResize (this=0x7fc4b0a5a0, x=<optimized out>, y=<optimized out>, width=<optimized out>, height=<optimized out>) at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:794 #5 0x0000007ff12ad65c in mozilla::detail::RunnableMethodArguments<int, int, int, int>::applyImpl<mozilla::layers::CompositorBridgeParent, void (mozilla: :layers::CompositorBridgeParent::*)(int, int, int, int), StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, 0ul, 1ul, 2ul, 3ul> (args=..., m=<optimized out>, o=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151 #6 mozilla::detail::RunnableMethodArguments<int, int, int, int>::apply<mozilla: :layers::CompositorBridgeParent, void (mozilla::layers:: CompositorBridgeParent::*)(int, int, int, int)> (m=<optimized out>, o=<optimized out>, this=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1154 #7 mozilla::detail::RunnableMethodImpl<mozilla::layers:: CompositorBridgeParent*, void (mozilla::layers::CompositorBridgeParent::*)( int, int, int, int), true, (mozilla::RunnableKind)0, int, int, int, int>:: Run (this=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1201 #8 0x0000007ff07fd018 in nsThread::ProcessNextEvent (this=0x7fc4c25780, aMayWait=<optimized out>, aResult=0x7f1f608cb7) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:869 [...] #17 0x0000007ff6a0289c in ?? () from /lib64/libc.so.6 (gdb)It seems that a call is now being made to GLScreenBuffer::Size() where the GLScreenBuffer object that the method belongs to doesn't exist.
This is fine. At present I really only care about getting the browser to work. If the change I just made turns out to be necessary I'll have to return to this and fix it, so I'll need this backtrace, but right now I have to forge on and focus on the WebGL rendering.
I'm now checking the following change:
@@ -1081,7 +1251,11 @@ static void FillContextAttribs(bool es3, bool useGles, nsTArray<EGLint>* out) { } else #endif { - out->AppendElement(LOCAL_EGL_PBUFFER_BIT); + if (useWindow) { + out->AppendElement(LOCAL_EGL_WINDOW_BIT); + } else { + out->AppendElement(LOCAL_EGL_PBUFFER_BIT); + } } if (useGles) {As we can see, previously it was always LOCAL_EGL_PBUFFER_BIT being appended. Now there's a switch it could potentially change. Here's what the debugger shows us:
Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::gl:: FillContextAttribs (out=0x7fdf293b78, useWindow=false, useGles=false, es3=false) at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1245 1245 ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp: No such file or directory. (gdb) p useWindow $1 = false (gdb)With useWindow set to false we'll still be appending LOCAL_EGL_PBUFFER_BIT, so this isn't a relevant change. It also appears to get called at initialisation, but not when the WebGL view is created. So this looks safe to leave as it is. We see something similar happening in CreateEGLPBufferOffscreenContextImpl():
@@ -1202,9 +1387,14 @@ RefPtr<GLContextEGL> GLContextEGL:: CreateEGLPBufferOffscreenContextImpl( } else #endif { - surface = GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo( - *egl, config, LOCAL_EGL_NONE, pbSize); + if (useWindow) { + surface = CreateEmulatorBufferSurface(egl->mLib, config, pbSize); + } else { + surface = GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo( + *egl, config, LOCAL_EGL_NONE, pbSize); + } }Again, what we want is for useWindow to be set to false and debugging the app shows this to be the case. We have to work a little harder for it this time though, because the variable is set based on conditions at the start of the method and the condition occurs further down. I would have place a breakpoint on the exact line but our partial build shenanigans has broken line-by-line debugging, so I've placed breakpoints on each of the called methods instead:
(gdb) break CreateEmulatorBufferSurface Breakpoint 6 at 0x7ff28c1538: file ${PROJECT}/gecko-dev/gfx/gl/ GLContextProviderEGL.cpp, line 982. (gdb) break GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo Breakpoint 7 at 0x7ff28ae934: file ${PROJECT}/gecko-dev/gfx/gl/ GLContextProviderEGL.cpp, line 911. (gdb) c Continuing. Thread 8 "GeckoWorkerThre" hit Breakpoint 7, mozilla::gl:: GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo (egl=..., config=config@entry=0x5555ab3950, bindToTextureFormat=bindToTextureFormat@entry=12344, pbsize=...) at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:911 911 in ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp (gdb)I'm going to try to tackle GLLibraryEGL::Init() now. There seems to be one significant chunk of code removed from this method. Adding it back in would be straightforward were it not for the fact the display value is no longer passed in to the method. Adding it back in required a cascade of changes, but in essence the important change is the following:
-bool GLLibraryEGL::Init(bool forceAccel, nsACString* const out_failureId) { +bool GLLibraryEGL::Init(bool forceAccel, nsACString* const out_failureId, EGLDisplay aDisplay) { MOZ_RELEASE_ASSERT(!mSymbols.fTerminate); mozilla::ScopedGfxFeatureReporter reporter("EGL"); @@ -501,6 +501,11 @@ bool GLLibraryEGL::Init(bool forceAccel, nsACString* const out_failureId) { } // - + std::shared_ptr<EglDisplay> defaultDisplay = CreateDisplay(forceAccel, out_failureId, aDisplay); + if (!defaultDisplay) { + return false; + } + mDefaultDisplay = defaultDisplay; InitLibExtensions();Now when I execute this code I get an almost immediate segfault:
Thread 38 "Compositor" received signal SIGSEGV, Segmentation fault. [Switching to LWP 32407] 0x0000007fe7e324cc in wl_proxy_marshal_constructor () from /usr/lib64/ libwayland-client.so.0 (gdb) bt #0 0x0000007fe7e324cc in wl_proxy_marshal_constructor () from /usr/lib64/ libwayland-client.so.0 #1 0x0000007fe7b8242c in ServerWaylandBuffer::ServerWaylandBuffer(unsigned int, unsigned int, int, int, android_wlegl*, wl_event_queue*) () from /usr/lib64/libhybris//eglplatform_wayland.so #2 0x0000007fe7b824c8 in WaylandNativeWindow::addBuffer() () from /usr/lib64/ libhybris//eglplatform_wayland.so #3 0x0000007fe7b81728 in WaylandNativeWindow::dequeueBuffer( BaseNativeWindowBuffer**, int*) () from /usr/lib64/libhybris// eglplatform_wayland.so #4 0x0000007fe7b48124 in BaseNativeWindow::_dequeueBuffer(ANativeWindow*, ANativeWindowBuffer**, int*) () from /usr/lib64/ libhybris-platformcommon.so.1 #5 0x0000007fe4f69188 in ?? () #6 0x0000000000000438 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb)It's not clear to me why the change I made has this effect, given all of this seems to be happening inside the Wayland code. However, by stepping through the code, even without the source lines, I'm been able to establish that the Init() method is called, as is the new code I added in. This new code executes without error, so there's nothing in the new code that's directly triggering the segfault. But some consequence of the change certainly is.
I've continued stepping through the method, out of the method and up the stack, but so far without yet hitting the crash. I'm going to continue looking at this, but to do so will have to wait until tomorrow now. I'm sure I'll get to the bottom of it and that we're approaching a solution, even if it is now taking longer than I'd have preferred.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comments
Uncover Disqus comments