23 Jun 2024 : Day 267
I have a bit more time for development today than I had yesterday, so I'm hoping I can properly follow up on this issue I noticed yesterday with the library working or not, depending on which version of the package is installed.
As part of this, I want to explore what happens when I run a configuration with the "working WebGL" packages (i.e. the ones with all of the changes from my latest commit reverted), plus my latest library, but also running the WebView rather than the browser.
I'm expecting this to fail, but it'll be interesting to see where.
[...]
And it does fail. But now I have a backtrace to inspect from it and it's a lot more interesting than the backtraces from the Wayland failure we've been getting so often recently. Here's the backtrace:
Thread 38 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 9220]
0x0000007ff110864c in mozilla::gl::SwapChain::OffscreenSize (this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
290     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h: No such file or directory.
(gdb) bt
#0  0x0000007ff110864c in mozilla::gl::SwapChain::OffscreenSize (this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#1  0x0000007ff3666230 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget (this=0x7fc4ad76f0, aId=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#2  0x0000007ff12b64d8 in mozilla::layers::CompositorVsyncScheduler::ForceComposeToTarget (this=0x7fc4c39f60, aTarget=aTarget@entry=0x0, aRect=aRect@entry=0x0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h:82
#3  0x0000007ff12b6534 in mozilla::layers::CompositorBridgeParent::ResumeComposition (this=this@entry=0x7fc4ad76f0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#4  0x0000007ff12b65c0 in mozilla::layers::CompositorBridgeParent::ResumeCompositionAndResize (this=0x7fc4ad76f0, x=<optimized out>, y=<optimized out>, width=<optimized out>, height=<optimized out>)
    at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:794
#5  0x0000007ff12af15c in mozilla::detail::RunnableMethodArguments<int, int, int, int>::applyImpl<mozilla::layers::CompositorBridgeParent, void (mozilla::layers::CompositorBridgeParent::*)(int, int, int, int), StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, 0ul, 1ul, 2ul, 3ul> (args=..., m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
#6  mozilla::detail::RunnableMethodArguments<int, int, int, int>::apply<mozilla::layers::CompositorBridgeParent, void (mozilla::layers::CompositorBridgeParent::*)(int, int, int, int)> (m=<optimized out>, o=<optimized out>, this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1154
#7  mozilla::detail::RunnableMethodImpl<mozilla::layers::CompositorBridgeParent*, void (mozilla::layers::CompositorBridgeParent::*)(int, int, int, int), true, (mozilla::RunnableKind)0, int, int, int, int>::Run (this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1201
#8  0x0000007ff0801ab8 in nsThread::ProcessNextEvent (this=0x7fc4c01730, aMayWait=<optimized out>, aResult=0x7f1796bcb7)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h:869
#9  0x0000007ff07f098c in NS_ProcessNextEvent (aThread=<optimized out>, aThread@entry=0x7fc4c01730, aMayWait=aMayWait@entry=false)
    at ${PROJECT}/gecko-dev/xpcom/threads/nsThreadUtils.cpp:466
#10 0x0000007ff0bbcab0 in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0x7edc001840, aDelegate=0x7f1796bdc0)
    at ${PROJECT}/gecko-dev/ipc/glue/MessagePump.cpp:300
#11 0x0000007ff0b7b87c in MessageLoop::RunInternal (this=this@entry=0x7f1796bdc0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
#12 0x0000007ff0b7bac0 in MessageLoop::RunHandler (this=0x7f1796bdc0)
    at ${PROJECT}/gecko-dev/ipc/chromium/src/base/message_loop.cc:352
#13 MessageLoop::Run (this=this@entry=0x7f1796bdc0)
    at ${PROJECT}/gecko-dev/ipc/chromium/src/base/message_loop.cc:334
#14 0x0000007ff08034b8 in nsThread::ThreadFunc (aArg=0x7fc4c018d0)
    at ${PROJECT}/gecko-dev/xpcom/threads/nsThread.cpp:392
#15 0x0000007feca419f0 in ?? () from /usr/lib64/libnspr4.so
#16 0x0000007fefd05a4c in ?? () from /lib64/libpthread.so.0
#17 0x0000007ff6a0289c in ?? () from /lib64/libc.so.6

While we're here, let's do a little exploration into why this crash occurred using the debugger.
(gdb) frame 1
#1  0x0000007ff3666230 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget (this=0x7fc4ad76f0, aId=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
290     ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h: No such file or directory.
(gdb) p context
$1 = (mozilla::gl::GLContext *) 0x7edc19ede0
(gdb) p context->mSwapChain
$2 = { mTuple = {<mozilla::detail::CompactPairHelper<mozilla::gl::SwapChain*, mozilla::DefaultDelete<mozilla::gl::SwapChain>, (mozilla::detail::StorageType)1, (mozilla::detail::StorageType)0>> = {<mozilla::DefaultDelete<mozilla::gl::SwapChain>> = {<No data fields>}, mFirstA = 0x7edc1ce070}, <No data fields>}}
(gdb) p context->mSwapChain.mTuple
$3 = {<mozilla::detail::CompactPairHelper<mozilla::gl::SwapChain*, mozilla::DefaultDelete<mozilla::gl::SwapChain>, (mozilla::detail::StorageType)1, (mozilla::detail::StorageType)0>> = {<mozilla::DefaultDelete<mozilla::gl::SwapChain>> = {<No data fields>}, mFirstA = 0x7edc1ce070}, <No data fields>}
(gdb) p context->mSwapChain.mTuple.mFirstA
$4 = (mozilla::gl::SwapChain *) 0x7edc1ce070
(gdb) p context->mSwapChain.mTuple.mFirstA->mPresenter
$5 = (mozilla::gl::SwapChainPresenter *) 0x7edc1a1300
(gdb) p context->mSwapChain.mTuple.mFirstA->mPresenter->mBackBuffer
$6 = std::shared_ptr<mozilla::gl::SharedSurface> (empty) = {get() = 0x0}
(gdb)

What's this telling us? Well, it's very similar to the crash we got back on Day 177 when we first started trying out the WebView. The SwapChain is being created and accessed, but the problem occurs deeper inside the object. It's the SharedSurface backbuffer object, stored inside the SwapChainPresenter object, stored inside the SwapChain, which is in turn held by a smart pointer inside the GLContext, that's not been set:
(gdb) p context->mSwapChain.mTuple.mFirstA->mPresenter->mBackBuffer
$6 = std::shared_ptr<mozilla::gl::SharedSurface> (empty) = {get() = 0x0}

This might be an initialisation issue, or it might be more involved. It's not quite the same as what was happening on Day 177 since the code is different this time. But the underlying issue is the same.
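To make the failure mode concrete, here's a minimal sketch of the ownership chain the debugger walked through, using simplified stand-in types rather than the real Gecko classes. The `Size` struct, the `mSize` field and the `SafeOffscreenSize()` helper are all hypothetical names made up for illustration; the real `SwapChain::OffscreenSize()` may well look quite different. All it shows is how an empty `std::shared_ptr` at the end of the chain can be detected before anything tries to dereference it.

```cpp
#include <memory>
#include <optional>

// Simplified stand-ins for the Gecko types; these are NOT the real definitions.
struct Size {
  int width;
  int height;
};

struct SharedSurface {
  Size mSize;  // hypothetical field, for illustration only
};

struct SwapChainPresenter {
  // This is the pointer that gdb reported as "(empty)" in the crash.
  std::shared_ptr<SharedSurface> mBackBuffer;
};

struct SwapChain {
  SwapChainPresenter* mPresenter = nullptr;
};

struct GLContext {
  std::unique_ptr<SwapChain> mSwapChain;
};

// Hypothetical helper: walks context -> swap chain -> presenter -> back buffer
// and returns std::nullopt if any link in the chain is missing.
std::optional<Size> SafeOffscreenSize(const GLContext& context) {
  const SwapChain* swapChain = context.mSwapChain.get();
  if (!swapChain || !swapChain->mPresenter) {
    return std::nullopt;
  }
  const std::shared_ptr<SharedSurface>& backBuffer =
      swapChain->mPresenter->mBackBuffer;
  if (!backBuffer) {
    return std::nullopt;
  }
  return backBuffer->mSize;
}
```

With the state from the crash (a swap chain and presenter that both exist, but an empty back buffer) a helper along these lines would return `std::nullopt` rather than segfaulting. That would only hide the symptom though: the back buffer still needs to be created somewhere for the WebView to render anything.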
To be honest, this is just what I'd expect. But it also tells us that this whole process hasn't been in vain: cutting out things brought us to a similar point to before, but we're closer to resolving both the WebGL and WebView issues this time.
The next step is to establish whether the new SwapChain is getting used. I'd previously thought it was never used by the browser, but I have a new perspective now: although it's not used when rendering general web pages, maybe it's used when rendering WebGL within a page? Most pages don't do this, but when they do, I'm now expecting there to be some offscreen rendering.
I've placed a breakpoint on the SwapChain constructor. To start with, here's where the SwapChain gets created when using a WebView component. This is for comparison, captured using the latest code:
=============== Preparing offscreen rendering context ===============
[Switching to LWP 9891]

Thread 37 "Compositor" hit Breakpoint 1, mozilla::gl::SwapChain::SwapChain (this=0x7ee01ce090)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.h:63
63      ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.h: No such file or directory.
(gdb) bt
#0  mozilla::gl::SwapChain::SwapChain (this=0x7ee01ce090)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.h:63
#1  0x0000007ff3666ac0 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::PrepareOffscreen (this=this@entry=0x7fc4b01c50)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#2  0x0000007ff3666b7c in mozilla::embedlite::EmbedLiteCompositorBridgeParent::AllocPLayerTransactionParent (this=0x7fc4b01c50, aBackendHints=..., aId=...)
    at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:90
#3  0x0000007ff0c63d90 in mozilla::layers::PCompositorBridgeParent::OnMessageReceived (this=0x7fc4b01c50, msg__=...)
    at PCompositorBridgeParent.cpp:1285
[...]
#18 0x0000007ff6a0289c in ?? () from /lib64/libc.so.6
(gdb)

As we can see from this, it's created inside the EmbedLiteCompositorBridgeParent::PrepareOffscreen() method. Here's what the code looks like that's creating it, for reference:
void
EmbedLiteCompositorBridgeParent::PrepareOffscreen()
{
  fprintf(stderr, "=============== Preparing offscreen rendering context ===============\n");

  const CompositorBridgeParent::LayerTreeState* state = CompositorBridgeParent::GetIndirectShadowTree(RootLayerTreeId());
  NS_ENSURE_TRUE(state && state->mLayerManager, );

  GLContext* context = static_cast<CompositorOGL*>(state->mLayerManager->GetCompositor())->gl();
  NS_ENSURE_TRUE(context, );

  // TODO: The switch from GLSCreenBuffer to SwapChain needs completing
  // See: https://phabricator.services.mozilla.com/D75055
  if (context->IsOffscreen()) {
    UniquePtr<SurfaceFactory> factory;
    if (context->GetContextType() == GLContextType::EGL) {
      // [Basic/OGL Layers, OMTC] WebGL layer init.
      factory = SurfaceFactory_EGLImage::Create(*context);
    } else {
      // [Basic Layers, OMTC] WebGL layer init.
      // Well, this *should* work...
      factory = MakeUnique<SurfaceFactory_Basic>(*context);
    }

    SwapChain* swapChain = context->GetSwapChain();
    if (swapChain == nullptr) {
      swapChain = new SwapChain();
      new SwapChainPresenter(*swapChain);
      context->mSwapChain.reset(swapChain);
    }

    if (factory) {
      swapChain->Morph(std::move(factory));
    }
  }
}

Now I want to know whether it's ever used by the browser using an execution flow that doesn't depend on EmbedLite.
When I render a website without WebGL (e.g. the Jolla site) the constructor goes unused. But if I visit a site that uses WebGL (e.g. my personal website where the animated background is generated using a WebGL shader) it does get hit. It comes with a crazy long backtrace that shows it's happening inside a DOM element, which is again what I'd expect. I've chopped quite a lot out from the below backtrace, but still kept the parts I think are most relevant:
Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::gl::SwapChain::SwapChain (this=0x7fc9ce3588)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.h:63
63      ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.h: No such file or directory.
(gdb) bt
#0  mozilla::gl::SwapChain::SwapChain (this=0x7fc9ce3588)
    at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.h:63
#1  0x0000007ff369a54c in mozilla::WebGLContext::WebGLContext (this=0x7fc9ce30f0, host=..., desc=...)
    at include/c++/8.3.0/bits/move.h:74
#2  0x0000007ff36a9c90 in mozilla::WebGLContext::<lambda()>::operator() (__closure=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33
#3  mozilla::WebGLContext::Create (host=..., desc=..., out=out@entry=0x7fcb9660c8)
    at ${PROJECT}/gecko-dev/dom/canvas/WebGLContext.cpp:562
#4  0x0000007ff3661920 in mozilla::HostWebGLContext::Create (ownerData=..., desc=..., out=out@entry=0x7fcb9660c8)
    at ${PROJECT}/gecko-dev/dom/canvas/HostWebGLContext.cpp:59
#5  0x0000007ff3691374 in mozilla::ClientWebGLContext::<lambda()>::operator() (__closure=<optimized out>)
    at ${PROJECT}/gecko-dev/dom/canvas/ClientWebGLContext.cpp:625
#6  mozilla::ClientWebGLContext::CreateHostContext (this=this@entry=0x7fc9991820, requestedSize=...)
    at ${PROJECT}/gecko-dev/dom/canvas/ClientWebGLContext.cpp:654
#7  0x0000007ff3691e5c in mozilla::ClientWebGLContext::SetDimensions (this=0x7fc9991820, signedWidth=<optimized out>, signedHeight=<optimized out>)
    at ${PROJECT}/gecko-dev/dom/canvas/ClientWebGLContext.cpp:563
#8  0x0000007ff362b27c in mozilla::dom::CanvasRenderingContextHelper::UpdateContext (this=0x7e6036c790, aCx=<optimized out>, aNewContextOptions=..., aRvForDictionaryInit=...)
    at ${PROJECT}/gecko-dev/dom/canvas/CanvasRenderingContextHelper.cpp:238
#9  0x0000007ff363a348 in mozilla::dom::CanvasRenderingContextHelper::GetContext (this=this@entry=0x7e6036c790, aCx=0x7fc81defd0, aContextId=..., aContextOptions=..., aRv=...)
    at ${PROJECT}/gecko-dev/dom/canvas/CanvasRenderingContextHelper.cpp:190
#10 0x0000007ff390bf18 in mozilla::dom::HTMLCanvasElement::GetContext (this=this@entry=0x7e6036c710, aCx=aCx@entry=0x7fc81defd0, aContextId=..., aContextOptions=aContextOptions@entry=..., aRv=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/js/Value.h:670
#11 0x0000007ff3549764 in mozilla::dom::HTMLCanvasElement_Binding::getContext (cx=0x7fc81defd0, obj=..., void_self=0x7e6036c710, args=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/js/RootingAPI.h:1297
#12 0x0000007ff35e0bec in mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions> (cx=0x7fc81defd0, argc=<optimized out>, vp=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/js/CallArgs.h:207
#13 0x0000007ff4e7d5d4 in CallJSNative (args=..., reason=js::CallReason::Call, native=0x7ff35e09ac <mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions>(JSContext*, unsigned int, JS::Value*)>, cx=0x7fc81defd0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/js/CallArgs.h:285
#14 js::InternalCallOrConstruct (cx=cx@entry=0x7fc81defd0, args=..., construct=construct@entry=js::NO_CONSTRUCT, reason=reason@entry=js::CallReason::Call)
    at ${PROJECT}/gecko-dev/js/src/vm/Interpreter.cpp:511
[...]
#63 0x0000007fefbb189c in ?? () from /lib64/libc.so.6
(gdb)

For comparison, it's interesting to check whether the backbuffer is already created at this point. The debugger suggests not:
(gdb) p mPresenter->mBackBuffer
$3 = std::shared_ptr<mozilla::gl::SharedSurface> (expired, weak count 0) = {get() = 0x21}

On this version of the browser the WebGL is working, using offscreen rendering, but the WebView is broken. So now I'm rethinking my need to introduce all the old GLScreenBuffer code. Could I try to use the SwapChain after all? You may recall that I already considered this much earlier, tried it, failed and then reassessed. Maybe I now know enough to make it work? I'm going to look carefully through the code and reconsider.
[...]
I've now spent a good few hours looking through the WebGLContext code, since this is what we see in the backtrace above. There's definitely something in the idea that we should be using this instead of GLContext. But WebGLContext isn't inheriting anything from GLContext and their interfaces look quite different to me. It certainly isn't the case that one would be a drop-in replacement for the other. Quite the contrary in fact. While switching to use WebGLContext might be a better solution in the long-term, I've convinced myself (again) that this isn't what we need right now.
So I'm going back to my original plan, but now we're going in the opposite direction. Rather than removing code I'll now start to reintroduce code. In particular, the one thing I'm convinced that we can't do without is the GLScreenBuffer object, as encapsulated in the GLContext::mScreen member variable.
So I'm adding this class back in. Thankfully git makes this a very easy process:
$ git checkout gfx/gl/GLScreenBuffer.cpp
$ git checkout gfx/gl/GLScreenBuffer.h

This is the minimal change I think is needed to get the WebView working again. I'm building from a base where WebGL is working. So I feel like I'm back on track again.
With just these two files reverted, attempting to build throws up a whole host of errors. Here are just a few:
${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp: In destructor ‘virtual mozilla::gl::GLScreenBuffer::~GLScreenBuffer()’:
${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:205:8: error: invalid use of incomplete type ‘class mozilla::layers::SharedSurfaceTextureClient’
   mBack->Surf()->ProducerRelease();
        ^~
In file included from ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.h:23,
                 from ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:6:
${PROJECT}/gecko-dev/gfx/gl/SharedSurface.h:45:7: note: forward declaration of ‘class mozilla::layers::SharedSurfaceTextureClient’
 class SharedSurfaceTextureClient;
       ^~~~~~~~~~~~~~~~~~~~~~~~~~
${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp: In member function ‘void mozilla::gl::GLScreenBuffer::BindFB(GLuint)’:
${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:218:10: error: ‘class mozilla::gl::GLContext’ has no member named ‘raw_fBindFramebuffer’; did you mean ‘raw_fBlitFramebuffer’?
     mGL->raw_fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, mInternalDrawFB);
          ^~~~~~~~~~~~~~~~~~~~
          raw_fBlitFramebuffer

This isn't unexpected. My process now is to reintroduce removed code, but only where absolutely necessary to get the build working again. So I'm essentially doing the opposite of what I was doing before: adding code rather than removing it. As before, git is my biggest help here because it's kept a neat record of everything I've changed. I'm reverting it in small pieces, so it's taking a while to make the changes, but I'm still satisfied that what I'll end up with is the smallest set of changes I can reasonably expect, given that we've added the GLScreenBuffer class back in.
It's going to be the convex hull of the GLScreenBuffer dependencies.
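The first of those errors is a standard C++ one, and a small sketch may help show why it appears. A forward declaration is enough to hold a pointer to a class, but not to call a method through that pointer: for that the full class definition has to be visible. The `TextureClient`, `ScreenBuffer`, `Release()` and `ReleaseBack()` names below are hypothetical stand-ins for illustration, not the Gecko classes.

```cpp
// Forward declaration only: enough to declare pointers to the class,
// but not enough to call any of its methods.
class TextureClient;

class ScreenBuffer {
 public:
  explicit ScreenBuffer(TextureClient* back) : mBack(back) {}

  // Declared but not defined here. Defining it inline at this point, with a
  // call to mBack->Release(), would give "invalid use of incomplete type"
  // because TextureClient hasn't been fully defined yet.
  int ReleaseBack();

 private:
  TextureClient* mBack;  // a pointer to an incomplete type is fine
};

// The full definition, which would normally arrive via a header include.
class TextureClient {
 public:
  // Returns how many times the surface has been released so far.
  int Release() { return ++mReleaseCount; }

 private:
  int mReleaseCount = 0;
};

// Now that TextureClient is complete, the method call compiles.
int ScreenBuffer::ReleaseBack() { return mBack->Release(); }
```

In the GLScreenBuffer case the equivalent fix would presumably be to restore whichever include provides the full SharedSurfaceTextureClient definition, which is exactly the kind of dependency that ends up being pulled back in.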
[...]
I've got to the stage where the partial build seems to be compiling. But it required changes to the EmbedLite code, which I don't yet have a method of including in the partial build. But it's already late here, so I'm going to set the full build running overnight and see where that gets us.
Today has been a very productive day of development. If I can be similarly productive tomorrow, I'll feel like all of the work I've been putting in over the last week, despite the slow progress, will nevertheless have been worth it.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.