flypig.co.uk

Gecko

1 Jul 2024 : Day 275
After completing a partial build yesterday we were left with working WebGL when using the browser and a partially working WebView. My suspicion was that a regression has resulted in the same hanging that we experienced much earlier in the process, but I have yet to investigate that.

More to the point, I want to find out where SharedSurface_Basic objects get created (there may be multiple) during each of the different permutations of rendering involving the browser, WebView, non-WebGL-adorned pages and pages with WebGL content.

So that I can do this properly using the debugger I left a build running overnight. When I got up this morning it hadn't yet completed, but it was in the next best state: everything was already being packaged up into rpms. I've never seen it fail from this point onwards and it would usually take no more than a few minutes from here, so this was good to see.
 
A console showing the gecko build about to finish

And indeed it did complete very quickly. So the build was successful and I now have packages to test. But before I do test them, I'm going to run with the original working WebGL packages to find out where the SharedSurface_Basic objects, if any, get created. To test this I've placed a breakpoint on the SharedSurface_Basic::Create() method.

What I find is that there's no call to SharedSurface_Basic::Create() when using the browser generally, unless there's WebGL content on the page. If there is WebGL content it gets called like this:
Thread 8 "GeckoWorkerThre" hit Breakpoint 1, mozilla::gl::
    SharedSurface_Basic::Create (desc=...)
    at gfx/gl/SharedSurfaceGL.cpp:20
20          const SharedSurfaceDesc& desc) {
(gdb) bt
#0  mozilla::gl::SharedSurface_Basic::Create (desc=...)
    at gfx/gl/SharedSurfaceGL.cpp:20
#1  0x0000007ff28a8130 in mozilla::gl::SurfaceFactory_Basic::CreateSharedImpl (
    this=<optimized out>, desc=...)
    at gfx/gl/SharedSurfaceGL.h:41
#2  0x0000007ff28aaee8 in mozilla::gl::SurfaceFactory::CreateShared (size=..., 
    this=0x7fc98a1980)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefCounted.h:240
#3  mozilla::gl::SwapChain::Acquire (this=this@entry=0x7fc949ac48, size=...)
    at gfx/gl/GLScreenBuffer.cpp:56
#4  0x0000007ff369d340 in mozilla::WebGLContext::PresentInto (
    this=this@entry=0x7fc949a7b0, swapChain=...)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290
#5  0x0000007ff369d7ac in mozilla::WebGLContext::Present (
    this=this@entry=0x7fc949a7b0, xrFb=<optimized out>,
    consumerType=consumerType@entry=mozilla::layers::TextureType::Unknown, 
    webvr=webvr@entry=false)
    at dom/canvas/WebGLContext.cpp:936
#6  0x0000007ff36656a8 in mozilla::HostWebGLContext::Present (webvr=false, 
    t=mozilla::layers::TextureType::Unknown, xrFb=<optimized out>,
    this=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/
    mozilla/RefPtr.h:280
#7  mozilla::ClientWebGLContext::Run<void (mozilla::HostWebGLContext::*)(
    unsigned long, mozilla::layers::TextureType, bool) const, &(mozilla::
    HostWebGLContext::Present(unsigned long, mozilla::layers::TextureType, bool) const), 
    unsigned long, mozilla::layers::TextureType const&, bool const&> (
    this=<optimized out>, args#0=@0x7fdf2f26c0: 0, args#1=@0x7fdf2f26bf: 
    mozilla::layers::TextureType::Unknown, args#2=@0x7fdf2f26be: false)
    at dom/canvas/ClientWebGLContext.cpp:313
#8  0x0000007ff3665810 in mozilla::ClientWebGLContext::Present (
    this=this@entry=0x7fc9564c10, xrFb=xrFb@entry=0x0, type=<optimized out>,
    webvr=<optimized out>, webvr@entry=false)
    at dom/canvas/ClientWebGLContext.cpp:363
#9  0x0000007ff3691020 in mozilla::ClientWebGLContext::OnBeforePaintTransaction 
    (this=0x7fc9564c10)
    at dom/canvas/ClientWebGLContext.cpp:345
#10 0x0000007ff290058c in mozilla::layers::CanvasRenderer::
    FirePreTransactionCallback (this=this@entry=0x7fc9840c50)
    at gfx/layers/CanvasRenderer.cpp:75
#11 0x0000007ff29b1228 in mozilla::layers::ShareableCanvasRenderer::
    UpdateCompositableClient (this=0x7fc9840c50)
    at gfx/layers/ShareableCanvasRenderer.cpp:192
#12 0x0000007ff29f0dc4 in mozilla::layers::ClientCanvasLayer::RenderLayer (
    this=0x7fc989ee20)
    at gfx/layers/client/ClientCanvasLayer.cpp:25
#13 0x0000007ff29efee4 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#14 0x0000007ff2a00224 in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc9898670)
    at gfx/layers/Layers.h:1051
#15 0x0000007ff29efee4 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#16 0x0000007ff2a00224 in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc963f3f0)
    at gfx/layers/Layers.h:1051
#17 0x0000007ff29efee4 in mozilla::layers::ClientLayer::RenderLayerWithReadback 
    (this=<optimized out>, aReadback=<optimized out>)
    at gfx/layers/client/ClientLayerManager.h:365
#18 0x0000007ff2a00224 in mozilla::layers::ClientContainerLayer::RenderLayer (
    this=0x7fc984bb20)
    at gfx/layers/Layers.h:1051
#19 0x0000007ff2a06f08 in mozilla::layers::ClientLayerManager::
    EndTransactionInternal (this=this@entry=0x7fc8b18820, 
    aCallback=aCallback@entry=
    0x7ff46a379c <mozilla::FrameLayerBuilder::DrawPaintedLayer(mozilla::layers::
    PaintedLayer*, gfxContext*, mozilla::gfx::IntRegionTyped<mozilla::gfx::
    UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> 
    const&, mozilla::layers::DrawRegionClip, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, void*)>, 
    aCallbackData=aCallbackData@entry=0x7fdf2f3268)
    at gfx/layers/client/ClientLayerManager.cpp:341
#20 0x0000007ff2a11e08 in mozilla::layers::ClientLayerManager::EndTransaction (
    this=0x7fc8b18820, 
    aCallback=0x7ff46a379c <mozilla::FrameLayerBuilder::DrawPaintedLayer(
    mozilla::layers::PaintedLayer*, gfxContext*, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::
    IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::
    DrawRegionClip, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> 
    const&, void*)>, aCallbackData=0x7fdf2f3268, aFlags=mozilla::layers::
    LayerManager::END_DEFAULT)
    at gfx/layers/client/ClientLayerManager.cpp:397
#21 0x0000007ff46a0bbc in nsDisplayList::PaintRoot (
    this=this@entry=0x7fdf2f5078, aBuilder=aBuilder@entry=0x7fdf2f3268, 
    aCtx=aCtx@entry=0x0, 
    aFlags=aFlags@entry=13, aDisplayListBuildTime=...)
    at layout/painting/nsDisplayList.cpp:2622
#22 0x0000007ff442cf18 in nsLayoutUtils::PaintFrame (
    aRenderingContext=aRenderingContext@entry=0x0, 
    aFrame=aFrame@entry=0x7fc93460b0, aDirtyRegion=..., 
    aBackstop=aBackstop@entry=4294967295, 
    aBuilderMode=aBuilderMode@entry=nsDisplayListBuilderMode::Painting, 
    aFlags=aFlags@entry=(nsLayoutUtils::PaintFrameFlags::WidgetLayers | 
    nsLayoutUtils::PaintFrameFlags::ExistingTransaction | nsLayoutUtils::
    PaintFrameFlags::NoComposite)) at ${PROJECT}/obj-build-mer-qt-xr/dist/
    include/mozilla/MaybeStorageBase.h:80
#23 0x0000007ff43b760c in mozilla::PresShell::Paint (
    this=this@entry=0x7fc92c2810, aViewToPaint=aViewToPaint@entry=0x7fc92a0d20, 
    aDirtyRegion=..., 
    aFlags=aFlags@entry=mozilla::PaintFlags::PaintLayers)
    at layout/base/PresShell.cpp:6400
#24 0x0000007ff41ef4dc in nsViewManager::ProcessPendingUpdatesPaint (
    this=this@entry=0x7fc92a0cb0, aWidget=aWidget@entry=0x7fc9250910)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/gfx/RectAbsolute.h:43
[...]
#57 0x0000007fefbae89c in ?? () from /lib64/libc.so.6
(gdb) 
That's a rather long backtrace, but it captures what I need to know, which is that it's not getting called from the CompositorOGL code, but rather from the painting code. That's because it's being created specifically for the WebGL entity on the page.

It'll also be useful to find out whether CompositorOGL::CreateContext() gets called when rendering browser pages, because that will tell us why the code there isn't creating a SharedSurface_Basic object. So I've placed a breakpoint on the method and will step through to find out. This, again, is using the code installed from our original packages with working WebGL and on a page that includes WebGL content:
Thread 37 "Compositor" hit Breakpoint 3, mozilla::layers::
    CompositorOGL::CreateContext (this=this@entry=0x7ed81a1c80)
    at gfx/layers/opengl/CompositorOGL.cpp:227
227     already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() 
    {
(gdb) n
231       nsIWidget* widget = mWidget->RealWidget();
(gdb) n
232       void* widgetOpenGLContext =
(gdb) n
234       if (widgetOpenGLContext) {
(gdb) p widgetOpenGLContext
$1 = (void *) 0x7ed81a6fe0
(gdb) n
mozilla::layers::CompositorOGL::Initialize (this=0x7ed81a1c80, 
    out_failureReason=0x7f1651d510)
    at gfx/layers/opengl/CompositorOGL.cpp:395
395         mGLContext = CreateContext();
Stepping through the code, after reaching the condition on widgetOpenGLContext, we then find ourselves outside the CompositorOGL::CreateContext() method:
(gdb) bt 2
#0  mozilla::layers::CompositorOGL::Initialize (this=0x7ed81a1c80, 
    out_failureReason=0x7f1651d510)
    at gfx/layers/opengl/CompositorOGL.cpp:395
#1  0x0000007ff2a66d88 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7fc89ab730, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1493
(More stack frames follow...)
(gdb) 
That's because of the early return from near the start of the method, as we can see here:
already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() {
  RefPtr<GLContext> context;

  // Used by mock widget to create an offscreen context
  nsIWidget* widget = mWidget->RealWidget();
  void* widgetOpenGLContext =
      widget ? widget->GetNativeData(NS_NATIVE_OPENGL_CONTEXT) : nullptr;
  if (widgetOpenGLContext) {
    GLContext* alreadyRefed = reinterpret_cast<GLContext*>(widgetOpenGLContext);
    return already_AddRefed<GLContext>(alreadyRefed);
  }
[...]
So what we can infer from this is, in short, that widget->GetNativeData(NS_NATIVE_OPENGL_CONTEXT) is set in the case of WebGL rendering, but not when it's WebView rendering.
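To make the early return concrete, here's a minimal standalone sketch of the same control flow. It isn't the Gecko code: FakeGLContext, FakeWidget and CheckBothPaths() are hypothetical stand-ins invented for illustration, and the assumption that a WebView-like widget hands back no native context is taken from the inference above rather than from a debugger session.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins; none of these are the real Gecko types.
struct FakeGLContext {
  std::string origin;  // records which path produced the context
};

struct FakeWidget {
  // Models widget->GetNativeData(NS_NATIVE_OPENGL_CONTEXT): non-null when the
  // embedding application already owns a GL context.
  std::shared_ptr<FakeGLContext> nativeContext;
};

// Same shape as CompositorOGL::CreateContext(): reuse the widget's context if
// there is one, otherwise fall through and create a fresh one.
std::shared_ptr<FakeGLContext> CreateContext(const FakeWidget* widget) {
  std::shared_ptr<FakeGLContext> fromWidget =
      widget ? widget->nativeContext : nullptr;
  if (fromWidget) {
    return fromWidget;  // early return: nothing below this line runs
  }
  // Only reached when the widget supplies no context.
  return std::make_shared<FakeGLContext>(FakeGLContext{"created"});
}

// Small self-check covering both paths; call it from main() or a test.
void CheckBothPaths() {
  FakeWidget browserLike{
      std::make_shared<FakeGLContext>(FakeGLContext{"widget"})};
  FakeWidget webViewLike{nullptr};
  assert(CreateContext(&browserLike)->origin == "widget");
  assert(CreateContext(&webViewLike)->origin == "created");
}
```

In the browser-like case the function never reaches the code further down, which matches what the debugger showed when stepping straight back out to CompositorOGL::Initialize().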

It turns out we get the same flow when rendering a page without WebGL, so this isn't a consequence of the WebGL on the page, it's just because we're using the browser rather than a WebView, as we can see here:
Thread 39 "Compositor" hit Breakpoint 3, mozilla::layers::
    CompositorOGL::CreateContext (this=this@entry=0x7ed81a1c70)
    at gfx/layers/opengl/CompositorOGL.cpp:227
227     already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() 
    {
(gdb) bt 3
#0  mozilla::layers::CompositorOGL::CreateContext (this=this@entry=0x7ed81a1c70)
    at gfx/layers/opengl/CompositorOGL.cpp:227
#1  0x0000007ff2950ff8 in mozilla::layers::CompositorOGL::Initialize (
    this=0x7ed81a1c70, out_failureReason=0x7f3633e510)
    at gfx/layers/opengl/CompositorOGL.cpp:395
#2  0x0000007ff2a66d88 in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7fc89a9d80, aBackendHints=...)
    at gfx/layers/ipc/CompositorBridgeParent.cpp:1493
(More stack frames follow...)
(gdb) n
231       nsIWidget* widget = mWidget->RealWidget();
(gdb) n
232       void* widgetOpenGLContext =
(gdb) n
234       if (widgetOpenGLContext) {
(gdb) p widgetOpenGLContext
$2 = (void *) 0x7ed81a6fd0
(gdb) n
79      ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h: No such 
    file or directory.
(gdb) n
mozilla::layers::CompositorOGL::Initialize (this=0x7ed81a1c70, 
    out_failureReason=0x7f3633e510)
    at gfx/layers/opengl/CompositorOGL.cpp:395
395         mGLContext = CreateContext();
(gdb) 
Okay, great. There's one other important piece of information we need in order to unravel all this: the code we added to the SharedSurface_Basic class to get the WebView working (it's not working yet, but we now know we'll need it) is purely an extension of the existing SharedSurface_Basic code. It adds a new constructor and a new Create() method (as an overload with a different signature) without removing the original constructor or Create() method. Here are the old and new Create() methods as they appear side-by-side in the code:
  static UniquePtr<SharedSurface_Basic> Create(const SharedSurfaceDesc&);

  static UniquePtr<SharedSurface_Basic> Create(GLContext* gl,
                                               const gfx::IntSize& size);
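As a rough illustration of how two static Create() methods with the same name can coexist, here's a small self-contained sketch. The types Surface, FakeGL, Size and Desc, and the CheckOverloads() helper, are hypothetical simplifications and not the real Gecko declarations. The point is only that the compiler picks the method from the argument list at each call site.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical simplified types, for illustration only.
struct FakeGL {};
struct Size { int width; int height; };
struct Desc { FakeGL* gl; Size size; };

class Surface {
 public:
  // Original entry point: everything is taken from the descriptor.
  static std::unique_ptr<Surface> Create(const Desc& desc) {
    return std::unique_ptr<Surface>(new Surface("from-desc", desc.size));
  }

  // Added entry point: same name, different parameter list, so it sits
  // alongside the original rather than replacing it.
  static std::unique_ptr<Surface> Create(FakeGL* gl, const Size& size) {
    (void)gl;  // unused in this sketch
    return std::unique_ptr<Surface>(new Surface("from-gl", size));
  }

  const std::string& Path() const { return mPath; }

 private:
  Surface(std::string path, Size size)
      : mPath(std::move(path)), mSize(size) {}

  std::string mPath;  // which Create() built this object
  Size mSize;
};

// Small self-check; call it from main() or a test.
void CheckOverloads() {
  FakeGL gl;
  Desc desc{&gl, {640, 480}};
  assert(Surface::Create(desc)->Path() == "from-desc");
  assert(Surface::Create(desc.gl, desc.size)->Path() == "from-gl");
}
```

Because both methods remain available, which one runs is decided entirely by the caller.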
The only code that's not strictly an addition happens in the SurfaceFactory_Basic class where the Create() method that's used to create the SharedSurface_Basic object in the CreateSharedImpl() method is switched from the old one to the new one.
 class SurfaceFactory_Basic final : public SurfaceFactory {
  public:
   explicit SurfaceFactory_Basic(GLContext& gl);
   explicit SurfaceFactory_Basic(GLContext* gl,
                                 const layers::TextureFlags& flags);
 
   virtual UniquePtr<SharedSurface> CreateSharedImpl(
       const SharedSurfaceDesc& desc) override {
-    return SharedSurface_Basic::Create(desc);
+    return SharedSurface_Basic::Create(mDesc.gl, desc.size);
   }
 };
It would be helpful if I could confirm this, so I've built a version of the library that's exactly the same as the version with broken WebGL and I've switched it to use the original SharedSurface_Basic::Create() method. That's the only change. And when I run it, the WebGL is working. So that's confirmed, it's definitely the new code that's been added to the SharedSurface_Basic class that's causing the problem. It's great to have this finally pinned down, but what's even better is that it looks like the problem can be bypassed simply by choosing a different constructor.

What does this all mean? It means that we know all of the following:
 
  1. The code added to the SharedSurface_Basic needed for WebView rendering is what's causing the WebGL to fail in the browser.
  2. The change that actually causes WebGL rendering to fail is just one line: the switch of the Create() method in the SurfaceFactory_Basic::CreateSharedImpl() method.
  3. The calls to SurfaceFactory_Basic that are used to create the SharedSurface_Basic objects are in different places depending on whether they're used for the WebGL or the WebView. In the former case they happen in the painting loop. In the latter case they happen in CompositorOGL::CreateContext().


We can use all of this to our advantage. I've taken the SurfaceFactory_Basic class and created a new version, which I've called SurfaceFactory_GL. This one calls the new Create() overload, while SurfaceFactory_Basic goes back to calling the original. So we now have two versions, like this:
class SurfaceFactory_Basic final : public SurfaceFactory {
 public:
  explicit SurfaceFactory_Basic(GLContext& gl);
  explicit SurfaceFactory_Basic(GLContext* gl,
                                const layers::TextureFlags& flags);

  virtual UniquePtr<SharedSurface> CreateSharedImpl(
      const SharedSurfaceDesc& desc) override {
    return SharedSurface_Basic::Create(desc);
  }
};

class SurfaceFactory_GL final : public SurfaceFactory {
 public:
  explicit SurfaceFactory_GL(GLContext& gl);
  explicit SurfaceFactory_GL(GLContext* gl,
                             const layers::TextureFlags& flags);

  virtual UniquePtr<SharedSurface> CreateSharedImpl(
      const SharedSurfaceDesc& desc) override {
    return SharedSurface_Basic::Create(mDesc.gl, desc.size);
  }
};
Notice how they differ only in the parameters passed to the SharedSurface_Basic::Create() method. Together they'll allow us to create the SharedSurface_Basic objects needed for the two different pipelines. I may find I need something more nuanced in the future and, if so, I may end up having to create an entirely new type of SharedSurface. But if that does turn out to be the case then I now know it'll work out fine: it's all nice self-contained code.
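Here's a hedged sketch of how two such factories might be selected per pipeline. Everything in it is hypothetical: the Pipeline enum, the MakeFactory() helper, the CreatePath marker and the CheckSelection() helper are invented for illustration. The real code chooses its factory wherever each pipeline constructs one, which isn't shown in this entry.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration; not the real Gecko classes.
struct FakeGL {};
struct Size { int width; int height; };
struct Desc { FakeGL* gl; Size size; };

// Records which Create() flavour a factory would call.
enum class CreatePath { FromDesc, FromGLAndSize };

class Factory {
 public:
  virtual ~Factory() = default;
  virtual CreatePath CreateSharedImpl(const Desc& desc) = 0;
};

// Analogue of SurfaceFactory_Basic: keeps the original Create(desc) call.
class FactoryBasic final : public Factory {
 public:
  CreatePath CreateSharedImpl(const Desc&) override {
    return CreatePath::FromDesc;
  }
};

// Analogue of SurfaceFactory_GL: uses the added Create(gl, size) call.
class FactoryGL final : public Factory {
 public:
  CreatePath CreateSharedImpl(const Desc&) override {
    return CreatePath::FromGLAndSize;
  }
};

// Assumed split: WebGL canvases keep the original path, the WebView
// compositor takes the new one.
enum class Pipeline { WebGLCanvas, WebViewCompositor };

std::unique_ptr<Factory> MakeFactory(Pipeline pipeline) {
  if (pipeline == Pipeline::WebViewCompositor) {
    return std::unique_ptr<Factory>(new FactoryGL());
  }
  return std::unique_ptr<Factory>(new FactoryBasic());
}

// Small self-check; call it from main() or a test.
void CheckSelection() {
  Desc desc{nullptr, {64, 64}};
  assert(MakeFactory(Pipeline::WebGLCanvas)->CreateSharedImpl(desc) ==
         CreatePath::FromDesc);
  assert(MakeFactory(Pipeline::WebViewCompositor)->CreateSharedImpl(desc) ==
         CreatePath::FromGLAndSize);
}
```

Keeping the choice inside the factory means the code that requests a surface doesn't need to know which pipeline it's serving.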

Alright, I've made this change and I want to see it in action. The partial build shows good results, but I want to check things with the debugger again. So I've set off another full build. Let's find out where things are at when it's done.

Before I sign off for the day, let me say that I'm getting increasingly hopeful about this. It really feels like we've got all the information needed to fix the problem and that we now just need to carefully move all of the metaphorical girders into their correct positions.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
