List items

Items from the current list are shown below.


10 Oct 2023 : Day 55 #
Yesterday I started looking at the rendering pipeline — currently broken — to try to make headway on task #1020. I posted a message on the mozilla graphics matrix channel in the evening and although I did get a response, so far nothing that's going to move things forwards.

So this morning I'm working through the D75055 changeset again. My hunch is that this holds the key. I'm going to write up my thoughts as I work through it.

However, this entire task does highlight a problem with this diary format. Up until now I've mostly had to tackle small incremental and well-defined changes. Rarely did I hit anything that couldn't be solved in one day, two days at most. That works well for a daily diary.

But if this task is going to require "deep thought"/"head scratching" for multiple days there will be very little to show for it.

So you'll have to forgive me if these diary entries turn into inarticulate mind dumps for a bit. I'm certain that writing out my thoughts can help crystallise my ideas, but whether they make sense or not to anyone else... well, That's to be seen.

So, back to rendering.

There are a few key files in this process, which you can see in the change set.

Core to all this is the GLContext class defined in GLContext.h and GLContext.cpp. These files provide base functionality for the graphics context.

Then we have the GLContextEGL class defined in GLContextEGL.h which inherits from GLContext. Just to add confusion the implementation for this subclass is provided in GLContextProviderEGL.cpp (there is no GLContextEGL.cpp).

The other crucial files for Sailfish OS are GLScreenBuffer.h and GLScreenBuffer.cpp. As you might expect these define a class called GLScreenBuffer in ESR 78. It looks like this provided the surfaces to render on and also the double-buffering which allowed gecko to render off-screen, switch buffers to show the result on-screen, then rinse and repeat.

This is where the big change happened. If you check the diff for the file between ESR 78 and ESR 91, you'll notice that this GLScreenBuffer class was completely removed. In its place we find a class called SwapChain.

But while GLScreenBuffer did all of this exciting stuff with surfaces, in contrast SwapChain is really sparse. It's sparse enough that I can list the entire definition of the class here without breaking a sweat.
class SwapChain final {
  friend class SwapChainPresenter;

  UniquePtr mFactory;
  bool mPreserve = false;

  std::queue> mPool;
  std::shared_ptr mFrontBuffer;

      mPrevFrontBuffer;  // Hold this ref while it's in-flight.
  SwapChainPresenter* mPresenter = nullptr;

  virtual ~SwapChain();

  void ClearPool();
  const auto& FrontBuffer() const { return mFrontBuffer; }
  UniquePtr Acquire(const gfx::IntSize&);

  const gfx::IntSize& Size() const;
  const gfx::IntSize& OffscreenSize() const;
  bool Resize(const gfx::IntSize & size);
  bool PublishFrame(const gfx::IntSize& size) { return Swap(size); }
  void Morph(UniquePtr newFactory);

  // Returns false on error or inability to resize.
  bool Swap(const gfx::IntSize& size);
Compare that to the GLScreenBuffer monstrosity from ESR 78. So one of the main questions I think I need to answer is "how should I make use of the SwapChain class? Is it a replacement for GLScreenBuffer? If it is, do I need to patch a SwapChain instance into GLContext in the same way GLScreenBuffer was or should it be stored somewhere else?"

Okay, that's actually several questions. I'll have to post all of this up on Matrix.

Before I do that I want to write up my meeting with Raine. As I've mentioned previously, Raine was the lead on the Gecko project while I was there and is super-knowledgeable about how it works; far more knowledgeable than I am. So if anyone can help with this, he can.

We discussed the patches I've already applied and the ones I've not applied but which may be necessary for the rendering pipeline. We also checked a few things to see what was — and wasn't — happening at run time.

This was a really helpful conversation.

We started out by looking at CompositorOGL::CreateContext() and whether this is being called. This is one of the early stages of the rendering pipeline which sets up the widget and rendering context, so it's a useful sanity check. We ran the debugger during the call to check it and yes, it is being called. Here's the call stack.
Thread 33 "Compositor" hit Breakpoint 1, mozilla::layers::CompositorOGL::
    CreateContext (this=this@entry=0x7eb8002e90)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
227	already_AddRefed CompositorOGL::CreateContext() {
(gdb) bt
#0  mozilla::layers::CompositorOGL::CreateContext (this=this@entry=0x7eb8002e90)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
#1  0x0000007fba7c9020 in mozilla::layers::CompositorOGL::Initialize
    (this=0x7eb8002e90, out_failureReason=0x7f002fd580)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/opengl/
#2  0x0000007fba8dea5c in mozilla::layers::CompositorBridgeParent::
    NewCompositor (this=this@entry=0x7f8879e5b0, aBackendHints=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
#3  0x0000007fba8e9ad8 in mozilla::layers::CompositorBridgeParent::
    InitializeLayerManager (this=this@entry=0x7f8879e5b0, aBackendHints=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
#4  0x0000007fba8e9c08 in mozilla::layers::CompositorBridgeParent::
    AllocPLayerTransactionParent (this=0x7f8879e5b0, aBackendHints=..., aId=...)
    at /usr/src/debug/xulrunner-qt5-91.9.1-1.aarch64/gfx/layers/ipc/
#5  0x0000007fba27e828 in mozilla::layers::PCompositorBridgeParent::
    OnMessageReceived (this=0x7f8879e5b0, msg__=...) at
This all looks okay. Raine also noted that there's an important change made by Patch 0031 "Create EmbedLiteCompositorBridgeParent in CompositorManagerParent (part 2)". This switches out a call to gecko's CompositorBridgeParent constructor for a call to the EmbedLiteCompositorBridgeParent constructor instead. This is crucial because it essentially links the compositor and EmbedLite together. Without this there'll be no rendering.

As it happens I've not applied this patch. So this is the first thing to fix. It's only a small patch, so should be easy, but crucial.

We also looked at patch 0039 "Do not create CreateFallbackSurface". It's not clear how crucial this is right now. Some time ago Adam Pigg made some changes to the rendering pipeline to get it to work on native ports. We think the patch may have been part of these changes. Now while the changes were only needed for native ports (so not needed on the Xperia 10 II I'm testing on for example), there were also changes required to QtMozEmbed. It's possible that having the QtMozEmbed change without this patch will result in problems. Better to apply the patch.

Happily it's another self-contained patch, so should be easy to apply. For the same reason I should certainly apply patch 0038 "Fix mesa egl display and buffer initialisation" as well.

We checked that GetNativeData() is being called and that the values its being called with are correct:
$ MOZ_LOG="EmbedLiteTrace:5" sailfish-browser 2>&1 | grep GetNativeData
[Parent 16595: Compositor]: D/EmbedLiteTrace TRACE::virtual void*
  mozilla::embedlite::nsWindow::GetNativeData(uint32_t):181 t:74fc788480,
  DataType: 12
[Parent 16595: Unnamed thread 74fc0020b0]: D/EmbedLiteTrace TRACE::virtual void* mozilla::embedlite::EmbedLitePuppetWidget::GetNativeData(uint32_t):114 t: 74fc4528d0,
  DataType: 3
This is called as part of the CompositorOGL constructor, so the fact it's being called is a good sign. The DataType from the debug output is a reference to NS_NATIVE_OPENGL_CONTEXT defined in nsIWidget.h as the value 12, which is correct. So that's also promising.

When the browser screen comes up the Sailfish Browser code performs a dummy swap to render the screen. This is certainly happening because as Raine explained without this you don't even get the native QML user interface rendered, which it is. However, apart from the lack of the above patches, there are multiple other reasons why the render might not be running beyond that.

For example, we have various checks in place which intentionally pause or block rendering. These include SetIsActive and the SuspendRendering/ResumeRendering states, all of which need to be set appropriately (to one of true or false). Raine recommended that I go through and force all of these to the correct states.

When the browser runs it will usually toggle these states for performance reasons, for example if the browser goes into the background. But right now performance isn't our problem, rendering is, so better to make sure none of these are blocking rendering unintentionally.

Raine also noted that mUseExternalGLContext should be set to true in EmbedLiteCompositorBridgeParent. Previously this was set by a BoolVarCache value. But you may recall I stripped the code of all of these way back on Day 17. I set them all to hard coded values with the intention of switching them out for static prefs later. As part of this I had to choose what I thought were sensible fixed values to set them to.

Well, it turns out for this one I made the wrong choice! So I'll need to flip this to true. Small things can make a huge difference.

There was other useful advice too. The browser and the WebView use slightly different rendering techniques. In particular the WebView uses what's called off-screen rendering, which the browser doesn't make use of. Raine felt that it was quite possible the SwapChain changes I've made would affect the WebView but not the Browser. If so, getting the browser to render may be easier than I thought (although all of this will have to be fixed for the WebView as well, but that can come later).

Once I've made these changes, when running the browser I should check for the following debug output:
=============== Preparing offscreen rendering context ===============
That's an indicator that the WebView rendering approach is in use. I should therefore check that this isn't appearing for the browser.

And as a final thought, Raine noted that when initially testing the rendering, I should start with the about:blank page and move on to the about:license page since these are text-only, so will put the least strain possible on the system. Once they're working, then it makes sense to check with more complex sites.

So there you have it. Lots of really useful advice. I'm going to take a bit of time now to apply the changes and perform the checks Raine suggested. Rest assured I'll come back and tell you about the results!

First I'm applying patch 0031:
$ git am --3way ../rpm/0031-sailfishos-gecko-Create-EmbedLiteCompositorBridgePar.patch
Applying: Create EmbedLiteCompositorBridgeParent in CompositorManagerParent (part 2). JB#50505
Using index info to reconstruct a base tree...
M       gfx/layers/ipc/CompositorManagerParent.cpp
Falling back to patching base and 3-way merge...
Auto-merging gfx/layers/ipc/CompositorManagerParent.cpp
Patch 0038 doesn't go quite so smoothly:
$ git am --3way ../rpm/0038-sailfishos-embedlite-egl-Fix-mesa-egl-display-and-bu.patch
Applying: Fix mesa egl display and buffer initialisation
Using index info to reconstruct a base tree...
M       gfx/gl/GLContextProviderEGL.cpp
M       gfx/gl/GLContextProviderImpl.h
M       gfx/gl/GLLibraryEGL.cpp
M       gfx/gl/GLLibraryEGL.h
Falling back to patching base and 3-way merge...
Auto-merging gfx/gl/GLLibraryEGL.h
CONFLICT (content): Merge conflict in gfx/gl/GLLibraryEGL.h
Auto-merging gfx/gl/GLLibraryEGL.cpp
CONFLICT (content): Merge conflict in gfx/gl/GLLibraryEGL.cpp
Auto-merging gfx/gl/GLContextProviderImpl.h
Auto-merging gfx/gl/GLContextProviderEGL.cpp
CONFLICT (content): Merge conflict in gfx/gl/GLContextProviderEGL.cpp
error: Failed to merge in the changes.
Patch failed at 0001 Fix mesa egl display and buffer initialisation
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
So I had to apply Patch 38 "Fix mesa egl display and buffer initialisation" manually, but as it's such a small patch it was relatively straightforward to do. Patch 39 "Do not create CreateFallbackSurface" on the other hand wouldn't apply at all. Checking it manually it appears a huge amount of gecko code related to it has been rearranged or removed entirely.

Right now I feel absolutely exhausted. That's nothing to do with gecko and more to do with a long day at work followed by a long commute. So I don't have the energy to look into Patch 39 further tonight.

Having applied as many of these patches as practically possible I've set the build to run overnight again. Feels like old times.

If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.


Uncover Disqus comments