flypig.co.uk

List items

Items from the current list are shown below.

Gecko

7 Nov 2023 : Day 83 #
Over the last few of days I've been struggling with the EglLibrary initialisation routines. It's been a bit of an uphill climb, but I'm glad to say that things are working out better today.

In celebration of the fact, Thigg, who is now the sailfish-gecko artist in residence, very generously created the following wonderful image capturing the actual process of gecko learning to render.
 
A gecko holding a blue colouring pencil standing next to an easel drawing a pig with wings.

But we'll come on to that. First back to the usual rundown of my day's development.

As I mentioned I've been struggling with the EglLibrary initialisation routines. These were changed upstream to remove some crucial piece of functionality. Adding this back in wasn't too much of a problem, except my progress was hindered by the application failing to start at all for two days. I got to the bottom of that and now I'm back trying to figure out how to get EGL working correctly.

From stepping through the code I can see that this line is returning false:
  bool result = lib.operator->()->Init(false, &failureId, aDisplay);
There's only one part of this Init() method I've made edits to and that's to add in the aDisplay parameter and the call to CreateDisplay() that users it. So it seems reasonable to imagine that this is what's causing the problem. The new code looks like this:
  mDefaultDisplay = CreateDisplay(forceAccel, out_failureId, aDisplay);
  if (!mDefaultDisplay.lock()) {
    return false;
  }
Stepping through the code confirms my suspicion: the condition we see here is being entered, resulting in a return value of false. That's a failure state and so guaranteed to prevent anything else from working.

However the reason it's happening isn't what I was expecting. In theory the CreateDisplay() method should be pretty simple: because we're passing in an aDisplay parameter it's supposed to create an EglDisplay with the aDisplay stored inside it and just return that. It should be pretty straightforward, but I suspected I'd made some logic error in this call.

That turned out not to be the case. The debugger confirmed that CreateDisplay() was returning a value, rather than null.

So if the mDefaultDisplay value isn't null, why is the condition above being entered into?

The reason turned out to be a misunderstanding on my part about how the std::weak_ptr pointer wrapper works. Calling lock() on this is supposed to return a usable std::shared_ptr on which I had expected the null check ought to work.

I'm not exactly sure why the logic above doesn't work, but in practice !mDefaultDisplay.lock() equates to true. Changing the code to the following, however, works as expected:
  std::shared_ptr defaultDisplay = CreateDisplay(forceAccel, out_failureId, aDisplay);
  if (!defaultDisplay) {
    return false;
  }
  mDefaultDisplay = defaultDisplay;
It's a small change, but one that makes all the difference. Making this change allows the initialisation process to complete correctly. Or at least, it looks correct to me.

But there's still no rendering. Thankfully I can now step through and try to find out what's going on.

My efforts to step through the code are hampered somewhat by the fact that when using partial builds the debug source gets out-of-sync with the code, which means the debugger will tell you which line you're on, but not the source code on the line itself. It's not the end of the world as it's possible to refer back to the source in a text editor to check the line, but it's a bit of a cumbersome approach compared to just having the debugger tell you.

Nevertheless, stepping through this way and comparing against the same steps debugging ESR 78 shows that the EGLDisplay is now correct. I can even see that the GLContextEGL::SwapBuffers() method is being called to update the display.

But there's still nothing rendering to the screen.

I do notice one difference between the two versions though. The SwapBuffers() method is identical for them both. Here's what it looks like:
bool GLContextEGL::SwapBuffers() {
  EGLSurface surface =
      mSurfaceOverride != EGL_NO_SURFACE ? mSurfaceOverride : mSurface;
  if (surface) {
    if ((mEgl->IsExtensionSupported(
             EGLExtension::EXT_swap_buffers_with_damage) ||
         mEgl->IsExtensionSupported(
             EGLExtension::KHR_swap_buffers_with_damage))) {
      std::vector rects;
      for (auto iter = mDamageRegion.RectIter(); !iter.Done(); iter.Next()) {
        const IntRect& r = iter.Get();
        rects.push_back(r.X());
        rects.push_back(r.Y());
        rects.push_back(r.Width());
        rects.push_back(r.Height());
      }
      mDamageRegion.SetEmpty();
      return mEgl->fSwapBuffersWithDamage(surface, rects.data(),
                                          rects.size() / 4);
    }
    return mEgl->fSwapBuffers(surface);
  } else {
    return false;
  }
}
Even though stepping through is a bit of a pain, I can see that on ESR 78 it's the mEgl->fSwapBuffers() branch that's being executed, whereas on ESR 91 it's the mEgl->fSwapBuffersWithDamage() branch that's used.

The reason for this is that the mEgl->mAvailableExtensions flags are set differently. Crucially, on ESR 78 the EXT_swap_buffers_with_damage flag (that's bit 19 of the bitfield) is cleared:
Thread 38 "Compositor" hit Breakpoint 1, mozilla::gl::GLContextEGL::SwapBuffers
    (this=0x7ed41a27b0)
    at gfx/gl/GLContextProviderEGL.cpp:643
643	bool GLContextEGL::SwapBuffers() {
(gdb) p mEgl.mRawPtr->mAvailableExtensions
$4 = std::bitset = {[0] = 1, [2] = 1, [3] = 1, [5] = 1, [6] = 1, [7] = 1,
                    [13] = 1, [21] = 1, [22] = 1}
(gdb) p (int)mEgl.mRawPtr->mAvailableExtensions & (1 << (19 - 1))
$5 = 0
(gdb) 
Whereas on ESR 91 it's set:
Thread 34 "Compositor" hit Breakpoint 1, mozilla::gl::GLContextEGL::SwapBuffers
    (this=0x7eb8115b30)
    at gfx/gl/GLContextProviderEGL.cpp:538
538	bool GLContextEGL::SwapBuffers() {
(gdb) p mEgl->mAvailableExtensions
$1 = std::bitset = {[0] = 1, [2] = 1, [4] = 1, [5] = 1, [6] = 1, [7] = 1,
                    [8] = 1, [11] = 1, [16] = 1, [17] = 1, [19] = 1, [20] = 1,
                    [22] = 1}
(gdb) p (int)mEgl.get()->mAvailableExtensions & (1 << ((int)EGLExtension::EXT_swap_buffers_with_damage - 1))
$2 = 262144
That may sound like a small difference, but while I've updated the browser code, one thing I've not done is update the underlying EGL implementation to offer different capabilities. So these flags ought to be the same.

Looking through the code I eventually realise that the flag has been forcefully cleared in the ESR 78 version. Which I track down to this patch (which I probably should have remembered):
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Raine Makelainen 
Date: Tue, 28 Sep 2021 09:42:30 +0300
Subject: [PATCH] [sailfishos][egl] Drop swap_buffers_with_damage extension
 support. Fixes JB#55571 OMP#JOLLA-396

At least on "Adreno (TM) 508" swapping with damage triggered a hang on the
compositor thread i.e. call never returns. Conceptually I think we should
bind eglSwapBuffersWithDamage on hybris side to the eglSwapBuffers.

Co-authored-by: David Llewellyn-Jones 
Signed-off-by: Raine Makelainen 
---
 gfx/gl/GLLibraryEGL.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gfx/gl/GLLibraryEGL.cpp b/gfx/gl/GLLibraryEGL.cpp
index 13b69f2427c1..a07b8d983ab7 100644
--- a/gfx/gl/GLLibraryEGL.cpp
+++ b/gfx/gl/GLLibraryEGL.cpp
@@ -713,7 +713,7 @@ bool GLLibraryEGL::DoEnsureInitialized(bool forceAccel,
         {(PRFuncPtr*)&mSymbols.fSwapBuffersWithDamage,
          {{"eglSwapBuffersWithDamageEXT"}}},
         END_OF_SYMBOLS};
-    if (!fnLoadSymbols(symbols)) {
+    if (!fnLoadSymbols(symbols) || true) {
       NS_ERROR(
           "EGL supports EXT_swap_buffers_with_damage without exposing its "
           "functions!");
@@ -726,7 +726,7 @@ bool GLLibraryEGL::DoEnsureInitialized(bool forceAccel,
         {(PRFuncPtr*)&mSymbols.fSwapBuffersWithDamage,
          {{"eglSwapBuffersWithDamageKHR"}}},
         END_OF_SYMBOLS};
-    if (!fnLoadSymbols(symbols)) {
+    if (!fnLoadSymbols(symbols) || true) {
       NS_ERROR(
           "EGL supports KHR_swap_buffers_with_damage without exposing its "
           "functions!");
-- 
2.31.1
The way the flags are set has completely changed in ESR 91 so I can't reapply the patch. But I can find some other way to mask out these flags. This seems like a good idea given they clearly caused problems for ESR 78. I achieve this by adding in a couple of calls like this to the Init() method:
  MarkExtensionUnsupported(EGLExtension::EXT_swap_buffers_with_damage);
  MarkExtensionUnsupported(EGLExtension::KHR_swap_buffers_with_damage);
Stepping through without the source has become quite onerous, so I've decided to do a full rebuild this time, rather than the partial rebuilds I've been doing until now. This will put the source and symbols back in sync, but it will also take a while. I'll test it as soon as it's built. [...] It's later and I finally have my new build. There are a lot of symbols in the library and it takes the debugger a minute or so to load them, so as is usual I do my first test without the debugger. And to my astonishment... it renders!  
Screenshots of the very first renders coming from ESR 91 on Sailfish OS

Honestly, I was expecting this change to be the one that made the difference. I was also expecting there to be render or layout artefacts, but as you can see from the screenshots it's not looking bad either.
 
Screenshots of the very first renders coming from ESR 91 on Sailfish OS

The external browser checker picks it up as an ESR 91 version. As you can see from the screenshot it passes slightly fewer tests than the ESR 78 version, but that's to be expected because I had to disable some capabilities (e.g. WebRTC). More significantly, touch isn't working, so there's currently no way to interact with the site. That's also not too surprising and will be much easier to look into that now that it's rendering. I'm also pretty surprised to see that even WebGL is working fine on my website.
 
Side-by-side screenshot comparisons of ESR 78 and ESR 91. The ESR 78 version scores 451 in the HTML 5 Test, whereas ESR 91 scores 440.

I'm pretty satisfied with this to be honest. It's certainly not a usable replacement for ESR 78 yet, but once touch is working it will be getting there.

This also seems like a good place to pause.

You'll recall that on Day 77 I mentioned I'd be taking a pause from gecko development while I participate in a Turing Way Book Dash. With the rendering now working, it seems like the right time for this break. I'm anticipating a gap of a couple of weeks, but if I do get any time to work on gecko in the meantime, I'll be certain to post about it. During the pause, feel free to catch up on all the previous entries on my Gecko-dev Diary page.

Comments

Uncover Disqus comments