flypig.co.uk

Gecko-dev Diary

Starting in August 2023 I'll be upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91. This page catalogues my progress.

Latest code changes are in the gecko-dev sailfishos-esr91 branch.

There is an index of all posts in case you want to jump to a particular day.

Gecko RSS feed Click the icon for the Gecko-dev Diary RSS feed.

Gecko

5 most recent items

31 May 2024 : Day 249 #
Overnight I've been running a build. For the last few days — including last night — I've been on holiday lodging in a hotel, which has been lovely, but sadly today is the last day which means I'll be spending today travelling. It would have been great to have the build complete overnight so that I can start my edit-rebuild-test cycle while I travel. But the build failed, which means I have some more fixing to do first.

Here's the build error.
98:13.71 In file included from Unified_cpp_gfx_gl1.cpp:2:
98:13.71 ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceEGL.cpp: In member function 
    ‘virtual bool mozilla::gl::SharedSurface_EGLImage::ReadPixels(GLint, GLint, 
    GLsizei, GLsizei, GLenum, GLenum, GLvoid*)’:
98:13.71 ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceEGL.cpp:186:38: error: ‘void 
    mozilla::gl::GLContext::raw_fDeleteFramebuffers(GLsizei, const GLuint*)’ is 
    private within this context
98:13.71    gl->raw_fDeleteFramebuffers(1, &fbo);
98:13.71                                       ^
The problem here is that I moved raw_fDeleteFramebuffers() into the private block of the class definition it's part of. It ought to be in the private block, but there's obviously an attempt being made in SharedSurfaceEGL.cpp to call it from another class.

My suspicion is that this is part of the debug code I added. It looks like this is being used by a ReadPixels() method I added for the purposes of extracting pixel colours:
bool SharedSurface_EGLImage::ReadPixels(GLint x, GLint y, GLsizei width, 
    GLsizei height,
                        GLenum format, GLenum type, GLvoid* pixels) {
  const auto& gl = GLContextEGL::Cast(mDesc.gl);

  // See https://stackoverflow.com/a/53993894
  GLuint fbo;
  gl->fGenFramebuffers(1, &fbo);
  gl->raw_fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, fbo);
  gl->fFramebufferTexture2D(LOCAL_GL_FRAMEBUFFER, LOCAL_GL_COLOR_ATTACHMENT0, 
    LOCAL_GL_TEXTURE_2D, mProdTex, 0);

  gl->raw_fReadPixels(x, y, width, height, format, type, pixels);

  gl->raw_fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, 0);
  gl->raw_fDeleteFramebuffers(1, &fbo);

  return true;
}
There's a little nuance here though. This SharedSurface_EGLImage::ReadPixels() method is called from the GLScreenBuffer::ReadPixels() method, like so:
bool GLScreenBuffer::ReadPixels(GLint x, GLint y, GLsizei width, GLsizei height,
                                GLenum format, GLenum type, GLvoid* pixels) {
  // If the currently bound framebuffer is backed by a SharedSurface
  // then it might want to override how we read pixel data from it.
  // This is normally only the default framebuffer, but we can also
  // have SharedSurfaces bound to other framebuffers when doing
  // readback for BasicLayers.
  SharedSurface* surf;
  if (GetReadFB() == 0) {
    surf = SharedSurf();
  } else {
    surf = mGL->mFBOMapping[GetReadFB()];
  }
  if (surf) {
    return surf->ReadPixels(x, y, width, height, format, type, pixels);
  }

  return false;
}
This method exists in the GLScreenBuffer class of ESR 78 as well, so it's initially surprising that this is happy to call surf->ReadPixels() given SharedSurface_EGLImage::ReadPixels() doesn't exist.

Looking in to this more deeply, the reason this doesn't cause a problem in ESR 78 appears to be because the SharedSurface_EGLImage version of the method is an override. So if I were to remove it, the inherited version of the method would be called instead:
  virtual bool ReadPixels(GLint x, GLint y, GLsizei width, GLsizei height,
                          GLenum format, GLenum type, GLvoid* pixels) {
    return false;
  }
Making this change would allow things to compile, but it's not going to be adding a great deal of value. So I could just get rid of the SharedSurface_EGLImage version of the call and rely on this default implementation. But I can't at the moment see where the GLScreenBuffer version of the call is used so it seems to me it would be better to remove the GLScreenBuffer variant of the method as well.

Which is what I've done. If this method is being used it'll cause an error during compilation, which will slow things down, but will at least make clear where the caller lives. And if it goes through then we've managed to find a nice simplification, which is after all the purpose of this effort.

I've set the build off, but while it's running it might be a good idea to check in case something that consumes libxul.so is expecting access to this ReadPixels() method. One of the reasons I'm wondering is that there are certainly situations in which the texture buffer is read, such as when the images used for the tab previews and bookmark icons are generated. So it's possible it's a method we need to keep.

In QtMozEmbed there is something similar, but it calls GLES directly, rather than via libxul.so:
QImage gl_read_framebuffer(const QRect &rect)
{
    QSize size = rect.size();
    int x = rect.x();
    int y = rect.y();

    while (glGetError());

    QImage img(size, QImage::Format_RGB32);
    GLint fmt = GL_BGRA_EXT;
    glReadPixels(x, y, size.width(), size.height(), fmt, GL_UNSIGNED_BYTE, 
    img.bits());
    if (!glGetError())
        return img.mirrored();

    QImage rgbaImage(size, QImage::Format_RGBX8888);
    glReadPixels(x, y, size.width(), size.height(), GL_RGBA, GL_UNSIGNED_BYTE, 
    rgbaImage.bits());
    if (!glGetError())
        return rgbaImage.mirrored();
    return QImage();
}
There's also something similar in the sailfish-browser code. However, looking carefully at this shows that it's debug code I added myself. It's not needed for the final app and in fact, it's about time I removed it.
$ git diff
diff --git a/apps/core/declarativewebcontainer.cpp b/apps/core/
    declarativewebcontainer.cpp
index 60d5327c..1e2d63ce 100644
--- a/apps/core/declarativewebcontainer.cpp
+++ b/apps/core/declarativewebcontainer.cpp
@@ -694,9 +694,28 @@ void DeclarativeWebContainer::clearWindowSurface()
     QOpenGLFunctions_ES2* funcs = 
    m_context->versionFunctions<QOpenGLFunctions_ES2>();
     Q_ASSERT(funcs);
 
-    funcs->glClearColor(1.0, 1.0, 1.0, 0.0);
+    funcs->glClearColor(0.0, 1.0, 0.0, 0.0);
     funcs->glClear(GL_COLOR_BUFFER_BIT);
     m_context->swapBuffers(this);
+
+    QSize screenSize = QGuiApplication::primaryScreen()->size();
+    size_t bufferSize = screenSize.width() * screenSize.height() * 4;
+    uint8_t* buf = static_cast<uint8_t*>(calloc(sizeof(uint8_t), bufferSize));
+
+    funcs->glReadPixels(0, 0, screenSize.width(), screenSize.height(),
+                            GL_RGBA, GL_UNSIGNED_BYTE, buf);
+
+    int xpos = screenSize.width() / 2;
+    int ypos = screenSize.height() / 2;
+    int pos = xpos * ypos * 4;
+
+    volatile char red = buf[pos];
+    volatile char green = buf[pos + 1];
+    volatile char blue = buf[pos + 2];
+    volatile char alpha = buf[pos + 3];
+
+    printf(&quot;Colour: (%d, %d, %d, %d)\n&quot;, red, green, blue, alpha);
+    free(buf);
 }
 
 void DeclarativeWebContainer::dumpPages() const
Thankfully these debug changes are all neatly encapsulated and caught by git, so removing them is as simple as performing a git checkout.
$ git checkout apps/core/declarativewebcontainer.cpp
Updated 1 path from the index
$ git diff
$ 
I don't dare run a sailfish-browser build while the gecko build is already running (maybe it's safe... but I'm not certain). So I've lodged the fact I need to in my mental task queue. There were also some changes to embedlite-components and sailfish-components-webview which I've removed for the former and will need to commit for the latter. They're not related to ReadPixels() but I'm making another mental note here to rebuild and reinstall them later once the gecko build has completed.

Well, the build is continuing so that's it for today. More tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
30 May 2024 : Day 248 #
I'm tidying things up. If you've been following previous posts you'll know there was some visual celebration provided by artist-in-residence thigg when the rendering started working, but now things need to be put in order and pushed to the repository. Given this I need to get the code into a fit state, which means removing debug code, refactoring the changes, making sure everything still works correctly.

In other words, it's time to clean things up. And thigg has created a rather apt image to illustrate this process as well. It's not all glamorous you know!  
Tidying up after the party! A pig with wings in a room, detritus strewn everywhere including phones, cake half drunk glasses of unknown liquid and wires. A fox lies on the floor asleep.

My first clean-up task is a bit of an odd one. In some of the debug output I've pasted on previous days you'll have seen lines that look a bit like this:
frame014.data: Colour before: (8, 0, 152, 255), 1
frame015.data: Colour before: (1, 0, 255, 255), 1
frame016.data: Colour before: (223, 5, 0, 255), 1
frame017.data: Colour before: (71, 94, 162, 255), 1
You may recall that I added these in to test the colour at specific pixels of the texture used for offscreen rendering. I also introduced some code to export to disc images from the surface as textures. These produced some rubbish looking images which neither I nor the community ever successfully figured out (I now suspect they were just memory artefacts).

For example, I added a bunch of code into the GLScreenBuffer::Swap() method that looks like this:
    result = ReadPixels(0, 0, size.width, size.height, LOCAL_GL_RGB, 
    LOCAL_GL_UNSIGNED_BYTE, buf);
Right now this and all the related code can still be found in the source tree, not just in the ESR 91 code, but in the ESR 78 code as well. None of it is needed for a working browser and it's outlived its usefulness, so I'll need to remove it before pushing the latest changes to remote. All fine. But can I recall which specific parts are needed and which can be removed? No, no I can't.

I have a solution though. If I compare the changes with the original ESR 78 code it should become clear which bits I've added for debugging purposes and which are needed for offscreen rendering. But that means I first have to remove the debug code from ESR 78 so I have something to compare against.

Removing the code from ESR 78 is possible but it's also harder than it should be because it's now mixed in with all of the changes made by the patches. I didn't apply these to the ESR 78 as commits, but rather let the build processes apply them automatically, so they've been applied as a bunch of local changes, all mixed together.

To get things back to where I need them to be I need to reset the ESR 78 project folder to a clean state, then apply all the patches again. This will mess up the build state (meaning that in order to a build I'll have to build everything rather than just what's changed), but that's okay because in general I don't need to run ESR 78 builds very often. Let's hope it stays that way.

There are a few changes I made to the spec file that I want to keep, but everything else can be cleaned out.
$ cd gecko-dev-project/gecko-dev/gecko-dev
$ git reset --hard
$ git clean -xdf
$ cd ..
$ git stash
$ sfdk apply
$ git stash pop
The penultimate
apply
step applies the patches. After this I now have a nice clean ESR 78 source tree to compare my ESR 91 code against.

This has allowed me to remove a bunch of debug code from the GLContext.h file of ESR 91.

Now I'm wondering whether I can fold the GLScreenBuffer class into SwapChain. There's definitely some overlap, but also quite a bit of divergence too. Below is an annotated listing of the SwapChain header. Lines prefixed with a "+" symbol also appear in GLScreenBuffer. Those prefixed with a "-" appear only in SwapChain.
class SwapChain final {
  friend class SwapChainPresenter;

 public:
+  UniquePtr<SurfaceFactory> mFactory;
-  bool mPreserve = false;
-  std::shared_ptr<SharedSurface> mPrevFrontBuffer;

 private:
-  std::queue<std::shared_ptr<SharedSurface>> mPool;
-  std::shared_ptr<SharedSurface> mFrontBuffer;
-  SwapChainPresenter* mPresenter = nullptr;

 public:
+  SwapChain();
+  virtual ~SwapChain();

-  void ClearPool();
-  const auto& FrontBuffer() const { return mFrontBuffer; }
-  UniquePtr<SwapChainPresenter> Acquire(const gfx::IntSize&);

+  const gfx::IntSize& Size() const;
-  const gfx::IntSize& OffscreenSize() const;
+  bool Resize(const gfx::IntSize & size);
+  bool PublishFrame(const gfx::IntSize& size) { return Swap(size); }
+  void Morph(UniquePtr<SurfaceFactory> newFactory);

private:
+  bool Swap(const gfx::IntSize& size);
};
As well as these there are also a large number of methods in GLScreenBuffer which don't appear in SwapChain. These are less critical though, because as additions there won't be any attempt to make use of them in the existing code.

If I were to merge GLScreenBuffer into SwapChain I'd need to create an implementations for all of the lines indicated with a "-" in the class definition above. They can be dummy implementations (or make use of the existing implementations from the SwapChain class), so this process doesn't have to be complicated, but it would need to be done.

The fact that SwapChain is marked as final means that I'm not able to inherit from it. I could remove the final specifier of course, but that may not be a great idea. Some other parts of the code might be making assumptions based on it.

I think I'm going to have a go at performing this merge, but as it happens I'm not able to make these changes straight away anyway. That's because I need to run a full rebuild. I tried running a partial build and its come back complaining that the build config has changed:
$ sfdk engine exec
$ sb2 -t SailfishOS-esr78-aarch64.default
$ source `pwd`/obj-build-mer-qt-xr/rpm-shared.env
$ make -j1 -C obj-build-mer-qt-xr/gfx/gl/
make: Entering directory 'obj-build-mer-qt-xr/gfx/gl'
config/rules.mk:338: *** Build configuration changed. Build with |mach build| 
    or run |mach build-backend| to regenerate build config.  Stop.
make: Leaving directory 'obj-build-mer-qt-xr/gfx/gl'
For the next few days I'm planning to make small changes to the code, followed by rebuilds and tests to check things are working (a tight edit-rebuild-test cycle). To make the most of this I need the partial build working, which means running a full build now.

Well, by running the full build and watching it fail I have at least determined that the following changes are necessary:
diff --git a/gfx/gl/GLContext.h b/gfx/gl/GLContext.h
index 863414aaeb44..0991afbf045c 100644
--- a/gfx/gl/GLContext.h
+++ b/gfx/gl/GLContext.h
@@ -37,3 +38,2 @@
 #include &quot;GLDefs.h&quot;
-#include &quot;GLScreenBuffer.h&quot;
 #include &quot;GLTypes.h&quot;
@@ -42,2 +42,3 @@
 #include &quot;mozilla/WeakPtr.h&quot;
+#include &quot;SurfaceTypes.h&quot;
 
@@ -61,2 +62,5 @@ class GLReadTexImageHelper;
 struct SymLoadStruct;
+class SwapChain;
+class SurfaceCaps;
+class GLScreenBuffer;
 }  // namespace gl
Having restored them, the build is off again. Now it's back to the full build again; until this is done, there's not much point making any more changes to the code today.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
29 May 2024 : Day 247 #
It's a bright new morning and I'm in a good mood! I've been working on the offscreen rendering pipeline since Day 159. That's a total of 87 days. It took me 28 days to get printing working; 33 days to get the browser rending working and 45 days to get the initial build to complete. So 87 days is the longest stretch of singly-directed work so far. It amounts to approximately three and a half weeks of full-time work.

Technically it's not all there yet and my task for the next few days is clear: I need to get my commits in order. I've made a large number of changes to get this working and my plan is to split them into two commits, which will eventually become two patches on top of the upstream Gecko code.

The first commit will be an equivalent to patch 0044, reverting the upstream D5005 changeset (Bug 1445570), but in a simplified form to make future maintenance easier.

The second wil lbe a patch to reintroduced GLScreenBuffer and essentially reverse the changes made in upstream changeset D75055 (Bug 1632249).

The second patch is going to be the hardest to get into shape because I had to make a lot of changes which I supsect can be simplified quite considerably, but I'm not certain. Calling the changes a reversel of D75055 is unfortunately oversimplifying things somehwat. It has the feel of a quite carefully balance stack of Jenga blocks, where removing a single piece might result in the entire stack coming tumblng down (which, for the avoidance of doubt, is a metaphor for the rendering not working again).

Unlike Jenga I have the benefit of git which will restack the blocks to a working position in an instant if things go awry. Nevertheless I want to get the lightest and neatest structure I can: the fewer changes in the patch now, the less maintenance will be needed in future.

That's all to come. Today I'm just going to worry about the first, much simpler, patch.

Having simplified it as much as possible, what are we left with? Well, here's the summary of the changes as provided by git:
 dom/base/nsDOMWindowUtils.cpp    |   1 -
 layout/base/nsDocumentViewer.cpp |   8 ++
 layout/base/nsPresContext.cpp    | 142 ++++++++++++++++++-------------
 layout/base/nsPresContext.h      |  33 +++++++
 layout/base/nsRefreshDriver.cpp  |   6 --
 5 files changed, 125 insertions(+), 65 deletions(-)
Let's compare that to the chnages from patch 0044:
 dom/base/nsDOMWindowUtils.cpp                 |   1 -
 layout/base/nsDocumentViewer.cpp              |   8 ++
 layout/base/nsPresContext.cpp                 | 132 ++++++++++--------
 layout/base/nsPresContext.h                   |  32 ++++-
 layout/base/nsRefreshDriver.cpp               |  43 ++----
 layout/base/nsRefreshDriver.h                 |   7 +-
 .../test/test_restyles_in_smil_animation.html |   4 +-
 7 files changed, 129 insertions(+), 98 deletions(-)
Well, it's a simplification. Not a huge simplification (190 changes vs. 227 changes), but a simplification nonetheless. More importantly, it's done.

With the easy patch dealt with it's now time to turn to the complex patch. Here's the current local commit history.
$ git log --oneline -5
555957f3a8ee (HEAD -> FIREFOX_ESR_91_9_X_RELBRANCH_patches) Restore 
    NotifyDidPaint event and timers
bc575f7ac075 Prevent errors from DownloadPrompter
06b15998d179 Enable dconf
95276b73d273 Restore GLScreenBuffer and TextureImageEGL
c6ea49286566 (origin/FIREFOX_ESR_91_9_X_RELBRANCH_patches) Disable SessionStore 
    functionality
I want to rearrange these so that commit 95276b73d273 is the most recent. That particular commit is the really big change made to get the renderer back working, as we can see from the change stats:
 dom/base/nsGlobalWindowOuter.cpp              |   6 +
 gfx/gl/AndroidSurfaceTexture.cpp              |   2 +-
 gfx/gl/GLContext.cpp                          | 217 +++++-
 gfx/gl/GLContext.h                            |  93 ++-
 gfx/gl/GLContextEGL.h                         |  10 +-
 gfx/gl/GLContextProviderEGL.cpp               | 284 +++++++-
 gfx/gl/GLContextProviderImpl.h                |  26 +
 gfx/gl/GLContextTypes.h                       |  11 +
 gfx/gl/GLLibraryEGL.cpp                       |  13 +-
 gfx/gl/GLLibraryEGL.h                         |   8 +-
 gfx/gl/GLScreenBuffer.cpp                     | 646 ++++++++++++++++++
 gfx/gl/GLScreenBuffer.h                       | 172 ++++-
 gfx/gl/GLTextureImage.cpp                     |  38 +-
 gfx/gl/SharedSurface.cpp                      | 321 ++++++++-
 gfx/gl/SharedSurface.h                        | 139 +++-
 gfx/gl/SharedSurfaceEGL.cpp                   | 113 ++-
 gfx/gl/SharedSurfaceEGL.h                     |  44 +-
 gfx/gl/SharedSurfaceGL.cpp                    | 152 ++++-
 gfx/gl/SharedSurfaceGL.h                      |  98 ++-
 gfx/gl/SurfaceTypes.h                         |  60 ++
 gfx/gl/TextureImageEGL.cpp                    | 221 ++++++
 gfx/gl/TextureImageEGL.h                      |  80 +++
 gfx/gl/moz.build                              |   1 +
 gfx/layers/CompositorTypes.h                  |   1 +
 gfx/layers/client/TextureClient.cpp           |  42 +-
 gfx/layers/client/TextureClient.h             |   3 +
 .../client/TextureClientSharedSurface.cpp     |  50 ++
 .../client/TextureClientSharedSurface.h       |  25 +
 gfx/layers/ipc/CompositorVsyncScheduler.cpp   |   2 -
 gfx/layers/opengl/CompositorOGL.cpp           |  35 +-
 gfx/thebes/gfxPlatform.cpp                    |   2 +-
 31 files changed, 2776 insertions(+), 139 deletions(-)
I'm anticipating that a lot of this will remain necessary, but if it's possible to cut it down, anything that can be removed will be a bonus. First though to rearrange the patches using an interactive rebase. Thankfully none of the commits overlap with these changes, so the rebase completes without a hitch. Here's how things look now.
$ git log --oneline -5
c3c263a98d03 (HEAD -> FIREFOX_ESR_91_9_X_RELBRANCH_patches) Restore 
    GLScreenBuffer and TextureImageEGL
d3ba4df29a32 Restore NotifyDidPaint event and timers
f55057391ac0 Prevent errors from DownloadPrompter
eab04b8c0d80 Enable dconf
c6ea49286566 (origin/FIREFOX_ESR_91_9_X_RELBRANCH_patches) Disable SessionStore 
    functionality
After carefully looking through the code I'm confident it could benefit from some changes. For example I'm fairly sure I added a bunch of code just for debugging purpose, such as when I was extracting images from the textures to help with debugging at various points. I should remove all that code.

Plus I suspect a bunch of the other methods were also added simply to align with the previous ESR 78 implementation. That doesn't mean they're actually needed and, in fact, some methods may not even get used at all. It's a hazard of the incremental approach I took.

The following lists the new methods that I added over the last few months. My plan is to add breakpoints to them, run the code and see which ones are hit. Any that aren't hit will then be candidates for removal.
GLContext::GuaranteeResolve()
GLContext::OffscreenSize()
GLContext::CreateScreenBuffer()
GLContext::CreateScreenBufferImpl()
GLContext::fBindFramebuffer()
GLContext::InitOffscreen()
CreateTexture()
CreateTextureForOffscreen()
GLContext::fBindFramebuffer()
GLContext::raw_fBindFramebuffer()
CreateTextureForOffscreen()
CreateTexture()
DefaultEglLibrary()
GLContextProviderEGL::CreateOffscreen()
GLScreenBuffer::Create()
GLScreenBuffer::CreateFactory()
GLScreenBuffer::CreateFactory()
GLScreenBuffer::GLScreenBuffer()
GLScreenBuffer::~GLScreenBuffer()
GLScreenBuffer::BindFB()
GLScreenBuffer::BindDrawFB()
GLScreenBuffer::BindReadFB()
GLScreenBuffer::BindReadFB_Internal()
GLScreenBuffer::GetDrawFB()
GLScreenBuffer::GetReadFB()
GLScreenBuffer::GetFB()
GLScreenBuffer::CopyTexImage2D()
GLScreenBuffer::ReadPixels()
GLScreenBuffer::Morph()
GLScreenBuffer::Attach()
GLScreenBuffer::Swap()
GLScreenBuffer::Resize()
GLScreenBuffer::CreateRead()
CreateRenderbuffersForOffscreen()
ReadBuffer::Create()
ReadBuffer::~ReadBuffer()
ReadBuffer::Attach()
ReadBuffer::Size()
SharedSurface::ProdCopy()
SharedSurface::SharedSurface()
SharedSurface::GetTextureFlags()
ChooseBufferBits()
SurfaceFactory::SurfaceFactory()
SurfaceFactory::~SurfaceFactory()
SurfaceFactory::NewTexClient()
SurfaceFactory::StartRecycling()
SurfaceFactory::StopRecycling()
SurfaceFactory::RecycleCallback()
SurfaceFactory::Recycle()
SharedSurface_EGLImage::SharedSurface_EGLImage()
SharedSurface_EGLImage::ReadPixels()
SurfaceFactory_Basic::SurfaceFactory_Basic()
SharedSurface_Basic::Wrap()
SharedSurface_Basic::SharedSurface_Basic()
SharedSurface_Basic::~SharedSurface_Basic()
SharedSurface_GLTexture::Create()
SharedSurface_GLTexture::~SharedSurface_GLTexture()
SharedSurface_GLTexture::ProducerReleaseImpl()
SharedSurface_GLTexture::ToSurfaceDescriptor()
GLFormatForImage()
GLTypeForImage()
TextureImageEGL::TextureImageEGL()
TextureImageEGL::~TextureImageEGL()
TextureImageEGL::DirectUpdate()
TextureImageEGL::BindTexture()
TextureImageEGL::Resize()
TextureImageEGL::BindTexImage()
TextureImageEGL::ReleaseTexImage()
TextureImageEGL::DestroyEGLSurface()
CreateTextureImageEGL()
TileGenFuncEGL()
SharedSurfaceTextureClient::SharedSurfaceTextureClient()
SharedSurfaceTextureClient::Create()
SharedSurfaceTextureClient::~SharedSurfaceTextureClient()
There are also a few datastructures which I should consider removing as well:
struct GLFormats
struct SurfaceCaps
enum class AttachmentType
An important part of these changes involves adding back the GLScreenBuffer method in to the code. But when the upstream change was made to remove this it was replaced with a class called SwapChain. So as part of all these changes I also want to see whether I can roll GLScreenBuffer into SwapChain. My gut tells me this would be a neat simplification if it's possible.

There are some changes that go along with this. I'm keeping a note of it here so I don't lose track of the things to consider.
 
  • Combine SwapChain with GLScreenBuffer.
  • Rename all GLContext::mScreen usage to GLContext::mSwapChain.
  • Switch calls to Screen() with calls to GetSwapChain().


In addition to all of the changes above there are also some changes I need to take more care over. The following methods were added in order to accommodate Wayland rendering and so might be important for getting the browser working on native ports. However, in truth while related to offscreen rendering, these changes aren't directly required for it and I'm not able to test these changes myself. I therefore need to take care about whether these are really needed or not.
GetAppDisplay
_wl_compositor_create_surface
_wl_surface_destroy
LoadWaylandFunctions
UnloadWaylandFunctions
WaylandGLSurface::WaylandGLSurface
WaylandGLSurface::~WaylandGLSurface
CreateEmulatorBufferSurface
Finally many of these changes may be conditional on whether they're related to the Sailfish OS build or not. In this case I should consider wrapping some of them in a MOZ_WIDGET_QT precompiler conditional.

Phew, that's a lot to get through. But there is some kind of plan here and in practice I'm hoping it won't take as long to get through all of this as the lists above might suggest.

But this is enough for today. I'll start to look in to all these changes tomorrow and hopefully will immediately be able to start slimming down the changes.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
28 May 2024 : Day 246 #
Hooray! After a lot of to-ing and fro-ing, I now have a successful build and a bunch of fresh packages. Now to test them. I've transferred them over to my phone, installed the packages and set the harbour-webview test application running.

After my epic train journey building these packages yesterday, and in the hope that these packages might work, thigg, AI artist in residence, has kindly created a picture of the aftermath. I think this captures the vibe rather nicely! Thank you thigg as always for your brilliant work.
 
A train carriage with no people in with a USB cable abandoned on one of the seats; the ambience is rather lonely.

Astonishingly, after all this when I execute my harbour-webview test app using the new packages I still just get a blank screen. That's okay though. Presumably it means the NotifyDidPaintForSubtree events still aren't being called. I now have all the structure for that to happen, so I'll just have to do a bit of debugging to find out why. That, at least, is the plan on paper.

So let's get stuck in with the debugger.
(gdb) info break
Num     Type           Address            What
1       0x0000007ff2c81914 in nsRootPresContext::EnsureEventualDidPaintEvent(
    BaseTransactionId<TransactionIdType>) at nsPresContext.cpp:2689
        breakpoint already hit 4 times
2       0x0000007ff7ee8ff0 in QMozViewPrivate::OnFirstPaint(int, int) at 
    qmozview_p.cpp:1122
        breakpoint already hit 1 time
3       <MULTIPLE>         
        breakpoint already hit 2 times
3.1     0x0000007ff367468c in mozilla::embedlite::EmbedLiteViewChild::
    OnFirstPaint(int, int) at EmbedLiteViewChild.cpp:1479
3.2     0x0000007ff3674774 EmbedLiteViewChild.h:60
(gdb) c
[...]
Thread 7 &quot;GeckoWorkerThre&quot; hit Breakpoint 1, nsRootPresContext::
    EnsureEventualDidPaintEvent (this=0x7fc5112ba0, 
    aTransactionId=aTransactionId@entry=...)
    at nsPresContext.cpp:2689
2689    {
(gdb) c
Continuing.

Thread 7 &quot;GeckoWorkerThre&quot; hit Breakpoint 3, non-virtual thunk to 
    mozilla::embedlite::EmbedLiteViewChild::OnFirstPaint(int, int) ()
    at EmbedLiteViewChild.h:60
60        NS_DECL_NSIEMBEDBROWSERCHROMELISTENER
(gdb) c
Continuing.

Thread 7 &quot;GeckoWorkerThre&quot; hit Breakpoint 3, mozilla::embedlite::
    EmbedLiteViewChild::OnFirstPaint (this=0x7fc4ae79c0, aX=0, aY=0)
    at EmbedLiteViewChild.cpp:1479
1479    {
(gdb) c
Continuing.

Thread 1 &quot;harbour-webview&quot; hit Breakpoint 2, QMozViewPrivate::
    OnFirstPaint (this=0x55557db0f0, aX=0, aY=0) at qmozview_p.cpp:1122
1122        mIsPainted = true;
(gdb) c
[...]
As you can see from this all of the expected methods are being called. So it's a bit strange that it's not rendering anything. And in fact... the odd thing is that now, with the debugger running, it is rendering something. If I clear the breakpoints, the render appears just fine.

And if I restart the app without the debugger it's now working fine as well. The page is being rendered correctly to the screen. Weird. So although it didn't work the first time, it is now working correctly.

There are a bunch of reasons why this might happen, such as a profile folder that needed refreshing or an executable that hadn't quite been properly quit. So I'll put it down to an anomaly. Whatever the reason It looks like the WebView is now working correctly.

It feels like this justifies a bit of a celebration and thankfully thigg is on hand once again to provide us with the content needed for just that. After the rather sombre image above, this feels a lot more upbeat and I really do like it. Finally, the Firefox has yielded its rendering riches!
 
A medieval style block-carved image, mostly black and white, with a flying pig on the left, its front hoofs on a golden stool, holding a mobile phone. On the right an orange fox sits on a throne holding a sword upright with flames on its head (a Firefox!)

Thank you again thigg for your amazing work. I'd happily continue the celebrations, but in practice there's more work to be done. Although that's all for the harbour-webview test app it's time now to try a proper app that uses the WebView to render. The obvious example of this is the email application, but I don't have an email account set up on this phone as I only use it for development. I'll set up an email account to check this in due course, but I'm looking for a quick test and entering my account details will take some time.

The good news is that the Settings app also uses the WebView, specifically in the account creation pages, depending on the account type. Google, VK, DropBox, OneDrive and Twitter all use a WebView to show Terms and Conditions pages, and in the case of Twitter it's used for OAuth configuration as well.

But when I try to use any of them the Settings app crashes. Here's the backtrace I get when the crash occurs:
Thread 10 &quot;GeckoWorkerThre&quot; received signal SIGSEGV, Segmentation 
    fault.
[Switching to LWP 2476]
0x0000007fc4fe8fdc in mozilla::embedlite::PuppetWidgetBase::Invalidate (
    this=0x7fb4e698b0, aRect=...)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:274
274         MOZ_CRASH(&quot;Unexpected layer manager type&quot;);
(gdb) bt
#0  0x0000007fc4fe8fdc in mozilla::embedlite::PuppetWidgetBase::Invalidate (
    this=0x7fb4e698b0, aRect=...)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:274
#1  0x0000007fc4fed870 in mozilla::embedlite::PuppetWidgetBase::UpdateBounds (
    this=0x7fb4e698b0, aRepaint=aRepaint@entry=true)
    at mobile/sailfishos/embedshared/PuppetWidgetBase.cpp:395
#2  0x0000007fc4ff6a5c in mozilla::embedlite::EmbedLiteWindowChild::
    CreateWidget (this=0x7fb4e19370)
    at xpcom/base/nsCOMPtr.h:851
#3  0x0000007fc4fe6fc8 in mozilla::detail::RunnableMethodArguments<>::
    applyImpl<mozilla::embedlite::EmbedLiteWindowChild, void (mozilla::
    embedlite::EmbedLiteWindowChild::*)()>(mozilla::embedlite::
    EmbedLiteWindowChild*, void (mozilla::embedlite::EmbedLiteWindowChild::*)(
    ), mozilla::Tuple<>&, std::integer_sequence<unsigned long>) (args=..., 
    m=<optimized out>, o=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151
#4  mozilla::detail::RunnableMethodArguments<>::apply<mozilla::embedlite::
    EmbedLiteWindowChild, void (mozilla::embedlite::EmbedLiteWindowChild::*)()> 
    (
    m=<optimized out>, o=<optimized out>, this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1154
#5  mozilla::detail::RunnableMethodImpl<mozilla::embedlite::
    EmbedLiteWindowChild*, void (mozilla::embedlite::EmbedLiteWindowChild::*)(
    ), true, (mozilla::RunnableKind)1>::Run (this=<optimized out>)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1201
#6  0x0000007fc214e63c in mozilla::RunnableTask::Run (this=0x55562b6ce0)
    at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313
[...]
#28 0x0000007ff6f4289c in ?? () from /lib64/libc.so.6
(gdb) 
And here's where that error is coming from in PuppetWidgetBase.cpp according to this backtrace:
  if (mozilla::layers::LayersBackend::LAYERS_CLIENT == lm->GetBackendType()) {
    // No need to do anything, the compositor will handle drawing
  } else {
    MOZ_CRASH(&quot;Unexpected layer manager type&quot;);
  }
As we can see from this, it's an enforced crash which happens whenever the layer backend is set to anything other than LAYERS_CLIENT. According to the debugger, ours is set to LAYERS_WR, so it's no wonder the crash is happening:
(gdb) p lm->GetBackendType()
$1 = mozilla::layers::LayersBackend::LAYERS_WR
(gdb) 
The interesting thing is that this isn't the first time we've hit this error. In fact, it's not the second time either.

The first time was back on Day 50 where the problem followed from the fact that gfx::gfxVars::UseWebRender() was set to true when it was expected to be set to false.

The second time was on Day 160 when the problem also amounted to the WebView layer manager being used. It took a couple of days of investigation (I eventually figured it out on Day 163) to figure out the reason, but it turned out to be because the gfx.webrender.force-disabled preference was set to false when it should have been set to true.

This preference is managed by the pref.js file inside the .mozilla profile folder. This folder is stored in different places for different apps. In the case of the Settings app the profile folder looks like this:
$ ls -a /home/defaultuser/.cache/org.sailfishos/Settings/
.         ..        .mozilla
$ ls -a /home/defaultuser/.cache/org.sailfishos/Settings/.mozilla/
.                   .parentlock         cert9.db            cookies.sqlite-wal  
    pkcs11.txt          startupCache        storage.sqlite
..                  cache2              cookies.sqlite      key4.db             
    security_state      storage             ua-update.json
$ rm -rf /home/defaultuser/.cache/org.sailfishos/Settings/.mozilla
It's notable that there is no pref.js file to be found here. That contrasts with the harbour-webview configuration file, which not only exists, but which also includes a setting for gfx.webrender.force-disabled:
$ pwd
/home/defaultuser/.cache/harbour-webview/harbour-webview/.mozilla
$ grep &quot;gfx\.webrender\.force-disabled&quot; prefs.js 
user_pref(&quot;gfx.webrender.force-disabled&quot;, true);
But as we noted, there's no equivalent settings file for the Settings app:
$ pwd
/home/defaultuser/.cache/org.sailfishos/Settings/.mozilla
$ grep &quot;gfx\.webrender\.force-disabled&quot; prefs.js 
grep: prefs.js: No such file or directory
There's obviously something to fix here, but right now I just want to test the rendering. So to move things forwards I'm going to copy over the preferences file from the harbour-webview app and put it in the configuration folder for the Settings app.
$ cp ../../../harbour-webview/harbour-webview/.mozilla/prefs.js .
$ ls
cache2              cookies.sqlite      key4.db             prefs.js            
    startupCache        storage.sqlite
cert9.db            cookies.sqlite-wal  pkcs11.txt          security_state      
    storage             ua-update.json
$ grep &quot;gfx\.webrender\.force-disabled&quot; prefs.js 
user_pref(&quot;gfx.webrender.force-disabled&quot;, true);
After copying this file over and re-starting the Settings app I find things are now working as expected. I can happily view the Terms and Conditions pages for the different accounts and even start to enter my Twitter credentials if I want to. There is clearly a problem to solve here, but it's not directly related to rendering. The problem is that the pref.js file isn't being created when it should be.

I need to keep my problems partitioned, otherwise they become too big and never get resolved. I'll create an issue for this pref.js problem and continue concentrating on getting the rendering wrapped up.

So what is there still to wrap up on the rendering front? Well, specifically I need to now go through the code and try to simplify things, partition everything into a series of clean commits so that I can turn them into nice patches.

I've had to make a really quite huge number of changes to get this WebView offscreen rendering working. Now I have to see if it can be simplified. This is a big task and not one I'm going to start today, so that'll have to do for now. I'll pick up the next steps tomorrow.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
27 May 2024 : Day 245 #
This morning I woke up early unable to sleep. Last night I set a build running and although it's not nervousness about the build that's preventing me from sleeping, it does nevertheless provide a good opportunity for me to check the progress of the build.

Unfortunately the build that was running overnight has failed. There are errors resulting from the changes I made to nsPresContext.cpp, the first of which looks like this:
205:02.94 In file included from Unified_cpp_layout_base2.cpp:20:
205:02.94 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp: In member 
    function ‘void nsPresContext::DetachPresShell()’:
205:02.94 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp:839:3: error: 
    ‘thisRoot’ was not declared in this scope
205:02.94    thisRoot->CancelAllDidPaintTimers();
205:02.94    ^~~~~~~~
205:02.95 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp:839:3: note: 
    suggested alternative: ‘IsRoot’
205:02.95    thisRoot->CancelAllDidPaintTimers();
205:02.95    ^~~~~~~~
205:02.95    IsRoot
The problem here is an attempt to use the thisRoot variable in the nsPresContext::DetachPresShell() method without it having been declared. Here's the specific line that's failing from the ESR 91 code:
  // The did-paint timer also depends on a non-null pres shell.
  thisRoot->CancelAllDidPaintTimers();
In comparison, here's what the equivalent code looks like in ESR 78:
  if (IsRoot()) {
    nsRootPresContext* thisRoot = static_cast<nsRootPresContext*>(this);

    // Have to cancel our plugin geometry timer, because the
    // callback for that depends on a non-null presshell.
    thisRoot->CancelApplyPluginGeometryTimer();

    // The did-paint timer also depends on a non-null pres shell.
    thisRoot->CancelAllDidPaintTimers();
  }
Note that both of these blocks of code are post-patch. That is, they're the code as it looks after patch 0044 (or my manual changes in the case of ESR 91) have been applied. Looking at the ESR 78 code gives an immediate idea about what the code ought to look like in ESR 91. I've made changes in line with this so that now the ESR 91 code looks like this:
  if (IsRoot()) {
    nsRootPresContext* thisRoot = static_cast<nsRootPresContext*>(this);

    // The did-paint timer also depends on a non-null pres shell.
    thisRoot->CancelAllDidPaintTimers();
  }
That should do the trick. There's another error in the build output though, which is the following:
205:03.09 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp: At global scope:
205:03.09 ${PROJECT}/gecko-dev/layout/base/nsPresContext.cpp:2677:1: error: 
    ‘mOutstandingTransactionIdvoid’ does not name a type
205:03.09  mOutstandingTransactionIdvoid
205:03.09  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To understand this better it helps once again to compare the ESR 78 and ESR 91 code. But this time I'm going to switch it around so that we see the ESR 78 code first. Here's what it looks like there:
void
nsRootPresContext::EnsureEventualDidPaintEvent(TransactionId aTransactionId)
{
And here's what it looks like in ESR 91:
mOutstandingTransactionIdvoid
nsRootPresContext::EnsureEventualDidPaintEvent(TransactionId aTransactionId)
{
Notice the error? Honestly I don't know what's going on here but I suspect I accidentally hit Ctrl-V at some point when the cursor was in the wrong place and pasted the variable in the buffer in front of the return type.

Removing the extraneous mOutstandingTransactionIdvoid so it's restored back to the original code seems to be the correct fix here.

These are the only two errors thrown up by the compiler, so after fixing them both I've set the build running again.

Unfortunately the next attempt to build also fails. This time the compilation stages all pass without incident, it gets right up to the linking stage before throwing an undefined reference error. Apparently I've forgotten to define nsRootPresContext::Detach():
216:06.48 toolkit/library/build/libxul.so
219:52.11 opt/cross/bin/aarch64-meego-linux-gnu-ld: 
    ../../../layout/base/Unified_cpp_layout_base2.o:(
    .data.rel.ro.local._ZTV17nsRootPresContext[_ZTV17nsRootPresContext]+0x38): 
    undefined reference to `nsRootPresContext::Detach()'
219:52.11 opt/cross/bin/aarch64-meego-linux-gnu-ld: 
    libxul.so: hidden symbol `_ZN17nsRootPresContext6DetachEv' isn't defined
219:52.11 opt/cross/bin/aarch64-meego-linux-gnu-ld: 
    final link failed: bad value
219:52.11 collect2: error: ld returned 1 exit status
Sure enough the nsRootPresContext::Detach() method is defined in the class header:
class nsRootPresContext final : public nsPresContext {
 public:
  nsRootPresContext(mozilla::dom::Document* aDocument, nsPresContextType aType);
  virtual ~nsRootPresContext();
  virtual bool IsRoot() override { return true; }

  virtual void Detach() override;
[...]
But there's no code to define the body of the method in the cpp file. Hence the error. The correct method body that should have been included can be seen in the 0044 patch file:
+/* virtual */
+void nsRootPresContext::Detach() {
+  CancelAllDidPaintTimers();
+  nsPresContext::Detach();
+}
+
For some reason this never made it in to my manually patched ESR 91 code. I must have missed it.

Both of the methods called in this method — nsRootPresContext::CancelAllDidPaintTimers() and nsPresContext::Detach() — are defined and accessible, so adding the missing code shouldn't break compilation. I've added it in and kicked the build off once again. The fact it got all the way through to linking on the previous attempt will hopefully mean that there are fewer source files to be recompiled this time around and the build will complete relatively swiftly.

Having said that, for the rest of the day I'll be travelling. That means a day of switching between various buses and trains. Not ideal when it comes to a lengthy build since, while I can allow the build to continue when I'm on the train, my laptop will have to be suspended and resumed as I move from one leg of the journey to the next.

Still, the build continues, on and off, throughout the day.

[...]

It's evening now and as I sit here at my destination after a day spent travelling, the build continues to run, which means I'll have to leave it running overnight. Let's hope I wake up to a complete build and an opportunity to test the result in the morning.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment