Gecko-dev Diary
Between August 2023 and September 2024 I upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91, writing a daily blog as I went along. This page catalogues my progress, alongside the other browser-related topics I've looked in to since.
Latest code changes are in the gecko-dev sailfishos-esr91 branch.
There is an index of all posts in case you want to jump to a particular day.
Gecko
5 most recent items
26 Mar 2024 : Day 197 #
If you've been following any of these diary entries over the last couple of weeks you'll know I've been struggling to diagnose a problem related to graphics surfaces. A serious bug prevented the graphics surface from being properly created, but as soon as that was fixed another serious issue appeared: after a short period of time using the WebView the app started to seize up, rapidly progressing to the entire phone. After a while the watchdog kicked in causing the phone to reboot itself.
This is, as a general rule, not considered ideal behaviour for an application.
Since then I've been generally debugging, monitoring and annotating the code to try to figure out what was causing the problem. As of yesterday I'd narrowed the issue down to the creation of the EGL image associated with an EGL texture. Each frame the app would create the texture, then create the image from the texture and then create a surface from that.
Skipping execution from anywhere up to the image creation and beyond would result in the seizing up happening. This led me to the EGL instructions: creating and destroying the image.
I've been looking at this code in ShareSurfaceEGL.cpp quite deeply for a couple of weeks now. And finally, narrowing down the area of consideration has finally thrown up something useful.
It turns out that while the surface destructor is called correctly and that this calls fDestroyImage() correctly, that's not all it's supposed to be doing.
All of this was stuff we checked yesterday: a call to fDestroyImage() was being called for every call to fCreateImage() except two, allowing for the front and back buffer to exist at all times.
But looking at the code today I realised there was something missing. When the image is created in SharedSurface_EGLImage::Create() it needs a texture to work with. And so we have this code:
Here is our destructor code in ESR 91:
To be clear, there's still no rendering happening to the screen, but this is nevertheless an important step forwards and I'm pretty chuffed to have noticed the missing code. In retrospect, it's something I should have noticed a lot earlier, but this goes to show both how intricate these things are, and where my limitations are as a developer. It's hard to keep all of the execution paths in my head all at the same time. As a result I'm left using these often trial-and-error based approaches to finding fixes.
It's a small victory. But it means that tomorrow I can continue on with the proper job of finding out why the render never makes it to the screen. With this resolved I'm feeling more confident again that it will be possible to get to the bottom of it.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
This is, as a general rule, not considered ideal behaviour for an application.
Since then I've been generally debugging, monitoring and annotating the code to try to figure out what was causing the problem. As of yesterday I'd narrowed the issue down to the creation of the EGL image associated with an EGL texture. Each frame the app would create the texture, then create the image from the texture and then create a surface from that.
Skipping execution from anywhere up to the image creation and beyond would result in the seizing up happening. This led me to the EGL instructions: creating and destroying the image.
I've been looking at this code in ShareSurfaceEGL.cpp quite deeply for a couple of weeks now. And finally, narrowing down the area of consideration has finally thrown up something useful.
It turns out that while the surface destructor is called correctly and that this calls fDestroyImage() correctly, that's not all it's supposed to be doing.
All of this was stuff we checked yesterday: a call to fDestroyImage() was being called for every call to fCreateImage() except two, allowing for the front and back buffer to exist at all times.
But looking at the code today I realised there was something missing. When the image is created in SharedSurface_EGLImage::Create() it needs a texture to work with. And so we have this code:
GLuint prodTex = CreateTextureForOffscreen(prodGL, formats, size); if (!prodTex) { return ret; } EGLClientBuffer buffer = reinterpret_cast<EGLClientBuffer>(uintptr_t(prodTex)); EGLImage image = egl->fCreateImage(context, LOCAL_EGL_GL_TEXTURE_2D, buffer, nullptr);First create the texture then pass this in to the image creation routine. But while the image is deleted in the destructor, the texture is not!
Here is our destructor code in ESR 91:
SharedSurface_EGLImage::~SharedSurface_EGLImage() { const auto& gle = GLContextEGL::Cast(mDesc.gl); const auto& egl = gle->mEgl; egl->fDestroyImage(mImage); if (mSync) { // We can't call this unless we have the ext, but we will always have // the ext if we have something to destroy. egl->fDestroySync(mSync); mSync = 0; } }The image and sync are both destroyed, but the texture never is. So what happens if we add in the texture deletion? To test this I've added it in and the code now looks like this:
SharedSurface_EGLImage::~SharedSurface_EGLImage() { const auto& gle = GLContextEGL::Cast(mDesc.gl); const auto& egl = gle->mEgl; egl->fDestroyImage(mImage); if (mSync) { // We can't call this unless we have the ext, but we will always have // the ext if we have something to destroy. egl->fDestroySync(mSync); mSync = 0; } if (!mDesc.gl || !mDesc.gl->MakeCurrent()) return; mDesc.gl->fDeleteTextures(1, &mProdTex); mProdTex = 0; }And now, after building and running this new version, the app no longer seizes up!
To be clear, there's still no rendering happening to the screen, but this is nevertheless an important step forwards and I'm pretty chuffed to have noticed the missing code. In retrospect, it's something I should have noticed a lot earlier, but this goes to show both how intricate these things are, and where my limitations are as a developer. It's hard to keep all of the execution paths in my head all at the same time. As a result I'm left using these often trial-and-error based approaches to finding fixes.
It's a small victory. But it means that tomorrow I can continue on with the proper job of finding out why the render never makes it to the screen. With this resolved I'm feeling more confident again that it will be possible to get to the bottom of it.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comments
Uncover Disqus comments