List items
Items from the current list are shown below.
Blog
All items from April 2024
30 Apr 2024 : Day 232 #
I spent a bit of time yesterday trying to extract colour data from the SharedSurface_EGLImage object that should, as far as I understand it, be providing the raster image to be rendered to the screen. My plan has been to override the SharedSurface::ReadPixels() method — a method that currently just returns false and does nothing more — in order to make it perform the useful task of returning colour information. Since I've already set up the code to call this, once this method is working everything else should magically fall in to place.
I tried adding code into the method body to perform the task like this:
Thankfully the Stack Overflow post has an alternative approach with methods that are all shown to already be dynamically loaded in GLContext. Consequently I've updated the code to align with the suggested approach and it now looks like this:
Although all of the EGL symbols used in the code I've added exist and are technically usable, not all of them are available because of the way visibility of them is configured in GLContext. The raw_fBindFramebuffer() and raw_fDeleteFramebuffers() methods are both marked as private. I should explain that this is because these are all the raw variants of the methods. There are non-raw versions which have public visibility, but these have additional functionality I want to avoid triggering. I really do just want to use the raw versions.
To make them accessible to SharedSurface_EGLImage outside of GLContext I've therefore hacked their visibility like so:
Having made all of these changes it's time to test things out. I've built and uploaded the code, and set the WebView app running:
But I'd like to have some proper confirmation, so tomorrow I might see if I can get a full copy of the texture written out to disk so I can see what's really happening there.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
I tried adding code into the method body to perform the task like this:
bool SharedSurface_EGLImage::ReadPixels(GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, GLvoid* pixels) { const auto& gl = GLContextEGL::Cast(mDesc.gl); gl->fBindTexture(LOCAL_GL_TEXTURE_2D, mProdTex); gl->fGetTexImage(LOCAL_GL_TEXTURE_2D, 0, format, type, pixels); return true; }But I hit a problem when the glGetTexImage() symbol, needed for the call to fGetTexImage(), wasn't available to load dynamically from the EGL library. The error kept on coming back:
Can't find symbol 'glGetTexImage'. Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create EGLConfig! (t=1.32158) [GFX1-]: Failed to create EGLConfig!As I'm sure many of you will have noted, there is in fact no support for glGetTexImage() in EGL, as is explained very succinctly on Stack Overflow. So it doesn't matter how many different permutations I was going to try, this was never going to work.
Thankfully the Stack Overflow post has an alternative approach with methods that are all shown to already be dynamically loaded in GLContext. Consequently I've updated the code to align with the suggested approach and it now looks like this:
bool SharedSurface_EGLImage::ReadPixels(GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, GLvoid* pixels) { const auto& gl = GLContextEGL::Cast(mDesc.gl); // See https://stackoverflow.com/a/53993894 GLuint fbo; gl->raw_fGenFramebuffers(1, &fbo); gl->raw_fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, fbo); gl->fFramebufferTexture2D(LOCAL_GL_FRAMEBUFFER, LOCAL_GL_COLOR_ATTACHMENT0, LOCAL_GL_TEXTURE_2D, mProdTex, 0); gl->raw_fReadPixels(x, y, width, height, format, type, pixels); gl->raw_fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, 0); gl->raw_fDeleteFramebuffers(1, &fbo); return true; }This doesn't seem to be entirely what I want because it's really just copying the texture data back into a framebuffer and reading from there, which is very similar to what I've already done but in a slightly different place in the code. But this at least brings us one step close to the screen, which is where this all eventually is supposed to end up.
Although all of the EGL symbols used in the code I've added exist and are technically usable, not all of them are available because of the way visibility of them is configured in GLContext. The raw_fBindFramebuffer() and raw_fDeleteFramebuffers() methods are both marked as private. I should explain that this is because these are all the raw variants of the methods. There are non-raw versions which have public visibility, but these have additional functionality I want to avoid triggering. I really do just want to use the raw versions.
To make them accessible to SharedSurface_EGLImage outside of GLContext I've therefore hacked their visibility like so:
//private: public: void raw_fBindFramebuffer(GLenum target, GLuint framebuffer) { BEFORE_GL_CALL; mSymbols.fBindFramebuffer(target, framebuffer); AFTER_GL_CALL; } [...] // TODO: Make private again public: void raw_fDeleteFramebuffers(GLsizei n, const GLuint* names) { BEFORE_GL_CALL; mSymbols.fDeleteFramebuffers(n, names); AFTER_GL_CALL; } private: void raw_fDeleteRenderbuffers(GLsizei n, const GLuint* names) { BEFORE_GL_CALL; mSymbols.fDeleteRenderbuffers(n, names); AFTER_GL_CALL; } [...]This wouldn't be especially noteworthy or interesting were it not for the fact that the changes I'm making are only temporary while I debug things, so I'll want to change them back before I commit any changes to the public repository. As we've already seem before, taking a note in these diary entries of things I need to reverse is a great way for me to keep track and make sure I can reverse the changes as easily as possible.
Having made all of these changes it's time to test things out. I've built and uploaded the code, and set the WebView app running:
[JavaScript Warning: "Layout was forced before the page was fully loaded. If stylesheets are not yet loaded this may cause a flash of unstyled content." {file: "https://sailfishos.org/wp-includes/js/jquery/ jquery.min.js?ver=3.5.1" line: 2}] Colour before: (0, 0, 0, 255), 1 Colour after: (0, 0, 0, 255), 1 Colour before: (0, 0, 0, 255), 1 Colour after: (0, 0, 0, 255), 1 Colour before: (0, 0, 0, 255), 1 Colour after: (0, 0, 0, 255), 1 Colour before: (0, 0, 0, 255), 1 Colour after: (0, 0, 0, 255), 1 Colour before: (0, 0, 0, 255), 1 Colour after: (0, 0, 0, 255), 1 Colour before: (0, 0, 0, 255), 1 [...] Colour after: (107, 79, 139, 255), 1 Colour before: (137, 207, 255, 255), 1 Colour after: (137, 207, 255, 255), 1 Colour before: (135, 205, 255, 255), 1 Colour after: (135, 205, 255, 255), 1 Colour before: (132, 204, 254, 255), 1 Colour after: (132, 204, 254, 255), 1 Colour before: (0, 0, 0, 255), 1 Colour after: (0, 0, 0, 255), 1 Colour before: (10, 144, 170, 255), 1 Colour after: (10, 144, 170, 255), 1 Colour before: (0, 0, 0, 255), 1 Colour after: (0, 0, 0, 255), 1 Colour before: (59, 128, 13, 255), 1 Colour after: (59, 128, 13, 255), 1 Colour before: (108, 171, 244, 255), 1 Colour after: (108, 171, 244, 255), 1 Colour before: (185, 132, 119, 255), 1 Colour after: (185, 132, 119, 255), 1 Colour before: (0, 0, 0, 255), 1 Colour after: (0, 0, 0, 255), 1 [...]As you can see, the surface starts off fully black but gets rendered until eventually it's a mix of colours. There are still a lot of cases where the pixels are just black, which makes me a bit nervous, but the colours in general look intentional. I think this is a good sign.
But I'd like to have some proper confirmation, so tomorrow I might see if I can get a full copy of the texture written out to disk so I can see what's really happening there.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
29 Apr 2024 : Day 231 #
I'm still in the process of trying to extract pixel colours from the surface used for transferring the offscreen buffer to the onscreen render. The snag I hit yesterday is that I wanted to use glGetTexImage(), but this symbol isn't being dynamically loaded for use with GLContext.
Consequently I've added it to the end of the coreSymbols table like this:
At least I may as well build it and find out.
Now when I run it I get this:
This error message appears twice in the code so I can't be entirely certain, but it looks like the error is resulting in a failure in the following method:
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
Consequently I've added it to the end of the coreSymbols table like this:
const SymLoadStruct coreSymbols[] = { { (PRFuncPtr*) &mSymbols.fActiveTexture, {{ "glActiveTexture", "glActiveTextureARB" }} }, { (PRFuncPtr*) &mSymbols.fAttachShader, {{ "glAttachShader", "glAttachShaderARB" }} }, { (PRFuncPtr*) &mSymbols.fBindAttribLocation, {{ "glBindAttribLocation", "glBindAttribLocationARB" }} }, { (PRFuncPtr*) &mSymbols.fBindBuffer, {{ "glBindBuffer", "glBindBufferARB" }} }, { (PRFuncPtr*) &mSymbols.fBindTexture, {{ "glBindTexture", "glBindTextureARB" }} }, [...] { (PRFuncPtr*) &mSymbols.fGetTexImage, {{ "glGetTexImage", "glGetTexImageARB" }} }, END_SYMBOLS };I've also searched through the rest of the code for glBindBuffer, which is the reference I'm using for how to introduce glGetTexImage. Although it comes up in many places in the code, none of them look relevant to what I'm doing, so I'm going to guess that the change I've made above is all that's needed.
At least I may as well build it and find out.
Now when I run it I get this:
Created LOG for EmbedLiteLayerManager Can't find symbol 'glGetTexImage'. Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create EGLConfig! (t=1.32158) [GFX1-]: Failed to create EGLConfig! Segmentation faultThat's not a great sign. I'm going to have to put some more thought into this.
This error message appears twice in the code so I can't be entirely certain, but it looks like the error is resulting in a failure in the following method:
bool CreateConfig(EglDisplay& egl, EGLConfig* aConfig, int32_t depth, bool aEnableDepthBuffer, bool aUseGles, int aVisual);At this stage I'm not really sure what I'm doing, but I've changed my earlier change so it now looks like this:
const SymLoadStruct coreSymbols[] = { { (PRFuncPtr*) &mSymbols.fActiveTexture, {{ "glActiveTexture", "glActiveTextureARB" }} }, { (PRFuncPtr*) &mSymbols.fAttachShader, {{ "glAttachShader", "glAttachShaderARB" }} }, { (PRFuncPtr*) &mSymbols.fBindAttribLocation, {{ "glBindAttribLocation", "glBindAttribLocationARB" }} }, { (PRFuncPtr*) &mSymbols.fBindBuffer, {{ "glBindBuffer", "glBindBufferARB" }} }, { (PRFuncPtr*) &mSymbols.fBindTexture, {{ "glBindTexture", "glBindTextureARB" }} }, [...] { (PRFuncPtr*) &mSymbols.fGetTexImage, {{ "glGetTexImage" }} }, END_SYMBOLS };Sadly after rebuilding, re-uploading and re-running, that didn't fix it. I'm out of ideas for today, but maybe after a good night's sleep I'll have some better ideas tomorrow. Let's see in the morning.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
28 Apr 2024 : Day 230 #
Before I get in to my diary entry today I want to first mention a change that will be happening in a week. Between Saturday 4th May and Sunday 12th May I'll be attending a conference, which means I'll be pretty tied up throughout. So as I've done a couple of times in the past already, I'll be taking a break from these dev diaries during that period. It'll basically be a break of a week and a couple of days (Saturday to Sunday), but I'll get back on track again straight after again.
With that announcement out of the way, let's get on with the entry for today.
I'm currently digging around in GLScreenBuffer::Swap(), the idea being to put in some code to print out pixel colours. I added some yesterday, but it returned only zeros. Looking at the code the reason became clear pretty quickly: the ReadPixels() method from SharedSurface was being called, which always fails and just returns false.
Even though the surface is shown in the debugger as a SharedSurface it's supposed to be a SharedSurface_EGLImage. But SharedSurface_EGLImage doesn't override the ReadPixels() method, so even if it were, it still wouldn't make any difference.
But although the code I wrote yesterday doesn't work I'm hoping it can still lay the groundwork for something that will. What I've done is implemented that missing override, so that now there's a ReadPixels() method in SharedSurface_EGLImage that looks like this:
Unfortunately I don't have time to do that this evening, but it does at least give me a very clear plan for what I can be working on tomorrow morning.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
With that announcement out of the way, let's get on with the entry for today.
I'm currently digging around in GLScreenBuffer::Swap(), the idea being to put in some code to print out pixel colours. I added some yesterday, but it returned only zeros. Looking at the code the reason became clear pretty quickly: the ReadPixels() method from SharedSurface was being called, which always fails and just returns false.
Even though the surface is shown in the debugger as a SharedSurface it's supposed to be a SharedSurface_EGLImage. But SharedSurface_EGLImage doesn't override the ReadPixels() method, so even if it were, it still wouldn't make any difference.
But although the code I wrote yesterday doesn't work I'm hoping it can still lay the groundwork for something that will. What I've done is implemented that missing override, so that now there's a ReadPixels() method in SharedSurface_EGLImage that looks like this:
bool SharedSurface_EGLImage::ReadPixels(GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, GLvoid* pixels) { const auto& gl = GLContextEGL::Cast(mDesc.gl); gl->fBindTexture(LOCAL_GL_TEXTURE_2D, mProdTex); gl->fGetTexImage(LOCAL_GL_TEXTURE_2D, 0, format, type, pixels); return true; }The idea is that this will bind the texture associated with the surface and then read the pixels from the currently bound texture. Sadly when I run it, the app crashes with a backtrace that looks like this:
Thread 37 "Compositor" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f178ee7e0 (LWP 2120)] 0x0000000000000000 in ?? () (gdb) bt #0 0x0000000000000000 in ?? () #1 0x0000007ff112e5f0 in mozilla::gl::GLContext::fGetTexImage ( img=0x7ed81ea5d0, type=5121, format=6408, level=0, target=3553, this=0x7ed819aa50) at ${PROJECT}/gecko-dev/gfx/gl/GLContext.h:1331 #2 mozilla::gl::SharedSurface_EGLImage::ReadPixels (this=<optimized out>, x=<optimized out>, y=<optimized out>, width=<optimized out>, height=<optimized out>, format=6408, type=5121, pixels=0x7ed81ea5d0) at ${PROJECT}/gecko-dev/gfx/gl/SharedSurfaceEGL.cpp:178 #3 0x0000007ff1101a34 in mozilla::gl::GLScreenBuffer::ReadPixels ( this=this@entry=0x7ed4002e00, x=x@entry=540, y=y@entry=1260, width=width@entry=1, height=height@entry=1, format=format@entry=6408, type=type@entry=5121, pixels=pixels@entry=0x7ed81ea5d0) at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:462 #4 0x0000007ff11027e4 in mozilla::gl::GLScreenBuffer::Swap ( this=this@entry=0x7ed4002e00, size=...) at ${PROJECT}/gecko-dev/gfx/gl/GLScreenBuffer.cpp:531 #5 0x0000007ff3663f48 in mozilla::gl::GLScreenBuffer::PublishFrame (size=..., this=0x7ed4002e00) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/GLScreenBuffer.h:229 #6 mozilla::embedlite::EmbedLiteCompositorBridgeParent:: PresentOffscreenSurface (this=0x7fc4b422d0) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/ EmbedLiteCompositorBridgeParent.cpp:191 #7 0x0000007ff367d3ac in mozilla::embedlite::nsWindow::PostRender ( this=0x7fc4b09280, aContext=<optimized out>) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:248 #8 0x0000007ff2a62664 in mozilla::widget::InProcessCompositorWidget:: PostRender (this=0x7fc4d0eb40, aContext=0x7f178ed7e8) at ${PROJECT}/gecko-dev/widget/InProcessCompositorWidget.cpp:60 #9 0x0000007ff128d1c8 in mozilla::layers::LayerManagerComposite::Render ( this=this@entry=0x7ed81b7160, aInvalidRegion=..., aOpaqueRegion=...) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/Compositor.h: 575 #10 0x0000007ff128d644 in mozilla::layers::LayerManagerComposite:: UpdateAndRender (this=this@entry=0x7ed81b7160) at ${PROJECT}/gecko-dev/gfx/layers/composite/LayerManagerComposite.cpp:657 #11 0x0000007ff128d9f4 in mozilla::layers::LayerManagerComposite:: EndTransaction (this=this@entry=0x7ed81b7160, aTimeStamp=..., aFlags=aFlags@entry=mozilla::layers::LayerManager::END_DEFAULT) at ${PROJECT}/gecko-dev/gfx/layers/composite/LayerManagerComposite.cpp:572 #12 0x0000007ff12cf174 in mozilla::layers::CompositorBridgeParent:: CompositeToTarget (this=0x7fc4b422d0, aId=..., aTarget=0x0, aRect=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313 #13 0x0000007ff3663c48 in mozilla::embedlite::EmbedLiteCompositorBridgeParent:: CompositeToDefaultTarget (this=0x7fc4b422d0, aId=...) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/ EmbedLiteCompositorBridgeParent.cpp:165 #14 0x0000007ff12b48d0 in mozilla::layers::CompositorVsyncScheduler::Composite ( this=0x7fc4c9c050, aVsyncEvent=...) at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorVsyncScheduler.cpp:256 #15 0x0000007ff12acd50 in mozilla::detail::RunnableMethodArguments<mozilla:: VsyncEvent>::applyImpl<mozilla::layers::CompositorVsyncScheduler, void ( mozilla::layers::CompositorVsyncScheduler::*)(mozilla::VsyncEvent const&), StoreCopyPassByConstLRef<mozilla::VsyncEvent>, 0ul> (args=..., m=<optimized out>, o=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/ nsThreadUtils.h:887 [...] #27 0x0000007ff6a0289c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb)Although the crash isn't ideal it does tell me one useful thing: the method is getting called, which means that the surface really is a SharedSurface_EGLImage after all. That's at least one part confirmed. The reason for the crash is less clear, given that all the inputs to the fGetTexImage() method look sensible. The reason must be because the fGetTexImage() function pointer is null. And indeed it is:
(gdb) frame 1 #1 0x0000007ff112e5f0 in mozilla::gl::GLContext::fGetTexImage ( img=0x7ed81ea5d0, type=5121, format=6408, level=0, target=3553, this=0x7ed819aa50) at ${PROJECT}/gecko-dev/gfx/gl/GLContext.h:1331 1331 in ${PROJECT}/gecko-dev/gfx/gl/GLContext.h (gdb) p mSymbols.fGetTexImage $5 = (void (*)(GLenum, GLint, GLenum, GLenum, GLvoid *)) 0x0 (gdb)Okay. I'm not sure what I can do about that. It could be that the reason is because there's no fGetTexImage() in the GLContext::InitImpl() method, which is used to set up all of these function pointers. It's worth noting that the call to fBindTexture() that happens right before the call to fGetTexImage() works just fine and fBindTexture() can be found in a table at the top of the GLContext.cpp file, so that adds weight to the claim. If that's the case, maybe I can add fGetTexImage in?
Unfortunately I don't have time to do that this evening, but it does at least give me a very clear plan for what I can be working on tomorrow morning.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
27 Apr 2024 : Day 229 #
As discussed yesterday, today I'm investigating GLScreenBuffer::Swap(). A bit of debugging has demonstrated that it's hit frequently during the WebView render loop and it seems pretty integral. I also notice that there's already some commented-out code in there for reading off and debug-printing some pixel colours.
My plan was to do the same: read off some pixel colours, print them to the console, see whether they're updating as they should be. So the fact there's already some code in there to do this is an encouraging sign.
Unfortunately the code that's there doesn't work. Here's a snippet of the code in question:
Consequently I've grabbed the code that I used earlier in the CompositorOGL::EndFrame() method, transferred it over to this Swap() method and tweaked it for the new context. The new code looks like this:
Now, having built and run the application, I don't get any output. Debugging shows that the Swap() method is definitely being hit. But the code I added to sample the surface is never called. Stepping through the code the reason immediately becomes obvious: the section of code I added them to is wrapped in a condition:
So I've now moved the new sampling code outside of this condition, rebuilt and installed.
When I run the new library, the code is now definitely being executed, but the sampling is showing a very uniform output:
Stepping through the code shows why. Inside the GLScreenBuffer::ReadPixels() method there's a call to actual do the work that looks like this:
The debugger is also showing it as a SharedSurface type, but I'm suspicious. But I'm also tired, so I'm going to return to this tomorrow morning. With any luck, it'll be possible to unravel what's happening here and fix it.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
My plan was to do the same: read off some pixel colours, print them to the console, see whether they're updating as they should be. So the fact there's already some code in there to do this is an encouraging sign.
Unfortunately the code that's there doesn't work. Here's a snippet of the code in question:
// uint32_t srcPixel = ReadPixel(src); // uint32_t destPixel = ReadPixel(dest); // printf_stderr("Before: src: 0x%08x, dest: 0x%08x\n", srcPixel, // destPixel);It looks like a pretty slick implementation — there's not much to it after all — so using this would be ideal. Unfortunately it won't compile. As you can see it references a method called ReadPixel(), but this method doesn't actually exist. It's not even clear to me what the method is supposed to do, so replicating it isn't an option either.
Consequently I've grabbed the code that I used earlier in the CompositorOGL::EndFrame() method, transferred it over to this Swap() method and tweaked it for the new context. The new code looks like this:
int xpos = size.width / 2; int ypos = size.height / 2; size_t bufferSize = sizeof(char) * 4; uint8_t* buf = static_cast<uint8_t*>(calloc(sizeof(uint8_t), bufferSize)); bool result = ReadPixels(xpos, ypos, 1, 1, LOCAL_GL_RGBA, LOCAL_GL_UNSIGNED_BYTE, buf); int pos = 0; volatile char red = buf[pos]; volatile char green = buf[pos + 1]; volatile char blue = buf[pos + 2]; volatile char alpha = buf[pos + 3]; printf_stderr("Colour before: (%d, %d, %d, %d), %d\n", red, green, blue, alpha, result);As you can see, it grabs a pixel from the surface and prints the colour value to the console. This is the "before" version and there's also an "after" version which does the same right after a call to SharedSurface::ProdCopy() has been made which appears to make a copy of the front buffer on to the back buffer. That's to match the commented-out code that was there before.
Now, having built and run the application, I don't get any output. Debugging shows that the Swap() method is definitely being hit. But the code I added to sample the surface is never called. Stepping through the code the reason immediately becomes obvious: the section of code I added them to is wrapped in a condition:
if (mCaps.preserve && mFront && mBack) { auto src = mFront->Surf(); auto dest = mBack->Surf(); [...] }I just assumed that the code that was there before was a good place to add the sampling code, but mCaps.preserve is set to false so this condition is never being met.
So I've now moved the new sampling code outside of this condition, rebuilt and installed.
When I run the new library, the code is now definitely being executed, but the sampling is showing a very uniform output:
[...] Colour before: (0, 0, 0, 0), 0 Colour after: (0, 0, 0, 0), 0 Colour before: (0, 0, 0, 0), 0 Colour after: (0, 0, 0, 0), 0 Colour before: (0, 0, 0, 0), 0 Colour after: (0, 0, 0, 0), 0 Colour before: (0, 0, 0, 0), 0 Colour after: (0, 0, 0, 0), 0 Colour before: (0, 0, 0, 0), 0 Colour after: (0, 0, 0, 0), 0 Colour before: (0, 0, 0, 0), 0 Colour after: (0, 0, 0, 0), 0 Colour before: (0, 0, 0, 0), 0 Colour after: (0, 0, 0, 0), 0 [...]This can only be for one of two reasons: either the sampling code is broken, or the transfer of the image to the surface is broken. To understand which is happening we can consider the extra hint that's there in the output: the last value on the line is giving the result value of the call to ReadPixels(). In other words this indicates whether or not the ReadPixels() call was successful. The value being returned is always zero, equivalent of false. So no, the answer is that the call is not being successful.
Stepping through the code shows why. Inside the GLScreenBuffer::ReadPixels() method there's a call to actual do the work that looks like this:
return surf->ReadPixels(x, y, width, height, format, type, pixels);There are many different types of surface with potentially different overrides for ReadPixels() that this could be calling, but in this specific case, stepping through takes us to this method:
virtual bool ReadPixels(GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, GLvoid* pixels) { return false; }As we can see, this implementation of the call makes no attempt to read the pixels and is guaranteed to return false. Now even though this is a SharedSurface method it doesn't follow that the surface is actually of that type. For example SharedSurface_EGLImage is a subclass of SharedSurface, but since it doesn't override this particular method, it'll use the underlying SharedSurface method instead.
The debugger is also showing it as a SharedSurface type, but I'm suspicious. But I'm also tired, so I'm going to return to this tomorrow morning. With any luck, it'll be possible to unravel what's happening here and fix it.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
26 Apr 2024 : Day 228 #
I'm back on track today, returning to the question of why the WebView offscreen renderer isn't rendering. At least, not on screen. Before I got sidetracked my plan was to look at what's happening between the completion of rendering the page to the texture (which is working) and rendering of that texture to the screen (which is not).
Previously my plan was to look between DeclarativeWebContainer::clearWindowSurface() and CompositorOGL::EndFrame() and try to find the problem somewhere between the two. The clearWindowSurface() even had a handy glClear() call which I could use to change the background colour of the unrendered page.
But I subsequently discovered that clearWindowSurface() is used by the browser, but not the WebView. And that led me on my path to having to fix the browser.
It's good I went down that path, because now the browser is working again, but it was nevertheless a diversion from my path of fixing the WebView. It's good to be able to pop that topic off the stack and return to this.
So the question I'm interested to know is whether there's an equivalent of clearWindowSurface() for the WebView. But I have a suspicion that this is all performed behind the scenes directly by Qt. If so, that's going to complicate things.
So I've spent this evening digging through the code, staring at the sailfish-components-webview code and RawWebView in particular, moving rapidly on to qtmozembed where I looked at QOpenGLWebPage (not used), QuickMozView (used), QMozViewPrivate (where the actual action happens) and then ending up looking back in the Gecko code again.
And the really interesting bit seems to be EmbedLiteCompositorBridgeParent. For example, there's this PresentOffscreenSurface() method which is called frequently and includes a call to screen->PublishFrame():
It's late already though, so that investigation will have to wait until the morning.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
Previously my plan was to look between DeclarativeWebContainer::clearWindowSurface() and CompositorOGL::EndFrame() and try to find the problem somewhere between the two. The clearWindowSurface() even had a handy glClear() call which I could use to change the background colour of the unrendered page.
But I subsequently discovered that clearWindowSurface() is used by the browser, but not the WebView. And that led me on my path to having to fix the browser.
It's good I went down that path, because now the browser is working again, but it was nevertheless a diversion from my path of fixing the WebView. It's good to be able to pop that topic off the stack and return to this.
So the question I'm interested to know is whether there's an equivalent of clearWindowSurface() for the WebView. But I have a suspicion that this is all performed behind the scenes directly by Qt. If so, that's going to complicate things.
So I've spent this evening digging through the code, staring at the sailfish-components-webview code and RawWebView in particular, moving rapidly on to qtmozembed where I looked at QOpenGLWebPage (not used), QuickMozView (used), QMozViewPrivate (where the actual action happens) and then ending up looking back in the Gecko code again.
And the really interesting bit seems to be EmbedLiteCompositorBridgeParent. For example, there's this PresentOffscreenSurface() method which is called frequently and includes a call to screen->PublishFrame():
void EmbedLiteCompositorBridgeParent::PresentOffscreenSurface() { const CompositorBridgeParent::LayerTreeState* state = CompositorBridgeParent:: GetIndirectShadowTree(RootLayerTreeId()); NS_ENSURE_TRUE(state && state->mLayerManager, ); GLContext* context = static_cast<CompositorOGL*>( state->mLayerManager->GetCompositor())->gl(); NS_ENSURE_TRUE(context, ); NS_ENSURE_TRUE(context->IsOffscreen(), ); // RenderGL is called always from Gecko compositor thread. // GLScreenBuffer::PublishFrame does swap buffers and that // cannot happen while reading previous frame on EmbedLiteCompositorBridgeParent::GetPlatformImage // (potentially from another thread). MutexAutoLock lock(mRenderMutex); // TODO: The switch from GLSCreenBuffer to SwapChain needs completing // See: https://phabricator.services.mozilla.com/D75055 GLScreenBuffer* screen = context->Screen(); MOZ_ASSERT(screen); if (screen->Size().IsEmpty() || !screen->PublishFrame(screen->Size())) { NS_ERROR("Failed to publish context frame"); } }The PublishFrame() method ends up just calling the Swap() method:
bool PublishFrame(const gfx::IntSize& size) { return Swap(size); }But the Swap() method is more interesting: it deals with the surfaces and is part of the GLScreenBuffer class. So this looks like a good place to investigate further.
It's late already though, so that investigation will have to wait until the morning.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
25 Apr 2024 : Day 227 #
Things started falling into place yesterday when the reasons for the crashes became clear. Although the value of embedlite.compositor.external_gl_context was being set correctly to different values depending on whether the running application was the browser or a WebView, the value was being reversed in the case of the browser because the WebEngineSettings::initialize() was being called twice. This initialize() method is supposed to be idempotent (which you have to admit, is a great word!), but due to a glitch in the logic turned out not to be.
The fix was to change the ordering of the execution: moving the place where isInitialized gets set to before the early return caused by the existence of a marker file. That all sounds a bit esoteric, but it was a simple change. I've now lined up all of the pieces so that:
But unfortunately not quite yet. After making these changes, the browser works fine, but I now get a crash when running the WebView application:
And that's exactly what has happened now. Without keeping track of the changes, I'm fairly certain I'd have got in a mess and forgotten I'd made these changes. Now I can reverse them easily.
Having made this reversal, build and installed the executable and run the code I'm very happy to see that:
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
The fix was to change the ordering of the execution: moving the place where isInitialized gets set to before the early return caused by the existence of a marker file. That all sounds a bit esoteric, but it was a simple change. I've now lined up all of the pieces so that:
- embedlite.compositor.external_gl_context is set to true in WebEngineSettings::initialize().
- embedlite.compositor.external_gl_context is set to false in DeclarativeWebUtils::setRenderingPreferences().
- isInitialized is set in the correct place.
But unfortunately not quite yet. After making these changes, the browser works fine, but I now get a crash when running the WebView application:
Thread 37 "Compositor" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f1f94d7e0 (LWP 7616)] 0x0000007ff1105978 in mozilla::gl::GLScreenBuffer::Size (this=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290 290 ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h: No such file or directory. (gdb) bt #0 0x0000007ff1105978 in mozilla::gl::GLScreenBuffer::Size (this=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290 #1 mozilla::gl::GLContext::OffscreenSize (this=this@entry=0x7ee019aa70) at gfx/gl/GLContext.cpp:2141 #2 0x0000007ff3664264 in mozilla::embedlite::EmbedLiteCompositorBridgeParent:: CompositeToDefaultTarget (this=0x7fc4ad3530, aId=...) at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:156 #3 0x0000007ff12b48d8 in mozilla::layers::CompositorVsyncScheduler:: ForceComposeToTarget (this=0x7fc4c36e00, aTarget=aTarget@entry=0x0, aRect=aRect@entry=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h: 82 #4 0x0000007ff12b4934 in mozilla::layers::CompositorBridgeParent:: ResumeComposition (this=this@entry=0x7fc4ad3530) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313 #5 0x0000007ff12b49c0 in mozilla::layers::CompositorBridgeParent:: ResumeCompositionAndResize (this=0x7fc4ad3530, x=<optimized out>, y=<optimized out>, width=<optimized out>, height=<optimized out>) at gfx/layers/ipc/CompositorBridgeParent.cpp:794 #6 0x0000007ff12ad55c in mozilla::detail::RunnableMethodArguments<int, int, int, int>::applyImpl<mozilla::layers::CompositorBridgeParent, void (mozilla: :layers::CompositorBridgeParent::*)(int, int, int, int), StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, 0ul, 1ul, 2ul, 3ul> (args=..., m=<optimized out>, o=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151 [...] #18 0x0000007ff6a0289c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb)Thinking back, or more precisely looking back through my diary entries the reason becomes clear. You may recall that back on Day 221 I made some changes in the hope of them fixing the browser rendering. The changes are summarised by this diff:
$ git diff -- gfx/layers/opengl/CompositorOGL.cpp diff --git a/gfx/layers/opengl/CompositorOGL.cpp b/gfx/layers/opengl/ CompositorOGL.cpp index 8a423b840dd5..11105c77c43b 100644 --- a/gfx/layers/opengl/CompositorOGL.cpp +++ b/gfx/layers/opengl/CompositorOGL.cpp @@ -246,12 +246,14 @@ already_AddRefed<mozilla::gl::GLContext> CompositorOGL:: CreateContext() { // Allow to create offscreen GL context for main Layer Manager if (!context && gfxEnv::LayersPreferOffscreen()) { + SurfaceCaps caps = SurfaceCaps::ForRGB(); + caps.preserve = false; + caps.bpp16 = gfxVars::OffscreenFormat() == SurfaceFormat::R5G6B5_UINT16; + nsCString discardFailureId; - context = GLContextProvider::CreateHeadless( - {CreateContextFlags::REQUIRE_COMPAT_PROFILE}, &discardFailureId); - if (!context->CreateOffscreenDefaultFb(mSurfaceSize)) { - context = nullptr; - } + context = GLContextProvider::CreateOffscreen( + mSurfaceSize, caps, CreateContextFlags::REQUIRE_COMPAT_PROFILE, + &discardFailureId); } [...]Now that the underlying error is clear it's time to reverse this change. At the time I even noted the importance of these diary entries as a way of helping in case I might have to do something like this:
Things will get a bit messy the more I change, but the beauty of these diaries is that I'll be keeping a full record. So it should all be clear what gets changed.
And that's exactly what has happened now. Without keeping track of the changes, I'm fairly certain I'd have got in a mess and forgotten I'd made these changes. Now I can reverse them easily.
Having made this reversal, build and installed the executable and run the code I'm very happy to see that:
- The browser now works again, rendering and all.
- The WebVew app no longer crashes.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
24 Apr 2024 : Day 226 #
Having solved the mystery of the crashing browser, the question today is what to do about it. What I can't immediately understand is how in ESR 78 the value read off for the embedlite.compositor.external_gl_context preference is true when the browser is run, but false when the WebView app is run.
We can see the difference quite clearly from stepping through the AllocPLayerTransactionParent() method on ESR 78. First, this is what we see when running the browser. We can clearly observe that mUseExternalGLContext is set to true:
It turns out the browser startup sequence is a bit messy. The sequence it goes through (along with a whole bunch of other stuff) is to first call initialize(). This method has a guard inside it to prevent the content of the method being called multiple times and which looks like this:
But now comes the catch. A little later on in the execution sequence initialize() is called again. At this point, were the isInitialized value set to true the method would return early and the contents of the method would be skipped. All would be good. But the problem is that it's not set to true, it's still set to false.
That's because the code that sets it to true is at the end of the method and on first execution the method is returning early due to a separate check; this one:
The fact it's exiting early here makes sense. But I don't think it makes sense for the isInitialized variable to be left as it is. I think it should be set to true even if the method exits early at this point.
In case you want to see the full gory details for yourself (trust me: you don't!), here's the step through that shows this sequence of events:
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
We can see the difference quite clearly from stepping through the AllocPLayerTransactionParent() method on ESR 78. First, this is what we see when running the browser. We can clearly observe that mUseExternalGLContext is set to true:
Thread 39 "Compositor" hit Breakpoint 2, mozilla::embedlite:: EmbedLiteCompositorBridgeParent::AllocPLayerTransactionParent ( this=0x7f809d5670, aBackendHints=..., aId=...) at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:77 77 PLayerTransactionParent* p = (gdb) n 80 EmbedLiteWindowParent *parentWindow = EmbedLiteWindowParent::From( mWindowId); (gdb) p mUseExternalGLContext $2 = true (gdb)When stepping through the same bit of code when running harbour-webview, mUseExternalGLContext is set to false:
Thread 34 "Compositor" hit Breakpoint 1, mozilla::embedlite:: EmbedLiteCompositorBridgeParent::AllocPLayerTransactionParent ( this=0x7f8cb70d50, aBackendHints=..., aId=...) at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:77 77 PLayerTransactionParent* p = (gdb) n 80 EmbedLiteWindowParent *parentWindow = EmbedLiteWindowParent::From( mWindowId); (gdb) n 81 if (parentWindow) { (gdb) p mUseExternalGLContext $1 = false (gdb)How can this be? There must be somewhere it's being changed. I notice that the value is set in the embedding.js file. Could that be it?
// Make gecko compositor use GL context/surface provided by the application. pref("embedlite.compositor.external_gl_context", false); // Request the application to create GLContext for the compositor as // soon as the top level PuppetWidget is creted for the view. Setting // this pref only makes sense when using external compositor gl context. pref("embedlite.compositor.request_external_gl_context_early", false);The other thing I notice is that the value is set to true in declarativewebutils.cpp, which is part of the sailfish-browser code:
void DeclarativeWebUtils::setRenderingPreferences() { SailfishOS::WebEngineSettings *webEngineSettings = SailfishOS:: WebEngineSettings::instance(); // Use external Qt window for rendering content webEngineSettings->setPreference(QString( "gfx.compositor.external-window"), QVariant(true)); webEngineSettings->setPreference(QString( "gfx.compositor.clear-context"), QVariant(false)); webEngineSettings->setPreference(QString( "gfx.webrender.force-disabled"), QVariant(true)); webEngineSettings->setPreference(QString( "embedlite.compositor.external_gl_context"), QVariant(true)); }And this wasn't a change made by me:
$ git blame apps/core/declarativewebutils.cpp -L239,239 d8932efa1 src/browser/declarativewebutils.cpp (Raine Makelainen 2016-09-19 20: 16:59 +0300 239) webEngineSettings->setPreference(QString( "embedlite.compositor.external_gl_context"), QVariant(true));So the key methods that are of interest to us are WebEngineSettings::initialize() in the sailfish-components-webview project, since this is where I've set the value to false and DeclarativeWebUtils::setRenderingPreferences() in the sailfish-browser project, since this is where, historically, the value was always forced to true on browser start-up.
It turns out the browser startup sequence is a bit messy. The sequence it goes through (along with a whole bunch of other stuff) is to first call initialize(). This method has a guard inside it to prevent the content of the method being called multiple times and which looks like this:
static bool isInitialized = false; if (isInitialized) { return; }The first time the method is called the isInitialized static variable is set to false and the contents of the method executes. As per our updated code this sets the embedlite.compositor.external_gl_context static preference to false. This is all fine for the WebView. In the case of the browser the setRenderingPreferences() method is called shortly after, setting the value to true. This is all present and correct.
But now comes the catch. A little later on in the execution sequence initialize() is called again. At this point, were the isInitialized value set to true the method would return early and the contents of the method would be skipped. All would be good. But the problem is that it's not set to true, it's still set to false.
That's because the code that sets it to true is at the end of the method and on first execution the method is returning early due to a separate check; this one:
// Guard preferences that should be written only once. If a preference // needs to be // forcefully written upon each start that should happen before this. QString appConfig = QStandardPaths::writableLocation(QStandardPaths:: AppDataLocation); QFile markerFile(QString("%1/__PREFS_WRITTEN__").arg(appConfig)); if (markerFile.exists()) { return; }This call causes the method to exit early so that the isInitialized variable never gets set.
The fact it's exiting early here makes sense. But I don't think it makes sense for the isInitialized variable to be left as it is. I think it should be set to true even if the method exits early at this point.
In case you want to see the full gory details for yourself (trust me: you don't!), here's the step through that shows this sequence of events:
$ gdb sailfish-browser [...] (gdb) b DeclarativeWebUtils::setRenderingPreferences Breakpoint 1 at 0x42c48: file ../core/declarativewebutils.cpp, line 233. (gdb) b WebEngineSettings::initialize Breakpoint 2 at 0x22ce4 (gdb) r Starting program: /usr/bin/sailfish-browser [...] Thread 1 "sailfish-browse" hit Breakpoint 2, SailfishOS:: WebEngineSettings::initialize () at webenginesettings.cpp:96 96 if (isInitialized) { (gdb) p isInitialized $4 = false (gdb) n [...] 135 if (markerFile.exists()) { (gdb) 136 return; (gdb) c Continuing. Thread 1 "sailfish-browse" hit Breakpoint 1, DeclarativeWebUtils:: setRenderingPreferences (this=0x55556785f0) at ../core/ declarativewebutils.cpp:233 233 SailfishOS::WebEngineSettings *webEngineSettings = SailfishOS:: WebEngineSettings::instance(); (gdb) c Continuing. Thread 1 "sailfish-browse" hit Breakpoint 2, SailfishOS:: WebEngineSettings::initialize () at webenginesettings.cpp:96 96 if (isInitialized) { (gdb) p isInitialized $5 = false (gdb) n [...] 135 if (markerFile.exists()) { (gdb) 136 return; (gdb) c Continuing. [...]At any rate, it looks like I have some kind of solution: set the preference in both places, but make sure the isInitialized value gets set correctly by moving the place it's set to above the condition that returns early. Tomorrow I'll install the packages with this change and give it a go.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
23 Apr 2024 : Day 225 #
As we left things yesterday we were looking at nsWindow::GetNativeData() and the fact it does things the same way for both ESR 78 and ESR 91, simply returning GetGLContext() in response to a NS_NATIVE_OPENGL_CONTEXT parameter being passed in.
Interestingly, this call to GetGLContext() is exactly where we saw the GLContext being created on ESR 78 on day 223. So it's quite possible that both calls are going through on both versions, yet while it succeeds on ESR 78, it fails on ESR 91. It's not at all clear why that might be. Time to find out.
Crucially there is a difference between the code in GetGLContext() on ESR 78 compared to the code on ESR 91. Here's the very start of the ESR 78 method:
The value of the static pref is set to true by default, as we can see in the StaticPrefList.yaml file:
However, the next time I fire the browser up it crashes again. And when I check the preferences file I can see the value has switched back to false. That's because I've set it to do that in WebEngineSettings that's part of sailfish-components-webview:
But I obviously wasn't entirely comfortable with this at the time and went on to write the following:
But now I need the opposite, so for the time being I've disabled the forcing of this preference in sailfish-components-webview. That change necessitated (of course) a rebuild and reinstallation of the component.
The browser is now back up and running. Hooray! But the WebView now crashes before it even attempts to render a page. That's not unexpected and presumably will take us back to this failure to call PrepareOffscreen(). I'm going to investigate that further, but not today, it'll have to wait until tomorrow.
Even though this was one step backwards, one step forwards, it still feels like progress. We'll get there.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
Interestingly, this call to GetGLContext() is exactly where we saw the GLContext being created on ESR 78 on day 223. So it's quite possible that both calls are going through on both versions, yet while it succeeds on ESR 78, it fails on ESR 91. It's not at all clear why that might be. Time to find out.
Crucially there is a difference between the code in GetGLContext() on ESR 78 compared to the code on ESR 91. Here's the very start of the ESR 78 method:
GLContext* nsWindow::GetGLContext() const { LOGT("this:%p, UseExternalContext:%d", this, sUseExternalGLContext); if (sUseExternalGLContext) { void* context = nullptr; void* surface = nullptr; void* display = nullptr; [...]Notice how the entrance to the main body of the function is gated by a variable called sUseExternalGLContext. In order for this method to return something non-null, it's essential that this is set to true. On ESR 91 this has changed from a variable to a static preference that looks like this:
GLContext* nsWindow::GetGLContext() const { LOGT("this:%p, UseExternalContext:%d", this, StaticPrefs::embedlite_compositor_external_gl_context()); if (StaticPrefs::embedlite_compositor_external_gl_context()) { void* context = nullptr; void* surface = nullptr; void* display = nullptr; [...]This was actually a change I made myself and it's not really a very dramatic change at all. In ESR 78 the sUseExternalGLContext variable was being mirrored from a static preference, which is one that can be read very quickly, so there was no real reason to copy it into a variable. Hence I just switched out the variable with direct accesses of the static pref instead. That was all documented back on Day 97.
The value of the static pref is set to true by default, as we can see in the StaticPrefList.yaml file:
# Make gecko compositor use GL context/surface provided by the application - name: embedlite.compositor.external_gl_context type: bool value: true mirror: alwaysHowever this value can be overidden by the preferences, stored in ~/.local/share/org.sailfishos/browser/.mozilla/prefs.js. Looking inside that file I see the following:
user_pref("embedlite.compositor.external_gl_context", false);I've switched that back to true and now, when I run the browser... woohoo! Rendering works again. Great! This is a huge relief. The onscreen rendering pipeline is still working just fine.
However, the next time I fire the browser up it crashes again. And when I check the preferences file I can see the value has switched back to false. That's because I've set it to do that in WebEngineSettings that's part of sailfish-components-webview:
// Ensure the renderer is configured correctly engineSettings->setPreference(QStringLiteral( "gfx.webrender.force-disabled"), QVariant(true)); engineSettings->setPreference(QStringLiteral( "embedlite.compositor.external_gl_context"), QVariant(false));And the reason for that was documented back on Day 165, where I wrote this:
As we can see, it comes down to this embedlite.compositor.external_gl_context static preference, which needs to be set to false for the condition to be entered.
But I obviously wasn't entirely comfortable with this at the time and went on to write the following:
I'm going to set it to <tt>false</tt> explicitly for the WebView. But this immediately makes me feel nervous: this setting isn't new and there's a reason it's not being touched in the WebView code. It makes me think that I'm travelling down a rendering pipeline path that I shouldn't be.And so it panned out. Looming back at what I wrote then, the issue I was trying to address by flipping the static preference was the need to enter the condition in the following bit of code:
PLayerTransactionParent* EmbedLiteCompositorBridgeParent::AllocPLayerTransactionParent(const nsTArray<LayersBackend>& aBackendHints, const LayersId& aId) { PLayerTransactionParent* p = CompositorBridgeParent::AllocPLayerTransactionParent(aBackendHints, aId); EmbedLiteWindowParent *parentWindow = EmbedLiteWindowParent::From(mWindowId); if (parentWindow) { parentWindow->GetListener()->CompositorCreated(); } if (!StaticPrefs::embedlite_compositor_external_gl_context()) { // Prepare Offscreen rendering context PrepareOffscreen(); } return p; }I wanted PrepareOffscreen() to be called and negating this preference seemed the neatest and easiest way to make it happen.
But now I need the opposite, so for the time being I've disabled the forcing of this preference in sailfish-components-webview. That change necessitated (of course) a rebuild and reinstallation of the component.
The browser is now back up and running. Hooray! But the WebView now crashes before it even attempts to render a page. That's not unexpected and presumably will take us back to this failure to call PrepareOffscreen(). I'm going to investigate that further, but not today, it'll have to wait until tomorrow.
Even though this was one step backwards, one step forwards, it still feels like progress. We'll get there.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
22 Apr 2024 : Day 224 #
Before we get started, a content warning. If you're a little squeamish or discussions about human illness make you uncomfortable, this might be one to skip.
Because today was not the day of development I was hoping it was going to be. Last night I found myself experiencing a bad case of food poisoning. I woke in the middle of the night, or maybe I never went to sleep, feeling deeply uncomfortable. After a long period of unrest I eventually found myself arched over the toilet bowl emptying the contents of my stomach into it.
I've spent the rest of the day recuperating, mostly sleeping, consuming only liquids (large quantities of Lucozade) and drifting in and out of consciousness. It seems like it was something I ate, which means it's just a case of waiting it out, but the lack of sleep, elevated temperature and general discomfort has left me in a state of hazy semi- coherence. Focusing has not really been an option.
After having spent the day like this it's now late evening and the chance for me to do much of anything let alone development, has sadly passed.
But I'm going to try to seize the moment nonetheless. There's one simple thing I think I can do that will help my task tomorrow. You'll recall that the question I need to answer is why the following is returning a value on ESR 78, but returning null on ESR 91:
The process for checking this is simple with the debugger: place a breakpoint on CompositorOGL::CreateContext() as before. When it hits, place a new breakpoint on every version of GetNativeData(), run the code onward and wait to see which one is hit first.
Here goes, first with ESR 78:
I've not made a lot of progress today, but perhaps that's to be expected. I'll continue following this lead up tomorrow.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
Because today was not the day of development I was hoping it was going to be. Last night I found myself experiencing a bad case of food poisoning. I woke in the middle of the night, or maybe I never went to sleep, feeling deeply uncomfortable. After a long period of unrest I eventually found myself arched over the toilet bowl emptying the contents of my stomach into it.
I've spent the rest of the day recuperating, mostly sleeping, consuming only liquids (large quantities of Lucozade) and drifting in and out of consciousness. It seems like it was something I ate, which means it's just a case of waiting it out, but the lack of sleep, elevated temperature and general discomfort has left me in a state of hazy semi- coherence. Focusing has not really been an option.
After having spent the day like this it's now late evening and the chance for me to do much of anything let alone development, has sadly passed.
But I'm going to try to seize the moment nonetheless. There's one simple thing I think I can do that will help my task tomorrow. You'll recall that the question I need to answer is why the following is returning a value on ESR 78, but returning null on ESR 91:
void* widgetOpenGLContext = widget ? widget->GetNativeData(NS_NATIVE_OPENGL_CONTEXT) : nullptr;The GetNativeData() method is an override, which means that there are potentially multiple different versions that might be being called when this bit of code is executed. The pointer to the method will be taken from the widget object's vtable. So it would be useful to know what method is actually being called on the two different versions of the browser.
The process for checking this is simple with the debugger: place a breakpoint on CompositorOGL::CreateContext() as before. When it hits, place a new breakpoint on every version of GetNativeData(), run the code onward and wait to see which one is hit first.
Here goes, first with ESR 78:
Thread 39 "Compositor" hit Breakpoint 1, mozilla::layers:: CompositorOGL::CreateContext (this=0x7ea0003420) at gfx/layers/opengl/CompositorOGL.cpp:223 223 already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() { (gdb) b GetNativeData Breakpoint 2 at 0x7fbbdbe5e0: GetNativeData. (5 locations) (gdb) c Continuing. Thread 39 "Compositor" hit Breakpoint 2, mozilla::embedlite::nsWindow: :GetNativeData (this=0x7f80c8d500, aDataType=12) at mobile/sailfishos/embedshared/nsWindow.cpp:176 176 LOGT("t:%p, DataType: %i", this, aDataType); (gdb) bt #0 mozilla::embedlite::nsWindow::GetNativeData (this=0x7f80c8d500, aDataType=12) at mobile/sailfishos/embedshared/nsWindow.cpp:176 #1 0x0000007fba682718 in mozilla::layers::CompositorOGL::CreateContext ( this=0x7ea0003420) at gfx/layers/opengl/CompositorOGL.cpp:228 #2 0x0000007fba6a33a4 in mozilla::layers::CompositorOGL::Initialize ( this=0x7ea0003420, out_failureReason=0x7edb30b720) at gfx/layers/opengl/CompositorOGL.cpp:374 #3 0x0000007fba77aff4 in mozilla::layers::CompositorBridgeParent:: NewCompositor (this=this@entry=0x7f807596a0, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1534 #4 0x0000007fba784660 in mozilla::layers::CompositorBridgeParent:: InitializeLayerManager (this=this@entry=0x7f807596a0, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1491 #5 0x0000007fba7847a8 in mozilla::layers::CompositorBridgeParent:: AllocPLayerTransactionParent (this=this@entry=0x7f807596a0, aBackendHints=..., aId=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1587 #6 0x0000007fbca81234 in mozilla::embedlite::EmbedLiteCompositorBridgeParent:: AllocPLayerTransactionParent (this=0x7f807596a0, aBackendHints=..., aId=...) at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:77 #7 0x0000007fba05f3f0 in mozilla::layers::PCompositorBridgeParent:: OnMessageReceived (this=0x7f807596a0, msg__=...) at PCompositorBridgeParent.cpp:1391 [...] #23 0x0000007fb735989c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb)And then with ESR 91:
Thread 37 "Compositor" hit Breakpoint 1, mozilla::layers:: CompositorOGL::CreateContext (this=this@entry=0x7ed8002ed0) at gfx/layers/opengl/CompositorOGL.cpp:227 227 already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() { (gdb) b GetNativeData Breakpoint 2 at 0x7ff41fcef8: GetNativeData. (4 locations) (gdb) c Continuing. Thread 37 "Compositor" hit Breakpoint 2, mozilla::embedlite::nsWindow: :GetNativeData (this=0x7fc89ffc80, aDataType=12) at mobile/sailfishos/embedshared/nsWindow.cpp:164 164 { (gdb) bt #0 mozilla::embedlite::nsWindow::GetNativeData (this=0x7fc89ffc80, aDataType=12) at mobile/sailfishos/embedshared/nsWindow.cpp:164 #1 0x0000007ff293aae4 in mozilla::layers::CompositorOGL::CreateContext ( this=this@entry=0x7ed8002ed0) at gfx/layers/opengl/CompositorOGL.cpp:232 #2 0x0000007ff29503b8 in mozilla::layers::CompositorOGL::Initialize ( this=0x7ed8002ed0, out_failureReason=0x7fc17a8510) at gfx/layers/opengl/CompositorOGL.cpp:397 #3 0x0000007ff2a66094 in mozilla::layers::CompositorBridgeParent:: NewCompositor (this=this@entry=0x7fc8a01850, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1493 #4 0x0000007ff2a71110 in mozilla::layers::CompositorBridgeParent:: InitializeLayerManager (this=this@entry=0x7fc8a01850, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1436 #5 0x0000007ff2a71240 in mozilla::layers::CompositorBridgeParent:: AllocPLayerTransactionParent (this=this@entry=0x7fc8a01850, aBackendHints=..., aId=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1546 #6 0x0000007ff4e07ca8 in mozilla::embedlite::EmbedLiteCompositorBridgeParent:: AllocPLayerTransactionParent (this=0x7fc8a01850, aBackendHints=..., aId=...) at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:80 #7 0x0000007ff23fe52c in mozilla::layers::PCompositorBridgeParent:: OnMessageReceived (this=0x7fc8a01850, msg__=...) at PCompositorBridgeParent.cpp:1285 [...] #22 0x0000007fefba889c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb)So in both cases the class that the GetNativeData() method is being called from is mozilla::embedlite::nsWindow::GetNativeData. Looking at the code for this method, there's a pretty svelte switch statement inside that depends on the aDataType value being passed in. In our case this value is always NS_NATIVE_OPENGL_CONTEXT. And the code for that case is pretty simple:
case NS_NATIVE_OPENGL_CONTEXT: { MOZ_ASSERT(!GetParent()); return GetGLContext(); }This is the same for both ESR 78 and ESR 91, so to get to the bottom of this will require digging a bit deeper. But at least we're one step further along.
I've not made a lot of progress today, but perhaps that's to be expected. I'll continue following this lead up tomorrow.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
21 Apr 2024 : Day 223 #
I do love travelling by train. A big part of the reason for this is the freedom it gives me to do other things while I'm travelling. Cars and busses simply can't offer that same empowerment. But even then travelling by train isn't the same as sitting at a desk that's static relative to the motion of the surface of the earth. Debugging multiple phones simultaneously just isn't very convenient on a train.
And so now here I am, back home at my desk, and able to properly perform the debugging I attempted to do yesterday while hurtling between Birmingham and Cambridge at 120 km per hour. It also helps that I'm not feeling quite so exhausted this morning after a long day either.
The question I'm asking is, when using the browser, which of the routes I identified yesterday are followed when creating the GLContext object, since this is where the offscreen status is set and stored.
Using the debugger it's easy to find where this is happening on ESR 78:
Both backtraces have CompositorOGL::CreateContext() in them at frame 4 and 6 respectively, so that's presumably where the decision to go one way or the other is happening. In the ESR 78 executable nsWindow::GetGLContext() is being called from there, whilst in ESR 91 it's GLContextProviderEGL::CreateHeadless().
So the difference here appears to be that on ESR 91 the following condition is entered:
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
And so now here I am, back home at my desk, and able to properly perform the debugging I attempted to do yesterday while hurtling between Birmingham and Cambridge at 120 km per hour. It also helps that I'm not feeling quite so exhausted this morning after a long day either.
The question I'm asking is, when using the browser, which of the routes I identified yesterday are followed when creating the GLContext object, since this is where the offscreen status is set and stored.
Using the debugger it's easy to find where this is happening on ESR 78:
Thread 39 "Compositor" hit Breakpoint 1, mozilla::gl::GLContext:: GLContext (this=this@entry=0x7ea01118b0, flags=mozilla::gl:: CreateContextFlags::NONE, caps=..., sharedContext=sharedContext@entry=0x0, isOffscreen=false, useTLSIsCurrent=useTLSIsCurrent@entry=false) at gfx/gl/GLContext.cpp:274 274 GLContext::GLContext(CreateContextFlags flags, const SurfaceCaps& caps, (gdb) bt #0 mozilla::gl::GLContext::GLContext (this=this@entry=0x7ea01118b0, flags=mozilla::gl::CreateContextFlags::NONE, caps=..., sharedContext=sharedContext@entry=0x0, isOffscreen=false, useTLSIsCurrent=useTLSIsCurrent@entry=false) at gfx/gl/GLContext.cpp:274 #1 0x0000007fba607af0 in mozilla::gl::GLContextEGL::GLContextEGL ( this=0x7ea01118b0, egl=0x7ea0110db0, flags=<optimized out>, caps=..., isOffscreen=<optimized out>, config=0x0, surface=0x5555cc3980, context=0x7ea0004d80) at gfx/gl/GLContextProviderEGL.cpp:472 #2 0x0000007fba60f0d8 in mozilla::gl::GLContextProviderEGL:: CreateWrappingExisting (aContext=0x7ea0004d80, aSurface=0x5555cc3980, aDisplay=<optimized out>) at /home/flypig/Documents/Development/jolla/gecko-dev-project/gecko-dev/ obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33 #3 0x0000007fbca9a388 in mozilla::embedlite::nsWindow::GetGLContext ( this=0x7f80945810) at mobile/sailfishos/embedshared/nsWindow.cpp:415 #4 0x0000007fba682718 in mozilla::layers::CompositorOGL::CreateContext ( this=0x7ea0003420) at gfx/layers/opengl/CompositorOGL.cpp:228 #5 0x0000007fba6a33a4 in mozilla::layers::CompositorOGL::Initialize ( this=0x7ea0003420, out_failureReason=0x7edb32d720) at gfx/layers/opengl/CompositorOGL.cpp:374 #6 0x0000007fba77aff4 in mozilla::layers::CompositorBridgeParent:: NewCompositor (this=this@entry=0x7f808cbf80, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1534 #7 0x0000007fba784660 in mozilla::layers::CompositorBridgeParent:: InitializeLayerManager (this=this@entry=0x7f808cbf80, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1491 #8 0x0000007fba7847a8 in mozilla::layers::CompositorBridgeParent:: AllocPLayerTransactionParent (this=this@entry=0x7f808cbf80, aBackendHints=..., aId=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1587 #9 0x0000007fbca81234 in mozilla::embedlite::EmbedLiteCompositorBridgeParent:: AllocPLayerTransactionParent (this=0x7f808cbf80, aBackendHints=..., aId=...) at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:77 #10 0x0000007fba05f3f0 in mozilla::layers::PCompositorBridgeParent:: OnMessageReceived (this=0x7f808cbf80, msg__=...) at PCompositorBridgeParent.cpp:1391 [...] #26 0x0000007fb735989c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb)So this is the case of a call to CreateWrappingExisting():
already_AddRefed<GLContext> GLContextProviderEGL::CreateWrappingExisting( void* aContext, void* aSurface, void* aDisplay) { [...] RefPtr<GLContextEGL> gl = new GLContextEGL(egl, CreateContextFlags::NONE, caps, false, config, (EGLSurface)aSurface, (EGLContext)aContext); [...]On ESR 91 the situation is that we have a call to GLContextEGL::CreateEGLPBufferOffscreenContextImpl():
Thread 37 "Compositor" hit Breakpoint 1, mozilla::gl::GLContext:: GLContext (this=this@entry=0x7f6019aa80, desc=..., sharedContext=sharedContext@entry=0x0, useTLSIsCurrent=useTLSIsCurrent@entry=false) at gfx/gl/GLContext.cpp:283 283 GLContext::GLContext(const GLContextDesc& desc, GLContext* sharedContext, (gdb) bt #0 mozilla::gl::GLContext::GLContext (this=this@entry=0x7f6019aa80, desc=..., sharedContext=sharedContext@entry=0x0, useTLSIsCurrent=useTLSIsCurrent@entry=false) at gfx/gl/GLContext.cpp:283 #1 0x0000007ff28a9450 in mozilla::gl::GLContextEGL::GLContextEGL ( this=0x7f6019aa80, egl=std::shared_ptr<mozilla::gl::EglDisplay> (use count 5, weak count 2) = {...}, desc=..., config=0x5555ab3f60, surface=0x7f60004bb0, context=0x7f60004c30) at gfx/gl/GLContextProviderEGL.cpp:496 #2 0x0000007ff28cfd18 in mozilla::gl::GLContextEGL::CreateGLContext (egl=std:: shared_ptr<mozilla::gl::EglDisplay> (use count 5, weak count 2) = {...}, desc=..., config=<optimized out>, config@entry=0x5555ab3f60, surface=surface@entry=0x7f60004bb0, useGles=useGles@entry=true, out_failureId=out_failureId@entry=0x7fb179c1b8) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/cxxalloc.h:33 #3 0x0000007ff28d0858 in mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContextImpl ( egl=std::shared_ptr<mozilla::gl::EglDisplay> (use count 5, weak count 2) = {...}, desc=..., size=..., useGles=useGles@entry=true, out_failureId=out_failureId@entry=0x7fb179c1b8) at include/c++/8.3.0/ext/atomicity.h:96 #4 0x0000007ff28d0a6c in mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContext ( display=std::shared_ptr<mozilla::gl::EglDisplay> (use count 5, weak count 2) = {...}, desc=..., size=..., out_failureId=out_failureId@entry=0x7fb179c1b8) at include/c++/8.3.0/ext/atomicity.h:96 #5 0x0000007ff28d0ba0 in mozilla::gl::GLContextProviderEGL::CreateHeadless ( desc=..., out_failureId=out_failureId@entry=0x7fb179c1b8) at include/c++/8.3.0/ext/atomicity.h:96 #6 0x0000007ff293abc0 in mozilla::layers::CompositorOGL::CreateContext ( this=this@entry=0x7f60002ed0) at gfx/layers/opengl/CompositorOGL.cpp:254 #7 0x0000007ff29503b8 in mozilla::layers::CompositorOGL::Initialize ( this=0x7f60002ed0, out_failureReason=0x7fb179c510) at gfx/layers/opengl/CompositorOGL.cpp:397 #8 0x0000007ff2a66094 in mozilla::layers::CompositorBridgeParent:: NewCompositor (this=this@entry=0x7fc89abd50, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1493 #9 0x0000007ff2a71110 in mozilla::layers::CompositorBridgeParent:: InitializeLayerManager (this=this@entry=0x7fc89abd50, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1436 #10 0x0000007ff2a71240 in mozilla::layers::CompositorBridgeParent:: AllocPLayerTransactionParent (this=this@entry=0x7fc89abd50, aBackendHints=..., aId=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1546 #11 0x0000007ff4e07ca8 in mozilla::embedlite::EmbedLiteCompositorBridgeParent:: AllocPLayerTransactionParent (this=0x7fc89abd50, aBackendHints=..., aId=...) at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:80 #12 0x0000007ff23fe52c in mozilla::layers::PCompositorBridgeParent:: OnMessageReceived (this=0x7fc89abd50, msg__=...) at PCompositorBridgeParent.cpp:1285 [...] #27 0x0000007fefba889c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb)These differences are bigger than expected; I'll need to realign them.
Both backtraces have CompositorOGL::CreateContext() in them at frame 4 and 6 respectively, so that's presumably where the decision to go one way or the other is happening. In the ESR 78 executable nsWindow::GetGLContext() is being called from there, whilst in ESR 91 it's GLContextProviderEGL::CreateHeadless().
So the difference here appears to be that on ESR 91 the following condition is entered:
// Allow to create offscreen GL context for main Layer Manager if (!context && gfxEnv::LayersPreferOffscreen()) { [...]On ESR 78 it's never reached. The reason becomes clear as I step through the code. The value of the widgetOpenGLContext variable that's collected from the widget is non-null, causing the method to exit early:
Thread 39 "Compositor" hit Breakpoint 1, mozilla::layers:: CompositorOGL::CreateContext (this=0x7ea0003420) at gfx/layers/opengl/CompositorOGL.cpp:223 223 already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() { (gdb) n 227 nsIWidget* widget = mWidget->RealWidget(); (gdb) p context $1 = {mRawPtr = 0x0} (gdb) n 228 void* widgetOpenGLContext = (gdb) p widget $2 = (nsIWidget *) 0x7f80c86f60 (gdb) n 230 if (widgetOpenGLContext) { (gdb) p widgetOpenGLContext $3 = (void *) 0x7ea0111800 (gdb)As on ESR 78, on ESR 91 the value of context is set to null, while widget has a non-null value. But unlike on ESR 78 the widgetOpenGLContext that's retrieved from widget is null:
Thread 37 "Compositor" hit Breakpoint 1, mozilla::layers:: CompositorOGL::CreateContext (this=this@entry=0x7ed8002ed0) at gfx/layers/opengl/CompositorOGL.cpp:227 227 already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() { (gdb) n 231 nsIWidget* widget = mWidget->RealWidget(); (gdb) p context $3 = {mRawPtr = 0x0} (gdb) n 232 void* widgetOpenGLContext = (gdb) p widget $4 = (nsIWidget *) 0x7fc8550d80 (gdb) n 234 if (widgetOpenGLContext) { (gdb) p widgetOpenGLContext $5 = (void *) 0x0 (gdb)As a consequence of this the method doesn't return early and the context is created via a different route as a result. So the question we have to answer is why the following is returning a value on ESR 78 but returning null on ESR 91:
void* widgetOpenGLContext = widget ? widget->GetNativeData(NS_NATIVE_OPENGL_CONTEXT) : nullptr;Finding out the answer to this will have to wait until tomorrow, when I have a bit more time in which I'll hopefully be able to get this sorted.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
20 Apr 2024 : Day 222 #
I'm travelling for work at the moment. Last night I stayed at an easyHotel. I'm a fan of the easyHotel concept: clean, cheap and well-located, but also tiny and basic. The fact that having a window is an optional paid-for extra tells you everything you need to know about the chain. There was just enough room around the edge of the bed to squeeze a human leg or two, but certainly no space for a desk. This presented me with a problem when it came to leaving my laptop running. I needed to have the build I started yesterday run overnight. Sending my laptop to sleep wasn't an option.
For a while I contemplated leaving it running on the bed next to me, but I had visions of waking up in the middle of night with my bed on fire or, worse, with my laptop shattered into little pieces having been thrown to the floor.
Eventually the very-helpful hotel staff found me a mini ironing board. After removing the cover it made a passable laptop rack. And so this morning I find myself in a coffee shop in the heart of Birmingham (which is itself the heart of the UK) with a freshly minted set of packages ready to try out on my phone.
The newly installed debug-symbol adorned binaries immediately deliver.
But, my time in this coffee shop is coming to an end, so that will be a task for my journey back to Cambridge this evening on the train.
[...]
I'm on the train again, heading back to Cambridge. My next task is to step through the CompositeToDefaultTarget() method on ESR 78. The question I want to know the answer to is whether it's making the call to context->OffscreenSize() or not.
Brace yourself for a lengthy debugging step-through. As I've typically come to say, feel free to skip this without losing the context.
Great! This is something with a very clear cause and which it should be possible to find a sensible fix for.
On ESR 91 the initial value of this flag is set to false:
This gives me something to dig into further tomorrow; but my train having arrived I'm going to have to leave it there for today. More tomorrow!
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
For a while I contemplated leaving it running on the bed next to me, but I had visions of waking up in the middle of night with my bed on fire or, worse, with my laptop shattered into little pieces having been thrown to the floor.
Eventually the very-helpful hotel staff found me a mini ironing board. After removing the cover it made a passable laptop rack. And so this morning I find myself in a coffee shop in the heart of Birmingham (which is itself the heart of the UK) with a freshly minted set of packages ready to try out on my phone.
The newly installed debug-symbol adorned binaries immediately deliver.
Thread 36 "Compositor" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f39d297d0 (LWP 31614)] 0x0000007ff28a8978 in mozilla::gl::GLScreenBuffer::Size (this=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290 290 ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h: No such file or directory. (gdb) bt #0 0x0000007ff28a8978 in mozilla::gl::GLScreenBuffer::Size (this=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290 #1 mozilla::gl::GLContext::OffscreenSize (this=this@entry=0x7ee019aa30) at gfx/gl/GLContext.cpp:2141 #2 0x0000007ff4e07264 in mozilla::embedlite::EmbedLiteCompositorBridgeParent:: CompositeToDefaultTarget (this=0x7fc8b77bc0, aId=...) at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:156 #3 0x0000007ff2a578d8 in mozilla::layers::CompositorVsyncScheduler:: ForceComposeToTarget (this=0x7fc8ca6a10, aTarget=aTarget@entry=0x0, aRect=aRect@entry=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/layers/LayersTypes.h: 82 #4 0x0000007ff2a57934 in mozilla::layers::CompositorBridgeParent:: ResumeComposition (this=this@entry=0x7fc8b77bc0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h:313 #5 0x0000007ff2a579c0 in mozilla::layers::CompositorBridgeParent:: ResumeCompositionAndResize (this=0x7fc8b77bc0, x=<optimized out>, y=<optimized out>, width=<optimized out>, height=<optimized out>) at gfx/layers/ipc/CompositorBridgeParent.cpp:794 #6 0x0000007ff2a5055c in mozilla::detail::RunnableMethodArguments<int, int, int, int>::applyImpl<mozilla::layers::CompositorBridgeParent, void (mozilla: :layers::CompositorBridgeParent::*)(int, int, int, int), StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, StoreCopyPassByConstLRef<int>, 0ul, 1ul, 2ul, 3ul> (args=..., m=<optimized out>, o=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsThreadUtils.h:1151 [...] #18 0x0000007fefba889c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb)The basis of the problem is that GLContext::OffscreenSize() is calling Size() on a value of GLScreenBuffer mScreen which is set to null:
const gfx::IntSize& GLContext::OffscreenSize() const { MOZ_ASSERT(IsOffscreen()); return mScreen->Size(); }Checking the diff, this GLContext::OffscreenSize() method was added by me as part of the recent shake-up of the offscreen render code. Here's the change I made in EmbedLiteCompositorBridgeParent.cpp that's making this call:
@@ -151,8 +153,7 @@ EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget( VsyncId aId) if (context->IsOffscreen()) { MutexAutoLock lock(mRenderMutex); - if (context->GetSwapChain()->OffscreenSize() != mEGLSurfaceSize - && !context->GetSwapChain()->Resize(mEGLSurfaceSize)) { + if (context->OffscreenSize() != mEGLSurfaceSize && !context->ResizeOffscreen(mEGLSurfaceSize)) { return; } }As you can see CompositeToDefaultTarget() isn't new. It's the way its resizing the screen that's causing problems. Interestingly this change has actually reverted back to the code as it is in ESR 78. So that means I can step through the ESR 78 version to compare the two.
But, my time in this coffee shop is coming to an end, so that will be a task for my journey back to Cambridge this evening on the train.
[...]
I'm on the train again, heading back to Cambridge. My next task is to step through the CompositeToDefaultTarget() method on ESR 78. The question I want to know the answer to is whether it's making the call to context->OffscreenSize() or not.
Brace yourself for a lengthy debugging step-through. As I've typically come to say, feel free to skip this without losing the context.
Thread 40 "Compositor" hit Breakpoint 1, mozilla::embedlite:: EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget ( this=0x7f80a77420, aId=...) at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:137 137 { (gdb) n 138 const CompositorBridgeParent::LayerTreeState* state = CompositorBridgeParent::GetIndirectShadowTree(RootLayerTreeId()); (gdb) n 139 NS_ENSURE_TRUE(state && state->mLayerManager, ); (gdb) n 141 GLContext* context = static_cast<CompositorOGL*>( state->mLayerManager->GetCompositor())->gl(); (gdb) n 142 NS_ENSURE_TRUE(context, ); (gdb) n 143 if (!context->IsCurrent()) { (gdb) n 146 NS_ENSURE_TRUE(context->IsCurrent(), ); (gdb) p context->mIsOffscreen $1 = false (gdb) n 3566 /home/flypig/Documents/Development/jolla/gecko-dev-project/gecko-dev/ obj-build-mer-qt-xr/dist/include/GLContext.h: No such file or directory. (gdb) n 156 ScopedScissorRect Thread 37 "Compositor" hit Breakpoint 1, mozilla::embedlite:: EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget ( this=0x7fc8a01040, aId=...) at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:143 143 { (gdb) n 144 const CompositorBridgeParent::LayerTreeState* state = CompositorBridgeParent::GetIndirectShadowTree(RootLayerTreeId()); (gdb) n 145 NS_ENSURE_TRUE(state && state->mLayerManager, ); (gdb) n 147 GLContext* context = static_cast<CompositorOGL*>( state->mLayerManager->GetCompositor())->gl(); (gdb) n 148 NS_ENSURE_TRUE(context, ); (gdb) n 149 if (!context->IsCurrent()) { (gdb) n 152 NS_ENSURE_TRUE(context->IsCurrent(), ); (gdb) p context->mDesc.isOffscreen $2 = true (gdb) n 3598 ${PROJECT}/obj-build-mer-qt-xr/dist/include/GLContext.h: No such file or directory. (gdb) n 155 MutexAutoLock lock(mRenderMutex); (gdb) n 156 if (context->OffscreenSize() != mEGLSurfaceSize && !context->ResizeOffscreen(mEGLSurfaceSize)) { (gdb) n Thread 37 "Compositor" received signal SIGSEGV, Segmentation fault. 0x0000007ff28a8978 in mozilla::gl::GLScreenBuffer::Size (this=0x0) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:290 290 ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h: No such file or directory. (gdb)So, for once, this is just as expected. On ESR 78 context->IsOffscreen() is returning false (as it should do), whereas on ESR 91 it's returning true.
Great! This is something with a very clear cause and which it should be possible to find a sensible fix for.
On ESR 91 the initial value of this flag is set to false:
struct GLContextDesc final : public GLContextCreateDesc { bool isOffscreen = false; };The only place it gets set to true is in the GLContextEGL::CreateEGLPBufferOffscreenContextImpl() method:
auto fullDesc = GLContextDesc{desc}; fullDesc.isOffscreen = true;In contrast on ESR 78 it's set in the constructor:
GLContext::GLContext(CreateContextFlags flags, const SurfaceCaps& caps, GLContext* sharedContext, bool isOffscreen, bool useTLSIsCurrent) : mUseTLSIsCurrent(ShouldUseTLSIsCurrent(useTLSIsCurrent)), mIsOffscreen(isOffscreen), [...]This constructor is called in two places. In GLContextEGL():
already_AddRefed<GLContextEGL> GLContextEGL::CreateGLContext( GLLibraryEGL* const egl, CreateContextFlags flags, const SurfaceCaps& caps, bool isOffscreen, EGLConfig config, EGLSurface surface, const bool useGles, nsACString* const out_failureId) { [...] RefPtr<GLContextEGL> glContext = new GLContextEGL(egl, flags, caps, isOffscreen, config, surface, context); [...]And in CreateWrappingExisting():
already_AddRefed<GLContext> GLContextProviderEGL::CreateWrappingExisting( void* aContext, void* aSurface, void* aDisplay) { [...] RefPtr<GLContextEGL> gl = new GLContextEGL(egl, CreateContextFlags::NONE, caps, false, config, (EGLSurface)aSurface, (EGLContext)aContext); [...]The latter always ends up setting it to false as you can see. The former can also come from that route, but could also be arrived at via different routes, all of them in GLContextProviderEGL.cpp. It could be called from GLContextEGLFactory::CreateImpl() where it's always set to false:
already_AddRefed<GLContext> GLContextEGLFactory::CreateImpl( EGLNativeWindowType aWindow, bool aWebRender, bool aUseGles) { [...] RefPtr<GLContextEGL> gl = GLContextEGL::CreateGLContext( egl, flags, caps, false, config, surface, aUseGles, &discardFailureId); [...]Or it could be called from GLContextEGL::CreateEGLPBufferOffscreenContextImpl() where it's always set to true:
/*static*/ already_AddRefed<GLContextEGL> GLContextEGL::CreateEGLPBufferOffscreenContextImpl( CreateContextFlags flags, const mozilla::gfx::IntSize& size, const SurfaceCaps& minCaps, bool aUseGles, nsACString* const out_failureId) { [...] RefPtr<GLContextEGL> gl = GLContextEGL::CreateGLContext( egl, flags, configCaps, true, config, surface, aUseGles, out_failureId); [...]The last of these is the only place where it can be set to true. So that's the one we have to focus on. Either there needs to be some more refined logic in there, or it shouldn't be going down that route at all.
This gives me something to dig into further tomorrow; but my train having arrived I'm going to have to leave it there for today. More tomorrow!
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
19 Apr 2024 : Day 221 #
This morning I started looking at why the sailfish-browser has been broken by my changes to the offscreen rendering process. The approach I'm using is to look at the intersection of:
The first place I started looking was the CompositorOGL::CreateContext() method. This is the last method that gets called by sailfish-browser before the behaviour diverges from what I'd expect it to (as we saw yesterday). Plus I made very specific changes to this, as you can see from this portion of the diff:
Now when I transfer it over to my phone the browser crashes with a segmentation fault as soon as the page is loaded. I feel like I've been here before! Because it takes so long to transfer the libxul.so output from the partial build over to my phone I stripped it of debugging symbols before the transfer. This makes a huge difference to its size of the library (stripping out around 3 GiB of data to leave just 100 MiB of code). Unfortunately that means I can't now find out where or why the crash is occurring.
Even worse, performing another partial rebuild doesn't seem to restore the debug symbols. So the only thing for me to do is a full rebuild, which is an overnight job.
I'll do that, but in the meantime I'll continue browsing through the diff in case a reason jumps out at me, or there's something obvious that needs changing.
Things will get a bit messy the more I change, but the beauty of these diaries is that I'll be keeping a full record. So it should all be clear what gets changed and why.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
- Parts of the code I've changed (as shown by the diff generated by git).
- Parts of the code that are touched by sailfish-browser (as indicated by gdb backtraces).
The first place I started looking was the CompositorOGL::CreateContext() method. This is the last method that gets called by sailfish-browser before the behaviour diverges from what I'd expect it to (as we saw yesterday). Plus I made very specific changes to this, as you can see from this portion of the diff:
$ git diff -- gfx/layers/opengl/CompositorOGL.cpp diff --git a/gfx/layers/opengl/CompositorOGL.cpp b/gfx/layers/opengl/ CompositorOGL.cpp index 8a423b840dd5..11105c77c43b 100644 --- a/gfx/layers/opengl/CompositorOGL.cpp +++ b/gfx/layers/opengl/CompositorOGL.cpp @@ -246,12 +246,14 @@ already_AddRefed<mozilla::gl::GLContext> CompositorOGL:: CreateContext() { // Allow to create offscreen GL context for main Layer Manager if (!context && gfxEnv::LayersPreferOffscreen()) { + SurfaceCaps caps = SurfaceCaps::ForRGB(); + caps.preserve = false; + caps.bpp16 = gfxVars::OffscreenFormat() == SurfaceFormat::R5G6B5_UINT16; + nsCString discardFailureId; - context = GLContextProvider::CreateHeadless( - {CreateContextFlags::REQUIRE_COMPAT_PROFILE}, &discardFailureId); - if (!context->CreateOffscreenDefaultFb(mSurfaceSize)) { - context = nullptr; - } + context = GLContextProvider::CreateOffscreen( + mSurfaceSize, caps, CreateContextFlags::REQUIRE_COMPAT_PROFILE, + &discardFailureId); } [...]I've reversed this change and performed a partial build.
Now when I transfer it over to my phone the browser crashes with a segmentation fault as soon as the page is loaded. I feel like I've been here before! Because it takes so long to transfer the libxul.so output from the partial build over to my phone I stripped it of debugging symbols before the transfer. This makes a huge difference to its size of the library (stripping out around 3 GiB of data to leave just 100 MiB of code). Unfortunately that means I can't now find out where or why the crash is occurring.
Even worse, performing another partial rebuild doesn't seem to restore the debug symbols. So the only thing for me to do is a full rebuild, which is an overnight job.
I'll do that, but in the meantime I'll continue browsing through the diff in case a reason jumps out at me, or there's something obvious that needs changing.
Things will get a bit messy the more I change, but the beauty of these diaries is that I'll be keeping a full record. So it should all be clear what gets changed and why.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
18 Apr 2024 : Day 220 #
My first task today is to figure out when and how often DeclarativeWebContainer::clearWindowSurface() gets called on ESR 78. As I discovered yesterday it's called only once on the latest ESR 91 build and that's the same build where things are broken now even for the sailfish-browser. So my hope is that this will lead to some insight as to why it's broken.
My suspicion is that it should be called often. In fact, each time the screen is updated. If that's the case then it will give me a clear issue to track down: why is it not being called every update on ESR 91.
For context, recall that DeclarativeWebContainer::clearWindowSurface() is not part of the gecko code itself, but rather part of the sailfish-browser code.
So, I'm firing up the debugger to find out.
Here are some of the calls and signals which potentially could result in a call to clearWindowSurface(). Not all of these are actually happening during the runs I've been testing:
What happens is that there's a break in the call stack. While the following aren't called:
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
My suspicion is that it should be called often. In fact, each time the screen is updated. If that's the case then it will give me a clear issue to track down: why is it not being called every update on ESR 91.
For context, recall that DeclarativeWebContainer::clearWindowSurface() is not part of the gecko code itself, but rather part of the sailfish-browser code.
So, I'm firing up the debugger to find out.
$ gdb sailfish-browser [...] (gdb) b DeclarativeWebContainer::clearWindowSurface Breakpoint 1 at 0x3c750: file ../core/declarativewebcontainer.cpp, line 681. (gdb) r Starting program: /usr/bin/sailfish-browser [...] Created LOG for EmbedLite [...] Thread 1 "sailfish-browse" hit Breakpoint 1, DeclarativeWebContainer:: clearWindowSurface (this=0x55559bbc60) at ../core/ declarativewebcontainer.cpp:681 681 m_context->makeCurrent(this); (gdb) bt #0 DeclarativeWebContainer::clearWindowSurface (this=0x55559bbc60) at ../core/ declarativewebcontainer.cpp:681 #1 0x0000005555594e68 in DeclarativeWebContainer::exposeEvent ( this=0x55559bbc60) at ../core/declarativewebcontainer.cpp:815 #2 0x0000007fb83603dc in QWindow::event(QEvent*) () from /usr/lib64/ libQt5Gui.so.5 #3 0x0000005555594b8c in DeclarativeWebContainer::event (this=0x55559bbc60, event=0x7fffffecf8) at ../core/declarativewebcontainer.cpp:770 #4 0x0000007fb7941144 in QCoreApplication::notify(QObject*, QEvent*) () from / usr/lib64/libQt5Core.so.5 #5 0x0000007fb79412e8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) ( ) from /usr/lib64/libQt5Core.so.5 #6 0x0000007fb8356488 in QGuiApplicationPrivate::processExposeEvent( QWindowSystemInterfacePrivate::ExposeEvent*) () from /usr/lib64/ libQt5Gui.so.5 #7 0x0000007fb83570b4 in QGuiApplicationPrivate::processWindowSystemEvent( QWindowSystemInterfacePrivate::WindowSystemEvent*) () from /usr/lib64/libQt5Gui.so.5 #8 0x0000007fb83356e4 in QWindowSystemInterface::sendWindowSystemEvents( QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib64/libQt5Gui.so.5 #9 0x0000007faf65ac4c in ?? () from /usr/lib64/libQt5WaylandClient.so.5 #10 0x0000007fb70dfd34 in g_main_context_dispatch () from /usr/lib64/ libglib-2.0.so.0 #11 0x0000007fb70dffa0 in ?? () from /usr/lib64/libglib-2.0.so.0 #12 0x0000007fb70e0034 in g_main_context_iteration () from /usr/lib64/ libglib-2.0.so.0 #13 0x0000007fb7993a90 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop: :ProcessEventsFlag>) () from /usr/lib64/libQt5Core.so.5 #14 0x0000007fb793f608 in QEventLoop::exec(QFlags<QEventLoop:: ProcessEventsFlag>) () from /usr/lib64/libQt5Core.so.5 #15 0x0000007fb79471d4 in QCoreApplication::exec() () from /usr/lib64/ libQt5Core.so.5 #16 0x000000555557b360 in main (argc=<optimized out>, argv=<optimized out>) at main.cpp:201 (gdb) c Continuing. [...] Created LOG for EmbedLiteLayerManager [...] Thread 38 "Compositor" hit Breakpoint 1, DeclarativeWebContainer:: clearWindowSurface (this=0x55559bbc60) at ../core/ declarativewebcontainer.cpp:681 681 m_context->makeCurrent(this); (gdb) bt #0 DeclarativeWebContainer::clearWindowSurface (this=0x55559bbc60) at ../core/ declarativewebcontainer.cpp:681 #1 0x0000005555591910 in DeclarativeWebContainer::createGLContext ( this=0x55559bbc60) at ../core/declarativewebcontainer.cpp:1193 #2 0x0000007fb796b204 in QMetaObject::activate(QObject*, int, int, void**) () from /usr/lib64/libQt5Core.so.5 #3 0x0000007fbfb9f5c8 in QMozWindowPrivate::RequestGLContext ( this=0x5555b81290, context=@0x7edb46b2f0: 0x0, surface=@0x7edb46b2f8: 0x0, display=@0x7edb46b300: 0x0) at qmozwindow_p.cpp:133 #4 0x0000007fbca9a374 in mozilla::embedlite::nsWindow::GetGLContext ( this=0x7f80ce9e90) at mobile/sailfishos/embedshared/nsWindow.cpp:408 #5 0x0000007fba682718 in mozilla::layers::CompositorOGL::CreateContext ( this=0x7e98003420) at gfx/layers/opengl/CompositorOGL.cpp:228 #6 0x0000007fba6a33a4 in mozilla::layers::CompositorOGL::Initialize ( this=0x7e98003420, out_failureReason=0x7edb46b720) at gfx/layers/opengl/CompositorOGL.cpp:374 #7 0x0000007fba77aff4 in mozilla::layers::CompositorBridgeParent:: NewCompositor (this=this@entry=0x7f80ce94a0, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1534 #8 0x0000007fba784660 in mozilla::layers::CompositorBridgeParent:: InitializeLayerManager (this=this@entry=0x7f80ce94a0, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1491 #9 0x0000007fba7847a8 in mozilla::layers::CompositorBridgeParent:: AllocPLayerTransactionParent (this=this@entry=0x7f80ce94a0, aBackendHints=..., aId=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1587 #10 0x0000007fbca81234 in mozilla::embedlite::EmbedLiteCompositorBridgeParent:: AllocPLayerTransactionParent (this=0x7f80ce94a0, aBackendHints=..., aId=...) at mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:77 #11 0x0000007fba05f3f0 in mozilla::layers::PCompositorBridgeParent:: OnMessageReceived (this=0x7f80ce94a0, msg__=...) at PCompositorBridgeParent.cpp:1391 [...] #27 0x0000007fb735989c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb) c Continuing. [...] Thread 38 "Compositor" hit Breakpoint 1, DeclarativeWebContainer:: clearWindowSurface (this=0x55559bbc60) at ../core/ declarativewebcontainer.cpp:681 681 m_context->makeCurrent(this); (gdb) bt #0 DeclarativeWebContainer::clearWindowSurface (this=0x55559bbc60) at ../core/ declarativewebcontainer.cpp:681 #1 0x0000005555591830 in DeclarativeWebContainer::clearWindowSurfaceTask ( data=0x55559bbc60) at ../core/declarativewebcontainer.cpp:671 #2 0x0000007fbca81e10 in details::CallFunction<0ul, void (*)(void*), void*> ( arg=..., function=<optimized out>) at ipc/chromium/src/base/task.h:52 #3 DispatchTupleToFunction<void (*)(void*), void*> (arg=..., function=<optimized out>) at ipc/chromium/src/base/task.h:53 #4 RunnableFunction<void (*)(void*), mozilla::Tuple<void*> >::Run ( this=<optimized out>) at ipc/chromium/src/base/task.h:324 #5 0x0000007fb9bfe4e4 in nsThread::ProcessNextEvent (aResult=0x7edb46be77, aMayWait=<optimized out>, this=0x7f80cc0580) at xpcom/threads/nsThread.cpp:1211 [...] #15 0x0000007fb735989c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb) c Continuing. [...]It turns out I'm wrong about this DeclarativeWebContainer::clearWindowSurface() method. It gets called three times as the page is loading and that's it. Checking the code, it's triggered by signals that come from qtmozembed. These are, in order:
- DeclarativeWebContainer::exposeEvent().
- DeclarativeWebContainer::createGLContext() triggered by the QMozWindow::requestGLContext() signal.
- WebPageFactory::aboutToInitialize() on page creation.
Here are some of the calls and signals which potentially could result in a call to clearWindowSurface(). Not all of these are actually happening during the runs I've been testing:
DeclarativeWebContainer::exposeEvent() connect(m_mozWindow.data(), &QMozWindow::requestGLContext, this, &DeclarativeWebContainer::createGLContext, Qt::DirectConnection); connect(m_mozWindow.data(), &QMozWindow::compositorCreated, this, &DeclarativeWebContainer::postClearWindowSurfaceTask); connect(m_model.data(), &DeclarativeTabModel::tabClosed, this, &DeclarativeWebContainer::releasePage); DeclarativeWebContainer::clearSurface()On ESR 91 only one of these is happening: the first one triggered by a call to exposeEvent(). So what of the others? Placing breakpoints in various places I'm able to identify that CompositorOGL::CreateContext() is called on ESR 91. This appears in the call stack of the second breakpoint hit on ESR 78, so this should in theory also be calling clearWindowSurface() on ESR 91, but it's not.
What happens is that there's a break in the call stack. While the following aren't called:
#0 clearWindowSurface() at ../core/declarativewebcontainer.cpp:681 #1 createGLContext() at ../core/declarativewebcontainer.cpp:1193 #2 QMetaObject::activate() /usr/lib64/libQt5Core.so.5 #3 QMozWindowPrivate::RequestGLContext() at qmozwindow_p.cpp:133 #4 GetGLContext() mobile/sailfishos/embedshared/nsWindow.cpp:408The following are called. These are all prior to the others in the callstack on ESR 78, which means that the break is happening in the CompositorOGL::CreateContext() method.
#5 CompositorOGL::CreateContext() gfx/layers/opengl/CompositorOGL.cpp:228 #6 CompositorOGL::Initialize() gfx/layers/opengl/CompositorOGL.cpp:374 #7 NewCompositor() gfx/layers/ipc/CompositorBridgeParent.cpp:1534 #8 InitializeLayerManager() gfx/layers/ipc/CompositorBridgeParent.cpp:1491So tomorrow I'll need to take a look at the CompositorOGL::CreateContext() code. Based on what I've seen today, something different is likely to be happening in there compared to what was happening with the working version. By looking at the diff and seeing what's changed, it should be possible to figure out what. More on this tomorrow.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
17 Apr 2024 : Day 219 #
Today has been a bit of a disheartening day from a Gecko perspective. So disheartening in fact that I considered pausing this blog, taking some time off and then trying to fix it in private to avoid having to admit how badly things are going right now.
But it's important to show the downs as well as the ups. Software development can be a messy business at times, things rarely go to plan and even when they do there's still often an awful lot of angst and frustration preceding the enjoyment of getting something working.
So here it is, warts and all.
Overnight I rebuilt the packages with the new patches installed. Running the WebView showed no changes in the output: still a white screen, no page rendering or coloured backgrounds. I can live with that, it's not what I wanted but it's also no worse than before.
So I decided to head off in the direction that was set last Thursday when I laid plans to check the implementation that happens between the DeclarativeWebContainer::clearWindowSurface() method and the CompositorOGL::EndFrame() method. This direction is in response to the useful discussion in last week's Sailfish Community Meeting.
First up I wanted to establish what was happening on the window side, so starting at clearWindowSurface() I added some code first to change the colour used to clear the texture from white to green.
This didn't affect the rendering, which remains white. Okay, so far so normal. That's a little unexpected, but nevertheless adds new and useful information.
I then added some code to read off the colour at the centre of the texture. This is pretty simple code and, as before, I added some debug output lines so we can find out what's going wrong. Here's what the method looks like now:
Executing this I was surprised to find that there was no new output from this. So I tried to set a breakpoint on it, but gdb claims it doesn't exist in the binary. A bit of reflection made me realise why: this is code in sailfish-browser, which forms part of the browser, not the WebView.
So for the WebView I'll need to find the similar equivalent call. Before trying to find out where the correct call lives, I thought I'd first see what happens when I run the browser instead of the WebView. Will gdb have more luck with that?
And this is where things start to go wrong. When running with the browser the method is called once. The screen turns green. But there's no other rendering. The fact the screen goes green is good. The fact there's no other rendering is bad. Very bad.
It means that over the last few weeks while I've been trying to fix the WebView render pipeline, I've been doing damage to the browser render pipeline in the process. I honestly thought they didn't interact at this level, but it turns out I was wrong.
So now I have both a broken WebView and a broken browser to fix. Grrrr.
While I was initially quite downhearted about this, on reflection it's not quite as bad as it sounds. The working browser version is still safely stored in the repository, so if necessary I can just revert back. But I've also got all of the changes on top of this easily visible using git locally on my system. So I can at least now work through the changes to see whether I can figure out what's responsible.
I'm not going to make more progress on this tonight, so I'll have to return to it tomorrow to try to understand what I've broken.
So very frustrating. But that's software development for you.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
But it's important to show the downs as well as the ups. Software development can be a messy business at times, things rarely go to plan and even when they do there's still often an awful lot of angst and frustration preceding the enjoyment of getting something working.
So here it is, warts and all.
Overnight I rebuilt the packages with the new patches installed. Running the WebView showed no changes in the output: still a white screen, no page rendering or coloured backgrounds. I can live with that, it's not what I wanted but it's also no worse than before.
So I decided to head off in the direction that was set last Thursday when I laid plans to check the implementation that happens between the DeclarativeWebContainer::clearWindowSurface() method and the CompositorOGL::EndFrame() method. This direction is in response to the useful discussion in last week's Sailfish Community Meeting.
First up I wanted to establish what was happening on the window side, so starting at clearWindowSurface() I added some code first to change the colour used to clear the texture from white to green.
This didn't affect the rendering, which remains white. Okay, so far so normal. That's a little unexpected, but nevertheless adds new and useful information.
I then added some code to read off the colour at the centre of the texture. This is pretty simple code and, as before, I added some debug output lines so we can find out what's going wrong. Here's what the method looks like now:
void DeclarativeWebContainer::clearWindowSurface() { Q_ASSERT(m_context); // The GL context should always be used from the same thread in which it was created. Q_ASSERT(m_context->thread() == QThread::currentThread()); m_context->makeCurrent(this); QOpenGLFunctions_ES2* funcs = m_context->versionFunctions<QOpenGLFunctions_ES2>(); Q_ASSERT(funcs); funcs->glClearColor(0.0, 1.0, 0.0, 0.0); funcs->glClear(GL_COLOR_BUFFER_BIT); m_context->swapBuffers(this); QSize screenSize = QGuiApplication::primaryScreen()->size(); size_t bufferSize = screenSize.width() * screenSize.height() * 4; uint8_t* buf = static_cast<uint8_t*>(calloc(sizeof(uint8_t), bufferSize)); funcs->glReadPixels(0, 0, screenSize.width(), screenSize.height(), GL_RGBA, GL_UNSIGNED_BYTE, buf); int xpos = screenSize.width() / 2; int ypos = screenSize.height() / 2; int pos = xpos * ypos * 4; volatile char red = buf[pos]; volatile char green = buf[pos + 1]; volatile char blue = buf[pos + 2]; volatile char alpha = buf[pos + 3]; printf("Colour: (%d, %d, %d, %d)\n", red, green, blue, alpha); free(buf); }Everything after the call to swapBuffers() is newly added.
Executing this I was surprised to find that there was no new output from this. So I tried to set a breakpoint on it, but gdb claims it doesn't exist in the binary. A bit of reflection made me realise why: this is code in sailfish-browser, which forms part of the browser, not the WebView.
So for the WebView I'll need to find the similar equivalent call. Before trying to find out where the correct call lives, I thought I'd first see what happens when I run the browser instead of the WebView. Will gdb have more luck with that?
And this is where things start to go wrong. When running with the browser the method is called once. The screen turns green. But there's no other rendering. The fact the screen goes green is good. The fact there's no other rendering is bad. Very bad.
It means that over the last few weeks while I've been trying to fix the WebView render pipeline, I've been doing damage to the browser render pipeline in the process. I honestly thought they didn't interact at this level, but it turns out I was wrong.
So now I have both a broken WebView and a broken browser to fix. Grrrr.
While I was initially quite downhearted about this, on reflection it's not quite as bad as it sounds. The working browser version is still safely stored in the repository, so if necessary I can just revert back. But I've also got all of the changes on top of this easily visible using git locally on my system. So I can at least now work through the changes to see whether I can figure out what's responsible.
I'm not going to make more progress on this tonight, so I'll have to return to it tomorrow to try to understand what I've broken.
So very frustrating. But that's software development for you.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
16 Apr 2024 : Day 218 #
I've been continuing my patch audit today. It remains a bit of a lengthy and laborious process, but I have at least managed to get through them all.
Until now I've been keeping track of patches that have been applied, patches that aren't needed — either because changes have become redundant or because the code has changed so much that they're just no longer applicable — and patches that haven't yet been applied.
As I worked through today I found a new category to log: patches that haven't been applied but really should be. For example there are patches to ensure libcontentaction is used for handling URLs, or patches for managing extensions better on mobile. These are changes that won't necessarily have an immediately obvious effect on whether or not the browser runs, but will have functionality or efficiency impacts that need to be included.
The list of all such patches is longer than I was expecting:
0056, 0057, 0058, 0059, 0060, 0061, 0062, 0063, 0067, 0068, 0069, 0074, 0077, 0078, 0079, 0080, 0081, 0086.
You might well ask, if I know for sure that these should be applied, why didn't I just apply them? That's because I don't want to muddy the waters with them while I focus on fixing the offscreen render pipeline. If I applied these patches now, not only would it make debugging harder, it would also make the partitioning of code changes into clean patches harder as well. What I am sure about, however, is that none of these are affecting the render pipeline problem.
On top of these the following patches haven't been applied, but potentially should be. I say potentially because these are patches which either probably should be applied but I've not had time to properly decide yet, or where it's possible the reason for the patch was fixed upstream, but it's not entirely clear yet.
For example this includes patches for integrating with FFmpeg version 5.0 and above, patches to fix bugs or fixes for memory leaks. When I checked these they definitely hadn't yet been applied, but maybe they can safely be skipped. I should go through all of these more carefully at some point. Here's the list:
0008, 0012, 0014, 0041, 0042, 0043, 0044, 0045, 0046, 0049, 0050, 0051, 0052, 0064, 0066, 0073, 0076, 0082, 0083, 0084, 0085, 0087, 0089, 0090, 0091, 0094, 0095, 0096, 0097, 0099.
Then there are the patches that are not needed. I didn't find any more of these, so the list remains the same as yesterday with the following:
0004, 0005, 0013, 0035.
Finally the patches that have already been applied:
0001, 0002, 0003, 0006, 0007, 0009, 0010, 0011, 0015, 0016, 0017, 0018, 0019, 0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027, 0028, 0029, 0030, 0031, 0032, 0033, 0034, 0036, 0037, 0038, 0039, 0040, 0047, 0048, 0053, 0054, 0055, 0065, 0070, 0071, 0072, 0075, 0088, 0092, 0093, 0099.
That's 47 patches in total; just under half. Plus the four that won't get applied at all sums to 51 patches dealt with, leaving 48 that still need dealing with; just under half.
Of those patches that have been applied, the following are the ones that hadn't been applied but looked potentially relevant to the render pipeline issue I'm tryig to fix. So I went ahead and applied all of these just in case:
0053, 0054, 0055, 0065, 0075.
My suspicion is that none of these will be important enough to fix the issue, but all contributions are helpful. The next step is to build myself some new packages, install them and see where we are.
Then, tomorrow, I can finally move on to the task that was discussed at the community meeting last Thursday. I'm going to start testing the render status at points in between DeclarativeWebContainer::clearWindowSurface() and CompositorOGL::EndFrame(). This will help narrow down where the working render turns into a broken render. That'll leave me in a much better position to focus on why.
I'm just starting the build now.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
Until now I've been keeping track of patches that have been applied, patches that aren't needed — either because changes have become redundant or because the code has changed so much that they're just no longer applicable — and patches that haven't yet been applied.
As I worked through today I found a new category to log: patches that haven't been applied but really should be. For example there are patches to ensure libcontentaction is used for handling URLs, or patches for managing extensions better on mobile. These are changes that won't necessarily have an immediately obvious effect on whether or not the browser runs, but will have functionality or efficiency impacts that need to be included.
The list of all such patches is longer than I was expecting:
0056, 0057, 0058, 0059, 0060, 0061, 0062, 0063, 0067, 0068, 0069, 0074, 0077, 0078, 0079, 0080, 0081, 0086.
You might well ask, if I know for sure that these should be applied, why didn't I just apply them? That's because I don't want to muddy the waters with them while I focus on fixing the offscreen render pipeline. If I applied these patches now, not only would it make debugging harder, it would also make the partitioning of code changes into clean patches harder as well. What I am sure about, however, is that none of these are affecting the render pipeline problem.
On top of these the following patches haven't been applied, but potentially should be. I say potentially because these are patches which either probably should be applied but I've not had time to properly decide yet, or where it's possible the reason for the patch was fixed upstream, but it's not entirely clear yet.
For example this includes patches for integrating with FFmpeg version 5.0 and above, patches to fix bugs or fixes for memory leaks. When I checked these they definitely hadn't yet been applied, but maybe they can safely be skipped. I should go through all of these more carefully at some point. Here's the list:
0008, 0012, 0014, 0041, 0042, 0043, 0044, 0045, 0046, 0049, 0050, 0051, 0052, 0064, 0066, 0073, 0076, 0082, 0083, 0084, 0085, 0087, 0089, 0090, 0091, 0094, 0095, 0096, 0097, 0099.
Then there are the patches that are not needed. I didn't find any more of these, so the list remains the same as yesterday with the following:
0004, 0005, 0013, 0035.
Finally the patches that have already been applied:
0001, 0002, 0003, 0006, 0007, 0009, 0010, 0011, 0015, 0016, 0017, 0018, 0019, 0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027, 0028, 0029, 0030, 0031, 0032, 0033, 0034, 0036, 0037, 0038, 0039, 0040, 0047, 0048, 0053, 0054, 0055, 0065, 0070, 0071, 0072, 0075, 0088, 0092, 0093, 0099.
That's 47 patches in total; just under half. Plus the four that won't get applied at all sums to 51 patches dealt with, leaving 48 that still need dealing with; just under half.
Of those patches that have been applied, the following are the ones that hadn't been applied but looked potentially relevant to the render pipeline issue I'm tryig to fix. So I went ahead and applied all of these just in case:
0053, 0054, 0055, 0065, 0075.
My suspicion is that none of these will be important enough to fix the issue, but all contributions are helpful. The next step is to build myself some new packages, install them and see where we are.
Then, tomorrow, I can finally move on to the task that was discussed at the community meeting last Thursday. I'm going to start testing the render status at points in between DeclarativeWebContainer::clearWindowSurface() and CompositorOGL::EndFrame(). This will help narrow down where the working render turns into a broken render. That'll leave me in a much better position to focus on why.
I'm just starting the build now.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
15 Apr 2024 : Day 217 #
After re-reading patch 0053 yesterday, today I've been reintegrating the TextureImageEGL class back into the codebase. For context, patch 0053 makes a very small change to a single file, the TextureImageEGL.h file. But when I tried to check whether the patch had been applied or not I discovered the file it's supposed to apply to has been removed entirely from the ESR 91 codebase entirely.
The upstream changeset that removed the file mentions that the class doesn't add anything over and above what BasicTextureImage already offers. When I compared the two classes myself I came to pretty much the same conclusion: there really isn't much that's different between them. The key difference seems to be that TextureImageEGL allows for a wide variety of surface formats, whereas BasicTextureImage supports only RGBA format. There's also a slight difference in that TextureImageEGL seems to be more aggressive in resizing the texture.
Neither of these strike me as significant enough to prevent rendering, but you never know. So I've copied over the two relevant files — TextureImageEGL.h and TextureImageEGL.cpp — and integrated them into the build system. I also had to adjust the code in a few places in order to support the new EglDisplay class and some slight differences in the overridable parent method signatures that necessitated altering the signatures in the class.
Since then I've been trying to perform a partial rebuild of the code, trying to avoid having to do a full rebuild. Previously I would have run a full build overnight, but I've been getting increasingly confident with the way the build system works. Consequently I persuaded mach to rebuild the configuration based on the advice it offered up:
After the build I went through the usual process of transferring and installing the new library to my phone. Sadly I'm still left with a blank screen, so this wasn't enough to get the render working. That may mean these changes can safely reverted, but I'll leave them as they are for now and aim to come back to that later.
That's because it should be easier to remove redundant code once things are working than to guess exactly which changes are needed to get things to work. Or at least, that's my rationale.
This leaves me having applied another patch today (patch 0053), so that we now have the following applied:
0001, 0002, 0003, 0006, 0007, 0009, 0010, 0011, 0015, 0016, 0017, 0018, 0019, 0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027, 0028, 0029, 0030, 0031, 0032, 0033, 0034, 0036, 0037. 0038, 0039, 0040, 0047, 0048, 0053
The following remain uneeded:
0004, 0005, 0013, 0035,
While the following haven't yet been applied:
0008, 0012, 0014, 0041, 0042, 0043, 0044, 0045, 0046, 0049, 0050, 0051, 0052,
Tomorrow I'll continue working through the patches. It's slow going, but this interlude with TextureImageEGL serves to highlight that there are still relevant patches still to be applied, which means it's worth checking. But I also want to reiterate that I've not forgotten the discussion from the Sailfish Community Meeting last Thursday. I still have some work to do following the advice provided there.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
The upstream changeset that removed the file mentions that the class doesn't add anything over and above what BasicTextureImage already offers. When I compared the two classes myself I came to pretty much the same conclusion: there really isn't much that's different between them. The key difference seems to be that TextureImageEGL allows for a wide variety of surface formats, whereas BasicTextureImage supports only RGBA format. There's also a slight difference in that TextureImageEGL seems to be more aggressive in resizing the texture.
Neither of these strike me as significant enough to prevent rendering, but you never know. So I've copied over the two relevant files — TextureImageEGL.h and TextureImageEGL.cpp — and integrated them into the build system. I also had to adjust the code in a few places in order to support the new EglDisplay class and some slight differences in the overridable parent method signatures that necessitated altering the signatures in the class.
Since then I've been trying to perform a partial rebuild of the code, trying to avoid having to do a full rebuild. Previously I would have run a full build overnight, but I've been getting increasingly confident with the way the build system works. Consequently I persuaded mach to rebuild the configuration based on the advice it offered up:
$ sfdk engine exec $ sb2 -t SailfishOS-esr78-aarch64.default $ source `pwd`/obj-build-mer-qt-xr/rpm-shared.env $ make -j1 -C obj-build-mer-qt-xr/gfx/gl/ make: Entering directory '${PROJECT}/obj-build-mer-qt-xr/gfx/gl' ${PROJECT}/gecko-dev/config/rules.mk:338: *** Build configuration changed. Build with |mach build| or run |mach build-backend| to regenerate build config. Stop. make: Leaving directory '${PROJECT}/obj-build-mer-qt-xr/gfx/gl'Following that advice seems to give good results:
$ ./gecko-dev/mach build-backend 0:05.01 ${PROJECT}/obj-build-mer-qt-xr/_virtualenvs/common/bin/python ${PROJECT}/obj-build-mer-qt-xr/config.status Reticulating splines... 0:05.22 File already read. Skipping: ${PROJECT}/gecko-dev/intl/components/ moz.build 0:08.32 File already read. Skipping: ${PROJECT}/gecko-dev/gfx/angle/targets/ angle_common/moz.build Finished reading 984 moz.build files in 19.20s Read 11 gyp files in parallel contributing 0.00s to total wall time Processed into 5326 build config descriptors in 17.35s RecursiveMake backend executed in 28.37s 1959 total backend files; 0 created; 2 updated; 1957 unchanged; 0 deleted; 21 -> 664 Makefile FasterMake backend executed in 2.77s 8 total backend files; 0 created; 1 updated; 7 unchanged; 0 deleted Total wall time: 69.45s; CPU time: 68.55s; Efficiency: 99%; Untracked: 1.75s Glean could not be found, so telemetry will not be reported. You may need to run |mach bootstrap|.I can then follow my usual partial build commands in order to get a newly built version of the library out.
After the build I went through the usual process of transferring and installing the new library to my phone. Sadly I'm still left with a blank screen, so this wasn't enough to get the render working. That may mean these changes can safely reverted, but I'll leave them as they are for now and aim to come back to that later.
That's because it should be easier to remove redundant code once things are working than to guess exactly which changes are needed to get things to work. Or at least, that's my rationale.
This leaves me having applied another patch today (patch 0053), so that we now have the following applied:
0001, 0002, 0003, 0006, 0007, 0009, 0010, 0011, 0015, 0016, 0017, 0018, 0019, 0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027, 0028, 0029, 0030, 0031, 0032, 0033, 0034, 0036, 0037. 0038, 0039, 0040, 0047, 0048, 0053
The following remain uneeded:
0004, 0005, 0013, 0035,
While the following haven't yet been applied:
0008, 0012, 0014, 0041, 0042, 0043, 0044, 0045, 0046, 0049, 0050, 0051, 0052,
Tomorrow I'll continue working through the patches. It's slow going, but this interlude with TextureImageEGL serves to highlight that there are still relevant patches still to be applied, which means it's worth checking. But I also want to reiterate that I've not forgotten the discussion from the Sailfish Community Meeting last Thursday. I still have some work to do following the advice provided there.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
14 Apr 2024 : Day 216 #
After a motivational day getting really useful feedback from Raine, Damien and others, I'm back to going through patches today. I will of course need to act on the helpful suggestions, which I#m sure will be key to getting the rendering to work, but having started on my journey of checking patches I need to now finish it. This entire activity is an in-order execution pipeline you see.
I had reached patch 0038, otherwise named "Fix mesa egl display and buffer initialisation" and originally written by Adam Pigg, Frajo and myself back in 2021. It's a big one and also a relevant one to the current problem, so I'm going to spend a bit of time working my way through it.
The good news is that the process of checking, while a little laborious, doesn't require a lot of concentration. So it's good work to do while I'm still half asleep on my morning commute.
[...]
Having worked through patch 0038 it all looks present and correct, at least to the extent that makes sense with the way the new code is structured. I still have 61 patches to check through so I'm going to continue on with them.
So far the following are all applied, present and correct:
0001, 0002, 0003, 0006, 0007, 0009, 0010, 0011, 0015, 0016, 0017, 0018, 0019, 0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027, 0028, 0029, 0030, 0031, 0032, 0033, 0034, 0036, 0037. 0038, 0039, 0040, 0047, 0048,
The following are no longer needed:
0004, 0005, 0013, 0035,
While the following haven't been applied, but might potentially be important for the browser overall, so may need to be applied at some point in the future:
0008, 0012, 0014, 0041, 0042, 0043, 0044, 0045, 0046, 0049, 0050, 0051, 0052,
I got up to patch 0051 and then I hit something unexpected. Patch 0051 makes "TextureImageEGL hold a reference to GLContext". There's a detailed explanation in the patch about why this is needed: it ensures that freed memory isn't used during the shutdown of EmbedLite port objects. The actual change required is minimal:
Well that's odd.
It would appear that the TextureImageEGL class has been completely removed. And because it's created in a factory it's apparently not affected any of the other EmbedLite changes. Here's the creation code from ESR 78:
The upstream diff suggests that TextureImageEGL provides no value, but it would be easy to miss the value if it applied only to a project that's external to gecko and which isn't officially supported. So I wonder if it's potentially offering value to us?
Unfortunately I've run out of time today to investigate this further today. I'd especially like to compare the functionality of TextureImageEGL from ESR 78 with that of BasicTextureImage in ESR 91. That will be my task for tomorrow morning.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
I had reached patch 0038, otherwise named "Fix mesa egl display and buffer initialisation" and originally written by Adam Pigg, Frajo and myself back in 2021. It's a big one and also a relevant one to the current problem, so I'm going to spend a bit of time working my way through it.
The good news is that the process of checking, while a little laborious, doesn't require a lot of concentration. So it's good work to do while I'm still half asleep on my morning commute.
[...]
Having worked through patch 0038 it all looks present and correct, at least to the extent that makes sense with the way the new code is structured. I still have 61 patches to check through so I'm going to continue on with them.
So far the following are all applied, present and correct:
0001, 0002, 0003, 0006, 0007, 0009, 0010, 0011, 0015, 0016, 0017, 0018, 0019, 0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027, 0028, 0029, 0030, 0031, 0032, 0033, 0034, 0036, 0037. 0038, 0039, 0040, 0047, 0048,
The following are no longer needed:
0004, 0005, 0013, 0035,
While the following haven't been applied, but might potentially be important for the browser overall, so may need to be applied at some point in the future:
0008, 0012, 0014, 0041, 0042, 0043, 0044, 0045, 0046, 0049, 0050, 0051, 0052,
I got up to patch 0051 and then I hit something unexpected. Patch 0051 makes "TextureImageEGL hold a reference to GLContext". There's a detailed explanation in the patch about why this is needed: it ensures that freed memory isn't used during the shutdown of EmbedLite port objects. The actual change required is minimal:
protected: typedef gfxImageFormat ImageFormat; - GLContext* mGLContext; + RefPtr<GLContext> mGLContext; gfx::SurfaceFormat mUpdateFormat; EGLImage mEGLImage;Changes don't come much simpler than this. But when I tried to check whether it had been applied I ran in to problems: the TextureImageEGL.h file which it's intended to apply to doesn't exist.
Well that's odd.
It would appear that the TextureImageEGL class has been completely removed. And because it's created in a factory it's apparently not affected any of the other EmbedLite changes. Here's the creation code from ESR 78:
already_AddRefed<TextureImage> CreateTextureImage( GLContext* gl, const gfx::IntSize& aSize, TextureImage::ContentType aContentType, GLenum aWrapMode, TextureImage::Flags aFlags, TextureImage::ImageFormat aImageFormat) { switch (gl->GetContextType()) { case GLContextType::EGL: return CreateTextureImageEGL(gl, aSize, aContentType, aWrapMode, aFlags, aImageFormat); default: { GLint maxTextureSize; gl->fGetIntegerv(LOCAL_GL_MAX_TEXTURE_SIZE, &maxTextureSize); if (aSize.width > maxTextureSize || aSize.height > maxTextureSize) { NS_ASSERTION(aWrapMode == LOCAL_GL_CLAMP_TO_EDGE, "Can't support wrapping with tiles!"); return CreateTiledTextureImage(gl, aSize, aContentType, aFlags, aImageFormat); } else { return CreateBasicTextureImage(gl, aSize, aContentType, aWrapMode, aFlags); } } } }Notice how in ESR 91 the code for generating TextureImageEGL objects has been completely removed:
already_AddRefed<TextureImage> CreateTextureImage( GLContext* gl, const gfx::IntSize& aSize, TextureImage::ContentType aContentType, GLenum aWrapMode, TextureImage::Flags aFlags, TextureImage::ImageFormat aImageFormat) { GLint maxTextureSize; gl->fGetIntegerv(LOCAL_GL_MAX_TEXTURE_SIZE, &maxTextureSize); if (aSize.width > maxTextureSize || aSize.height > maxTextureSize) { NS_ASSERTION(aWrapMode == LOCAL_GL_CLAMP_TO_EDGE, "Can't support wrapping with tiles!"); return CreateTiledTextureImage(gl, aSize, aContentType, aFlags, aImageFormat); } else { return CreateBasicTextureImage(gl, aSize, aContentType, aWrapMode, aFlags); } }Let's find out why that is.
$ git log -1 -S "CreateTextureImageEGL" -- gfx/gl/GLTextureImage.cpp commit a7817816b708c8b1fc3362d21cbcadbff3c46741 Author: Jeff Muizelaar <jmuizelaar@mozilla.com> Date: Tue Jun 15 21:10:47 2021 +0000 Bug 1716559 - Remove TextureImageEGL. r=jnicol,jgilbert TextureImageEGL doesn't seem to provide any value beyond BasicTextureImage. It's last usage was bug 814159. Removing this has the side effect of using BasicTextureImage for small images instead of always using TilingTextureImage. Differential Revision: https://phabricator.services.mozilla.com/D117904It surprises me that I've not run in to this before, but one of the benefits of keeping this diary is that I can very quickly verify this:... I have not.
The upstream diff suggests that TextureImageEGL provides no value, but it would be easy to miss the value if it applied only to a project that's external to gecko and which isn't officially supported. So I wonder if it's potentially offering value to us?
Unfortunately I've run out of time today to investigate this further today. I'd especially like to compare the functionality of TextureImageEGL from ESR 78 with that of BasicTextureImage in ESR 91. That will be my task for tomorrow morning.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
13 Apr 2024 : Day 215 #
This morning I had a really fruitful discussion with everyone attending the fortnightly Sailfish Community Meeting. As you'll know I've reached the point where it's clear that rendering is happening to the surface, but there's something not working between the surface and the screen that's preventing any colour getting to the screen.
There were two really excellent outcomes from the meeting. The first involved potential solutions to this problem and I'd like to spend a little time discussing that now. The second... well I'll come to that later.
So let's first look at those potential solutions. The first thing was that there's a process that happens between the sailfish-browser code and the gecko code that performs the WebView rendering. It takes the offscreen render and ensures it gets to the screen. And as Raine (rainemak) explained during the meeting, that starts in the DeclarativeWebContainer::clearWindowSurface() method. That's part of the sailfish-browser code.
Recently I've been playing around with the CompositorOGL::EndFrame() method. That's at the other end of the process. So that means the part of the code I need to check should be somewhere between these two points. Here's some of the relevant conversation (slightly amended by me for brevity and clarify):
So that's some clear things to check, specifically the code around clearWindowSurface() and the calling of glMakeCurrent() (which is most likely called fMakeCurrent() if it's in the gecko code). So this is the reason for the first part of my upcoming plan:
But there was more useful info from Raine in addition to this. He also suggested to take a look at the DrawUnderlay() method — if it exists — or the functionality that's replaced it.
Finally, he also suggested to check the updatePaintNode() method which is part of the qtmozembed codebase:
All of these are excellent suggestions and they're now on my list of things to check. Yesterday I'd started working through the patches and I'll have to finish that task before I move on to these suggestions, since the patches have ramifications for other parts of the code as well, but I'm really hoping — and quite energised by the prospect — that these tips from Raine might lead to a solution for the offscreen rendering problems.
It was also great to see that the discussion during the meeting also led to some helpful OpenGL debugging suggestions on the forum. I've already thanked Tone (tortoisedoc) previously but now I must also add Damien Caliste (dcaliste) to the list of people who deserve my thanks.
So that was the first nice outcome, what about the second? Well that's of a slightly different nature. During the meeting thigg also made an unexpected by also really superb offer:
I can't get enough of Thilo's pictures and am already looking forward to being able to share them in these diaries when the time comes.
That's it for today, but I'll return tomorrow with updates on patches.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
There were two really excellent outcomes from the meeting. The first involved potential solutions to this problem and I'd like to spend a little time discussing that now. The second... well I'll come to that later.
So let's first look at those potential solutions. The first thing was that there's a process that happens between the sailfish-browser code and the gecko code that performs the WebView rendering. It takes the offscreen render and ensures it gets to the screen. And as Raine (rainemak) explained during the meeting, that starts in the DeclarativeWebContainer::clearWindowSurface() method. That's part of the sailfish-browser code.
Recently I've been playing around with the CompositorOGL::EndFrame() method. That's at the other end of the process. So that means the part of the code I need to check should be somewhere between these two points. Here's some of the relevant conversation (slightly amended by me for brevity and clarify):
rainemak: flypig, have you tried to change gl clear color?
flypig: Yes, I tried glClearColor. The change shows up with fReadPixels, but not on screen.
rainemak: has to be that off screen is not there
flypig: Could you elaborate?
rainemak: flypig, one thing worth looking is that gecko doesn't create any headless things in addition to what you're expecting it to create
flypig: rainemak, how would that manifest itself? I've been careful to ensure there's only one display value
created. Is there anything else to check for this you can think of?
rainemak: flypig, also gl makeCurrent and swap I'd check
rainemak: DeclarativeWebContainer::clearWindowSurface
rainemak: ^ is that being called properly
flypig: That's in sailfish-browser rainemak? I've not checked that; I'll do so.
flypig: Yes, I tried glClearColor. The change shows up with fReadPixels, but not on screen.
rainemak: has to be that off screen is not there
flypig: Could you elaborate?
rainemak: flypig, one thing worth looking is that gecko doesn't create any headless things in addition to what you're expecting it to create
flypig: rainemak, how would that manifest itself? I've been careful to ensure there's only one display value
created. Is there anything else to check for this you can think of?
rainemak: flypig, also gl makeCurrent and swap I'd check
rainemak: DeclarativeWebContainer::clearWindowSurface
rainemak: ^ is that being called properly
flypig: That's in sailfish-browser rainemak? I've not checked that; I'll do so.
So that's some clear things to check, specifically the code around clearWindowSurface() and the calling of glMakeCurrent() (which is most likely called fMakeCurrent() if it's in the gecko code). So this is the reason for the first part of my upcoming plan:
flypig: Alright, I'm going to check on the clearWindowSurface side of things and everything between that and CompositorOGL::EndFrame().
rainemak: flypig, that's at least scheduling glclear
flypig: Okay, that's useful to know. Then maybe I can put lots of readPixels in between the two to find out exactly which step is failing.
rainemak: flypig, that's at least scheduling glclear
flypig: Okay, that's useful to know. Then maybe I can put lots of readPixels in between the two to find out exactly which step is failing.
But there was more useful info from Raine in addition to this. He also suggested to take a look at the DrawUnderlay() method — if it exists — or the functionality that's replaced it.
rainemak: flypig, and draw underlay is getting called properly as well? that call makeCurrent
rainemak: that calls makeCurrent
flypig: One second, let me check.
rainemak: let me think
flypig: That's called in EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget() ?
rainemak: it starts around there
rainemak: did we remove on esr78 DrawUnderlay like methon ( there's still QMozWindowPrivate::DrawOverlay )
rainemak: hmm
rainemak: I'm not seeing that makeCurrent
rainemak: that calls makeCurrent
flypig: One second, let me check.
rainemak: let me think
flypig: That's called in EmbedLiteCompositorBridgeParent::CompositeToDefaultTarget() ?
rainemak: it starts around there
rainemak: did we remove on esr78 DrawUnderlay like methon ( there's still QMozWindowPrivate::DrawOverlay )
rainemak: hmm
rainemak: I'm not seeing that makeCurrent
Finally, he also suggested to check the updatePaintNode() method which is part of the qtmozembed codebase:
rainemak: flypig, QuickMozView::updatePaintNode I'd check this one as browser rendering works
rainemak: i.e. does getPlatform and MozMaterialNode::preprocess work as expected
rainemak: thinking if there's something wrong now with previous scenegraph integration
rainemak: i.e. does getPlatform and MozMaterialNode::preprocess work as expected
rainemak: thinking if there's something wrong now with previous scenegraph integration
All of these are excellent suggestions and they're now on my list of things to check. Yesterday I'd started working through the patches and I'll have to finish that task before I move on to these suggestions, since the patches have ramifications for other parts of the code as well, but I'm really hoping — and quite energised by the prospect — that these tips from Raine might lead to a solution for the offscreen rendering problems.
It was also great to see that the discussion during the meeting also led to some helpful OpenGL debugging suggestions on the forum. I've already thanked Tone (tortoisedoc) previously but now I must also add Damien Caliste (dcaliste) to the list of people who deserve my thanks.
So that was the first nice outcome, what about the second? Well that's of a slightly different nature. During the meeting thigg also made an unexpected by also really superb offer:
thilo[m]: #info thigg, community
thilo[m]: Flypig: you have a milestone in gecko after which you have a big party planned? ;)
flypig: thilo[m], once the offscreen render is working, I'll generate patches, and then it will be time to celebrate (for me at least!).
thilo[m]: Cool, i'll prepare pictures ;)
thilo[m]: Flypig: you have a milestone in gecko after which you have a big party planned? ;)
flypig: thilo[m], once the offscreen render is working, I'll generate patches, and then it will be time to celebrate (for me at least!).
thilo[m]: Cool, i'll prepare pictures ;)
I can't get enough of Thilo's pictures and am already looking forward to being able to share them in these diaries when the time comes.
That's it for today, but I'll return tomorrow with updates on patches.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
12 Apr 2024 : Day 214 #
Yesterday was a minor breakthrough day. I'd normally start off by explaining why, but I can't do a better job than how Leif-Jöran Olsson (ljo) described things on Mastodon last night:
So very evocative! And when you're left peering at the rendering through the prism of a single pixel, it does rather feel like the dark ages.
The reality of this summary is that there's still no rendering to the screen. As it also captures however, I was able to confirm that rendering is happening to the backbuffer surface. That gives me hope, as well as a much better idea about where to focus my efforts, which is in the gap between the backbuffer and the screen.
Unfortunately this is part of the mechanism that I know least about, involving a mysterious combination of Qt, Wayland, OpenGL ES and with only the tip of the technology iceberg protruding into gecko code.
Before heading back to this it strikes me it might be worth me double checking the patches. Raine already suggested some I should look at and I'm concerned I may have missed something else that might be important.
So I'm going to review all 99 patches to check what has and hasn't been applied, just in case I've missed something crucial.
The process is rather laborious: I open the patch in my editor, open the files that have changed and check whether the diffs have been applied. In many cases things have moved around and it takes a bit more than just checking the lines listed in the patch. Instead I'm having to search for prominent bits of code to check whether they exist somewhere in the code base.
It's not hard, but there's also not much to write about: open, check, check, check, note down result, close. Open, check, check, check, note down result, close. Rinse and repeat.
What I've discovered so far today is that the following patches have all been applied and look correct:
0001, 0002, 0003, 0006, 0007, 0009, 0010, 0011, 0015, 0016, 0017, 0018, 0019, 0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027, 0028, 0029, 0030, 0031, 0032, 0033, 0034, 0036, 0037.
The following patches aren't applicable because the code has either changed so much as to make them impossible to apply or irrelevant:
0004, 0005, 0013, 0035.
And finally the following haven't been applied, but might yet need to be:
0008, 0012, 0014.
That means I still have 0038 to 0099 to check. As yet I've not noticed anything missing or misapplied that looks like it might be affecting the offscreen render pipeline. Having said that, as the day comes to an end I can see the next patch — patch 0038 — is very definitely applicable. I'll need to look through it very carefully indeed. But that's for tomorrow when I'm feeling more awake.
Also tomorrow I'll be discussing the outcome of the Sailfish Community Meeting, which turned out to be unexpectedly useful for the tasks I'm working on. More on that tomorrow!
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
The dark ages losing its grip on the rendering origin. While the reminiscences of violet sky still casts its reflection into the void, a Big bang awaits to radiate the 10^122 black matter surfaces onto the infinite screen of the gecko multi-verse
So very evocative! And when you're left peering at the rendering through the prism of a single pixel, it does rather feel like the dark ages.
The reality of this summary is that there's still no rendering to the screen. As it also captures however, I was able to confirm that rendering is happening to the backbuffer surface. That gives me hope, as well as a much better idea about where to focus my efforts, which is in the gap between the backbuffer and the screen.
Unfortunately this is part of the mechanism that I know least about, involving a mysterious combination of Qt, Wayland, OpenGL ES and with only the tip of the technology iceberg protruding into gecko code.
Before heading back to this it strikes me it might be worth me double checking the patches. Raine already suggested some I should look at and I'm concerned I may have missed something else that might be important.
So I'm going to review all 99 patches to check what has and hasn't been applied, just in case I've missed something crucial.
The process is rather laborious: I open the patch in my editor, open the files that have changed and check whether the diffs have been applied. In many cases things have moved around and it takes a bit more than just checking the lines listed in the patch. Instead I'm having to search for prominent bits of code to check whether they exist somewhere in the code base.
It's not hard, but there's also not much to write about: open, check, check, check, note down result, close. Open, check, check, check, note down result, close. Rinse and repeat.
What I've discovered so far today is that the following patches have all been applied and look correct:
0001, 0002, 0003, 0006, 0007, 0009, 0010, 0011, 0015, 0016, 0017, 0018, 0019, 0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027, 0028, 0029, 0030, 0031, 0032, 0033, 0034, 0036, 0037.
The following patches aren't applicable because the code has either changed so much as to make them impossible to apply or irrelevant:
0004, 0005, 0013, 0035.
And finally the following haven't been applied, but might yet need to be:
0008, 0012, 0014.
That means I still have 0038 to 0099 to check. As yet I've not noticed anything missing or misapplied that looks like it might be affecting the offscreen render pipeline. Having said that, as the day comes to an end I can see the next patch — patch 0038 — is very definitely applicable. I'll need to look through it very carefully indeed. But that's for tomorrow when I'm feeling more awake.
Also tomorrow I'll be discussing the outcome of the Sailfish Community Meeting, which turned out to be unexpectedly useful for the tasks I'm working on. More on that tomorrow!
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
11 Apr 2024 : Day 213 #
The build ran overnight and finally this morning it was complete. Now time to test it. As a quick recap, what I'm looking for when it runs is either a cyan screen or a white screen. Yesterday I made a change to the ESR 78 code to insert a "clear surface to a cyan colour" call right before the surface is swapped onto the screen. Here are the two new commands I added:
On ESR 78 this resulted in the screen turning cyan, which is what I was expecting and hoping for. If the same thing happens on ESR 91 then it means the display of the surface to the screen is working correctly. If, on the other hand, the screen remains white, it means that either the call to clear the screen is failing, or the surface isn't being rendered to the screen at all.
The screen remains white.
So we're in the latter category. Either the methods I'm calling to clear the screen are dud, or the background surface simply isn't being rendered to screen.
This is useful progress in my opinion.
So where to take it next? I want to separate the two possibilities: is the problem the call to clear the screen, or is the surface not being rendered. My suspicion is that it's the latter, but I'm done with suspicions when it comes to the offscreen rendering pipeline. I'm in need of a bit of certainty.
So my plan is to try to probe the texture to check its colour. And here's the code I'm going to use to do it.
This is inelegant and inefficient code and sharing it is unedifying. But if it gets the job done I don't care. So now I'm building it in the ESR 78 build environment to test it with the ESR 78 rendering pipeline.
As a quick second test I've rebuilt the code without the cyan screen clearing part. Now when I run the app with the sailfishos.org website showing on the screen we get the following output.
Once again, this is good news. It's what I'd hope to see in a working system. Next up it's time to try the same code on ESR 91. This is crunch time because it should tell us whether the render surface is being rendered to or not.
With the code build and deployed, here's the output with the clear-screen cyan code in place:
Here's the output from this.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
mGLContext->fClearColor(0.0, 1.0, 1.0, 0.0); mGLContext->fClear(LOCAL_GL_COLOR_BUFFER_BIT | LOCAL_GL_DEPTH_BUFFER_BIT);These were added to the top of the CompositorOGL::EndFrame() method, which is the last thing to happen before the surfaces are swapped. Actually that's not quite right: EndFrame() is actually the method that performs the swap. So these clear calls are now pretty much the last thing that happens before the swap.
On ESR 78 this resulted in the screen turning cyan, which is what I was expecting and hoping for. If the same thing happens on ESR 91 then it means the display of the surface to the screen is working correctly. If, on the other hand, the screen remains white, it means that either the call to clear the screen is failing, or the surface isn't being rendered to the screen at all.
The screen remains white.
So we're in the latter category. Either the methods I'm calling to clear the screen are dud, or the background surface simply isn't being rendered to screen.
This is useful progress in my opinion.
So where to take it next? I want to separate the two possibilities: is the problem the call to clear the screen, or is the surface not being rendered. My suspicion is that it's the latter, but I'm done with suspicions when it comes to the offscreen rendering pipeline. I'm in need of a bit of certainty.
So my plan is to try to probe the texture to check its colour. And here's the code I'm going to use to do it.
void CompositorOGL::EndFrame() { AUTO_PROFILER_LABEL("CompositorOGL::EndFrame", GRAPHICS); mGLContext->fClearColor(0.0, 1.0, 1.0, 0.0); mGLContext->fClear(LOCAL_GL_COLOR_BUFFER_BIT | LOCAL_GL_DEPTH_BUFFER_BIT); size_t bufferSize = mWidgetSize.width * mWidgetSize.height * 4; auto buf = MakeUnique<uint8_t[]>(bufferSize); mGLContext->fReadPixels(0, 0, mWidgetSize.width, mWidgetSize.height, LOCAL_GL_RGBA, LOCAL_GL_UNSIGNED_BYTE, buf.get()); int xpos = mWidgetSize.width / 2; int ypos = mWidgetSize.height / 2; int pos = xpos * ypos * 4; char red = buf.get()[pos]; char green = buf.get()[pos + 1]; char blue = buf.get()[pos + 2]; char alpha = buf.get()[pos + 3]; printf_stderr("Colour: (%d, %d, %d, %d)\n", red, green, blue, alpha); [...]This is not beautiful code. After clearing the screen it creates a buffer large enough to contain all of the data held in the texture. The fReadPixels() call is then used to transfer the pixel data from the graphics memory to this buffer. We then measure up the centre of the buffer and pluck out the colour from the buffer at that point.
This is inelegant and inefficient code and sharing it is unedifying. But if it gets the job done I don't care. So now I'm building it in the ESR 78 build environment to test it with the ESR 78 rendering pipeline.
$ sfdk engine exec $ sb2 -t SailfishOS-esr78-aarch64.default $ mv .git .git-disabled $ source `pwd`/obj-build-mer-qt-xr/rpm-shared.env $ make -j1 -C obj-build-mer-qt-xr/gfx/layers $ make -j16 -C `pwd`/obj-build-mer-qt-xr/toolkit $ strip obj-build-mer-qt-xr/toolkit/library/build/libxul.soAnd on running it, with the full-screen cyan showing, it gives the following output:
$ harbour-webview [...] Created LOG for EmbedLiteLayerManager =============== Preparing offscreen rendering context =============== Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) [...]The colour is red, green, blue alpha. So this indicates no red, full green, full blue and full opaque. Which is cyan. That's a good result: it shows us the code is working.
As a quick second test I've rebuilt the code without the cyan screen clearing part. Now when I run the app with the sailfishos.org website showing on the screen we get the following output.
$ harbour-webview [...] Created LOG for EmbedLiteLayerManager =============== Preparing offscreen rendering context =============== Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (253, 253, 254, 255) Colour: (237, 237, 243, 255) Colour: (234, 234, 242, 255) Colour: (222, 222, 233, 255) Colour: (214, 213, 228, 255) Colour: (199, 198, 218, 255) Colour: (183, 182, 207, 255) Colour: (164, 163, 195, 255) Colour: (152, 151, 187, 255) Colour: (127, 125, 170, 255) Colour: (109, 107, 158, 255) Colour: (98, 96, 151, 255) Colour: (82, 80, 141, 255) Colour: (69, 66, 132, 255) Colour: (61, 58, 127, 255) Colour: (52, 49, 120, 255) Colour: (46, 44, 117, 255) Colour: (42, 39, 114, 255) Colour: (38, 35, 111, 255) Colour: (36, 33, 110, 255) Colour: (36, 33, 110, 255) Colour: (36, 33, 110, 255)That's an initially white screen that changes colour as the rendering progresses. Ultimately we get the end result which is a dark blue-purple colour. That's the colour which also happens to be prominent on the main sailfishos.org front page.
Once again, this is good news. It's what I'd hope to see in a working system. Next up it's time to try the same code on ESR 91. This is crunch time because it should tell us whether the render surface is being rendered to or not.
With the code build and deployed, here's the output with the clear-screen cyan code in place:
$ harbour-webview [...] Created LOG for EmbedLiteLayerManager =============== Preparing offscreen rendering context =============== [...] Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) Colour: (0, 255, 255, 255) [...]This is interesting. This shows quite clearly that the cyan colour is making its way onto the render surface. That's actually quite exciting. It means that the most complex part of the process appears to be working as expected. The problem is that the surface isn't making it all the way to the screen. Just to be sure I'm also going to build a version without the cyan screen clearing. This will tell us whether the page is rendering onto the surface.
Here's the output from this.
$ harbour-webview [...] Created LOG for EmbedLiteLayerManager =============== Preparing offscreen rendering context =============== [...] Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (255, 255, 255, 255) Colour: (254, 254, 254, 255) Colour: (245, 245, 248, 255) Colour: (242, 242, 246, 255) Colour: (240, 239, 245, 255) Colour: (234, 233, 241, 255) Colour: (229, 229, 238, 255) Colour: (226, 225, 236, 255) Colour: (221, 221, 233, 255) Colour: (217, 217, 230, 255) Colour: (208, 207, 224, 255) Colour: (204, 203, 221, 255) Colour: (192, 191, 213, 255) Colour: (187, 186, 210, 255) Colour: (181, 180, 206, 255) Colour: (175, 174, 202, 255) Colour: (169, 168, 198, 255) Colour: (157, 156, 190, 255) Colour: (150, 149, 186, 255) Colour: (144, 143, 182, 255) Colour: (138, 137, 178, 255) Colour: (125, 124, 169, 255) Colour: (119, 117, 165, 255) Colour: (113, 111, 161, 255) Colour: (107, 105, 157, 255) Colour: (101, 99, 153, 255) Colour: (96, 94, 150, 255) Colour: (91, 89, 146, 255) Colour: (86, 84, 143, 255) Colour: (81, 78, 140, 255) Colour: (71, 69, 133, 255) Colour: (68, 65, 131, 255) Colour: (63, 61, 128, 255) Colour: (60, 57, 126, 255) Colour: (57, 54, 124, 255) Colour: (53, 50, 121, 255) Colour: (51, 48, 120, 255) Colour: (48, 45, 118, 255) Colour: (45, 43, 116, 255) Colour: (44, 41, 115, 255) Colour: (41, 38, 113, 255) Colour: (40, 37, 113, 255) Colour: (39, 36, 112, 255) Colour: (38, 35, 111, 255) Colour: (37, 34, 110, 255) Colour: (36, 33, 110, 255) [...]So there we have it. The site is rendering just fine onto the background surface. The problem is that the surface isn't making it on to the screen. This leaves us plenty more questions to answer and things to think about for tomorrow, but that's enough for today.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
10 Apr 2024 : 212 #
I'm changing tack today. After spending weeks reviewing the offscreen render initialisation code, I'm now turning my attention to the ESR 78 code. That may sounds a bit weird, but bear with me.
In order to check whether changes to the build ought to be resulting in particular consequences in the running software, I'm wanting to make some changes to the ESR 78 code to see what happens there. You could say it's an experimental approach to debugging the ESR 91 code. But to do this I need to build ESR 78.
The change I want to make is simple. Over recent days we were looking at the CompositorOGL::EndFrame() code. This is called at the end of the render process in order to swap the back buffer with the front buffer and allow the latest updates to be shown onscreen. Before this swap happens I've added two lines of code, right at the top of the method:
So I'll need to set off a build of the ESR 78 code with these minor changes. Apart from the build being a lengthy process, it'll also involve configuring the build environment correctly. The sfdk build tool has some nice features for separating out different build environments and configurations using snapshots and sessions. I'm going to use both. It's quite important to get all this right though. Otherwise there's a risk I'll clobber my ESR 91 build configuration which will mean, quite apart from having to reconfigure things back, I may also have to do a full rebuild of all the many packages that have been touched by the ESR 91 changes.
For context, here's the build configuration I'm using for ESR 91. The important things to note are the output-prefix which is where the final packages end up and the target which combines both the choice of processor and the list of packages that are installed in the environment where the build takes place.
For reference, here are the targets that I have installed in my SDK:
So that's my current setup for ESR 91. For ESR 78 we want to tweak this a little. Like the SailfishOS-devel-aarch64 target we want to use a snapshot of the SailfishOS-4.5.0.18-aarch64 target for our build, but unlike for ESR 91 we want the output-prefix option to be set to a folder in the build directory, so that we only use system packages.
Let's call our new snapshot SailfishOS-esr78-aarch64. Here's how we create it:
The build was astonishingly fast, less than three hours for everything, which must be some kind of record. That makes me rather suspicious, but let's see. The packages are all there; I'm able to transfer them to my phone:
I've made the same change to the ESR 91 code and the great thing is I can now kick off a new build of ESR 91 in a different snapshot without having to make any changes to the configuration. I just switch gnu screen sessions and kick the build off:
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
In order to check whether changes to the build ought to be resulting in particular consequences in the running software, I'm wanting to make some changes to the ESR 78 code to see what happens there. You could say it's an experimental approach to debugging the ESR 91 code. But to do this I need to build ESR 78.
The change I want to make is simple. Over recent days we were looking at the CompositorOGL::EndFrame() code. This is called at the end of the render process in order to swap the back buffer with the front buffer and allow the latest updates to be shown onscreen. Before this swap happens I've added two lines of code, right at the top of the method:
void CompositorOGL::EndFrame() { AUTO_PROFILER_LABEL("CompositorOGL::EndFrame", GRAPHICS); mGLContext->fClearColor(0.0, 1.0, 1.0, 0.0); mGLContext->fClear(LOCAL_GL_COLOR_BUFFER_BIT | LOCAL_GL_DEPTH_BUFFER_BIT); [...]The two added lines are the fClearColor() and fClear() calls. Both of these combined should clear the buffer and leave it a nice cyan colour. That's assuming everything is working as I expect it to. If this works on ESR 78 then I can transfer the same code to ESR 91 to see whether it will work there. If it does, then I'll know the process of showing the surface onscreen is working and I should focus on the page rendering code. If it doesn't then it will indicate the surface render is broken and I should continue focusing my efforts there.
So I'll need to set off a build of the ESR 78 code with these minor changes. Apart from the build being a lengthy process, it'll also involve configuring the build environment correctly. The sfdk build tool has some nice features for separating out different build environments and configurations using snapshots and sessions. I'm going to use both. It's quite important to get all this right though. Otherwise there's a risk I'll clobber my ESR 91 build configuration which will mean, quite apart from having to reconfigure things back, I may also have to do a full rebuild of all the many packages that have been touched by the ESR 91 changes.
For context, here's the build configuration I'm using for ESR 91. The important things to note are the output-prefix which is where the final packages end up and the target which combines both the choice of processor and the list of packages that are installed in the environment where the build takes place.
$ sfdk config # ---- command scope --------- # <clear> # ---- session scope --------- # <clear> # ---- global scope --------- output-prefix = /home/flypig/RPMS device = kolbe target = SailfishOS-devel-aarch64The output-prefix is also important because it's used to override packages in the target. In other words, any packages it finds in this output directory will be used in preference to packages that are part of the target environment.
For reference, here are the targets that I have installed in my SDK:
$ sfdk tools target list SailfishOS-3.4.0.24-aarch64 sdk-provided SailfishOS-3.4.0.24-armv7hl sdk-provided SailfishOS-3.4.0.24-i486 sdk-provided SailfishOS-4.5.0.18-aarch64 sdk-provided,latest SailfishOS-4.5.0.18-armv7hl sdk-provided,latest SailfishOS-4.5.0.18-i486 sdk-provided,latest SailfishOS-devel-aarch64 user-defined SailfishOS-devel-armv7hl user-defined SailfishOS-devel-i486 user-definedThe devel targets are actually just copies of the 4.5.0.18 targets which I use for convenience to avoid having to remember which version is the latest. Here are the snapshots I have available to use with these:
$ sfdk tools list --snapshots SailfishOS-3.4.0.24 sdk-provided ├── SailfishOS-3.4.0.24-aarch64 sdk-provided │ └── SailfishOS-3.4.0.24-aarch64.default snapshot ├── SailfishOS-3.4.0.24-armv7hl sdk-provided │ └── SailfishOS-3.4.0.24-armv7hl.default snapshot └── SailfishOS-3.4.0.24-i486 sdk-provided └── SailfishOS-3.4.0.24-i486.default snapshot SailfishOS-4.5.0.18 sdk-provided,latest ├── SailfishOS-4.5.0.18-aarch64 sdk-provided,latest │ └── SailfishOS-4.5.0.18-aarch64.default snapshot ├── SailfishOS-4.5.0.18-armv7hl sdk-provided,latest │ └── SailfishOS-4.5.0.18-armv7hl.default snapshot ├── SailfishOS-4.5.0.18-i486 sdk-provided,latest │ └── SailfishOS-4.5.0.18-i486.default snapshot ├── SailfishOS-devel-aarch64 user-defined │ └── SailfishOS-devel-aarch64.default snapshot ├── SailfishOS-devel-armv7hl user-defined │ └── SailfishOS-devel-armv7hl.default snapshot └── SailfishOS-devel-i486 user-defined └── SailfishOS-devel-i486.default snapshotThe SailfishOS-devel-aarch64 target is the one I use to build ESR 91 against, but that in practice this gets cloned as SailfishOS-devel-aarch64.default for the actual build.
So that's my current setup for ESR 91. For ESR 78 we want to tweak this a little. Like the SailfishOS-devel-aarch64 target we want to use a snapshot of the SailfishOS-4.5.0.18-aarch64 target for our build, but unlike for ESR 91 we want the output-prefix option to be set to a folder in the build directory, so that we only use system packages.
Let's call our new snapshot SailfishOS-esr78-aarch64. Here's how we create it:
$ sfdk tools clone SailfishOS-4.5.0.18-aarch64 SailfishOS-esr78-aarch64Now listing the targets we can check it was successfully created. I've removed some of the old targets I have installed on my machine for brevity.
$ sfdk tools list --snapshots [...] SailfishOS-4.5.0.18 sdk-provided,latest ├── SailfishOS-4.5.0.18-aarch64 sdk-provided,latest │ └── SailfishOS-4.5.0.18-aarch64.default snapshot ├── SailfishOS-4.5.0.18-armv7hl sdk-provided,latest │ └── SailfishOS-4.5.0.18-armv7hl.default snapshot ├── SailfishOS-4.5.0.18-i486 sdk-provided,latest │ └── SailfishOS-4.5.0.18-i486.default snapshot ├── SailfishOS-devel-aarch64 user-defined │ └── SailfishOS-devel-aarch64.default snapshot ├── SailfishOS-devel-armv7hl user-defined │ └── SailfishOS-devel-armv7hl.default snapshot ├── SailfishOS-devel-i486 user-defined │ └── SailfishOS-devel-i486.default snapshot └── SailfishOS-esr78-aarch64 user-defined └── SailfishOS-esr78-aarch64.default snapshotThat's created the snapshot, but I also need to update the configuration. There are a variety of different ways to do this, but this is the route I've chosen:
$ sfdk config --session output-prefix=${PWD}/RPMS $ sfdk config --session target=SailfishOS-esr78-aarch64This will reset the output target to its default of the RPMS folder inside the build directory and set the target to use our new snapshot. Now checking the configuration gives the following.
$ sfdk config # ---- command scope --------- # <clear> # ---- session scope --------- output-prefix = /home/flypig/Documents/Development/jolla/gecko-dev-project/ gecko-dev/RPMS target = SailfishOS-esr78-aarch64 # ---- global scope --------- # masked at session scope ;output-prefix = /home/flypig/RPMS device = kolbe # masked at session scope ;target = SailfishOS-devel-aarch64Ultimately I'd like to do partial builds, but to make sure everything is set up properly, especially the installed dependencies, I'm going to start by doing a full rebuild of the code.
$ sfdk build -d --with git_workaroundThis will, as always, take a long time. Below is an abridged copy of the first few screens of output. Notice in this output that 47 new dependencies are installed as part of this build process. I'll need all of those for the partial builds as well.
NOTICE: Appending changelog entries to the RPM SPEC file… Setting version: 78.15.1+git36+master.20240408074155.619a29e0d110+gecko.dev.4d734e782f53 Directory walk started Directory walk done - 0 packages Temporary output repo path: /home/flypig/Documents/Development/jolla/ gecko-dev-project/gecko-dev/.sfdk/filtered-output-dir/.repodata/ Preparing sqlite DBs [...] The following 47 NEW packages are going to be installed: alsa-lib-devel 1.2.8+git1-1.6.1.jolla autoconf213 2.13-1.4.10.jolla bzip2-devel 1.0.8+git2-1.6.4.jolla cairo-devel 1.17.4+git1-1.6.2.jolla cargo 1.52.1+git4-1.9.1.jolla cbindgen 0.17.0+git4-1.12.5.jolla clang 10.0.1+git3-1.6.3.jolla clang-devel 10.0.1+git3-1.6.3.jolla clang-libs 10.0.1+git3-1.6.3.jolla clang-tools-extra 10.0.1+git3-1.6.3.jolla cpp 8.3.0-1.7.2.jolla [...] zip 3.0+git1-1.6.6.jolla 47 new packages to install. [...][...]
The build was astonishingly fast, less than three hours for everything, which must be some kind of record. That makes me rather suspicious, but let's see. The packages are all there; I'm able to transfer them to my phone:
$ scp RPMS/SailfishOS-esr78-aarch64/xulrunner-qt5-78.*.rpm defaultuser@10.0.0.117:/home/defaultuser/Documents/Development/gecko/And install them:
$ rpm -U xulrunner-qt5-78.*.rpm xulrunner-qt5-misc-78.*.rpm xulrunner-qt5-debuginfo-78.*.rpm xulrunner-qt5-debugsource-78.*.rpmAnd run them:
$ harbour-webviewAnd what do we get with this? A really nice looking cyan screen and nothing else. This is great: this is what I wanted to see.
I've made the same change to the ESR 91 code and the great thing is I can now kick off a new build of ESR 91 in a different snapshot without having to make any changes to the configuration. I just switch gnu screen sessions and kick the build off:
$ sfdk config # ---- command scope --------- # <clear> # ---- session scope --------- # <clear> # ---- global scope --------- output-prefix = /home/flypig/RPMS device = kolbe target = SailfishOS-devel-aarch64 $ sfdk build -d --with git_workaroundNow that the ESR 78 build has completed I'm also able to do partial builds. Usually — for ESR 91 — I would use the following to get into the build engine. I've left the prompt prefixes in to help make clearer what's going on, but I generally remove them when writing these entries as they make things a bit messy.
$ sfdk engine exec [mersdk@ba519abb3e13 gecko-dev]$ sb2 -t SailfishOS-devel-aarch64.default [SB2 sdk-build SailfishOS-devel-aarch64.default] I have no name!@ba519abb3e13 gecko-dev $Now when I want to get into my new ESR 78 build target I'll do the following instead (I'll remove the prefixes again from now on):
$ sfdk engine exec $ sb2 -t SailfishOS-esr78-aarch64.default $I can then perform a partial build using the following:
$ mv .git .git-disabled $ source `pwd`/obj-build-mer-qt-xr/rpm-shared.env $ make -j1 -C obj-build-mer-qt-xr/gfx/gl/ $ make -j16 -C `pwd`/obj-build-mer-qt-xr/toolkit $ strip obj-build-mer-qt-xr/toolkit/library/build/libxul.soBut, that'll be useful for another day. As for today, the ESR 91 build is still running and I'll need to wait for that to complete before I can test my hypothesis. By the looks of things, that'll have to wait until tomorrow, so that's it for today.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
9 Apr 2024 : Day 211 #
I've been talking about wrapping up my investigation of offscreen rendering initialisation for several days now, so I really ought to pull my finger out and actually move on. It's just, as I look through the code, I keep on finding new things to check. All of them have provided a little more insight, but none of them have resulted in a working renderer.
Today I stumbled on a new thing to check. The code that swaps the buffer is an obvious area where things might be going wrong. And since there's not much of it and it all happens within the CompositorOGL::EndFrame() method in both versions (which, incidentally, share the same code almost identically) it seemed worth spending the time to check it.
So first up the actual code that swaps the buffers. The idea is that the previous rendered frame is held in one buffer; the next frame is rendered off screen and then swapped out for the old one. This ensures the swap happens fast, minimising the chance of artefacts and flicker appearing during the render cycle.
Here's a copy of the method for reference.
It's therefore an obvious concern that this might be causing problems for the offscreen render pipeline as well. Clearly on ESR 78 these are disabled, but are they on ESR 91? Here's what we get when we try with ESR 91:
While stepping through this I noticed that when stepping out of the SwapBuffers() method I end up in the CompositorOGL::EndFrame() method. This is also a pretty simple and linear method, so I figure it might be worth checking it as well. So that's what I'm going to do.
Here's the output from the ESR 78 code for CompositorOGL::EndFrame(). I've cut some of the output for brevity.
What would be great is if this build were to complete before the end of today because I want to run a new build of ESR 78 overnight. Rebuilding ESR 78 might sound a little crazy, after all I should be focusing on ESR 91. But as I've previously explained, I want to test out rendering to the background in different colours to establish more accurately where rendering is failing. Every thing I've tried up to now has had no effect on the output, but that's all been using ESR 91. I want to test the same things on ESR 78. That means making minimal changes to the ESR 78 code and doing a rebuild to test it on-device.
I can't think of a better way of doing this. I've come to the conclusion I just can't get the same results using the debugger alone. And so it is that I need to rebuild ESR 78.
With all these builds and plans for builds, I won't be able to do much more work on this today, so onward into the week to see how things progress.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
Today I stumbled on a new thing to check. The code that swaps the buffer is an obvious area where things might be going wrong. And since there's not much of it and it all happens within the CompositorOGL::EndFrame() method in both versions (which, incidentally, share the same code almost identically) it seemed worth spending the time to check it.
So first up the actual code that swaps the buffers. The idea is that the previous rendered frame is held in one buffer; the next frame is rendered off screen and then swapped out for the old one. This ensures the swap happens fast, minimising the chance of artefacts and flicker appearing during the render cycle.
Here's a copy of the method for reference.
bool GLContextEGL::SwapBuffers() { EGLSurface surface = mSurfaceOverride != EGL_NO_SURFACE ? mSurfaceOverride : mSurface; if (surface) { if ((mEgl->IsExtensionSupported( GLLibraryEGL::EXT_swap_buffers_with_damage) || mEgl->IsExtensionSupported( GLLibraryEGL::KHR_swap_buffers_with_damage))) { std::vector<EGLint> rects; for (auto iter = mDamageRegion.RectIter(); !iter.Done(); iter.Next()) { const IntRect& r = iter.Get(); rects.push_back(r.X()); rects.push_back(r.Y()); rects.push_back(r.Width()); rects.push_back(r.Height()); } mDamageRegion.SetEmpty(); return mEgl->fSwapBuffersWithDamage(mEgl->Display(), surface, rects.data(), rects.size() / 4); } return mEgl->fSwapBuffers(mEgl->Display(), surface); } else { return false; } }Stepping through the code on ESR 78 this is what we get:
Thread 36 "Compositor" hit Breakpoint 1, mozilla::gl::GLContextEGL:: SwapBuffers (this=0x7eac109140) at gfx/gl/GLContextProviderEGL.cpp:643 643 bool GLContextEGL::SwapBuffers() { (gdb) n [Thread 0x7fa4afe9a0 (LWP 24091) exited] 644 EGLSurface surface = (gdb) n [New Thread 0x7fa467c9a0 (LWP 24117)] 646 if (surface) { (gdb) p surface $1 = (EGLSurface) 0x7eac004170 (gdb) n 647 if ((mEgl->IsExtensionSupported( (gdb) n 415 /opt/cross/aarch64-meego-linux-gnu/include/c++/8.3.0/bitset: No such file or directory. (gdb) 663 return mEgl->fSwapBuffers(mEgl->Display(), surface); (gdb)The really important thing to notice about this is that the condition checking support for EXT_swap_buffers_with_damage or KHR_swap_buffers_with_damage both fail. This means that the condition is skipped rather than being entered. You may recall that back on Day 83 we had to explicitly disable these two extensions and that it was the act of disabling them that resulted in the onscreen render pipeline working successfully for the first time.
It's therefore an obvious concern that this might be causing problems for the offscreen render pipeline as well. Clearly on ESR 78 these are disabled, but are they on ESR 91? Here's what we get when we try with ESR 91:
Thread 37 "Compositor" hit Breakpoint 1, mozilla::gl::GLContextEGL:: SwapBuffers (this=0x7ee019aa50) at gfx/gl/GLContextProviderEGL.cpp:662 662 bool GLContextEGL::SwapBuffers() { (gdb) n 663 EGLSurface surface = (gdb) 665 if (surface) { (gdb) p surface $1 = (EGLSurface) 0x7ee0004b60 (gdb) n [Thread 0x7f1f2fe7e0 (LWP 958) exited] 666 if ((mEgl->IsExtensionSupported( (gdb) 415 include/c++/8.3.0/bitset: No such file or directory. (gdb) 682 return mEgl->fSwapBuffers(surface); (gdb)This is almost identical to the ESR 78 output and crucially the extension condition is also being skipped. So I guess it's not this that's causing the problems for the offscreen render. Nonetheless I'm glad I checked.
While stepping through this I noticed that when stepping out of the SwapBuffers() method I end up in the CompositorOGL::EndFrame() method. This is also a pretty simple and linear method, so I figure it might be worth checking it as well. So that's what I'm going to do.
Here's the output from the ESR 78 code for CompositorOGL::EndFrame(). I've cut some of the output for brevity.
Thread 36 "Compositor" hit Breakpoint 3, mozilla::layers:: CompositorOGL::EndFrame (this=0x7eac003450) at gfx/layers/opengl/CompositorOGL.cpp:2000 2000 void CompositorOGL::EndFrame() { (gdb) n 2001 AUTO_PROFILER_LABEL("CompositorOGL::EndFrame", GRAPHICS); (gdb) n 2022 mShouldInvalidateWindow = false; (gdb) n 2024 if (mTarget) { (gdb) p mTarget $4 = {mRawPtr = 0x0} (gdb) n 2034 mWindowRenderTarget = nullptr; (gdb) n 2035 mCurrentRenderTarget = nullptr; (gdb) n 2037 if (mTexturePool) { (gdb) p mTexturePool $5 = {mRawPtr = 0x0} (gdb) n 2043 mGLContext->SetDamage(mCurrentFrameInvalidRegion); (gdb) n 2044 mGLContext->SwapBuffers(); (gdb) n 2045 mGLContext->fBindBuffer(LOCAL_GL_ARRAY_BUFFER, 0); (gdb) n 2049 mGLContext->fActiveTexture(LOCAL_GL_TEXTURE0 + i); (gdb) n 2050 mGLContext->fBindTexture(LOCAL_GL_TEXTURE_2D, 0); (gdb) n 2051 if (!mGLContext->IsGLES()) { (gdb) n 258 obj-build-mer-qt-xr/dist/include/GLContext.h: No such file or directory. [...] (gdb) n 2056 mCurrentFrameInvalidRegion.SetEmpty(); (gdb) n 2058 Compositor::EndFrame(); (gdb)And here's the same on ESR 91. Once again, the execution flow, variables and output overall are almost identical.
Thread 37 "Compositor" hit Breakpoint 2, mozilla::layers:: CompositorOGL::EndFrame (this=0x7ee0002f10) at gfx/layers/opengl/CompositorO GL.cpp:2014 2014 void CompositorOGL::EndFrame() { (gdb) n 2015 AUTO_PROFILER_LABEL("CompositorOGL::EndFrame", GRAPHICS); (gdb) n 2035 mFrameInProgress = false; (gdb) n 2036 mShouldInvalidateWindow = false; (gdb) n 2038 if (mTarget) { (gdb) p mTarget $2 = {mRawPtr = 0x0} (gdb) n 2048 mWindowRenderTarget = nullptr; (gdb) n 2049 mCurrentRenderTarget = nullptr; (gdb) n 2051 if (mTexturePool) { (gdb) p mTexturePool $3 = {mRawPtr = 0x0} (gdb) n 2057 mGLContext->SetDamage(mCurrentFrameInvalidRegion); (gdb) n 2058 mGLContext->SwapBuffers(); (gdb) n 2059 mGLContext->fBindBuffer(LOCAL_GL_ARRAY_BUFFER, 0); (gdb) n 2063 mGLContext->fActiveTexture(LOCAL_GL_TEXTURE0 + i); (gdb) n 2064 mGLContext->fBindTexture(LOCAL_GL_TEXTURE_2D, 0); (gdb) n 2065 if (!mGLContext->IsGLES()) { (gdb) n 260 ${PROJECT}/obj-build-mer-qt-xr/dist/include/GLContext.h: No such file or directory. [...] (gdb) n 2070 mCurrentFrameInvalidRegion.SetEmpty(); (gdb) n [...] 2072 Compositor::EndFrame(); (gdb)That's all good. Both match up nicely as we might hope. With that dealt with it now really does feel like it's time to move on. To cement this segue I've set the ESR 91 build running. This will allow me to check the changes I made over the last few days to the GetAppDisplay() usage. For some reason every night for the last few nights I've forgotten to run the build which has prevented me from checking it.
What would be great is if this build were to complete before the end of today because I want to run a new build of ESR 78 overnight. Rebuilding ESR 78 might sound a little crazy, after all I should be focusing on ESR 91. But as I've previously explained, I want to test out rendering to the background in different colours to establish more accurately where rendering is failing. Every thing I've tried up to now has had no effect on the output, but that's all been using ESR 91. I want to test the same things on ESR 78. That means making minimal changes to the ESR 78 code and doing a rebuild to test it on-device.
I can't think of a better way of doing this. I've come to the conclusion I just can't get the same results using the debugger alone. And so it is that I need to rebuild ESR 78.
With all these builds and plans for builds, I won't be able to do much more work on this today, so onward into the week to see how things progress.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
8 Apr 2024 : Day 210 #
I hope you'll forgive the fact it's a bit of a long diary entry today. If I were writing this properly I'd split the content up across a couple of days, but that wouldn't reflect what I'm actually doing. I'm not sure why I care about matching with reality, except to give a more honest reflection of what I'm actually doing. As with all of these diary entries, you won't miss anything if you skip the debugger output, skim the text, or frankly just skip the entire entry!
There were a couple of tasks (well, three if I'm honest) left over on my to-do list from yesterday. The first was to compare the two library creation executions on ESR 78 and ESR 91.
I've been doing that throughout the day today. It was quite a lengthy process and there were some odd changes. Like the fact that on ESR 78 two sets of EGL client extensions get initialised, each of which includes a call to MarkExtensions() with a particular prefix. The prefixes are "client" and "display":
The other big difference is that in GLLibraryEGL::DoEnsureInitialized() there's a call, about half way through the initialisation, to CreateDisplay():
Other than these I don't notice any other significant differences between the two initialisation processes.
Next up is the task of checking every instance where the display is used, with the aim of checking what it's set to in each case. While in ESR 78 the display value is used widely in the GLContextEGL class, it's primarily used to pass in to library methods. We saw that when we were stepping through the code on Day 207. In contrast ESR 91 moves the display variable directly in to the library calls. For example, In ESR 78 we have this — in my opinion — hideous use of non-bracketed code where the EGLDisplay dpy parameter is passed in:
While stepping through the code I was surprised to see GLContextEGL::CreateGLContext() being called twice on ESR 91. This made me a little suspicious, so I checked to see whether the same thing is happening on ESR 78.
It is. This still feels odd to me, but it's consistent across the two versions so presumably correct.
Here are the breakpoints that get hit on ESR 78:
To my eye, even though the initialisation flows of ESR 78 and ESR 91 are different, it looks to me like the same things get executed largely in the same order and the various relevant objects end up in a similar state.
Where does this leave things? It means that it's probably time to move on from the render initialisation process and take a look at the page rendering process instead. I'd still also like to get some kind of confirmation that if something were being rendered it would show on screen. You may recall I tried to do this a while back by changing the background colour of the screen clearing calls. This didn't yield good results, so I'll need to put some thought into an alternative method to show me what I need. I'll have to think about that tomorrow.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
There were a couple of tasks (well, three if I'm honest) left over on my to-do list from yesterday. The first was to compare the two library creation executions on ESR 78 and ESR 91.
I've been doing that throughout the day today. It was quite a lengthy process and there were some odd changes. Like the fact that on ESR 78 two sets of EGL client extensions get initialised, each of which includes a call to MarkExtensions() with a particular prefix. The prefixes are "client" and "display":
MarkExtensions(rawExtString, shouldDumpExts, "client", &mAvailableExtensions); [...] MarkExtensions(rawExtString, shouldDumpExts, "display", &mAvailableExtensions);On ESR 91 the initialisation process, which is quite similar in most other respects, does this differently, instead only initialising extensions with the "lib" prefix during the GLLibraryEGL::Init() method and then initialising extensions with the "display" prefix in the EglDisplay constructor:
MarkExtensions(rawExtString, shouldDumpExts, "display", sEGLExtensionNames, &mAvailableExtensions); [...] MarkExtensions(rawExtString, shouldDumpExts, "lib", sEGLLibraryExtensionNames, &mAvailableExtensions);To be honest, I'm not really sure what this MarkExtensions() method is really needed for (I've read the code; it didn't really enlighten me), so I don't know whether this is a really significant difference or not. I'm going to hope not for the moment.
The other big difference is that in GLLibraryEGL::DoEnsureInitialized() there's a call, about half way through the initialisation, to CreateDisplay():
mEGLDisplay = CreateDisplay(forceAccel, gfxInfo, out_failureId, aDisplay); if (!mEGLDisplay) { return false; }This has been removed in ESR 91 and now happens in the GLLibraryEGL::DefaultDisplay() method instead. What this means is that rather than being created during the library initialisation, it gets created the first time the default display is requested. This changes the order of things and, while I don't think it amounts to a practical difference, I could be wrong. Ordering can sometimes be crucial.
Other than these I don't notice any other significant differences between the two initialisation processes.
Next up is the task of checking every instance where the display is used, with the aim of checking what it's set to in each case. While in ESR 78 the display value is used widely in the GLContextEGL class, it's primarily used to pass in to library methods. We saw that when we were stepping through the code on Day 207. In contrast ESR 91 moves the display variable directly in to the library calls. For example, In ESR 78 we have this — in my opinion — hideous use of non-bracketed code where the EGLDisplay dpy parameter is passed in:
EGLSurface fCreatePbufferSurface(EGLDisplay dpy, EGLConfig config, const EGLint* attrib_list) const WRAP(fCreatePbufferSurface(dpy, config, attrib_list))Compare this to the ESR 91 code, where the display value is now a class variable and so no longer needs to be passed on. Thankfully the brackets and indentation have been sorted out as well:
EGLSurface fCreatePbufferSurface(EGLConfig config, const EGLint* attrib_list) const { return mLib->fCreatePbufferSurface(mDisplay, config, attrib_list); }The consequence is that while the value for the display must be passed in when library methods are called in the ESR 78 code like this:
surface = egl->fCreatePbufferSurface(egl->Display(), config, &pbattrs[0]);In the ESR 91 code there's no need to pass the value in. We have this instead:
EGLSurface surface = egl.fCreatePbufferSurface(config, pbattrs.data());It's actually a nice change and simplifies the code greatly. But it makes our comparison harder. I've had to put breakpoints on all of the library methods as a result. Here's the full list:
GLContextEGL::GetWSIInfo GLContextEGL::CreateGLContext CreateEmulatorBufferSurface EglDisplay::fTerminate EglDisplay::fMakeCurrent EglDisplay::fDestroyContext EglDisplay::fCreateContext EglDisplay::fDestroySurface EglDisplay::fCreateWindowSurface EglDisplay::fCreatePbufferSurface EglDisplay::fCreatePbufferFromClientBuffer EglDisplay::fChooseConfig EglDisplay::fGetConfigAttrib EglDisplay::fGetConfigs EglDisplay::fSwapBuffers EglDisplay::fBindTexImage EglDisplay::fReleaseTexImage EglDisplay::fSwapInterval EglDisplay::fCreateImage EglDisplay::fDestroyImage EglDisplay::fQuerySurface EglDisplay::fQuerySurfacePointerANGLE EglDisplay::fCreateSync EglDisplay::fDestroySync EglDisplay::fClientWaitSync EglDisplay::fGetSyncAttrib EglDisplay::fWaitSync EglDisplay::fDupNativeFenceFDANDROID EglDisplay::fQueryDisplayAttribEXT EglDisplay::fCreateStreamKHR EglDisplay::fDestroyStreamKHR EglDisplay::fQueryStreamKHR EglDisplay::fStreamConsumerGLTextureExternalKHR EglDisplay::fStreamConsumerAcquireKHR EglDisplay::fStreamConsumerReleaseKHR EglDisplay::fStreamConsumerGLTextureExternalAttribsNV EglDisplay::fCreateStreamProducerD3DTextureANGLE EglDisplay::fStreamPostD3DTextureANGLE EglDisplay::fSwapBuffersWithDamage EglDisplay::fSetDamageRegionAnd here's the output resulting from my putting breakpoints on them. Please forgive the long and rather dull output here, but I'm trying to follow the same processes as on Day 207, which means sharing the same output here as well. Note that many of the breakpoints simply don't stick:
(gdb) b GLContextEGL::GetWSIInfo Breakpoint 10 at 0x7ff1105558: file gfx/gl/GLContextProviderEGL.cpp, line 692. (gdb) b GLContextEGL::CreateGLContext Breakpoint 11 at 0x7ff112d8d0: file gfx/gl/GLContextProviderEGL.cpp, line 742. (gdb) b CreateEmulatorBufferSurface Breakpoint 12 at 0x7ff111d2c8: file gfx/gl/GLContextProviderEGL.cpp, line 980. (gdb) b EglDisplay::fTerminate Breakpoint 13 at 0x7ff111baf0: file gfx/gl/GLLibraryEGL.h, line 234. (gdb) b EglDisplay::fMakeCurrent Breakpoint 14 at 0x7ff1104a04: EglDisplay::fMakeCurrent. (2 locations) (gdb) b EglDisplay::fDestroyContext Breakpoint 15 at 0x7ff112f4b0: file gfx/gl/GLLibraryEGL.h, line 242. (gdb) b EglDisplay::fCreateContext Breakpoint 16 at 0x7ff11098c8: file gfx/gl/GLLibraryEGL.h, line 248. (gdb) b EglDisplay::fDestroySurface Breakpoint 17 at 0x7ff1104a20: EglDisplay::fDestroySurface. (2 locations) (gdb) b EglDisplay::fCreateWindowSurface Breakpoint 18 at 0x7ff11182ac: file gfx/gl/GLLibraryEGL.h, line 259. (gdb) b EglDisplay::fCreatePbufferSurface Breakpoint 19 at 0x7ff1109ef8: file gfx/gl/GLLibraryEGL.h, line 265. (gdb) b EglDisplay::fCreatePbufferFromClientBuffer Function "EglDisplay::fCreatePbufferFromClientBuffer" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fChooseConfig Breakpoint 20 at 0x7ff1109bd8: EglDisplay::fChooseConfig. (2 locations) (gdb) b EglDisplay::fGetConfigAttrib Breakpoint 21 at 0x7ff110bf44: EglDisplay::fGetConfigAttrib. (5 locations) (gdb) b EglDisplay::fGetConfigs Function "EglDisplay::fGetConfigs" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fSwapBuffers Breakpoint 22 at 0x7ff110ba70: file gfx/gl/GLLibraryEGL.h, line 303. (gdb) b EglDisplay::fBindTexImage Breakpoint 23 at 0x7ff1104ad0: file gfx/gl/GLLibraryEGL.h, line 324. (gdb) b EglDisplay::fReleaseTexImage Breakpoint 24 at 0x7ff1104a68: file gfx/gl/GLLibraryEGL.h, line 329. (gdb) b EglDisplay::fSwapInterval Breakpoint 25 at 0x7ff112e1dc: file gfx/gl/GLLibraryEGL.h, line 706. (gdb) b EglDisplay::fCreateImage Function "EglDisplay::fCreateImage" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fDestroyImage Function "EglDisplay::fDestroyImage" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fQuerySurface Breakpoint 26 at 0x7ff110751c: file gfx/gl/GLLibraryEGL.h, line 348. (gdb) b EglDisplay::fQuerySurfacePointerANGLE Function "EglDisplay::fQuerySurfacePointerANGLE" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fCreateSync Function "EglDisplay::fCreateSync" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fDestroySync Function "EglDisplay::fDestroySync" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fClientWaitSync Function "EglDisplay::fClientWaitSync" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fGetSyncAttrib Function "EglDisplay::fGetSyncAttrib" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fWaitSync Function "EglDisplay::fWaitSync" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fDupNativeFenceFDANDROID Function "EglDisplay::fDupNativeFenceFDANDROID" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fQueryDisplayAttribEXT Function "EglDisplay::fQueryDisplayAttribEXT" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fCreateStreamKHR Function "EglDisplay::fCreateStreamKHR" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fDestroyStreamKHR Function "EglDisplay::fDestroyStreamKHR" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fQueryStreamKHR Function "EglDisplay::fQueryStreamKHR" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fStreamConsumerGLTextureExternalKHR Function "EglDisplay::fStreamConsumerGLTextureExternalKHR" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fStreamConsumerAcquireKHR Function "EglDisplay::fStreamConsumerAcquireKHR" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fStreamConsumerReleaseKHR Function "EglDisplay::fStreamConsumerReleaseKHR" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fStreamConsumerGLTextureExternalAttribsNV Function "EglDisplay::fStreamConsumerGLTextureExternalAttribsNV" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fCreateStreamProducerD3DTextureANGLE Function "EglDisplay::fCreateStreamProducerD3DTextureANGLE" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fStreamPostD3DTextureANGLE Function "EglDisplay::fStreamPostD3DTextureANGLE" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b EglDisplay::fSwapBuffersWithDamage Breakpoint 27 at 0x7ff110bc3c: file gfx/gl/GLLibraryEGL.h, line 448. (gdb) b EglDisplay::fSetDamageRegion Function "EglDisplay::fSetDamageRegion" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) r [...]With that done the next step is to hop carefully through the code from breakpoint to breakpoint. The important thing to note here is that the display values exhibit the same behaviour as in ESR 78: they're all the same value and all set to 0x01. Once again this is a long and rather dull piece of debugger output; I've actually cut out a big portion since everything ends up looking the same.
Thread 38 "Compositor" hit Breakpoint 20, mozilla::gl::EglDisplay:: fChooseConfig (num_config=0x7f1f96efb4, config_size=1, configs=0x7f1f96efc0, attrib_list=0x7edc004de8, this=0x7edc003590) at gfx/gl/GLLibraryEGL.h:679 679 return mLib->fChooseConfig(mDisplay, attrib_list, configs, config_size, (gdb) p mDisplay $1 = (const EGLDisplay) 0x1 (gdb) c Continuing. Thread 38 "Compositor" hit Breakpoint 19, mozilla::gl::EglDisplay:: fCreatePbufferSurface (attrib_list=0x7edc003cc8, config=0x555599bb50, this=0x7edc003590) at gfx/gl/GLLibraryEGL.h:666 666 return mLib->fCreatePbufferSurface(mDisplay, config, attrib_list); (gdb) p /x mDisplay $2 = 0x1 (gdb) c Continuing. Thread 38 "Compositor" hit Breakpoint 11, mozilla::gl::GLContextEGL:: CreateGLContext ( egl=std::shared_ptr<mozilla::gl::EglDisplay> (use count 4, weak count 2) = {...}, desc=..., config=config@entry=0x555599bb50, surface=surface@entry=0x7edc004b60, useGles=useGles@entry=false, out_failureId=out_failureId@entry=0x7f1f96f1c8) at gfx/gl/GLContextProviderEGL.cpp:742 742 nsACString* const out_failureId) { (gdb) p /x egl->mDisplay $3 = 0x1 (gdb) c Continuing. [...] Thread 38 "Compositor" hit Breakpoint 14, mozilla::gl::EglDisplay:: fMakeCurrent (ctx=0x7edc004be0, read=0x7edc004b60, draw=0x7edc004b60, this=0x7edc003590) at gfx/gl/GLLibraryEGL.h:643 643 return mLib->fMakeCurrent(mDisplay, draw, read, ctx); (gdb) p /x mDisplay $9 = 0x1 (gdb) c Continuing. [New Thread 0x7f1f3bd7e0 (LWP 23890)] =============== Preparing offscreen rendering context =============== [New Thread 0x7f1f1bc7e0 (LWP 23891)] [New Thread 0x7f1f17b7e0 (LWP 23892)] [...] Thread 38 "Compositor" hit Breakpoint 14, mozilla::gl::EglDisplay:: fMakeCurrent (ctx=0x7edc004be0, read=0x7edc004b60, draw=0x7edc004b60, this=0x7edc003590) at gfx/gl/GLLibraryEGL.h:643 643 return mLib->fMakeCurrent(mDisplay, draw, read, ctx); (gdb) p /x mDisplay $10 = 0x1 (gdb) c Continuing. [Thread 0x7f1f3bd7e0 (LWP 23890) exited] [...]That all looks healthy to me. I'm happy to see this, even if it means there's not a problem to fix, it makes me more confident that the initialisation steps are working as intended. The EGL display value, at least, seems to be set and used correctly.
While stepping through the code I was surprised to see GLContextEGL::CreateGLContext() being called twice on ESR 91. This made me a little suspicious, so I checked to see whether the same thing is happening on ESR 78.
It is. This still feels odd to me, but it's consistent across the two versions so presumably correct.
Here are the breakpoints that get hit on ESR 78:
Thread 36 "Compositor" hit Breakpoint 1, mozilla::gl::GLContextEGL:: CreateGLContext (egl=egl@entry=0x7eac0036a0, flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, caps=..., isOffscreen=isOffscreen@entry=true, config=config@entry=0x55558c8450, surface=surface@entry=0x7eac004170, useGles=useGles@entry=false, out_failureId=out_failureId@entry=0x7fa512d378) at gfx/gl/GLContextProviderEGL.cpp:719 719 nsACString* const out_failureId) { (gdb) bt #0 mozilla::gl::GLContextEGL::CreateGLContext (egl=egl@entry=0x7eac0036a0, flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, caps=..., isOffscreen=isOffscreen@entry=true, config=config@entry=0x55558c8450, surface=surface@entry=0x7eac004170, useGles=useGles@entry=false, out_failureId=out_failureId@entry=0x7fa512d378) at gfx/gl/GLContextProviderEGL.cpp:719 #1 0x0000007fb8e6f244 in mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContextImpl (out_failureId=0x7fa512d378, aUseGles=false, minCaps=..., size=..., flags=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE) at gfx/gl/GLContextProviderEGL.cpp:1360 #2 mozilla::gl::GLContextEGL::CreateEGLPBufferOffscreenContextImpl ( flags=<optimized out>, size=..., minCaps=..., aUseGles=<optimized out>, out_failureId=0x7fa512d378) at gfx/gl/GLContextProviderEGL.cpp:1302 #3 0x0000007fb8e6f398 in mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContext ( flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, size=..., minCaps=..., out_failureId=out_failureId@entry=0x7fa512d378) at gfx/gl/GLContextProviderEGL.cpp:1377 #4 0x0000007fb8e6f41c in mozilla::gl::GLContextProviderEGL::CreateHeadless ( flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, out_failureId=out_failureId@entry=0x7fa512d378) at gfx/gl/GLContextProviderEGL.cpp:1391 [...] #29 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb) c Continuing. Thread 36 "Compositor" hit Breakpoint 1, mozilla::gl::GLContextEGL:: CreateGLContext (egl=egl@entry=0x7eac0036a0, flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, caps=..., isOffscreen=isOffscreen@entry=true, config=config@entry=0x55558c8450, surface=surface@entry=0x7eac004170, useGles=useGles@entry=true, out_failureId=out_failureId@entry=0x7fa512d378) at gfx/gl/GLContextProviderEGL.cpp:719 719 nsACString* const out_failureId) { (gdb) bt #0 mozilla::gl::GLContextEGL::CreateGLContext (egl=egl@entry=0x7eac0036a0, flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, caps=..., isOffscreen=isOffscreen@entry=true, config=config@entry=0x55558c8450, surface=surface@entry=0x7eac004170, useGles=useGles@entry=true, out_failureId=out_failureId@entry=0x7fa512d378) at gfx/gl/GLContextProviderEGL.cpp:719 #1 0x0000007fb8e6f244 in mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContextImpl (out_failureId=0x7fa512d378, aUseGles=true, minCaps=..., size=..., flags=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE) at gfx/gl/GLContextProviderEGL.cpp:1360 #2 mozilla::gl::GLContextEGL::CreateEGLPBufferOffscreenContextImpl ( flags=<optimized out>, size=..., minCaps=..., aUseGles=<optimized out>, out_failureId=0x7fa512d378) at gfx/gl/GLContextProviderEGL.cpp:1302 #3 0x0000007fb8e6f3c4 in mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContext ( flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, size=..., minCaps=..., out_failureId=out_failureId@entry=0x7fa512d378) at gfx/gl/GLContextProviderEGL.cpp:1380 #4 0x0000007fb8e6f41c in mozilla::gl::GLContextProviderEGL::CreateHeadless ( flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, out_failureId=out_failureId@entry=0x7fa512d378) at gfx/gl/GLContextProviderEGL.cpp:1391 [...] #29 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb) c Continuing. [New Thread 0x7fa4afe9a0 (LWP 3400)] =============== Preparing offscreen rendering context =============== [...]And here are the same breakpoints getting hit on ESR 91. As you can see, the backtraces are similar in both cases. These all seems to be legit.
Thread 37 "Compositor" hit Breakpoint 28, mozilla::gl::GLContextEGL:: CreateGLContext ( egl=std::shared_ptr<mozilla::gl::EglDisplay> (use count 4, weak count 2) = {...}, desc=..., config=config@entry=0x55558f7d10, surface=surface@entry=0x7ed8004b60, useGles=useGles@entry=false, out_failureId=out_failureId@entry=0x7f1f96f1c8) at gfx/gl/GLContextProviderEGL.cpp:742 742 nsACString* const out_failureId) { (gdb) bt #0 mozilla::gl::GLContextEGL::CreateGLContext (egl=std::shared_ptr<mozilla::gl: :EglDisplay> (use count 4, weak count 2) = {...}, desc=..., config=config@entry=0x55558f7d10, surface=surface@entry=0x7ed8004b60, useGles=useGles@entry=false, out_failureId=out_failureId@entry=0x7f1f96f1c8) at gfx/gl/GLContextProviderEGL.cpp:742 #1 0x0000007ff112e674 in mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContextImpl ( egl=std::shared_ptr<mozilla::gl::EglDisplay> (use count 4, weak count 2) = {...}, desc=..., size=..., useGles=useGles@entry=false, out_failureId=out_failureId@entry=0x7f1f96f1c8) at include/c++/8.3.0/ext/atomicity.h:96 #2 0x0000007ff112e7e4 in mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContext ( display=std::shared_ptr<mozilla::gl::EglDisplay> (use count 4, weak count 2) = {...}, desc=..., size=..., out_failureId=out_failureId@entry=0x7f1f96f1c8) at include/c++/8.3.0/ext/atomicity.h:96 #3 0x0000007ff112e9b8 in mozilla::gl::GLContextProviderEGL::CreateHeadless ( desc=..., out_failureId=out_failureId@entry=0x7f1f96f1c8) at include/c++/8.3.0/ext/atomicity.h:96 #4 0x0000007ff112f260 in mozilla::gl::GLContextProviderEGL::CreateOffscreen ( size=..., minCaps=..., flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, out_failureId=out_failureId@entry=0x7f1f96f1c8) at gfx/gl/GLContextProviderEGL.cpp:1476 [...] #26 0x0000007ff6a0289c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb) c Continuing. Thread 37 "Compositor" hit Breakpoint 28, mozilla::gl::GLContextEGL:: CreateGLContext ( egl=std::shared_ptr<mozilla::gl::EglDisplay> (use count 4, weak count 2) = {...}, desc=..., config=config@entry=0x55558f7d10, surface=surface@entry=0x7ed8004b60, useGles=useGles@entry=true, out_failureId=out_failureId@entry=0x7f1f96f1c8) at gfx/gl/GLContextProviderEGL.cpp:742 742 nsACString* const out_failureId) { (gdb) bt #0 mozilla::gl::GLContextEGL::CreateGLContext (egl=std::shared_ptr<mozilla::gl: :EglDisplay> (use count 4, weak count 2) = {...}, desc=..., config=config@entry=0x55558f7d10, surface=surface@entry=0x7ed8004b60, useGles=useGles@entry=true, out_failureId=out_failureId@entry=0x7f1f96f1c8) at gfx/gl/GLContextProviderEGL.cpp:742 #1 0x0000007ff112e674 in mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContextImpl ( egl=std::shared_ptr<mozilla::gl::EglDisplay> (use count 4, weak count 2) = {...}, desc=..., size=..., useGles=useGles@entry=true, out_failureId=out_failureId@entry=0x7f1f96f1c8) at include/c++/8.3.0/ext/atomicity.h:96 #2 0x0000007ff112e888 in mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContext ( display=std::shared_ptr<mozilla::gl::EglDisplay> (use count 4, weak count 2) = {...}, desc=..., size=..., out_failureId=out_failureId@entry=0x7f1f96f1c8) at include/c++/8.3.0/ext/atomicity.h:96 #3 0x0000007ff112e9b8 in mozilla::gl::GLContextProviderEGL::CreateHeadless ( desc=..., out_failureId=out_failureId@entry=0x7f1f96f1c8) at include/c++/8.3.0/ext/atomicity.h:96 [...] #26 0x0000007ff6a0289c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb) c Continuing. [New Thread 0x7f1f3fe7e0 (LWP 28609)] =============== Preparing offscreen rendering context =============== [...]Since I now have breakpoints set on GLContextEGL::CreateGLContext(), as a final task for today, I thought I'd look at the inputs passed in to the method and stored within the GLContextEGL object. The creation methods are both pretty small and straightforward. Here's what happen when we inspect the values on ESR 78:
Thread 36 "Compositor" hit Breakpoint 4, mozilla::gl::GLContext:: GLContext (this=this@entry=0x7eac109140, flags=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, caps=..., sharedContext=sharedContext@entry=0x0, isOffscreen=true, useTLSIsCurrent=useTLSIsCurrent@entry=false) at gfx/gl/GLContext.cpp:274 274 GLContext::GLContext(CreateContextFlags flags, const SurfaceCaps& caps, (gdb) p flags $12 = mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE (gdb) p caps $13 = (const mozilla::gl::SurfaceCaps &) @0x7fa516f210: {any = true, color = true, alpha = false, bpp16 = false, depth = false, stencil = false, premultAlpha = true, preserve = false, surfaceAllocator = {mRawPtr = 0x0}} (gdb) p sharedContext $14 = (mozilla::gl::GLContext *) 0x0 (gdb) p isOffscreen $15 = true (gdb) p useTLSIsCurrent $16 = false (gdb)On ESR 91 things are a little different. The flags and isOffscreen values are passed in as part of a GLContextDesc object rather than being passed in as separate parameters. That's not too hard to unravel for comparison. But more tricky is the fact that there's no caps parameter at all. That's because this gets set in the GLContext::InitOffscreen() method instead. To properly compare the values we need to therefore check this method as well:
Thread 38 "Compositor" hit Breakpoint 30, mozilla::gl::GLContext:: GLContext (this=this@entry=0x7edc19aa50, desc=..., sharedContext=sharedContext@entry=0x0, useTLSIsCurrent=useTLSIsCurrent@entry=false) at gfx/gl/GLContext.cpp:283 283 GLContext::GLContext(const GLContextDesc& desc, GLContext* sharedContext, (gdb) p desc.flags $26 = mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE (gdb) p sharedContext $27 = (mozilla::gl::GLContext *) 0x0 (gdb) p desc.isOffscreen $28 = true (gdb) p useTLSIsCurrent $29 = false (gdb) p desc $30 = (const mozilla::gl::GLContextDesc &) @0x7f1f930020: {<mozilla::gl:: GLContextCreateDesc> = { flags = mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE}, isOffscreen = true} (gdb) c Continuing. Thread 38 "Compositor" hit Breakpoint 31, mozilla::gl::GLContext:: InitOffscreen (this=this@entry=0x7edc19aa50, size=..., caps=...) at gfx/gl/GLContext.cpp:2424 2424 const SurfaceCaps& caps) { (gdb) p caps $31 = (const mozilla::gl::SurfaceCaps &) @0x7f1f930158: {any = false, color = true, alpha = false, bpp16 = false, depth = false, stencil = false, premultAlpha = true, preserve = false, surfaceAllocator = {mRawPtr = 0x0}} (gdb) n 2425 if (!CreateScreenBuffer(size, caps)) return false; [...] (gdb) n 2437 return true; (gdb) p mScreen.mTuple.mFirstA.mCaps $38 = {any = false, color = true, alpha = false, bpp16 = false, depth = false, stencil = false, premultAlpha = true, preserve = false, surfaceAllocator = {mRawPtr = 0x0}} $39 = {any = false, color = true, alpha = false, bpp16 = false, depth = false, stencil = false, premultAlpha = true, preserve = false, surfaceAllocator = {mRawPtr = 0x0}} (gdb)The conclusion of the above is that although storage of the different parameters happens in slightly different places, the result is ultimately the same. The values stored as part of GLContext match across ESR 78 and ESR 91. On the one hand this isn't surprising: by now I've spent quite a bit of effort doing my best to get them to align. On the other hand it's also surprising: quite a lot of effort has been needed to get them to align, so there's been plenty of scope for errors.
To my eye, even though the initialisation flows of ESR 78 and ESR 91 are different, it looks to me like the same things get executed largely in the same order and the various relevant objects end up in a similar state.
Where does this leave things? It means that it's probably time to move on from the render initialisation process and take a look at the page rendering process instead. I'd still also like to get some kind of confirmation that if something were being rendered it would show on screen. You may recall I tried to do this a while back by changing the background colour of the screen clearing calls. This didn't yield good results, so I'll need to put some thought into an alternative method to show me what I need. I'll have to think about that tomorrow.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
7 Apr 2024 : Day 209 #
You may recall that yesterday I was just reading through the code, writing up my understanding of how the flow differs between ESR 78 and ESR 91 during the initialisation steps of the offscreen rendering pipeline.
By the end of that discussion I'd come up with a set of tasks to perform:
I've not done these quite in the order I said I would. This evening I've completed the second of the tasks, removing the extra call to GetAppDisplay() from the ESR 91 code. I've done this by adding an aDisplay parameter to DefaultEglLibrary() so it now looks like this:
That still leaves the task of comparing the EGL library creation code. I realise that I also left another task outstanding from a few days back, which is to place breakpoints on the ESR 91 code everywhere the display is used and check the value is the same all the way through. I did this for ESR 78 on Day 207 and — while it was rather dull and laborious — it did help me to understand the execution flow. It also highlighted that all the display values are the same (set to 0x01) throughout the execution. I'd be wanting to see something similar on ESR 91.
That's all going to be for tomorrow though. Overnight I plan to build the packages so I can test out the GetAppDisplay() changes I made today.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
By the end of that discussion I'd come up with a set of tasks to perform:
First I'll compare the EGL library creation, then I'll remove this additional call to GetAppDisplay() and then, if none of these have fixed things, I'll make the larger changes to align the two flows more closely.
I've not done these quite in the order I said I would. This evening I've completed the second of the tasks, removing the extra call to GetAppDisplay() from the ESR 91 code. I've done this by adding an aDisplay parameter to DefaultEglLibrary() so it now looks like this:
RefPtr<GLLibraryEGL> DefaultEglLibrary(nsACString* const out_failureId, EGLDisplay aDisplay) { StaticMutexAutoLock lock(sMutex); if (!gDefaultEglLibrary) { gDefaultEglLibrary = GLLibraryEGL::Create(out_failureId, aDisplay); if (!gDefaultEglLibrary) { NS_WARNING("GLLibraryEGL::Create failed"); } } return gDefaultEglLibrary.get(); }As a consequence I'm able to remove the call to GetAppDisplay() that was happening inside of this method. This has actually ended up simplifying the code as well. There's still a call to GetAppDisplay() inside GLContextEGLFactory::CreateImpl() which I'd like to get rid of, but there's no really obvious way to do this right now.
That still leaves the task of comparing the EGL library creation code. I realise that I also left another task outstanding from a few days back, which is to place breakpoints on the ESR 91 code everywhere the display is used and check the value is the same all the way through. I did this for ESR 78 on Day 207 and — while it was rather dull and laborious — it did help me to understand the execution flow. It also highlighted that all the display values are the same (set to 0x01) throughout the execution. I'd be wanting to see something similar on ESR 91.
That's all going to be for tomorrow though. Overnight I plan to build the packages so I can test out the GetAppDisplay() changes I made today.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
6 Apr 2024 : Day 208 #
Yesterday I promised to do two things. First to analyse the structure of the code around GLContextProviderEGL::CreateOffscreen() to find out why there were two calls to GetAppDisplay() on ESR 78 but only one on ESR 91. Second I said I was going to step through all the places where the display is used in ESR 91 to compare against what we saw with ESR 78.
For the first of these, we have three points of reference. Let's focus first on the ESR 78 code which in practice uses GetAppDisplay() in only one place:
Let's compare that to ESR 91. With the new code the call to GLLibraryEGL::EnsureInitialized() has been removed. Instead the code drops straight through to CreateHeadless(). The implementation of CreateHeadless() is a little different: instead of just calling CreateEGLPBufferOffscreenContext() it first makes a call to DefaultEglDisplay() which is why we get this first breakpoint hit:
It's this rather peculiar restructuring of the code that has led to GetAppDisplay() being called twice on ESR 91. Because the call to GLLibraryEGL::Create() also requires a display, but the display isn't passed in to DefaultEglLibrary(). Consequently another call to GetAppDisplay() is being made inside DefaultEglLibrary() as you can see from the second breakpoint hit:
All of the differences in flow here between ESR 78 and ESR 91 make things really messy. But I don't think I should revert all of the changes in ESR 91 just yet, I'll need to take things in stages. First I'll compare the EGL library creation, then I'll remove this additional call to GetAppDisplay() and then, if none of these have fixed things, I'll make the larger changes to align the two flows more closely.
It feels like this has been a helpful analysis, even if I've not written any code or made any changes today. But I have made a plan and some immediate tasks for me to perform tomorrow. So definitely not wasted effort.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
For the first of these, we have three points of reference. Let's focus first on the ESR 78 code which in practice uses GetAppDisplay() in only one place:
#0 mozilla::gl::GetAppDisplay () at gfx/gl/GLContextProviderEGL.cpp:175 #1 0x0000007fb8e81a2c in mozilla::gl::GLContextProviderEGL::CreateOffscreen ( size=..., minCaps=..., flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, out_failureId=out_failureId@entry=0x7fa516f378) at gfx/gl/GLContextProviderEGL.cpp:1404The code this breakpoint is attached to looks like this:
if (!GLLibraryEGL::EnsureInitialized( forceEnableHardware, out_failureId, GetAppDisplay())) { // Needed for IsANGLE(). return nullptr; }This is a really crucial call because it ends up resulting in a call to GLLibraryEGL::DoEnsureInitialized() which is responsible for setting up everything about the EGL library. Having set that up we then drop back to CreateOffscreen() where CreateHeadless() is called to set up all the textures and surfaces. Finally GLContext::InitOffscreen() is called which looks like this:
bool GLContext::InitOffscreen(const gfx::IntSize& size, const SurfaceCaps& caps) { if (!CreateScreenBuffer(size, caps)) return false; if (!MakeCurrent()) { return false; } fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, 0); fScissor(0, 0, size.width, size.height); fViewport(0, 0, size.width, size.height); mCaps = mScreen->mCaps; MOZ_ASSERT(!mCaps.any); return true; }With that done, all of the main initialisation steps for the display are complete.
Let's compare that to ESR 91. With the new code the call to GLLibraryEGL::EnsureInitialized() has been removed. Instead the code drops straight through to CreateHeadless(). The implementation of CreateHeadless() is a little different: instead of just calling CreateEGLPBufferOffscreenContext() it first makes a call to DefaultEglDisplay() which is why we get this first breakpoint hit:
#0 mozilla::gl::GetAppDisplay () at gfx/gl/GLContextProviderEGL.cpp:161 #1 0x0000007ff112e910 in mozilla::gl::GLContextProviderEGL::CreateHeadless ( desc=..., out_failureId=out_failureId@entry=0x7f1f5ac1c8) at gfx/gl/GLContextProviderEGL.cpp:1434 #2 0x0000007ff112f260 in mozilla::gl::GLContextProviderEGL::CreateOffscreen ( size=..., minCaps=..., flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, out_failureId=out_failureId@entry=0x7f1f5ac1c8) at gfx/gl/GLContextProviderEGL.cpp:1476That call will try to get the default EGL library and on the first occasion, when the "get" fails, it will call GLLibraryEGL::Create(). This will then end up calling GLLibraryEGL::Init() which is the closest equivalent we have in ESR 91 to GLLibraryEGL::DoEnsureInitialized(). I'm not going to have time today, but I should try to set some time aside over the coming days to compare the two. They're both long and complex methods, which makes them good candidates for problematic divergence.
It's this rather peculiar restructuring of the code that has led to GetAppDisplay() being called twice on ESR 91. Because the call to GLLibraryEGL::Create() also requires a display, but the display isn't passed in to DefaultEglLibrary(). Consequently another call to GetAppDisplay() is being made inside DefaultEglLibrary() as you can see from the second breakpoint hit:
#0 mozilla::gl::GetAppDisplay () at gfx/gl/GLContextProviderEGL.cpp:161 #1 0x0000007ff111907c in mozilla::gl::DefaultEglLibrary ( out_failureId=out_failureId@entry=0x7f1f5ac1c8) at gfx/gl/GLContextProviderEGL.cpp:1519 #2 0x0000007ff112e920 in mozilla::gl::DefaultEglDisplay (aDisplay=0x1, out_failureId=0x7f1f5ac1c8) at gfx/gl/GLContextEGL.h:29 #3 mozilla::gl::GLContextProviderEGL::CreateHeadless (desc=..., out_failureId=out_failureId@entry=0x7f1f5ac1c8) at gfx/gl/GLContextProviderEGL.cpp:1434 #4 0x0000007ff112f260 in mozilla::gl::GLContextProviderEGL::CreateOffscreen ( size=..., minCaps=..., flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, out_failureId=out_failureId@entry=0x7f1f5ac1c8) at gfx/gl/GLContextProviderEGL.cpp:1476This second call turns out to be unnecessary: it'd be better to add an aDisplay parameter to DefaultEglLibrary(), making things line up better with the code in ESR 78. Having said that, a quick search for DefaultEglLibrary() shows it's also used in one other place, so if I change its signature I'll have to update this other call as well:
already_AddRefed<GLContext> GLContextEGLFactory::CreateImpl( EGLNativeWindowType aWindow, bool aHardwareWebRender, bool aUseGles) { nsCString failureId; const auto lib = gl::DefaultEglLibrary(&failureId); if (!lib) { gfxCriticalNote << "Failed[3] to load EGL library: " << failureId.get(); return nullptr; } const auto egl = lib->CreateDisplay(true, &failureId, GetAppDisplay()); [...]Since there's already a call to GetAppDisplay() it wouldn't be the end of the world to just move it up a little inside the method and store the result for use in both places. So this is also on my task list for tomorrow.
All of the differences in flow here between ESR 78 and ESR 91 make things really messy. But I don't think I should revert all of the changes in ESR 91 just yet, I'll need to take things in stages. First I'll compare the EGL library creation, then I'll remove this additional call to GetAppDisplay() and then, if none of these have fixed things, I'll make the larger changes to align the two flows more closely.
It feels like this has been a helpful analysis, even if I've not written any code or made any changes today. But I have made a plan and some immediate tasks for me to perform tomorrow. So definitely not wasted effort.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
5 Apr 2024 : Day 207 #
Today I'm continuing to step through the two different versions of the WebView app I'm testing with. I'm particularly interested in the value set for the EGL display variable. I'm concerned that this isn't being set properly and, if that's the case, that would be a serious hindrance to having a working render.
First up I'm going to set a breakpoint on the GetAppDisplay() method. This is the method that's supposed to tie the EGL display — where rending takes place — to the Qt display interface. Hopefully the execution and usage will be similar across the two versions. But let's see.
First up these are the places the method gets called when running the ESR 78 code.
So let's perform the same process but this time using the ESR 91 build for comparison:
Moreover, the CreateHeadless() method is itself being called from GLContextProviderEGL::CreateOffscreen(). Recall that this is the place where the GetAppDisplay() method was being called from in ESR 78. So all of these calls are happening in close proximity of one another.
I'm going to put the analysis of this to one side for now and come back to it. The next question to figure out is what the value being returned by the method is.
I'm not able to extract this from inside the GetAppDisplay() method because of the way the method is structured. But the method returns the value that I'm interested in. In all cases this value gets immediately passed in to another method, so if I can breakpoint on the method it gets passed to I can then extract the value from the input parameters of the method.
On ESR 78 the relevant method is the following. Notice how it has an aDisplay parameter. That's the parameter we're interested in.
Let's compare that to ESR 91. This time there are two breakpoints to add and we want to extract the values from both, assuming they hit.
The difference between the two is concerning. The ESR 78 execution is particularly problematic because there are two values being generated and right now it's not clear which one is actually being used. In order to make progress I need to understand this better.
My plan is to go through each instance of egl->Display() or mEgl->Display() in GLContextProviderEGL.cpp and add a breakpoint to the method containing it so that we can check its value. This will have two purposes. First it will help to understand the flow of execution: which methods should be called and which are unused. Second it will allow me to test what the value of the display is in each case.
You might ask why I don't just place a breakpoint on EGLLibrary::Display() and watch the values flow in. Well, I tried and the breakpoint won't stick. There is some benefit to doing it the hard way, which is that I'm getting to review all of the places in the code where it's used. If I'd taken the easy route I would have missed that opportunity.
Here are the methods that make use of the Display() value. I'm going to place a breakpoint on every single one and record every single usage until the page is completely rendered.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
First up I'm going to set a breakpoint on the GetAppDisplay() method. This is the method that's supposed to tie the EGL display — where rending takes place — to the Qt display interface. Hopefully the execution and usage will be similar across the two versions. But let's see.
First up these are the places the method gets called when running the ESR 78 code.
(gdb) b GetAppDisplay Function "GetAppDisplay" not defined. Make breakpoint pending on future shared library load? (y or [n]) y Breakpoint 1 (GetAppDisplay) pending. (gdb) r Starting program: /usr/bin/harbour-webview [...] Thread 36 "Compositor" hit Breakpoint 1, mozilla::gl::GetAppDisplay () at gfx/gl/GLContextProviderEGL.cpp:175 175 static EGLDisplay GetAppDisplay() { (gdb) bt #0 mozilla::gl::GetAppDisplay () at gfx/gl/GLContextProviderEGL.cpp:175 #1 0x0000007fb8e81a2c in mozilla::gl::GLContextProviderEGL::CreateOffscreen ( size=..., minCaps=..., flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, out_failureId=out_failureId@entry=0x7fa516f378) at gfx/gl/GLContextProviderEGL.cpp:1404 #2 0x0000007fb8ee275c in mozilla::layers::CompositorOGL::CreateContext ( this=0x7eac003420) at gfx/layers/opengl/CompositorOGL.cpp:250 #3 mozilla::layers::CompositorOGL::CreateContext (this=0x7eac003420) at gfx/layers/opengl/CompositorOGL.cpp:223 #4 0x0000007fb8f033bc in mozilla::layers::CompositorOGL::Initialize ( this=0x7eac003420, out_failureReason=0x7fa516f730) at gfx/layers/opengl/CompositorOGL.cpp:374 #5 0x0000007fb8fdaf7c in mozilla::layers::CompositorBridgeParent:: NewCompositor (this=this@entry=0x7f8c99d8f0, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1534 [...] #25 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb) c Continuing. [New Thread 0x7fa4afe9a0 (LWP 5119)] =============== Preparing offscreen rendering context =============== [...]From this we can see it only gets called once in the GLContextProviderEGL::CreateOffscreen() method. We're also interested in the value it gets set to and that's going to be the next thing to look at. But first off I just want to compare how often the method gets called and where from.
So let's perform the same process but this time using the ESR 91 build for comparison:
(gdb) b GetAppDisplay Function "GetAppDisplay" not defined. Make breakpoint pending on future shared library load? (y or [n]) y Breakpoint 1 (GetAppDisplay) pending. (gdb) r Starting program: /usr/bin/harbour-webview [...] Thread 37 "Compositor" hit Breakpoint 1, mozilla::gl::GetAppDisplay () at gfx/gl/GLContextProviderEGL.cpp:161 161 static EGLDisplay GetAppDisplay() { (gdb) handle SIGPIPE nostop Signal Stop Print Pass to program Description SIGPIPE No Yes Yes Broken pipe (gdb) bt #0 mozilla::gl::GetAppDisplay () at gfx/gl/GLContextProviderEGL.cpp:161 #1 0x0000007ff112e910 in mozilla::gl::GLContextProviderEGL::CreateHeadless ( desc=..., out_failureId=out_failureId@entry=0x7f1f5ac1c8) at gfx/gl/GLContextProviderEGL.cpp:1434 #2 0x0000007ff112f260 in mozilla::gl::GLContextProviderEGL::CreateOffscreen ( size=..., minCaps=..., flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, out_failureId=out_failureId@entry=0x7f1f5ac1c8) at gfx/gl/GLContextProviderEGL.cpp:1476 #3 0x0000007ff1197d40 in mozilla::layers::CompositorOGL::CreateContext ( this=this@entry=0x7ed8002f10) at gfx/layers/opengl/CompositorOGL.cpp:254 #4 0x0000007ff11ad520 in mozilla::layers::CompositorOGL::Initialize ( this=0x7ed8002f10, out_failureReason=0x7f1f5ac520) at gfx/layers/opengl/CompositorOGL.cpp:391 #5 0x0000007ff12c31fc in mozilla::layers::CompositorBridgeParent:: NewCompositor (this=this@entry=0x7fc4b77630, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1493 [...] #24 0x0000007ff6a0289c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb) c Continuing. Thread 37 "Compositor" hit Breakpoint 1, mozilla::gl::GetAppDisplay () at gfx/gl/GLContextProviderEGL.cpp:161 161 static EGLDisplay GetAppDisplay() { (gdb) bt #0 mozilla::gl::GetAppDisplay () at gfx/gl/GLContextProviderEGL.cpp:161 #1 0x0000007ff111907c in mozilla::gl::DefaultEglLibrary ( out_failureId=out_failureId@entry=0x7f1f5ac1c8) at gfx/gl/GLContextProviderEGL.cpp:1519 #2 0x0000007ff112e920 in mozilla::gl::DefaultEglDisplay (aDisplay=0x1, out_failureId=0x7f1f5ac1c8) at gfx/gl/GLContextEGL.h:29 #3 mozilla::gl::GLContextProviderEGL::CreateHeadless (desc=..., out_failureId=out_failureId@entry=0x7f1f5ac1c8) at gfx/gl/GLContextProviderEGL.cpp:1434 #4 0x0000007ff112f260 in mozilla::gl::GLContextProviderEGL::CreateOffscreen ( size=..., minCaps=..., flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, out_failureId=out_failureId@entry=0x7f1f5ac1c8) at gfx/gl/GLContextProviderEGL.cpp:1476 #5 0x0000007ff1197d40 in mozilla::layers::CompositorOGL::CreateContext ( this=this@entry=0x7ed8002f10) at gfx/layers/opengl/CompositorOGL.cpp:254 #6 0x0000007ff11ad520 in mozilla::layers::CompositorOGL::Initialize ( this=0x7ed8002f10, out_failureReason=0x7f1f5ac520) at gfx/layers/opengl/CompositorOGL.cpp:391 #7 0x0000007ff12c31fc in mozilla::layers::CompositorBridgeParent:: NewCompositor (this=this@entry=0x7fc4b77630, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1493 [...] #26 0x0000007ff6a0289c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 Continuing. [New Thread 0x7f1f1fe7e0 (LWP 16054)] =============== Preparing offscreen rendering context =============== [...]This time the breakpoint hits twice. First the method is called from GLContextProviderEGL::CreateHeadless(). Soon after it's called from DefaultEglLibrary(). But as it happens this itself is being called from GLContextProviderEGL::CreateHeadless(). So it shouldn't be too hard to follow this through and figure out why it's getting called twice instead of once.
Moreover, the CreateHeadless() method is itself being called from GLContextProviderEGL::CreateOffscreen(). Recall that this is the place where the GetAppDisplay() method was being called from in ESR 78. So all of these calls are happening in close proximity of one another.
I'm going to put the analysis of this to one side for now and come back to it. The next question to figure out is what the value being returned by the method is.
I'm not able to extract this from inside the GetAppDisplay() method because of the way the method is structured. But the method returns the value that I'm interested in. In all cases this value gets immediately passed in to another method, so if I can breakpoint on the method it gets passed to I can then extract the value from the input parameters of the method.
On ESR 78 the relevant method is the following. Notice how it has an aDisplay parameter. That's the parameter we're interested in.
/* static */ bool GLLibraryEGL::EnsureInitialized(bool forceAccel, nsACString* const out_failureId, EGLDisplay aDisplay);We have two different methods to consider on ESR 91. Note again how they both have aDisplay parameters. Those are the parameters we need to check.
RefPtr<GLLibraryEGL> GLLibraryEGL::Create(nsACString* const out_failureId, EGLDisplay aDisplay) inline std::shared_ptr<EglDisplay> DefaultEglDisplay(nsACString* const out_failureId, EGLDisplay aDisplay);So, here goes with the ESR 78 build. I've placed a breakpoint on the method so we can extract the parameter.
(gdb) b GLLibraryEGL::EnsureInitialized Breakpoint 2 at 0x7fb8e6ee78: file obj-build-mer-qt-xr/dist/include/mozilla/ StaticPtr.h, line 152. (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /usr/bin/harbour-webview [...] Thread 36 "Compositor" hit Breakpoint 2, mozilla::gl::GLLibraryEGL:: EnsureInitialized (forceAccel=false, out_failureId=0x7fa516f378, aDisplay=0x1) at obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h:152 152 obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h: No such file or directory. (gdb) p /x aDisplay $1 = 0x1 (gdb) c Continuing. Thread 36 "Compositor" hit Breakpoint 2, mozilla::gl::GLLibraryEGL:: EnsureInitialized (forceAccel=false, out_failureId=0x7fa516f378, aDisplay=0x0) at obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h:152 152 in obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h (gdb) p /x aDisplay $2 = 0x0 (gdb) c Continuing. Thread 36 "Compositor" hit Breakpoint 2, mozilla::gl::GLLibraryEGL:: EnsureInitialized (forceAccel=false, out_failureId=0x7fa516f378, aDisplay=0x0) at obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h:152 152 in obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h (gdb) p /x aDisplay $3 = 0x0 (gdb)As can be seen it actually hits this breakpoint three times. The first time the aDisplay value is set to 0x1. Subsequent calls have the value 0x0 passed on.
Let's compare that to ESR 91. This time there are two breakpoints to add and we want to extract the values from both, assuming they hit.
(gdb) b GLLibraryEGL::Create Breakpoint 1 at 0x7ff1118e8c: file gfx/gl/GLLibraryEGL.cpp, line 343. (gdb) b DefaultEglDisplay Breakpoint 2 at 0x7ff112e914: file gfx/gl/GLContextEGL.h, line 29. (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /usr/bin/harbour-webview [...] Thread 38 "Compositor" hit Breakpoint 2, mozilla::gl:: DefaultEglDisplay (aDisplay=0x1, out_failureId=0x7f1f94c1c8) at gfx/gl/GLContextEGL.h:29 29 const auto lib = DefaultEglLibrary(out_failureId); (gdb) p /x aDisplay $1 = 0x1 (gdb) c Continuing. Thread 38 "Compositor" hit Breakpoint 1, mozilla::gl::GLLibraryEGL:: Create (out_failureId=out_failureId@entry=0x7f1f94c1c8, aDisplay=0x1) at gfx/gl/GLLibraryEGL.cpp:343 343 RefPtr<GLLibraryEGL> GLLibraryEGL::Create(nsACString* const out_failureId, EGLDisplay aDisplay) { (gdb) p /x aDisplay $2 = 0x1 (gdb)There are two hits, one for each of our breakpoints. Both are being passed a display value of 0x1.
The difference between the two is concerning. The ESR 78 execution is particularly problematic because there are two values being generated and right now it's not clear which one is actually being used. In order to make progress I need to understand this better.
My plan is to go through each instance of egl->Display() or mEgl->Display() in GLContextProviderEGL.cpp and add a breakpoint to the method containing it so that we can check its value. This will have two purposes. First it will help to understand the flow of execution: which methods should be called and which are unused. Second it will allow me to test what the value of the display is in each case.
You might ask why I don't just place a breakpoint on EGLLibrary::Display() and watch the values flow in. Well, I tried and the breakpoint won't stick. There is some benefit to doing it the hard way, which is that I'm getting to review all of the places in the code where it's used. If I'd taken the easy route I would have missed that opportunity.
Here are the methods that make use of the Display() value. I'm going to place a breakpoint on every single one and record every single usage until the page is completely rendered.
DestroySurface() CreateFallbackSurface() CreateSurfaceFromNativeWindow() GLContextEGLFactory::CreateImpl() GLContextEGL::GLContextEGL() GLContextEGL::~GLContextEGL() GLContextEGL::BindTexImage() GLContextEGL::ReleaseTexImage() GLContextEGL::MakeCurrentImpl() GLContextEGL::RenewSurface() GLContextEGL::SwapBuffers() GLContextEGL::GetWSIInfo() GLContextEGL::GetBufferAge() GLContextEGL::CreateGLContext() GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo() GLContextEGL::CreateWaylandBufferSurface() CreateEmulatorBufferSurface() CreateConfig() CreateEGLSurfaceImpl() GLContextProviderEGL::DestroyEGLSurface() GetAttrib() ChooseConfigOffscreen() GLContextEGL::CreateEGLPBufferOffscreenContextImpl()Here goes!
(gdb) b CreateFallbackSurface Function "CreateFallbackSurface" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b CreateSurfaceFromNativeWindow Breakpoint 2 at 0x7fb8e673a8: CreateSurfaceFromNativeWindow. (2 locations) (gdb) b GLContextEGLFactory::CreateImpl Breakpoint 3 at 0x7fb8e6f460: file gfx/gl/GLContextProviderEGL.cpp, line 372. (gdb) b GLContextEGL::GLContextEGL Breakpoint 4 at 0x7fb8e67a98: file gfx/gl/GLContextProviderEGL.cpp, line 472. (gdb) b GLContextEGL::~GLContextEGL Breakpoint 5 at 0x7fb8e81668: GLContextEGL::~GLContextEGL. (2 locations) (gdb) b GLContextEGL::BindTexImage Breakpoint 6 at 0x7fb8e66328: GLContextEGL::BindTexImage. (2 locations) (gdb) b GLContextEGL::ReleaseTexImage Breakpoint 7 at 0x7fb8e663a8: GLContextEGL::ReleaseTexImage. (2 locations) (gdb) b GLContextEGL::MakeCurrentImpl Breakpoint 8 at 0x7fb8e66420: GLContextEGL::MakeCurrentImpl. (2 locations) (gdb) b GLContextEGL::RenewSurface Breakpoint 9 at 0x7fb8e67358: GLContextEGL::RenewSurface. (2 locations) (gdb) b GLContextEGL::SwapBuffers Breakpoint 10 at 0x7fb8e6ae40: file gfx/gl/GLContextProviderEGL.cpp, line 643. (gdb) b GLContextEGL::GetWSIInfo Breakpoint 11 at 0x7fb8e65da0: file gfx/gl/GLContextProviderEGL.cpp, line 674. (gdb) b GLContextEGL::GetBufferAge Function "GLContextEGL::GetBufferAge" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b GLContextEGL::CreateGLContext Breakpoint 12 at 0x7fb8e69ef0: file gfx/gl/GLContextProviderEGL.cpp, line 719. (gdb) b GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo Breakpoint 13 at 0x7fb8e6bf80: file gfx/gl/GLContextProviderEGL.cpp, line 854. (gdb) b GLContextEGL::CreateWaylandBufferSurface Function "GLContextEGL::CreateWaylandBufferSurface" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b CreateEmulatorBufferSurface Breakpoint 14 at 0x7fb8e67b40: file gfx/gl/GLContextProviderEGL.cpp, line 953. (gdb) b CreateConfig Breakpoint 15 at 0x7fb8e6a498: CreateConfig. (5 locations) (gdb) b CreateEGLSurfaceImpl Function "CreateEGLSurfaceImpl" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b GLContextProviderEGL::DestroyEGLSurface Function "GLContextProviderEGL::DestroyEGLSurface" not defined. Make breakpoint pending on future shared library load? (y or [n]) n (gdb) b GetAttrib Breakpoint 16 at 0x7fb8e6ab64: GetAttrib. (4 locations) (gdb) b ChooseConfigOffscreen Breakpoint 17 at 0x7fb8e6a920: file gfx/gl/GLContextProviderEGL.cpp, line 1265. (gdb) b GLContextEGL::CreateEGLPBufferOffscreenContextImpl Breakpoint 18 at 0x7fb8e6f130: GLContextEGL:: CreateEGLPBufferOffscreenContextImpl. (2 locations) (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /usr/bin/harbour-webview [...]That's all of the breakpoints set. Or at least, all that it was possible to make stick. The obvious thing for me to copy over now is the places where the breakpoint actually hit. But there are a lot of them and going through them is very laborious, so I'll just include a selection to give you a flavour.
Thread 38 "Compositor" hit Breakpoint 18, mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContextImpl ( flags=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, size=..., minCaps=..., aUseGles=false, out_failureId=0x7fa55eb378) at gfx/gl/GLContextProviderEGL.cpp:1305 1305 nsACString* const out_failureId) { (gdb) n 1309 if (!GLLibraryEGL::EnsureInitialized(forceEnableHardware, out_failureId)) { (gdb) n Thread 38 "Compositor" hit Breakpoint 18, mozilla::gl::GLContextEGL:: CreateEGLPBufferOffscreenContextImpl (out_failureId=0x7fa55eb378, aUseGles=false, minCaps=..., size=..., flags=mozilla::gl::CreateContextFlags:: REQUIRE_COMPAT_PROFILE) at gfx/gl/GLContextProviderEGL.cpp:1313 1313 auto* egl = gl::GLLibraryEGL::Get(); (gdb) n 20 struct SurfaceCaps final { (gdb) p /x egl->mEGLDisplay $2 = 0x1 (gdb) c Continuing. Thread 38 "Compositor" hit Breakpoint 17, mozilla::gl:: ChooseConfigOffscreen (egl=egl@entry=0x7eac0036a0, flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, minCaps=..., aUseGles=aUseGles@entry=false, useWindow=useWindow@entry=false, out_configCaps=out_configCaps@entry=0x7fa55eb210) at gfx/gl/GLContextProviderEGL.cpp:1265 1265 SurfaceCaps* const out_configCaps) { (gdb) n 1266 nsTArray<EGLint> configAttribList; (gdb) p /x egl->mEGLDisplay $3 = 0x1 (gdb) c Continuing. [...] Thread 38 "Compositor" hit Breakpoint 8, mozilla::gl::GLContextEGL:: MakeCurrentImpl (this=0x7eac108cd0) at gfx/gl/GLContextProviderEGL.cpp:571 571 EGLSurface surface = (gdb) p /x mEgl.mRawPtr->mEGLDisplay $21 = 0x1 (gdb) c Continuing. [New Thread 0x7fa4afe9a0 (LWP 19251)] =============== Preparing offscreen rendering context =============== [New Thread 0x7fae9649a0 (LWP 19252)] [Thread 0x7fa50ee9a0 (LWP 12902) exited] [Thread 0x7fa51ff9a0 (LWP 12872) exited] [Thread 0x7fa512f9a0 (LWP 12900) exited] [New Thread 0x7fae86d9a0 (LWP 19253)] [New Thread 0x7fa48fd9a0 (LWP 19254)] [New Thread 0x7fa48bc9a0 (LWP 19255)] [New Thread 0x7fa487b9a0 (LWP 19256)] [New Thread 0x7fa512f9a0 (LWP 19257)] Thread 38 "Compositor" hit Breakpoint 8, mozilla::gl::GLContextEGL:: MakeCurrentImpl (this=0x7eac108cd0) at gfx/gl/GLContextProviderEGL.cpp:571 571 EGLSurface surface = (gdb) p /x mEgl.mRawPtr->mEGLDisplay $22 = 0x1 (gdb) c Continuing. [Thread 0x7fa4afe9a0 (LWP 19251) exited] [...]Phew. One thing we can see from this is that every actual use of the display is set to 0x1. That's true for the onces I cut out as well. Now we have to check the same thing on ESR 91. It's a bit late to do that tonight, but it'll be first thing on my to-do list tomorrow.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
4 Apr 2024 : Day 206 #
Testing out my overnight build I'm pleased to discover that the infinite recursion is now resolved — no more segfaults — but rendering has still not returned. Nevertheless I should check that the mInternalDrawFB and mInternalReadFB are now being set correctly.
Back on Day 204 I placed a breakpoint on CompositorOGL::CreateContext() for this, stepping through to check the values:
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
Back on Day 204 I placed a breakpoint on CompositorOGL::CreateContext() for this, stepping through to check the values:
Thread 36 "Compositor" hit Breakpoint 1, mozilla::layers:: CompositorOGL::CreateContext (this=0x7eac003420) at gfx/layers/opengl/CompositorOGL.cpp:223 223 already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() { [...] (gdb) n mozilla::layers::CompositorOGL::Initialize (this=0x7eac003420, out_failureReason=0x7fa516f730) at gfx/layers/opengl/CompositorOGL.cpp:374 374 mGLContext = CreateContext(); (gdb) n 383 if (!mGLContext) { (gdb) p mGLContext $5 = {mRawPtr = 0x7eac109140} (gdb) p mGLContext.mRawPtr $6 = (mozilla::gl::GLContext *) 0x7eac109140 [...] (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mUserDrawFB $12 = 0 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mUserReadFB $13 = 0 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mInternalDrawFB $14 = 2 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mInternalReadFB $15 = 2 (gdb)So I'll do the same on the latest version of the code to see whether we have the difference we need. Here's the output:
mozilla::layers::CompositorOGL::Initialize (this=0x7ee0002f10, out_failureReason=0x7f1f923520) at gfx/layers/opengl/CompositorOGL.cpp:391 391 mGLContext = CreateContext(); (gdb) 401 if (!mGLContext) { (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mUserDrawFB $1 = 0 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mUserReadFB $2 = 0 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mInternalDrawFB $3 = 2 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mInternalReadFB $4 = 2 (gdb)That does look a lot more healthy now. So, with that in order, it's back to stepping through the execution, comparing ESR 78 with ESR 91 again. But that's for today. As you can see, it's only a short one today — the checking took a lot longer than I was expecting — but there's plenty still to do and with any luck I'll find time to pick up the pace a bit tomorrow and over the weekend.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
3 Apr 2024 : Day 205 #
Time to test out my overnight build. Yesterday I added in some code that had got lost in the transition from ESR 78 to ESR 91. The code allows the name of the frame buffer objects created for reading and writing to be recorded for use within gecko when they get bound to a target.
This is achieved by (sort-of) overriding the call to fBindFramebuffer. Instead of just calling the relevant OpenGL function (in this case glBindFramebuffer() as all of the other GL library methods do, this new call does a bit more work before calling this underlying function. Here's the code for the underlying method (this is the same snippet I gave in my diary entry yesterday):
I built new packages overnight, but now when I run them this morning I immediately get a segmentation fault.
This clearly isn't just your standard invalid memory read or write; it's the program running out of stack due to an accidental recursion. Fascinating! But not intentional! The GLContext::fBindFramebuffer() method is calling GLScreenBuffer::BindFB()< which is again calling GLContext::fBindFramebuffer() creating a loop of pairs of calls repeatedly calling themselves over and over again. Let's go back to the code to find out why. As it happens we already saw the code for the two methods above. And sure enough fBindFramebuffer() calls mScreen->BindFB() which then goes on to call mGL->fBindFramebuffer().
This code is guaranteed to cause problems and I should have noticed it yesterday before setting the build to run. The code in the fBindFramebuffer() method came directly from ESR 78, so the issue must be a difference in implementation between the version of BindFB() on ESR 78 compared to ESR 91. So let's check the ESR 78 version of that:
There are many other cases where fBindFramebuffer() gets called but a quick search of the ESR 78 code shows that these are the only ones that have been switched to use raw_fBindFramebuffer(), other than the couple of additional calls that we already have inside our new fBindFramebuffer() override.
With any luck, that should be enough to fix the infinite calling loop. I've set the build going, so now it's a case of finding out what effect this will have tomorrow morning.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
This is achieved by (sort-of) overriding the call to fBindFramebuffer. Instead of just calling the relevant OpenGL function (in this case glBindFramebuffer() as all of the other GL library methods do, this new call does a bit more work before calling this underlying function. Here's the code for the underlying method (this is the same snippet I gave in my diary entry yesterday):
void GLContext::fBindFramebuffer(GLenum target, GLuint framebuffer) { if (!mScreen) { raw_fBindFramebuffer(target, framebuffer); return; } switch (target) { case LOCAL_GL_DRAW_FRAMEBUFFER_EXT: mScreen->BindDrawFB(framebuffer); return; case LOCAL_GL_READ_FRAMEBUFFER_EXT: mScreen->BindReadFB(framebuffer); return; case LOCAL_GL_FRAMEBUFFER: mScreen->BindFB(framebuffer); return; default: // Nothing we care about, likely an error. break; } raw_fBindFramebuffer(target, framebuffer); }In this code there are a couple of calls to raw_fBindFramebuffer(). These are the calls that execute the underlying OpenGL function. But there are also calls to GLScreenBuffer::BindDrawFB(), GLScreenBuffer::BindReadFB() and GLScreenBuffer::BindFB(). These all do some additional work, for example the code for the last of these looks like this:
void GLScreenBuffer::BindFB(GLuint fb) { GLuint drawFB = DrawFB(); GLuint readFB = ReadFB(); mUserDrawFB = fb; mUserReadFB = fb; mInternalDrawFB = (fb == 0) ? drawFB : fb; mInternalReadFB = (fb == 0) ? readFB : fb; if (mInternalDrawFB == mInternalReadFB) { mGL->fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, mInternalDrawFB); } else { MOZ_ASSERT(mGL->IsSupported(GLFeature::split_framebuffer)); mGL->fBindFramebuffer(LOCAL_GL_DRAW_FRAMEBUFFER_EXT, mInternalDrawFB); mGL->fBindFramebuffer(LOCAL_GL_READ_FRAMEBUFFER_EXT, mInternalReadFB); } }As we can see, this records the value of fb inside the GLSCreenBuffer object and then goes on to perform some other OpenGL binding actions. Without the overriding GLContext::fBindFramebuffer() method the frame buffer names never get recorded. That could be quite a serious gap in the rendering process, so it'll be good to have it fixed.
I built new packages overnight, but now when I run them this morning I immediately get a segmentation fault.
$ harbour-webview [D] unknown:0 - QML debugging is enabled. Only use this in a safe environment. [...] [JavaScript Error: "Unexpected event profile-after-change" {file: "resource://gre/modules/URLQueryStrippingListService.jsm" line: 228}] observe@resource://gre/modules/URLQueryStrippingListService.jsm:228:12 Created LOG for EmbedPrefs Created LOG for EmbedLiteLayerManager Segmentation fault $When I run it through the debugger to get a backtrace I get an unexpected and unusual and unexpected result:
$ gdb harbour-webview [...] (gdb) r Starting program: /usr/bin/harbour-webview [...] Created LOG for EmbedLiteLayerManager Thread 36 "Compositor" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f1f7c07e0 (LWP 18578)] mozilla::gl::GLScreenBuffer::BindFB (this=0x7fc4697440, fb=fb@entry=2) at gfx/gl/GLScreenBuffer.cpp:306 306 void GLScreenBuffer::BindFB(GLuint fb) { (gdb) bt #0 mozilla::gl::GLScreenBuffer::BindFB (this=0x7fc4697440, fb=fb@entry=2) at gfx/gl/GLScreenBuffer.cpp:306 #1 0x0000007ff110e244 in mozilla::gl::GLContext::fBindFramebuffer ( this=0x7edc19aa50, target=target@entry=36160, framebuffer=2) at gfx/gl/GLContext.cpp:2256 #2 0x0000007ff1100ecc in mozilla::gl::GLScreenBuffer::BindFB (this=<optimized out>, fb=fb@entry=2) at gfx/gl/GLScreenBuffer.cpp:316 #3 0x0000007ff110e244 in mozilla::gl::GLContext::fBindFramebuffer ( this=0x7edc19aa50, target=target@entry=36160, framebuffer=2) at gfx/gl/GLContext.cpp:2256 #4 0x0000007ff1100ecc in mozilla::gl::GLScreenBuffer::BindFB (this=<optimized out>, fb=fb@entry=2) at gfx/gl/GLScreenBuffer.cpp:316 #5 0x0000007ff110e244 in mozilla::gl::GLContext::fBindFramebuffer ( this=0x7edc19aa50, target=target@entry=36160, framebuffer=2) at gfx/gl/GLContext.cpp:2256 #6 0x0000007ff1100ecc in mozilla::gl::GLScreenBuffer::BindFB (this=<optimized out>, fb=fb@entry=2) at gfx/gl/GLScreenBuffer.cpp:316 #7 0x0000007ff110e244 in mozilla::gl::GLContext::fBindFramebuffer ( this=0x7edc19aa50, target=target@entry=36160, framebuffer=2) at gfx/gl/GLContext.cpp:2256 #8 0x0000007ff1100ecc in mozilla::gl::GLScreenBuffer::BindFB (this=<optimized out>, fb=fb@entry=2) at gfx/gl/GLScreenBuffer.cpp:316 #9 0x0000007ff110e244 in mozilla::gl::GLContext::fBindFramebuffer ( this=0x7edc19aa50, target=target@entry=36160, framebuffer=2) at gfx/gl/GLContext.cpp:2256 #10 0x0000007ff1100ecc in mozilla::gl::GLScreenBuffer::BindFB (this=<optimized out>, fb=fb@entry=2) at gfx/gl/GLScreenBuffer.cpp:316 [...] #335 0x0000007ff110e244 in mozilla::gl::GLContext::fBindFramebuffer ( this=0x7edc19aa50, target=target@entry=36160, framebuffer=2) at gfx/gl/GLContext.cpp:2256 #336 0x0000007ff1100ecc in mozilla::gl::GLScreenBuffer::BindFB (this=<optimized out>, fb=fb@entry=2) at gfx/gl/GLScreenBuffer.cpp:316 #337 0x0000007ff110e244 in mozilla::gl::GLContext::fBindFramebuffer ( this=0x7edc19aa50, target=target@entry=36160, framebuffer=2) at gfx/gl/GLContext.cpp:2256 [...]It goes on and on and on until eventually I give up following it.
This clearly isn't just your standard invalid memory read or write; it's the program running out of stack due to an accidental recursion. Fascinating! But not intentional! The GLContext::fBindFramebuffer() method is calling GLScreenBuffer::BindFB()< which is again calling GLContext::fBindFramebuffer() creating a loop of pairs of calls repeatedly calling themselves over and over again. Let's go back to the code to find out why. As it happens we already saw the code for the two methods above. And sure enough fBindFramebuffer() calls mScreen->BindFB() which then goes on to call mGL->fBindFramebuffer().
This code is guaranteed to cause problems and I should have noticed it yesterday before setting the build to run. The code in the fBindFramebuffer() method came directly from ESR 78, so the issue must be a difference in implementation between the version of BindFB() on ESR 78 compared to ESR 91. So let's check the ESR 78 version of that:
void GLScreenBuffer::BindFB(GLuint fb) { GLuint drawFB = DrawFB(); GLuint readFB = ReadFB(); mUserDrawFB = fb; mUserReadFB = fb; mInternalDrawFB = (fb == 0) ? drawFB : fb; mInternalReadFB = (fb == 0) ? readFB : fb; if (mInternalDrawFB == mInternalReadFB) { mGL->raw_fBindFramebuffer(LOCAL_GL_FRAMEBUFFER, mInternalDrawFB); } else { MOZ_ASSERT(mGL->IsSupported(GLFeature::split_framebuffer)); mGL->raw_fBindFramebuffer(LOCAL_GL_DRAW_FRAMEBUFFER_EXT, mInternalDrawFB); mGL->raw_fBindFramebuffer(LOCAL_GL_READ_FRAMEBUFFER_EXT, mInternalReadFB); } }The difference is clear. While on ESR 78 the code has been updated to call the underlying raw_fBindFramebuffer() method, I didn't make this change on ESR 91. So, time to do that now. Apart from this small change the rest of the code in the BindFB() method looks identical. There are a few other changes to make though as it turns out there are similar issues going on with BindDrawFB(), BindReadFB() and BindReadFB_Internal() as well. The last of these is rather mysterious and doesn't seem to get called anywhere, but I've updated it just in case.
There are many other cases where fBindFramebuffer() gets called but a quick search of the ESR 78 code shows that these are the only ones that have been switched to use raw_fBindFramebuffer(), other than the couple of additional calls that we already have inside our new fBindFramebuffer() override.
With any luck, that should be enough to fix the infinite calling loop. I've set the build going, so now it's a case of finding out what effect this will have tomorrow morning.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
2 Apr 2024 : Day 204 #
It's refreshing to be working with a fully built binary again with all the debug source aligning with the binary. Debugging is so much more fulfilling this way.
That's good, because I spent today doing lots more debugging. The first thing I did today is step through the CompositorOGL::BeginFrame() method so see whether my changes were actually being executed. In particular, I wanted to know whether the colour used to clear the texture was being set to something other than white.
Debugging confirms that it is:
However, while digging through the code I did discover some anomalies. Anomalies caused by patches making changes in ESR 78 that I'd not applied to ESR 91. The two patches are 0070 "Fix flipped FBO textures when rendering to an offscreen target" and 0071 "Do not flip scissor rects when rendering to an offscreen window." Both are certainly relevant to the parts of the code I'm working on. I have my doubts that rendering would be completely scuppered without them, but fixing them still looks worthwhile.
After all, the changes are likely to be needed anyway, even if they're not going to solve the rendering problem on their own.
So I've applied them both. They're both small patches and so easy to introduce into the ESR 91 code.
Although I've done a quick build and executed the code to check the screen has remained stubbornly blank. So as suspected these aren't the only problems I need to fix. So now I've reached the point where it feels it might be helpful to step through the code, side-by-side, with ESR 78 and ESR 91 simultaneously.
The purpose of this is to try to find where the two diverge. If there are differences, this could be the source of my troubles. It's a laborious but thorough process and so feels to me to be the most fruitful way forwards at this point.
Since I'm already working in this area the CompositorOGL::CreateContext() method seems like as good a place to start as any, so I've been stepping through from there for most of the day.
Eventually I notice a difference between the execution of ESR 78 and that of ESR 91. I pored over the values assigned in the mScreen object. This is of type GLScreenBuffer and is one of the class variables in GLContext (note that this is GLContext rather than the GLContextEGL that we were focusing on last week).
The interesting parts are the frame buffer variables. Here's what they look like on ESR 78:
The next step is to find out where they're getting set on ESR 78. It's immediately clear from the code that the mInternalDrawFB and mInternalReadFB variables get set in the GLContext::fBindFramebuffer() method, but actually figuring out where this happens turns out to be more challenging. Here's what the relevant method looks like:
Eventually I do get to the right point though. When reading through the debugging steps it's useful to know that the value of LOCAL_GL_FRAMEBUFFER is defined to be 0x8D40 in GLConsts.h:
Now this is all well and good, but the real question is why this is happening correctly on ESR 78 but not ESR 91. When I eventually get to checking the ESR 91 code I discover it's because on ESR 91 the fBindFramebuffer() method doesn't have the same wrapper as on ESR 78; instead it goes straight for the library method:
I've done this and checked it compiles. But to properly understand the resulting effect I'm going to need to step through the code again, which means a full rebuild will be needed. It's 21:31 in the evening now so, as you know, that means an overnight build. Probably now is a good time to stop anyway. So onward to tomorrow, when we'll see if this has made any practical difference!
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
That's good, because I spent today doing lots more debugging. The first thing I did today is step through the CompositorOGL::BeginFrame() method so see whether my changes were actually being executed. In particular, I wanted to know whether the colour used to clear the texture was being set to something other than white.
Debugging confirms that it is:
[...] Thread 37 "Compositor" hit Breakpoint 1, mozilla::layers:: CompositorOGL::BeginFrame (this=0x7ed8002ed0, aInvalidRegion=..., aClipRect=..., aRenderBounds=..., aOpaqueRegion=...) at gfx/layers/opengl/CompositorOGL.cpp:1084 1084 mClearColor.r = 0.0; (gdb) n [Thread 0x7f172fe7e0 (LWP 2982) exited] 1085 mClearColor.g = 0.0; (gdb) 1086 mClearColor.b = 1.0; (gdb) [New Thread 0x7f170bc7e0 (LWP 3141)] 1087 mClearColor.a = 0.0; (gdb) 1088 mGLContext->fClearColor(mClearColor.r, mClearColor.g, mClearColor.b, (gdb) [New Thread 0x7f1707b7e0 (LWP 3143)] 1091 mGLContext->fClear(clearBits); (gdb) 1093 return Some(rect); 1084 mClearColor.r = 0.0; (gdb) disable break (gdb) c Continuing.But this has no effect on the screen output. Maybe this isn't where the background colour gets set at all? My explorations with ESR 78 yesterday would seem to imply this is the case. So I've dug around the code a bit more but can't see anywhere more appropriate to add similar changes in. There are a couple of other places where textures are cleared so I tried changing the colours for those as well but to no avail.
However, while digging through the code I did discover some anomalies. Anomalies caused by patches making changes in ESR 78 that I'd not applied to ESR 91. The two patches are 0070 "Fix flipped FBO textures when rendering to an offscreen target" and 0071 "Do not flip scissor rects when rendering to an offscreen window." Both are certainly relevant to the parts of the code I'm working on. I have my doubts that rendering would be completely scuppered without them, but fixing them still looks worthwhile.
After all, the changes are likely to be needed anyway, even if they're not going to solve the rendering problem on their own.
So I've applied them both. They're both small patches and so easy to introduce into the ESR 91 code.
Although I've done a quick build and executed the code to check the screen has remained stubbornly blank. So as suspected these aren't the only problems I need to fix. So now I've reached the point where it feels it might be helpful to step through the code, side-by-side, with ESR 78 and ESR 91 simultaneously.
The purpose of this is to try to find where the two diverge. If there are differences, this could be the source of my troubles. It's a laborious but thorough process and so feels to me to be the most fruitful way forwards at this point.
Since I'm already working in this area the CompositorOGL::CreateContext() method seems like as good a place to start as any, so I've been stepping through from there for most of the day.
Eventually I notice a difference between the execution of ESR 78 and that of ESR 91. I pored over the values assigned in the mScreen object. This is of type GLScreenBuffer and is one of the class variables in GLContext (note that this is GLContext rather than the GLContextEGL that we were focusing on last week).
The interesting parts are the frame buffer variables. Here's what they look like on ESR 78:
Thread 36 "Compositor" hit Breakpoint 1, mozilla::layers:: CompositorOGL::CreateContext (this=0x7eac003420) at gfx/layers/opengl/CompositorOGL.cpp:223 223 already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() { [...] (gdb) n mozilla::layers::CompositorOGL::Initialize (this=0x7eac003420, out_failureReason=0x7fa516f730) at gfx/layers/opengl/CompositorOGL.cpp:374 374 mGLContext = CreateContext(); (gdb) n 383 if (!mGLContext) { (gdb) p mGLContext $5 = {mRawPtr = 0x7eac109140} (gdb) p mGLContext.mRawPtr $6 = (mozilla::gl::GLContext *) 0x7eac109140 [...] (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mUserDrawFB $12 = 0 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mUserReadFB $13 = 0 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mInternalDrawFB $14 = 2 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mInternalReadFB $15 = 2 (gdb)But on ESR 91 these same values are decidedly more zero:
Thread 37 "Compositor" hit Breakpoint 1, mozilla::layers:: CompositorOGL::CreateContext (this=this@entry=0x7ed8002f10) at gfx/layers/opengl/CompositorOGL.cpp:227 227 already_AddRefed<mozilla::gl::GLContext> CompositorOGL::CreateContext() { [...] (gdb) n 401 if (!mGLContext) { (gdb) p mGLContext $6 = {mRawPtr = 0x7ed819aa50} (gdb) p mGLContext.mRawPtr $7 = (mozilla::gl::GLContext *) 0x7ed819aa50 [...] (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mUserDrawFB $10 = 0 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mUserReadFB $11 = 0 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mInternalDrawFB $12 = 0 (gdb) p mGLContext.mRawPtr->mScreen.mTuple.mFirstA.mInternalReadFB $13 = 0 (gdb)That looks suspicious to me. A zero value suggests they've not been initialised at all, compared to ESR 78 where they quite clearly have been initialised. It looks broken. But that's also great! Something very concrete to fix.
The next step is to find out where they're getting set on ESR 78. It's immediately clear from the code that the mInternalDrawFB and mInternalReadFB variables get set in the GLContext::fBindFramebuffer() method, but actually figuring out where this happens turns out to be more challenging. Here's what the relevant method looks like:
void GLContext::fBindFramebuffer(GLenum target, GLuint framebuffer) { if (!mScreen) { raw_fBindFramebuffer(target, framebuffer); return; } switch (target) { case LOCAL_GL_DRAW_FRAMEBUFFER_EXT: mScreen->BindDrawFB(framebuffer); return; case LOCAL_GL_READ_FRAMEBUFFER_EXT: mScreen->BindReadFB(framebuffer); return; case LOCAL_GL_FRAMEBUFFER: mScreen->BindFB(framebuffer); return; default: // Nothing we care about, likely an error. break; } raw_fBindFramebuffer(target, framebuffer); }It takes quite a bit of stepping through to find the right call, because the the method is called multiple times and usually exits early due to the fact there's no mScreen value set.
Eventually I do get to the right point though. When reading through the debugging steps it's useful to know that the value of LOCAL_GL_FRAMEBUFFER is defined to be 0x8D40 in GLConsts.h:
#define LOCAL_GL_FRAMEBUFFER 0x8D40That should help clarify what's going on here:
Thread 36 "Compositor" hit Breakpoint 4, mozilla::gl::GLContext:: fBindFramebuffer (this=this@entry=0x7ea8109140, target=target@entry=36160, framebuffer=framebuffer@entry=0) at obj-build-mer-qt-xr/dist/include/ mozilla/UniquePtr.h:287 287 obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h: No such file or directory. (gdb) n 2325 raw_fBindFramebuffer(target, framebuffer); (gdb) p mScreen.mTuple.mFirstA $17 = (mozilla::gl::GLScreenBuffer *) 0x0 (gdb) c Continuing. [...] Thread 36 "Compositor" hit Breakpoint 4, mozilla::gl::GLContext:: fBindFramebuffer (this=0x7ea8109140, target=36160, framebuffer=0) at obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h:287 287 obj-build-mer-qt-xr/dist/include/mozilla/UniquePtr.h: No such file or directory. (gdb) n 2329 switch (target) { (gdb) p /x target $21 = 0x8d40 (gdb) n 2339 mScreen->BindFB(framebuffer);So this is definitely the right place. But the backtrace produced from this is confusing because checking the GLContext::CreateScreenBufferImpl() method, which is the call on the second frame, none of the code in that method actually calls GLContext::fBindFramebuffer().
(gdb) bt #0 mozilla::gl::GLContext::fBindFramebuffer (this=0x7ea8109140, target=36160, framebuffer=0) at gfx/gl/GLContext.cpp:2339 #1 0x0000007fb8e81890 in mozilla::gl::GLContext::CreateScreenBufferImpl ( this=this@entry=0x7ea8109140, size=..., caps=...) at gfx/gl/GLContext.cpp:2135 #2 0x0000007fb8e818ec in mozilla::gl::GLContext::CreateScreenBuffer (caps=..., size=..., this=0x7ea8109140) at gfx/gl/GLContext.h:3517 #3 mozilla::gl::GLContext::InitOffscreen (this=0x7ea8109140, size=..., caps=...) at gfx/gl/GLContext.cpp:2578 #4 0x0000007fb8e81ac8 in mozilla::gl::GLContextProviderEGL::CreateOffscreen ( size=..., minCaps=..., flags=flags@entry=mozilla::gl::CreateContextFlags::REQUIRE_COMPAT_PROFILE, out_failureId=out_failureId@entry=0x7fa51fe378) at gfx/gl/GLContextProviderEGL.cpp:1443 #5 0x0000007fb8ee275c in mozilla::layers::CompositorOGL::CreateContext ( this=0x7ea8003420) at gfx/layers/opengl/CompositorOGL.cpp:250 #6 mozilla::layers::CompositorOGL::CreateContext (this=0x7ea8003420) at gfx/layers/opengl/CompositorOGL.cpp:223 #7 0x0000007fb8f033bc in mozilla::layers::CompositorOGL::Initialize ( this=0x7ea8003420, out_failureReason=0x7fa51fe730) at gfx/layers/opengl/CompositorOGL.cpp:374 #8 0x0000007fb8fdaf7c in mozilla::layers::CompositorBridgeParent:: NewCompositor (this=this@entry=0x7f8c99db50, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1534 #9 0x0000007fb8fe45e8 in mozilla::layers::CompositorBridgeParent:: InitializeLayerManager (this=this@entry=0x7f8c99db50, aBackendHints=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1491 #10 0x0000007fb8fe4730 in mozilla::layers::CompositorBridgeParent:: AllocPLayerTransactionParent (this=this@entry=0x7f8c99db50, aBackendHints=..., aId=...) at gfx/layers/ipc/CompositorBridgeParent.cpp:1587 #11 0x0000007fbb2e11b4 in mozilla::embedlite::EmbedLiteCompositorBridgeParent:: AllocPLayerTransactionParent (this=0x7f8c99db50, aBackendHints=..., aId=...) at mobile/sailfishos/embedthread/ EmbedLiteCompositorBridgeParent.cpp:77 #12 0x0000007fb88bf3d0 in mozilla::layers::PCompositorBridgeParent:: OnMessageReceived (this=0x7f8c99db50, msg__=...) at PCompositorBridgeParent.cpp:1391 [...] #28 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/ clone.S:78 (gdb)It looks like the reason for this is that it's being called as part of the ScopedBindFramebuffer wrapper. But it's confusing.
Now this is all well and good, but the real question is why this is happening correctly on ESR 78 but not ESR 91. When I eventually get to checking the ESR 91 code I discover it's because on ESR 91 the fBindFramebuffer() method doesn't have the same wrapper as on ESR 78; instead it goes straight for the library method:
void fBindFramebuffer(GLenum target, GLuint framebuffer) { BEFORE_GL_CALL; mSymbols.fBindFramebuffer(target, framebuffer); AFTER_GL_CALL; }On ESR 78 this has been wrapped and replaced by a call to raw_fBindFramebuffer() (which looks the same as the above snippet for fBindFramebuffer(). The fix, therefore should be to add the same wrapper from ESR 78 into the ESR 91 code.
I've done this and checked it compiles. But to properly understand the resulting effect I'm going to need to step through the code again, which means a full rebuild will be needed. It's 21:31 in the evening now so, as you know, that means an overnight build. Probably now is a good time to stop anyway. So onward to tomorrow, when we'll see if this has made any practical difference!
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
1 Apr 2024 : Day 203 #
Today I've spent a fair bit of time exploring a new part of the rendering process. Having spent a lot of time on the nuts and bolts of textures, images and surfaces over the last few days, it's made a nice change to be looking at fresh parts of the code.
My plan has been to try to check that the texture that's used as a canvas is being rendered on to the screen properly. Historically the way I've done this when it comes to the browser (as opposed to the WebView) has been to clear the texture to a particular colour. The buffer-clearing calls sit in the sailfish-browser code and specifically the declarativewebcontainer.cpp source file where there's a method specifically for the purpose:
But this isn't part of the WebView code. In the case of the WebView the location of the code that performs this task is less clear to me. In fact it may be there is no direct equivalent, but it still feels like I should be able to place this glClear() code somewhere in order to get a similar result, even if there isn't anything yet in the codebase that's doing this.
The challenge therefore is to find the right place for this. Checking through the code I eventually come upon the CompositorOGL::BeginFrame() method. This includes code like this:
I've added some changes to the ESR 91 codebase to set the colour to dark blue again, in the hope this will have an impact on the output. When I step through the code, I can see that the colour value is getting set to blue. But I can't see this having any consequence on what's rendered to the screen.
To try to distinguish between the two I've attempted to run the WebView on ESR 78 and, although I don't want to have to go to the trouble of recompiling, I can poke in values for the colour to see whether they have any effect.
Here's the debug output from me changing the colour value on ESR 78 mid-execution using the debugger.
But that also means that there's no scope to do more partial building of the code today. Once the full rebuild has completed I'll return to this topic to see whether I can get any more concrete results.
In the meantime, if you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comment
My plan has been to try to check that the texture that's used as a canvas is being rendered on to the screen properly. Historically the way I've done this when it comes to the browser (as opposed to the WebView) has been to clear the texture to a particular colour. The buffer-clearing calls sit in the sailfish-browser code and specifically the declarativewebcontainer.cpp source file where there's a method specifically for the purpose:
void DeclarativeWebContainer::clearWindowSurface() { Q_ASSERT(m_context); // The GL context should always be used from the same thread in which it was created. Q_ASSERT(m_context->thread() == QThread::currentThread()); m_context->makeCurrent(this); QOpenGLFunctions_ES2* funcs = m_context->versionFunctions<QOpenGLFunctions_ES2>(); Q_ASSERT(funcs); funcs->glClearColor(1.0, 1.0, 1.0, 0.0); funcs->glClear(GL_COLOR_BUFFER_BIT); m_context->swapBuffers(this); }This method clears the texture to a white colour before performing rendering of the web page on top of it. By changing the value passed to glClearColor() it's possible to get immediate feedback about the fact that the texture is being rendered. For example, suppose the code is changed to the following:
funcs->glClearColor(0.0, ).0, 1.0, 0.0); funcs->glClear(GL_COLOR_BUFFER_BIT);In this case the screen will change to a dark blue on initial render, only later being painted over by the web page itself.
But this isn't part of the WebView code. In the case of the WebView the location of the code that performs this task is less clear to me. In fact it may be there is no direct equivalent, but it still feels like I should be able to place this glClear() code somewhere in order to get a similar result, even if there isn't anything yet in the codebase that's doing this.
The challenge therefore is to find the right place for this. Checking through the code I eventually come upon the CompositorOGL::BeginFrame() method. This includes code like this:
mGLContext->fClearColor(mClearColor.r, mClearColor.g, mClearColor.b, mClearColor.a); mGLContext->fClear(clearBits);This looks pretty similar to the sailfish-browser code, apart from the fact that the colour looks configurable. Moreover, checking with the debugger shows that this method, and the fClear() call within it, are definitely being called.
I've added some changes to the ESR 91 codebase to set the colour to dark blue again, in the hope this will have an impact on the output. When I step through the code, I can see that the colour value is getting set to blue. But I can't see this having any consequence on what's rendered to the screen.
(gdb) b CompositorOGL::BeginFrame Breakpoint 1 at 0x7ff11a6970: file ${PROJECT}/gecko-dev/gfx/layers/opengl/ CompositorOGL.cpp, line 9 83. (gdb) c Continuing. [...] Thread 37 "Compositor" hit Breakpoint 1, mozilla::layers:: CompositorOGL::BeginFrame (this=0x7ed8002f10, aInvalidRegion=..., aClipRect=..., aRenderBounds=..., aOpaqueRegion=...) at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:983 983 in ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp (gdb) c Continuing. [...] 1050 in ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp (gdb) p mClearColor (gdb) p mClearColor $1 = {r = 0, g = 0, b = 0, a = 0} (gdb) n 2436 ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsRegion.h: No such file or directory. (gdb) n [...] 1086 in ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp (gdb) n 1087 in ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp (gdb) p mClearColor $2 = {r = 0, g = 0, b = 1, a = 0} (gdb) p /x clearBits $5 = 0x4100 (gdb) nThere could be one of two reasons for this: the first it could be that this code simply has no effect no rendering anyway. The second is that this is having no effect in ESR 91 specifically because the rendering pipeline is broken in some other way.
To try to distinguish between the two I've attempted to run the WebView on ESR 78 and, although I don't want to have to go to the trouble of recompiling, I can poke in values for the colour to see whether they have any effect.
Here's the debug output from me changing the colour value on ESR 78 mid-execution using the debugger.
[Switching to Thread 0x7fa516f9a0 (LWP 32148)] Thread 36 "Compositor" hit Breakpoint 1, mozilla::layers:: CompositorOGL::BeginFrame (this=0x7eac003420, aInvalidRegion=..., aClipRect=..., aRenderBounds=..., aOpaqueRegion=...) at gfx/layers/opengl/ CompositorOGL.cpp:967 967 const nsIntRegion& aOpaqueRegion) { (gdb) p mClearColor $1 = {r = 0, g = 0, b = 0, a = 0} (gdb) n (gdb) n 968 AUTO_PROFILER_LABEL("CompositorOGL::BeginFrame", GRAPHICS); [...] 313 obj-build-mer-qt-xr/dist/include/mozilla/RefPtr.h: No such file or directory. (gdb) n 1045 mGLContext->fClearColor(mClearColor.r, mClearColor.g, mClearColor.b, (gdb) p mClearColor $2 = {r = 0, g = 0, b = 0, a = 0} (gdb) mClearColor.b = 1.0 Undefined command: "mClearColor". Try "help". (gdb) set mClearColor.b = 1.0 (gdb) p mClearColor $3 = {r = 0, g = 0, b = 1, a = 0} (gdb)I'm surprised to discover that this also has no effect on ESR 78. I'm left wondering whether there might be some other part of the code that's clearing the background instead. I'm going to need to investigate this further. However, one other issue I'm experiencing is that the ESR 91 build has lost its debug source annotations. This is as a result of me having performed partial builds almost exclusively for the last week or so. Partial builds can result in the debug source and library packages getting out of sync, the only solution to which is to perform a full rebuild. So I've set a full rebuild off in the hope that it will help provide more clarity tomorrow.
But that also means that there's no scope to do more partial building of the code today. Once the full rebuild has completed I'll return to this topic to see whether I can get any more concrete results.
In the meantime, if you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.