flypig.co.uk

Blog

21 Jun 2024 : Day 265
Sadly, when I checked my machine this morning, I discovered the build I kicked off overnight didn't complete successfully. There have been a couple of errors during the compilation step. The first looks like this:
330:03.83 mobile/sailfishos
330:22.42 ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp: In member function ‘void mozilla::
    embedlite::EmbedLiteCompositorBridgeParent::PrepareOffscreen()’:
330:22.42 ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:116:39: error: ‘class mozilla::gl::
    GLContext’ has no member named ‘Screen’
330:22.42      GLScreenBuffer* screen = context->Screen();
330:22.42                                        ^~~~~~
The second looks like this:
330:22.43 ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/
    EmbedLiteCompositorBridgeParent.cpp:124:74: error: no matching function for 
    call to ‘mozilla::gl::SurfaceFactory_EGLImage::Create(mozilla::gl::
    GLContext*&, std::nullptr_t, mozilla::layers::TextureFlags&)’
330:22.43          factory = SurfaceFactory_EGLImage::Create(context, nullptr, 
    flags);
330:22.43                                                                       
        ^
There are some further errors, but they look like variations on these two. You might think it's odd that the full build failed when the partial build completed successfully last night. This is an occupational hazard of running partial builds. When running a partial build we have to specify the folder to start in. For example, this is the command I used last night:
$ make -j1 -C obj-build-mer-qt-xr/gfx/
This is going to rebuild everything in the gfx directory and anything that depends on it. I chose this as the root because, as far as I could recall, all of the changes I made were in this directory or one of its children. But that's not always enough: if something elsewhere in the project, higher up the directory hierarchy than gfx, shares a dependency on the changed code, it won't necessarily get rebuilt.
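
As a rough illustration of how a partial build can leave stale code behind, here's a toy model rather than anything taken from the real build system. The Target structure, the MissedByPartialBuild() function and the gfx/gl/GLContext.h path are all assumptions made for the sake of the example. Given a changed header and a root directory, it lists the source files that include the header but live outside the root, which are the ones a partial build wouldn't recompile.

```cpp
#include <string>
#include <vector>

// Toy model of a build target: a source file and the headers it includes.
// This is a hypothetical structure, not part of the gecko build system.
struct Target {
    std::string path;
    std::vector<std::string> includes;
};

// Returns true if `path` sits inside the directory `root` (e.g. "gfx/").
bool IsUnder(const std::string& path, const std::string& root) {
    return path.compare(0, root.size(), root) == 0;
}

// Lists the targets that include `changedHeader` but live outside `root`.
// In this toy model, these are the files a partial build rooted at `root`
// would leave untouched, even though they need recompiling.
std::vector<std::string> MissedByPartialBuild(
        const std::vector<Target>& targets,
        const std::string& changedHeader,
        const std::string& root) {
    std::vector<std::string> missed;
    for (const Target& target : targets) {
        bool dependsOnChange = false;
        for (const std::string& include : target.includes) {
            if (include == changedHeader) {
                dependsOnChange = true;
                break;
            }
        }
        if (dependsOnChange && !IsUnder(target.path, root)) {
            missed.push_back(target.path);
        }
    }
    return missed;
}
```

Calling MissedByPartialBuild() with a header under gfx/ as the changed file and "gfx/" as the root would flag a file such as mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp if it includes that header, which mirrors the errors that only surfaced in the full build.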

If I'd run the final linker stage at the end of the process, the error might have been exposed by an undefined reference, but I was so tired last night by the time I'd made all of the changes to the source code that I could barely think straight. So I had neither the energy nor the wit to do this.

Never mind, with any luck I can fix it this morning and get another build running during the day.

The fix appears to involve reverting almost all of the changes made to EmbedLiteCompositorBridgeParent.cpp to re-accommodate GLContext::mScreen, which I'd previously switched for GLContext::mSwapChain in line with changes that happened upstream between ESR 78 and ESR 91. Given that I removed GLScreenBuffer, which mScreen was an instance of, these errors aren't too surprising in retrospect. But I was never going to notice them given my state of tiredness last night.

So anyway, here we are, it's still early and a fresh build is running. Hopefully this one will enjoy more success!

[...]

And happily it does: the build has completed without any errors, in time for some evening development.

Now time to execute it. And the result is...
Thread 38 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 13378]
0x0000007fe7e374cc in wl_proxy_marshal_constructor () from /usr/lib64/
    libwayland-client.so.0
(gdb) bt
#0  0x0000007fe7e374cc in wl_proxy_marshal_constructor () from /usr/lib64/
    libwayland-client.so.0
#1  0x0000007fe7b8742c in ServerWaylandBuffer::ServerWaylandBuffer(unsigned 
    int, unsigned int, int, int, android_wlegl*, wl_event_queue*) ()
   from /usr/lib64/libhybris//eglplatform_wayland.so
#2  0x0000007fe7b874c8 in WaylandNativeWindow::addBuffer() () from /usr/lib64/
    libhybris//eglplatform_wayland.so
#3  0x0000007fe7b86728 in WaylandNativeWindow::dequeueBuffer(
    BaseNativeWindowBuffer**, int*) () from /usr/lib64/libhybris//
    eglplatform_wayland.so
#4  0x0000007fe7b4d124 in BaseNativeWindow::_dequeueBuffer(ANativeWindow*, 
    ANativeWindowBuffer**, int*) () from /usr/lib64/
    libhybris-platformcommon.so.1
#5  0x0000007fe4fa9188 in ?? ()
#6  0x0000000000000438 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 
Honestly, I really thought that I'd removed enough of the changes that this error would no longer occur, so I'm surprised that it's still here. I must be missing something important. There aren't really so many active changes in the code now and I'm really struggling to figure out what the problem is. So I've placed breakpoints on the main remaining edited methods. One of these will have to be hit before the crash occurs.

Here's the list of breakpoints I've set:
(gdb) info break
Num     Type           Disp Enb Address    What
1       breakpoint     keep y   <PENDING>  DirectUpdate
2       breakpoint     keep y   <PENDING>  TextureImageEGL::Resize
3       breakpoint     keep y   <PENDING>  TextureImageEGL::ReleaseTexImage
4       breakpoint     keep y   <PENDING>  TextureImageEGL::TextureImageEGL
5       breakpoint     keep y   <PENDING>  DestroyTextureData
6       breakpoint     keep y   <PENDING>  TextureClient::Destroy
7       breakpoint     keep y   <PENDING>  CompositorOGL::PrepareViewport
8       breakpoint     keep y   <PENDING>  CompositorOGL::DrawGeometry
(gdb) 
Astonishingly, not one of these gets hit. This is crazy. As I try to add the final few breakpoints, even on the methods that look unused, I notice that there are a couple of classes that have signatures but no implementation. Is it possible this could be the reason and I've been missing this all along?

I've now removed those classes and the few methods that also had signatures without definitions. I had thought that if these were the problem, either the compiler would pick up on them or it would just fail when an attempt was made to load the library. Maybe I was wrong.
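
For what it's worth, a declaration without a definition isn't necessarily something the toolchain will complain about. The following is a minimal sketch with made-up names (OrphanedBuffer, NeverDefined, IsUnset() and BuildStatus() are all hypothetical, not taken from the gecko code). It compiles and links cleanly because nothing ever calls the undefined method or needs the full definition of the forward-declared class, so no reference to a missing symbol is ever generated.

```cpp
#include <string>

// Hypothetical class with a method that is declared but never defined.
class OrphanedBuffer {
public:
    int Width() const;  // signature only: no definition in any source file
};

// Hypothetical class that is only ever forward-declared.
class NeverDefined;

// Pointers to an incomplete type are fine so long as they're never
// dereferenced, so this compiles without the class definition.
bool IsUnset(const NeverDefined* ptr) {
    return ptr == nullptr;
}

// Nothing here calls OrphanedBuffer::Width(), so the compiler emits no
// reference to it and the linker has nothing to resolve.
std::string BuildStatus() {
    return "linked";
}
```

Only once some code actually calls OrphanedBuffer::Width() would an undefined reference be expected to surface, so leftover signatures like these can sit unnoticed in a codebase.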

So, as I say, I've removed the class signatures and related code. The good news is that the partial build completed fine, including the linking stage. Does it run?

No. No it doesn't. The crash, along with its backtrace, remains identical.
Thread 37 "Compositor" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 31572]
0x0000007fe7e364bc in wl_proxy_marshal_constructor () from /usr/lib64/
    libwayland-client.so.0
(gdb) bt
#0  0x0000007fe7e364bc in wl_proxy_marshal_constructor () from /usr/lib64/
    libwayland-client.so.0
#1  0x0000007fe7b8642c in ServerWaylandBuffer::ServerWaylandBuffer(unsigned 
    int, unsigned int, int, int, android_wlegl*, wl_event_queue*) ()
   from /usr/lib64/libhybris//eglplatform_wayland.so
#2  0x0000007fe7b864c8 in WaylandNativeWindow::addBuffer() () from /usr/lib64/
    libhybris//eglplatform_wayland.so
#3  0x0000007fe7b85728 in WaylandNativeWindow::dequeueBuffer(
    BaseNativeWindowBuffer**, int*) () from /usr/lib64/libhybris//
    eglplatform_wayland.so
#4  0x0000007fe7b4c124 in BaseNativeWindow::_dequeueBuffer(ANativeWindow*, 
    ANativeWindowBuffer**, int*) () from /usr/lib64/
    libhybris-platformcommon.so.1
#5  0x0000007fe4f69188 in ?? ()
#6  0x0000000000000438 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 
That's not what I was hoping to see and this is deeply frustrating. I've reached the end of my usable hours for today, so I'll have to continue with this tomorrow. I'm not sure how much further I can strip code out until there's nothing left to remove, but I'll continue onward. Right now that seems like the only sane thing to do.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
