23 Aug 2024 : Day 328
I've been in a buoyant mood all night since Raine and Frajo shared a solution for getting ESR 91 working on Sailfish OS 4.6. It made it quite hard to sleep actually, knowing how close this brings things to the final goal. I'm quite looking forward to getting to the tidying up stage, which will be the final task that I plan to do on all this.
The tidying-up stage will involve turning all of the commits on the FIREFOX_ESR_91_9_X_RELBRANCH_patches branch of the gecko repository into patches that can be applied as part of the RPM build process. That's going to be a fair bit of work, because I'll need to rationalise and combine patches as well as cross-referencing them against the ESR 78 patches, but it should at least be quite mechanical. That makes it an appealing task to end on.
But right now I want to see if there's a way to integrate Frajo's LD_PRELOAD workaround into the gecko code. You'll recall that running the browser with this environment variable set to the eglplatform_wayland.so shared library ensures the library is loaded at startup and never dropped. That in turn prevented the Wayland crash that I've been trying to figure out for the last few days.
```
$ LD_PRELOAD=/usr/lib64/libhybris/eglplatform_wayland.so gdb sailfish-browser
[...]
```

If, like me, you're not yet fully familiar with LD_PRELOAD then there's a nice article on Baeldung Linux.
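The workaround relies on the way the dynamic loader counts references. Each dlopen() of a library that is already loaded just increments its reference count. Each dlclose() decrements it, and the library is only unmapped once the count reaches zero. In glibc, libraries named in LD_PRELOAD are loaded at startup and are never unloaded. The toy model below is a hypothetical sketch of that contract, not glibc's implementation. It ignores dependencies, RTLD_NODELETE and the other loader flags.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of a dynamic loader's per-library reference counting.
// Hypothetical class for illustration only; real loaders also track
// dependencies and flags such as RTLD_NODELETE.
class ToyLoader {
public:
    // Libraries named in LD_PRELOAD are mapped at startup and never unloaded.
    void preload(const std::string& lib) { libs_[lib].pinned = true; }

    // dlopen(): maps the library on first use, otherwise bumps the count.
    void open(const std::string& lib) { ++libs_[lib].refs; }

    // dlclose(): drops one reference and unmaps the library once no
    // references remain, unless it was preloaded.
    void close(const std::string& lib) {
        auto it = libs_.find(lib);
        assert(it != libs_.end() && it->second.refs > 0);
        if (--it->second.refs == 0 && !it->second.pinned) {
            libs_.erase(it);
        }
    }

    bool isLoaded(const std::string& lib) const { return libs_.count(lib) != 0; }

private:
    struct Entry {
        int refs = 0;
        bool pinned = false;
    };
    std::map<std::string, Entry> libs_;
};
```

A lone open/close pair leaves the library unloaded, while a preloaded library survives the same pair. An extra dlopen() that is never matched by a dlclose() has the same effect as the preload, because the count can then never drop to zero.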
As soon as I saw Frajo's fix I thought of patch 0038 "Fix mesa egl display and buffer initialisation", which contains code to dynamically load various functions exported from the libwayland-egl.so.1 library. I hadn't applied this patch. Partly it was too messy to do so earlier, and partly I'd assumed the changes would only be needed for native Sailfish OS ports. That's not to say that it's not important to have the browser working on native ports, but the plan was always to get something working rather than something perfect. So applying this patch felt like extra complexity that was better avoided.
But maybe the patch is now needed to fix this problem too?
It's an awkward patch to apply because all of the code it's supposed to apply to has changed considerably. The underlying structure is all bound up in the GLScreenBuffer changes that were such a big deal previously. As a result I've had to apply the entire patch manually.
That's okay, but it was a fair bit of work. And I'm not totally certain that I've applied it correctly either. It's hard for me to test it properly, so I'm hoping others may be able to do this for me. The original patch was put together with Adam Pigg and Frajo, and I'm hoping they'll be able to help again with this at some point.
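For context, patch 0038 loads functions from libwayland-egl.so.1 at runtime instead of linking against them. The usual pattern is to resolve each symbol with dlsym() into a typed function pointer, and only treat the set as usable once every entry has been found. Here's a minimal sketch of that pattern, not the patch's actual code. The struct and helper names are hypothetical, and the wl_egl_window_* names are real libwayland-egl exports. The opaque Wayland types are simplified to void*, and the resolver is injected so the sketch runs without the real library. In practice it would wrap dlsym() on the handle returned by dlopen().

```cpp
#include <cassert>
#include <functional>

// In real code this would wrap dlsym() on the handle returned by dlopen().
using Resolver = std::function<void*(const char*)>;

// Hypothetical table of libwayland-egl entry points. The symbol names are
// real libwayland-egl exports, but the opaque wl_surface / wl_egl_window
// pointer types are simplified to void* to keep the sketch self-contained.
struct WaylandEglFunctions {
    void* (*window_create)(void* surface, int width, int height) = nullptr;
    void (*window_destroy)(void* window) = nullptr;
    void (*window_resize)(void* window, int width, int height, int dx, int dy) = nullptr;
};

// Look up one symbol and store it as a typed function pointer.
// Returns false if the symbol is missing.
template <typename Fn>
bool resolve(const Resolver& lookup, const char* name, Fn& out) {
    void* sym = lookup(name);
    out = reinterpret_cast<Fn>(sym);
    return sym != nullptr;
}

// All-or-nothing loading: fill a temporary table and only publish it if every
// symbol was found, so callers never see a half-populated table.
bool loadWaylandEgl(const Resolver& lookup, WaylandEglFunctions& fns) {
    WaylandEglFunctions tmp;
    const bool ok = resolve(lookup, "wl_egl_window_create", tmp.window_create) &&
                    resolve(lookup, "wl_egl_window_destroy", tmp.window_destroy) &&
                    resolve(lookup, "wl_egl_window_resize", tmp.window_resize);
    if (ok) {
        fns = tmp;
    }
    return ok;
}
```

The table is resolved into a temporary and copied only on success. That way a missing symbol leaves the caller's table untouched, rather than exposing a partially usable one.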
But right now I just need to know the following: After the changes...
- ...does the browser still work if I use the LD_PRELOAD workaround?
- ...does this help get the browser to work without the workaround?
But unfortunately it doesn't seem to have had any beneficial effect when LD_PRELOAD is left unset. So the next step is to check whether any of the changes I've made are actually being executed.
First up I've added breakpoints to the LoadWaylandFunctions() and UnloadWaylandFunctions() methods that were added by this patch. It turns out, at least on an Xperia 10 III, that neither of these breakpoints is hit:
```
$ LD_PRELOAD=/usr/lib64/libhybris/eglplatform_wayland.so gdb sailfish-browser
[...]
(gdb) break LoadWaylandFunctions
Breakpoint 2 at 0x7ff2330b5c: file ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp, line 208.
(gdb) break UnloadWaylandFunctions
Breakpoint 3 at 0x7ff235de8c: file ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp, line 260.
(gdb) c
Continuing.
[...]
```

No hits. So maybe we need to add a call somewhere else that does get executed and, if necessary, add some code there to do something similar. My initial thinking is that the WaylandGLSurface constructor might be a good place. But it turns out this isn't being executed either:
```
(gdb) break WaylandGLSurface::WaylandGLSurface
Breakpoint 2 at 0x7ff2332fdc: file ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp, line 952.
(gdb) c
Continuing.
[...]
```

No hits. So I've settled on the GLContextEGL constructor, since I know this gets executed and it feels related to what we're trying to do. I've added a call to LoadWaylandFunctions() at the end of the constructor. My reasoning is that if the dynamic library is opened there, maybe it'll gain the additional reference count it needs to keep the library open throughout execution.
```cpp
GLContextEGL::GLContextEGL(const std::shared_ptr<EglDisplay> egl,
                           const GLContextDesc& desc, EGLConfig config,
                           EGLSurface surface, EGLContext context)
    : GLContext(desc, nullptr, false),
      mEgl(egl),
      mConfig(config),
      mContext(context),
      mSurface(surface),
      mFallbackSurface(EGL_NO_SURFACE) {
#ifdef DEBUG
  printf_stderr("Initializing context %p surface %p on display %p\n", mContext,
                mSurface, mEgl->mDisplay);
#endif
#if defined(MOZ_WIDGET_QT)
  // Try to take an extra reference on the Wayland EGL library
  LoadWaylandFunctions();
#endif
}
```

Unfortunately this change doesn't make any discernible difference to the crash. So I've decided to shift to investigation mode again to find out where the library is currently being opened, as well as where it's being closed. I'm hoping this will provide some clarity as to either what needs fixing, or where to put the dynamic library loading code.
```
$ gdb sailfish-browser
[...]
(gdb) b dlopen
Function "dlopen" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (dlopen) pending.
(gdb) b dlclose
Function "dlclose" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (dlclose) pending.
(gdb) r
Starting program: /usr/bin/sailfish-browser
[...]
Breakpoint 1, __dlopen (file=0x555565fa88 "/usr/lib64/qt5/plugins/platforms/libqwayland-generic.so", mode=4097) at dlopen.c:75
75      {
(gdb) c
Continuing.
[...]
Breakpoint 1, __dlopen (file=0x555566f428 "/usr/lib64/qt5/plugins/platforminputcontexts/libmaliitplatforminputcontextplugin.so", mode=4097) at dlopen.c:75
75      {
(gdb) c
Continuing.
[...]
```

It turns out there are a great many dynamic libraries opened by the code. It makes perfect sense in retrospect, so I don't know why I was so surprised at just how many there are, but I was surprised nevertheless. The debug output goes on and on, but it all looks similar to the above. So I've cut out everything except the function calls and library names.
```
__dlopen() libqwayland-generic.so
__dlopen() libmaliitplatforminputcontextplugin.so
__dlopen() libcustomcontext.so
__dlopen() libwayland-egl.so
__dlopen() linker/q.so
__dlopen() eglplatform_wayland.so
__dlopen() liblgpllibs.so
__dlopen() libxul.so
__dlopen() file=0x0
rtld_active
__dlopen() file=0x0
__dlopen() libqgif.so
__dlopen() libqico.so
__dlopen() libqjpeg.so
__dlopen() libqsvg.so
__dlopen() libqtiff.so
__dlopen() libqwebp.so
__dlopen() libqtquick2plugin.so
__dlopen() libwindowplugin.so
__dlopen() libsailfishsilicaplugin.so
__dlopen() libsailfishpolicyplugin.so
__dlopen() libsystemsettingsplugin.so
__dlopen() libsystemsettingsplugin.so
__dlopen() libnemosystemsettings.so
__dlopen() libSailfishSilicaBackgroundPlugin.so
__dlopen() libGLESv1_CM.so.1
__dlopen() libGLESv1_CM.so.1
__dlopen() libGLESv1_CM.so.1
__dlopen() libGLESv1_CM.so.1
__dlopen() libGLESv1_CM.so.1
__dlopen() libnemoconfiguration.so
__dlopen() libnemodbus.so
__dlopen() libsailfishwebviewpickersplugin.so
__dlopen() libsailfishwebviewpopupsplugin.so
__dlopen() libsailfishwebviewcontrolsplugin.so
__dlopen() libqmlmozembedpluginqt5.so
__dlopen() libQOfonoQtDeclarative.so
__dlopen() libkeepaliveplugin.so
__dlopen() libsailfishwebengineplugin.so
__dlopen() libConnmanQtDeclarative.so
__dlopen() libnemopolicy.so
__dlopen() libnemoconnectivity.so
__dlopen() libsailfishshareplugin.so
__dlopen() libqtgraphicaleffectsprivate.so
__dlopen() libmodelsplugin.so
__dlopen() libqsqlite.so
__dlopen() libqconnmanbearer.so
__dlopen() libdeclarative_feedback.so
__dlopen() libqtfeedback_libngf.so
__dlopen() libGLESv2.so.2
__dlopen() libsoftokn3.so
__dlopen() libfreeblpriv3.so
__dlopen() libnspr4.so
__dlopen() libnssutil3.so
__dlopen() libnssckbi.so
__dlopen() libEGL.so.1
rtld_active()
__dlopen() libEGL.so
__dlopen() libEGL.so.1
__dlopen() libGL.so
__dlopen() libGL.so.1
__dlopen() libGLESv2.so
__dlopen() libGLESv2.so.2
rtld_active()
__dlopen() eglplatform_wayland.so
__dlopen() ibosclientcerts.so
```

Even with all the cruft cut out it's still a long list. Here the __dlopen() lines are where a dynamic library is opened, whereas the rtld_active() lines are where a library is closed. Unfortunately the debugger doesn't provide any clues as to which library is being closed.
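When a trace like this gets long, a small helper can do the counting. The sketch below is hypothetical and assumes exactly the trimmed format shown above. Each open is written as `__dlopen()` followed by a library name, and, following the reading given above, any token starting with `rtld_active` is counted as a close.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

struct DlTally {
    std::map<std::string, int> opens;  // library name -> number of __dlopen() lines
    int closes = 0;                    // number of rtld_active lines
};

// Hypothetical helper for the trimmed format above: "__dlopen() <name>" marks
// an open, and any token starting with "rtld_active" marks a close.
DlTally tallyDlopenLog(const std::string& log) {
    DlTally tally;
    std::istringstream in(log);
    std::string token;
    while (in >> token) {
        if (token == "__dlopen()") {
            std::string lib;
            if (in >> lib) {
                ++tally.opens[lib];
            }
        } else if (token.rfind("rtld_active", 0) == 0) {
            ++tally.closes;
        }
    }
    return tally;
}
```

Run over the list above, it would report eglplatform_wayland.so as opened twice, libGLESv1_CM.so.1 five times, and three rtld_active events.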
For a few of the more interesting cases I also captured backtraces. These show us that the first couple of calls to open libwayland-egl.so are all triggered by the Qt code, rather than the gecko code. The eglplatform_wayland.so library is loaded by gecko, but it's loaded first as a result of a call to fGetDisplay() rather than the explicit load that I added. This explicit load does execute, but slightly later on and, in fact, it's the last thing to happen before the crash occurs.
```
Breakpoint 1, __dlopen (file=0x5555675868 "/usr/lib64/qt5/plugins/wayland-graphics-integration-client/libwayland-egl.so", mode=4097) at dlopen.c:75
(gdb) bt
#0  __dlopen (file=0x5555675868 "/usr/lib64/qt5/plugins/wayland-graphics-integration-client/libwayland-egl.so", mode=4097) at dlopen.c:75
#1  0x0000007fef9b6088 in ?? () from /usr/lib64/libQt5Core.so.5
#2  0x0000007fef9af4bc in ?? () from /usr/lib64/libQt5Core.so.5
#3  0x0000007fef9af94c in ?? () from /usr/lib64/libQt5Core.so.5
#4  0x0000007fef9a2820 in QFactoryLoader::instance(int) const () from /usr/lib64/libQt5Core.so.5
#5  0x0000007fe770cdb4 in ?? () from /usr/lib64/libQt5WaylandClient.so.5
#6  0x0000007fe76eea78 in QtWaylandClient::QWaylandIntegration::initializeClientBufferIntegration() () from /usr/lib64/libQt5WaylandClient.so.5
#7  0x0000007fe76eebfc in QtWaylandClient::QWaylandIntegration::clientBufferIntegration() const () from /usr/lib64/libQt5WaylandClient.so.5
#8  0x0000007fe76ee598 in QtWaylandClient::QWaylandIntegration::hasCapability(QPlatformIntegration::Capability) const () from /usr/lib64/libQt5WaylandClient.so.5
#9  0x0000007ff099ec18 in QSGRenderLoop::instance() () from /usr/lib64/libQt5Quick.so.5
#10 0x0000007ff09cf2b4 in QQuickWindowPrivate::init(QQuickWindow*, QQuickRenderControl*) () from /usr/lib64/libQt5Quick.so.5
#11 0x0000007ff0a7262c in QQuickView::QQuickView(QWindow*) () from /usr/lib64/libQt5Quick.so.5
#12 0x0000007ff0ca5a80 in MDeclarativeCachePrivate::qQuickView() () from /usr/lib64/libmdeclarativecache5.so.0
#13 0x000000555557b31c in main (argc=<optimized out>, argv=0x7ffffff298) at main.cpp:88
[...]
Breakpoint 1, __dlopen (file=0x7fffffe418 "/usr/lib64/libhybris//eglplatform_wayland.so", mode=1) at dlopen.c:75
(gdb) bt
#0  __dlopen (file=0x7fffffe418 "/usr/lib64/libhybris//eglplatform_wayland.so", mode=1) at dlopen.c:75
#1  0x0000007feeebbb04 in ws_init () from /usr/lib64/libEGL.so.1
#2  0x0000007feeeba478 in ?? () from /usr/lib64/libEGL.so.1
#3  0x0000007fe751f008 in ?? () from /usr/lib64/qt5/plugins/wayland-graphics-integration-client/libwayland-egl.so
#4  0x0000007fe76ee8f0 in QtWaylandClient::QWaylandIntegration::initializeClientBufferIntegration() () from /usr/lib64/libQt5WaylandClient.so.5
#5  0x0000007fe76eebfc in QtWaylandClient::QWaylandIntegration::clientBufferIntegration() const () from /usr/lib64/libQt5WaylandClient.so.5
[...]
Thread 41 "Compositor" hit Breakpoint 1, __dlopen (file=0x7fb4f43498 "/usr/lib64/libhybris//eglplatform_wayland.so", mode=1) at dlopen.c:75
(gdb) bt
#0  __dlopen (file=0x7fb4f43498 "/usr/lib64/libhybris//eglplatform_wayland.so", mode=1) at dlopen.c:75
#1  0x0000007feeebbb04 in ws_init () from /usr/lib64/libEGL.so.1
#2  0x0000007feeeba478 in ?? () from /usr/lib64/libEGL.so.1
#3  0x0000007ff236afa0 in mozilla::gl::GLLibraryEGL::fGetDisplay (display_id=0x0, this=0x7ee81a3b60) at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.h:193
#4  mozilla::gl::GetAndInitDisplay (egl=..., displayType=displayType@entry=0x0, display=display@entry=0x0) at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:151
#5  0x0000007ff236b550 in mozilla::gl::GLLibraryEGL::CreateDisplay (this=this@entry=0x7ee81a3b60, forceAccel=forceAccel@entry=false, out_failureId=out_failureId@entry=0x7fb4f43fe0, aDisplay=aDisplay@entry=0x0) at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:813
#6  0x0000007ff236c6a4 in mozilla::gl::GLLibraryEGL::DefaultDisplay (this=0x7ee81a3b60, out_failureId=out_failureId@entry=0x7fb4f43fe0) at ${PROJECT}/gecko-dev/gfx/gl/GLLibraryEGL.cpp:745
#7  0x0000007ff236c7c4 in mozilla::gl::GLContextProviderEGL::CreateWrappingExisting (aContext=0x7ee80048a0, aSurface=0x5555b9ac50, aDisplay=<optimized out>) at ${PROJECT}/obj-build-mer-qt-xr/dist/include/mozilla/StaticPtr.h:150
#8  0x0000007ff4d77264 in mozilla::embedlite::nsWindow::GetGLContext (this=this@entry=0x7fbc9b8e10) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:405
#9  0x0000007ff4d7741c in mozilla::embedlite::nsWindow::GetNativeData (this=0x7fbc9b8e10, aDataType=12) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:173
#10 0x0000007ff23e80f4 in mozilla::layers::CompositorOGL::CreateContext (this=this@entry=0x7ee81a32a0) at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:232
#11 0x0000007ff23fd31c in mozilla::layers::CompositorOGL::Initialize (this=0x7ee81a32a0, out_failureReason=0x7fb4f445a0) at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:387
#12 0x0000007ff251cd50 in mozilla::layers::CompositorBridgeParent::NewCompositor (this=this@entry=0x7fbc9ab8f0, aBackendHints=...) at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#13 0x0000007ff2533bb4 in mozilla::layers::CompositorBridgeParent::InitializeLayerManager (this=this@entry=0x7fbc9ab8f0, aBackendHints=...) at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#14 0x0000007ff2533d40 in mozilla::layers::CompositorBridgeParent::AllocPLayerTransactionParent (this=this@entry=0x7fbc9ab8f0, aBackendHints=..., aId=...) at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1546
[...]
Thread 41 "Compositor" hit Breakpoint 1, __dlopen (file=0x7ff62b2bc0 "eglplatform_wayland.so", mode=1) at dlopen.c:75
(gdb) bt
#0  __dlopen (file=0x7ff62b2bc0 "eglplatform_wayland.so", mode=1) at dlopen.c:75
#1  0x0000007ff2353edc in mozilla::gl::GLContextEGL::GLContextEGL (this=0x7ee81aaea0, egl=std::shared_ptr<mozilla::gl::EglDisplay> (use count 3, weak count 2) = {...}, desc=..., config=0x0, surface=0x5555b9ac50, context=0x7ee80048a0) at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:519
#2  0x0000007ff236c81c in mozilla::gl::GLContextProviderEGL::CreateWrappingExisting (aContext=0x7ee80048a0, aSurface=0x5555b9ac50, aDisplay=<optimized out>) at ${PROJECT}/gecko-dev/gfx/gl/GLContextProviderEGL.cpp:1216
#3  0x0000007ff4d77264 in mozilla::embedlite::nsWindow::GetGLContext (this=this@entry=0x7fbc9b8e10) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:405
#4  0x0000007ff4d7741c in mozilla::embedlite::nsWindow::GetNativeData (this=0x7fbc9b8e10, aDataType=12) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedshared/nsWindow.cpp:173
#5  0x0000007ff23e80f4 in mozilla::layers::CompositorOGL::CreateContext (this=this@entry=0x7ee81a32a0) at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:232
#6  0x0000007ff23fd31c in mozilla::layers::CompositorOGL::Initialize (this=0x7ee81a32a0, out_failureReason=0x7fb4f445a0) at ${PROJECT}/gecko-dev/gfx/layers/opengl/CompositorOGL.cpp:387
#7  0x0000007ff251cd50 in mozilla::layers::CompositorBridgeParent::NewCompositor (this=this@entry=0x7fbc9ab8f0, aBackendHints=...) at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1493
#8  0x0000007ff2533bb4 in mozilla::layers::CompositorBridgeParent::InitializeLayerManager (this=this@entry=0x7fbc9ab8f0, aBackendHints=...) at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1436
#9  0x0000007ff2533d40 in mozilla::layers::CompositorBridgeParent::AllocPLayerTransactionParent (this=this@entry=0x7fbc9ab8f0, aBackendHints=..., aId=...) at ${PROJECT}/gecko-dev/gfx/layers/ipc/CompositorBridgeParent.cpp:1546
#10 0x0000007ff4d5e6e4 in mozilla::embedlite::EmbedLiteCompositorBridgeParent::AllocPLayerTransactionParent (this=0x7fbc9ab8f0, aBackendHints=..., aId=...) at ${PROJECT}/gecko-dev/mobile/sailfishos/embedthread/EmbedLiteCompositorBridgeParent.cpp:80
#11 0x0000007ff1df7634 in mozilla::layers::PCompositorBridgeParent::OnMessageReceived (this=0x7fbc9ab8f0, msg__=...) at PCompositorBridgeParent.cpp:1285
[...]
```

So I've ended up adding the following code to the GLContextProviderEGL.cpp file. It goes in the GLContextEGL constructor, and the new part is the section surrounded by the MOZ_WIDGET_QT pre-processor condition:
```cpp
static bool platformEglFunctionsLoaded = false;
static void *platformEglHandle = nullptr;
static void (*_wl_proxy_destroy_platform)(struct wl_proxy *proxy) = nullptr;

GLContextEGL::GLContextEGL(const std::shared_ptr<EglDisplay> egl,
                           const GLContextDesc& desc, EGLConfig config,
                           EGLSurface surface, EGLContext context)
    : GLContext(desc, nullptr, false),
      mEgl(egl),
      mConfig(config),
      mContext(context),
      mSurface(surface),
      mFallbackSurface(EGL_NO_SURFACE) {
#ifdef DEBUG
  printf_stderr("Initializing context %p surface %p on display %p\n", mContext,
                mSurface, mEgl->mDisplay);
#endif
#if defined(MOZ_WIDGET_QT)
  printf_stderr("DLOPEN: checking eglplatform_wayland.so\n");
  if (platformEglFunctionsLoaded) {
    printf_stderr("DLOPEN: already loaded\n");
    return;
  }

  printf_stderr("DLOPEN: loading eglplatform_wayland.so\n");
  platformEglHandle = dlopen("eglplatform_wayland.so",
                             RTLD_NOW | RTLD_GLOBAL | RTLD_NODELETE | RTLD_DEEPBIND);
  if (!platformEglHandle) {
    printf_stderr("DLOPEN: error loading eglplatform_wayland.so\n");
  }
  platformEglFunctionsLoaded = true;

  // Only look the symbol up if dlopen() succeeded: on glibc a null handle
  // is RTLD_DEFAULT, which would search the global scope instead.
  if (platformEglHandle) {
    *(void **)(&_wl_proxy_destroy_platform) =
        dlsym(platformEglHandle, "wl_proxy_destroy");
  }
  if (!_wl_proxy_destroy_platform) {
    printf_stderr("DLOPEN: Error loading wl_proxy_destroy from eglplatform_wayland.so\n");
  } else {
    printf_stderr("DLOPEN: loaded wl_proxy_destroy from eglplatform_wayland.so\n");
  }

  printf_stderr("DLOPEN: loaded eglplatform_wayland.so\n");
#endif
}
```

The idea here is that the dynamic library is loaded with all of the flags that looked relevant applied. A reference to the wl_proxy_destroy() function is then pulled from the library, all in an attempt to ensure the library stays loaded. But, unfortunately, with all of this added the crash still occurs. We can see from the debug output that the code is being executed; it's just not having the desired effect:
```
$ sailfish-browser
[...]
Created LOG for EmbedLiteLayerManager
library "libui_compat_layer.so" not found
DLOPEN: checking eglplatform_wayland.so
DLOPEN: loading eglplatform_wayland.so
DLOPEN: loaded wl_proxy_destroy from eglplatform_wayland.so
DLOPEN: loaded eglplatform_wayland.so
Segmentation fault
```

One possibility is that this code is executing too late to be useful. If Qt has already opened the library, it's possible that this extra load has no real effect on when the library gets closed again. To test this theory I've moved the code to the very top of the main() function in the sailfish-browser code. This is practically the first thing that gets executed, so if anywhere is going to work, it should be here.
Here's what I've added:
```cpp
#include <dlfcn.h>
#include <cstdio>

static void platform_egl_workaround()
{
    static void *platformEglHandle = nullptr;
    if (platformEglHandle) {
        return;
    }

    printf("Pre-loading eglplatform_wayland.so\n");
    // The handle is deliberately never passed to dlclose(), so this
    // reference keeps the library loaded for the life of the process
    platformEglHandle = dlopen("/usr/lib64/libhybris/eglplatform_wayland.so", RTLD_LAZY);
    if (!platformEglHandle) {
        printf("Error pre-loading eglplatform_wayland.so\n");
    }
}
```

With this added the browser now runs successfully, even without LD_PRELOAD being set. So that's a nice result. But this is definitely not the right place for it. For example, although this fixes things for the browser, the problem persists with the WebView, since the WebView doesn't go through this main() function at all.
A bit more testing shows that even if I move it to right at the end of the main() function, just before the execution loop is started, the crash is still avoided:
```cpp
Q_DECL_EXPORT int main(int argc, char *argv[])
{
    [...]
    platform_egl_workaround();

    return app->exec();
}
```

That means I have a fair bit of leeway in finding a suitable place. There are a number of options, but I'm gravitating towards the qtmozembed codebase, since this is used by both the browser and the WebView and wraps the rendering code. I've therefore added something similar to the QMozContextPrivate constructor.
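The version that went into the QMozContextPrivate constructor searches for the library rather than hard-coding its path. Because a constructor can run more than once, it also needs a load-once guard. Here's a minimal sketch of both ideas in standard C++17. The function names are hypothetical, and the candidate directories are supplied by the caller, since which ones qtmozembed actually searches isn't shown here. The opener is injected too. In real code it would wrap dlopen(), with the handle deliberately never passed to dlclose().

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Return the first "<dir>/<name>" that exists, or an empty string.
std::string findLibrary(const std::string& name,
                        const std::vector<std::string>& dirs) {
    for (const auto& dir : dirs) {
        const fs::path candidate = fs::path(dir) / name;
        std::error_code ec;  // treat unreadable directories as "not found"
        if (fs::exists(candidate, ec)) {
            return candidate.string();
        }
    }
    return {};
}

// Search for and open the library at most once per process. A function-local
// static is initialised exactly once, even if several threads race to call
// this, so later calls just return the first handle (or nullptr if the
// search or the open failed). Arguments to later calls are ignored.
void* preloadOnce(const std::string& name,
                  const std::vector<std::string>& dirs,
                  const std::function<void*(const std::string&)>& opener) {
    static void* const handle = [&]() -> void* {
        const std::string path = findLibrary(name, dirs);
        return path.empty() ? nullptr : opener(path);
    }();
    return handle;
}
```

In qtmozembed the opener might be something like `[](const std::string& p) { return dlopen(p.c_str(), RTLD_LAZY); }`, matching the RTLD_LAZY flag used in the main() version above. Relying on a function-local static avoids a hand-rolled flag that isn't safe to check from more than one thread.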
Happily this still works: the browser and WebView both run without the need to set the LD_PRELOAD variable. I've cleaned up the code so that the library is searched for, rather than using a hardcoded path, but it's otherwise essentially the same. We can see the debug output when executing with an appropriate QT_LOGGING_RULES setting. See here, where we have the debug output from platform_egl_workaround_open:
```
$ QT_LOGGING_RULES="org.sailfishos.embedliteext=true" harbour-webview
[D] unknown:0 - Using Wayland-EGL
library "libui_compat_layer.so" not found
library "libutils.so" not found
library "libcutils.so" not found
library "libhardware.so" not found
library "android.hardware.graphics.mapper@2.0.so" not found
library "android.hardware.graphics.mapper@2.1.so" not found
library "android.hardware.graphics.mapper@3.0.so" not found
library "android.hardware.graphics.mapper@4.0.so" not found
library "libc++.so" not found
library "libhidlbase.so" not found
library "libgralloctypes.so" not found
library "android.hardware.graphics.common@1.2.so" not found
library "libion.so" not found
library "libz.so" not found
library "libhidlmemory.so" not found
library "android.hidl.memory@1.0.so" not found
library "vendor.qti.qspmhal@1.0.so" not found
[D] QMozContextPrivate::QMozContextPrivate:102 - Create new Context: 0x71a7f7e778 , parent: 0x0
/usr/bin
[D] platform_egl_workaround_open:71 - Pre-loading eglplatform_wayland.so at from "/usr/lib64/libhybris//eglplatform_wayland.so"
greHome from GRE_HOME:/usr/bin
libxul.so is not found, in /usr/bin/libxul.so
```

I admit that this isn't the nicest of solutions: it's very much a hack. But Raine seems to think that the underlying problem may be in the libhybris code rather than in the gecko code. He thinks it may be due to a lack of reference counting in ws_init and ws_Terminate. If that's the case then the proper fix will need to go there and, once that's deployed, this hack can be removed.
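The backtraces above show ws_init being reached both from Qt on the main thread and from gecko's compositor thread. Presumably the concern is that, without a count, one ws_Terminate call could close the platform library while the other user still depends on it. Here's a hypothetical sketch of what reference counting at that level would mean. It isn't libhybris code: the class name is invented, and the open/close callables stand in for dlopen()/dlclose().

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical reference-counted owner of a platform library handle.
class PlatformLibraryRef {
public:
    PlatformLibraryRef(std::function<void*()> open, std::function<void(void*)> close)
        : open_(std::move(open)), close_(std::move(close)) {}

    // ws_init()-style call: only the first caller actually opens the library.
    void acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_++ == 0) {
            handle_ = open_();
        }
    }

    // ws_Terminate()-style call: only the last caller actually closes it.
    void release() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(count_ > 0);
        if (--count_ == 0) {
            if (handle_) {
                close_(handle_);
            }
            handle_ = nullptr;
        }
    }

    bool loaded() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return handle_ != nullptr;
    }

private:
    std::function<void*()> open_;
    std::function<void(void*)> close_;
    mutable std::mutex mutex_;  // callers may be on different threads
    void* handle_ = nullptr;
    int count_ = 0;
};
```

With two acquire() calls and a single release(), the library stays loaded. By contrast, if each init simply opened and each terminate simply closed, the first terminate would already pull the library out from under the other user.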
I'm not so convinced myself. ESR 78 is working fine on Sailfish OS 4.6 and it's unclear to me why it would be unaffected by this if it's a problem in the underlying libhybris code. Nevertheless for now the fix is there and it's working. Maybe this will be something to return to in the future, but for now, that's good enough for me.
And that's also good enough for a day's work. Tomorrow I'll finalise this so that I can then move on to the tidying up phase!
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.