flypig.co.uk

Blog

25 Aug 2024 : Day 330
Over the last few days I've been mopping up the last few outstanding issues with the browser. Unfortunately I wasn't able to get the browser cover hang (which is now a crash) fixed, but the workaround is just to disable it for now. The active-tab preview still works, it's only when there are no tabs that the cover will be different.

As a result of where things are at I was all ready to make packages available for others to install and test. I even installed them on my daily phone with the plan to use them as my only browser to see how I got on.

But during the process of doing this I discovered that the WebView is no longer working with the Settings app or the Email app on Sailfish OS 4.6. These were definitely working previously on 4.5, so my immediate thought was that the workaround for the Wayland EGL dynamic loading issue was failing for these apps.

These are both critical apps for Sailfish OS, so I can't reasonably release packages that are going to break them to this extent. I've therefore decided to hold back on releasing the packages, and the instructions for installing them, until this can be resolved.

We'll come back to the issue of the Settings and Email apps a little later. Since then I've been having some really fruitful discussions with Frajo (krnlyng). I'm sure I've explained this before, but Frajo is one of Jolla's hardware adaptation and Android App Support gurus and also a good friend. As a Jolla employee it's safe to assume he really knows his stuff. And it's also the truth. Back in 2017, before Frajo started working for Jolla, he was interviewed for the Jolla blog. It's well worth a read, even if some of the info is now a little outdated.

Frajo went to the trouble of testing the ESR 91 packages a few days back on Sailfish OS 4.6. Up to that point I'd only ever run them on 4.5, so I was disappointed to discover that they weren't working on 4.6. Some of this context I've already covered in previous diary entries over the last few days.

To cut to the chase, Frajo got to the bottom of the reason for the failure on 4.6, which turns out to be due to a bug at quite a low level in libhybris. Lower than I would ever typically have to deal with myself. My naive understanding of libhybris is that it provides a conversion interface between glibc and the Android Bionic and Binder interfaces. Amongst other things this allows Linux (and therefore Sailfish OS) to transparently make use of Android drivers.

The issue discovered by Frajo is that libhybris will close the dynamically loaded eglplatform_wayland.so library once it's determined that it's no longer being used by any processes. Unfortunately under certain circumstances (specifically if the display is initialised multiple times in more than one thread) it can happen that libhybris will close the dynamic library even though it's still in use.

Frajo, via Raine, provided the workaround of using LD_PRELOAD that I've been using over the last few days. Since then Frajo, in discussion with mal, developed a fix for libhybris which you can see in the associated pull request to the repository. Here's how Frajo explains the relationship between the changes in this pull request and the use of LD_PRELOAD:
 
One might wonder why the LD_PRELOAD trick works at all because the ws_Terminate function sets the wsmod and ws pointers to 0. But i think i know why (a sequence of function calls and my understanding of what happens):

On a thread T1:
  1. eglInitialize() and subsequently dlopen(eglplatform_wayland)
  2. other egl calls...
On another thread T2 interleaved/simultaneously to T1's 2.
  A. eglInitialize() no dlopen
  B. eglTerminate() -> dlclose(eglplatform_wayland) -> ws and wsmod pointers are 0 which results in sometimes the assert(ws != NULL) firing as can be seen sometimes when running the browser when libhybris debug logs are active.
  C. eglInitialize() -> dlopen(eglplatform_wayland)
  3. pointers ws and wsmod are valid, but different than before and since the library was unloaded in step B and reloaded in step C there might be some pointers or structures that the step 2 stuff depends on which is now different/invalid.
But when LD_PRELOAD is set to preload the eglplatform_wayland library, the library never gets unloaded so the stuff that 2 depends on is still the same stuff after B/C despite the ws and wsmod pointers getting temporarily set to 0 in B (so there might even still be a case where the LD_PRELOAD trick won't work if the T1 thread manages to call a ws function while the pointers are 0).
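As a rough illustration of the LD_PRELOAD trick described in the quote, here's a minimal sketch of a launch wrapper. The library path, the EGL_PLATFORM_LIB variable and the with_egl_preload function name are all assumptions made for the example, not something taken from libhybris or the browser packaging.

```shell
# Assumed location of the library; it may well differ on a given device.
EGL_PLATFORM_LIB="${EGL_PLATFORM_LIB:-/usr/lib64/libhybris/eglplatform_wayland.so}"

# Run a command with the Wayland EGL platform library preloaded. As the quote
# explains, a preloaded library stays loaded for the life of the process.
# If the library isn't readable at the assumed path, run the command as-is.
with_egl_preload() {
    if [ -r "$EGL_PLATFORM_LIB" ]; then
        env LD_PRELOAD="$EGL_PLATFORM_LIB" "$@"
    else
        "$@"
    fi
}
```

With this defined, something like with_egl_preload sailfish-browser would start the browser with the library pinned in place. As Frajo notes, this only keeps the library loaded: it doesn't stop the ws and wsmod pointers from being briefly zeroed.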

Frajo tested out ESR 91 with his libhybris fix applied and confirmed that this allowed the browser to run, even without the workarounds. Of course I was keen to test this out for myself, but I also had another reason for wanting to test this solution, which is that I was keen to know whether it would fix the issues with the Settings and Email apps as well.

So today Frajo kindly shared RPM packages for libhybris and friends so that I could test the changes in the pull request on my own device. As per his very wise suggestion, I first made a copy of all of the related packages currently installed on my phone just in case something were to go wrong. Here's how I installed the packages provided by Frajo:
$ rpm -U \
    libhybris-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm \
    libhybris-libEGL-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm \
    libhybris-libGLESv2-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm \
    libhybris-libhardware-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm \
    libhybris-libsync-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm \
    libhybris-libwayland-egl-0.0.5.52+pr.testing.20240825175210.1.gdfa51d7-1.8.1.jolla.aarch64.rpm
$ systemctl restart --user lipstick
Rather than rebooting the device, restarting lipstick is sufficient to ensure the changes are applied. With this done and with the workaround I added to qtmozembed removed, I had the same experience as Frajo, in that the browser works correctly. That's a great result.

However, I also found that these packages don't fix the issues with the Settings app or the Email app. This motivated me to look into these two crashes a bit further.

It turns out that the Settings app isn't crashing due to the dynamic library loading issue, but because the prefs.js file contains settings that prevent the WebView from working.

After copying across a working prefs.js file from the harbour-webview mozilla profile, the Settings app now works correctly.

The email app suffered from a different, but equally unrelated, issue. It expects the libxul.so library to be found in the /usr/lib64/xulrunner-qt5-78.15.1 directory. This seems to be somehow baked into the email app. But of course the xulrunner packages install things in the /usr/lib64/xulrunner-qt5-91.9.1 directory.

As a workaround for this I've created a symlink so that anything looking for the ESR 78 library will get the ESR 91 library instead:
$ ln -s /usr/lib64/xulrunner-qt5-91.9.1/ /usr/lib64/xulrunner-qt5-78.15.1
Having applied this workaround the email app also works correctly. So the problem with these two apps turns out to be unrelated to the dynamic library loading issue. To be clear though, the libhybris fix is a requirement for them to work, even if it wasn't causing the crash in these cases.
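The effect of the symlink can be checked without touching the system directories. The sketch below recreates the two directory names inside a scratch directory and confirms that a lookup via the ESR 78 path ends up at the ESR 91 file. The scratch location and the empty stand-in for libxul.so are purely illustrative.

```shell
# Work in a throwaway directory rather than the real /usr/lib64.
scratch=$(mktemp -d)

# Stand-in for the directory installed by the ESR 91 xulrunner packages.
mkdir "$scratch/xulrunner-qt5-91.9.1"
touch "$scratch/xulrunner-qt5-91.9.1/libxul.so"

# Same form of symlink as above: the ESR 78 name points at the ESR 91 directory.
ln -s "$scratch/xulrunner-qt5-91.9.1/" "$scratch/xulrunner-qt5-78.15.1"

# Resolve the path that an app hard-wired to ESR 78 would ask for.
resolved=$(readlink -f "$scratch/xulrunner-qt5-78.15.1/libxul.so")
echo "$resolved"
```

The printed path should end in xulrunner-qt5-91.9.1/libxul.so, which is the behaviour the workaround relies on.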

Since I made copies of the original libhybris packages before replacing them with Frajo's versions, I'm also able to restore my phone to its previous state like this:
$ rpm -U --force \
    libhybris-0.0.5.50-1.4.1.jolla.aarch64.rpm \
    libhybris-libEGL-0.0.5.50-1.4.1.jolla.aarch64.rpm \
    libhybris-libGLESv2-0.0.5.50-1.4.1.jolla.aarch64.rpm \
    libhybris-libhardware-0.0.5.50-1.4.1.jolla.aarch64.rpm \
    libhybris-libsync-0.0.5.50-1.4.1.jolla.aarch64.rpm \
    libhybris-libwayland-egl-0.0.5.50-1.4.1.jolla.aarch64.rpm
$ systemctl restart --user lipstick
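When swapping back and forth like this it's easy to lose track of which set of packages is installed. Here's a small sketch of a check, assuming the test builds can be recognised by the +pr.testing. marker visible in the package names above. The libhybris_flavour function name is made up for the example.

```shell
# Reads package names on stdin, one per line, and prints "testing" if any of
# them looks like a pull request test build, otherwise "stock".
libhybris_flavour() {
    if grep -q '+pr\.testing\.'; then
        echo testing
    else
        echo stock
    fi
}
```

On the device this could be fed from the package database, for example with rpm -qa 'libhybris*' | libhybris_flavour.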
What's the consequence of all this? Well, first off is the great news that Frajo, mal and Raine have collectively looked at this and Frajo has submitted a pull request that will fix the issue in future versions of Sailfish OS. This means that the workaround I added to qtmozembed will no longer be needed and as such I have a new task to update this new piece of code, as Frajo requests:
 
I saw you've already made a workaround with the platform_egl_workaround function. Since this is a real bug in libhybris could you make sure that the workaround can be easily disabled so we can still verify that the libhybris patch works, maybe via some environment variable?

So I'll make this change over the coming days. But I also have a bit more work to do: first to fix the issue that's causing the Settings app to crash and second to look at the Email app as well.
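The off switch Frajo asks for could behave something like the following. This is only a shell mock-up of the decision logic: the real check would sit in the C++ of qtmozembed, and the variable name QTMOZEMBED_NO_EGL_WORKAROUND is invented here purely for illustration.

```shell
# Hypothetical variable name: any non-empty value turns the workaround off,
# while leaving it unset or empty keeps the workaround on.
egl_workaround_enabled() {
    [ -z "${QTMOZEMBED_NO_EGL_WORKAROUND:-}" ]
}

if egl_workaround_enabled; then
    echo "workaround active"
else
    echo "workaround disabled"
fi
```

Defaulting to "on" means existing installs keep working on an unpatched libhybris, while anyone testing the libhybris fix can opt out for a single launch by setting the variable.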

Once all of this is sorted I'll then be in a position to release packages and instructions for how to install them.

I'm sincerely grateful to Frajo and the Jolla team for the work they've put in to identifying and fixing the underlying issue in libhybris. I've said it before but it bears repeating: I would never have been able to solve this on my own and there really is no substitute for a team of developers who really know their stuff.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
