Gecko
9 Oct 2023 : Day 54
I didn't mention it yesterday, but while putting together the OBS build I realised I'd not submitted the PR for the nspr update the build needs. It took just a few hours for mal to check, test and merge the PR, so my thanks go to him for the fast turnaround.
This PR spawned some interesting talk on IRC. A couple of days ago I mentioned in my diary that the build took 7 hours 36 minutes and 10.3 seconds. This provoked Nico, direc85 and mal to consider the reasons for it taking so long. Nico highlighted the fact that this is much longer than for a normal i486 Firefox build.
Nico: What computer are you using anyway, that it takes so long to build gecko? :D
Nico: Because I am used to a firefox build taking less than an hour, usually more around 30 minutes :D
That's a huge difference. As I noted in the discussion, just the linking step takes 20 minutes for a Gecko build targeting aarch64 on Sailfish OS (although if you've read my missive from Day 52, you'll know that this requires some caveats).
During the discussion some of the reasons for the longer build time became apparent. First of all, builds simply do take longer using the Sailfish SDK, especially when Rust is involved. This is something that direc85 has experienced through his work with rubdos on Whisperfish. Here's how he describes the current situation for the Whisperfish build pipeline:
direc85: CI is also single threaded. The host compilation takes 7 minutes (not in SFDK, so it's threaded), but armv7hl and aarch64 take around 38 minutes and i486 106 minutes (because it can't use sccache for reasons).
The issue isn't just for local builds using sfdk either, as mal pointed out.
mal: Nico: even on jolla obs the build takes quite a while, about 1.5 hours for x86 and longer for arm and aarch64
mal: Nico: quite likely the issues come somehow from scratchbox2 used for arm and aarch64 builds
There's always going to be a discrepancy with cross-compilation, because some of the build will be happening under emulation using QEMU. But much of that is supposed to be abstracted so that native tools are used where possible (for example compiling using clang or gcc and linking using ld). But as direc85 alludes to with his comment, it's also because we're having to perform builds sequentially, a single job at a time, rather than scaling it up with the number of processors.
direc85: With Whisperfish — a Rust application — we also must use a single thread when compiling, or the compiler is likely to hang. Running something like "taskset 0x555555 cargo -j8" (pin it for 12 cores and use 8) helps somewhat, but it still hangs almost every time
direc85: I'm not sure if it's the same underlying issue, but back in 2006-ish in uni when we made apps for Nokia N810 (with C or C++) we were also forced to use -j1 to prevent hanging. So I have a long history with that one :
For those, like me, who aren't familiar with it, the taskset command allows you to execute a particular process (in this case cargo) so that it only runs on specific CPU cores, otherwise known as its CPU affinity.
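To make the mask in direc85's command a bit more concrete, here's a minimal Python sketch of how a taskset-style hexadecimal mask maps onto CPU numbers. The lowest-order bit of the mask corresponds to logical CPU 0, the next bit to CPU 1, and so on. The helper names cpus_from_mask and mask_from_cpus are my own, purely for illustration, and how logical CPUs map onto physical cores will depend on the machine, so treat this as a sketch rather than a description of direc85's actual setup.

```python
def cpus_from_mask(mask: int) -> list[int]:
    """Return the logical CPU numbers selected by an affinity bitmask.

    Bit 0 (the lowest-order bit) stands for CPU 0, bit 1 for CPU 1, etc.
    """
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def mask_from_cpus(cpus: list[int]) -> int:
    """Build an affinity bitmask from a list of logical CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    # The mask quoted in the IRC discussion
    cpus = cpus_from_mask(0x555555)
    print("CPUs selected:", cpus)
    print("Number of CPUs:", len(cpus))
    # Round trip back to the hexadecimal form taskset accepts
    print("Mask:", hex(mask_from_cpus(cpus)))
```

Decoding 0x555555 this way selects every other logical CPU from 0 to 22, twelve in total, which matches the "pin it for 12 cores" in the quote. The -j8 then asks cargo to run at most eight jobs across them.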
So there is a problem that seems to afflict Rust builds in particular (gecko includes a lot of Rust components) that means that if you try to run them with more than one job, there's a high likelihood the build will hang and have to be restarted.
During the discussion direc85 and Nico had many useful suggestions, for example around futexes and unimplemented Rust features, which could be related. But as yet there is no clear cause and the issue persists.
There's definitely some interesting work to be done around this. I have my own nascent theories about what might be causing the problem, but they're too poorly defined to expand on here. If anyone has thoughts about how to fix this and would like to do some investigation, the whole Sailfish ecosystem (including Jolla's build pipeline) would benefit from a solution. I did create an issue related to this for ESR 91, although it probably deserves a more global issue lodged against scratchbox2 or the SDK tooling.
That's a bit of an aside, but an interesting one I hope.
Today I'm looking at the gecko rendering pipeline. Right now the problem of how to get the rendering pipeline to work remains mysterious, but I do have four leads:
- Check the code changes I had to make in relation to switching GLScreenBuffer for SwapChain. This is almost certainly a contributing factor to the problem, although how to fix it is another matter.
- Check the code changes I made in relation to EglDisplay. Although less likely to be the issue than GLScreenBuffer, it's quite likely that both will need fixing.
- Back on Day 18 I received some great advice from Fabrice, who recommended that I get in touch with the Mozilla graphics team on Matrix. Now would seem like a good time to go that route.
- Finally I have a meeting arranged with Raine from Jolla in a couple of days. As I've explained before, Raine is the master of all things Sailfish OS Gecko, so if anyone can help, he can.
I have asked my question on the gfx-firefox Matrix channel.
Hello all. I'm currently upgrading the gecko-based browser on Sailfish OS from ESR 78 to ESR 91 (we're a bit behind). We previously used GLScreenBuffer, but this has been replaced with SwapChain. I've read through the changes, but am pretty lost. Would this be a good place to get help on this? I'm trying to understand how to make use of SwapChain as a replacement for how we were using GLScreenBuffer.
Hopefully that will get a response. But this query is a bit open-ended; I think I might get a more useful response if I can formulate a more precise question as well.
Unfortunately it looks like that's all I'm going to have time for today. I'll spend tomorrow poring over the rendering pipeline code to see what I can make of it. If I'm going to make progress I'll also need to actually debug the running code, to see which parts it's touching. After all, I may have completely the wrong idea about this.
More tomorrow!
If you'd like to read more about all this gecko stuff, do take a look at my full Gecko Dev Diary.