flypig.co.uk

Gecko-dev Diary

Starting in August 2023 I'll be upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91. This page catalogues my progress.

Latest code changes are in the gecko-dev sailfishos-esr91 branch.

There is an index of all posts in case you want to jump to a particular day.

There is also an RSS feed for the Gecko-dev Diary.


19 May 2024 : Day 237 #
Yesterday I eased myself gently back into Gecko development with a quick look over the work that others in the Sailfish community have been doing while I was on a break, aimed at deciphering the messed-up textures coming out of the render surface.

Today I'm back looking at the code in earnest, and it's been a fruitful process. My starting point wasn't the Gecko code but the other end: QtMozEmbed and sailfish-components-webview. Taking things easy, I'm just browsing through the code, trying to find where the Surface used by Gecko connects with the texture that the WebView renders to the screen. If I can figure that out I'll feel like I'm on solid ground.

It looks like a key piece of the puzzle happens in QMozExtTexture::updateTexture(). On ESR 78, while rendering is taking place, the method gets hit often; so often, in fact, that it appears to be every frame:
Thread 9 "QSGRenderThread" hit Breakpoint 1, QMozExtTexture::updateTexture (this=0x7f90024e30) at qmozexttexture.cpp:64
64      {
(gdb) bt
#0  QMozExtTexture::updateTexture (this=0x7f90024e30) at qmozexttexture.cpp:64
#1  0x0000007fbfc04f18 in MozMaterialNode::preprocess (this=0x7f9001ef60) at qmozextmaterialnode.cpp:81
#2  0x0000007fbf8f0c44 in QSGRenderer::preprocess() () from /usr/lib64/libQt5Quick.so.5
#3  0x0000007fbf8f0e5c in QSGRenderer::renderScene(QSGBindable const&) () from /usr/lib64/libQt5Quick.so.5
#4  0x0000007fbf8f14c0 in QSGRenderer::renderScene(unsigned int) () from /usr/lib64/libQt5Quick.so.5
#5  0x0000007fbf8ffe58 in QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int) () from /usr/lib64/libQt5Quick.so.5
#6  0x0000007fbf9429d0 in QQuickWindowPrivate::renderSceneGraph(QSize const&) () from /usr/lib64/libQt5Quick.so.5
#7  0x0000007fbf916c04 in ?? () from /usr/lib64/libQt5Quick.so.5
#8  0x0000007fbf91cc10 in ?? () from /usr/lib64/libQt5Quick.so.5
#9  0x0000007fbea290e8 in ?? () from /usr/lib64/libQt5Core.so.5
#10 0x0000007fb74b0a4c in start_thread (arg=0x7fffffe9bf) at pthread_create.c:479
#11 0x0000007fbe70b89c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
This contrasts with the ESR 91 build, where the same breakpoint is never hit. I placed a breakpoint on all of the updateTexture() methods just to be sure:
Num Disp Enb Address       What
3   keep y   <MULTIPLE>         
3.1      y   0x07ff7b64ba4 <QSGDistanceFieldGlyphCache::updateTexture()@plt+4>
3.2      y   0x07ff7b64d64 <QSGDefaultPainterNode::updateTexture()@plt+4>
3.3      y   0x07ff7bf071c <QSGDefaultPainterNode::updateTexture()+20>
3.4      y   0x07ff7bf5230 <QSGDistanceFieldGlyphCache::updateTexture()+24>
3.5      y   0x07ff7c1babc <QSGDefaultLayer::updateTexture()+12>
3.6      y   0x07ff7ef846c <QMozExtTexture::updateTexture()+20>
The fact there are hits for ESR 78, but not for ESR 91, is definitely of interest.

This isn't something that's come up before. Until now I've focused almost exclusively on the Gecko end, where the texture gets rendered to. But it now looks like the QtMozEmbed code is failing somewhere: the render update isn't being called. To be clear though, this is unlikely to be due to any change I've made in QtMozEmbed, given I've not made any notable changes to these parts of the code.

More likely, some part of the initialisation process in the Gecko code is failing, or not happening as it should. Something like that could easily lead to a situation where the render update doesn't get called. For example, it could be that the texture is never attached to the Qt Scene Graph. But that's speculation; it could be all sorts of things.
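
To make the scene graph speculation a little more concrete, here's a minimal C++ mock of how a retained scene graph can end up skipping a node's preprocess step. It's a hypothetical model rather than Qt code: MockNode and renderFrame() are invented for illustration, and the usePreprocess field merely stands in for the role of Qt's QSGNode::UsePreprocess flag. Since the ESR 78 backtrace shows QMozExtTexture::updateTexture() being reached via MozMaterialNode::preprocess(), in this model a node that's never attached to the root, or never opts in to preprocessing, would produce exactly the kind of silence seen on ESR 91.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a scene graph node (not the Qt API). In Qt the
// equivalent opt-in is the QSGNode::UsePreprocess flag.
struct MockNode {
    std::string name;
    bool usePreprocess = false;       // only opted-in nodes get preprocess()
    std::vector<MockNode*> children;  // nodes reachable from this one
    int preprocessCalls = 0;          // how often preprocess() has run

    // In QtMozEmbed this is roughly where updateTexture() would be triggered.
    void preprocess() { ++preprocessCalls; }
};

// One "frame": walk everything reachable from the root and call preprocess()
// on opted-in nodes. A node never attached to the root is never visited.
void renderFrame(MockNode& root) {
    std::vector<MockNode*> pending{&root};
    while (!pending.empty()) {
        MockNode* node = pending.back();
        pending.pop_back();
        assert(node != nullptr);
        if (node->usePreprocess) {
            node->preprocess();
        }
        for (MockNode* child : node->children) {
            pending.push_back(child);
        }
    }
}
```

In this mock, a texture node that's flagged but detached gets no calls at all, no matter how many frames are rendered, which is one of several ways the breakpoint could stay silent.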

I'm rather excited by this. Maybe this is the key problem we've been searching for? But I don't want to go too far today while I'm still getting myself up to speed with things, so I'll look into this further tomorrow.

Also worth mentioning is that tomorrow is Jolla Love Day 2. I'm excited to find out what Jolla have in store for us and while I won't be able to attend in-person, I'll definitely be there both online and in spirit!

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
18 May 2024 : Day 236 #
It's been two weeks since I last published an entry in my Gecko dev diary series. The gap was necessary for me to fit various other things into my life which just weren't going to be compatible with working on the Gecko code: specifically, attending the HPC/AI Days conference in Durham for a week, followed by preparations for a couple of Pint of Science events I was involved with during this last week. This was my first involvement with Pint of Science and I have to admit it was a lot of fun.

The break from Gecko offered me some healthy respite, but I promised to start back up again today and here we are. I've not yet got back into the rhythm of Gecko development, so my first few entries may be somewhat relaxed. I'll be taking it gently to begin with.

One important reason for this is the problem I'm trying to solve. When I left things a fortnight ago I was still trying to figure out why the offscreen rendering pipeline is broken. I've come a long way since I started trying to fix this problem, but I'm still not there. More importantly, it's now become a problem without a clear solution, which means I no longer have a clear path forwards.

The daily rhythm of these posts is important for keeping me focused on the task, but it can also be a distraction. It means chopping the work into day-sized chunks. That can be a hindrance when what the task really needs is a deep analysis, during which there may not be so much to write about. This work is easiest to write about when I'm making code changes. When I'm not making changes to the code, things can be a bit too intangible.

That's where I am today.

Before I get into the actual work I want to first talk about confusing textures and the amazing work of the Sailfish community in their attempts to disentangle them. If you've been following these diary entries for some time you'll know that before the gap I spent a fair amount of time trying to extract texture image data from the render surface used by the offscreen renderer.

The result of all this work ended up being a collection of raw dumps of pixel data which, after attempting to convert them to something visible, looked like this.
 
The corrupted, swizzled image data from before the gap. After two weeks sitting around unchanged it still just looks like fuzzy pixels to me.

I tried all sorts to get the images into better shape but without any luck. But the courageous Sailfish community took up the baton and continued the work while I was taking a break from it.

There were great contributions from across the community, but I want to especially highlight the efforts and ideas of Tone (tortoisedoc), Adam Pigg (piggz), Ville Nummela (vige, my ex-colleague from Jolla), Mark Washeim (poetaster), thigg, attah, remote and kan_ibal, all of whom provided genuinely useful input.
 
Three screenshots: a texture with some ghostly images shown on it, a zoomed-in texture with a few bright pixels, and a desktop window showing a similar texture alongside some metadata.

Here you can see some of the efforts to decipher the code. On the left is the result of poetaster having processed the image using ImageMagick. As poetaster explains:
 
I thought it might be an offset issue, but after tooling about in gimp, I'm not so sure. What 'should' they look like. The RGB data does seem to be RBG, the RGBA, I'm not sure. You said unsigned?... bits of sailfish logo can also be recognized. It does look unsigned judging from the results of the convert operation.

In the centre is a close up rendered by piggz using PixelViewer.
 
There is definitely almost an image there! For this I used PixelViewer, which allows to quickly try many different possible formats, this one is RGBA_8888... This is making a great puzzle... just need to figure out how to put the bits together.

When Adam talks about bits I'm not sure whether he means computer bits or puzzle pieces, but either way I agree it should just be a matter of putting them together in the right way.

Finally, on the right we can see the image rendered using PixelViewer again, this time by kan_ibal in combination with a Python script.
 
There are blocks 8x8 pixels. It reminds me a jpeg compression and looks like a stage of quantization and DCT.

Again, this all sounds very plausible to me, although the image still isn't popping out just yet. Thank you to everyone for your input on this. Unfortunately none of these efforts resulted in a definitive conclusion, but it's all very helpful input. If anyone else would like to give this a go, feel free to take a look at the textures and pass them through your favourite raw image processing pipeline. I'd love to get to the bottom of this.
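
For anyone wanting to try, here's a rough C++ sketch of one such pipeline. It reinterprets a raw dump under a guessed layout and returns a binary PPM, so that the offset, stride, bytes per pixel, channel order (RGB versus RBG, BGRA and so on) and row order can all be varied independently. The RawLayout struct and toPpm() function are hypothetical helpers, and none of the layout values are known for the real textures. The optional vertical flip is there because glReadPixels() returns rows starting from the bottom of the image.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of a raw dump's layout. These are guesses to vary
// while experimenting; none are known values for the real textures.
struct RawLayout {
    std::size_t width;
    std::size_t height;
    std::size_t bytesPerPixel;            // 3 for RGB, 4 for RGBA, BGRA, ARGB...
    std::size_t rowStride;                // bytes per row, including padding
    std::size_t offset;                   // bytes to skip before the first row
    std::array<std::size_t, 3> rgbIndex;  // byte positions of R, G, B in a pixel
    bool flipVertically;                  // glReadPixels() rows start at the bottom
};

// Reinterpret the dump under the given layout and return a binary PPM (P6).
std::string toPpm(const std::vector<std::uint8_t>& raw, const RawLayout& l) {
    if (l.width == 0 || l.height == 0) {
        throw std::invalid_argument("empty layout");
    }
    const std::size_t needed =
        l.offset + (l.height - 1) * l.rowStride + l.width * l.bytesPerPixel;
    if (raw.size() < needed) {
        throw std::invalid_argument("dump too small for this layout");
    }
    for (std::size_t c : l.rgbIndex) {
        if (c >= l.bytesPerPixel) {
            throw std::invalid_argument("channel index out of range");
        }
    }
    const std::string header = "P6\n" + std::to_string(l.width) + " " +
                               std::to_string(l.height) + "\n255\n";
    std::string out = header;
    for (std::size_t y = 0; y < l.height; ++y) {
        const std::size_t srcRow = l.flipVertically ? l.height - 1 - y : y;
        const std::uint8_t* row = raw.data() + l.offset + srcRow * l.rowStride;
        for (std::size_t x = 0; x < l.width; ++x) {
            const std::uint8_t* px = row + x * l.bytesPerPixel;
            for (std::size_t c : l.rgbIndex) {
                out.push_back(static_cast<char>(px[c]));
            }
        }
    }
    // Sanity check: header plus three bytes per output pixel.
    assert(out.size() == header.size() + l.width * l.height * 3);
    return out;
}
```

Writing the returned string to a file with a .ppm extension gives something that GIMP or ImageMagick can open directly, which makes it quick to loop over many candidate layouts.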

I'm going to leave it there for today. But tomorrow I'll be starting to look at the Gecko code again in earnest.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
18 May 2024 : Gecko dev diary #
I'm back to writing my daily Gecko dev diary again, starting with a recap of what's been happening over the last two weeks while I was taking a break from it. There will be updates daily.
6 May 2024 : Gecko dev diary pause #
As I explained in my latest post, I'm pausing my Gecko dev diary for two weeks while I attend Durham HPC/AI Days 2024 and sort a few other things out in my life afterwards. I'll be back up and writing daily again from Saturday 18th May.
3 May 2024 : Day 235 #
As discussed at the start of the week, this is going to be my last developer diary post for a little bit. But I want to make absolutely clear that this is a temporary pause. I'm heading to the HPC Days Conference in Durham next week where I'll be giving a talk on Matching AI Research to HPC Resource. I'm expecting it to be a packed schedule and, as is often the case with this kind of event, I'm not expecting to be able to fit in my usual gecko work alongside this. So there will be a pause, but I'll be back on the 18th May to restart from where I'm leaving things today.

During the pause I'm hoping to write up some of the blog posts that I've put on hold while working on the gecko engine, so things may not be completely silent around here. But the key message I want to get across is that I'm not abandoning gecko. Not at all. I'll be back to it in no time.

Nevertheless, as I write this I'm still frustrated and stumped by my lack of progress with the offscreen rendering. So I'm also hoping that a break will give me the chance to come up with some alternative approaches. Over the last couple of days I attempted to capture the contents of the surface used to transfer the offscreen texture onscreen, but the results were inconclusive at best.

On the forums I received helpful advice from Tone (tortoisedoc). Tone highlighted the various areas where texture decoding can go awry:
 
two things can go wrong (assuming the data is correct in the texture):
  • alignment of data (start, stride)
  • format of pixel (ARGB / RGBA etc)
Which one is the texture you are look at configured with?

Is the image the same the browser would show? Which image are you displaying in the embedded view?


These are all great points and great questions. Although I know what sort of image I'm asking for (either RGB or RGBA in UNSIGNED_BYTE format) the results don't seem to be matching that. It's made more complex by the fact that the outline of the image is there (which suggests things like start and stride are correct) but still many of the pixels are just blank. It's somewhat baffling and I think the reason might be more to do with an image that doesn't exist rather than an image which is being generated in the wrong format.
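
One detail worth ruling out on the stride side: when pixels are read back with glReadPixels(), each row is padded up to a multiple of GL_PACK_ALIGNMENT, which defaults to 4. For RGBA with UNSIGNED_BYTE this never matters, but for RGB it does whenever the width times three isn't a multiple of four, and assuming the wrong stride shears the image row by row. The helpers below are a hypothetical sketch of that arithmetic, not code taken from the browser:

```cpp
#include <cassert>
#include <cstddef>

// The two formats requested for the readback, both with GL_UNSIGNED_BYTE.
enum class PixelFormat { RGB, RGBA };

std::size_t bytesPerPixel(PixelFormat format) {
    return format == PixelFormat::RGB ? 3 : 4;
}

// Bytes between the starts of consecutive rows once each row has been padded
// to the pack alignment (GL_PACK_ALIGNMENT: 1, 2, 4 or 8; default 4).
std::size_t rowStride(std::size_t width, PixelFormat format,
                      std::size_t alignment = 4) {
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);
    const std::size_t unpadded = width * bytesPerPixel(format);
    return (unpadded + alignment - 1) / alignment * alignment;
}

// A buffer size that is always large enough for a width x height readback.
std::size_t bufferSize(std::size_t width, std::size_t height,
                       PixelFormat format, std::size_t alignment = 4) {
    return rowStride(width, format, alignment) * height;
}
```

For example, a hypothetical width of 1081 pixels gives 3243 bytes of RGB data per row, padded to a stride of 3244 with the default alignment, whereas the same width in RGBA needs no padding at all.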

But I could be wrong and I'll continue to consider these possibilities and try to figure out the underlying reason. I really appreciate the input and, as always, it gives me more to go on.

But today I've shifted focus just briefly by giving the code a last review before the pause: sifting through the code again to try to spot differences.

There were many places I thought there might be differences, such as the fact that it wasn't clear to me that the GLContext->mDesc.isOffscreen flag was being set. But stepping through the application in the debugger showed it was set after all. So no fix to be had there.

The only difference I can see (and I'm sure it's too small to matter) is that the Surface capabilities are set in slightly different places. It's possible that in the gap between where they're set in ESR 78 and the slightly later point where they're set in ESR 91, something makes use of them and a difference results.
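
The concern can be pictured with a deliberately simplified C++ model. The Caps struct, resolveAny() and simulate() are hypothetical stand-ins (the real SurfaceCaps type has many more fields), and the "reader in the gap" is an assumption rather than anything identified in the code. resolveAny() mirrors the TODO branch in GLContext::InitImpl() that turns a request for "any" surface into colour without alpha. What the model shows is that both orderings end in the same final state, so only code that reads the caps in between could behave differently.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the capability flags touched by the
// GLContext changes; the real SurfaceCaps type has many more members.
struct Caps {
    bool any = false;
    bool color = false;
    bool alpha = false;
};

// Mirrors the "TODO: Remove SurfaceCaps::any" branch in GLContext::InitImpl():
// a request for "any" surface collapses to colour without alpha.
void resolveAny(Caps& caps) {
    if (caps.any) {
        caps.any = false;
        caps.color = true;
        caps.alpha = false;
    }
    assert(!caps.any);  // after resolution "any" should never survive
}

struct Observation {
    bool colourSeenInGap;  // what a hypothetical reader in the gap would see
    Caps finalCaps;        // state once InitImpl()-style resolution has run
};

// setInConstructor == true models populating the caps in the constructor;
// false models populating them at some later (hypothetical) point.
Observation simulate(bool setInConstructor) {
    Caps caps;
    if (setInConstructor) {
        caps.any = true;
        caps.color = true;
    }
    const bool seen = caps.color;  // hypothetical code running in the gap
    if (!setInConstructor) {
        caps.any = true;
    }
    resolveAny(caps);
    return Observation{seen, caps};
}
```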

I'm not convinced to be honest, but to avoid even the slightest doubt I've updated the code so that the values are now set explicitly:
GLContext::GLContext(const GLContextDesc& desc, GLContext* sharedContext,
                     bool useTLSIsCurrent)
    : mDesc(desc),
      mUseTLSIsCurrent(ShouldUseTLSIsCurrent(useTLSIsCurrent)),
      mDebugFlags(ChooseDebugFlags(mDesc.flags)),
      mSharedContext(sharedContext),
      mWorkAroundDriverBugs(
          StaticPrefs::gfx_work_around_driver_bugs_AtStartup()) {
  mCaps.any = true;
  mCaps.color = true;
[...]
}

[...]

bool GLContext::InitImpl() {
[...]
  // TODO: Remove SurfaceCaps::any.
  if (mCaps.any) {
    mCaps.any = false;
    mCaps.color = true;
    mCaps.alpha = false;
  }
[...]
The updated code is built and right now transferring over to my development phone. As I say though, this doesn't look especially promising to me. I'm not holding my breath for a successful render.

And I was right: no improvements after this change. Argh. How frustrating.

I'll continue looking through the fine details of the code. There has to be a difference in here that I'm missing. And while it doesn't make for stimulating diary entries, just sitting and sifting through the code seems like an essential task nonetheless.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.

If you've got this far, then not only do you have my respect and thanks, but I'd also urge you to return on the 18th May when I'll be picking things up from where I've left them today.