21 Mar 2024 : Day 192 #
Over the last week or so, while I've been struggling with what's almost certainly a texture or surface related bug, I've received many more suggestions than I expected from other Sailfish OS developers, both from Sailors (Jolla employees) and independent developers. Receiving all of this helpful input is hugely motivational and much appreciated. I'm always keen to try out others' suggestions, not least because I'm very aware of how much amazing knowledge there is out there that it's always beneficial to draw on. The Sailfish OS hivemind is impressively well-informed when it comes to technical matters.

However, these diary entries all form part of a pipeline. There's the pipeline of plans I'm working my way through, on top of which there's also the "editing" pipeline, which means there's a lag of a couple of days between me writing a post and it going online.

So, since I already had a collection of things I wanted to try (checking creation/deletion balance, measuring memory consumption, running the app through valgrind), I've not been in a position to try out some of these suggestions until now; essentially, the other tasks had to work their way through the pipeline first.

Now that I've moved off valgrind and am going back to the source code, this seems like a good time to take a look at some of the suggestions I've received. While on the one hand I want to give each suggestion a fair crack at solving the problem, on the other hand many of the suggestions touch on areas I'm not so familiar with. Consequently I'm going to explain what's happening here, but I may need further input on some of them.

First up, Tone (tortoisedoc) has made a number of useful suggestions recently. The latest is about making use of extra Wayland debugging output:
 
It seems you have exited browser world and are stepping into wayland lands, thats a good sign, problems yes but somewhere else 🙂 you could try the WAYLAND_DEBUG=1 (or "all" iirc) to complete your logs with the wayland surface creation protocol (it might be qt-wayland doesnt support swapchains?). Its a fairly simple protocol.

Thank you for this, Tone. Running with WAYLAND_DEBUG=1 certainly produces a lot of output:
$ WAYLAND_DEBUG=1 harbour-webview 2>debug-harbour-webview-esr91-01.txt
[D] unknown:0 - QML debugging is enabled. Only use this in a safe environment.
[D] main:30 - WebView Example
[D] main:44 - Using default start URL:  "https://www.flypig.co.uk/search/
    "
[D] main:47 - Opening webview
[1522475.832]  -> wl_display@1.get_registry(new id wl_registry@2)
[1522477.064]  -> wl_display@1.sync(new id wl_callback@3)
[1522479.962] wl_display@1.delete_id(3)
[1522480.054] wl_registry@2.global(1, "wl_compositor", 3)
[...]
Having completed the initialisation sequence, the output from the app then settles into a loop containing the following:
[1523976.770] wl_buffer@4278190081.release()
[1523976.951] wl_callback@45.done(5115868)
[1523985.299]  -> wl_surface@20.frame(new id wl_callback@45)
[1523985.422]  -> wl_surface@20.attach(wl_buffer@4278190080, 0, 0)
[1523985.486]  -> wl_surface@20.damage(0, 0, 1080, 2520)
[1523985.557]  -> wl_surface@20.commit()
[1523985.581]  -> wl_display@1.sync(new id wl_callback@44)
[1523995.055] wl_display@1.delete_id(44)
[1524013.656] wl_display@1.delete_id(45)
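For anyone less familiar with the Wayland protocol, this repeating pattern is just the standard frame-callback render loop: request a callback for the next frame, attach the freshly rendered buffer, mark the damaged region and commit. Here's a minimal sketch of that cycle using the wayland-client C API; the names (FrameContext and so on) are purely illustrative, and in the real app all of this happens inside Qt Wayland rather than in application code.
#include <wayland-client.h>

// Illustrative context: the surface being drawn to and the buffer that was
// just rendered into. In the real app Qt Wayland manages these internally.
struct FrameContext {
    struct wl_surface *surface;
    struct wl_buffer *buffer;
};

static void frame_done(void *data, struct wl_callback *callback, uint32_t time);

static const struct wl_callback_listener frame_listener = { frame_done };

// Called when the compositor signals it's ready for the next frame; this is
// the wl_callback@NN.done(...) line in the log.
static void frame_done(void *data, struct wl_callback *callback, uint32_t /*time*/)
{
    auto *ctx = static_cast<FrameContext *>(data);
    wl_callback_destroy(callback);               // later acknowledged by wl_display@1.delete_id(NN)

    // Ask to be notified before the next repaint...
    struct wl_callback *next = wl_surface_frame(ctx->surface);   // -> wl_surface.frame(new id ...)
    wl_callback_add_listener(next, &frame_listener, ctx);

    // ...then hand the freshly rendered buffer over to the compositor.
    wl_surface_attach(ctx->surface, ctx->buffer, 0, 0);          // -> wl_surface.attach(...)
    wl_surface_damage(ctx->surface, 0, 0, 1080, 2520);           // -> wl_surface.damage(...)
    wl_surface_commit(ctx->surface);                             // -> wl_surface.commit()
}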
Periodically there are also touch events that punctuate the output, presumably the result of me scrolling the page.
[1531447.943] qt_touch_extension@10.touch(5123658, 68, 65538, 7950000, 
    16980000, 7367, 6740, 50027, 50027, 255, 0, 0, 196608, array)
[1531448.190] qt_touch_extension@10.touch(5123665, 68, 65538, 7940000, 
    16920000, 7358, 6716, 50027, 50027, 255, 0, 0, 196608, array)
This continues right up until the app seizes up. Working through the file, I don't see any changes towards the end that might explain why things are going wrong, but maybe Tone or another developer with a keener eye and greater expertise than I have can spot something?

Please do feel free to download the output file yourself to take a look. I've also generated a similar file for the working ESR 78 build, which I was hoping might be useful for comparison. On ESR 78 the loop it settles into is similar:
[2484510.845] wl_buffer@4278190080.release()
[2484515.820] wl_display@1.delete_id(44)
[2484515.928] wl_callback@44.done(186889767)
[2484515.967]  -> wl_surface@20.frame(new id wl_callback@44)
[2484516.006]  -> wl_surface@20.attach(wl_buffer@4278190082, 0, 0)
[2484516.062]  -> wl_surface@20.damage(0, 0, 1080, 2520)
[2484516.129]  -> wl_surface@20.commit()
[2484516.152]  -> wl_display@1.sync(new id wl_callback@49)
[2484516.942] wl_display@1.delete_id(49)
There is a slight difference in ordering: one of the delete_id events arrives at a different point in the cycle in the two versions. Could this be significant? It's quite possible these logs are hiding a critical piece of information, but nothing is currently leaping out at me, unfortunately.

A second nice suggestion, this one made by Tomi (tomin) in the same thread, was to check whether the textures aren't being properly released, with the result that the supply of file descriptors eventually becomes exhausted:
 
The issue that @flypig is now trying to figure out sounds a bit like the one I had with Qt Scene Graph recently. I wasn’t properly releasing textures so it kept reserving dmabuf file descriptors and then eventually the app crashed because it run out of fds (or rather I think it tried to use fd > 1024 for something that’s not compatible with such “high” value). Anyway lsof -p output might be worth looking at.

This sounds very plausible indeed. My suspicion has always been that the particular issue I'm experiencing relates to textures/surfaces not being released, but it hadn't occurred to me at all that it might be file-descriptor related, and it certainly wouldn't have occurred to me to try using lsof to list them.
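To make the texture side of this concrete, here's a hedged sketch of the sort of Qt Scene Graph pattern Tomi is describing; the class and renderFrame() are hypothetical and nothing like this appears in the WebView code itself. If a fresh QSGTexture is created on every update and the previous one is never released, each one can hang on to GPU memory and, for dmabuf-backed textures, a file descriptor too.
#include <QImage>
#include <QQuickItem>
#include <QQuickWindow>
#include <QSGSimpleTextureNode>

// Hypothetical item illustrating the leak Tomi describes: a new texture is
// created on every update, so the previous one has to be released somewhere.
class FrameItem : public QQuickItem
{
public:
    FrameItem() { setFlag(ItemHasContents, true); }

protected:
    QSGNode *updatePaintNode(QSGNode *oldNode, UpdatePaintNodeData *) override
    {
        auto *node = static_cast<QSGSimpleTextureNode *>(oldNode);
        if (!node) {
            node = new QSGSimpleTextureNode;
            // Let the node delete textures it is handed; without this, every
            // texture created below is leaked along with any fd backing it.
            node->setOwnsTexture(true);
        }

        // Allocates a brand new GPU texture on each call.
        QSGTexture *texture = window()->createTextureFromImage(renderFrame());
        node->setTexture(texture);      // frees the previous texture (ownsTexture)
        node->setRect(boundingRect());
        return node;
    }

private:
    QImage renderFrame() const;         // hypothetical source of the next frame
};
In the WebView case the textures come from gecko rather than from code like this, but the underlying question is the same: is every texture that gets created also being destroyed?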

Executing lsof while the app is running shows over 400 open file descriptors, but there's nothing too dramatic that I can see there. Again, please feel free to check the output file in case you can spot something of interest. Running the command repeatedly to count the open file descriptors over time shows a steady increase until the number reaches around 440, at which point it stays fairly steady:
$ while true ; do bash -c \
    'lsof -p $(pgrep "harbour-webview") | wc -l' ; \
    sleep 0.5 ; done
[...]
121
280
351
434
435
444
443
443
443
440
440
440
From this it doesn't look like it's a case of file descriptor exhaustion, but I'd be very interested in others' opinions on this.
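For reference, the 1024 figure Tomi mentions corresponds to the classic select()/FD_SETSIZE boundary and the usual default soft limit for open descriptors, so a count settling around 440 looks comfortably below any danger zone. If it ever becomes useful to watch the count from inside the process rather than via lsof, a rough sketch along the following lines would do it (the function name is my own):
#include <cstdio>
#include <filesystem>
#include <iterator>
#include <sys/resource.h>

// Rough sketch: count this process's open file descriptors and compare
// against the soft limit, without shelling out to lsof.
static void reportOpenFds()
{
    // Each entry in /proc/self/fd is one open descriptor belonging to this
    // process (the directory iterator itself briefly adds one to the count).
    std::filesystem::directory_iterator dir("/proc/self/fd");
    const auto count = std::distance(begin(dir), end(dir));

    rlimit limit {};
    getrlimit(RLIMIT_NOFILE, &limit);

    std::printf("open fds: %ld (soft limit %llu)\n",
                static_cast<long>(count),
                static_cast<unsigned long long>(limit.rlim_cur));
}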

I've also received some helpful advice from Raine (rainemak). As it was a private message I don't want to quote it here, but Raine suggests several ways to make better use of the gecko logging capabilities, which I'll definitely be making use of. He also suggested looking at a couple of the patches:
 
  1. rpm/0065-sailfishos-gecko-Prioritize-loading-of-extension-ver.patch
  2. rpm/0047-sailfishos-egl-Drop-swap_buffers_with_damage-extensi.patch


I'll take a careful look at these two and report back.

Finally, I need to reiterate my thanks to Florian Xaver (wosrediinanatour). While I've not had a chance to document in this diary all of the useful points Florian has been making over the last few weeks, I still very much intend to do so.

As I round off, I also want to mention this nice article by Arthur Schiwon (blizzz), which details his impressive attempts to reduce the memory footprint of his Nextcloud Talk app. It's very relevant to some of the discussion here in recent days; thanks to Patrick (pherjung) for pointing it out.

A big thank you for all of the great suggestions, feedback and general encouragement. I'm always happy to get this input and gradually it's all helping to create a clearer picture.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
