flypig.co.uk

16 Mar 2024 : Day 187
Yesterday I was struggling. And getting this WebView render pipeline working is starting to feel drawn out. I don't want to make any claims about the larger task, but I'm hoping that a good night's sleep and some time to ponder on the best approach in the shower this morning will have helped me focus on the task I'm struggling with.

And this task is figuring out what, if anything, differs between the execution of TextureClient::Destroy() on ESR 78 and on ESR 91. I'm still labouring under the hypothesis that this method is behind the (hypothetical) memory leak that's causing ESR 91 execution to seize up over time.

The difficulties I experienced yesterday were twofold. First, on ESR 78 the actual section of code that reclaims the allocated memory appeared never to be executed. Second, applying the debugger to ESR 91 gave peculiar results: what appeared to be an infinite loop of calls to Destroy() that never allowed me to step into the method.

To tackle these difficulties I'm going to try two things. First I need to stick a breakpoint inside the conditional code that reclaims the memory on ESR 78, to establish whether it ever gets called. Second I'm going to annotate the ESR 91 code with debug prints. That should allow me to get a better idea of the true execution flow. If the debugger isn't playing by the rules, I'll take my ball somewhere else.

So, first up, checking the ESR 78 flow. The structure of the Destroy() method looks like this on ESR 78:
void TextureClient::Destroy() {
[...]
  RefPtr<TextureChild> actor = mActor;
[...]
  TextureData* data = mData;
  if (!mWorkaroundAnnoyingSharedSurfaceLifetimeIssues) {
    mData = nullptr;
  }

  if (data || actor) {
[...]
    DeallocateTextureClient(params);
  }
}
I'm interested in whether it ever goes inside the condition at the end in order to ultimately call DeallocateTextureClient(). If it doesn't — or if this only happens occasionally, say on shutdown — then I'm likely to be looking in the wrong place.

The reason I've never seen it enter this condition is because data (which is derived from the mData class variable) and actor (which is derived from the mActor class variable) have always been null when entering this method.
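As a rough illustration of why null values skip the deallocation, here's a minimal standalone model of the same control flow. It isn't Gecko code: the types are empty stand-ins, std::shared_ptr takes the place of RefPtr, the gDeallocCalls counter is a hypothetical addition so the behaviour can be checked, the workaround flag is left out, and it assumes the elided code clears mActor after copying it.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Empty stand-ins for the Gecko types; only the control flow matters here.
struct TextureData {};
struct TextureChild {};

struct TextureDeallocParams {
  TextureData* data = nullptr;
  std::shared_ptr<TextureChild> actor;
};

// Hypothetical counter, added purely so the sketch can be checked.
int gDeallocCalls = 0;

void DeallocateTextureClient(TextureDeallocParams params) {
  // The real function does far more; this one records the call and frees
  // the data.
  assert(params.data || params.actor);
  ++gDeallocCalls;
  delete params.data;
}

class TextureClient {
 public:
  TextureClient(TextureData* data, std::shared_ptr<TextureChild> actor)
      : mData(data), mActor(std::move(actor)) {}

  void Destroy() {
    // Take local copies and clear the members (assumed behaviour of the
    // elided code).
    std::shared_ptr<TextureChild> actor = mActor;
    mActor = nullptr;
    TextureData* data = mData;
    mData = nullptr;

    // The deallocation is reached only when something is left to release.
    if (data || actor) {
      TextureDeallocParams params;
      params.data = data;
      params.actor = actor;
      DeallocateTextureClient(params);
    }
  }

 private:
  TextureData* mData;
  std::shared_ptr<TextureChild> mActor;
};
```

In this model, calling Destroy() on a client built with a null data pointer and a null actor leaves gDeallocCalls untouched, while a client holding either one reaches DeallocateTextureClient() exactly once, however many times Destroy() is called afterwards.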

Let's do this then.
(gdb) break TextureClient.cpp:583
Breakpoint 5 at 0x7fb8e9b2fc: file gfx/layers/client/TextureClient.cpp, line 
    585.
(gdb) r
[...]
Thread 7 "GeckoWorkerThre" hit Breakpoint 5, mozilla::layers::
    TextureClient::Destroy (this=this@entry=0x7f8ea53bb0)
    at gfx/layers/client/TextureClient.cpp:585
585         params.allocator = mAllocator;
(gdb) c
Continuing.
[LWP 16174 exited]

Thread 7 "GeckoWorkerThre" hit Breakpoint 5, mozilla::layers::
    TextureClient::Destroy (this=this@entry=0x7f8effa780)
    at gfx/layers/client/TextureClient.cpp:585
585         params.allocator = mAllocator;
(gdb) p mWorkaroundAnnoyingSharedSurfaceLifetimeIssues
$21 = false
(gdb) p data
$22 = (mozilla::layers::TextureData *) 0x7f8dac3a70
(gdb) p actor
$23 = <optimized out>
(gdb) b DeallocateTextureClient
Breakpoint 6 at 0x7fb8e9a908: file gfx/layers/client/TextureClient.cpp, line 
    490.
(gdb) c
Continuing.

Thread 7 "GeckoWorkerThre" hit Breakpoint 6, mozilla::layers::
    DeallocateTextureClient (params=...)
    at gfx/layers/client/TextureClient.cpp:490
490     void DeallocateTextureClient(TextureDeallocParams params) {
(gdb) p params
$24 = {data = 0x7f8dac3a70, actor = {mRawPtr = 0x7f8d648720}, allocator = 
    {mRawPtr = 0x7f8cb40ff8}, clientDeallocation = false, syncDeallocation = 
    false, workAroundSharedSurfaceOwnershipIssue = false}
(gdb) n
491       if (!params.actor && !params.data) {
(gdb) n
324     obj-build-mer-qt-xr/dist/include/nsCOMPtr.h: No such file or directory.
(gdb) n
499       if (params.allocator) {
(gdb) n
500         ipdlThread = params.allocator->GetThread();
(gdb) n
501         if (!ipdlThread) {
(gdb) n
510       if (ipdlThread && !ipdlThread->IsOnCurrentThread()) {
(gdb) n
532       if (!ipdlThread) {
(gdb) n
540       if (!actor) {
(gdb) n
555       actor->Destroy(params);
(gdb) n
497       nsCOMPtr<nsISerialEventTarget> ipdlThread;
(gdb) n
505           return;
(gdb) c
Continuing.
[...]
So that clears things up: it does go inside the condition and it does deallocate the actor. This time the data and actor values are non-null and, because we're already executing on the IPDL thread, the actor is ultimately destroyed directly.
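The path taken through DeallocateTextureClient() in the trace can be summarised with a small standalone model. This is only a sketch, not the Gecko implementation: EventTarget, DeallocPath and ChoosePath are hypothetical names, the branch conditions are the ones visible in the gdb output, and what happens inside the branches that weren't taken (in particular handing the work over to the IPDL thread) is an assumption.

```cpp
#include <thread>

// Hypothetical stand-in for the IPDL thread's event target.
struct EventTarget {
  std::thread::id owner;
  bool IsOnCurrentThread() const {
    return owner == std::this_thread::get_id();
  }
};

// Labels for the possible outcomes; the names are illustrative only.
enum class DeallocPath {
  NothingToDo,         // neither actor nor data is set
  HandedToIpdlThread,  // assumed: the work is re-run on the IPDL thread
  DataOnly,            // assumed: no actor, so only the data is released
  ActorDestroyed       // the path seen in the trace: actor->Destroy(params)
};

// Mirrors the order of the checks stepped through in the debugger.
DeallocPath ChoosePath(bool hasActor, bool hasData,
                       const EventTarget* ipdlThread) {
  if (!hasActor && !hasData) {
    return DeallocPath::NothingToDo;
  }
  if (ipdlThread && !ipdlThread->IsOnCurrentThread()) {
    return DeallocPath::HandedToIpdlThread;
  }
  if (!hasActor) {
    return DeallocPath::DataOnly;
  }
  return DeallocPath::ActorDestroyed;
}
```

In the ESR 78 trace both the actor and the data are set and the call arrives on the IPDL thread itself, so the model returns DeallocPath::ActorDestroyed.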

Let's now try the same thing on ESR 91.
(gdb) break TextureClient.cpp:574
Breakpoint 4 at 0x7ff1148ddc: TextureClient.cpp:574. (2 locations)
(gdb) r
[...]
Thread 7 "GeckoWorkerThre" hit Breakpoint 4, mozilla::layers::
    TextureClient::Destroy (this=this@entry=0x7fc5612680)
    at gfx/layers/client/TextureClient.cpp:587
587         if (actor) {
(gdb) c
Continuing.

Thread 7 "GeckoWorkerThre" hit Breakpoint 4, mozilla::layers::
    TextureClient::Destroy (this=this@entry=0x7fc55b8690)
    at gfx/layers/client/TextureClient.cpp:587
587         if (actor) {
(gdb) c
Continuing.

Thread 7 "GeckoWorkerThre" hit Breakpoint 4, mozilla::layers::
    TextureClient::Destroy (this=this@entry=0x7fc55b6c10)
    at gfx/layers/client/TextureClient.cpp:587
587         if (actor) {
(gdb) p data
$12 = (mozilla::layers::TextureData *) 0x7fc4d79e80
(gdb) p actor
$13 = <optimized out>
(gdb) b DeallocateTextureClient
Breakpoint 5 at 0x7ff1148394: file gfx/layers/client/TextureClient.cpp, line 
    489.
(gdb) c
Continuing.

Thread 7 "GeckoWorkerThre" hit Breakpoint 5, mozilla::layers::
    DeallocateTextureClient (params=...)
    at gfx/layers/client/TextureClient.cpp:489
489     void DeallocateTextureClient(TextureDeallocParams params) {
(gdb) p params
$14 = {data = 0x7fc4d79e80, actor = {mRawPtr = 0x7fc59e6620}, allocator = 
    {mRawPtr = 0x7fc46684e0}, clientDeallocation = false, syncDeallocation = 
    false}
(gdb) n
490       if (!params.actor && !params.data) {
(gdb) n
496       nsCOMPtr<nsISerialEventTarget> ipdlThread;
(gdb) n
[New LWP 5954]
498       if (params.allocator) {
(gdb) n
499         ipdlThread = params.allocator->GetThread();
(gdb) n
[LWP 5954 exited]
[New LWP 6110]
500         if (!ipdlThread) {
(gdb) n
509       if (ipdlThread && !ipdlThread->IsOnCurrentThread()) {
(gdb) n
[LWP 6110 exited]
867     ${PROJECT}/obj-build-mer-qt-xr/dist/include/nsCOMPtr.h: No such file or 
    directory.
(gdb) n
539       if (!actor) {
(gdb) n
548       actor->Destroy(params);
(gdb) n
496       nsCOMPtr<nsISerialEventTarget> ipdlThread;
(gdb) n
mozilla::layers::TextureClient::Destroy (this=this@entry=0x7fc55b6c10)
    at gfx/layers/client/TextureClient.cpp:591
591         DeallocateTextureClient(params);
(gdb) c
Continuing.
[...]
Here we see something similar. The inner condition is entered and the DeallocateTextureClient() method is ultimately called on the same thread. The data and actor values are both non-null.

To return to our original questions, I think this has answered both of them. First, we can see that on ESR 78 this is definitely a place where memory is actually being freed. On the other hand, we also see it being freed on ESR 91. That doesn't mean there isn't a problem here, but it does make it less likely.

Nevertheless there has been a change to this code. The mWorkaroundAnnoyingSharedSurfaceLifetimeIssues flag was removed by upstream. It's possible that this is causing the issue we're experiencing, so I'm going to reverse this and reinsert the removed code. I'm not really expecting this to fix things, but having travelled out into the sticks I now need to check under every stone. I've no choice but to figure this thing out if the WebView is going to get back up and running again.

[...]

Having worked carefully through the code and reintroduced the mWorkaroundAnnoyingSharedSurfaceLifetimeIssues variable and its associated logic, it's disappointing to find it hasn't fixed the issue. I'm not out of ideas yet though. Tomorrow I'm going to have a go at profiling the application and using specific memory tools (e.g. valgrind) to try to figure out which memory is being allocated but not deallocated. I don't hold out much hope of success using valgrind, given that gecko is so big and messy and suffering from leakage as it is, but you never know.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
