flypig.co.uk

Gecko

14 Mar 2024 : Day 185
By the time I'd finished writing my diary entry yesterday I was pretty tired; my mind wasn't entirely with it. But it wasn't just the need for sleep that was causing me confusion. I was also confused as to why HasEglImageExtensions() was returning false on ESR 91 while HasExtensions() on ESR 78, which is essentially the same functionality, was returning true.

A good night's sleep hasn't helped answer this question unfortunately. My manual inspection of the code coupled with output from the debugger suggested that HasEglImageExtensions() should have been returning true.

What I'd really like to see is explicit output for each of the items in the condition to figure out which is returning false. The debugger on its own won't be any further help with this as there are too many steps optimised out. But if I expand the code a bit, rebuild and redeploy, then I may be able to get a clearer picture. So at least that's a clear path for today.

The first step is to make some changes to the code. I've added in variables to store return values for each of the four flags, all marked as volatile in the hope this will prevent the compiler from optimising them away. Then I print them all out. In practice I don't really care whether they actually get printed out or not, since my plan is to inspect them using the debugger. But I need to do something with them; printing them out is as good as anything.
static bool HasEglImageExtensions(const GLContextEGL& gl) {
  const auto& egl = *(gl.mEgl);

  volatile bool imagebase = egl.HasKHRImageBase();
  volatile bool tex2D = egl.IsExtensionSupported(
    EGLExtension::KHR_gl_texture_2D_image);
  volatile bool external = gl.IsExtensionSupported(
    GLContext::OES_EGL_image_external);
  volatile bool image = gl.IsExtensionSupported(GLContext::OES_EGL_image);

  printf_stderr("RENDER: egl HasKHRImageBase: %d\n", imagebase);
  printf_stderr("RENDER: egl KHR_gl_texture_2D_image: %d\n", tex2D);
  printf_stderr("RENDER: gl OES_EGL_image_external: %d\n", external);
  printf_stderr("RENDER: gl OES_EGL_image: %d\n", image);

  return egl.HasKHRImageBase() &&
         egl.IsExtensionSupported(EGLExtension::KHR_gl_texture_2D_image) &&
         (gl.IsExtensionSupported(GLContext::OES_EGL_image_external) ||
          gl.IsExtensionSupported(GLContext::OES_EGL_image));
}
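As a standalone illustration of this technique, here's a minimal sketch that can be built outside the Gecko tree. The FakeContext struct, its field names and the CheckHasImageExtensions() helper are all invented for the example and aren't part of Gecko; std::fprintf() stands in for printf_stderr(). One deliberate difference from the code above is that the return value is built from the stored flags rather than by querying again, so the logged values and the result can't disagree.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the EGL and GL extension queries. The field
// names are invented for this sketch and are not Gecko's API.
struct FakeContext {
  bool khrImageBase;
  bool khrTexture2DImage;
  bool oesImageExternal;
  bool oesImage;
};

// Evaluate each term of the compound condition once, store it in a volatile
// local to discourage the compiler from optimising it away, log it, and then
// combine the stored values.
bool HasImageExtensions(const FakeContext& ctx) {
  volatile bool imageBase = ctx.khrImageBase;
  volatile bool tex2D = ctx.khrTexture2DImage;
  volatile bool external = ctx.oesImageExternal;
  volatile bool image = ctx.oesImage;

  std::fprintf(stderr, "RENDER: imageBase: %d\n", static_cast<int>(imageBase));
  std::fprintf(stderr, "RENDER: tex2D: %d\n", static_cast<int>(tex2D));
  std::fprintf(stderr, "RENDER: external: %d\n", static_cast<int>(external));
  std::fprintf(stderr, "RENDER: image: %d\n", static_cast<int>(image));

  return imageBase && tex2D && (external || image);
}

// Small self-check covering one passing and one failing combination.
void CheckHasImageExtensions() {
  assert(HasImageExtensions({true, true, false, true}));
  assert(!HasImageExtensions({true, false, true, true}));
}
```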
Now I've set it building. As I write this I'm on a different mode of transport: travelling by bus. It's surprising that using a laptop on a bus feels far more socially awkward than using a laptop on a train. It's true the ride is more bumpy and the space more cramped, but I still find it odd that I don't see other people doing it. Everyone is on their phones; nobody (apart from me) ever seems to have a laptop out.

On executing the updated code I'm surprised to discover that it does actually output to the console. And the results aren't what I was expecting. Well, not exactly.
RENDER: egl HasKHRImageBase: 1
RENDER: egl KHR_gl_texture_2D_image: 1
RENDER: gl OES_EGL_image_external: 1
RENDER: gl OES_EGL_image: 1
They're all coming back true, so I must have been mistaken about why the SurfaceFactory_EGLImage::Create() method is exiting early. I've therefore annotated the Create() method with some more debug output in the hope this will shed more light on things. See the added printf_stderr() calls in the code below.
  if (HasEglImageExtensions(*gle)) {
    printf_stderr("RENDER: !HasEglImageExtensions()\n");
    return ret;
  }

  MOZ_ALWAYS_TRUE(prodGL->MakeCurrent());
  GLuint prodTex = CreateTextureForOffscreen(prodGL, formats, size);
  if (!prodTex) {
    printf_stderr("RENDER: !prodTex\n");
    return ret;
  }

  EGLClientBuffer buffer =
      reinterpret_cast<EGLClientBuffer>(uintptr_t(prodTex));
  EGLImage image = egl->fCreateImage(context,
                                     LOCAL_EGL_GL_TEXTURE_2D, buffer, nullptr);
  if (!image) {
    prodGL->fDeleteTextures(1, &prodTex);
    printf_stderr("RENDER: !image\n");
    return ret;
  }

  ret.reset(new SharedSurface_EGLImage(prodGL, size, hasAlpha, formats, prodTex,
                                       image));
  printf_stderr("RENDER: returning normally\n");
  return ret;
None of this should be necessary: it should be possible to extract all of this execution flow from the debugger. But for some reason the conclusion I came to from using the debugger doesn't make sense given the values HasEglImageExtensions() is returning. Maybe I made a mistake somewhere. Nevertheless, this approach should hopefully answer the question.

Here's the output I get. And as soon as I see this output I realise the stupid mistake I've made.
RENDER: egl HasKHRImageBase: 1
RENDER: egl KHR_gl_texture_2D_image: 1
RENDER: gl OES_EGL_image_external: 1
RENDER: gl OES_EGL_image: 1
RENDER: !HasEglImageExtensions()
So did you notice the stupid mistake? Here's the relevant ESR 78 code:
  if (!HasExtensions(egl, prodGL)) {
    return ret;
  }
And here — oh dear — is what I replaced it with:
  if (HasEglImageExtensions(*gle)) {
    return ret;
  }
See what's missing? It's the crucial negation of the condition. Oh boy. I can see why I made this mistake: it's because elsewhere in the same file the condition is used — correctly — with the opposite effect, like this:
  if (HasEglImageExtensions(*gle)) {
    ret.reset(new ptrT({prodGL, SharedSurfaceType::Basic,
      layers::TextureType::Unknown, true}, caps, allocator, flags, context));
  }
In one case the method should return early if the extension check fails; in the other case it should reset the returned texture if the extension check succeeds.
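The two patterns can be put side by side in a small standalone sketch. Everything here is hypothetical: the Surface struct, HasRequiredExtensions() and both Create functions are invented stand-ins rather than the Gecko code. The point is only that the guard-clause form needs the negation while the positive form doesn't, and that the two behave identically when written correctly.

```cpp
#include <cassert>
#include <memory>

// Hypothetical surface type, invented for this sketch.
struct Surface {
  int width;
  int height;
};

// Hypothetical predicate standing in for the extension check.
bool HasRequiredExtensions(bool supported) { return supported; }

// Guard-clause form: return early when the check FAILS, so the condition
// must be negated.
std::unique_ptr<Surface> CreateWithGuard(bool supported) {
  std::unique_ptr<Surface> ret;
  if (!HasRequiredExtensions(supported)) {
    return ret;  // Still null: nothing was created.
  }
  ret.reset(new Surface{1080, 2520});
  return ret;
}

// Positive form: do the work only when the check SUCCEEDS, so there is no
// negation.
std::unique_ptr<Surface> CreateWithPositiveCheck(bool supported) {
  std::unique_ptr<Surface> ret;
  if (HasRequiredExtensions(supported)) {
    ret.reset(new Surface{1080, 2520});
  }
  return ret;
}

// Both forms should agree for either outcome of the check.
void CheckBothForms() {
  assert(CreateWithGuard(true) && CreateWithPositiveCheck(true));
  assert(!CreateWithGuard(false) && !CreateWithPositiveCheck(false));
}
```

Dropping the ! from the guard-clause form inverts it: the function would return null exactly when the extensions are available.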

I feel more than a little bit silly. But it's okay, the important point is that it's fixed now. I've added in that crucial missing ! and this should now work as expected:
  if (!HasEglImageExtensions(*gle)) {
    return ret;
  }
I'm not expecting this change to miraculously fix the entire rendering pipeline, but it should certainly help.

On executing the app with this change in place we unfortunately still don't get a render. In fact the app now seems to hog CPU cycles and make my phone unresponsive. I have a feeling this is a memory leak, but a bit more digging will help confirm it (or otherwise).

If this change has triggered a memory leak, it's likely because the surface being created by SurfaceFactory_EGLImage::Create() is never being freed. Creating a new 1080 by 2520 texture each frame will start to eat up memory pretty fast. So an obvious next step is to find out where it's being freed on ESR 78 and establish whether the same thing is happening on ESR 91 or not.
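To put a rough number on "pretty fast", here's a back-of-the-envelope sketch. The 1080 by 2520 size comes from the text above, but the 4 bytes per pixel (8-bit RGBA) and the 60 frames per second are assumptions made for the example, not values read from the device. The function and constant names are likewise invented.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one uncompressed texture of the given size.
std::uint64_t BytesPerTexture(std::uint64_t width, std::uint64_t height,
                              std::uint64_t bytesPerPixel) {
  assert(bytesPerPixel > 0);
  return width * height * bytesPerPixel;
}

// Bytes leaked per second if one texture is allocated every frame and never
// freed.
std::uint64_t LeakedBytesPerSecond(std::uint64_t bytesPerTexture,
                                   std::uint64_t framesPerSecond) {
  return bytesPerTexture * framesPerSecond;
}

// The texture size is taken from the text; the other two are assumptions.
const std::uint64_t kWidth = 1080;
const std::uint64_t kHeight = 2520;
const std::uint64_t kBytesPerPixel = 4;     // Assumption: 8-bit RGBA.
const std::uint64_t kFramesPerSecond = 60;  // Assumption: 60 fps.
```

Under those assumptions BytesPerTexture(kWidth, kHeight, kBytesPerPixel) comes to 10,886,400 bytes, a little over 10 MiB per texture, and leaking one per frame would consume 653,184,000 bytes every second.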

Unfortunately it turns out to be harder to find than I'd expected. There are quite a few methods that are used for deleting textures or memory associated with them. I've tried adding breakpoints to all of the following:
  1. SharedSurface_EGLImage::~SharedSurface_EGLImage()
  2. GLContext::Readback()
  3. GLContext::fDeleteFramebuffers()
  4. GLContext::raw_fDeleteTextures()
  5. SharedSurfaceTextureClient::~SharedSurfaceTextureClient()
And they're either not hit in ESR 78, or they're hit in both ESR 78 and ESR 91. So I've yet to find the smoking gun. I think I've reached my limit for today though, so the investigation will have to continue in the morning.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
