flypig.co.uk

List items

Items from the current list are shown below.

Blog

27 Sep 2023 : Day 42 #
Yesterday it felt like we were getting close, and we were. Today the build hit an important milestone: the linking step.

Although the linking step failed, it is the last step of the first stage ("get the build to pass"), in our three stage process ("1. get the build to pass"; "2. get the browser to run and render"; "3. fix up all the browser functionality").

There are a lot of linker errors, all of which are undefined references, which is exactly to be expected at this point.
391:01.21 toolkit/library/build/libxul.so
396:19.99 aarch64-meego-linux-gnu-ld: ${PROJECT}/obj-build-mer-qt-xr/toolkit/
          library/build/../../../gfx/thebes/gfxQtPlatform.o:
          (.data.rel.ro.local._ZTV13gfxQtPlatform[_ZTV13gfxQtPlatform]+0x48):
          undefined reference to `gfxQtPlatform::CreatePlatformFontList()'
396:19.99 aarch64-meego-linux-gnu-ld: ${PROJECT}/obj-build-mer-qt-xr/toolkit/
          library/build/../../../gfx/thebes/gfxQtPlatform.o:
          (.data.rel.ro.local._ZTV13gfxQtPlatform[_ZTV13gfxQtPlatform]+0x198):
          undefined reference to `gfxQtPlatform::UpdateFontList(bool)'
396:19.99 aarch64-meego-linux-gnu-ld: ${PROJECT}/obj-build-mer-qt-xr/toolkit/
          library/build/../../../dom/geolocation/Geolocation.o:
          in function `nsGeolocationService::Init()':
396:20.00 ${PROJECT}/gecko-dev/dom/geolocation/Geolocation.cpp:493:
          (.text._ZN20nsGeolocationService4InitEv+0x7c): undefined reference to
          `QTMLocationProvider::QTMLocationProvider()'
396:20.00 aarch64-meego-linux-gnu-ld: ${PROJECT}/obj-build-mer-qt-xr/toolkit/
          library/build/../../../widget/qt/GfxInfo.o:
          (.data.rel.ro._ZTVN7mozilla6widget7GfxInfoE[_ZTVN7mozilla6widget7GfxInfoE]+0x38):
          undefined reference to `mozilla::widget::GfxInfo::GetEmbeddedInFirefoxReality(bool*)'
396:20.00 aarch64-meego-linux-gnu-ld: ${PROJECT}/obj-build-mer-qt-xr/toolkit/
          library/build/../../../widget/qt/GfxInfo.o:
          (.data.rel.ro._ZTVN7mozilla6widget7GfxInfoE[_ZTVN7mozilla6widget7GfxInfoE]+0x80):
          undefined reference to `mozilla::widget::GfxInfo::GetTestType(nsTSubstring&)'
[...]
396:20.03 aarch64-meego-linux-gnu-ld: libxul.so: hidden symbol
          `_ZN7mozilla9embedlite19EmbedLiteXulAppInfo19GetFissionAutostartEPb'
          isn't defined
396:20.03 aarch64-meego-linux-gnu-ld: final link failed: bad value
396:20.03 collect2: error: ld returned 1 exit status
The job now is to go through and check each of the references to see why it's not implemented and what it should be doing. In most cases the fix is likely to be creating an implementation of a particular method.

I count 35 of them, but some of them are repeated. Most of them are missing methods, although there are also two missing vtables. Each class has a virtual table (or vtable) which is basically a list of pointers to methods. Most methods are turned into pointers (or at least the linkable equivalent) at compile time. However virtual functions can be changed (overridden) at run-time through class inheritance and polymorphism. For example, depending on what you cast your class to at run-time, a different function may be called. The virtual table handles this list of mutable function pointers.

When the error about an undefined vtable reference comes up it's because there's a virtual method that hasn't been defined. The compiler won't necessarily complain about this because when you're creating a library that may be exactly what you want: the references should get resolved by the code you're linking your library to. But if the linker tries to link some code that uses an undefined virtual method, the error will be triggered.

There are also six errors that say "non-virtual thunk to some method or other". A thunk is like a subroutine: a bit of code called like a function but that doesn't necessarily then return to the caller. In relation to the error here, they're being used to deal with multiple inheritance, to choose which method to actually call: the caller calls the thunk, the thunk then passes the call on to the correct method, and it's the method which returns, rather than the thunk. That's my (limited) understanding, at least.

That's all well and good, but I'm not actually certain what the error is referring to here. I do notice that these thunk errors are all referring to methods that appear as straightforward undefined references further up the list, so maybe these errors will get resolved when we fix the earlier ones. We can figure this out as we go along.

Here are the — in effect — 24 errors that need to be fixed in full.
 1. gfxQtPlatform::CreatePlatformFontList()
 2. gfxQtPlatform::UpdateFontList(bool)
 3. QTMLocationProvider::QTMLocationProvider()
 4. mozilla::widget::GfxInfo::GetEmbeddedInFirefoxReality(bool*)
 5. mozilla::widget::GfxInfo::GetTestType(nsTSubstring&)
 6. mozilla::widget::GfxInfo::GetDrmRenderDevice(nsTSubstring&)
 7. nsXPLookAndFeel::NativeGetInt(mozilla::LookAndFeel::IntID, int&)
 8. nsXPLookAndFeel::NativeGetFloat(mozilla::LookAndFeel::FloatID, float&)
 9. vtable for nsAppShell
10. do_GetBasicNativeThemeDoNotUseDirectly()
11. vtable for nsNativeAppSupportQt
12. mozilla::embedlite::BrowserChildHelper::GetChromeOuterWindowID(unsigned long*)
13. mozilla::embedlite::EmbedLiteXulAppInfo::GetFissionAutostart(bool*)
14. mozilla::embedlite::EmbedLiteXulAppInfo::GetFissionExperimentStatus(nsIXULRuntime::ExperimentStatus*)
15. mozilla::embedlite::EmbedLiteXulAppInfo::GetFissionDecisionStatus(nsIXULRuntime::FissionDecisionStatus*)
16. mozilla::embedlite::EmbedLiteXulAppInfo::GetFissionDecisionStatusString(nsTSubstring&)
17. mozilla::embedlite::EmbedLiteXulAppInfo::GetSessionHistoryInParent(bool*)
18. mozilla::embedlite::EmbedLiteXulAppInfo::GetProcessStartupShortcut(nsTSubstring&)
19. non-virtual thunk to mozilla::embedlite::EmbedLiteXulAppInfo::GetFissionAutostart(bool*)
20. non-virtual thunk to mozilla::embedlite::EmbedLiteXulAppInfo::GetFissionExperimentStatus(nsIXULRuntime::ExperimentStatus*)'
21. non-virtual thunk to mozilla::embedlite::EmbedLiteXulAppInfo::GetFissionDecisionStatus(nsIXULRuntime::FissionDecisionStatus*)
22. non-virtual thunk to mozilla::embedlite::EmbedLiteXulAppInfo::GetFissionDecisionStatusString(nsTSubstring&)'
23. non-virtual thunk to mozilla::embedlite::EmbedLiteXulAppInfo::GetSessionHistoryInParent(bool*)'
24. non-virtual thunk to mozilla::embedlite::EmbedLiteXulAppInfo::GetProcessStartupShortcut(nsTSubstring&)
It's also worth noting that it is possible to run the link step using the partial build process, like this:
make -j1 -C obj-build-mer-qt-xr/toolkit/library/build
This makes things much more tractable, but it's still a lengthy process: just this one link step takes over five minutes to execute.
real    5m11.772s
user    3m27.207s
sys     1m44.486s
That's still a lot shorter than running the incremental build though. It's also not clear whether this will recompile the code changes that I'm going to need to make to get the object files to link. We'll also have to find that out as we go along.

So let's start with the first missing reference to gfxQtPlatform::CreatePlatformFontList(). When I look at the code the underlying issue quickly becomes apparent. The header files are all fine, but the implementation of the method in the gfxQtPlatform.cpp looks like this:
CreatePlatformFontList()
{
    return gfxPlatformFontList::Initialize(new gfxFcPlatformFontList);
}
In my rush to get the implementation in I've missed the class prefix to the method name. It should be like this:
gfxQtPlatform::CreatePlatformFontList()
{
    return gfxPlatformFontList::Initialize(new gfxFcPlatformFontList);
}
If I run the partial build command from earlier it doesn't attempt to recompile the source file; it just comes out with the same errors. However, the source file change I made was in the gfx/thebes directory, so maybe if I do a partial rebuild on that directory first it may give a more positive result?
make -j1 -C obj-build-mer-qt-xr/gfx/thebes
make -j1 -C obj-build-mer-qt-xr/toolkit/library/build
Now the first error has switched from being about CreatePlatformFontList() to being about UpdateFontList(bool). Success! So I now have an idea about how to fix the errors, plus a way to get reasonably fast turnaround on testing them.

Time to work through them all. The second is similar. The third is different though. The QTMLocationProvider.h header file is being included in dom/geolocation/Geolocation.cpp. But it's not clear that the implementation QTMLocationProvider.cpp file is even being build or linked in at all. It's referenced in dom/system/qt/moz.build, but it's not clear where that gets referenced.

It's supposed to be referenced in the dom/system/moz.build file, but for some reason that hasn't happened. It was probably an oversight when I applied the 0002 "Bring back Qt layer" patch. This is the change I should have made, but somehow missed:
diff --git a/dom/system/moz.build b/dom/system/moz.build
index 095a5f098bd2..92910886be20 100644
--- a/dom/system/moz.build
+++ b/dom/system/moz.build
@@ -43,7 +43,9 @@ with Files("tests/*1197901*"):
 
 toolkit = CONFIG['MOZ_WIDGET_TOOLKIT']
 
-if toolkit == 'windows':
+if toolkit == 'qt':
+    DIRS += ['qt']
+elif toolkit == 'windows':
     DIRS += ['windows']
 elif toolkit == 'cocoa':
     DIRS += ['mac']
Unfortunately after making this change partial builds refuse to run. I'll need to do a full build again. I'll try to fix as many of the others as I can before I do that, but now I'm doing it without a way to check the changes.

At least the next three seem straightforward. I accidentally placed the implementations for GetEmbeddedInFirefoxReality(bool*), GetTestType(nsTSubstring&) and GetDrmRenderDevice(nsTSubstring&) in a debug-only portion of code. I've moved them to somewhere more sensible!

The NativeGetInt() error is an odd one. In ESR 78 the nsLookAndFeel::NativeGetInt() method called its equivalent parent method nsXPLookAndFeel::NativeGetInt() and only returned its own value if that parent call returned a failure result.

That approach reflected the code used for other toolkits (e.g. Gtk or Android). That seems to have changed as nsXPLookAndFeel::NativeGetInt() has been removed completely. We can non longer call it and have to do all the work ourselves in our Qt version of nsLookAndFeel::NativeGetInt().

I've done my best to align the Qt version with Android (since this seems to be where the previous values were mirrored from; and it kind of makes sense anyway) as found in widget/android/nsLookAndFeel.cpp and in the process have removed the call to nsXPLookAndFeel::NativeGetInt(). I've done the same for NativeGetFloat() as well.

The two vtable errors are clearly related to the MOC generation that I made changes to earlier. I'm going to come back to those.

Next up we have the missing do_GetBasicNativeThemeDoNotUseDirectly() method. This was introduced since ESR 78 which is why we don't have it. It was, in fact, an entire nsNativeBasicThemeQt.cpp file that was added and when I looked at the Gtk implementation of the nsNativeBasicThemeGTK class it all looked a bit complex. So Id hoped there would be some default non-toolkit-specific implementation for the code to fall back on.

Apparently not. The good news is that now, having been forced to look into it properly, I've checked out the Android implementation. It's much simpler as it just falls straight back to the default Android implementation nsNativeBasicThemeAndroid class. We do have an nsNativeBasicThemeQt class, so we can also get our nsNativeBasicThemeQt to use that as well.

It still involves adding a new file and amending the build files to incorporate it. But we can start from the Android base and just need to make a few changes:
$ sed -e "s/Android/Qt/g" widget/android/nsNativeBasicThemeAndroid.cpp \
    > widget/qt/nsNativeBasicThemeQt.cpp 
$ sed -e "s/Android/Qt/g" widget/android/nsNativeBasicThemeAndroid.h \
    > widget/qt/nsNativeBasicThemeQt.h
$ sed -i -e "s/'nsNativeThemeQt.cpp',/'nsNativeThemeQt.cpp',\n    'nsNativeBasicThemeQt.cpp',/g" \
    widget/qt/moz.build
That should do it.

Next we have BrowserChildHelper::GetChromeOuterWindowID(unsigned long*). The upstream implementation is in BrowserChild but for some reason in the EmbedLite code it's always been part of the BrowserChildHelper class. That's fine, but it means we need to implement it there as well.

The reason I didn't do this before is because it doesn't appear directly in the BrowserChildHelper class definition. Instead it's hidden inside the NS_DECL_NSIBROWSERCHILD macro, defined in the autogenerated nsIBrowserChild.h file. Too many layers of indirection.

I can create a dummy implementation easily but there's a bit of a problem in that the upstream implementation of BrowserChild is a subclass of TabContext, which is where the ChromeOuterWindowID() implementation comes from that we need to get the value to return. It's not clear where this is going to come from given our BrowserChildHelper doesn't inherit it (maybe it should?.. No, I don't think so).

Digging more through the code, it's not even really clear that this ID is particularly important. It seems to get used for determining whether the window supports "Protected Media" whatever that is (DRM?) and whether it supports WebVR. I don't think either are things we need to worry about for Sailfish Browser. EmbedLite does have an OuterWindowID value which is used for a variety of things; it's not clear to me whether that needs to be different to the ChromeOuterWindowID.

Given it doesn't look very important I've added a BrowserChildHelper::GetChromeOuterWindowID() implementation that just returns the OuterWindowID. This is one of those things which will probably need testing in practice to get to the bottom of. For the time being, hopefully this should be enough to get things to build.

Now we have five undefined references to methods in the EmbedLiteXulAppInfo class. These are all coming from the nsIXULRuntime interface autogenerated from xpcom/system/nsIXULRuntime.idl, via the NS_DECL_NSIXULRUNTIME macro. We don't yet have any implementations for them so I guess I'll need to create them. The interface file does have some detailed descriptions for each of the related attributes, which should help.

There's not a lot going on in the EmbedLiteXulAppInfo.h header; all of the method signatures are coming from the macros. The NS_DECL_NSIXULRUNTIME macro contains a lot of other method signatures, but the majority of these are already implemented in the EmbedLiteXulAppInfo.cpp source file. The implementation is a lot busier than the header, although most of the implementations are pretty simple.

The remaining implementations we need to add relate to Project Fission, used for process separation. At present this isn't something EmbedLite supports, which will help to simplify our implementation. There's a lot of upstream implementation related to this in the toolkit/xre/nsAppRunner.cpp file that can be referred to.

For the curious, "XRE" stands for "XUL Runtime Environment", which was the precursor to XULRunner. XULRunner is the code that was originally used to handle bootstrapping of XUL applications (Firefox; Thunderbird) and which also happens to be the build path that generates libxulrunner which is what Sailfish Browser uses. XUL stands for "XML User Interface Language" which is the language the Firefox user interface was originally written in. This has now been entirely replaced by a JavaScript interface, but the terminology still lingers in the source code. As it happens ESR 60 was the last version of the Sailfish Browser that contained legacy XUL components (for media controls, if I recall correctly). XML stands for eXtensible Markup Language, but this is as much acronym spelunking as I'm willing to do. There will be bears at the bottom of this particular cavern... if it has a bottom at all.

I've added all of the methods and have them generating "unimplemented" return codes. That should do the trick for now.

The only things remaining are now the two missing vtables.

I'm pretty sure the relevant issues are the following:
commit cccc969f3668b5696bc1cb59b885b9c983a0f4c6
Author: David Llewellyn-Jones 
Date:   Fri Sep 22 22:22:12 2023 +0100

    Remove reference to moc_nsAppShell.cpp
    
    Removes moc_nsAppShell.cpp from the moz.build file to allow the build to
    proceed.

commit 4626f98315b8bf30878a765dfd0de792bbee9e97
Author: David Llewellyn-Jones 
Date:   Thu Sep 21 08:44:37 2023 +0100

    Remove build reference to moc_nsNativeAppSupportQt.cpp
    
    This prevents the build system trying to compile
    moc_nsNativeAppSupportQt.cpp, which doesn't exist.
    
    This change relates to patch 0002 "Bring back Qt Layer" which introduced
    the line, so this change should be merged into that patch.
I've manually reverted these. Maybe this, combined with some other changes I've made elsewhere, will mean these go through now (I'm not convinced, but let's see).

Time to run the build.

Oh, that was a short one. It seems the gecko build system is an alphabetical pedant.
 1:12.91     ['mozbuild.util.UnsortedError: An attempt was made to add an 
             unsorted sequence to a list. The incoming list is unsorted starting
             at element 5. We expected "nsNativeBasicThemeQt.cpp" but got
             "nsNativeThemeQt.cpp"\n']
I have to be more careful. Filenames reordered, let's try that again.

[...]

Sadly the MOC errors are back. There must be something to this that I'm missing.
78:59.83 make[4]: *** No rule to make target 'moc_QTMLocationProvider.cpp',
         needed by 'moc_QTMLocationProvider.o'.  Stop.
78:59.83 make[3]: *** [${PROJECT}/gecko-dev/config/recurse.mk:72:
         dom/system/qt/target-objects] Error 2
78:59.83 make[2]: *** [${PROJECT}/gecko-dev/config/recurse.mk:34: compile] Error 2
It is possible there is something more fundamental going on with the build process. I notice that patch 0015 reverts changes that removed Qt-related rules to the build process. Could it be that this is the problem?

I apply the patch and it goes first time:
$ git am ../rpm/0015-Revert-Bug-1567888-remove-unneeded-QT-related-rules-.patch
Applying: Revert "Bug 1567888 - remove unneeded QT-related rules and configure
  bits; r=nalexa
And kick the build off again. Let's see.

Sadly getting a result from this is going to take until morning. We're getting there, slowly.

It was a bit of a long one today. Well done if you reached this far!

If you want to read even more of my Gecko adventures, you can find them in my Gecko Dev Diary.

Comments

Uncover Disqus comments