Gecko-dev Diary
Between August 2023 and September 2024 I upgrading the Sailfish OS browser from Gecko version ESR 78 to ESR 91, writing a daily blog as I went along. This page catalogues my progress, alongside the other browser-related topics I've looked in to since.
Latest code changes are in the gecko-dev sailfishos-esr91 branch.
There is an index of all posts in case you want to jump to a particular day.
Gecko
5 most recent items
18 Jul 2024 : Day 292 #
The build I started yesterday had completed by the time I got up this morning. I'm still working on avoiding a crash in the hyphenation code with the latest build by adding an early return to the find_hyphen_values() method of the Hyphenator class in mapped_hyph/src/lib.rs.
This time our test page loads and renders without crashing. This tells us two things. First that the partial build wasn't covering the Rust code, as I suspected. Second that the problem is happening in the Rust code itself. This doesn't preclude the possibility that the error is originating outside the Rust code, but it does at least give us something to focus on.
The header of the find_hyphen_values() method mentions a couple of situations in which a panic can be triggered. It's worth taking a look at these since a panic could be responsible for bringing down the browser. One obvious situation in which a panic will be triggered from the Rust code is as a result of the following line, the first line of the method:
So I've made an additional change to check that the hyphenator is invalid when the call to hyphenate a string is made, like this:
Here's the method inside amInstallTrigger.jsm where this error is being generated:
So I'm going to try applying the patch; maybe that will fix this error?
The hyphenation build has gone through and I've copied the packages over to my phone. That means I can now set this change building instead. In the meantime let's return to the hyphenation crash. I've tried a few different arrangements of things now, rebuilding the code and transferring the result over to my phone. What I've discovered is that adding a condition to check the validity of the hyphenator made no difference to the crash; so I've removed that.
However, rather unexpectedly, after converting the assert that tests whether the values return buffer is at least as large as the word string into a condition with an early return, the site now no longer crashes the browser.
Half of me is surprised, the other half isn't. I'm not surprised because this was clearly signposted code designed to cause a panic. On the other hand, I'm surprised because this shouldn't be happening; and also increasing the output buffer size didn't appear to help. So there may be something deeper going on here, but for now I'm going to take the win.
I feel satisfied with these changes today. I've been able to fix a JavaScript error and a crash bug. While I can't claim any credit for finding a solution (Raine, who authored the patch, deserves all of that), it's nonetheless good to be making progress.
Tomorrow I'll need to pick another task from the issue tracker to take a look at. Although, in fact, I think I'll do some housekeeping and try to catch up on some of the work others have been doing. Now that I look at the issue tracker I see some exciting developments there that I really need to examine.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
This time our test page loads and renders without crashing. This tells us two things. First that the partial build wasn't covering the Rust code, as I suspected. Second that the problem is happening in the Rust code itself. This doesn't preclude the possibility that the error is originating outside the Rust code, but it does at least give us something to focus on.
The header of the find_hyphen_values() method mentions a couple of situations in which a panic can be triggered. It's worth taking a look at these since a panic could be responsible for bringing down the browser. One obvious situation in which a panic will be triggered from the Rust code is as a result of the following line, the first line of the method:
assert!(values.len() >= word.len());That's not a debug assert, so it'll apply in the release build as well. I can't just remove it, since although that might stop the crash happening at that point in the code, it'll leave the browser in an unsafe state. The assert is there for a reason after all. So instead I've turned it into a condition with an early return to avoid the possibility of a panic:
$ git diff -U1 diff --git a/third_party/rust/mapped_hyph/src/lib.rs b/third_party/rust/ mapped_hyph/src/lib.rs index 848c93d25790..c205bb09359c 100644 --- a/third_party/rust/mapped_hyph/src/lib.rs +++ b/third_party/rust/mapped_hyph/src/lib.rs @@ -477,3 +477,6 @@ impl Hyphenator<'_> { pub fn find_hyphen_values(&self, word: &str, values: &mut [u8]) -> isize { - assert!(values.len() >= word.len()); + if (values.len() < word.len()) { + return 0; + } + //assert!(values.len() >= word.len()); values.iter_mut().for_each(|x| *x = 0);However, as I read through the code and look again at the backtrace, this looks highly unlikely to be the problem. Although we don't know the exact line, it's the level() method that's at the top of the stack when the crash occurs. So actually we know the crash is happening there. Here's what the code there looks like:
fn level(&self, i: usize) -> Level { let offset = u32::from_le_bytes(*array_ref!(self.0, FILE_HEADER_SIZE + 4 * i, 4)) as usize; let limit = if i == self.num_levels() - 1 { self.0.len() } else { u32::from_le_bytes(*array_ref!(self.0, FILE_HEADER_SIZE + 4 * i + 4, 4)) as usize }; debug_assert!(offset + LEVEL_HEADER_SIZE <= limit && limit <= self.0.len()); debug_assert_eq!(offset & 3, 0); debug_assert_eq!(limit & 3, 0); Level::new(&self.0[offset .. limit]) }It's challenging to understand what might be going wrong here. It looks to me like this is where the requirement for self.0 to be valid (mentioned as the other reason for a potential panic) comes into play. I'm now wondering whether the problem might be an invalid hyphenator file being loaded in, or it being invalid for some other reason.
So I've made an additional change to check that the hyphenator is invalid when the call to hyphenate a string is made, like this:
$ git diff -U1 diff --git a/intl/hyphenation/glue/nsHyphenator.cpp b/intl/hyphenation/glue/ nsHyphenator.cpp index c3b377767e3f..974596b17115 100644 --- a/intl/hyphenation/glue/nsHyphenator.cpp +++ b/intl/hyphenation/glue/nsHyphenator.cpp @@ -348,2 +348,6 @@ nsresult nsHyphenator::Hyphenate(const nsAString& aString, + if (!IsValid()) { + return NS_ERROR_FAILURE; + } + bool inWord = false;I've set it rebuilding with these changes, which will take at least a few hours to complete. In the meantime, I noticed the following error when loading the test page:
JavaScript error: resource://gre/modules/amInstallTrigger.jsm, line 43: TypeError: this.mm is nullThat error seems to be coming from the gecko code itself, rather than some JavaScript loaded by the page, so this is also something it would be worth fixing. I'm sure it's unrelated to the crash, but I may as well look into it while the build completes.
Here's the method inside amInstallTrigger.jsm where this error is being generated:
function RemoteMediator(window) { this._windowID = window.windowGlobalChild.innerWindowId; this.mm = window.docShell.messageManager; this.mm.addWeakMessageListener(MSG_INSTALL_CALLBACK, this); this._lastCallbackID = 0; this._callbacks = new Map(); }It looks likely that the call to get window.docShell.messageManager is returning null. Here's the code from nsDocShell which is supposed to return it:
NS_IMETHODIMP nsDocShell::GetMessageManager(ContentFrameMessageManager** aMessageManager) { RefPtr<ContentFrameMessageManager> mm; if (RefPtr<BrowserChild> browserChild = BrowserChild::GetFrom(this)) { mm = browserChild->GetMessageManager(); } else if (nsPIDOMWindowOuter* win = GetWindow()) { mm = win->GetMessageManager(); } mm.forget(aMessageManager); return NS_OK; }So the error goes up, potentially, to EmbedLite code. It looks like this might be related to patch 0043, which patches this method and which hasn't get been applied. The patch has the title "Get ContentFrameMessageManager via nsIDocShellTreeOwner" with the following description:
nsDocShellTreeOwner has a reference to WebBrowserChrome which in turn can return ContentFrameMessageManager from BrowserChildHelper.
So I'm going to try applying the patch; maybe that will fix this error?
$ git am --3way ../rpm/ 0043-sailfishos-docshell-Get-ContentFrameMessageManager-v.patch Applying: Get ContentFrameMessageManager via nsIDocShellTreeOwner. JB#55336 OMP#55336Nice: the patch applies cleanly! Now the GetMessageManager() method looks like this:
NS_IMETHODIMP nsDocShell::GetMessageManager(ContentFrameMessageManager** aMessageManager) { RefPtr<ContentFrameMessageManager> mm; nsCOMPtr<nsIBrowserChild> bc; if (mTreeOwner) { bc = do_GetInterface(mTreeOwner); } if (bc) { bc->GetMessageManager(getter_AddRefs(mm)); } else if (RefPtr<BrowserChild> browserChild = BrowserChild::GetFrom(this)) { mm = browserChild->GetMessageManager(); } else if (nsPIDOMWindowOuter* win = GetWindow()) { mm = win->GetMessageManager(); } mm.forget(aMessageManager); return NS_OK; }As we can see, an extra clause has been added to the condition, to catch the case where the mTreeOwner variable has an nsIBrowserChild XPCOM interface. If it does then the message manager will be extracted from there.
The hyphenation build has gone through and I've copied the packages over to my phone. That means I can now set this change building instead. In the meantime let's return to the hyphenation crash. I've tried a few different arrangements of things now, rebuilding the code and transferring the result over to my phone. What I've discovered is that adding a condition to check the validity of the hyphenator made no difference to the crash; so I've removed that.
However, rather unexpectedly, after converting the assert that tests whether the values return buffer is at least as large as the word string into a condition with an early return, the site now no longer crashes the browser.
Half of me is surprised, the other half isn't. I'm not surprised because this was clearly signposted code designed to cause a panic. On the other hand, I'm surprised because this shouldn't be happening; and also increasing the output buffer size didn't appear to help. So there may be something deeper going on here, but for now I'm going to take the win.
I feel satisfied with these changes today. I've been able to fix a JavaScript error and a crash bug. While I can't claim any credit for finding a solution (Raine, who authored the patch, deserves all of that), it's nonetheless good to be making progress.
Tomorrow I'll need to pick another task from the issue tracker to take a look at. Although, in fact, I think I'll do some housekeeping and try to catch up on some of the work others have been doing. Now that I look at the issue tracker I see some exciting developments there that I really need to examine.
If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
Comments
Uncover Disqus comments