flypig.co.uk

List items

Items from the current list are shown below.

Gecko

17 Jul 2024 : Day 291 #
Yesterday we were looking at a crash in the hyphenator, triggered by visiting a page on amazon.de. It makes for an interesting bug because it's the first time — that I can recall at least — that we've been hit with an issue related to the Rust code, other than as part of the build pipeline. I'm not really certain what the problem is, but given the comments we saw yesterday, it could be either an overflow in the output buffer, or the block of memory represented by self.0 being invalid. According to the header, either of these could cause a panic.

It looks like debugging the Rust code may turn out to be tricky, so in the first instance I've gone about disabling one of the calls further down the call stack so that the crashing code is never called.

This is a precursor to trying to establish the real solution. At this stage I just want to find out whether this fixes the issue or not. My fake fix is to early return from the nsHyphenator::HyphenateWord() method like this:
   AutoTArray<uint8_t, 200> hyphenValues;
   hyphenValues.SetLength(utf8.Length());
 
+  return;
   int32_t result = mDict.match(
       [&](const void*& ptr) {
         return mapped_hyph_find_hyphen_values_raw(
             static_cast<const uint8_t*>(ptr), mDictSize, utf8.BeginReading(),
             utf8.Length(), hyphenValues.Elements(), hyphenValues.Length());
       },
You may notice that I've made the change to the C++ code rather than the Rust code. That's mostly for convenience because I don't actually know whether the partial build process will notice and rebuild changes in the Rust code. I'm sure I'll find out, but right now I'm just trying to find an answer as quickly as possible.

Other than the panic theory, there are other reasons why the code may be crashing. It could simply be that the stack is being exhausted. I must admit that I'm not convinced by this theory: the backtrace doesn't look corrupted and although the stack is very large (177 frames) it's not uncommon for the gecko stack to get quite large as the DOM is traversed. The change I've made now should at least help shed some light on this: if the crash persists, then the exhausted-stack theory becomes more persuasive.

The partial build completed with this small change and I've copied the library over to my phone. Now will it crash?

No; there's no crash. Presumably hyphenation is now no longer working correctly, but it's hard to tell that just from the way the page is laid out. Nevertheless, it looks like we've pinned down the location of the problem, even if it's not clear why it's happening. I thought I'd also have a go at increasing the size of the buffer for the return values. I've extended it by an additional 1024 bytes:
   AutoTArray<uint8_t, 200> hyphenValues;
-  hyphenValues.SetLength(utf8.Length());
+  hyphenValues.SetLength(utf8.Length() + 1024);
   int32_t result = mDict.match(
       [&](const void*& ptr) {
         return mapped_hyph_find_hyphen_values_raw(
             static_cast<const uint8_t*>(ptr), mDictSize, utf8.BeginReading(),
             utf8.Length(), hyphenValues.Elements(), hyphenValues.Length());
       },
But with this change the crash still occurs. It's possible I didn't choose a large enough amount to add to the buffer, but this seems unlikely to me.

As the next test I've amended the Rust code so that the find_hyphen_values() method in mapped_hyph/src/lib.rs returns early.
     pub fn find_hyphen_values(&self, word: &str, values: &mut [u8]) -> isize {
         assert!(values.len() >= word.len());
         values.iter_mut().for_each(|x| *x = 0);
         let top_level = self.level(0);
         let (lh_min, rh_min, clh_min, crh_min) = top_level.word_boundary_mins(
    );
         if word.len() < lh_min + rh_min {
             return 0;
         }
+        if word.len() >= lh_min + rh_min {
+            return 0;
+        }
This may look a little contrived: the two conditions together are equivalent to a guaranteed return. But I didn't want to just add a return here because the Rust compiler is quite thorough. If it had noticed the guaranteed early return it might have also complained about unused functionality later in the method.

I'm interested to know whether this will be enough to fix the crash. If it is, I can play around with the code a bit more to try to isolate the error. However I'm not even certain whether the partial build is actually rebuilding the Rust code or not. I think not, but this test should help determine this with more certainty.

Having run the partial build and copied over the binary, I find that the crash still happens on the page. That makes me think that the partial build wasn't enough to capture the changes to the Rust code. So my next step will be to do a full rebuild and have a go with the output from that once it's ready in the morning.

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.

Comments

Uncover Disqus comments