flypig.co.uk

Blog

13 Jan 2024 : Day 137
I'm still trying to get myself a copy of the DuckDuckGo website. To recap the latest situation: I want a copy that I can serve from my personal server, but which triggers the same errors as accessing the real DuckDuckGo website using ESR 91.

I feel confident that yesterday I got myself a full verbatim copy of the site. The catch is that it's woven into the logging output from ESR 91. My task today is to disentangle it.

The log file is 1.3 MiB of data. That's not a crazy quantity, but it would be easier to work with if the text files used line-wrapping, rather than including massively long lines that seemingly go on for ever. My text editor really doesn't like having to deal with them and just hangs for tens of minutes at a time.
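As an aside, a log with very long lines can be inspected without opening it in an editor at all. The following is a minimal sketch, not the method used in this entry: the helper name longest_lines is hypothetical, and the demonstration writes its own throwaway file rather than reading the real log.

```python
import os
import tempfile

def longest_lines(path, top=3):
    """Return (length_in_bytes, line_number) pairs for the longest lines."""
    lengths = []
    with open(path, "rb") as log:  # bytes mode, so no decoding is needed
        for number, line in enumerate(log, start=1):
            lengths.append((len(line.rstrip(b"\r\n")), number))
    return sorted(lengths, reverse=True)[:top]

# Small self-contained demonstration using a throwaway file.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as demo:
    demo.write(b"short\n" + b"x" * 5000 + b"\n" + b"medium line\n")
print(longest_lines(demo.name))
os.remove(demo.name)
```

Knowing which line numbers hold the oversized lines makes it possible to extract just those lines for separate handling.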

[...]

Nevertheless, and although it took an age, I have managed to get it all done. The file structure is very similar to the one I showed yesterday:
$ tree ddg8/
ddg8/
├── assets
│   ├── logo_homepage.alt.v109.svg
│   ├── logo_homepage.normal.v109.svg
│   └── onboarding
│       ├── arrow.svg
│       └── bathroomguy
│           ├── 1-monster-v2--no-animation.svg
│           ├── 2-ghost-v2.svg
│           ├── 3-bathtub-v2--no-animation.svg
│           ├── 4-alpinist-v2.svg
│           └── teaser-2@2x.png
├── dist
│   ├── b.9e45618547aaad15b744.js
│   ├── d.01ff355796b8725c8dad.js
│   ├── h.2d6522d4f29f5b108aed.js
│   ├── lib
│   │   └── l.656ceb337d61e6c36064.js
│   ├── o.2988a52fdfb14b7eff16.css
│   ├── p.f5b58579149e7488209f.js
│   ├── s.b49dcfb5899df4f917ee.css
│   ├── ti.b07012e30f6971ff71d3.js
│   ├── tl.3db2557c9f124f3ebf92.js
│   └── util
│       └── u.a3c3a6d4d7bf9244744d.js
├── font
│   ├── ProximaNova-ExtraBold-webfont.woff2
│   ├── ProximaNova-Reg-webfont.woff2
│   └── ProximaNova-Sbold-webfont.woff2
├── index.html
├── locale
│   └── en_GB
│       └── duckduckgo85.js
└── post3.html

9 directories, 24 files
And not just that, but in fact the contents are very similar overall:
$ diff -q ddg ddg8/
Only in ddg: 3.html
Common subdirectories: ddg/assets and ddg8/assets
Common subdirectories: ddg/dist and ddg8/dist
Common subdirectories: ddg/font and ddg8/font
Files ddg/index.html and ddg8/index.html differ
Common subdirectories: ddg/locale and ddg8/locale
Only in ddg8/: post3.html
Although the index.html file is quite different to the equivalent one I downloaded earlier, it is similar to a previous one that was downloaded using the python script:
$ diff ddg5/index.html ddg8/index.html 
2,7c2,7
< <!--[if IEMobile 7 ]> <html lang="en-US" class="no-js iem7"> <![endif]-->
< <!--[if lt IE 7]> <html class="ie6 lt-ie10 lt-ie9 lt-ie8 lt-ie7 no-js"
  lang="en-US"> <![endif]-->
< <!--[if IE 7]>    <html class="ie7 lt-ie10 lt-ie9 lt-ie8 no-js" lang="en-US">
  <![endif]-->
< <!--[if IE 8]>    <html class="ie8 lt-ie10 lt-ie9 no-js" lang="en-US">
  <![endif]-->
< <!--[if IE 9]>    <html class="ie9 lt-ie10 no-js" lang="en-US"> <![endif]-->
< <!--[if (gte IE 9)|(gt IEMobile 7)|!(IEMobile)|!(IE)]><!--><html class="no-js"
  lang="en-US" data-ntp-features="tracker-stats-widget:off"><!--<![endif]-->
---
> <!--[if IEMobile 7 ]> <html lang="en-GB" class="no-js iem7"> <![endif]-->
> <!--[if lt IE 7]> <html class="ie6 lt-ie10 lt-ie9 lt-ie8 lt-ie7 no-js"
  lang="en-GB"> <![endif]-->
> <!--[if IE 7]>    <html class="ie7 lt-ie10 lt-ie9 lt-ie8 no-js" lang="en-GB">
  <![endif]-->
> <!--[if IE 8]>    <html class="ie8 lt-ie10 lt-ie9 no-js" lang="en-GB">
  <![endif]-->
> <!--[if IE 9]>    <html class="ie9 lt-ie10 no-js" lang="en-GB"> <![endif]-->
> <!--[if (gte IE 9)|(gt IEMobile 7)|!(IEMobile)|!(IE)]><!--><html class="no-js"
  lang="en-GB" data-ntp-features="tracker-stats-widget:off"><!--<![endif]-->
48,49c48,49
<       <title>DuckDuckGo — Privacy, simplified.</title>
< <meta property="og:title" content="DuckDuckGo — Privacy, simplified." />
---
>       <title>DuckDuckGo â Privacy, simplified.</title>
> <meta property="og:title" content="DuckDuckGo â Privacy, simplified." />
64c64
< <script type="text/javascript" src="/locale/en_US/duckduckgo14.js"
  onerror="handleScriptError(this)"></script>
---
> <script type="text/javascript" src="/locale/en_GB/duckduckgo85.js"
  onerror="handleScriptError(this)"></script>
107c107
<                                               <!-- en_US All Settings -->
---
>                                               <!-- en_GB All Settings -->
146a147,148
> 
> 
As you can see, the only real differences are the switch from en-US to en-GB, a one-character difference in the title of the page, and the name of the locale file.
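The garbled character in the title of the second file looks like an encoding mismatch, although that is an assumption rather than something established here. One plausible mechanism is UTF-8 bytes being read back as Latin-1, which this small sketch reproduces:

```python
# The original title, with the em dash written as an escape (\u2014).
title = "DuckDuckGo \u2014 Privacy, simplified."

# Encode as UTF-8, then (wrongly) decode the same bytes as Latin-1.
# The three bytes of the em dash become three separate characters:
# a visible "\u00e2" followed by two invisible control characters.
garbled = title.encode("utf-8").decode("latin-1")

print(repr(garbled))
```

If this is what happened, the text itself is intact and only the decoding step differs between the two copies.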

The result is also the same when viewing the page with either ESR 78 or the desktop browser: just a blank page.

Once again we find ourselves in an unsatisfactory position. But I will persevere and we will get to the bottom of this!

The next step is to check the network output from opening the page in the browser. And there's something important in there! There are many entries that look a bit like this:
[ Request details ------------------------------------------- ]
    Request: GET status: 404 Not Found
    URL: https://www.flypig.co.uk/dist/o.2988a52fdfb14b7eff16.css
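With many entries like this, it can help to list the failing URLs automatically. The sketch below assumes the log always follows the three-line layout of the excerpt above (a "Request:" line followed by a "URL:" line); the real output may vary, and the function name failed_requests is hypothetical.

```python
import re

SAMPLE_LOG = """\
[ Request details ------------------------------------------- ]
    Request: GET status: 404 Not Found
    URL: https://www.flypig.co.uk/dist/o.2988a52fdfb14b7eff16.css
"""

REQUEST_LINE = re.compile(r"^\s*Request: (\S+) status: (\d{3})")
URL_LINE = re.compile(r"^\s*URL: (\S+)")

def failed_requests(log_text, status="404"):
    """Return the URL of every logged request with the given status."""
    urls = []
    current_status = None
    for line in log_text.splitlines():
        request = REQUEST_LINE.match(line)
        if request:
            # Remember the status until the matching URL line appears.
            current_status = request.group(2)
            continue
        url = URL_LINE.match(line)
        if url:
            if current_status == status:
                urls.append(url.group(1))
            current_status = None
    return urls

print(failed_requests(SAMPLE_LOG))
```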
And now the penny drops: the page is expecting to be in the root of the domain. So while the location where it's expecting to find the file is this:
https://www.flypig.co.uk/dist/o.2988a52fdfb14b7eff16.css
I've instead been storing the file in this location:
https://www.flypig.co.uk/tests/ddg8/dist/o.2988a52fdfb14b7eff16.css
Checking inside the index.html file, the reason is clear. The paths are being given as absolute paths from the root of the domain, with a preceding slash, like this:
<link rel="stylesheet" href="/dist/o.2988a52fdfb14b7eff16.css" type="text/css">
That / in front of the dist is causing all the trouble. It frustrates me that I didn't notice this before. But at least now I have something clear to fix. That'll be my task for tomorrow. Thankfully it should be really easy to fix. I feel a bit silly now.
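To make the effect of that leading slash concrete, here is a minimal sketch of how the two forms of reference resolve, plus one possible way to rewrite them. It assumes the page is served from /tests/ddg8/index.html (inferred from the URLs above), and the helpers resolve and make_relative are hypothetical. It isn't necessarily the fix this diary goes on to use.

```python
import re
from urllib.parse import urljoin

# Assumed location of the test copy, inferred from the URLs quoted above.
PAGE_URL = "https://www.flypig.co.uk/tests/ddg8/index.html"

def resolve(reference):
    """Resolve a reference against PAGE_URL, as a browser would."""
    return urljoin(PAGE_URL, reference)

# Matches href="/..." or src="/...", but not protocol-relative "//host/...".
ROOT_RELATIVE = re.compile(r'\b(href|src)="/(?!/)')

def make_relative(html):
    """Drop the leading slash so references resolve against the page's folder."""
    return ROOT_RELATIVE.sub(r'\1="', html)

link = '<link rel="stylesheet" href="/dist/o.2988a52fdfb14b7eff16.css" type="text/css">'

print(resolve("/dist/o.2988a52fdfb14b7eff16.css"))  # resolves to the domain root
print(resolve("dist/o.2988a52fdfb14b7eff16.css"))   # resolves under /tests/ddg8/
print(make_relative(link))
```

A rewrite like this only touches href and src attributes in the HTML. Any root-relative paths inside the JavaScript or CSS files would need handling separately.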

If you'd like to read any of my other gecko diary entries, they're all available on my Gecko-dev Diary page.
