Links, Dead Links and Missing Resources
We at Griffmonsters Great Walks make extensive use of the humble HTML anchor element to link to external resources. These links provide additional information and context for each walk, as well as references for facts and historical details mentioned on the site.
However, external websites are often transient by nature. A valuable resource may be available one day, then disappear a month, a year, or a decade later. In the best-case scenario, a 301 HTTP response redirects to an updated URI. In many cases, though, links instead return a 404 response, or worse, a 5XX server error.
This has naturally led to regular checks of linked resources using both online tools and open-source software. Traditionally, this has been a time-consuming and labour-intensive task, usually carried out during the winter months when short daylight hours and poor weather make walking less practical.
This year, however, the process has improved considerably. A custom Python script was created to process a generated list of all pages and validate their links. It revealed that some of the methods used previously had been missing a significant number of broken pages and links.
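To give a flavour of what such a check involves, here is a minimal sketch using only the Python standard library. It is not the script used on this site: the names `LinkExtractor`, `external_links`, `fetch_status` and `check`, the result labels, and the `User-Agent` string are all assumptions made for illustration. It pulls the external links out of a page and sorts each response into a rough category (working, moved, missing, server error, or unreachable).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import urllib.error
import urllib.request


class LinkExtractor(HTMLParser):
    """Collect the href of every anchor element in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the address of the page.
            self.links.append(urljoin(self.base_url, href))


def external_links(html, base_url):
    """Return the unique http(s) links that point away from base_url's host."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for link in parser.links:
        parts = urlparse(link)
        is_web = parts.scheme in ("http", "https")
        if is_web and parts.netloc != host and link not in found:
            found.append(link)
    return found


def fetch_status(url, timeout=20):
    """Return (status, final_url); status is None when no HTTP response arrived."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        # urlopen follows redirects itself, so a 301 shows up as a changed URL.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as error:
        return error.code, url
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return None, url


def check(url, fetch=fetch_status):
    """Sort one link into a rough category (labels are illustrative)."""
    status, final_url = fetch(url)
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "moved" if final_url != url else "ok"
    if status in (404, 410):
        return "missing"
    if status >= 500:
        return "server error"
    return "other"
```

The `fetch` parameter exists so that the classification can be tried out without touching the network. Some servers reject `HEAD` requests, so a real checker would probably want to fall back to `GET` before declaring a link dead.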
In addition to this full-site testing, a batch script is now generated in the build pipeline. The pipeline takes the source data and transforms it into blog HTML while also extracting a list of external URIs. These are then tested using curl, with the results written to a text report. This allows targeted testing whenever a new walk is added or an older page is updated.
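The curl step can be sketched as follows. This is an illustration in Python rather than the generated batch script itself, and the function names, the report layout and the 20-second timeout are assumptions. The curl options are standard ones: `--head` sends a HEAD request, `--location` follows redirects, and `--write-out` prints the final status code and URL.

```python
import os
import subprocess


def curl_command(url, timeout=20):
    """Build a curl invocation that prints '<status> <final url>' for one link."""
    return [
        "curl",
        "--silent",
        "--head",                    # headers only, no body download
        "--location",                # follow 301/302 redirects
        "--max-time", str(timeout),
        "--output", os.devnull,      # discard the headers themselves
        "--write-out", "%{http_code} %{url_effective}",
        url,
    ]


def report_line(url, curl_output):
    """Turn curl's write-out text into one tab-separated line of the report."""
    code, _, final_url = curl_output.strip().partition(" ")
    if code in ("", "000"):
        # curl reports 000 when it never received an HTTP response.
        return f"FAIL\t000\t{url}\t(no HTTP response)"
    flag = "OK" if code.startswith("2") else "FAIL"
    note = f"\t-> {final_url}" if final_url and final_url != url else ""
    return f"{flag}\t{code}\t{url}{note}"


def write_report(urls, path, run=subprocess.run):
    """Test each URL with curl and write the results to a text file."""
    with open(path, "w", encoding="utf-8") as report:
        for url in urls:
            result = run(curl_command(url), capture_output=True, text=True)
            report.write(report_line(url, result.stdout) + "\n")
```

Calling `write_report(["https://example.org/"], "link-report.txt")` would produce one line per link (both the URL and the file name here are placeholders). Because redirects are followed, a moved page appears as `OK` with the new address appended, which makes it easy to spot links worth updating.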
This year’s results have been particularly thorny, with a great deal of time spent finding suitable replacements for missing resources. Fortunately, in many cases, the Archive.org Wayback Machine has been able to provide archived copies of missing pages.
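Looking up archived copies can be partly automated. The sketch below queries the Wayback Machine's public availability endpoint, which returns a small JSON document describing the closest snapshot of a URL. The helper names are hypothetical, the replacements on this site were not necessarily found this way, and the optional timestamp is simply a `YYYYMMDD` hint for which snapshot to prefer.

```python
import json
import urllib.request
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the availability query for a dead link."""
    params = {"url": url}
    if timestamp:
        # Prefer the snapshot closest to this date, e.g. when the walk was written.
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return the archived URL from a decoded response, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url, timestamp=None, timeout=20):
    """Ask the Wayback Machine for an archived copy of url (needs network access)."""
    query = availability_query(url, timestamp)
    with urllib.request.urlopen(query, timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

A result of `None` means no snapshot was reported, in which case a replacement still has to be found by hand. Any archived copy that is returned should be checked by eye before it is linked, since a snapshot may be of an error page rather than the original content.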
That is not always possible. The websites of academic institutions and commercial organisations, especially those hosting PDF documents, are often not archived reliably. In these cases, documents may have moved to a new URI without redirection, been removed entirely, or been superseded by newer material. This is especially problematic when walk notes rely on quoted or referenced material from those sources.
Pubs are another area affected by dead links. In recent years, many have closed permanently or been converted into housing or other uses. The scale of closures is substantial and surprising. In these cases, archived pub websites are often of limited practical value, though they may still hold historical interest. Where relevant, notes have been added to indicate closures, and links to CAMRA pages are included where pub history information is available.
Transport links are similarly transient. Routes are withdrawn, services change, and operators come and go. For anyone planning a walk from the site, the best option is to consult Traveline directly. Transport information on individual walk pages should therefore be treated primarily as historical reference material. More improvements in this area may follow.
In conclusion, the Griffmonsters Great Walks site has now been updated, and most links should be functioning correctly apart from a small number of archived pages that still require manual attention. Rhodes Great Walks is next in line for validation, though it is expected to contain fewer dead links overall.