
Create acceptance tests for IABot
Closed, Resolved | Public | 3 Estimated Story Points

Description

In preparation for T140377, we should come up with a suite of test cases for InternetArchiveBot. This would basically be a long list of WikiText strings that include various citation templates, raw external links, and various well-formed and malformed versions thereof.

Event Timeline

DannyH set the point value for this task to 3. (Jul 14 2016, 5:34 PM)
DannyH moved this task from New & TBD Tickets to Up Next (June 3-21) on the Community-Tech board.
kaldari renamed this task from Create a test suite for IABot to Create a WikiText parsing test suite for IABot. (Jul 21 2016, 7:46 PM)
kaldari added a subscriber: Cyberpower678.

Added some very basic acceptance tests here:
https://github.com/wikimedia/Cyberbot_II/commit/d756c30466948fbf3ffe9f33a073de0628a97adf

We need to write a script to run these tests.
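A minimal sketch of what such a runner could look like, written in Python for brevity rather than in the language of the bot's repository. `Case`, `run_cases`, and `stand_in_bot` are hypothetical names used only for illustration; the stand-in function does not reproduce IABot's real behaviour, it only shows how source and expected WikiText would be compared.

```python
from typing import Callable, List, NamedTuple


class Case(NamedTuple):
    name: str      # short label for the scenario
    source: str    # WikiText handed to the bot
    expected: str  # WikiText the bot is expected to produce


def run_cases(cases: List[Case], process: Callable[[str], str]) -> List[str]:
    """Return the names of all cases whose output differs from the expectation."""
    return [case.name for case in cases if process(case.source) != case.expected]


def stand_in_bot(wikitext: str) -> str:
    # Placeholder only: the real tests would run IABot against a wiki page.
    return wikitext.replace(" {{dead link}}", "")


cases = [
    Case("no links", "plain text", "plain text"),
    Case("tagged raw link", "[http://example.com] {{dead link}}", "[http://example.com]"),
    Case("deliberately failing", "plain text", "different text"),
]

failures = run_cases(cases, stand_in_bot)
print(failures or "all cases passed")
```

Passing the bot in as a function keeps the comparison logic separate from however the bot is actually invoked.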

MusikAnimal renamed this task from Create a WikiText parsing test suite for IABot to Create acceptance tests for IABot. (Sep 1 2016, 9:04 PM)

Pull request at https://github.com/cyberpower678/Cyberbot_II/pull/32/files

This is a work in progress in the sense that we need better test cases. I wasn't able to get the bot to run without {{dead link}} templates, so there's that, and we should go through previously reported bugs and try to add test cases for those.

In order for the acceptance tests to work, you also need to have your deadlink.config.local.inc.php set up properly: set $debug to true, $debugPage to the appropriate page (see the comment above testPages()), and $debugStyle to 'test'. $testMode, despite the name, should be false. I will document this.

With the most recent version of IABot, it's (properly) updating the date parameter of the {{wayback}} template, which comes from the Wayback API. This means we need some complex system of regex to ensure these are not compared in the acceptance tests. We should triage its importance, or perhaps omit testing of wayback templates.

@MusikAnimal: Don't the citation templates have a similar issue with the archivedate parameter?

I think the archivedate is the date when the archive was made, so if we are given an accessdate (when the link was originally accessed) the archive date should be consistent. Even when no accessdate is given, I'm seeing the same archive date being chosen. For the "statistics.sk" examples it's going with an archive from 2007. Perhaps that was the most recent archive available; I'm not sure what the logic is. But the good news is we just want to test the parser, so if we know certain URLs always have the same archive date we can use those and fiddle with the syntax to cover the scenarios we want.

OK, I think we can live without testing the Wayback templates for now.

I ended up stripping out dates from the source and result, which effectively gets around the issue of them being different for each run. It's true some dates should be retained (accessdate for instance) and we are unable to test that, but overall we still have pretty good coverage. I've gone through all the bug reports on @Cyberpower678's talk page from July onward and have made test cases for them.
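For illustration, stripping dates before comparison could look roughly like the sketch below. It is written in Python and is not the code from the pull request. The function name `strip_dates` is hypothetical, and removing the whole parameter (rather than only its value) for `archivedate`, `accessdate`, and `date` is an assumption based on the parameters mentioned in this thread.

```python
import re

# Assumption: these are the date-bearing parameters worth ignoring.
# Matches "|name=value" up to the next pipe or closing brace.
DATE_PARAM = re.compile(
    r"\|\s*(?:archivedate|accessdate|date)\s*=[^|}]*",
    re.IGNORECASE,
)


def strip_dates(wikitext: str) -> str:
    """Drop date parameters so two bot runs can be compared reliably."""
    return DATE_PARAM.sub("", wikitext)


run_one = "{{cite web|url=http://example.com|archivedate=2007-01-01}}"
run_two = "{{cite web|url=http://example.com|archivedate=2016-09-01}}"

# Both runs normalize to the same string, so the comparison no longer
# depends on which archive date the Wayback API returned.
print(strip_dates(run_one) == strip_dates(run_two))
```

The trade-off is the one noted above: dates that should be preserved, such as accessdate, are no longer checked.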

PR ready at https://github.com/cyberpower678/Cyberbot_II/pull/32

Code has been merged.