Add articles with url-reserved characters to Cassandra testing data
Closed, Resolved · Public

Description

In order to test proper handling of wiki articles with url-reserved characters in their titles, we need at least one such article in the Cassandra test data. We currently have no such data.

This blocks completion of T299735: AQS 2.0: Pageviews: Implement Integration Tests, specifically implementation of the test "character encoding in article titles is happily handled (should handle per article queries with encoded characters)".

If it is helpful, here is a list of enwiki articles with slashes in their title.

It would be good to test more than one encoded character. It is not necessary to test every possibility, as that would require a seriously pathological article title. For comparison, the existing production AQS service uses an article with the (decoded) title "dash - space : colon % percent / slash". No such article actually exists: the existing service does not use production data for testing and instead inserts test data directly from within the tests. We do not necessarily need testing data for an article of that same name, but coverage of the same set of special characters is desirable.
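To make the character coverage concrete, here is a minimal Python sketch of how that decoded title looks once percent-encoded for use as a URL path segment. It uses only the standard library; the helper name `encode_title` is made up for illustration, and whether AQS expects spaces as `%20` or as underscores is not stated in this task, so treat the exact encoding as an assumption.

```python
from urllib.parse import quote, unquote

# Decoded title used by the existing production AQS tests (from the task description).
TITLE = "dash - space : colon % percent / slash"


def encode_title(title: str) -> str:
    """Percent-encode a title for use as a single URL path segment.

    safe="" makes quote() encode "/" as well, which it leaves alone by default.
    """
    return quote(title, safe="")


encoded = encode_title(TITLE)
print(encoded)

# Decoding should give back the original title unchanged.
print(unquote(encoded) == TITLE)
```

With `safe=""`, the space, colon, percent sign and slash are all encoded, while the dash is left as-is because it is an unreserved character.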

There are several possibilities for how to get sufficient test data into the dataset. These include:

  1. Find one or more articles with special characters in the title(s) on enwiki and pull data using the same script that was used to pull the current test data
  2. Mock up test data by hand in the same format as the existing test data pulled from enwiki
  3. Pull test data for some page in enwiki (whatever page, doesn't matter) then manually switch the title to a made-up title with special characters in it.
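As a rough illustration of option 3, the following Python sketch swaps the title in already-pulled rows for a made-up one. Everything about the data layout here is an assumption: the task does not describe the format of the existing test data, so the CSV shape, the `article` column name, and the sample row are hypothetical placeholders rather than real pageview data.

```python
import csv
import io

# Hypothetical column name; the real test data layout may differ.
TITLE_COLUMN = "article"


def retitle_rows(csv_text: str, new_title: str) -> str:
    """Return csv_text with every value in TITLE_COLUMN replaced by new_title."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[TITLE_COLUMN] = new_title
        writer.writerow(row)
    return out.getvalue()


# Made-up sample row, for illustration only.
sample = "project,article,timestamp,views\nen.wikipedia,Example,2023010100,42\n"
result = retitle_rows(sample, "dash - space : colon % percent / slash")
print(result)
```

Going through the `csv` module rather than plain string replacement means a made-up title containing commas or quotes would still be written out correctly.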

Regardless of what approach we use, we should use this opportunity to commit the script used to pull production test data into the GitLab repo for the test environment. We should also document the procedure used to pull test data, either within that repo's README, or as a comment within the script.

Completion criteria:

  • test data sufficient for the blocked test is included in data imported into the test env

Originally, this task also had the following completion criteria:

  • script to pull data is committed to the repo
  • instructions for executing the script are documented within the repo

These items were moved to T330512: AQS 2.0: document procedure for adding data to Cassandra testing env, and are no longer necessary here.

Event Timeline

JArguello-WMF raised the priority of this task from Medium to High. (Feb 1 2023, 9:36 PM)
BPirkle moved this task from Estimated/Discussed to Done on the AQS2.0 board.