
Did something change in beta cluster configuration around September 16 2017?
Closed, Resolved, Public

Description

TLDR: Did something change in beta cluster configuration around September 16 2017?

RelatedArticles repository has Selenium tests (tests/selenium). They are running:

As reported in T176315, until about September 16 2017 the selenium-RelatedArticles-jessie daily job was able to create pages on the beta cluster. The pages are created via the action API using the nodemw NPM package. Code is in mediawiki/core.

Since about September 16 2017, selenium-RelatedArticles-jessie started failing with 😳

Edit failed: Error: Error returned by API: Wikipedia has restricted the ability to create new pages. You can go back and edit an existing page, or [[Special:UserLogin|log in or create an account]].

We only keep jobs for the last 15 days so there is no record of failures available in Jenkins. 🙁

The workaround for the failure is to log in before creating a page (rMWd1439a3e67467dee3c1993943aa7cca1d7904e9e). Please note that the commit is from October 3 2017.
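In outline, the workaround only reorders the setup step: authenticate first, then create the page. The sketch below is a minimal illustration, assuming a hypothetical promise-based `WikiClient` interface; it is not nodemw's actual API, and `ensureTestPage` and its parameters are made-up names.

```typescript
// Hypothetical promise-based client interface; NOT nodemw's real API.
interface WikiClient {
  logIn(username: string, password: string): Promise<void>;
  edit(title: string, content: string, summary: string): Promise<void>;
}

// Hypothetical setup helper, e.g. called from a Selenium `before` hook.
// Logging in first matters on beta enwiki: anonymous users may edit
// existing pages but may not create new ones, so an anonymous edit only
// succeeds while the page already exists.
async function ensureTestPage(
  client: WikiClient,
  credentials: { username: string; password: string },
  title: string,
  content: string,
): Promise<void> {
  // The edit is not sent until login has completed.
  await client.logIn(credentials.username, credentials.password);
  // A rejection from either step propagates to the caller instead of
  // being swallowed, so the hook fails loudly.
  await client.edit(title, content, "Selenium test setup");
}
```

Because both steps are awaited in order, a failed login stops the setup before any edit is attempted.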

I have talked with @hashar about it and he says:

wmf-config/InitialiseSettings.php
'groupOverrides' => [
    'enwiki' => [
        '*' => [ 'createpage' => false ], // See P2059
        // ...
    ],
    // ...
],

P2059: Anonymous users have been prevented from creating new pages since 2005!

What is strange is that the RelatedArticles tests were not failing until about September 16. I implemented logging in via the API on October 3, so I am pretty sure the tests did not log in before that date. I do not even know where to start looking. Any help is appreciated.

Event Timeline

zeljkofilipin triaged this task as Low priority.
zeljkofilipin renamed this task from "Changes to beta cluster" to "Did something change in beta cluster configuration around September 16 2017?" (Oct 27 2017, 12:34 PM).
zeljkofilipin updated the task description.
zeljkofilipin added a subscriber: hashar.

It would be helpful here to include the names of the pages that the script tries to create, and some examples from when the script worked before September 16.

Perhaps related:

  • 16:47, 21 September 2017 Greg (WMF) (talk | contribs) unblocked Scanner user 0 (talk | contribs) (Not a spammer)
  • 16:40, 21 September 2017 Greg (WMF) (talk | contribs) unblocked Selenium Echo user 2 (talk | contribs) (Not a spammer)
  • 17:20, 19 September 2017 Legoktm (talk | contribs) unblocked Selenium user (talk | contribs) (part of selenium tests)
  • 08:15, 18 September 2017 Sau226 (talk | contribs) blocked Selenium Echo user 2 (talk | contribs) with an expiration time of indefinite (account creation blocked, autoblock disabled, email disabled, cannot edit own talk page) (Spambot)
  • 08:14, 18 September 2017 Sau226 (talk | contribs) blocked Selenium user (talk | contribs) with an expiration time of indefinite (account creation blocked, autoblock disabled, email disabled, cannot edit own talk page) (Spambot)
  • 03:32, 17 September 2017 Sau226BOT (talk | contribs) deleted page Related Articles 1 (Mass delete spam made by Selenium user)

So, if Related Articles 1 was deleted on September 17, that probably means the page already existed before then, so Selenium could edit it without logging in (it wasn't creating the page, since it already existed).

As an aside, the fact it took over a month to notice suggests something about how useful these tests are.

> As an aside, the fact it took over a month to notice suggests something about how useful these tests are.

It did not take a month to notice. T176315 was created on September 20. Failures started on September 16.

> It would be helpful here to include the names of the pages that the script tries to create, and some examples from when the script worked before September 16.

The script tries to create the page Related Articles 1 (code, page).

I am not sure how to provide examples from before September 16, since Jenkins keeps jobs only from the last 15 days. (I thought it was 30.)

> • 03:32, 17 September 2017 Sau226BOT (talk | contribs) deleted page Related Articles 1 (Mass delete spam made by Selenium user)
>
> So, if Related Articles 1 was deleted on September 17, that probably means the page already existed before then, so Selenium could edit it without logging in (it wasn't creating the page, since it already existed).

Thanks, that's probably it.

@Bawolff we noticed the failures within a day. What has taken long is fixing the problem. Not sure how you are jumping to this conclusion!

@zeljkofilipin as I tried (badly) to explain on the last ticket, I believe we might be approaching this incorrectly and asking the wrong question. My belief is that the before step has not been running successfully, and only recently started reporting errors when the edit failed. If we assume that's the case (previously, a failed API edit died silently), the tests would still have passed.

If you look at the edit history of the page, only the first edit matters (further edits are null edits and do not change the content).

..
https://en.m.wikipedia.beta.wmflabs.org/wiki/Special:History/Related_Articles_2

So my theory is that something in the error path changed: either the edit API now returns errors on null edits when it didn't before, our Selenium error handling got better, or the Node library that performs the edit changed its behavior. I'd start by looking there.

If this is the case:

https://phabricator.wikimedia.org/T179157#3715926

We should probably look into why it doesn't fail when the page already exists but the user is set up incorrectly (e.g. editing anonymously), and at least emit a warning in that situation.
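One way to act on this suggestion is to inspect the response of every setup edit instead of discarding it. The TypeScript sketch below assumes the JSON shape of a MediaWiki action=edit response (an `error` object with `code` and `info` on failure; an `edit` object whose `result` is `Success`, with a `new` flag on page creation and a `nochange` flag on null edits). `checkEditResponse` and its `loggedIn` parameter are hypothetical names, not part of nodemw or the RelatedArticles tests. Reporting `nochange` also makes null edits like the ones in the page history visible.

```typescript
// Assumed subset of an action=edit JSON response. Boolean flags such as
// `new` and `nochange` are present-or-absent, so they are tested with `in`.
interface EditResponse {
  error?: { code: string; info: string };
  edit?: { result: string; new?: unknown; nochange?: unknown };
}

type EditOutcome = "created" | "changed" | "nochange";

// Hypothetical helper: classify an edit response, throwing on any error so
// that a test's setup step cannot fail silently, and warning when the edit
// was made anonymously.
function checkEditResponse(
  resp: EditResponse,
  loggedIn: boolean,
  warn: (msg: string) => void = console.warn,
): EditOutcome {
  if (resp.error) {
    throw new Error(`Edit failed: ${resp.error.code}: ${resp.error.info}`);
  }
  const edit = resp.edit;
  if (!edit || edit.result !== "Success") {
    throw new Error(`Edit failed: unexpected response ${JSON.stringify(resp)}`);
  }
  if (!loggedIn) {
    // Anonymous edits only work while the page exists; if it is deleted,
    // recreating it anonymously is refused.
    warn("Edit made anonymously; this will fail if the page has to be created");
  }
  if ("nochange" in edit) return "nochange"; // null edit, content unchanged
  if ("new" in edit) return "created";
  return "changed";
}
```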

enwiki on the beta cluster allows anonymous edits, but not anonymous page creation. So, if the page already existed, the tests would run fine. If the page got deleted, they would start to fail. (Not anymore, since the tests now log in.)
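The distinction can be captured in a few lines. This is a simplified TypeScript model, assuming only the `edit` and `createpage` rights matter and that a user's rights are the union of their groups' rights; the `rights` table is illustrative, not the actual beta cluster configuration, and real MediaWiki also checks blocks, page protection, and other rights.

```typescript
// Simplified, illustrative model of MediaWiki group rights.
type Right = "edit" | "createpage";
type GroupRights = Record<string, Partial<Record<Right, boolean>>>;

// '*' (everyone, including anonymous users) may edit but, per the enwiki
// override, may not create pages; 'user' (logged-in accounts) may do both.
// Illustrative values only, not a copy of the real config.
const rights: GroupRights = {
  "*": { edit: true, createpage: false },
  user: { edit: true, createpage: true },
};

// A right is granted if any of the user's groups grants it.
function hasRight(groups: string[], right: Right): boolean {
  return groups.some((g) => rights[g]?.[right] === true);
}

// Saving needs 'edit'; saving a page that does not exist yet also needs
// 'createpage'.
function canSave(groups: string[], pageExists: boolean): boolean {
  if (!hasRight(groups, "edit")) return false;
  return pageExists || hasRight(groups, "createpage");
}

const anon = ["*"];
const loggedIn = ["*", "user"];
console.log(canSave(anon, true));      // true:  page existed, tests passed
console.log(canSave(anon, false));     // false: page deleted, tests failed
console.log(canSave(loggedIn, false)); // true:  logging in fixes creation
```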

Ah. Thanks for this distinction. Then yes... mystery solved :)

Probably. I will leave the task open until Monday in case somebody else has further ideas. If not, I will resolve the task on Monday.

Thanks @Bawolff and @Jdlrobson, I think the mystery is solved.