Write an automated test to ensure users creating a new account from the editor are redirected back to the editor, not to Welcome survey
Open, Medium, Public

Description

Original problem

Copied from T397193: New accounts created from editor redirect to welcome survey, not back to editor.

Steps to replicate the issue (include links if applicable):
tested on mobile web test wiki

What happens?:

  • after creating the account the system redirects the new account to the welcome survey

What should have happened instead?:

  • the system should redirect the new account back to the article they wanted to edit
Acceptance Criteria
  • write an automated test (or multiple if needed for multiple components) that verifies the problem does not occur (in GrowthExperiments)
  • Ensure the individual components changed in T397193 (VisualEditor, CentralAuth) behave as required, so that the problem cannot recur for the same reason

Event Timeline

Restricted Application added a subscriber: Aklapper.
Urbanecm_WMF added a subscriber: Michael.

Transferring my comment from the original task:

Is our CI setup sufficiently close to production that we can create a meaningful automated test for this?

Not really. Most of production's auth logic (including SUL3, which caused this issue) is in MediaWiki-extensions-CentralAuth. Unfortunately, CentralAuth-related code is both mostly untested and mostly untestable, as CI does not use CentralAuth at all (except in CentralAuth's own tests). This is an intentional decision – we basically have two bad options to pick from:

  1. Installing CentralAuth in CI: This would allow us to cover the authz mechanisms we actually use in production via automated CI-enforced tests. However, it would leave the default authz mechanisms (provided by Core) mostly untestable. In other words, our automated tests would help us ensure our own production does not break, but they would no longer protect third-party MediaWiki installations from random breakages.
  2. Not installing CentralAuth in CI (status quo): This is almost a complete inverse of the first option – it allows us to test the default authz logic, but it makes it nearly impossible to test anything CentralAuth-related, particularly if we want to test integration between CentralAuth and some other extension.

While the status quo is still a bad option, it arguably puts us in a better state than the alternative (installing CentralAuth in CI). This is because we tend to notice issues in our production relatively quickly: we update our installation every week (compared to the "every six months" release cycles for external wikis), we are by far the most used installation of MediaWiki and we have the densest concentration of MediaWiki experts. None of those applies to third parties, and relying on user reports (generally delayed by at least 6 months since the breaking patch) is much worse.

For more details on this, please see T321864 and T333541.


So, now that I outlined why a simplish test is not possible, what can we do? I see several options:

  1. Support multiple versions of configurations in CI: This would allow us to run CI pipelines with both CentralAuth enabled and CentralAuth not enabled, which would mean we would no longer have to choose. It would also allow us to ensure that certain behaviour exists regardless of configuration, which is pretty much impossible to do in the present day.
  2. Create an ad-hoc CI job just for one extension: In CI config, CI jobs for extensions usually come from predefined templates. However, in principle, it should be possible to create a job specific for a given repository. In theory, we should be able to create a job that would run (for example) GrowthExperiments' tests with CentralAuth enabled, allowing us to write a full integration test for this.
  3. Run out-of-CI tests regularly against live wikis: We can run some tests outside of the regular CI pipelines, hitting production (presumably, testwiki) or beta directly, rather than a fresh MW instance as CI does currently. This would help us notice issues that are hard to test for directly. There would be a slight delay (if the test runs daily, we would notice the issue at most a day after it occurred), but it would be simpler to do (for Growth anyway) than changing how CI itself behaves. For what it is worth, Pywikibot took this approach (see https://test.wikipedia.org/wiki/User:Pywikibot-test for related info; GitHub Workflow specifications are available as well).
  4. Test individual subfeatures and not the full integration: If we determine it is not viable to simulate the actual breakage within a CI-executed test, we can attempt to break it into components. For example, it should be trivial to test that CentralAuth passes certain parameters over when redirecting users to auth.wikimedia.org. Similarly, it should be possible to ensure that VisualEditor sets the parameter on the first URL.
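
As a rough illustration of what the component-level check in option 4 boils down to, here is a minimal sketch. It is written in Python purely for illustration (real tests would live in each extension's own test suite), and everything named in it is an assumption: the helper `dropped_params`, the `example` hostnames, and the choice of `returnto` / `returntoquery` as the parameters to track, since this thread does not say which parameter T397193 touched.

```python
from urllib.parse import parse_qs, urlsplit


def dropped_params(first_url, redirect_url, required):
    """Return the required query parameters that did not survive the redirect.

    Hypothetical helper: a parameter counts as dropped when it is present
    on the first URL but missing or changed on the redirect URL.
    """
    before = parse_qs(urlsplit(first_url).query)
    after = parse_qs(urlsplit(redirect_url).query)
    return [
        name
        for name in required
        if name in before and before[name] != after.get(name)
    ]


# Illustrative URLs only; the hosts and parameter values are made up.
first = (
    "https://wiki.example/w/index.php?title=Special:CreateAccount"
    "&returnto=Sandbox&returntoquery=veaction%3Dedit"
)
redirect = (
    "https://auth.example/w/index.php?title=Special:CreateAccount"
    "&returnto=Sandbox"
)

print(dropped_params(first, redirect, ["returnto", "returntoquery"]))
```

A check of this shape only confirms that one hop forwards what it received, which is why it would catch the same cause recurring but not a different one.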

None of those options is a perfect one, and all of them have significant tradeoffs. Here is a quick summary of them:

  1. Support multiple versions of configurations in CI: In terms of benefits, this is the clear winner. However, it would require significant investments in the CI pipeline, and it is not something we can do currently. It is extremely likely to break a ton of tests, including both actual failures (that we did not notice) and behaviours that are not necessary in certain cases (for example, we expect most of core's auth logic to do nothing if CentralAuth is present). It would have to be a long-term project, not something an engineer or two can carry out in their spare time.
  2. Create an ad-hoc CI job just for one extension: Assuming we can create a pipeline that would fulfil our needs in terms of enabled extensions, we would need to convince RelEng to add an ad hoc bit to an otherwise standardized set of pipelines. This is likely to cause issues each time a CI upgrade is made, and it would make GrowthExperiments a very special extension from a CI perspective in general, which is likely not warranted. Wikibase is a present-day example of this (given the Wikibase repository contains more than one extension), and the troubles associated with the setup there are the primary reason why we have CommunityConfiguration and CommunityConfigurationExample as fully separate extensions (in fact, CI admin docs explicitly say to check that CI works on both core and Wikibase when doing upgrades). While this solution would allow us to solve what we need to, I think it would be declined by RelEng (and probably rightfully so).
  3. Run out-of-CI tests regularly against live wikis: This is likely the simplest solution for the Growth team that would give us a full integration test. It would require us to host the special tests somewhere and maintain the infrastructure for them (not only the tests themselves, but also the how-to-run scripts/config files). GitLab allows us to get a repo with CI enabled relatively simply, but I'm not sure we want to maintain such a pipeline in the longer run. The biggest problem is that this would either only detect issues (shortly) before deployment (on testwiki in the earliest case), or suffer from fairly frequent outages in Beta.
  4. Test individual subfeatures and not the full integration: I'm afraid this is the best we can do given our current infrastructure. On the plus side, it would detect if this issue were to happen because of the exact same cause. On the negative side, it wouldn't catch the same regression happening for a different reason, which is something we want to know about as well.
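
For the full-integration variants (option 3 in particular), the final assertion is independent of how the test is driven: after account creation, where did the user land? A minimal Python sketch of that check follows. It makes several assumptions: the function names are hypothetical, the survey is taken to live at `Special:WelcomeSurvey`, and the wiki is taken to use the common `/wiki/Title` and `index.php?title=Title` URL forms.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Assumed title of the survey page; adjust if the wiki uses another one.
SURVEY_TITLE = "Special:WelcomeSurvey"


def page_title(url):
    """Extract the page title from either of the two assumed URL forms."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "title" in query:
        return query["title"][0]
    prefix = "/wiki/"
    if parts.path.startswith(prefix):
        return unquote(parts.path[len(prefix):])
    return ""


def classify_landing(final_url, expected_title):
    """Say whether the user ended up on the survey, the expected page, or elsewhere."""
    title = page_title(final_url).replace(" ", "_")
    if title == SURVEY_TITLE:
        return "welcome-survey"
    if title == expected_title.replace(" ", "_"):
        return "expected-page"
    return "other"


# Illustrative URL only; the host is made up.
print(classify_landing("https://wiki.example/wiki/Special:WelcomeSurvey", "Sandbox"))
```

Whatever drives the browser (a scheduled job against testwiki or beta, or something else) would feed the last URL of the redirect chain into a check like this and fail on anything other than the expected page. Note that this sketch only looks at the page title, not at whether the editor actually opened.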

FTR, this was originally in sprint. @Michael, feel free to move it back there if you consider that appropriate.

@Michael @Cyndymediawiksim Curious to hear whether you have any thoughts on my CI options comment from above. Do you have any preferences from the not-so-good choices we have? Do you see any path forward I might have missed?

@Urbanecm_WMF,
I'd lean towards option 4 for now: test the subfeatures in CI so we at least catch the exact regression we just fixed, and maybe pair it with option 3, running against beta or testwiki on a daily/weekly timetable to catch other causes. Long term, we should work towards option 1, as it's the only option that truly removes the false choice between core and CentralAuth coverage. But it's a RelEng-level investment, so it needs to be pitched as a cross-team infra project, not a Growth-only need. I'd avoid option 2 (CI for one extension) because of the RelEng pushback, and because even if it were approved, we'd inherit a maintenance tax every time CI changes.

Michael triaged this task as Medium priority. Aug 25 2025, 3:55 PM

Thank you, @Urbanecm_WMF, for outlining these options! I would like to suggest another potential approach that is somewhat of a variation of your option 1 and 3:

Option 5: Have a secondary CI that runs post-merge on GitHub. There we could create a simple setup that makes use of MediaWiki, CentralAuth, VisualEditor, and GrowthExperiments and runs a dedicated Cypress test to walk through this "mid-edit workflow without redirect to Welcome Survey".
That test could then run either on every push to master or on a daily schedule. We would need to think about a way for it to reliably notify us if it fails, but that should be solvable. At worst, checking whether it has passed every day recently becomes part of chores.

There is some precedent for this with the Wikibase (https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/.github/workflows/secondaryCI.yml) and
EntitySchema (https://github.com/wikimedia/mediawiki-extensions-EntitySchema/blob/master/.github/workflows/dailyCI.yml) extensions.

(I've also asked about this idea in a foundation-internal Slack conversation.)

@vaughnwalters Can we estimate how long this test would take to create? We may hold off on implementing this until we can rely on Catalyst to set up CentralAuth and use it.

Let's reach out to Test Platform and see what would need to happen to make this test work, and maybe reach out to Editing (@Ryasmeen and/or @EAkinloose) to see if there are any additional tests already existing around this.

Based on refinement this morning: GrowthExperiments Cypress tests take ~5 minutes. This is already starting to get long.

Additional questions: Should we be adding more tests here in GrowthExperiments? Can this be done with a lower-level set of tests? (This is a question more for Engineering (@dennismwagiru) than QS-Test-Automation.)

Update: I had a conversation with Dennis yesterday about this - we believe there may be an opportunity here to use PHPUnit or some other lower-level testing framework to test this without a specific need for a flaky, slow E2E UI test. I'm doing further research on this to continue to explore that possibility. Assigning this to myself for the moment as I seek more information.