Unconference: How can we automate new wiki creation
Closed, Resolved (Public)

Description

Automating wiki creation has been desired for a long time.

This has been discussed a bit at the Tech Conf in the T234641: Wikimedia Technical Conference 2019 Session: Continuous Delivery/Deployment in Wikimedia: The Future of the Deployment Pipeline session, but we need a follow-up.

One thing that we can resolve right in this unconference session is T238158: Identify which parts of the "Add a wiki" procedure can be integrated with the deployment pipeline. If we have time for anything more, it would be even better.



NOTES

Wikimedia Technical Conference
Atlanta, GA USA
November 12 - 15, 2019

Notes: Tyler, AKosiaris

Session Name / Topic
New Wiki creation

  • T158730 -- created by Tim Starling
  • T228745 -- created by Amir
  • Neither of them is possible, instead we have https://wikitech.wikimedia.org/wiki/Add_a_wiki -- many of these steps are unstable -- every time we create a wiki this gets updated
  • 15 wikis were created this year -- so this is fairly regular -- Amir would like to move wikis out of the incubator as quickly as possible
  • Amir Sarabadani recently fixed a number of bugs that made this process work again, but it still is a mess. Extremely unstable
  • Up to 20 wikis per year up to now, Amir would like up to dozens a year.
  • How do we make this more stable?
  • Tangentially related to the deployment pipeline
  • Addshore: who here has added a wiki?
    • *Reedy + Daniel raised hands.*
  • Alex: how much do we want to automate this? A button?
  • Greg: Tim's suggestion on the task was 3 or 4 steps, which I think is good
  • Amir: if a language is eligible, then a wiki gets created -- eligibility means 3 or 4 people who can write pages and a unique language -- this can be turned into a button, but not everyone should be able to click this button. The language committee should be able to click this button -- that'd be fine.
  • Joe: Given that a lot of modifications in production need to happen, even if it's automated, I think that button should be clicked by someone who could follow these tasks. Every step in this task is basically adding configuration to a bunch of services. Some of this can be automated by the addWiki.php script. The script adds tables, etc. But we need to coordinate creation across different parts of the infra.
  • Reedy: it shouldn't need *n* different stuff
  • Problems with the ways we treat the configuration that needs to be updated to add a new wiki
  • Timo: for cxserver and for restbase -- they just have a copy of the sitematrix -- that sounds like an api-call they can cache and pull every minute
  • Joe: if an application needs to know which wikis exist and which don't that should come from mediawiki as the canonical service
  • Daniel: that's the dns langlist for each wiki.
  • Timo: I don't think DNS should be the source of truth
  • Addshore: what I made for wmde doesn't touch parsoid, but from my point of view you shouldn't need a static list
  • Timo: restbase uses the wiki you're interacting with as a pathname component -- since we don't want to use third-party hostnames
    • And it still does internally, even if the external API is <lang>.<wikiproject>.org
  • G: this is about propagation of changes
  • Timo: two categories (1) setup steps, actually needs to do something (2) cxserver/restbase -- list of valid hostnames -- not about validation or actually doing something which should be a pull
  • R: addwiki is not tested or testable and only gets run every 6 months
    • A: It works only in production
    • R: in beta ... kind of
  • Addshore: we have wikis that go into the world and have things broken
  • G: this needs CI and it needs to be idempotent -- a series of stages, each of which is a script that does two things: 1. check the state of the world 2. if the state of the world has not been updated, update it -- we need to re-engineer that script. We need to wrap it in a structure that is more resilient
  • maybe shouldn't be one script?
  • R: prep it down, split it to various sections
  • G: Does it need mediawiki libraries?
  • R: Yes. CirrusSearch, for example, which calls into CirrusSearch libraries
  • A: none of the wiki creation in my script is inside MW, it's all separate. Anything that the setup script needs to do with MW it does via private APIs to keep it separate
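
Timo's suggestion above, that cxserver and restbase pull the list of wikis from an API call and cache it for about a minute instead of shipping a static copy, could look roughly like the sketch below. This is only an illustration: `CachedWikiList` and `hosts_from_sitematrix` are hypothetical names, the fetch function is injected so no real service is contacted, and the payload shape (numbered language groups with a `site` list, plus a `specials` list) is an assumption about the sitematrix API response rather than something stated in these notes.

```python
import time
from typing import Callable, FrozenSet, Optional
from urllib.parse import urlsplit


def hosts_from_sitematrix(payload: dict) -> FrozenSet[str]:
    """Extract hostnames from a sitematrix-style payload (shape is assumed)."""
    hosts = set()
    for key, group in payload.get("sitematrix", {}).items():
        if key == "count":
            continue  # assumed to be a plain counter, not a group of sites
        sites = group if key == "specials" else group.get("site", [])
        for site in sites:
            host = urlsplit(site.get("url", "")).hostname
            if host:
                hosts.add(host)
    return frozenset(hosts)


class CachedWikiList:
    """Holds the set of known wiki hostnames and re-pulls it after a TTL."""

    def __init__(
        self,
        fetch: Callable[[], FrozenSet[str]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._hosts: Optional[FrozenSet[str]] = None
        self._fetched_at = 0.0

    def hosts(self) -> FrozenSet[str]:
        now = self._clock()
        if self._hosts is None or now - self._fetched_at >= self._ttl:
            try:
                self._hosts = self._fetch()
                self._fetched_at = now
            except Exception:
                # If a refresh fails, keep serving the stale list;
                # only fail when there is nothing cached at all.
                if self._hosts is None:
                    raise
        return self._hosts

    def is_known(self, hostname: str) -> bool:
        return hostname in self.hosts()
```

With something like this, a newly created wiki would become visible to the consuming service within one TTL, with no config change or deploy on the service side.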

Review of addWiki.php script

  1. Creates database
  2. Creates database for Echo
  3. Creates the main page (which should be idempotent) -- could be an api call
  4. Set fundraising link (fundraising tech hasn't used this for at least 7 years) -- adds a link to the sidebar (it's probably already there, and not really needed)
  5. Creates search index
  6. Populates global sites table -- which breaks frequently -- more than addWiki.php
  7. Cognate sites table
  • G: if most of this is applying sql or creating an index in elasticsearch -- we have a framework to automate this in steps
  • Timo: since this is a maintenance script, it needs a mediawiki instance running, but this script can be independent
  • R: from the dbname it's creating it figures out the type of wiki that's being created
  • A: there are another 5 steps before we run addWiki
  • R: but that might be out of scope: do you want us automating dns deploys?
  • G: if the problem is dns and apache
  • R: for 90% of cases languagelist does a lot of the work
  • D: this is for languages that don't have any wiki yet
  • G: several things need to be updated and we have no reliable way to know if the script is broken or not
  • A: Who owns this?
    • No one currently, but in practice: Amir (Ladsgroup)+Sam+Urbanecm? (And language committee drives this from the users' perspective: Jan-Harald Søby, Amir Aharoni, etc.)
  • A: Who *should* own this?
  • G: Several teams, but someone should drive.
  • T: it can be made robust/easier to maintain by making it do less -- services should use mediawiki as the source of truth
  • G: improvements can be made incrementally
  • D: the problem is all the different repositories involved
  • G: split into steps, make sure those steps are testable
  • R: should be easy and doable. Do we want to have a CI process that creates a wiki?
  • T: the solution is not to automate it but to make it obsolete
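
The "split into steps, make sure those steps are testable" idea, combined with the earlier point that each stage should check the state of the world and only update it if needed, can be sketched as below. This is a minimal illustration and not how addWiki.php works: `Step`, `run_steps` and `make_flag_step` are hypothetical names, and an in-memory set stands in for the real databases, search index and sites tables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Step:
    name: str
    is_done: Callable[[], bool]  # 1. check the state of the world
    apply: Callable[[], None]    # 2. update it, only called when needed


def run_steps(steps: List[Step]) -> Dict[str, str]:
    """Run steps in order; safe to re-run because finished steps are skipped."""
    results: Dict[str, str] = {}
    for step in steps:
        if step.is_done():
            results[step.name] = "skipped"
            continue
        step.apply()
        # Re-check so a silently broken step is reported instead of ignored.
        if not step.is_done():
            raise RuntimeError(f"step {step.name!r} did not reach its target state")
        results[step.name] = "applied"
    return results


def make_flag_step(world: Set[str], name: str) -> Step:
    """Toy step: 'done' means the step name is recorded in the fake world."""
    return Step(
        name=name,
        is_done=lambda: name in world,
        apply=lambda: world.add(name),
    )
```

Because every step carries its own check, each one can be tested in isolation in CI, and a run that failed halfway can simply be started again from the top.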

everyone is shocked by wikicreation

A list by @Krinkle

20191114_154839.jpg (2×4 px, 2 MB)

Event Timeline

Adding @Reedy. Quoting @akosiaris: “most of the Add a wiki procedure says ‘ping Reedy’ ”.

Amire80 updated the task description.

@Amire80 / @Theklan: Thank you for proposing and/or hosting this session. This open task only has the archived project tag Wikimedia-Technical-Conference-2019.
If there is nothing more to do in this very task, please change the task status to resolved via the Add Action...Change Status dropdown.
If there is more to do, then please either add appropriate non-archived project tags to this task (via the Add Action...Change Project Tags dropdown), or make sure that appropriate follow-up tasks have been created and resolve this very task. Thank you for helping clean up!

Thanks for the poke! I'm in a COVID-19 mess at home, but I made myself a reminder to go over all of it and make the last clean-ups ASAP. If I don't resolve it before the end of March 2020, feel free to close it yourself.

Amire80 claimed this task.

OK, the only action here, I guess, is to write a pitch for why this should be on... somebody's annual plan ¯\_(ツ)_/¯

I'll write it ASAP, but this task can be closed.