Create discourse-mediawiki.wmflabs.org (pilot instance)
Closed, Resolved (Public)

Description

discourse-mediawiki.wmflabs.org is planned to be a pilot for discourse.mediawiki.org, a developer support channel. The plan is documented at https://www.mediawiki.org/wiki/Discourse

  • Pilot plan drafted by Qgil and then approved(?) by Qgil.
  • Identification of points to consider before starting, especially things that we might regret when migrating to production.
    • Guidelines for pre-SSO usernames, i.e. "use your Wikimedia username"? Not needed.
    • Clarity about what will happen to the accounts and the content created during the pilot if we move to production. Content and users will be migrated. Users will have to remember the email they used in the pilot to register in the new site with their own accounts.
  • Agreement on additional plugins required from start, if any.
  • Wikimedia Cloud VPS instance decided and available.
  • Discourse installed out of the box.
  • Decent SMTP solution.
  • Decent HTTPS solution.
  • Customization required before opening user registration.
    • Text-based header.
    • Footer copied from Wikimedia Phabricator.
    • Banner topic
    • Pinned welcoming topic with basic instructions and link to mw:Help:Discourse
  • Opening registration
  • Announcement.

Event Timeline

@chasemp @bd808 @Tgr your thoughts about things to consider before starting are very welcome (especially looking at the future migration to production if the pilot is a success).

I have noted some potential problems almost blindly at https://www.mediawiki.org/wiki/Discourse#Implementation_plan, based on what I recall from that lovely and exciting Phabricator pilot and its migration to production (ah, when we were so young...).

SSO task is T124691, should probably be a blocker.

discourse-mediawiki.wmflabs.org is planned to be a pilot for discourse.mediawiki.org, a developer support channel. The plan is documented at https://www.mediawiki.org/wiki/Discourse

We might require a security review of Discourse before putting it on a subdomain of a production wiki (i.e. discourse.mediawiki.org), though I'm not sure. The pilot of course does not require that.

@Tgr would hosting this VM fit in the scope of T180662: Request creation of qna VPS project?

That certainly can be done if @Qgil wishes, but given that we have a discourse project already and the two discourse instances probably want to share puppet settings, it does not seem practical.

SSO task is T124691, should probably be a blocker.

Blocker for the pilot or for T180853: Bring a discourse instance for technical questions to production ?

That certainly can be done if @Qgil wishes

I don't have a strong opinion. Whatever you think is best.

Blocker for production, I mean.

This task is about setting up the pilot. Let's discuss blockers for production at T180853: Bring a discourse instance for technical questions to production .

A basic blocker for the pilot: where should discourse-mediawiki.wmflabs.org be hosted? Options (if I understand the situation correctly):

I will not pretend to know which option is best from a technical point of view.

Another possible blocker: if we start without SSO (the most likely scenario?), should we have any guidelines for newly created usernames? Should we recommend using Wikimedia usernames, or does it not matter?

Related: are we promising that accounts created during the pilot will "survive" the arrival of SSO and the move to production? In other words, if user qgil-WMF creates a local account during the pilot and creates a bunch of topics and replies, will the same user be able to login to the SSO instance in production and have (or claim) that content as theirs?

Please think about more blockers for the beginning of the pilot. Things that we don't want to regret after the Discourse pilot has been opened to the public. We can create subtasks for blockers that are not straightforward to resolve.

A basic blocker for the pilot: where should discourse-mediawiki.wmflabs.org be hosted? Options (if I understand the situation correctly):

I had not realized that there was already a 'discourse' project when I asked about the fit with the proposed 'qna' project. The 'discourse' project has lots of quota available. I would recommend that the new deployment be made as its own instance (VM) in the 'discourse' project.

I would also recommend that the deployment be managed with Puppet (and possibly scap3?) as an initial step towards the eventual production deployment. A discussion should be started with the TechOps team to determine what deployment methods they would be comfortable with for a Ruby on Rails application. To my knowledge we do not currently have any RoR apps being deployed in production, so this will take some thought. The "normal" means would be either packaging the application as a deb file and installing with apt (as is done for scap3 itself) or using scap3 to deploy from a git repository maintained on Wikimedia servers (as is done for ORES and Striker). Using the scap3 deploy method will require some work to figure out how to deploy gem dependencies. I'm not sure if there is a direct equivalent to the Python wheel deployment method that is used for the Python apps that we deploy with scap3 or not.

Qgil renamed this task from Create discourse-mediawiki.wmflabs.org to Create discourse-mediawiki.wmflabs.org (pilot instance). Nov 21 2017, 10:02 PM
Qgil updated the task description.

I would also recommend that the deployment be managed with Puppet (and possibly scap3?) as an initial step towards the eventual production deployment.

Would this be a requirement from the start, or would it be possible to start installing Discourse following their official instructions (based on Docker containers) and then improve the setup with Puppet?

What is clear is that the plan to handle upgrades agreed with SRE is a blocker for T180853: Bring a discourse instance for technical questions to production .

A discussion should be started with the TechOps team to determine what deployment methods they would be comfortable with for a Ruby on Rails application.

I have sent an email to Faidon asking for the appropriate way to get Operations involved in this project.

Re maintenance, I also wonder what role the official update method should play, which is pressing a button in the Administration UI (à la WordPress). Backend upgrades through the command line are reserved for dependencies (e.g. the Docker server requiring an upgrade) and for cases where the UI upgrade fails (as happened last time, when a jump of almost two years of releases didn't go through by just pressing a button). In regular circumstances, administrators would receive an email for every subsequent update available, and the release notes (and probably the UI itself) would explain when dependencies must be upgraded or when other deeper changes need to be made.

Could these upgrades via UI make the requirements for maintenance simpler?

Could these upgrades via UI make the requirements for maintenance simpler?

This type of deployment will work in a Cloud VPS project, but I do not believe that it would be allowed on the main Wikimedia network. Typically we do not allow production hosts to make outbound contact to arbitrary 3rd party servers and we also generally do not allow the user that a service runs as to modify the locally deployed code. There are many reasons for this, but the easiest to explain is that these are both attack vectors for malicious 3rd parties to exploit. Deploying via Docker containers may be possible via the relatively new Kubernetes cluster in production, but I do not know what stance TechOps will take about deploying 3rd party container images vs building our own.

Security concerns are the part of putting new code into production use that can cause confusion for people who are looking at documentation for upstream projects and not considering the special circumstances of the new software existing on the same network as a Top 10 internet property. The security practices that are acceptable for a closed network deployment (e.g. a corporate intranet) or for personal use are typically not on par with the protections that need to be taken on the main Wikimedia network.

I would also recommend that the deployment be managed with Puppet (and possibly scap3?) as an initial step towards the eventual production deployment.

Would this be a requirement from the start, or would it be possible to start installing Discourse following their official instructions (based on Docker containers) and then improve the setup with Puppet?

It's not a hard requirement for you to deploy software into a Cloud VPS project, no. I would recommend that it be a requirement before you go too far into your evaluation of the software, however. Production deployment is a stated goal of the evaluation, and thus packaging, deployment, and upgrade evaluations need to be part of the overall evaluation. If the software is awesome for end user needs but impossible to support in our deployment tool chain, then it will not end up being deployed into production. Any product we deploy into production has a support cost, and that cost can actually be quite high depending on the peculiarities of the software stack and their overlap with other projects.

Guidelines for pre-SSO usernames, i.e. "use your Wikimedia username"?

If I am reading this thread correctly, usernames of local accounts used during the pilot would not be relevant in a migration to SSO because Discourse takes email address as the main user identifier. We should advise users to register with the same email address that they are using for their Wikimedia account, though. Correct?
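To make the email-as-identifier idea concrete, here is a minimal sketch of how pilot accounts could be paired with SSO accounts after a migration. It is not how Discourse itself implements this: the function names, the dictionary layout, and the case-insensitive comparison are all assumptions made for illustration.

```python
def normalize_email(address):
    """Trim and lower-case an address (assumption: matching is case-insensitive)."""
    return address.strip().lower()


def match_by_email(pilot_accounts, sso_accounts):
    """Pair pilot accounts with SSO accounts that share an email address.

    Both arguments are hypothetical {username: email} mappings.
    Returns (matched, unmatched): matched maps pilot username -> SSO username,
    unmatched lists pilot usernames whose email has no SSO counterpart.
    """
    sso_by_email = {
        normalize_email(email): username
        for username, email in sso_accounts.items()
    }
    matched = {}
    unmatched = []
    for username, email in pilot_accounts.items():
        sso_username = sso_by_email.get(normalize_email(email))
        if sso_username is None:
            unmatched.append(username)
        else:
            matched[username] = sso_username
    return matched, unmatched
```

In this sketch the usernames never take part in the comparison, which is why the username chosen during the pilot would not matter, while a different email address would leave the pilot account unmatched.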

Clarity about what will happen to the accounts and the content created during the pilot if we move to production.

This HowTo explains how to migrate a Discourse instance to a new server. It seems quite straightforward?

I guess the move to SSO should be done first, while still in the pilot, to ensure that part works well. Testing the migration to a new server and SSO at once would be too much.

Agreement on additional plugins required from start, if any.

For the sake of simplicity, I would start with a plain Discourse, and then see what we really miss that can be provided by a plugin. There is a list of potentially useful plugins here.

Discourse installed out of the box.

Right. We agreed to use the same instance where discourse.wmflabs.org is located.

debian-8.2-jessie (deprecated 2016-02-16)

Should we start by upgrading the OS?

(If you trust me as poorperson's administrator, I can help. I am a long term Debian user, I have tinkered with Debian servers for some time (always as a self-taught hobbyist) and I have installed a Discourse instance for a (non-Wikimedia) pet project.)

@Andrew @Austin @EBernhardson @Tgr @Samwilson @yuvipanda, as current admins of the Discourse instance, how do you want to proceed?

Ideally by watching someone set it up :) Not sure if that's what you were asking.

Can you give me admin access, please?

@Qgil you probably know what you're up to, but give me a yell if I can help at all.

Thanks! One thing is to fiddle in your own server and another thing is to do the same in a Cloud instance with other admins. I will try to find my way but I will ask before choosing among many possible options, or before getting stuck for too long.

Subdomain, mail and https setups are likely to be the points where I'll have to find specific documentation.

debian-8.2-jessie (deprecated 2016-02-16)

Should we start by upgrading the OS?

After reading https://wikitech.wikimedia.org/wiki/Distribution_upgrades/jessie_stretch and seeing that an up-to-date Discourse instance is running in that server, I will skip this step and leave it to a proper sysadmin. :)

After reading https://wikitech.wikimedia.org/wiki/Distribution_upgrades/jessie_stretch and seeing that an up-to-date Discourse instance is running in that server, I will skip this step and leave it to a proper sysadmin. :)

Build a new VM from scratch. That's the right way to "upgrade" a Cloud VPS instance. It's also something that needs to be done as step zero towards an eventual production quality deploy. If these are early days of trial and error to see if the solution is worth further investment, then I don't think it needs to start with Puppet, software packaging, and upgrade paths, however.

OK, thank you.

I created the new instance: https://tools.wmflabs.org/openstack-browser/server/discourse-mediawiki.discourse.eqiad.wmflabs

As expected, the first hurdle was email configuration. Discourse asks for an SMTP server, username and password. I checked how discourse.wmflabs.org had solved it, and this is when I started to realize that that Discourse instance doesn't look like a plain installation:

  • Not all the files are under /var/discourse. containers/app.yml is under /srv
  • The SMTP credentials were expected to be defined in containers/app.yml but only the SMTP server is defined, username and password are missing there.

Without providing this data, the regular installation cannot proceed. I introduced provisional SMTP credentials to go to the next step.
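A minimal sketch of the check described above: given the env section of containers/app.yml as a plain mapping, list which SMTP settings are absent or empty. The four key names are the ones I recall from the stock Discourse Docker sample config and should be treated as an assumption, as should the function name.

```python
# Key names assumed from the stock Discourse Docker sample configuration.
REQUIRED_SMTP_KEYS = (
    "DISCOURSE_SMTP_ADDRESS",
    "DISCOURSE_SMTP_PORT",
    "DISCOURSE_SMTP_USER_NAME",
    "DISCOURSE_SMTP_PASSWORD",
)


def missing_smtp_settings(env):
    """Return the required SMTP keys that are absent or blank in env."""
    missing = []
    for key in REQUIRED_SMTP_KEYS:
        value = env.get(key)
        # Treat a missing key, a null value, and a whitespace-only value alike.
        if value is None or not str(value).strip():
            missing.append(key)
    return missing
```

For a configuration like the one on discourse.wmflabs.org, where only the server is set, this would report the username and password keys as missing.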

But then, in the next step... the installer refused to build the app because Discourse doesn't support Devicemapper as a Docker storage driver. At this point I decided to stop the installation and share this update here. Any suggestions?

containers/app.yml file from discourse.wmflabs.org contains many notes explaining the specifics of that custom installation. What I don't know is who did what and why. I am not sure whether I should keep trying to continue with a plain install or leave a custom installation to someone more experienced.

I think you can change to use overlay FS by adding this in /etc/docker/daemon.json:

{
    "storage-driver": "overlay"
}

  • I started fresh with a new instance.
  • I installed Docker following the official instructions.
  • I ran docker info and, to my dismay, devicemapper was still there. I don't get it: the Docker docs say that the recommended storage driver for Debian Stretch is overlay2, and devicemapper is discouraged.
  • Anyway, I started Docker and tested that everything was working fine.
  • Then I proceeded to change the storage driver following the official instructions (which boil down to what @Samwilson posted above, just using overlay2 instead of overlay).
  • However, after running sudo nano /etc/docker/daemon.json, adding the JSON and saving the file, Docker will not restart and is left in a broken state. Changing values in that JSON file does not solve anything; only removing the file completely allows Docker to restart (with Devicemapper).

I wonder whether this is a problem of /etc/docker/daemon.json file/directory permissions? Or overlay2 support not working in the Debian Stretch image?
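One cheap thing to rule out before suspecting permissions or the kernel is a malformed file. The following is a small sketch, not a Docker tool: it only checks that the text parses as a JSON object and reports the storage-driver value it finds. The function name and messages are made up for illustration.

```python
import json


def check_daemon_config(text):
    """Return (ok, message) for the contents of a Docker daemon.json file."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg} (line {exc.lineno})"
    if not isinstance(config, dict):
        return False, "top level must be a JSON object"
    driver = config.get("storage-driver")
    if driver is None:
        return True, "valid JSON, no storage-driver set"
    return True, f"valid JSON, storage-driver={driver}"
```

Calling it with the contents of /etc/docker/daemon.json would show whether the file itself is at fault. If it reports valid JSON with the expected driver, the problem lies elsewhere, for example in whether the host supports that driver at all.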

I wonder whether this is a problem of /etc/docker/daemon.json file/directory permissions? Or overlay2 support not working in the Debian Stretch image?

T184018: Remove overlay from kernel blacklist on toolforge -- You will need to set profile::base::overlayfs: true in the hiera settings for the instance or project.

Thank you @bd808! I would have never found this information on my own (I had searched in Wikitech).

Alright, a fresh Discourse install: http://discourse-mediawiki.wmflabs.org/

  • In order to complete the installation I had to sort out the SMTP server/username/password. I have created a personal account at https://elasticemail.com/ as a very interim solution. We need to implement a solution good enough to run the pilot.
  • HTTPS is pending. Discourse makes it very easy to set up a Let's Encrypt certificate during the installation, but I saw that https://discourse.wmflabs.org was using something else.

Should I create separate tasks/requests for SMTP and HTTPS support?

While these two problems are open, registration is closed. Needless to say, invitations can be sent to those who are going to work administering / moderating. We also need a minimum customization of the site. Since this is a pilot, it is probably a good idea to start "very default" and configure / customize as needs arise.

  • In order to complete the installation I had to sort out the SMTP server/username/password. I have created a personal account at https://elasticemail.com/ as a very interim solution. We need to implement a solution good enough to run the pilot.

Related: T41785: Create a Cloud VPS SMTP smarthost

If SMTP is only needed for sending outbound messages that should be possible via existing relays I think. Inbound gets trickier.

  • HTTPS is pending. Discourse makes it very easy to set up a Let's Encrypt certificate during the installation, but I saw that https://discourse.wmflabs.org was using something else.

https://discourse-mediawiki.wmflabs.org/ seems to work just fine. The nginx proxy that Cloud VPS instances use to connect their services to the public internet has a valid *.wmflabs.org cert. Right now there is no way to enforce HTTPS when you create the proxy (T131288 and related tasks). You may be able to set up something on the instance that redirects to https://discourse-mediawiki.wmflabs.org/ if the X-Forwarded-Proto header it sees does not say https.

Qgil updated the task description.

SMTP is only needed for outbound messages.

Silly me, I hadn't checked HTTPS. OK, good to know. Discourse has a setting to "force HTTPS". I have enabled it, but this will not be enough in a setup where SSL termination happens somewhere else.

I wonder how http://discourse.wmflabs.org has solved this problem?

In other news, I have customized the header and footer in the simplest way, I have added a banner, and during today I will add the rest of the content needed to open the pilot. I plan to add the Discourse Solved plugin, and with this (plus SMTP & HTTPS solved) we could open registration.

I think the other installation does it in templates/web.template.yml (see T179649#3772701):

+  # from https://wiki.mozilla.org/Community_Ops/Discourse/Setup#Edit_web.template.yml_.28Only_for_SSL_Sites.29
+  - replace:
+      filename: "/etc/nginx/conf.d/discourse.conf"
+      from: "server {"
+      to: |+
+        server {
+        #Messy hack to force SSL on only the hostname, not IPs so ELB and Icinga work.
+          set $use_https NO;
+          if ($host ~* 'discourse.wmflabs.org') {
+            set $use_https A;
+          }
+          if ($http_x_forwarded_proto != 'https') {
+            set $use_https "${use_https}B";
+          }
+          if ($use_https = AB) {
+            rewrite ^ https://$host$request_uri? permanent;
+          }
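The flag concatenation in this snippet is a workaround for nginx's if directive, which cannot combine two conditions. The following Python sketch models the same decision so the four cases are easy to see. The function name and the site_pattern parameter are illustrative; the default pattern is copied from the snippet as written, including its unescaped dots.

```python
import re


def should_redirect(host, forwarded_proto, site_pattern="discourse.wmflabs.org"):
    """Mirror the nginx flag logic: redirect only when the result is "AB"."""
    flag = "NO"
    # nginx's ~* is a case-insensitive, unanchored regex match.
    if re.search(site_pattern, host, re.IGNORECASE):
        flag = "A"
    # A missing header arrives as an empty value, which also differs from "https".
    if forwarded_proto != "https":
        flag += "B"
    return flag == "AB"
```

Requests addressed to a bare IP end up as "NO" or "NOB" and are never redirected, which matches the comment about ELB and Icinga checks in the snippet. Note that the pattern as written does not match discourse-mediawiki.wmflabs.org, so the host name would need to be adapted for the new instance.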

Forced HTTPS works now. Thanks @Samwilson! Another detail that I would not have found alone.

Yes, I will document all this to make life easier for whoever eventually sets up the production instance.

And the Solved plugin works.

Only SMTP and a couple of intro texts are left to open registration. Yay!

Turns out that post-installation I could edit the SMTP parameters and just copy what discourse.wmflabs.org has. I have tested it and it seems to work!

I have written the basic content to open the gate.

The registration is now "silently" open. You are encouraged to register, look around, and start providing feedback.

This has been a fun quasi-hobby project this weekend. Thank you @bd808 and @Samwilson! Without you, nothing would have happened this weekend, and I am not sure I would have found the time until the following weekend...

Let's wait for the first users, and if there are no big problems we will announce.

Anyway, see you at https://discourse-mediawiki.wmflabs.org/

SMTP is only needed for outbound messages.

It's also needed to be able to post by replying to emails. That's more important for the other Discourse instance, which tries to fill the same social role that mailing lists do, but it would still be nice to have.

If replying via email is a wanted feature, then it should be discussed in a separate task blocking T180853: Bring a discourse instance for technical questions to production . I will close this task here as soon as the pilot is announced.

Currently discourse.wmflabs.org provides this functionality using pop.gmail.com and a GMail address. Not a very sophisticated setup. I guess a proper @wikimedia.org address and the corresponding email server would do.