Request creation of wikinewsie VPS project
Closed, Resolved (Public)

Description

Project Name: wikinewsie

Wikitech Usernames of requestors: Leaderboard, acagastya

Purpose: T281520, sysadmins don't like creating it on production so will give this a try and see if it works

Brief description: T281520 (same reason).

How soon you are hoping this can be fulfilled: As soon as possible. A floating IP address will be required (for the mail server).

Event Timeline

Some notes (as requested at #wikimedia-cloud)

  • The main purpose of the server is to run a MediaWiki instance that can be accessed anywhere from the world, but the contents of the wiki can be read only by authorised users, namely Wikinews recruiters (so something similar to a CU wiki from a configuration point of view). The purpose of the wiki is to share reviewer materials that cannot be done publicly (for example, private recordings), as this is required for Wikinews' rigorous reviewing policies.

There are also a couple of secondary plans, which will be dependent on the feasibility and other factors:

  • A mail server. Currently (as described at https://en.wikinews.org/wiki/Wikinews:Accreditation_requests), users who pass the accreditation request are given a personal email address (currently at @wn-reporters.org). The aim is to try to move that to the Cloud VPS instance, so (as I was told on IRC) either the emails would be of the form username@wikinewsie.wmfcloud.org, or a domain map would be done so that the server would send/receive emails at @wn-reporters.org.
  • Occasional use as a proxy - there are some cases where a Wikinewsie cannot access essential sites (when reviewing, for instance). In one instance, I was told that a reviewer had to resort to Pornhub's VPN because some US government sites could not be accessed from the reviewer's country. The Cloud VPS instance may therefore be used as a proxy (Wireguard) for such purposes. I would expect this use to be uncommon at most and only for Wikinews-related purposes.

This use case is very explicitly disallowed by the Cloud Services Terms of Use:

Using Wikimedia Cloud Services as a network proxy: Do not use Wikimedia Cloud Services servers or projects to proxy or relay traffic for other servers. Examples of such activities include running Tor nodes, peer-to-peer network services, or VPNs to other networks. In other words, all network connections must originate from or terminate at Wikimedia Cloud Services. An explicit exception for Github Actions has been granted, with conditions. See T260746 for details.

This is the reason I've mentioned it below (citing the exact provision you quoted) after discussing at #wikimedia-cloud, so that it can be reviewed for a (possible) exemption either way.

A floating IP address will be required (for the website).

Are you aware that Cloud VPS provides HTTPS reverse proxies? A project specific public IP address should really only currently be needed for:

  • Hosting under a domain other than *.wmcloud.org
  • Hosting non-HTTP based public services

I'm not sure myself; I was told at #wikimedia-cloud that it would be required for our use-case. Presume it was for the mail server plan. If we can do without one (for example if the mail server plan is not done), the floating IP address will not be used.

It would be a new, but reasonable, use case for WMCS to set up inbound mail handling for Cloud VPS projects via a shared announced mail exchange (MX) host. This will probably come down to the expedience of getting your services running. Giving your project a public IP and letting you manage the MX server directly is likely quickest. Setting up a shared MX with sub-domain forwarding rules to direct delivery to a project-hosted message delivery agent is more scalable in the longer term. The "right" fix might be to do both, with the public IP to be replaced with the shared MX service once it is up and running. See also T47829: Labs: Let mail (from cron and perhaps other defaults) go to project forwarder by default and T47828: Implement mail aliases for Cloud-VPS projects (<novaproject>@wmcloud.org) which are semi-related ideas from the backlog.

Per discussion at T281520: Create Wikinewsie's Portal wiki, it looks like one of the needs of this wiki is storage of confidential/private information. Is that a correct statement @Leaderboard? If so, can you help us understand the nature of confidentiality (is this "personally identifiable information" about humans? what bad things will happen when this data leaks?).

That's true. @Acagastya can explain better about the nature of confidentiality, but yes, it's often about humans (for instance, a private interview where parts of the speech by the interviewee should not be made public).

@bd808 Much about the use case was explained in this comment on T281520, I will paste it here before answering your question.

We used to have a chapter/wiki, but a few years ago, it was no longer considered by AffCom.
The word 'Wikinewsie' is used to mean two things: first, someone who contributes to Wikinews, and second, the accredited reporters and reviewers (users who can sight the edit, technically speaking) with scoop access.
Just like how we have Znuny, where trusted VRT agents can access the information to validate and approve a file on Commons (talking about permission@wikimedia), people who have scoop access are able to access evidence for the purpose of verifying the facts in a news article. Sometimes it is an email, sometimes it is a media document. By nature, it is not free, and cannot be put on Commons; and sometimes it is something reviewers should have access to, but not the general public. (An example that comes to mind is a source revealing something under the request of anonymity.) One use is as an archive repo for non-free media used as a source, be it a video, an audio interview, photos, or even a PDF (an example that immediately comes to mind is the PDF letter sent by the secretary of the European Council upon request).

While most of the review process happens publicly on-wiki, I am aware of one time when this article was first put on the private wiki, reviewed there, and, once ready for publication, copied on-wiki and published. At times, when reporting about other media being in violation of the law, or about a story dealing with a DMCA takedown, where putting the original sources in the article would lead to us receiving a takedown notice too (merely for citing or stating "the violation took place on this site"), we had to use the private wiki.

'Wikinewsies' in this case refers to reviewers who are willing to review an OR or an exclusive interview. Any reviewer can choose to become a 'newsie, but not all reviewers are Wikinewsies. Hence the request for the name. 'Wikinewsie' refers to the enwn's reviewers who review ORs. Other language wikinews uses different terminology (like wikirreporteros on eswn).

Re the file size limit, I don't know but it is very unlikely we have files *that big*! 4GB will be way way more than enough. It is likely @Gryllida has the dump, which she might be able to provide for the purpose of importing.

To answer your question, Brian, personal things would include things like the identity of the person and, in some cases, their location. Sources at times disclose in the interviews where they live, or the names of their children. I was interviewing a sex-worker about how COVID-19 had affected her, and she would give many personal details, some of which she wanted to be omitted from print. She had, in that conversation, detailed information which could cause trouble; because, let's be honest, sex-work in this part of the world is frowned upon by so many! The other sex-worker said how he faced judgement from the neighbours when they would find out what his work was.

What can happen if it gets leaked depends on a lot of factors. It could range from the least serious, a failure to protect the privacy of the sources, to that fact being used by someone to trouble or harass them. In extreme cases, had we been speaking to a whistleblower (or if the source lived in a country run by a government that has no respect for freedom of the press), who knows how serious it can get.

Frankly, the seriousness of it should be treated no less than CU, or OC wiki, though it won’t require near the same amount of maintenance. As I mentioned, we need a simple wiki to have revision control, and option to upload files.

This is the part that makes me wary of Cloud VPS as the hosting environment your data requires. The Cloud VPS project is set up to make it easier for Wikimedians to collaborate on technical projects and share information with each other. It is not a core goal of the project to enable collection, storage, and long-term security of sensitive information. That is not to say that we force everyone's data to be public; there are certainly "secrets" in many Cloud VPS hosted projects, but typically these "secrets" are in the form of passwords and API access tokens which can be revoked if leaked. Holding data that could cause a human to have legal, financial, or physical safety issues if disclosed is a completely different sort of secret.

Well, Leaderboard tried asking for a prod wiki, which was denied because there are too many moving parts in prod. We were suggested to go for WMCS instead. This is the requirement which I have detailed above. What do you suggest given this requirement? We could host a nextcloud instance for the data, while having a wiki dealing with non-media, text content, which needs to be handled behind closed doors. I don't know if it solves any problems, but let's see what we can do.

This was the reason I had initially gone with the production route, as according to @Acagastya the content that's going to be hosted is equivalent in confidentiality to that of a CU, steward wiki or anything that would require signing a NDA, however Ladsgroup said that just because that content does not use the WMF NDA, it does not belong in production (citing him: "I'm confident this wiki is not storing NDA-level information..."), which I found strange. They were adamant when I further discussed via IRC.

I've added @Ladsgroup as the person who denied the production request; if this is not warranted, feel free to revert or remove. I've similarly added @Urbanecm as someone who also encouraged us to take the WMCS route (again feel free to unsubscribe, meant this in good faith).

sysadmins don't like creating it on prod so will give this a try and see if it works; fought hard but they are persistent

What type of phrasing is this?

I outlined my reasoning in the declined ticket. I see no need to "fight" (iterate again) here. The requested material is not private in the sense of NDA-protected private information, and certainly WMCS is much more secure than some random third party vendor.

Also, this project wouldn't need a floating IP. You can simply use the DNS proxies.

Apologies if this hurt you; the phrasing has been corrected.

@Ladsgroup wrote:
and certainly WMCS is much more secure than some random third party vendor.

I'm sorry to disagree. Most cloud vendors will have much higher guarantees about data security than WMCS, by design. We may eventually come to a point where WMCS does have data security guarantees, but this is certainly not the current state. I don't even think it is at all planned for WMCS right now.

Also, this project wouldn't need a floating IP. You can simply use the DNS proxies.

To have a mail server with their own domain.

I would reiterate, I am not aware of the security of WMCS, and how it is different from the prod; I had asked to, I think Martin Urbanecm, if he were to set up a wiki on prod and on WMCS, with the same security in place, what kind of data might be breached from WMCS, that might be protected on prod. While the confidential data is unrelated to WMF's NDA, it is still confidential regardless.

Some of the technical security concerns in WMCS are historical and have since been fixed. I have an easier time sniffing network packets on a real LAN than I do on the same hypervisor on Cloud VPS at this point, to speak to some of what I think @Chicocvenancio is hinting at.

I should mention, just for your information that those with cloud root can access your data. That includes WMCS staff and a couple volunteers who have completed a process to get those privileges, including an NDA, to actively assist with projects related to the cloud as a whole (see https://wikitech.wikimedia.org/wiki/Help:Access_policies for more information on people with that level of access).

A bigger caution is that we make no guarantees in our terms of service (https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use#What_disclaimers_does_the_Foundation_want_to_make?). We do our best to make it all safe and private because we care about that as a (small) team, but our legal statement is that it is basically best effort at this time.

A bit of advice: Our secrets management in Cloud VPS is somewhat rudimentary and you may want to provide your own better way of storing things like passwords (like password managers, etc) especially if you have data you care about. I suggest using cinder volumes for data that should persist beyond the life of a VM, too (https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space_to_Cloud_VPS_instances#Cinder).

If you still like the idea after all that, I would also ask that you make it as clear as possible, possibly even with a click-through modal or something that this is not an official WMCS service and the responsibility for the data is on the project maintainers. All that sound good?

@Bstorm, could you please explain "Our secrets management in Cloud VPS is somewhat rudimentary and you may want to provide your own better way of storing things like passwords (like password managers, etc) especially if you have data you care about"?

Also, when you said "A bigger caution is that we make no guarantees in our terms of service" -- which guarantee were you referring to?

I'm saying that we don't provide a secure place to share passwords for servers, and we have one for puppet, but it involves setting up a plain text single-point of failure puppet server in your project https://wikitech.wikimedia.org/wiki/Help:Standalone_puppetmaster. That's what I'm referring to.

As for guarantees, we provide no guarantee of any kind in Cloud Services. Paid cloud providers often have guarantees of merchantability and security of the services you are paying for. We don't.

@Bstorm, I still do not understand why the discussion is about passwords -- are you talking about VPS root password? (Couldn't we disable root login and password-based login in favour of SSH?) Or is it about something else?

Re guarantees, are you referring to loss of data, or backups, or storage failures?

I'm talking about the password of your database server, mediawiki admin, etc. You cannot disable root, etc. as that is controlled by WMCS via puppet, and keeping puppet running is generally required. There are no ssh passwords (VMs are key only). I'm talking about what you use to design your services. All we provide is VMs. I was suggesting that you would probably want to be careful with that data. We don't have a KMS-type service like AWS and similar systems do.

I do not use AWS, so I would not know what KMS is supposed to be doing. Could you please answer about this, @Bstorm?

Re guarantees, are you referring to loss of data, or backups, or storage failures?

KMS would be a cryptographic key management system

Re guarantees, are you referring to loss of data, or backups, or storage failures?

I think what @Bstorm is saying is that WMCS does not store backups, so it would be our responsibility to take regular backups of the content in the virtual machine (in case a failure happens for any reason). That is, all they will give is a blank VM, and everything else is our responsibility.
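As an illustration of what taking regular backups ourselves could look like, here is a minimal sketch, not a tested procedure for this project. It assumes the wiki's uploaded files live in one directory and archives that directory with a timestamp, keeping only the newest few archives. The function name, the `wiki-backup-` prefix and all paths are hypothetical. The wiki's database would need its own dump, which this sketch does not cover, and the archives should be copied off the VM to be of any use after a failure.

```python
import tarfile
import time
from pathlib import Path


def make_backup(source_dir, backup_dir, keep=7):
    """Archive source_dir into backup_dir and keep only the newest `keep` archives.

    All names here are illustrative assumptions, not part of any WMCS tooling.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    source = Path(source_dir)
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Timestamped file name, so archives sort chronologically by name.
    stamp = time.strftime("%Y%m%dT%H%M%S")
    archive = target / f"wiki-backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)

    # Prune: delete everything except the newest `keep` archives.
    archives = sorted(target.glob("wiki-backup-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive
```

Something like `make_backup("/srv/mediawiki/images", "/srv/backups")` (both paths made up for the example) could then be run from a daily cron job.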

Bstorm, Agastya's concern is basically whether the wiki can be as secure (in terms of the probability of leakage) as that of a production CU wiki, because the content in the wiki is as confidential (as @Acagastya explained in an earlier post) as that of a CU/steward wiki. This was the concern of @bd808 if I understand correctly.

Fair. I'm saying it could be pretty secure if you make it so. Putting it in Cloud VPS is replacing the Wikimedia SRE, network and security teams with yourselves. Cloud Services provides an infrastructure, and we try to make it good. From there, you have to ensure the wiki's security through best practices. That's what I mean.

Our team will not be responsible if the wiki is compromised, nor will we even know unless you tell us. It will not be monitored unless you provide monitoring and paging. A production CU/steward wiki is professionally maintained. If you are capable of maintaining this wiki on Cloud Services as well as the Wikimedia production admins maintain the production wikis, it would probably have similar security. If you are not, then it absolutely will not. We only provide infrastructure for you to deploy on. This is why we always caution people about putting private data on WMCS. Managing/protecting private data is hard, and you must do it yourself there.

I poke at the cloud systems for security problems every now and then, and I have pretty positive thoughts about some parts of it right now. However, your big issue on security of data is not the computer you run it on most of the time. It's keeping the computer safe from people who should not be on it. It is easy to make mistakes and sometimes hard to defend against network, brute force, social engineering and other attacks. The Foundation has procedures to defend against these things, but in this instance, it'll be you folks figuring it out. That might be ok for you! You may want a short answer, but I'm only able to give a complete answer here and be fully truthful.

By "infrastructure" I mean a blank virtual computer, per @Leaderboard 's comments. You have to build a database, web server, etc. and keep them safe.

I have another question - can you clarify whether we can run Wireguard though (for a VPN proxy) - rationale for this is given in T281600#7054120. I am aware that this is not allowed per the ToS, and hence need to check whether this can be exempted.

Otherwise I think we're OK to give this a try.

To clarify, right now multiple people have been writing news about events in the US -- citing various government websites. I am not sure if the problem exists if they are geoblocking the content, or my ISP is blocking, and I have been relying on PornHub's VPN for proxying from the US. I understand the current rules prohibit using WMCS to be used as a proxy, but given this circumstance, would it be okay to set up wireguard on the VPS? The use is minimal, and for the purpose of wiki only, to access otherwise geo-restricted info for the purpose of verification.

Re this comment, @Bstorm (T281600#7085513) thanks for clarifying. @Gryllida do you think you could find the dumps from the wikinewsie.org backup we got in July-August? If you do not know how to get the dump, maybe we can ask @Urbanecm to help you find it from the archive?

To clarify, right now multiple people have been writing news about events in the US -- citing various government websites. I am not sure if the problem exists if they are geoblocking the content, or my ISP is blocking, and I have been relying on PornHub's VPN for proxying from the US. I understand the current rules prohibit using WMCS to be used as a proxy, but given this circumstance, would it be okay to set up wireguard on the VPS? The use is minimal, and for the purpose of wiki only, to access otherwise geo-restricted info for the purpose of verification.

No. Using Cloud VPS as a proxy to access geo-blocked content is not an acceptable use.

+1 This is not a grey area. It is simply not an acceptable use.

<sigh> looks like PornHub's VPN is here to stay; in that case that can be considered as struck (so only two plans, rather than 3). <shrug>

Is there any other blocker we need to consider @Bstorm, or can we get this going, removing the proxy plan entirely?

Ok, I think we can create the project so you can create some VMs. What was the verdict on the floating IP? I'm a bit lost on that.

I thought there was approval for that? At least I did not see any opposition after the rationale was given.

Just wanted to make sure you still wanted/needed it.

To clarify, while we won't need it immediately (as the immediate aim would be to get the wiki going), in future, when/if we go for the mail server plan, it would be needed. That is, having the option would help, even if it is not used immediately.

@Leaderboard we can't have the mailserver there. So we can really drop that part of the request.

If you say so, in which case the floating IP will not be needed...

Please open a quota request later for the floating IP if you need it.