CloudVPS instance for ProVe
Closed, Resolved · Public

Description

Feature summary (what you would like to be able to do and where): We're a group of researchers from King's College London, where we've developed ProVe (https://www.wikidata.org/wiki/Wikidata:ProVe), a tool for helping editors improve the references of Wikidata items.

We have a solid base of ~1k users and 17k API calls, and we'd like to have ProVe promoted as a Gadget. However, for this we need to move part of our infrastructure from the university's HPC cluster to Wikimedia infrastructure/CloudVPS, as we're told Gadgets shouldn't call external services.

A brief with the current architecture and what needs to be moved to CloudVPS is here.

We'd very much appreciate guidance on how to get started with a CloudVPS instance to get this part of the infrastructure moved. Thanks!

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution): ProVe is a tool for helping editors improve the references of Wikidata items. In general, Wikidata item statements should be verifiable and referenced by a source (e.g. a book, scientific paper, etc.). However, a large number of Wikidata item statements do not provide any reference and, when they do, the references are of varying quality, which tends to decay over time. ProVe helps with this by providing information about the quality of the references of Wikidata items, based on techniques such as large language models, triple verbalisation, and semantic similarity.

Benefits (why should this be implemented?): General improvement of Wikidata reference quality

Event Timeline

Restricted Application added a subscriber: Aklapper.
LucasWerkmeister renamed this task from CluodVPS instance for ProVe to CloudVPS instance for ProVe. Oct 27 2025, 11:45 AM
LucasWerkmeister updated the task description.
JJMC89 changed the subtype of this task from "Feature Request" to "Task".

we're told Gadgets shouldn't call external services

Can you elaborate on this? Is there a written policy you're following, or a particular team or engineer who's providing this rule? I'm sure it's correct but having more specifics will help us decide how best to direct you :)

Sure, thanks @Andrew. This comes from a few discussions in Tools/Potential Gadgets, specifically here and here. In further discussions with Lydia and others we've learned this is mainly to protect user privacy (i.e. to avoid usernames being written to external databases), but I don't think I've seen a written policy about it (perhaps there is one and I just don't know it).

Hello again!

We don't have any objection to you opening a VPS project for this. I do suggest that you get more approval from whoever it is that will or won't sign off on your gadget; it might be that adding a caching layer in cloud-vps will be considered adequate for privacy protection, or it might be looked on as 'laundering' an otherwise unapproved backend.

Also, be aware that our terms of use include this rule:

• Do not use WMCS as a network proxy.

It looks to me like your proposal is almost, but not quite, a network proxy :)

Hi Andrew,

Many thanks for your response. We are trying to find out whom to contact to move this forward and understand all that is involved. Do you know who we could contact to create a VPS project and get it approved? Also, who is responsible for signing off gadgets, etc?

I appreciate what you're saying about proxying and we absolutely want to adhere to all requirements. Our goal is to find a sustainable long-term solution that is compliant with all terms of use. That means not only understanding the requirements of provision but also the existing infrastructure to host the service, so we can make the project viable.

In the end, all we want is to provide a great service that supports the community in the best possible way.

Best,

Odinaldo

You are in the right place! Francesco or I can approve the project request; we just want to make sure it will actually get you what you need!

+1 to creating a project with enough quota to support one g4.cores8.ram16.disk20 instance (per the requirements doc).

who is responsible for signing off gadgets, etc?

The decision on whether this can be approved as a gadget sits with the Wikidata community. The fact that a part of the code runs in Wikimedia Cloud might or might not be enough to convince them.

Thank you Francesco (and Andrew). We will satisfy the requirements and ensure everything is transparent to allay any community concerns.

What would the next steps be please?

Who would be the admins of the project? (to add when creating)

@Odinaldo you'll need to create developer accounts (https://www.mediawiki.org/wiki/Developer_account), or if you have one already, you'll have to link it to your phabricator account (from the management link in the wiki page for the developer account).

Let me know when you have it or if you are having issues and I'll create the project

Dear @dcaro

I think we managed to create and link the developer accounts to the accounts 1. and 2. above. I hope we followed the right procedure. Please let me know if this is now OK. I really appreciate your support.

Thanks,

Odinaldo

fnegri changed the task status from Open to In Progress. Nov 25 2025, 5:47 PM
fnegri claimed this task.

@komla attempted to create this project, but the cookbook failed because of an unrelated issue: T410265: [tofu-infra] "tofu plan" failing in codfw.

To work around that issue, I will have to complete the project creation manually. I should be able to do this tomorrow.

Thanks for your patience.

Mentioned in SAL (#wikimedia-cloud-feed) [2025-11-26T10:23:49Z] <fnegri@cloudcumin1001> START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch (T408387)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-11-26T10:24:41Z] <fnegri@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch (T408387)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-11-26T10:32:48Z] <fnegri@cloudcumin1001> START - Cookbook wmcs.vps.create_project for project prove in eqiad1 (T408387)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-11-26T10:33:49Z] <fnegri@cloudcumin1001> END (PASS) - Cookbook wmcs.vps.create_project (exit_code=0) for project prove in eqiad1 (T408387)

Thank you all so much for your help on this!

@komla @fnegri could you also please add user Albertmeronyo / amp to the project? Thanks

Done!

You can also self-manage the list of users with access to the project by going to https://horizon.wikimedia.org/project/member/

Amazing, thank you so much! Much appreciated :)

Hi @komla and @fnegri, I was trying to access our instance through the Cloud VPS bastion, following https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances, and I keep getting the following error:

channel 0: open failed: connect failed: No address associated with hostname

The DNS name I'm trying to use is svc.prove.eqiad1.wikimedia.cloud, which I got from https://openstack-browser.toolforge.org/project/19a3d846f067417898228b4b030ca0a3

I'm using the primary bastion bastion.wmcloud.org to access it. Am I doing something wrong? Or is there a problem with the DNS?

@NathanGavenski I'm on the run today, but I glanced at the 'prove' project on Horizon and I don't see any VMs there, so it doesn't look like there's anywhere to ssh to. The domain you're seeing on openstack-browser is meant to be a container for future services (e.g. myinternalservice.svc.prove.eqiad1.wikimedia.cloud); it doesn't itself refer to any actual host or destination.

Hope that makes sense!
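
For later readers, here is a minimal sketch of how the ssh invocation through the bastion could be assembled once a VM exists. It assumes instance names follow the pattern `<instance>.<project>.eqiad1.wikimedia.cloud` (inferred from the service domain above), and the instance name `prove-api-01` and user `exampleuser` are hypothetical. The Help:Accessing_Cloud_VPS_instances page remains the authoritative reference.

```python
from __future__ import annotations


def bastion_ssh_command(
    user: str,
    instance: str,
    project: str = "prove",
    bastion: str = "bastion.wmcloud.org",
    region: str = "eqiad1",
) -> list[str]:
    """Build an ssh command that jumps through the bastion to a VM.

    The target must be the name of an actual VM; the bare
    svc.<project>.<region>.wikimedia.cloud domain is only a container
    for service names and does not resolve to a host.
    """
    # Assumed naming pattern for instance FQDNs (not confirmed in this thread).
    target = f"{instance}.{project}.{region}.wikimedia.cloud"
    # -J is OpenSSH's ProxyJump option.
    return ["ssh", "-J", f"{user}@{bastion}", f"{user}@{target}"]


if __name__ == "__main__":
    # Hypothetical user and instance name, for illustration only.
    print(" ".join(bastion_ssh_command("exampleuser", "prove-api-01")))
```

The point of the sketch is only that the hostname passed to ssh has to name a VM created in Horizon, not the `svc` domain itself.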

Thank you @Andrew and @fnegri.

It seems we have the project but no machines. How do we go about getting a virtual machine for this?

Thanks,

Odinaldo

You should be able to log in with your developer account credentials on https://horizon.wikimedia.org/project/ -- that is the web interface for managing things in your project.

Hi @Andrew and @fnegri.

We hope this finds you well. Thank you very much for all the assistance you have given us so far. We have made some progress and have a version of the server up and running. However, we still have some issues and we're not sure whom to turn to. Would you kindly be able to help us further, or point us to someone who could?

  1. As we mentioned, we hope to turn this service into a gadget that users can load into Wikidata's editing pages and that interacts with the server to check and verify references. The server is running in cloudVPS and accepting connections via the API (https://prove.wmcloud.org/apidocs). However, the user interface, which is a JavaScript file (https://www.wikidata.org/wiki/User:NathanGavenski/ProVe.js), cannot connect to the server. We think this has to do with Wikidata, because the API itself works. We get the following error message:
Connecting to 'https://prove.wmcloud.org/api/items/checkItemStatus?qid=Q42395533' violates the following Content Security Policy directive: "default-src 'self' data: blob: upload.wikimedia.org https://commons.wikimedia.org meta.wikimedia.org *.wikimedia.org *.wikipedia.org *.wikinews.org *.wiktionary.org *.wikibooks.org *.wikiversity.org *.wikisource.org wikisource.org *.wikiquote.org *.wikidata.org *.wikifunctions.org *.wikivoyage.org *.mediawiki.org wikimedia.org". Note that 'connect-src' was not explicitly set, so 'default-src' is used as a fallback. The policy is report-only, so the violation has been logged but no further action has been taken.

Apologies for our inexperience with this, but we have tried to get some help in #wikimedia-cloud on libera.chat, without success. What is the right way to connect to the server from Wikidata's JavaScript without violating the Content Security Policy directive?
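
To illustrate why the browser reports this request, here is a simplified sketch of how the host part of a CSP source list is matched. It is not a full CSP implementation: it ignores schemes, ports and paths, treats 'self' as a plain host comparison against `www.wikidata.org` (an assumption about the page origin), and the function names are made up for this example. The source list is copied from the error message above.

```python
from urllib.parse import urlsplit

# Host sources from the reported default-src directive
# ('self', data: and blob: are handled or ignored separately below).
HOST_SOURCES = [
    "upload.wikimedia.org", "https://commons.wikimedia.org",
    "meta.wikimedia.org", "*.wikimedia.org", "*.wikipedia.org",
    "*.wikinews.org", "*.wiktionary.org", "*.wikibooks.org",
    "*.wikiversity.org", "*.wikisource.org", "wikisource.org",
    "*.wikiquote.org", "*.wikidata.org", "*.wikifunctions.org",
    "*.wikivoyage.org", "*.mediawiki.org", "wikimedia.org",
]


def host_matches(source: str, host: str) -> bool:
    """Compare a host against one source expression (scheme ignored)."""
    pattern = source.split("://", 1)[-1].lower()
    host = host.lower()
    if pattern.startswith("*."):
        # "*.example.org" matches subdomains only, not "example.org" itself.
        return host.endswith(pattern[1:])
    return host == pattern


def allowed_by_default_src(url: str, self_host: str = "www.wikidata.org") -> bool:
    """Return True if the URL's host is covered by 'self' or the source list."""
    host = urlsplit(url).hostname or ""
    if host == self_host:  # simplified stand-in for 'self'
        return True
    return any(host_matches(source, host) for source in HOST_SOURCES)


if __name__ == "__main__":
    url = "https://prove.wmcloud.org/api/items/checkItemStatus?qid=Q42395533"
    print(allowed_by_default_src(url))
```

Under these simplifications the ProVe URL is not covered, because no entry in the list matches `wmcloud.org` hosts. As the message itself notes, the policy is report-only, so the violation is logged rather than blocked.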

Assuming that we can overcome the above issue, we have a second problem.

  2. We are testing the performance of the server and we are concerned about the lack of GPUs, which we need to speed up the inference process (not necessarily the checking of results). We see two possibilities for this and we would really appreciate your advice and expertise.
     A) Is it possible at all to launch a cloudVPS instance with a GPU? This would speed up the current server and would be a short-term fix that would allow us to conduct some targeted workshops with end users.
     B) Eventually, we would like to leave the server in cloudVPS to deal exclusively with user interactions, hence satisfying the requirements for a gadget, and compute the inferences externally, on a host with more computing power. To realise this, we need to be able to inject results from the inference server into the database in cloudVPS, and make requests to the external host from cloudVPS. We know this is technically possible, using a user's private key. Would this be allowed? Is there a more elegant and user-independent solution to do this? In particular, we are concerned that this type of connection allows only a fixed number of calls before the external server is blacklisted (cf. https://wikitech.wikimedia.org/wiki/Robot_policy; in fact, we have been temporarily blocked once).

We understand this is all very specific, but we appreciate any help trying to realise this service and making it available to all users. If there is a better way to communicate, please let us know and we will work around your times.

Many thanks,

Odinaldo

Hi @Odinaldo

we have tried to get some help in #wikimedia-cloud in libera.chat, without success

There were some replies in the channel that you might have missed because you were no longer connected to it. If you don't have a way to remain connected to IRC, you can also use the Telegram bridge for the same channel at https://t.me/wmcloudirc

[12:49:39] <NathanGavenski>	 Hey, I have a machine at VPS Cloud where I configured my external DNS to be able to use my API on a wikidata gadget I've been working on. I can access just fine using https://prove.wmcloud.org/apidocs/, but when I try to use it inside wikidata I get the error:
[12:49:40] <NathanGavenski>	 ```Connecting to 'https://prove.wmcloud.org/api/items/checkItemStatus?qid=Q42395533' violates the following Content Security Policy directive: "default-src 'self' data: blob: upload.wikimedia.org https://commons.wikimedia.org meta.wikimedia.org *.wikimedia.org *.wikipedia.org *.wikinews.org *.wiktionary.org *.wikibooks.org *.wikiversity.org
[12:49:40] <NathanGavenski>	 *.wikisource.org wikisource.org *.wikiquote.org *.wikidata.org *.wikifunctions.org *.wikivoyage.org *.mediawiki.org wikimedia.org". Note that 'connect-src' was not explicitly set, so 'default-src' is used as a fallback. The policy is report-only, so the violation has been logged but no further action has been taken.
[12:49:41] <NathanGavenski>	 Is there anything I can do on my end to prevent the CPS violation? Or is this a Wikidata directive?
[16:38:46] <wm-bb>	 <jeremy_b> I have no idea what you mean about DNS. CSP not CPS. what page is this happening on? (re @wmtelegram_bot: <NathanGavenski> Is there anything I can do on my end to prevent the CPS violation? Or is this a Wikidata directive?)
[16:45:02] <bd808>	 NathanGavenski: there are browser add-ons that can change the CSP protection locally for you. When you use a tool like that you are exposing your web application use to risks that the app authors wanted to protect you from (3rd party content interaction).

Is it possible at all to launch a cloudVPS instance with a GPU?

No, unfortunately it's not something that we currently offer in Cloud VPS. A few other people requested this and we might consider adding GPU support in the future, but it's not in our medium-term roadmap.

To realise this, we need to be able to inject results from the inference server into the database in cloudVPS, and make requests to the external host from cloudVPS. We know this is technically possible, using a user's private key. Would this be allowed? Is there a more elegant and user-independent solution to do this?

I would suggest creating a separate Phabricator task to discuss this; add the label "Cloud-VPS" and we'll try to loop in the right people to answer your questions.

If there is a better way to communicate, please let us know and we will work around your times.

IRC/Telegram are good ways, but don't expect an immediate answer: it might take a few hours for someone to reply. Phabricator is also a very good venue, but I would recommend creating more specific tasks with a short descriptive title that describes your problem. This task was about creating an instance and has been completed and marked as Resolved.

Thank you very much! I have just opened a new task with details and diagrams, and will communicate via other tasks, leaving this one closed.

Regards,

Odinaldo