
Set up OpenRefine on Cloud VPS
Open, Normal, Public

Description

We would like to set up a public instance of OpenRefine. It would be useful for users who may not have the expertise or time to run OpenRefine locally, or for groups who want to use OpenRefine collaboratively.

We must be clear about the restrictions of this instance: OpenRefine has no access control mechanism as far as I’m aware, so this would be completely open to vandalism. We must advise users to regularly back up their projects. We should also set up automatic backups (similar to this), but since I don’t expect we’ll be able to provide an automatic restore mechanism, restoring from them should be a last resort that requires manual assistance from the instance administrators.

(For a minimal degree of vandalism protection, perhaps we could at least disable the project listing, so that you need to know the link to a project before you can vandalize it.)

Due to the memory requirements of OpenRefine, as well as the desire to set up automatic btrfs snapshots (see “automatic backups” link above), I think this should be done as a Cloud VPS project, not a Toolforge tool. (A formal project request will be filed as a subtask later.)

Who is “we”?

Event Timeline

Restricted Application added a subscriber: Aklapper. May 15 2018, 2:46 PM
Abbe98 added a subscriber: Abbe98. May 15 2018, 4:23 PM

Note to self: for this we would need to rethink Wikidata authentication in OpenRefine, migrating it to OAuth. This would include adding OAuth support in Wikidata-Toolkit. This has not been done yet because OAuth is not suited for open source software that is run directly by the user on their own machine.

Yeah, that’s going to be tricky… for a first version it might be easiest to completely disable Wikidata authentication, so that users have to use QuickStatements instead :/

To clarify – the problem is not that the server needs to do the edits (which should be possible, AFAIU, although usually the edits are done client-side), but that software running on localhost can’t provide a useful redirect URL to the OAuth registration?

Edit: We also need to restrict each OAuth access token and secret to one browser session, even though the API requests will actually be made by the server. (Right?)

When running software on localhost, the client needs to have OAuth consumer credentials, which are supposed to be private. If I apply for an OAuth consumer for OpenRefine, I cannot put the credentials in OpenRefine's source code, because it would allow anyone to reuse them for any other application. So every user would need to go through the OAuth registration themselves (and then OAuth login).

https://stackoverflow.com/questions/27585412/can-i-really-not-ship-open-source-with-client-id

For hosted versions of OpenRefine the problem disappears, but indeed we need to be more careful with tying OAuth tokens to sessions.
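One way to picture "tying OAuth tokens to sessions" on a hosted instance is a server-side store keyed by a random session ID, so that the access token and secret never leave the server and are never shared between browsers. The following is only a minimal sketch under that assumption: `SessionTokenStore`, `OAuthCredentials`, and the one-hour expiry are hypothetical names and values for illustration, not anything that exists in OpenRefine or Wikidata-Toolkit.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OAuthCredentials:
    """Access token and secret obtained for one user (hypothetical holder type)."""
    access_token: str
    access_secret: str
    created_at: float = field(default_factory=time.time)


class SessionTokenStore:
    """Keeps OAuth credentials on the server, keyed by browser session ID."""

    def __init__(self, max_age_seconds: float = 3600.0) -> None:
        # The expiry value is an assumption for this sketch.
        self._max_age = max_age_seconds
        self._by_session: Dict[str, OAuthCredentials] = {}

    def new_session(self) -> str:
        # Unguessable ID; in a web app this would be sent as a cookie.
        return secrets.token_urlsafe(32)

    def bind(self, session_id: str, access_token: str, access_secret: str) -> None:
        # Associate the credentials with exactly one session.
        self._by_session[session_id] = OAuthCredentials(access_token, access_secret)

    def lookup(self, session_id: str) -> Optional[OAuthCredentials]:
        # Return the credentials for this session, or None if unknown or expired.
        creds = self._by_session.get(session_id)
        if creds is None:
            return None
        if time.time() - creds.created_at > self._max_age:
            del self._by_session[session_id]
            return None
        return creds

    def end_session(self, session_id: str) -> None:
        # Forget the credentials when the browser session ends.
        self._by_session.pop(session_id, None)
```

With a store like this, the server would look up the credentials by the session ID of the incoming request before making an API call on the user's behalf, and a request from any other session would simply find nothing.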

Perhaps you could use an owner-only consumer for default installations? Those are tied to a single account and don’t need confirmation, so I think it might be possible to request them automatically (but I’m not sure if that’s a good idea).

Okay, I started setting up the server and OpenRefine is running. I haven’t set up any proxy yet, so for now you can only test it via an SSH tunnel:

ssh -L 3333:localhost:3333 openrefine01.eqiad.wmflabs
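While that command is running, a quick check from the local machine could look like the sketch below. It assumes that OpenRefine answers plain HTTP on the forwarded port 3333 at the root path; the function name `openrefine_reachable` is made up for this example. A bare TCP connect would not be enough here, because ssh accepts connections on the local end of the forward even when nothing is listening on the remote side, so the sketch makes an actual HTTP request.

```python
import http.client
import urllib.error
import urllib.request


def openrefine_reachable(base_url: str = "http://localhost:3333/",
                         timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on base_url succeeds (hypothetical helper)."""
    # Ignore any proxy settings from the environment: the target is the
    # local end of the SSH port forward, not a remote host.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or a connection
        # dropped by the tunnel before a response arrived.
        return False
```

Calling `openrefine_reachable()` with the defaults checks `http://localhost:3333/`; a `False` result means either the tunnel is down or nothing answered HTTP behind it.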

@Pintoch can you see if you’re able to access the server? Then we can figure out the next steps.

I’ve also started to document the project under wikitech:Nova Resource:Openrefine.

Vvjjkkii renamed this task from Set up OpenRefine on Cloud VPS to 4wcaaaaaaa.
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
1997kB renamed this task from 4wcaaaaaaa to Set up OpenRefine on Cloud VPS. Jul 1 2018, 2:30 AM
1997kB lowered the priority of this task from High to Normal.
1997kB updated the task description.
1997kB added a subscriber: Aklapper.
Addshore moved this task from incoming to monitoring on the Wikidata board. Sep 19 2018, 8:01 AM