
[M] Investigate: Do we need to allow IP actor creation for imports?
Closed, Resolved (Public)

Description

Background

For temporary accounts, we'd like to disable the creation of actors with IP addresses as names: T345578: Ensure that an IP address cannot be saved permanently if IP Masking is enabled

However, there may be a legitimate case for creating an IP actor: when importing a page that has revisions by an IP actor (from a time before temporary accounts were enabled).

Example

Here's an example of a page that was initially imported from another wiki, preserving the revision history:

https://de.wikipedia.org/w/index.php?title=Sema_%C5%9Eim%C5%9Fek&action=history

The older revisions are assigned to the original authors. Some are foreign users, and others are (local) IP actors, which would need creating if they didn't already exist.

What we want to know
  • What ways are there to import from other wikis?
  • Which of these create IP actors?
  • Do they need to create IP actors?
    • If not, what can they do instead?
    • If so, how can we allow them to create IP actors even though it's generally disallowed?

Event Timeline

What ways are there to import from other wikis?

MediaWiki core:

Other:

Sources:

Which of these create IP actors?

TL;DR: all that are able to import revisions from an IP actor.

In general

  • When a revision is inserted, if its author doesn't exist as an actor, a new actor is created
  • An IP actor is therefore created whenever an imported IP edit is assigned to the bare IP address, as a normal IP edit would be, and that IP address doesn't already have an actor
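
The two points above can be illustrated with a minimal sketch. This is a toy in-memory model in Python, not MediaWiki code: `ToyActorStore`, `acquire_actor_id` and `insert_revision` are hypothetical names chosen for the example.

```python
class ToyActorStore:
    """Toy in-memory stand-in for an actor table (hypothetical, not MediaWiki's class)."""

    def __init__(self):
        self._ids_by_name = {}

    def acquire_actor_id(self, name):
        # Reuse the existing actor if there is one; otherwise create it.
        if name not in self._ids_by_name:
            self._ids_by_name[name] = len(self._ids_by_name) + 1
        return self._ids_by_name[name]


def insert_revision(store, author_name, text):
    # Inserting a revision acquires (and, if needed, creates) the author's actor.
    # Nothing here distinguishes an IP address from a username.
    return {"actor_id": store.acquire_actor_id(author_name), "text": text}


store = ToyActorStore()
first = insert_revision(store, "1.2.3.4", "imported revision 1")
second = insert_revision(store, "1.2.3.4", "imported revision 2")
```

In this model the first insert creates the actor for 1.2.3.4 and the second reuses it, which is the behaviour described for imported IP edits.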

Specifically

  • SpecialImport and ApiImport create IP actors via WikiImporter
    • WikiImporter::processRevision (and processLogItem, processUpload) add an external wiki prefix to a username, but not to an IP
  • importDump.php and importTextFiles.php create IP actors via ImportableOldRevisionImporter
    • ImportableOldRevisionImporter::import sets the revision's user from the ImportableOldRevision's user text, which presumably could be an IP
  • importImages.php seems to fail for images that don't have a user account
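
As a rough illustration of the WikiImporter behaviour noted above (usernames get an external wiki prefix, IPs do not), here is a simplified Python sketch. `imported_author_name` and `is_ip` are hypothetical helpers, the 'en>Foo' form follows the example used elsewhere in this task, and the sketch ignores the "Assign edits to local users where the named user exists locally" option.

```python
import ipaddress


def is_ip(name):
    """Return True if name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def imported_author_name(name, source_prefix):
    """Return the name an imported revision would be assigned to (simplified)."""
    if is_ip(name):
        # IPs are left unprefixed, so the edit lands on a local IP actor.
        return name
    # Usernames are marked as external, e.g. 'en>Foo'.
    return f"{source_prefix}>{name}"
```

With this model, `imported_author_name("Foo", "en")` gives 'en>Foo', while `imported_author_name("1.2.3.4", "en")` stays '1.2.3.4'.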

Do they need to create IP actors?

Edited and moved to T350155#9320733

Change 973219 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/core@master] Don't assign imported revisions to local IP actors

https://gerrit.wikimedia.org/r/973219

Change 973222 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/core@master] DNM Demonstrate adding flag to skip IP validation in ActorStore

https://gerrit.wikimedia.org/r/973222

Do they need to create IP actors?

Here are some alternatives and notes on each one:

Assign foreign IP edits to a "foreign" IP address, e.g. 'en>1.2.3.4' (https://gerrit.wikimedia.org/r/973219) - my favourite

  • Doesn't introduce weird new concepts to the codebase: we have infrastructure for creating external users with un-creatable usernames
  • Would mean that, instead of forcing IP revisions to be assigned to a local actor, we now force them to be assigned to a foreign actor (either way, they could never be controlled by Special:Import's "Assign edits to local users where the named user exists locally" option)
  • A foreign IP user doesn't really make sense in the way that a foreign user does: two different people could have registered the same username on different wikis, so it makes sense to disambiguate between local user 'Foo' and foreign user 'en>Foo'. However, an IP address is not supposed to represent a single person - just an IP address, which is the same thing from wiki to wiki.
  • However, IP users have local Special:Contributions pages, and doing this would mean that contributions are only recorded on the original wiki. Currently, importing revisions by an IP user creates a local history for them, which isn't easy to trace back to the original wiki

Assign foreign IP edits to some anonymous system user

  • Presumably we import history for a reason.
  • Is the main reason to credit the original authors?
  • Do IPs need to be credited - legally perhaps?
  • Do we need the IP authors for patrolling work, or is it enough to patrol on the original wiki?

Allow IP actors to be created by import scripts when temporary accounts are enabled (https://gerrit.wikimedia.org/r/973222)

  • We'd be introducing the idea that actor name validation depends on the caller
  • I'm not sure if it's even desirable that imported revisions are assigned to local IP actors (see above)

Disabling temporary accounts for the duration of an import

  • This seems reasonable for the maintenance scripts, so I haven't tried to fix them (also they seem more concerned with restoring a backup)
  • This isn't a workable option for SpecialImport and ApiImport, which could be used at any time

Assign foreign IP edits to some anonymous system user

  • Presumably we import history for a reason.
  • Is the main reason to credit the original authors?
  • Do IPs need to be credited - legally perhaps?
  • Do we need the IP authors for patrolling work, or is it enough to patrol on the original wiki?

IANAL, but I'd say this option is a non-starter. Imports are done for attribution, and attribution is required by the CC BY-SA licenses that WMF-run wikis use.

Patrolling of imported edits isn't done in the traditional sense unless something like copyright issues are found. It would be more of a review of the imported content (last revision).

File imports are done frequently to move images from wikis to Commons. This is done when the image is released under a compatible license and so can be added to Commons instead of left on the local wiki for a specific 'fair use' use.

This is done via the FileImporter extension on WMF wikis (https://www.mediawiki.org/wiki/Extension:FileImporter). It also moves the revisions of the file page, which can include edits to the image description made by temporary accounts or IP addresses.

Change 973359 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/extensions/FileImporter@master] Don't assign imported revisions to local IP actors

https://gerrit.wikimedia.org/r/973359

@Dreamy_Jazz Thanks - I've suggested an update to FileImporter (https://gerrit.wikimedia.org/r/973359) along the lines of:

Assign foreign IP edits to a "foreign" IP address, e.g. 'en>1.2.3.4' (https://gerrit.wikimedia.org/r/973219)

After some further discussions, here's what we'll do:

Allow IP actors to be created by import scripts when temporary accounts are enabled (https://gerrit.wikimedia.org/r/973222)

With some improvements to that patch:

  • Rather than add importing special-casing everywhere, add import-aware ActorStore and RevisionStore classes
  • Allow IPs to be affected by the "Assign edits to local users where the named user exists locally" flag (this is a feature enhancement, so maybe do separately)
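
To make the first improvement concrete, here is a minimal Python sketch of the idea, assuming a toy store whose only validation rule is "no new IP actors while temporary accounts are enabled". The class and method names (`ToyActorStore`, `ToyImportActorStore`, `_validate_new_name`) are hypothetical and are not the API of the actual patch; the point is only that the special case lives in one import-aware subclass rather than in a flag passed by every caller.

```python
import ipaddress


def is_ip(name):
    """Return True if name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


class ToyActorStore:
    """Toy store: refuses to create IP actors when temporary accounts are enabled."""

    def __init__(self, temp_accounts_enabled):
        self.temp_accounts_enabled = temp_accounts_enabled
        self._ids_by_name = {}

    def _validate_new_name(self, name):
        if self.temp_accounts_enabled and is_ip(name):
            raise ValueError(f"cannot create IP actor {name!r}")

    def acquire_actor_id(self, name):
        # Validation only applies when a new actor has to be created.
        if name not in self._ids_by_name:
            self._validate_new_name(name)
            self._ids_by_name[name] = len(self._ids_by_name) + 1
        return self._ids_by_name[name]


class ToyImportActorStore(ToyActorStore):
    """Import-aware variant: allows IP actors, otherwise validates as usual."""

    def _validate_new_name(self, name):
        if is_ip(name):
            return
        super()._validate_new_name(name)
```

Import code would be handed a `ToyImportActorStore`, while everything else keeps using the stricter base class.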

Change 979991 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/core@master] WIP Draft: Introduce import-aware ActorStore and RevisionStore

https://gerrit.wikimedia.org/r/979991

Change 973219 abandoned by Tchanders:

[mediawiki/core@master] Don't assign imported revisions to local IP actors

Reason:

https://gerrit.wikimedia.org/r/973219

Change 973359 abandoned by Tchanders:

[mediawiki/extensions/FileImporter@master] Don't assign imported revisions to local IP actors

Reason:

https://gerrit.wikimedia.org/r/973359

Change 973222 abandoned by Tchanders:

[mediawiki/core@master] DNM Demonstrate adding flag to skip IP validation in ActorStore

Reason:

https://gerrit.wikimedia.org/r/973222

After some further discussions, here's what we'll do:

Allow IP actors to be created by import scripts when temporary accounts are enabled (https://gerrit.wikimedia.org/r/973222)

With some improvements to that patch:

  • Rather than add importing special-casing everywhere, add import-aware ActorStore and RevisionStore classes
  • Allow IPs to be affected by the "Assign edits to local users where the named user exists locally" flag (this is a feature enhancement, so maybe do separately)

Now that we know what we're doing, I'll close this investigation and tag the patch https://gerrit.wikimedia.org/r/979991 against a task for making the changes: T354207: Allow IP actors to be created for imports when temporary accounts are enabled