
Special:Import should skip revisions with identical content to the current revision
Open, Needs Triage, Public

Description

Special:Import currently skips a revision when an identical revision already exists on the destination wiki. However, if a revision being imported has the same content as the current latest revision but a different author and a later timestamp, it is imported anyway. Such revisions are generally not useful, especially when they correspond to a log action such as (un)protection or undeletion, so they should be skipped by default. For widely used templates in particular, skipping them also avoids queuing potentially millions of useless update jobs.
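A minimal sketch of the difference between the two checks. It assumes a simplified, hypothetical `Revision` record (timestamp, author, text), not MediaWiki's actual classes or schema. The current check treats an incoming revision as a duplicate only if timestamp, author and content all match an existing revision. The proposed extra check also catches a revision whose content merely matches the page's latest revision.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    # Hypothetical, simplified revision record; not MediaWiki's schema.
    timestamp: str  # "YYYYMMDDHHMMSS"
    author: str
    text: str

    @property
    def sha1(self) -> str:
        # Hash of the content, used to compare revisions by content only.
        return hashlib.sha1(self.text.encode("utf-8")).hexdigest()


def is_exact_duplicate(incoming: Revision, existing: Sequence[Revision]) -> bool:
    """Current behaviour (simplified): skip only if the very same revision exists."""
    return any(
        r.timestamp == incoming.timestamp
        and r.author == incoming.author
        and r.sha1 == incoming.sha1
        for r in existing
    )


def matches_latest_content(incoming: Revision, latest: Optional[Revision]) -> bool:
    """Proposed extra check: same content as the current latest revision."""
    return latest is not None and incoming.sha1 == latest.sha1


# A protection change on the source wiki yields a later revision by another
# user with unchanged text: not an exact duplicate, but identical content.
existing = [Revision("20200101000000", "Alice", "{{foo}}")]
incoming = Revision("20200420000000", "Bob", "{{foo}}")
print(is_exact_duplicate(incoming, existing))          # False
print(matches_latest_content(incoming, existing[-1]))  # True
```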

See https://en.wikipedia.org/w/index.php?title=User_talk:MusikAnimal&oldid=952194241#Imports_of_templates/modules and https://en.wikipedia.org/wiki/Special:PermanentLink/951242782#Using_steward_access for a case on the English Wikipedia where this would have helped.

Event Timeline

Can you clarify the specific circumstances in which you would want a revision to be skipped on import? What would the person be trying to do: import only the current revision of a page from elsewhere, or potentially multiple revisions?

It seems that update jobs should not be queued when there is no change between current and new revisions; that would address one of the concerns.
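A minimal sketch of that suggestion, assuming a hypothetical `save_revision` hook and an in-memory deque standing in for the job queue. None of these names are real MediaWiki APIs, and the `"update"` job type is illustrative only. The idea is that saving a revision whose content hash equals the previous latest revision's hash enqueues nothing.

```python
import hashlib
from collections import deque
from typing import Deque, Dict, Optional


def sha1(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class Page:
    """Hypothetical page that tracks only the content hash of its latest revision."""

    def __init__(self, title: str, latest_text: Optional[str] = None) -> None:
        self.title = title
        self.latest_sha1 = sha1(latest_text) if latest_text is not None else None


def save_revision(page: Page, new_text: str, job_queue: Deque[Dict[str, str]]) -> bool:
    """Record a new latest revision; enqueue an update job only if content changed.

    Returns True if a job was enqueued.
    """
    new_sha1 = sha1(new_text)
    changed = new_sha1 != page.latest_sha1
    page.latest_sha1 = new_sha1
    if changed:
        # Illustrative stand-in for the jobs that update pages using this one.
        job_queue.append({"type": "update", "title": page.title})
    return changed


queue: Deque[Dict[str, str]] = deque()
page = Page("Template:Foo", "{{foo}}")
save_revision(page, "{{foo}}", queue)  # identical content: nothing queued
save_revision(page, "{{bar}}", queue)  # changed content: one job queued
print(len(queue))  # 1
```

This would not stop the empty revisions themselves from being created, only the jobs they trigger.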

The specific circumstance is when the imported revision is newer than the current local revision of the page, so it would replace that revision and trigger jobs. If the revision text is identical in that case, there is no point in importing the revision. This can happen if you imported a template or module at some point in the past, then on the source wiki an edit was made and undone, the page's protection was changed, or a similar action was performed that generates an empty revision or diff (which will have a later timestamp and often a different editor than the older imported revision), and you then reimport the current revision of the page. If the revision to be imported is older than the current local revision, it doesn't really matter whether it gets imported. If the export contains the full page history rather than only the current revision, it should be imported regardless (and/or an option to skip such revisions should be provided).
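The rule described above can be sketched as a single predicate. This is an assumption-laden illustration, not Special:Import's code: `Rev`, `should_skip`, and the `skip_identical_current` flag (standing in for a possible checkbox) are hypothetical names. Timestamps are assumed to be fixed-width `YYYYMMDDHHMMSS` strings, which compare correctly as plain strings. Full-history imports are always imported here, following the "imported regardless" reading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rev:
    # Hypothetical minimal revision: timestamp and content hash only.
    timestamp: str  # fixed-width "YYYYMMDDHHMMSS", so string order == time order
    sha1: str


def should_skip(
    incoming: Rev,
    local_latest: Optional[Rev],
    *,
    full_history: bool,
    skip_identical_current: bool = True,  # hypothetical checkbox
) -> bool:
    """Skip only a newer revision whose content equals the local latest revision."""
    if not skip_identical_current or local_latest is None:
        return False  # option off, or page does not exist locally yet
    if full_history:
        return False  # full-history exports are imported regardless in this sketch
    if incoming.timestamp <= local_latest.timestamp:
        return False  # would not replace the current revision; leave to existing rules
    return incoming.sha1 == local_latest.sha1


local = Rev("20200101000000", "abc")
print(should_skip(Rev("20200420000000", "abc"), local, full_history=False))  # True
print(should_skip(Rev("20200420000000", "def"), local, full_history=False))  # False
```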

> It seems that update jobs should not be queued when there is no change between current and new revisions; that would address one of the concerns.

That would help some, yes. I think the annoyance of adding empty revisions is great enough to look into this even without considering the job queue impact, though.

So to be very clear, this report is only about importing the current revision of a page when it is the same as the existing top revision on the local wiki. I am unsure whether skipping such revisions ought to be the default, but it could at least be offered as a checkbox.

Yes. Sorry if it seems like I talked around that instead of just saying it outright.