
Import should allow mapping of namespace names and aliases
Open, Lowest · Public · Feature

Description

The import data and the importing wiki may differ in their namespace names and namespace aliases, and either may define more or fewer of them. Currently, if adjustments are needed, one has to know that before the import is started, and has to alter the import data to match the importing wiki's namespace names and aliases. This can be cumbersome and time-consuming. It is error-prone, and next to impossible for large automated imports.

We could add an option (checkbox) asking for a stepwise approach like this:

  1. Show a list of all namespace names in the import and in the local wiki, with an automatically generated mapping suggestion.
  2. Allow the importer to adjust the mapping.
  3. Do the final import.
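
As an illustration of the "automatically generated mapping suggestion" in step 1, here is a minimal Python sketch. It is not MediaWiki code: `normalize` and `suggest_mapping` are hypothetical helpers, and the matching rule (case-insensitive, underscores treated as spaces, local aliases accepted) is an assumption about what a reasonable suggestion could look like.

```python
def normalize(name):
    """Fold case and treat underscores as spaces before comparing names."""
    return name.replace("_", " ").strip().casefold()


def suggest_mapping(import_names, local_namespaces):
    """Suggest a local namespace for each namespace name found in the import.

    import_names: iterable of namespace names or aliases seen in the dump.
    local_namespaces: dict of local canonical name -> list of local aliases.
    Returns a dict: import name -> suggested local name, or None if no match.
    """
    lookup = {}
    for canonical, aliases in local_namespaces.items():
        for candidate in [canonical, *aliases]:
            # The first definition wins if two namespaces claim the same alias.
            lookup.setdefault(normalize(candidate), canonical)
    return {name: lookup.get(normalize(name)) for name in import_names}


# Hypothetical local configuration and names found in an import file.
local = {"Project": ["WP"], "Help": [], "Template": ["T"]}
suggestion = suggest_mapping(["project", "Hilfe", "WP", "Template"], local)
```

Names that come back as `None` (here `Hilfe`) are the ones the importer would have to map by hand in step 2.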

The downsides:
A) An uploaded file has to be preserved for some time, possibly across
multiple data submissions by the importer.
B) The import file has to be read twice. It has to be read and analyzed in its
entirety during the first scan, since the list of original namespaces
at the beginning of the file does not cover possible occurrences of
namespace aliases embedded in page data. Those need to be part of the
mapping, however.
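
A rough sketch of what the first scan in B) might do, written in Python for illustration only. It assumes the dump follows the usual MediaWiki XML export layout (`<namespaces>` in the header, `<title>` and `<text>` per page). `scan_dump` and `LINK_PREFIX` are hypothetical names, and the link regex only collects prefix candidates: interwiki prefixes and main-namespace titles containing a colon would also show up and need filtering.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Candidate "Prefix:" at the start of a wiki link such as [[WP:Policy]].
LINK_PREFIX = re.compile(r"\[\[\s*:?\s*([^\[\]|:#]+?)\s*:")


def local_tag(element):
    """Return the tag name without its '{namespace-uri}' prefix."""
    return element.tag.rsplit("}", 1)[-1]


def scan_dump(stream):
    """First pass: collect declared namespaces and prefixes used in pages."""
    declared = {}          # namespace key -> name from the dump header
    prefixes = Counter()   # prefix candidates seen in titles and links
    for _event, elem in ET.iterparse(stream, events=("end",)):
        tag = local_tag(elem)
        if tag == "namespace":
            declared[int(elem.get("key"))] = elem.text or ""
        elif tag == "title" and elem.text and ":" in elem.text:
            prefixes[elem.text.split(":", 1)[0]] += 1
        elif tag == "text" and elem.text:
            prefixes.update(LINK_PREFIX.findall(elem.text))
        elif tag == "page":
            elem.clear()   # drop page content once the whole page was seen
    return declared, prefixes


sample = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo><namespaces>
    <namespace key="0" case="first-letter" />
    <namespace key="4" case="first-letter">Wikipedia</namespace>
  </namespaces></siteinfo>
  <page><title>Wikipedia:About</title>
    <revision><text>See [[WP:Policy]] and [[Help:Contents|help]].</text></revision>
  </page>
</mediawiki>"""
declared, prefixes = scan_dump(io.BytesIO(sample.encode("utf-8")))
```

In the sample, `WP` and `Help` appear only inside page text, not in the header, which is the case B) describes.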

The good sides:

  • Most flexible.
  • Often used mappings can be preserved and automagically be recalled by the import process.

  • Step 1) could also show some statistics to the importer, allowing them, e.g., to skip implausible data.

This looks like a major revamping of the import code, however.


Version: unspecified
Severity: enhancement
URL: https://bugzilla.wikimedia.org/show_bug.cgi?id=30723#c6
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=62111

Details

Reference
bz41969

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 12:55 AM
bzimport set Reference to bz41969.
bzimport added a subscriber: Unknown Object (MLST).
TTO lowered the priority of this task from Low to Lowest. Sep 23 2015, 11:58 AM
TTO subscribed.

This would be a significant feature to add, not least because it would turn import into a two-step procedure. The dump would be uploaded to the server at the first stage, then MW would look at it and display a namespace choice form. Then, upon submission of that form, it would go back to the previously uploaded dump and actually do the import procedure. So among other things, it would require some kind of intermediate storage place for dumps.

What's more, while this feature would be nice in some situations, I don't think it would see a whole lot of use. The resolution of T32723 has helped with namespace matching.

So in summary, there is still some room for improvement here, so I won't close this task as declined. But this is very unlikely to be implemented unless we end up implementing the "intermediate storage place for dumps" as part of some other feature or bug fix.
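
To make the "intermediate storage place for dumps" concrete, here is a small Python sketch of the idea, not a proposal for the actual MediaWiki implementation. `DumpStash` and its methods are hypothetical. The first request stores the upload and hands back a token, and the second request (after the namespace form is submitted) reopens the same file by token.

```python
import os
import secrets
import tempfile


class DumpStash:
    """Keeps uploaded dumps on disk between the two import steps."""

    def __init__(self, directory=None):
        self.directory = directory or tempfile.mkdtemp(prefix="import-stash-")

    def _path(self, token):
        # Tokens come back from a web form, so reject anything path-like.
        if not token.isalnum():
            raise ValueError("invalid stash token")
        return os.path.join(self.directory, token + ".xml")

    def store(self, data):
        """Step 1: save the uploaded bytes and return a token for the form."""
        token = secrets.token_hex(16)
        with open(self._path(token), "wb") as handle:
            handle.write(data)
        return token

    def open(self, token):
        """Step 2: reopen the stored dump for the real import."""
        return open(self._path(token), "rb")

    def discard(self, token):
        """Remove the dump once the import finished or was abandoned."""
        os.remove(self._path(token))


stash = DumpStash()
token = stash.store(b"<mediawiki/>")
```

A real implementation would also need expiry of abandoned uploads and a check that the token belongs to the user who uploaded the file; both are left out here.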

The mapping could be input before the upload and import begins, but that has disadvantages:

  • users must know all original namespace names
  • users must type original namespace names and cannot be prompted
  • misspellings would have to abort the entire process after the initial dump lines are scanned

Advantages:

  • much more easily implemented
  • import is a single-pass process

Of course, uploaders could copy and paste the original namespace names from the lines at the beginning of dumps.
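
The single-pass variant could validate the typed mapping as soon as the header lines of the dump have been read, and abort before any page is imported. A minimal Python sketch under that assumption follows. `validate_mapping` is a hypothetical helper, and the "did you mean" hint via `difflib` is an addition meant to soften the misspelling problem, not something proposed in this task.

```python
import difflib


def validate_mapping(user_mapping, dump_namespaces):
    """Check a pre-supplied mapping against the names in the dump header.

    user_mapping: dict of original namespace name -> local namespace name.
    dump_namespaces: names read from the namespace list at the dump's start.
    Raises ValueError naming every unknown original name, so the import
    can abort before any page is written.
    """
    known = {name.casefold(): name for name in dump_namespaces}
    problems = []
    for original in user_mapping:
        key = original.casefold()
        if key not in known:
            hints = difflib.get_close_matches(key, list(known), n=1)
            hint = f" (did you mean {known[hints[0]]!r}?)" if hints else ""
            problems.append(f"unknown namespace {original!r}{hint}")
    if problems:
        raise ValueError("; ".join(problems))
    # Return the mapping keyed by the spelling used in the dump itself.
    return {known[k.casefold()]: v for k, v in user_mapping.items()}


checked = validate_mapping({"wikipedia": "Project"}, ["Wikipedia", "Help"])
```

Calling it with a misspelled key such as `"Wikipeda"` raises `ValueError`, which corresponds to the "abort the entire process" case listed above.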

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:14 AM
Aklapper removed a subscriber: Purodha.