
Medium-trust mode for Special:Import using uploading user's credentials and current timestamp
Closed, ResolvedPublic

Description

Author: clamengh

Description:
Hello, I am an admin at lmo.wikipedia.org. I would like the 'Import pages' feature
to be enabled, in such a way that we could upload multiple pages. At present the
error message while opening http://lmo.wikipedia.org/wiki/special:import is:
'No transwiki import sources have been defined and direct history uploads are
disabled.'
We would like to import pages exported (by the 'special:export' feature) from:
http://ca.wikipedia.org/wiki/special:export
http://en.wikipedia.org/wiki/special:export
http://fr.wikipedia.org/wiki/special:export.
Many thanks.


Version: 1.11.x
Severity: enhancement

Details

Reference
bz8319

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:34 PM
bzimport set Reference to bz8319.
bzimport added a subscriber: Unknown Object (MLST).

Site updated. Thanks for your patience.

robchur wrote:

I don't understand why Ashar is changing the summaries back - purely enabling
Special:Import is totally useless; one has to configure the allowed import
sources as well.

clamengh wrote:

Hi all, many thanks.
Unfortunately this is not exactly what I had asked for: we would like to import XML
pages into Lombard Wikipedia, i.e. we would like to perform the reverse of the
special:export operation.
Summing up: we are preparing multiple pages (with titles, contributors, etc.) in XML
format (the same format in which they are generally exported) and we would like to upload
them in a single operation: someone from the wikitech list suggested doing so, rather
than using a robot.
I am raising the priority of this bug a bit, since a user recently uploaded
about 1000 pages manually.
I am afraid I have to reopen this bug: this does not mean that I do not appreciate
your work, of course. Many thanks.
A happy new year to you all!
Claudi

clamengh wrote:

I add an example:

  1. I go to http://ca.wikipedia.org/wiki/special:export
  2. I type A,B,C
  3. I click 'export'
  4. I get this XML file:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/
http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="ca">

<siteinfo> <sitename>Viquipèdia</sitename> <base>http://ca.wikipedia.org/wiki/Portada</base> <generator>MediaWiki 1.9alpha</generator> <case>first-letter</case>
<namespaces> <namespace key="-2">Media</namespace> <namespace key="-1">Especial</namespace> <namespace key="0" /> <namespace key="1">Discussió</namespace> <namespace key="2">Usuari</namespace> <namespace key="3">Usuari Discussió</namespace> <namespace key="4">Viquipèdia</namespace> <namespace key="5">Viquipèdia Discussió</namespace> <namespace key="6">Imatge</namespace> <namespace key="7">Imatge Discussió</namespace> <namespace key="8">MediaWiki</namespace> <namespace key="9">MediaWiki Discussió</namespace> <namespace key="10">Plantilla</namespace> <namespace key="11">Plantilla Discussió</namespace> <namespace key="12">Ajuda</namespace> <namespace key="13">Ajuda Discussió</namespace> <namespace key="14">Categoria</namespace> <namespace key="15">Categoria Discussió</namespace> <namespace key="100">Portal</namespace> <namespace key="101">Portal Discussió</namespace> <namespace key="102">Viquiprojecte</namespace> <namespace key="103">Viquiprojecte Discussió</namespace> </namespaces> </siteinfo>
<page> <title>A</title> <id>53</id>
<revision> <id>782992</id> <timestamp>2006-12-22T13:58:32Z</timestamp>
<contributor> <username>Thijs!bot</username> <id>3429</id> </contributor> <minor /> <comment>Robot afegeix: [[th:A]] esborra: [[sq:A]]</comment> <text xml:space="preserve">{{Alfabet llatí}} '''A''' és la primera lletra de

l'[[alfabet català]] i de la majoria dels llatins. Té el seu origen en l'[[alfabet
llatí]] el qual la pren de l'[[alfa]] de l'[[alfabet grec]] que la pren de l'[[alef]]
de l'[[alfabet fenici]], símbol que derivava del jerogrífic egipci d'un cap de bou.

===Evolució=== {| align="center" cellspacing="10" |- align="center" |

[[Imatge:EgyptianA-01.png|Cap de bou egipci]]<br />Jerogrífic egipci<br />cap de bou |
[[Imatge:Proto-semiticA-01.png|Cap de bou proto-semític]]<br />Proto-semític<br />cap
de bou |[[Imatge:PhoenicianA-01.png|Alef fenici]]<br />''Alef'' fenici |
[[Imatge:Alpha uc lc.svg|65px|Alfa grega]]<br />''alfa'' grega |[[Imatge:EtruscanA-
01.png|A etrusca]]<br />A estrusca |[[Imatge:RomanA-01.png|A romana]]<br />a romana

} ===Diversitat de grafismes=== {align="center" cellspacing="10"- align="center"
[[Image:BlackletterA-01.png|Blackletter A]]<br />[[Blackletter]] A

01.png|Uncial A]]<br />[[Uncial]] A |[[Image:Acap.png|Another Capital A]] |-
align="center" |[[Image:ModernRomanA-01.png|Modern Roman A]]<br />Modern Roman A |
[[Image:ModernItalicA-01.png|Modern Italic A]]<br />Modern Italic A |
[[Image:ModernScriptA-01.png|Modern Script A]]<br />Modern Script A |}

===Representacions alternatives=== {{Lletra |NATO=Alpha |Morse=·– |Character=A1

Braille=&#x2801; }} ===Símbols derivats=== *[[ª]] *[[À]] *[[Á]] *[[Â]] *[[Ã]] *[[Ä]]

*[[Å]] *[[Æ]] *[[&#258;]] *[[&#260;]] *[[∀]] ==Fonètica== En català es pronuncia com
a /a/ (la [[vocal oberta anterior no arrodonida]] de l'[[AFI]]) menys en els
dialectes orientals on en posició àtona es pronuncia com a ''vocal neutra'' o
[[schwa]] /ə/ (excepte a l'Alguer). Quan es troba en posició tònica i compleix les
regles d'accentuació s'escriu de les formes À, à.
[[Image:Latin_alphabet_Aa.png|right|A]] ==Significats de la A== *''[[Bioquímica]]'':
Abreviatura de l'[[Adenina]] i de l'[[Alanina]] o de la '''[[Vitamina A]]'''
*''[[Educació]]'': En els països anglosaxons representa la màxima qualificació,
equivalent a l'excel·lent *''[[Electrònica]]'': Abreviatura de l'[[Amper]].
*''[[Entreteniment]]'': *Representa l'as en la baralla de cartes francesa
*''[[Física]]'': Abreviatura habitual de l'àrea. *''[[Lingüística]]'': És un
[[prefix]] de privació o negació en la majoria de llengües amb influències
grecollatines (com per exemple a "anormal") *''[[Matemàtiques]]'' i ''[[lògica]]'':
La A invertida (∀) és el [[quantificador universal]]. *''[[Medicina]]'': Designa un
[[grup sanguini]]. *''[[Música]]'': Per als anglesos i els alemanys simbolitza la
nota musical [[la]]. En aquest sentit és molt poc utilitzat pels catalanoparlants. La
Cara A és la cara més important d'una cassette o un vinil, contenint usualment els
temes més populars *''[[Política]]'': Envoltada d'un cercle és el símbol de
l'anarquisme *''[[Preposició]]'': Introduint un complement de lloc, expressa
relacions que denoten la direcció del moviment. També indica lloc puntual en repòs:
M'he fet mal '''a''' la mà. M'he trobat un gos '''al''' carrer. S'usa davant de
topònims quan no van precedits per l'article determinat: Xavier viu '''a''' Alacant.
Tanmateix es prefereix '''en''' davant de locatius precedits per un indeterminat o
indefinit: He sentit això '''en''' una conferència. Podeu seure '''en''' aquella
cadira. *''[[Química]]'': Abreviatura habitual del [[nombre màssic]], el total de
neutrons i protons que conté un nucli atòmic. *''[[SI]]'': en minúscula símbol de
l'[[atto]]. *''[[Unitat de longitud]]'': En majúscula i amb un cercle al damunt (Å)
és el símbol de l'[[àngstrom]].És la lletra d'un tipus de paper internacional ([[din
A4]], [[din A5|A5]]) *''[[Vehicle]]s'': Indica procedència d'Àustria. A [[França]] es
col·loca als cotxes dels conductors novells, com un equivalent de la [[L]]
internacional {{commons|A}} [[Categoria:Alfabet llatí]] [[als:A]] [[ar:A]] [[arc:A]]
[[bs:A]] [[cs:A]] [[da:A]] [[de:A]] [[el:A]] [[en:A]] [[eo:A]] [[es:A]] [[et:A]]
[[eu:A]] [[fi:A]] [[fr:A (lettre)]] [[fur:A]] [[gd:A]] [[gl:A]] [[he:A]] [[hr:A]]
[[hu:A]] [[ia:A]] [[id:A]] [[ilo:A]] [[io:A]] [[is:A]] [[it:A]] [[ja:A]] [[ko:A]]
[[ku:A (herf)]] [[kw:A]] [[la:A]] [[lt:A]] [[nl:A (letter)]] [[nn:A]] [[no:A]]
[[nrm:A]] [[pl:A]] [[pt:A]] [[ro:A]] [[ru:A (латиница)]] [[scn:A]] [[simple:A]]
[[sk:A]] [[sl:A]] [[sr:A (латиничко)]] [[sv:A]] [[th:A]] [[tl:A]] [[tr:A]] [[uk:А
(літера)]] [[vi:A]] [[yo:A]] [[zh:A]] [[zh-yue:A]]</text>

</revision>
</page>
<page> <title>B</title> <id>389</id>
<revision> <id>767223</id> <timestamp>2006-12-13T20:54:38Z</timestamp>
<contributor> <username>Robbot</username> <id>57</id> </contributor> <minor /> <comment>Robot afegeix: [[eo:B]]</comment> <text xml:space="preserve">{{alfabet llatí}}La '''B''' és la segona lletra de

l'[[alfabet català]] i primera de les [[consonant|consonants]]. Ve de l'alfabet llatí
i evoluciona de la segona lletra de l'[[alfabet fenici]]: beth (casa).
[[Image:Latin_alphabet_Bb.png|right|B]] A l'antigor representava [[Barcelona]] i es
cosia en els draps cosits en aquesta ciutat. ==Fonètica== En català epresenta
l'[[oclusiva bilabial sonora]], representada com a /b/ en l'[[AFI]]. Si es troba
entre dues vocals, però, sona de manera menys explosiva del que és habitual /&beta;/.
En canvi a final de síl·laba i no seguit d'una consonant sonora s'ensordeix i es
pronuncia /p/ (àrab). El seu nom és ''be''. Abans d'una '''l''' i després d'una
síl·laba tònica es pronuncia doble (/bb/): moble, poble... ==Significats de la B==
*''[[Astrologia]]'': Representa el símbol dels bessons. *''[[Biologia]]'': Un dels
[[Grup Sanguini|grups sanguinis]] del sistema A0B. *''[[Cronologia]]'': En el
[[calendari]] és la segona lletra que representa el [[dilluns]].
*''[[Educació]]:''Usada com a sistema de qualificació escolar, als països anglosaxons
equival al notable i a l'[[educació primària]] espanyola s'identifica com una feina
que "està bé" *''[[Física]]'': En majúscula represente el [[Nombre Quàntic|nombre
bariònic]].És la lletra del color [[blau]] en el sistema de codificació RGB
*''[[Informàtica]]'': En minúscula, '''b''' és el símbol del [[bit]]. En
majúscula , '''B''' és el símbol del [[byte]]. *''[[Matemàtiques]]'': Els romans
utilitzaven aquest símbol amb el valor de tres-cents i amb una ratlla horitzontal al
damunt equivalia a tresmil. *''[[Música]]'': En la música antiga representava la nota
musical ''si''. Avui dia els alemanys l'utilitzen per representar el ''si bemoll''.
La cara B de les cassettes i vinils contenia versions de temes populars
*''[[Química]]'': Símbol del [[bor]]. També designa una vitamina *''[[Vehicle]]''s:
Identifica els ciutadans de [[Bèlgica]] {{commons|B}} [[Categoria:Alfabet llatí]]
[[als:B]] [[ar:B]] [[arc:B]] [[bs:B]] [[cs:B]] [[cy:B]] [[da:B]] [[de:B]] [[el:B]]
[[en:B]] [[eo:B]] [[es:B]] [[et:B]] [[eu:B]] [[fi:B]] [[fr:B (lettre)]] [[gd:B]]
[[gl:B]] [[he:B]] [[hr:B]] [[hu:B]] [[id:B]] [[ilo:B]] [[io:B]] [[it:B]] [[ja:B]]
[[ko:B]] [[kw:B]] [[la:B]] [[nl:B (letter)]] [[nn:B]] [[no:B]] [[pl:B]] [[pt:B]]
[[ro:B]] [[ru:B (латиница)]] [[simple:B]] [[sl:B]] [[sv:B]] [[tl:B]] [[tr:B]]
[[vi:B]] [[yo:B]] [[zh:B]] [[zh-yue:B]]</text>

</revision>
</page>
<page> <title>C</title> <id>549</id>
<revision> <id>769753</id> <timestamp>2006-12-15T19:48:56Z</timestamp>
<contributor> <username>TXiKiBoT</username> <id>3050</id> </contributor> <minor /> <comment>Robot modifica: [[el:C]]</comment> <text xml:space="preserve">{{vegeu|la lletra C|Llenguatge C}} {{alfabet llatí}}

La '''C''' és la tercera lletra de l'[[alfabet català]] provinent del [[alfabet
llatí|llatí]]. El seu orígen gràfic està en símbols [[fenici]]s per representar un
arma que es llençava. Al llatí es va arrodonir i es va diferenciar de la G.

==Fonètica== En català representa l'[[oclusiva velar sorda]] /k/ de l'[[AFI]] davant

de les [[vocal|vocals]] ''a'' (''casa''), ''o'' (''cosa''), ''u'' (''cullera''), i
les [[consonant|consonants]] ''r'' (''creu'') i ''l'' (''clau'') i en posició de
tancament de [[síl·laba]] (''acte'', ''sac''), i el fonema de la [[fricativa alveolar
sorda]] /s/ davant ''e'' (''cera''), ''i'' (''ciri''). Cal recalcar que si el
foenma /k/ va seguit d'una lletra ''[[u]]'' que representa el fonema /w/, aleshores
el primer d'aquests [[fonema|fonemes]] no es representa per la lletra ''c'' sinó per
la lletra [[q]] (per això ''cua'' va amb ''c'' i ''quadre'' s'escriu amb ''[[q]]'').
També forma part de dos [[dígraf|dígrafs]] fonètics: [[dígraf nc|nc]] i [[dígraf
ng|ng]]. En l'[[escriptura]] [[català|catalana]] prenormativa també formava part del
dígraf [[dígraf ch|ch]]. ==Significats de C== * ''[[Bioquímica]]'': en majúscula,
símbol de la [[citosina]] i de la [[cisteïna]][[Image:Latin_alphabet_Cc.png|right|C]]

* ''[[Economia]]'': Es posen dues C per designar compte corrent i abreuja els

cèntims.* ''[[Física]]'': en minúscula, una de les constants absolutes, es igual a la
[[velocitat de la llum]]. *''[[Educació]]'': Als països anglosaxons, representa
l'aprovat en les qualificacions escolars * ''[[Informàtica]]'': el [[llenguatge C]]
es un [[llenguatge de programació]] *''[[Lingüística]]'': En [[gramàtica]] designa
habitualment un complement sintàctic. En alguns manuscrits llatins és una abreviatura
de "cum" * ''[[Matemàtiques]]'': En el [[sistema de numeració]] romà, signe que el
número [[cent|100]]; amb una ratlla al damunt representa el ''cent mil''. A més és la
abreviatura de "centi-" en les unitats de mesura * ''[[Música]]'': en la notació
germànica, nota [[do]]. *En els països anglosaxons, és el símbol de l'aprovat en les
qualificacions escolars * ''[[Química]]'': en [[majúscula]], símbol del [[carboni]].
Designa també una [[vitamina]] molt present als cítrics *''[[Símbol]]s'': Amb un º al
davat designa els graus Celsius de [[temperatura]]. Si en canvi està dins del
[[cercle]], significa "copyright", és a dir, [[propietat intel·lectual]] registrada
{{commons|C}} [[Categoria:Alfabet llatí]] [[af:C]] [[als:C]] [[ar:C]] [[arc:C]]
[[bs:C]] [[co:C]] [[cs:C]] [[da:C]] [[de:C]] [[el:C]] [[en:C]] [[eo:C]] [[es:C]]
[[eu:C]] [[fi:C]] [[fr:C (lettre)]] [[gd:C]] [[gl:C]] [[he:C]] [[hr:C]] [[hu:C]]
[[id:C]] [[ilo:C]] [[io:C]] [[it:C]] [[ja:C]] [[ko:C]] [[kw:C]] [[la:C (littera)]]
[[nl:C (letter)]] [[nn:C]] [[no:C]] [[pl:C]] [[pt:C]] [[ro:C]] [[ru:C (латиница)]]
[[simple:C]] [[sk:C]] [[sl:C]] [[sv:C]] [[th:C]] [[tl:C]] [[tr:C]] [[vi:C]] [[yo:C]]
[[zh:C]]</text>

</revision>
</page>
</mediawiki>
  5. I translate Catalan into Lombard, getting a new file, say LMOimportABC.xml
  6. Then I would like to upload LMOimportABC.xml to Lombard Wikipedia, turning it into
     the three pages A, B, C. (Of course we are dealing with much more massive actions, say
     1000 pages with one upload.)
Thank you once more,
Claudi

I allowed the sources so it should work in Special:Import :)

What goes wrong when you try it?

robchur wrote:

The user wants to import from XML, but we don't allow that on Wikimedia wikis.

robchur wrote:

Er, that should have been "...import from XML via upload..."

Try it with Special:Import, not with Special:Upload!

ayg wrote:

Uploading such XML files is not allowed on Wikimedia wikis, because it allows
you to entirely forge authorship with no trace.

Ashar, different issue, never mind. :)

Occasionally we've set up the XML-file-upload import for wikis moving content
from an external site, but that's not really needed for the internal sites. The
transwiki source setup should already work effectively for those.

clamengh wrote:

Hi all, many thanks. I was wondering about authorship as well, indeed. However,
let us narrow the problem a bit: in about a month I will have about 3000
articles to upload, and I will have written them myself, so this case could
fall under the 'occasionally we've set up the XML-file-upload...' exception stated by
Brion. The restriction should probably hold that the formal author is
the user who uploaded the XML file via special:import (no <user> tags
allowed). The 'site' is de facto external, since once the articles have been
translated or transformed, they no longer belong to an internal site.
So I guess I should use a robot instead, shouldn't I? Or, otherwise, could someone
(e.g. Brion) upload the XML file for me, please? This would happen about three times
a year for the next two years. The same job still has to be dealt with when it is done by
users who are not admins, though. Many thanks. Cheers,
Claudi

ayg wrote:

Reopening for Brion to consider.

clamengh wrote:

Hi all, it looks like the 'pagefromfile' robot should work for uploading
multiple pages. However, I would like to ask another question, please: does
the mechanism Ashar set up allow importing more than one page in one operation
(in this case images)? If so (or if it can be adapted to do so), it would be
very useful, since we have a number of articles about Catalan municipalities still
without images; this is almost surely because those articles were among the first
ones created at Catalan Wikipedia, so they didn't upload the images to Commons
(as they currently do, as far as I know). So this is a sort of 'transitional
device' to solve a problem which will diminish in importance in the future, but
is still important enough.
Thank you.
Cheers,
Claudi

clamengh wrote:

Sorry, I forgot: could you please add en: to the sources anyway? Many thanks.
Cheers,
Claudi

river wrote:

someone seems to have added en to lmowiki import sources already...

clamengh wrote:

Thank you! Now it would be great if some way for administrators to import
articles could be configured.
I suggest something like this:
A START TAG

THE WIKI CODE OF THE ARTICLE

AN END TAG

Some kind of comment tag would also be welcome: for instance, it would allow
crediting the author if different from the administrator who uploaded the articles.
Thank you,
Claudi
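The START TAG / END TAG proposal above could be sketched roughly as follows. This is only an illustration, not existing MediaWiki code: the marker strings, the "first line is the title" rule and the function name split_pages are all assumptions made for the example. Every page is credited to the uploading user, as requested.

```python
import re

# Hypothetical markers: the proposal does not say how the tags are spelled.
START_TAG = "{{-start-}}"
END_TAG = "{{-stop-}}"


def split_pages(text, uploader):
    """Split a tagged text file into page dicts credited to the uploader."""
    pattern = re.compile(
        re.escape(START_TAG) + r"(.*?)" + re.escape(END_TAG), re.DOTALL
    )
    pages = []
    for match in pattern.finditer(text):
        body = match.group(1).strip()
        # Assumption: the first line of each block is the page title,
        # everything after it is the wiki code of the article.
        title, _, wikitext = body.partition("\n")
        pages.append(
            {"title": title.strip(), "text": wikitext.strip(), "author": uploader}
        )
    return pages
```

Because the file carries no author field at all, there is nothing to forge: authorship can only come from the account that performs the upload.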

robchur wrote:

We've already discussed this; it's not typically done for the reasons cited above.

clamengh wrote:

Yes, it is clear that the XML upload shouldn't be activated, for the reasons above. But
maybe the scheme START TAG +
THE WIKI CODE OF THE ARTICLE + END TAG, with AUTHOR = WHOEVER
UPLOADED THE PAGES, would be fine: indeed the same goal is achieved by the
pagefromfile Python robot, but it takes much more time. Thank you,
Claudi

robchur wrote:

Accepting an XML file which asserts authorship that can't be confirmed is not at
all the same as allowing users to upload via a robot, where authorship is
confirmed in the same manner as a normal edit.

clamengh wrote:

Yes, I do agree. Thus, if it is technically feasible, we could enable import
from a text file with only the <page> and </page> tags allowed.
All in all, I am the author of most of the articles to be imported, so I would import
them as the author, which is correct. Another user has written so many articles
in the same fashion that he/she could be made an admin and import pages.
The key question is: "is import-pages faster than Python?" If the answer is yes,
then we should go on discussing, since this feature looks very
important for a project in an endangered language such as Lombard; if the answer
is no, then, in fact, the Python "pagefromfile" robot should be used instead,
and we could close this bug.
Thank you, Claudi

river wrote:

changing summary, this feature can't be enabled until someone writes the code.

clamengh wrote:

Yes, that's correct. I am ready to help, but I am afraid I do not have enough know-how: anyway, please let me know if I can do anything. However, couldn't we adapt the code already operating at EN.WP? Would this adaptation be difficult? Thank you. Best,
Claudi

Changing summary to something that doesn't make me totally confused. :)

A medium-trust mode for Special:Import was planned but not yet implemented, using the uploader's account information and present timestamp, thus visibly marked as an import with no possibility of past-credential-forging.
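To make the idea concrete, here is a minimal sketch of what such a medium-trust rewrite might do to an uploaded export file before it is imported. It is not the MediaWiki implementation: the function name medium_trust_rewrite is hypothetical, and the sketch assumes the export-0.3 schema shown earlier in this task. Every revision's contributor is replaced by the uploader and every timestamp by the current time.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace of the export format used in the sample file above.
NS = "http://www.mediawiki.org/xml/export-0.3/"


def medium_trust_rewrite(xml_text, uploader, now=None):
    """Return the XML with every revision credited to `uploader` at `now`."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.register_namespace("", NS)  # keep output free of ns0: prefixes
    root = ET.fromstring(xml_text)
    for rev in root.iter(f"{{{NS}}}revision"):
        ts = rev.find(f"{{{NS}}}timestamp")
        if ts is None:
            ts = ET.SubElement(rev, f"{{{NS}}}timestamp")
        ts.text = stamp  # discard the claimed historical timestamp
        contributor = rev.find(f"{{{NS}}}contributor")
        if contributor is None:
            contributor = ET.SubElement(rev, f"{{{NS}}}contributor")
        contributor.clear()  # drop the claimed username and id
        ET.SubElement(contributor, f"{{{NS}}}username").text = uploader
    return ET.tostring(root, encoding="unicode")
```

Since nothing from the file's own contributor or timestamp fields survives, the result carries no forged history; it is equivalent to the uploader saving each page now.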

clamengh wrote:

Wonderful! So, please let me know if I can help in any way. Bye, Claudi

robchur wrote:

(In reply to comment #23)

A medium-trust mode for Special:Import was planned but not yet implemented,
using the uploader's account information and present timestamp, thus visibly
marked as an import with no possibility of past-credential-forging.

I've got an implementation of this hanging around, but I thought I'd better document the main concern with it, which is the "use of present timestamp(s)".

This implies that the imported revisions should all assume the current time, however discussion with Brion led to the conclusion that this caused an undesirable pile-up of revisions all with the current date/time. So some clever decision needs to be made about how to treat this.

I'll dig up the actual patch file and attach it to this bug.

clamengh wrote:

Hello, thank you!
Provided it's technically feasible, I propose the following:
During each import session

  1. The file with multiple uploads is acquired by the server and put in some kind of "buffer"
  2. The articles are created, say, at a time interval of 1 minute each one from the following.

Best regards,
Claudi

robchur wrote:

(In reply to comment #26)

Provided it's technically feasible, I propose the following:

It's not particularly, no.

  2. The articles are created, say, at a time interval of 1 minute each one from the following.

What happens if someone edits the page in between? If we accept the request, how do we notify the uploading user of the problem? This could theoretically lead to a denial of service vector, whereby somebody uploads a multiple-revision file which then effectively blocks editing of a page for a period of time due to pending revisions.

I don't like the idea of delaying anything at all; we should probably think of some clever way of merging multiple revisions together.
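One possible reading of "merging multiple revisions together" is to collapse each page's uploaded history into a single revision that is saved immediately. The sketch below is only an illustration under that assumption: the function name squash_revisions and the dict layout are hypothetical, and timestamps are assumed to be ISO 8601 UTC strings as in the export format, which sort correctly as plain text.

```python
def squash_revisions(revisions):
    """Collapse a page's revisions into one, keeping the latest text.

    `revisions` is a list of dicts with 'timestamp', 'author' and 'text'.
    The original authors are only mentioned in the edit comment; the
    revision itself would be saved under the uploader's account.
    """
    if not revisions:
        raise ValueError("no revisions to merge")
    ordered = sorted(revisions, key=lambda r: r["timestamp"])
    authors = []
    for rev in ordered:
        if rev["author"] not in authors:
            authors.append(rev["author"])
    return {
        "text": ordered[-1]["text"],
        "comment": "Imported; original authors: " + ", ".join(authors),
    }
```

With one revision per page there is no pile-up of identical timestamps within a page and nothing is left pending, so the page is never blocked for other editors.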

clamengh wrote:

Hello, thank you.
So far, I cannot imagine anything else. I'll go on thinking. Perhaps, if the trouble is about the pile-up of revisions all with the current date/time, we could provide the following:

  1. the first article should be dated at current date/time;
  2. the second at 1 minute before

...
n) the n-th at (n-1) minutes before

etc.
The main concern with this is that it wouldn't be, strictly speaking, true. Yet it would be "almost" true and the articles could be edited immediately by everybody. Also, "minutes" could be replaced by "seconds", and the predating could be made explicit automatically by an edit comment like "this article has been predated by X minutes/seconds for technical reasons."
Thank you.
Bye,
Claudi
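The predating scheme proposed above could be computed as in the sketch below. It is only an illustration of the arithmetic: the function name predated_timestamps and the wording of the comment are assumptions, and nothing here is part of Special:Import.

```python
from datetime import datetime, timedelta, timezone


def predated_timestamps(count, now=None, step=timedelta(minutes=1)):
    """Return (timestamp, comment) pairs: article n is dated (n-1) steps back."""
    now = now or datetime.now(timezone.utc)
    result = []
    for i in range(count):
        offset = i * step
        stamp = (now - offset).strftime("%Y-%m-%dT%H:%M:%SZ")
        if i == 0:
            comment = ""  # the first article keeps the real current time
        else:
            seconds = int(offset.total_seconds())
            comment = (
                f"This article has been predated by {seconds} seconds "
                "for technical reasons."
            )
        result.append((stamp, comment))
    return result
```

Passing step=timedelta(seconds=1) gives the "seconds" variant mentioned above; either way every page is available for editing as soon as the import finishes.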

importupload has existed for a while; open new bugs if you have issues with it.