User Details
- User Since: Dec 26 2014, 1:37 PM (524 w, 6 d)
- Availability: Available
- LDAP User: Uzume
- MediaWiki User: Unknown
Tue, Dec 31
It has actually been happening for a rather long time, but the frequency has certainly increased as Commons has grown. The underlying issue is a Commons issue and not a PRP one, so to actually fix it this should get retagged/reassigned there. There are numerous tickets mentioning things like PDF/DjVu files having a 0x0 size that are essentially the same thing; this seems to just be the latest duplicate of such.
Mon, Dec 30
@Jan.Kamenicek Yes, but what exactly do you want PRP to do about that? About the only thing it might be able to do would be to improve the error messaging under those circumstances. PRP has no control over how media are hosted, remotely on Commons or otherwise. The problem description does define such things, but it does not really specify what you want to happen. Also, this issue is currently tagged/attached to PRP, so presumably you are looking for a solution in that space, but short of potentially improving some error messaging there is really nothing PRP can do about the situation.
This is a fairly common issue with media from Commons and is not really a PRP issue itself. PRP requires metadata about the media to be available and correctly yields errors when it cannot get such. The typical workaround is to issue a purge on the media on Commons, forcing an update there and on all sites that use Commons to remotely host media.
Dec 8 2024
Dec 4 2024
I do not see an issue when I browse to the URL listed in the description of this issue. It should be noted that the page does not currently exist, but the editor and the image on the right appear just fine if one clicks the "Utwórz" ("Create") link in the toolbar (for me in the upper right). One can also see the appropriate image by selecting the "Grafika" ("Image") link from the toolbar (for me upper left). I personally have my UI set to English in global preferences, but I provided the Polish link labels (uselang=pl) since that is likely the default UI language for most users of that site.
Nov 14 2024
I do not think we should be supporting Internet Archive collections per se; however, based upon the IA identifier from the provided description, this is really about multiple scanned objects being available at a single IA identifier.
Nov 13 2024
Since this is related to the integration of the text layer using _djvu.xml, and seems to happen when there is a mismatch in the number of pages, this is likely related to T194861 (and the numerous other tickets merged/closed as duplicates of that).
There haven't been any new uploads in over 30 days so you won't find anything in commons:Special:RecentChanges (e.g., the recent-uploads link at the top of the tool page).
Nov 6 2024
T16711#193346 states the ability to filter logs by namespace via the API was removed but T16712#193524 states it was added back in https://gerrit.wikimedia.org/r/135283. Apparently it was merged smoothly and a brief perusal of the code seems to imply the code is still there.
Oct 29 2024
As long as we primarily depend on WMF local copies and remote IIIF images are only used as an option, it sounds like a good idea. Having situations where we depend on WMF remote data (i.e., images, etc.) without any local fallback seems problematic.
Oct 28 2024
@Samwilson: I am closing this as resolved based upon your aforementioned PR being merged on 2024-07-16:
Oct 15 2024
Aug 12 2024
But those are different files scanned from different sources. পথের পাঁচালী.djvu is not even originally from Internet Archive.
Aug 11 2024
I agree that in general there is little advantage to creating DjVus from PDFs but sometimes people prefer such formats. PDF technology has now subsumed most of the advantages DjVu previously had. Unfortunately this now means PDF is a very large and complex set of specifications and it is hard to know how any single PDF is constructed without analysis by digital tools.
I do not believe this is an IA Upload issue, as it is not specific to IA Upload nor to DjVu; it happens with PDFs too. This is a common issue with Commons in general. The workaround is to purge the file on Commons (and sometimes do a null edit too) to reset its media metadata. Sometimes the same has to be done on a local wiki that uses Commons as well (e.g., on a Wikisource site, etc.).
I too am looking forward to scandata.xml addToAccessFormats page filtering. That would get rid of the irritating color card and white card pages often included at the end of many scans (but I have seen them in the middle of book scans too).
Jul 25 2024
I am also interested in this behavior and I also do not want to see the return of magic links (in fact, I think it would be better if magic links were removed from MediaWiki and stuffed into an extension instead of just being disabled by default). That said, I would rather see ISSN links, as well as the Special:BookSources ISBN link functionality, moved into an extension rather than being part of the MW core. Perhaps an extension that allows generic identifier links via "template" pages would be best. Then it could also subsume things like https://isin.toolforge.org/ for ISIN, which currently has/uses German and English "templates" at Benutzer:Magnus Manske/ISIN and User:Magnus Manske/ISIN respectively, etc. The list of allowed "template" pages could be kept in a MediaWiki: namespace page so random people could not introduce link bait, and I also suggest using a prefix like Special:XYZZY/idname/id and Project:XYZZY/idname, where XYZZY is the extension name and idname is ISBN, ISSN, etc. MediaWiki:XYZZY-allowed-ids or such would determine which idnames would be mapped. Anyway, just some thoughts on the subject.
Jul 20 2024
FYI: Here is another example: One of a Thousand/Preface. I suppose I could try to alter the order of the pages in the media (DjVu), except that, based upon research, I believe this was a printing error in the actual book; that is why I hesitate to change it. As a workaround, I am transcluding individual pages out of order without using the <pages/> tag (i.e., via Template:Page). I tried using individual pages out of order with multiple <pages/> tags, but then the text across pages was broken (some sort of vertical spacing separation).
May 15 2024
FYI: In terms of the error reporting bug from this issue, the following seems to be applicable:
I agree that the lack of useful error reporting when this happens is certainly a bug, but Proofread Page (PRP) is specifically for media-backed transcriptions. As far as I know it has never supported transcription without being backed by some File media, so in that way I would argue this is not a bug. Most Wikisource sites do support other forms of transcription (and even translation, etc.) not involving PRP, in addition to PRP-based ones.
May 14 2024
May 13 2024
I agree—this is expected behavior. Moving forward there are a few options:
Apr 27 2024
@PerfektesChaos: I am not sure I am really looking to violate any explicit limitations, but avoiding tripping them is good. It is not a bad idea to use mw.loadData() to wrap the pageLang as a workaround to prevent increased "expensiveness", but mw.loadData() validates the return value and does not allow functions and metatables, such as those on the mw.title objects returned by mw.title.getCurrentTitle().
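For illustration, a minimal sketch of such a data module (the module name is hypothetical, and it assumes the pageLang title property discussed here exposes a language object):

```lua
-- Hypothetical Module:PageContentLang, meant to be read with mw.loadData().
-- mw.loadData() runs this chunk at most once per page render and shares the
-- read-only result across every #invoke on the page, so the "expensive"
-- lookup is paid only once.  Only plain tables of simple values pass
-- mw.loadData()'s validation, so the mw.title object itself (which carries
-- functions and a metatable) cannot be returned; copy out the scalar needed.
local title = mw.title.getCurrentTitle()

return {
	code = title.pageLang:getCode(),  -- just the language code string
}
```

A consumer would then read it with mw.loadData( 'Module:PageContentLang' ).code.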
Apr 26 2024
This is an interesting problem; however, this thread is getting into TL;DR territory (I need to come back to this and read everything). Referring to the problem statements in the issue's description:
- issue no. 2 can be mitigated by using a single template in a fashion like {{table|start}} ... {{table|end}}
I am glad to see this long overdue implementation finally arrive, however, it was made "expensive" regardless of whether other related title information was already fetched about the same target page during the parser rendering of the current page. This is substantially different from the PAGELANGUAGE variable which does not appear to be "expensive" (I only looked at the documentation and not the code so far).
Apr 21 2024
@Xover: Thanks for pointing that out. For some reason I missed that reference.
Apr 17 2024
<pagelist/> is just a fancy way of generating links to the pages (which you can actually do by hand if you want; this is still sometimes necessary when one has a group of media like a collection of JPEG images, etc.).
Apr 16 2024
If the issue goes away with a purge, there isn't really an issue with the page anyway; the issue is with a stale page cache. If you manage to update a page with such an error in order to add processing to catch and categorize the error, then the error would already have gone away, because you necessarily had to update the cached page by editing the page either directly or through one of its transclusions.
Apr 6 2024
I assume this also breaks (via the quoted code) when a <pages/> tag is included in an Index page indirectly (e.g., Xyzzy/ToC has <pages/> and an Index transcludes it via something like {{Xyzzy/ToC}}). Incidentally, is this an across-the-board restriction or just one to prevent circular transclusion? Meaning, can I use a <pages/> tag in an Index so long as it refers to a different Index (e.g., a ToC in one volume of a multivolume work, where volumes not containing the ToC could still include the ToC in their Index pages using a <pages/> tag because it is not a circular reference)?
Mar 26 2024
I am not really against removal of "slave" terms, as there are usually plenty of other more precise words that can be used that are unrelated to human slavery; however, I am against unnecessary removal of "master" terminology, as it was only applied to slave owners considerably after it already had many other usages and meanings (e.g., mastering a skill, the origin of "Mr.", etc.). I have no issues with "master" branches and think it is silly and not useful to seriously consider trying to remove such references.
Feb 22 2024
It seems to me a superior solution would just be to use the existing wikitext redirects (necessitating a change in the content model upon rename/moves) and have Scribunto fetch the targets of such things before it #invokes, requires, etc.
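A rough user-space approximation of the idea, assuming the rename leaves behind an ordinary wikitext redirect (the module names below are hypothetical); the actual suggestion is for Scribunto itself to do this resolution before it #invokes or requires anything:

```lua
-- Follow a wikitext redirect chain via mw.title before handing the name to
-- require(), so a module renamed/moved elsewhere still resolves.
local function resolveModule( name )
	local title = mw.title.new( name )
	while title and title.isRedirect do
		title = title.redirectTarget
	end
	return title and title.prefixedText or name
end

local target = require( resolveModule( 'Module:Old name' ) )
```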
Mar 17 2023
Splitting the content model into sequential and non-sequential content models is an interesting concept but I am not sure that is really that useful or necessary.
It seems to me the issue is Parsoid introducing parallel processing across all the individual parts of parsing during a page render. This way it can memoize the results of these individual parts and potentially run them out-of-order. In order for it to accomplish such, Parsoid has to know all of the inputs to any part of the parse and it assumes that anytime the inputs are the same the output will be the same.
Mar 1 2023
I am not sure how this relates to ResourceLoader, but this seems pertinent: T313514: Enable Wikistories for Desktop users.
Is this related to ResourceLoader and T329891: Remove mobile-only modules in Wikistories ?
Feb 28 2023
I would also like to see Special:BookSources get moved out of core to an Extension:ISBN that provides a Special:ISBN (with a Special:BookSources alias) for things like T148274: Implement a convenient way to link to ISBNs without magic links.
I do not really see the value here. First, I would like to see Special:BookSources moved out of core (e.g., into an extension) not unlike how magic links are likely to be handled anyway (see T252054). And what is wrong with links like [[Special:BookSources/{{{ISBN}}}]]? If you really want something shorter why not just make Special:ISBN be an alias for Special:BookSources (I believe several Special pages have aliases as well as language localization names)? Then ISBN {{{ISBN}}} magic links can just be changed to [[Special:ISBN/{{{ISBN}}}]] style links (e.g., via templates), etc.
Feb 24 2023
While I am for adding such revision tags, I am against migrating to and depending on the value there (which is good for watching, etc.).
One possible method towards this could be to use Wikibase in much the same way as Structured Data on Commons was deployed. We could develop something akin to Extension:WikibaseMediaInfo (or perhaps more like Extension:WikibaseLexeme since I am not sure we would need or want names, descriptions and aliases for these new objects) and for querying (per T172408) we could leverage things like WikibaseCirrusSearch, e.g., haswbstatement, wbstatementquantity, etc. or whatever else they are using.
@Tpt I also prefer the MCR route. I think that might allow better handling of the migration cost too.
Feb 22 2023
I find it strange that NAMESPACENUMBER was added to work on full pagenames but nothing was added to do the same with just namespace names (the corollary to ns: and nse:). It is easy enough to work around, as I can always just smash a random pagename onto a namespace name and then pass that to NAMESPACENUMBER:, but that seems crudely unnecessary and a strange oversight.
Feb 15 2023
Feb 13 2023
In addition to the "Central description" ("Zentrale Beschreibung") the "Page information" ("Seiteninformationen") also directly specifies "Page content language" ("Seiteninhaltssprache"):
Which is clearly "en" and not "de".
Using something simplistic like the proposed return require( "Module:Target" ) can easily be detected by the target module, perhaps even affecting its functionality. Specifically, ... will have a value during the module initialization because the module is required and not directly #invoked and mw.getCurrentFrame():getTitle() will refer to the #invoked title and not the redirection target. Also reported script errors will list the #invoked module in the call stack because the target module is not truly treated as the Lua "main" module.
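Concretely, this is roughly what the target module can observe in its main chunk (an illustrative sketch only):

```lua
-- Inside Module:Target's main chunk.
local p = {}

-- When the module is pulled in via require(), Lua passes the module name to
-- the chunk, so ... has a value; on a direct #invoke it does not.
local requiredAs = ...

-- The frame title still names the module that was actually #invoked (the
-- redirecting wrapper), not this target module.
local invokedTitle = mw.getCurrentFrame():getTitle()

function p.show( frame )
	return string.format( 'required as %s; invoked via %s',
		tostring( requiredAs ), invokedTitle )
end

return p
```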
Feb 12 2023
Please make sure any solution works for both int-like and float-like numeric key values. It would be bad if 9.1 and "9.1" no longer sorted as larger than 9 and "9".
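For instance, a comparison along these lines (a sketch of the requirement only, not of any particular patch) keeps that property:

```lua
-- Compare keys by numeric value whether they are numbers (9, 9.1) or numeric
-- strings ("9", "9.1"); non-numeric keys are lumped together at 0 here purely
-- to keep the sketch total.
local function numericKeyLess( a, b )
	return ( tonumber( a ) or 0 ) < ( tonumber( b ) or 0 )
end

local keys = { "9.1", 9, "10", "9", 9.1 }
table.sort( keys, numericKeyLess )
-- Sorted by value: 9 and "9" (in either order), then 9.1 and "9.1", then "10".
```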
Jun 26 2022
It would be nice if the jobs metadata and the logs were kept longer. That said, all jobs should have a master timeout after which they die, so that they all end up in the completed/aborted bucket. That bucket can then be cleaned after a longer period. This allows one to manually retry jobs by clicking a button on the aborted ones if one can ascertain from the logs that the error was transient.
Jun 24 2022
@Samwilson After pruning old job items via the fix for T183338, the line moved from 150 to 111 but it is possible it was fixed. Is this still happening?
Why not just use direct URL upload to begin with? Let Commons pull it from IA Upload; then we do not have to worry about teaching addwiki asynchronous chunked uploading, as IA Upload's part would be downloading instead (and from the perspective of IA Upload, another request is inherently asynchronous). This has the added benefit of transparency, as we would have to provide the media URL and file description metadata anyway.
Jun 23 2022
This depends on how you are trying to process that. That IA item does not have an existing DjVu file (it was created well after March 2016, when they stopped making those).
Currently IA Upload obtains the DjVu it uploads from one of three possible sources:
- Use existing DjVu
- From original scans (JP2)
- From PDF (maybe of lower quality)
I too have run into this issue, and I do not think it is so much an issue of the OCR layer being on the wrong pages per se as of Jp2DjvuMaker including extraneous pages from the JP2 set, so that the OCR layers effectively no longer match up with the resultant pages.
Jan 18 2022
For reference, {{PAGELANGUAGE}} mentioned in the description was added in T59603; however, it only allows one to obtain the language of the page being rendered (since it does not take any arguments, unlike {{PAGENAME}} and friends), not the content language of arbitrary pages, despite arbitrary page content being available via getContent on mw.title objects.
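The asymmetry, roughly (the page name below is a hypothetical placeholder):

```lua
-- The raw wikitext of an arbitrary page is reachable from Lua...
local title = mw.title.new( 'Some arbitrary page' )
local text = title:getContent()

-- ...but the only language readily available is the wiki's content language
-- (or, via {{PAGELANGUAGE}}, that of the page currently being rendered), not
-- the content language of that arbitrary page.
local wikiLang = mw.language.getContentLanguage():getCode()
```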
I am hoping that a resolution here can lead to a resolution of T161976 for which the fix was reverted due to T298659 and ultimately resulted in this issue. I believe if page content is available during the rendering of another page that the purported content language of the available page content should also be available during the page render (much like the content model already is).
Jan 11 2022
Jan 10 2022
The problem with this is that it affects the MediaWiki core code, since the Scribunto extension just uses the core template frame code to parse the parameters and arguments. That code makes no attempt to retain the original order, and thus this information is lost (despite such order being available to parser functions like #invoke in general). To make matters more complex, parameters and arguments are available (likely out of order) not only for the current #invoke frame but also for the parent wikitext template frame. Wikitext templates also need both numbered and named parameters but so far have had no need to retain original ordering. Since Scribunto allows access to the parent frame args, which also do not retain the original ordering they were given in, this necessitates a core change to fix.
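For illustration, this is all a module can see today; no ordering information survives in either frame (sketch):

```lua
local p = {}

function p.dump( frame )
	local out = {}
	-- Arguments of the #invoke frame arrive as an associative table, so
	-- iteration order bears no relation to the order the caller wrote them in.
	for name, value in pairs( frame.args ) do
		table.insert( out, name .. '=' .. value )
	end
	-- The parent (wikitext template) frame's arguments have the same problem.
	local parent = frame:getParent()
	if parent then
		for name, value in pairs( parent.args ) do
			table.insert( out, name .. '=' .. value )
		end
	end
	return table.concat( out, ', ' )
end

return p
```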
Nov 24 2021
It should probably be noted that there are Wikidata items that state they are instances of (P31) Wikimedia category (Q4167836). Some of those have category sitelinks at Commons (e.g., Q9013822 sitelinks to Category:Text logos). These should probably not be considered in error despite also having P373 "Commons category" statements claiming the same value. Having a MediaInfo entity's statements link to such Wikidata items might be considered erroneous (depending on the claims).
Nov 23 2021
I am not sure if this is actually unexpected. {{#statements:P195|from=M89709639}} yields <span><span>[[Category:Media contributed by Toledo-Lucas County Public Library|Toledo-Lucas County Public Library]]</span></span> because of the P195 claim on M89709639 that points to Q7814140, which in turn has the commons sitelink that points to Category:Media contributed by Toledo-Lucas County Public Library (I doubt that sitelink is really correct and it could stand to be fixed; e.g., {{#property:P373|from=Q7814140}} also yields Media contributed by Toledo-Lucas County Public Library).
Nov 14 2021
I never suggested such was feasible or in scope; however, I do think it deserves a discussion point, as the case for wanting such external linking is actually even stronger for editor-created anchors than for sections. As such, they help define the problem here and possible arguments for or against the proposals here.
It should be noted that while these work:
- https://cs.wikipedia.org/w/index.php?title=Demokratura&veaction=edit&section=5
- https://cs.wikipedia.org/w/index.php?title=Demokratura&action=edit&section=5
the following neither works nor refers to the same thing:
but rather one has to use something like:
Jun 26 2020
I actually did not do that. I think somehow I must have edited/submitted an older version (though I am not sure how, as that was not my intention).
Jun 23 2020
@Aklapper Does it really need triage? There was already a patch for it (though it seems to need to be updated). I can see how the patch itself needs triage, but the issue seems well understood. Anomie already clarified that mw.language.getPageLanguage was not the right thing and demonstrated that a pageLanguage field of mw.title objects was the way to go. What further triage does this issue really need? I only assigned it to Anomie so that he would respond based on the patch he created. I understand if he wants to remove himself at this point in time, but the point was to get him to make such a statement.
Jun 22 2020
Jun 21 2020
I highly doubt this sort of functionality will arrive anytime soon. The main issue is that if a Scribunto module supplies different output based on different input from remote wikis, how does MediaWiki track the links and maintain the page rendering caches (so cached output gets properly updated when a dependency changes)? To accomplish this sort of dependency tracking, the link tables would have to somehow be expanded to support cross-wiki linking so that things like [[Special:WhatLinksHere]] can list remote page transclusions, etc. (perhaps you read that getContent causes the page to be recorded as a transclusion, and this is why).
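On a single wiki that bookkeeping already exists; a sketch of the local case (the page name is a hypothetical placeholder):

```lua
-- Reading another local page's wikitext from a module records that page as a
-- transclusion of the page being rendered, so edits to it invalidate the
-- cached output and it appears in Special:WhatLinksHere.  There is currently
-- no equivalent mechanism for pages on a remote wiki.
local text = mw.title.new( 'Template:Example' ):getContent()
```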
@Anomie Can we get your change from over three years ago merged? This is an easy and straightforward fix but Gerrit is reporting some sort of merge conflict even though Jenkins had no issues with it.
Jun 20 2020
Jun 16 2020
May 14 2020
This is actually a regression from when the extension moved from GeSHi (PHP) to Pygments (Python): T85794: Convert SyntaxHighlight_Geshi from Geshi to Pygments (was: Don't register 250+ modules). Since MW is PHP, the extension could just use GeSHi as a library without having to fork a separate process via one of the unsafe functions removed in the hardened PHP.
Jan 24 2020
The issue becomes how to represent multiple edition links in MediaWiki toolbars across multiple WMF wikis and their projects. Currently, as implemented via WD sitelinks, we only allow one link per wiki per project per WD item. This is in part owing to the limited space in the MediaWiki toolbars where such links are displayed. Even across wikis within a single project, where only a single link is allowed per wiki, there can sometimes be a *very* large number of links (there are many languages in Wikipedia alone, and there are already mechanisms that limit the number of sitelinks displayed in the toolbar by default).