Mitar (Mitar)
User

User Details

User Since
Oct 25 2014, 6:11 AM (406 w, 6 d)
Availability
Available
LDAP User
Mitar
MediaWiki User
Mitar [ Global Accounts ]

Recent Activity

Sat, Aug 6

Mitar awarded T19993: Option on API lists to only have count of links/categories/whatever returned, rather than a full resultset a Like token.
Sat, Aug 6, 4:00 PM · Performance Issue, MediaWiki-Action-API
Mitar updated subscribers of T312200: Mediawiki API endpoint to get number of pages in a namespace.

A workaround is available in this StackOverflow question/answer: https://stackoverflow.com/questions/73223844/get-the-number-of-pages-in-a-mediawiki-wikipedia-namespace

Sat, Aug 6, 3:55 PM · MediaWiki-Action-API

Jul 13 2022

Mitar added a comment to T300907: Wikimedia Enterprise HTML dump for Wikimedia Commons.

Just HTML dumps. So what you provide here https://dumps.wikimedia.org/other/enterprise_html/ but also for commons wiki. (You already provide namespace 6 for other wikis.)

Jul 13 2022, 4:36 PM · Wikimedia Enterprise External Request, Wikimedia Enterprise
Mitar updated subscribers of T300907: Wikimedia Enterprise HTML dump for Wikimedia Commons.

I have now tried to use the API to fetch things myself, but it is going very slowly (also because the rate limit on the HTML REST API endpoint is 100 requests per second and not the documented 200 requests per second, see T307610). I would like to understand whether I should hope for this to be done at some point soon or not at all. I find it surprising that so many dumps are made but just this one is missing. Would it take just one switch to enable the dump on one more wiki?

Jul 13 2022, 4:07 PM · Wikimedia Enterprise External Request, Wikimedia Enterprise

Jul 6 2022

Mitar created T312200: Mediawiki API endpoint to get number of pages in a namespace.
Jul 6 2022, 9:24 AM · MediaWiki-Action-API

Jul 5 2022

Mitar added a comment to T312112: Mediawiki Commons structured data is missing structured data for files with quotes.

OK, it is not connected to characters in the filename. There are files in entities with above characters. But I do not get why not all files on Wikimedia Commons have entities.

Jul 5 2022, 3:07 PM · Structured-Data-Backlog, StructuredDataOnCommons, Structured Data Engineering, Commons
Mitar updated the task description for T312112: Mediawiki Commons structured data is missing structured data for files with quotes.
Jul 5 2022, 2:59 PM · Structured-Data-Backlog, StructuredDataOnCommons, Structured Data Engineering, Commons
Mitar updated the task description for T312112: Mediawiki Commons structured data is missing structured data for files with quotes.
Jul 5 2022, 2:54 PM · Structured-Data-Backlog, StructuredDataOnCommons, Structured Data Engineering, Commons
Mitar updated the task description for T312112: Mediawiki Commons structured data is missing structured data for files with quotes.
Jul 5 2022, 2:52 PM · Structured-Data-Backlog, StructuredDataOnCommons, Structured Data Engineering, Commons
Mitar created T312112: Mediawiki Commons structured data is missing structured data for files with quotes.
Jul 5 2022, 2:44 PM · Structured-Data-Backlog, StructuredDataOnCommons, Structured Data Engineering, Commons

Jul 4 2022

Mitar added a comment to T311977: Wikimedia Commons entity dumps are lacking datatype field.

Oh, what a sad issue T149410. :-(

Jul 4 2022, 10:59 AM · Structured-Data-Backlog, Structured Data Engineering, StructuredDataOnCommons, Commons
Mitar created T311977: Wikimedia Commons entity dumps are lacking datatype field.
Jul 4 2022, 6:50 AM · Structured-Data-Backlog, Structured Data Engineering, StructuredDataOnCommons, Commons

Jun 29 2022

Mitar created T311633: Unable to get imageinfo only for the latest revision.
Jun 29 2022, 2:23 PM · MediaWiki-Action-API
Mitar added a comment to T307610: I am hitting a rate limit on REST API endpoint.

Hm, I am pretty sure that I am doing rate limiting correctly on my side, but I am hitting 429s after a brief time when trying to do 1000/10s rate limit to the REST API endpoint. If I lower it to 500/10s then I do not hit 429s. No idea why, but I am doing many requests in parallel.

Jun 29 2022, 6:12 AM · Documentation, Traffic, SRE, RESTBase-API
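
For illustration, a client-side limit such as the "1000/10s" mentioned in the comment above can be sketched as a sliding-window limiter. This is a minimal single-threaded sketch, not the code used in the task; the class name `SlidingWindowLimiter` and its parameters are hypothetical. Note that a window of 1000 requests per 10 seconds still permits bursts well above 100 requests within a single second; whether that explains the 429 responses is not established in the thread.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any window_seconds interval.

    Hypothetical helper for illustration; it is not thread-safe, so
    parallel workers would need to share it behind a lock.
    """

    def __init__(self, max_requests, window_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # start times of requests still inside the window

    def acquire(self):
        """Block until one more request may be sent, then record it."""
        while True:
            now = self._clock()
            # Forget requests that have left the window.
            while self._stamps and now - self._stamps[0] >= self._window:
                self._stamps.popleft()
            if len(self._stamps) < self._max:
                self._stamps.append(now)
                return
            # Wait until the oldest recorded request leaves the window.
            self._sleep(self._window - (now - self._stamps[0]))
```

Calling `limiter.acquire()` before each HTTP request keeps the client within the configured window. A smaller window (for example 100 requests per 1 second) bounds short bursts more tightly than 1000 requests per 10 seconds does.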

Jun 28 2022

Mitar added a comment to T300907: Wikimedia Enterprise HTML dump for Wikimedia Commons.

Hm, there was no response since February. :-( OK, I will wait.

Jun 28 2022, 2:47 PM · Wikimedia Enterprise External Request, Wikimedia Enterprise
Mitar added a comment to T300907: Wikimedia Enterprise HTML dump for Wikimedia Commons.

Who could I ask from their team about this?

Jun 28 2022, 1:27 PM · Wikimedia Enterprise External Request, Wikimedia Enterprise
Mitar added a comment to T311441: Missing Enterprise Dumps from 2022-06-20 run.

So if I understand correctly, those files have never been generated so that particular dump for that particular date will not be available?

Jun 28 2022, 1:26 PM · Dumps-Generation
Mitar updated subscribers of T300907: Wikimedia Enterprise HTML dump for Wikimedia Commons.

@ArielGlenn: Do you think dumps of file descriptions (so not the media files themselves, but the rendered wikitext) could be provided for Wikimedia Commons as part of the public Enterprise dumps? Given that dumps are generated for so many other wikis, why not also for Wikimedia Commons? This could help me obtain descriptions for files on Wikimedia Commons (and given that there are already no other dumps for Wikimedia Commons, it would help me hit its API less).

Jun 28 2022, 10:46 AM · Wikimedia Enterprise External Request, Wikimedia Enterprise
Mitar added a comment to T300124: In Wikimedia Enterprise HTML Dumps, categories and templates are not always extracted.

@Protsack.stephan Was there any progress on this?

Jun 28 2022, 10:44 AM · Wikimedia Enterprise External Request, Wikimedia Enterprise

Jun 24 2022

Mitar closed T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title as Resolved.
Jun 24 2022, 12:48 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons
Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

I checked commons-20220620-mediainfo.json.bz2 and it contains the title field (alongside other fields which are present in the API).

Jun 24 2022, 12:47 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons
Mitar closed T278031: Wikibase canonical JSON format is missing "modified" in Wikidata JSON dumps as Resolved.
Jun 24 2022, 12:47 PM · MW-1.39-notes (1.39.0-wmf.14; 2022-05-30), wdwb-tech, Dumps-Generation, Wikidata, Wikibase (3rd party installations)
Mitar added a comment to T278031: Wikibase canonical JSON format is missing "modified" in Wikidata JSON dumps.

I checked wikidata-20220620-all.json.bz2 and it now contains the modified field (alongside other fields which are present in the API).

Jun 24 2022, 12:46 PM · MW-1.39-notes (1.39.0-wmf.14; 2022-05-30), wdwb-tech, Dumps-Generation, Wikidata, Wikibase (3rd party installations)

Jun 13 2022

Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

So for the next dump which will run, this will now be included? Or is there some deployment which is still necessary?

Jun 13 2022, 12:52 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

Jun 11 2022

Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

Awesome. I will try to do so when you are online, but feel free also to just merge it without me. I do not know if I can be of much help being around anyway. :-)

Jun 11 2022, 2:24 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

Jun 9 2022

Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

What is this subsetting you are talking about?

Jun 9 2022, 5:23 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons
Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

So what is the next step here?

Jun 9 2022, 2:32 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

Jun 8 2022

Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

Yes, this change should fix both this issue and T278031.

Jun 8 2022, 7:23 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

Jun 7 2022

Mitar added a comment to T298437: Provide a public pull API endpoint.

Awesome, thanks!

Jun 7 2022, 4:26 PM · Wikimedia Enterprise External Request, Wikimedia Enterprise
Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

Thanks for testing!

Jun 7 2022, 10:09 AM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

Jun 5 2022

Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

Done. Added it to June 7 puppet request window. Please review/advise if I did something wrong.

Jun 5 2022, 8:16 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

May 27 2022

Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

Awesome. Thanks for explaining.

May 27 2022, 12:33 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons
Mitar updated subscribers of T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

So the fix to the dump script has been merged into the Wikibase extension. It is gated behind a CLI switch. What is the process for getting this turned on for dumps from Wikimedia Commons (and ideally also for Wikidata)?

May 27 2022, 12:27 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

May 22 2022

Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/793934 is ready for a review, it has both opt-in configuration option and a test.

May 22 2022, 11:52 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

May 21 2022

Mitar added a comment to T305407: Stale data / missing pages in HTML ("enterprise") dumps.

I think this might be related to T274359.

May 21 2022, 11:09 PM · Wikimedia Enterprise, Dumps-Generation
Mitar added a comment to T274359: Mobile REST API delivers year old+ content for very select pages.

I think this might be related to T305407.

May 21 2022, 11:09 PM · User-TheresNoTime, Platform Engineering, Page Content Service, Product-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog, RESTBase-API, affects-Kiwix-and-openZIM
Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

I made another pass, adding configuration option to not include page metadata (then dump is without title and other page metadata).

May 21 2022, 3:56 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

May 20 2022

Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

I made a first pass. Feedback welcome.

May 20 2022, 9:41 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons
Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

So the plan is:

May 20 2022, 4:48 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

May 11 2022

Mitar added a comment to T307610: I am hitting a rate limit on REST API endpoint.

Most of that is controlled by the SRE team at a level in front of the REST API, since the frontend caching layer is a shared resource across everything.

May 11 2022, 6:42 AM · Documentation, Traffic, SRE, RESTBase-API

May 10 2022

Mitar added a comment to T307610: I am hitting a rate limit on REST API endpoint.

Because our edge traffic code enforces a stricter limit of ~100/s (for responses that aren't frontend cache hits due to popularity), before the requests ever get to the Restbase service.

May 10 2022, 7:32 PM · Documentation, Traffic, SRE, RESTBase-API
Mitar added a comment to T307610: I am hitting a rate limit on REST API endpoint.
May 10 2022, 6:54 PM · Documentation, Traffic, SRE, RESTBase-API
Mitar added a comment to T307610: I am hitting a rate limit on REST API endpoint.

Sadly bulk downloads do not have HTML dumps, and Enterprise dumps do not offer them for template/module documentation (only articles, categories, and files). Also, there are no Enterprise dumps for Wikimedia Commons.

May 10 2022, 6:28 PM · Documentation, Traffic, SRE, RESTBase-API
Mitar added a comment to T307610: I am hitting a rate limit on REST API endpoint.

Hm, but documentation for REST API says I can use 200 requests per second? https://en.wikipedia.org/api/rest_v1/

May 10 2022, 6:27 PM · Documentation, Traffic, SRE, RESTBase-API

May 4 2022

Mitar updated the task description for T307629: Unable to use REST API to get HTML of Template:;.
May 4 2022, 8:53 PM · RESTBase-API
Mitar created T307629: Unable to use REST API to get HTML of Template:;.
May 4 2022, 8:53 PM · RESTBase-API
Mitar created T307610: I am hitting a rate limit on REST API endpoint.
May 4 2022, 6:09 PM · Documentation, Traffic, SRE, RESTBase-API
Mitar added a comment to T300124: In Wikimedia Enterprise HTML Dumps, categories and templates are not always extracted.

Even if you request a single title, I think you still might get continue param.

May 4 2022, 11:07 AM · Wikimedia Enterprise External Request, Wikimedia Enterprise

May 3 2022

Mitar added a comment to T300124: In Wikimedia Enterprise HTML Dumps, categories and templates are not always extracted.

Are you using the MediaWiki API to obtain categories and templates? I am betting you are not processing continue properly to merge multiple API responses when one batch of data is distributed across multiple responses. You have to merge the data, otherwise some pages look like they have no templates/categories. I just encountered that when I was using the API to populate templates/categories manually (because dumps are missing them randomly). I used the following API query and you can see with some luck that some of the returned pages are missing templates/categories, because you have to follow continue params, but then other pages are missing. Only when batchcomplete is true do you know you got everything (but you have to merge everything you got before that).

May 3 2022, 6:34 PM · Wikimedia Enterprise External Request, Wikimedia Enterprise
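
The merging described in the comment above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the query or code used in the task: `fetch` is a hypothetical callable that sends the given parameters to the Action API and returns the decoded JSON response, and responses are assumed to use the default JSON layout in which `query.pages` is an object keyed by page ID. The function names `merge_pages` and `query_batches` are illustrative.

```python
from typing import Any, Callable, Dict, Iterator

Pages = Dict[str, Dict[str, Any]]


def merge_pages(into: Pages, batch: Pages) -> None:
    """Merge one response's pages into the accumulated result.

    List-valued properties (e.g. templates, categories) are concatenated,
    because one page's list may be split across several responses.
    """
    for page_id, page in batch.items():
        target = into.setdefault(page_id, {})
        for key, value in page.items():
            if isinstance(value, list):
                target.setdefault(key, []).extend(value)
            else:
                target[key] = value


def query_batches(fetch: Callable[[Dict[str, str]], Dict[str, Any]],
                  base_params: Dict[str, str]) -> Iterator[Pages]:
    """Yield one merged dict of pages per complete batch."""
    params = dict(base_params)
    merged: Pages = {}
    while True:
        response = fetch(params)
        merge_pages(merged, response.get("query", {}).get("pages", {}))
        if "batchcomplete" in response:
            # Every property of the pages in this batch has been received.
            yield merged
            merged = {}
        if "continue" not in response:
            break
        # Resend the original query with the returned continue parameters.
        params = {**base_params, **response["continue"]}
    if merged:
        # Defensive: results ended without a batchcomplete marker.
        yield merged
```

A page should only be treated as having no templates or categories once the batch containing it has been yielded; inspecting any single response in isolation can make a page look empty.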

May 1 2022

Mitar added a comment to T63111: Convert primary key integers and references thereto from int to bigint (unsigned).

I think I misunderstood in T301039 from documentation that those pointers are pointing to the text table.

May 1 2022, 4:28 PM · MediaWiki-General, Schema-change, DBA

Apr 28 2022

Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

I would be interested in doing that, but I probably need a helping hand to do it. I have a programming background, but zero understanding of where and how this could be fixed. My understanding is that the hackathon would be suitable for this? Do I have to make a session? How do I find other people who might be able to help me?

Apr 28 2022, 7:21 AM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

Apr 27 2022

Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

I added it to wikimedia-hackathon-2022. I think it would be a nice thing to fix as part of it.

Apr 27 2022, 9:25 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons
Mitar added a project to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title: Wikimedia-Hackathon-2022.
Apr 27 2022, 9:24 PM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

Apr 22 2022

Mitar updated subscribers of T306409: Regression in processing PDFs on Wikimedia Commons.
Apr 22 2022, 10:45 AM · User-TheDJ, MediaWiki-File-management, Commons

Apr 19 2022

Mitar added a comment to T62380: OAuth developers should be able to change what grants their application asks for instead of having to submit a new application.

Thanks for linking to that task.

Apr 19 2022, 11:34 AM · MediaWiki-extensions-OAuth
Mitar added a comment to T62380: OAuth developers should be able to change what grants their application asks for instead of having to submit a new application.

I was interested in this primarily for my own self-approved app used only by me. There it should be trivial to just change grants.

Apr 19 2022, 10:19 AM · MediaWiki-extensions-OAuth
Mitar added a comment to T62380: OAuth developers should be able to change what grants their application asks for instead of having to submit a new application.
Apr 19 2022, 8:41 AM · MediaWiki-extensions-OAuth
Mitar added a comment to T62380: OAuth developers should be able to change what grants their application asks for instead of having to submit a new application.

Hm, should this be prioritized more? It is 8 years now.

Apr 19 2022, 8:12 AM · MediaWiki-extensions-OAuth
Mitar created T306409: Regression in processing PDFs on Wikimedia Commons.
Apr 19 2022, 5:24 AM · User-TheDJ, MediaWiki-File-management, Commons

Apr 6 2022

Mitar updated the task description for T305548: Better handling of orphan local file descriptions when a Wikimedia Commons file is renamed.
Apr 6 2022, 12:20 PM · Commons
Mitar created T305548: Better handling of orphan local file descriptions when a Wikimedia Commons file is renamed.
Apr 6 2022, 12:19 PM · Commons
Mitar added a comment to T301104: Wikimedia Commons structured data dump does not contain all fields, e..g, title.

Is there any way I could help to push this further?

Apr 6 2022, 11:57 AM · Wikimedia-Hackathon-2022, Structured-Data-Backlog, Structured Data Engineering, Commons

Apr 5 2022

Mitar added a comment to T298394: Produce regular public dumps of Commons media files.

I see. Thank you so much for detailed update. This helps a lot to understand things.

Apr 5 2022, 10:03 PM · Datasets-Archiving, Internet-Archive, Dumps-Generation, Commons-Datasets, Commons
Mitar added a comment to T298394: Produce regular public dumps of Commons media files.

What is the limiting factor here? That backups are large so it is hard to host them? So if backups are made, then it is just a question of pushing them somewhere? If somebody offered storage for those backups, would that help move this issue forward?

Apr 5 2022, 12:22 PM · Datasets-Archiving, Internet-Archive, Dumps-Generation, Commons-Datasets, Commons

Apr 4 2022

Mitar added a comment to T53001: Image tarball dumps on your.org are not being generated.

I think all media files should be made available through IPFS. Then it would be easy to host a copy of the files, or contribute to hosting part of a copy. You could pin the files you are interested in. And it would work like a torrent, except that it is dynamic (new files can be added as they are uploaded, removed files can be unpinned by Wikimedia and can be hosted by others, or get lost by IPFS). It could probably be made so that Wikimedia does not have to host files twice, so that IPFS would use the same files otherwise used for serving the web/API. This is something people behind IPFS are thinking about as well, so it could align: https://filecoin.io/store/#foundation I think this could help with the fact that it is hard to make a static dump of all media files at the current size. So making this more distributed and fluid could help.

Apr 4 2022, 3:06 PM · Dumps-Generation, SRE, Datasets-Archiving, Datasets-General-or-Unknown
Mitar added a comment to T73405: Medium-sized image dump.

I think there are two actionable things to do here:

Apr 4 2022, 12:59 PM · Internet-Archive, Dumps-Generation, Datasets-Archiving

Mar 30 2022

Mitar added a comment to T301788: Metadata issues with few .mpg files on Wikimedia Commons.

So should we delete all except the Test_conductitivity.mpg files? Or should I re-code the first and third file as MPG and re-upload them?

Mar 30 2022, 8:57 AM · TimedMediaHandler, Commons

Mar 1 2022

Mitar added a comment to T302677: Metadata of a PDF in image table dump does not match the website.

What was the condition you searched for? Because there are PDFs which have 0x0 in the database but also in the web interface. See T301291. At least in English Wikipedia and Wikimedia Commons I could not find any other PDF or Djvu which would have 0x0 but in the web interface a reasonable number.

Mar 1 2022, 5:15 PM · MediaWiki-File-management, Dumps-Generation, Commons

Feb 28 2022

Mitar added a comment to T302677: Metadata of a PDF in image table dump does not match the website.

Given that this is the only row where this is the case (I went through the whole dump), could I suggest that somebody just writes the numbers in and that is it? :-) Investigating how this happened and why the website still reports correct numbers (maybe it is some cache?) might be more work.

Feb 28 2022, 2:16 PM · MediaWiki-File-management, Dumps-Generation, Commons

Feb 27 2022

Mitar updated the task description for T301291: PDF and Djvu files on Commons failed to be processed (no thumbnails, zero pages) but otherwise valid.
Feb 27 2022, 5:23 PM · Commons
Mitar updated the task description for T301291: PDF and Djvu files on Commons failed to be processed (no thumbnails, zero pages) but otherwise valid.
Feb 27 2022, 5:02 PM · Commons
Mitar updated the task description for T301291: PDF and Djvu files on Commons failed to be processed (no thumbnails, zero pages) but otherwise valid.
Feb 27 2022, 5:01 PM · Commons
Mitar updated the task description for T301291: PDF and Djvu files on Commons failed to be processed (no thumbnails, zero pages) but otherwise valid.
Feb 27 2022, 4:57 PM · Commons
Mitar updated the task description for T301291: PDF and Djvu files on Commons failed to be processed (no thumbnails, zero pages) but otherwise valid.
Feb 27 2022, 4:56 PM · Commons
Mitar updated the task description for T301291: PDF and Djvu files on Commons failed to be processed (no thumbnails, zero pages) but otherwise valid.
Feb 27 2022, 3:32 PM · Commons
Mitar updated the task description for T301291: PDF and Djvu files on Commons failed to be processed (no thumbnails, zero pages) but otherwise valid.
Feb 27 2022, 2:49 PM · Commons
Mitar added a comment to T155741: img_metadata missing.

I fixed 06.45 Management rep letter.pdf using mutool.

Feb 27 2022, 2:49 PM · Commons, MediaWiki-File-management, CommonsMetadata, WMF-General-or-Unknown, Multimedia
Mitar updated the task description for T301291: PDF and Djvu files on Commons failed to be processed (no thumbnails, zero pages) but otherwise valid.
Feb 27 2022, 2:45 PM · Commons
Mitar created T302677: Metadata of a PDF in image table dump does not match the website.
Feb 27 2022, 1:33 PM · MediaWiki-File-management, Dumps-Generation, Commons

Feb 16 2022

Mitar added a comment to T301758: OverrideUcfirstCharacters not in public settings.

Oh, what will happen then when you upgrade PHP? Is there a ticket tracking this and the issues with title names related to it? Because then the upgrade will change the title of the page I linked above.

Feb 16 2022, 9:03 AM · Wikimedia-Site-requests

Feb 15 2022

Mitar added a comment to T301807: Two MPG files are audio files, but are classified as video.

Currently I am not able to upload mp3s (no autopatrol flag), so somebody else will have to look into this.

Feb 15 2022, 11:32 PM · MediaWiki-File-management, Commons
Mitar added a comment to T301807: Two MPG files are audio files, but are classified as video.

So what to do then here?

Feb 15 2022, 9:07 PM · MediaWiki-File-management, Commons
Mitar added a comment to T301807: Two MPG files are audio files, but are classified as video.

If I were to convert this to mp3 or ogg and upload it as a new revision of this file, what would happen? Can file type be changed with a new file revision?

Feb 15 2022, 8:57 PM · MediaWiki-File-management, Commons
Mitar added a comment to T301291: PDF and Djvu files on Commons failed to be processed (no thumbnails, zero pages) but otherwise valid.

@mau If you made this PDF yourself, could I recommend removing the first blank page? Because otherwise the first thumbnail does not show anything.

Feb 15 2022, 8:52 PM · Commons
Mitar added a comment to T301291: PDF and Djvu files on Commons failed to be processed (no thumbnails, zero pages) but otherwise valid.

So I fixed it using mutool clean. But the ones I listed above cannot be fixed this way. And this is what I am reporting. So mutool clean does not fix it, looking at MediaBox values show reasonable page sizes (including the first page), and even metadata (example for the first file above shows page size available:

Feb 15 2022, 8:52 PM · Commons
Mitar added a comment to T301291: PDF and Djvu files on Commons failed to be processed (no thumbnails, zero pages) but otherwise valid.

No, this one seems just a slightly broken PDF. I just fixed it.

Feb 15 2022, 7:48 PM · Commons
Mitar added a comment to T155320: Implement strict mime type detection and media type inferring of audio/video files.

I filed T301807 for two MPG files which are misclassified.

Feb 15 2022, 6:06 PM · TimedMediaHandler-Transcode, Commons, MediaWiki-File-management, Multimedia, Technical-Debt
Mitar added a parent task for T301807: Two MPG files are audio files, but are classified as video: T155320: Implement strict mime type detection and media type inferring of audio/video files.
Feb 15 2022, 6:05 PM · MediaWiki-File-management, Commons
Mitar added a subtask for T155320: Implement strict mime type detection and media type inferring of audio/video files: T301807: Two MPG files are audio files, but are classified as video.
Feb 15 2022, 6:05 PM · TimedMediaHandler-Transcode, Commons, MediaWiki-File-management, Multimedia, Technical-Debt
Mitar created T301807: Two MPG files are audio files, but are classified as video.
Feb 15 2022, 6:04 PM · MediaWiki-File-management, Commons
Mitar added a comment to T301788: Metadata issues with few .mpg files on Wikimedia Commons.

Files do provide this info, see output of ffprobe (there is both duration and width and height in there). But this is not detected correctly by Mediawiki software. So it seems support for mpg files is not complete and some are not handled correctly. So this task is about supporting those files, too.

Feb 15 2022, 5:20 PM · TimedMediaHandler, Commons
Mitar added a subtask for T44725: Multimedia file format support (tracking): T301788: Metadata issues with few .mpg files on Wikimedia Commons.
Feb 15 2022, 3:47 PM · Tracking-Neverending, WMF-General-or-Unknown
Mitar added a parent task for T301788: Metadata issues with few .mpg files on Wikimedia Commons: T44725: Multimedia file format support (tracking).
Feb 15 2022, 3:47 PM · TimedMediaHandler, Commons
Mitar created T301788: Metadata issues with few .mpg files on Wikimedia Commons.
Feb 15 2022, 3:46 PM · TimedMediaHandler, Commons
Mitar added a comment to T301774: Multiple .flac files on Wikimedia Commons have zero reported duration despite not having them with other tools.

Interesting that even remuxing does not fix this. I will try recoding, given that flac is lossless.

Feb 15 2022, 2:56 PM · Commons
Mitar added a parent task for T301774: Multiple .flac files on Wikimedia Commons have zero reported duration despite not having them with other tools: T44725: Multimedia file format support (tracking).
Feb 15 2022, 1:41 PM · Commons
Mitar added a subtask for T44725: Multimedia file format support (tracking): T301774: Multiple .flac files on Wikimedia Commons have zero reported duration despite not having them with other tools.
Feb 15 2022, 1:41 PM · Tracking-Neverending, WMF-General-or-Unknown
Mitar created T301774: Multiple .flac files on Wikimedia Commons have zero reported duration despite not having them with other tools.
Feb 15 2022, 1:38 PM · Commons
Mitar added a comment to T63900: Invalid Ogg file: Stream Undecodable.

Oh, and I wanted to do the same for mp3 files, but I could not because I do not have autopatrol flag which seems to be required for uploading (fixed) mp3 files.

Feb 15 2022, 1:29 PM · TimedMediaHandler
Mitar added a comment to T63900: Invalid Ogg file: Stream Undecodable.

OK. I made a pass over all application/ogg files. The ffmpeg -err_detect command I mentioned above detected some badly broken files, which I reported for deletion, and they got deleted.

Feb 15 2022, 1:25 PM · TimedMediaHandler