See https://phabricator.wikimedia.org/T360295 for the request to run tests before deploying a tool.
Mar 18 2024
Mar 15 2024
Mar 14 2024
May 13 2023
@Mitar Done.
May 8 2023
I’ve tried Wikidata dumps in QuickStatements format with Zstd compression, and benchmarked it: https://github.com/brawer/wikidata-qsdump
File size shrinks to one third, and decompression is 150 times faster (on a typical modern cloud server) compared to pbzip2.
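For anyone wanting to consume such a dump, a minimal Go sketch might look like this, assuming the github.com/klauspost/compress/zstd package; the file name is hypothetical (the benchmark repo above has the actual tooling):

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"

	"github.com/klauspost/compress/zstd"
)

func main() {
	f, err := os.Open("wikidata.qs.zst") // hypothetical file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	dec, err := zstd.NewReader(f)
	if err != nil {
		log.Fatal(err)
	}
	defer dec.Close()

	// Each line is one QuickStatements command, eg. "Q72\tP31\tQ515".
	numStatements := 0
	scanner := bufio.NewScanner(dec)
	for scanner.Scan() {
		numStatements++
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Println(numStatements, "statements")
}
```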
Mar 18 2023
Done.
For the past ~2 years, qrank.toolforge.org has been serving redirects to qrank.wmcloud.org. After such a long time, it should be okay to shut down the redirect service entirely. Done.
Dec 14 2022
Actually, due to how the zstd format was designed, the current parallelization approach for bzip2 will probably work exactly the same way with zstd. With the right command-line option, the zstd tool already distributes its input across all CPU cores for parallel compression (the reference zstd implementation is similar to pbzip2 in that respect). But one should also be able to split the input oneself, compress the shards in parallel on a set of machines, and then simply concatenate the compressed outputs at the end; again, that is the same approach pbzip2 takes.
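To illustrate the concatenation property, here’s a small Go sketch using the github.com/klauspost/compress/zstd package (the sample statements are just placeholders): two shards are compressed independently, and the byte-concatenation of the outputs decodes as a single stream.

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/compress/zstd"
)

// compress squeezes one shard into an independent zstd frame.
func compress(shard []byte) []byte {
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		log.Fatal(err)
	}
	defer enc.Close()
	return enc.EncodeAll(shard, nil)
}

func main() {
	// In practice, the shards would be compressed on separate machines.
	shard1 := compress([]byte("Q64\tP31\tQ515\n"))
	shard2 := compress([]byte("Q90\tP31\tQ515\n"))

	// Concatenating the compressed frames yields a valid zstd stream.
	stream := append(shard1, shard2...)

	dec, err := zstd.NewReader(bytes.NewReader(stream))
	if err != nil {
		log.Fatal(err)
	}
	defer dec.Close()

	var out bytes.Buffer
	if _, err := out.ReadFrom(dec); err != nil {
		log.Fatal(err)
	}
	fmt.Print(out.String()) // both shards, decompressed as one stream
}
```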
Dec 12 2022
Sep 26 2022
Apr 25 2022
This seems like a duplicate
Somewhat, although this ticket was meant specifically for monitoring cronjob completions, which is different from (and simpler than) setting up Cortex/Thanos-like monitoring on metrics exposed by continuously running services.
Will Toolforge and Cloud VPS jobs be able to read and write into their own custom buckets? (That would be super helpful.)
Feb 2 2022
Possibly relevant: https://qrank.wmcloud.org/ which ranks Wikidata items by how often their pages get viewed on Wikipedia, Wikitravel, etc.; updated ~weekly.
Jan 25 2022
@nskaggs, it’s in Go, although I’m working on a web frontend that’ll eventually have a part written in JavaScript/React. Here’s my current “release process” that I’m hoping to make less manual. The environment variable GOOS=linux can be omitted when the compiler runs on a Linux machine.
Jan 21 2022
Does the push-to-deploy pipeline accept early adopters? I’d gladly volunteer as a guinea pig.
Jan 6 2022
Meanwhile, I’ve set up Wikidata QRank, which computes this dataset on a weekly basis and offers the results for public download. So, feel free to close this ticket. But if the data engineering team is interested in joining, improving, or taking over the project, please feel welcome; it’d be great to work together!
Dec 23 2021
Nov 17 2021
Jun 15 2021
Oh, if it’s useful to you, I’ll gladly keep it running. Do you want it to monitor other domains beyond the current four?
Jun 14 2021
May 17 2021
Friendly ping? The bug is still present.
May 14 2021
Cool, glad it’s useful! When you set up Prometheus rules, consider alerting when certmon_tls_certificate_expiration_timestamp - time() drops below about two weeks for a domain; see the Prometheus recommendations for timestamps. The SRE team would then get plenty of advance notice of expiring TLS certificates, so problems could be fixed long before they become user-visible outages. (Apologies if I’m stating the obvious here; you’ll know more about this than I do.)
May 11 2021
If it helps, feel free to adopt https://certmon.toolforge.org/, which was quickly thrown together in an attempt to help Wikimedia improve its monitoring. See the source code and the metrics endpoint for Prometheus monitoring. Feel free to fork, send pull requests, whatever. Please do tell if you end up using it; I’m quite curious. If it’s useful, my personal preference would be for you to clone the repo into a better place (perhaps a Phabricator project) and run it yourself, so the Wikimedia SRE team could change things without me getting involved.
May 7 2021
Apr 28 2021
Hm, good point. Could the dumps be made consistent? Maybe like this: Before starting a dump, find the current last revision; pass this cut-off revision ID to the dumping shards; change the dump-producing code to not consider changes after the cut-off revision. But I wouldn’t know how hard this would be. Actually, DumpEntities already seems to take a last-page-id flag, but I don’t know if/where that is getting set in production (and if that’s really enough).
Regarding dump-level metadata, it would be super useful to know what timestamp should be passed to EventStreams for catching up with user edits after the dump was produced. To find this timestamp, can clients extract the entity ID with the highest lastrevid from a Wikidata dump, and then retrieve the corresponding timestamp via Special:EntityData like this? Or would a sync-up client lose some edits if it were to do this? (For example, if dumps get produced by parallel workers, they’d probably have to agree on a cut-off revision before starting the dumping process; otherwise, the JSON file wouldn’t necessarily contain all changes before the highest lastrevid in the dump file... correct?)
To find the timestamp of the last Wikidata change that went into a dump file, couldn’t one — while processing the dump — extract the entity and revision ID with the highest lastrevid value in the entire dump, and then retrieve the corresponding modified timestamp for that single edit via Special:EntityData like in this query? The lastrevid field seems to have been added to dumps by T87283 in changeset 500806.
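A rough Go sketch of that sync-up idea; the entity ID and revision number in main are hypothetical stand-ins for the maximum lastrevid found while scanning the dump:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// fetchModified retrieves the "modified" timestamp of one specific
// revision via Special:EntityData.
func fetchModified(id string, revision int64) (string, error) {
	url := fmt.Sprintf(
		"https://www.wikidata.org/wiki/Special:EntityData/%s.json?revision=%d",
		id, revision)
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var reply struct {
		Entities map[string]struct {
			Modified string `json:"modified"`
		} `json:"entities"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&reply); err != nil {
		return "", err
	}
	return reply.Entities[id].Modified, nil
}

func main() {
	// Hypothetical values; in reality, these would be the entity and
	// revision with the highest lastrevid found in the dump.
	timestamp, err := fetchModified("Q42", 1400000000)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("dump cut-off timestamp:", timestamp)
}
```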
Apr 9 2021
Apr 6 2021
Sure, glad to try. I’ve changed the qrank-builder job config to use the new image. It seems to work fine.
Mar 25 2021
Mar 22 2021
Thanks for the pointer! Indeed, I was hoping the Wikimedia Cloud had something like Cortex or Thanos running on behalf of custom tools. Hm, considering how long these discussions have already been going on, it doesn’t look like this will be coming anytime soon. So I’m closing this ticket as stalled; things won’t go any faster with more tickets around.
Mar 19 2021
As @bd808 suspected, security is indeed the main reason why I’d like to run my dinky webservice in a constrained environment. As an external volunteer developer, I always fear that my contributions may cause more harm than good. Especially when contributing some minor tool that doesn’t see much attention, I sleep better when there isn’t much else bundled into the container for my webservice. Of course, the risks can be mitigated with container scanning, actively checking CVEs, etc. — but as an external volunteer, I don’t really want to impose such a maintenance burden on others. Keeping containers lean isn’t a universal solution to all problems in production security — still, with less baggage, fewer things can go wrong. Basically, it’s an attempt at taming the beast of system complexity.
Mar 18 2021
With the Go programming language, binaries typically get statically linked. So, compiled programs will typically run without any runtime dependencies whatsoever — they wouldn’t access package files, call shared libraries, or use any other files. When compiling for Linux, the compiler builds an ELF binary that directly invokes the operating system kernel through Linux system calls, not even using libc or anything else in a Linux distribution. Rust may be similar in that respect (not sure); static linking can also be done with C and C++, although it’s a bit less common there.
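For illustration, a minimal example; the build commands in the comment assume a Linux target:

```go
// hello.go: compiling with cgo disabled yields a fully static ELF binary.
//
//	CGO_ENABLED=0 GOOS=linux go build -o hello hello.go
//	ldd hello   # prints "not a dynamic executable"
package main

import "fmt"

func main() {
	fmt.Println("hello, statically linked world")
}
```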
Also, you mention building the Go program on Toolforge. How do you build it? I guess you build it on the Toolforge bastion?
Thank you! Yes, this is a build pipeline for data; it isn’t compiling code. For background, see the technical design document. (Feedback very welcome!)
Mar 16 2021
Perhaps the QRank signal might be helpful here? The signal is computed in the Wikimedia cloud infrastructure (Toolforge) and gets periodically refreshed. It’s just aggregated pageviews, but I found it pretty useful in my own projects, which is why I contributed it to Toolforge.
- Site: https://qrank.toolforge.org/
- Data download: https://qrank.toolforge.org/download/qrank.gz
- Source code: https://github.com/brawer/wikidata-qrank
- Technical design: https://github.com/brawer/wikidata-qrank/blob/main/doc/design.md
Mar 15 2021
Mar 2 2021
Yes, it works now. Thank you!
Feb 24 2021
Note that Kubernetes can also directly mount volumes from Ceph RBD, so this wouldn’t necessarily have to be done via Cinder. If Kubernetes mounted Ceph RBD directly, there would be one less layer to maintain. But I don’t know how well this would fit into Wikimedia’s production setup in terms of quota enforcement, key management, monitoring, etc. Here are some pointers, in case you want to explore this. The example setup actually looks quite simple.
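For concreteness, this is roughly what a directly mounted RBD volume looks like when expressed with the k8s.io/api/core/v1 Go types; the monitor address, pool, image, user, and secret names below are all made up:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	vol := corev1.Volume{
		Name: "tool-data", // hypothetical volume name
		VolumeSource: corev1.VolumeSource{
			RBD: &corev1.RBDVolumeSource{
				CephMonitors: []string{"ceph-mon.example.org:6789"},
				RBDPool:      "tools",
				RBDImage:     "tool-data",
				FSType:       "ext4",
				RadosUser:    "tools-user",
				SecretRef: &corev1.LocalObjectReference{
					Name: "ceph-secret",
				},
			},
		},
	}
	fmt.Printf("%+v\n", vol)
}
```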
Feb 23 2021
Feb 22 2021
Feb 17 2021
Nov 12 2019
Do you need a beta tester? I have a public-domain list of 600 Sursilvan verbs including inflected forms. (In Sursilvan, verb inflection is quite complicated and fills entire textbooks; sort of like Latin, but with more exceptions.) I’d like to import this knowledge into Wikidata. (Actually, if QuickStatements2 were able to create lexemes and refer to the newly created lexeme from within the same batch, that would probably be enough.) Example:
May 6 2019
Friendly ping?
May 3 2019
The codes are valid (and registered) IETF BCP47 language codes.
Hm... in the sidebar on Wikimedia Commons (see screenshot), would it perhaps make sense to replace the link to Special:WhatLinksHere with a link to Special:GlobalUsage? Currently, there seems to be a usability/UX issue: the feature is already implemented (thanks for the kind explanation on this bug, I had no idea!), but people might never come across Special:GlobalUsage unless they already know that it exists. Hence the suggestion to remove “What links here” from the sidebar and replace it with “Global usage”, which seems to be a superset. (There’s a risk of cluttering the user experience when the sidebar has too many links.)
@Lea_Lacroix_WMDE Thank you! Filed T222426.
Is there anything specific I should do so that people can enter usage examples for Sursilvan lexemes, and likewise for lexemes in the various other Romansh variants? I’ll gladly file more tickets if it helps; just tell me what to do.
Filed T222423 for another (very minor) issue that seems related to language variants.
Hm, adding usage examples (and probably similar properties) doesn’t seem to work yet. Try adding the sentence “Ils tgauns vivan dalla naschientscha naven ensemen cullas nuorsas.” as usage example (P5831) in language “rm-sursilv” for tgaun (L45642); see screenshot.
May 2 2019
Ah, got it. Thank you!
Is something else needed to activate lexemes in variants of Romansh? See screenshot:
Apr 8 2019
Just to clarify, the codes in this ticket (rm-rumgr etc.) are not made up; they have been standardized by IETF and appear in the IANA language subtag registry.
Apr 2 2019
Apr 1 2019
If nobody else has time to do this, may I volunteer to write the code? Please tell me where to start (which programming language, what framework, etc.)
Mar 25 2019
Curious, is it possible to estimate by what date this might get implemented? Is there anything I can do to help?
Mar 20 2019
Mar 14 2019
Oh, all you need from CLDR is an English label? Nothing else? In that case, this Wikidata query might be helpful:
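The query itself didn’t survive in this thread; a hypothetical reconstruction, wrapped in a small Go client for the Wikidata Query Service, might look like this. It fetches the English label for every language that carries an IETF language tag (P305):

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
)

// English labels for every language that has an IETF language tag (P305).
const sparql = `
SELECT ?code ?langLabel WHERE {
  ?lang wdt:P305 ?code .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}`

func main() {
	u := "https://query.wikidata.org/sparql?format=json&query=" +
		url.QueryEscape(sparql)
	resp, err := http.Get(u)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body)) // JSON bindings: code and English label
}
```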
Sure, but it will take a while until the next official release of CLDR, so you’d have to read the CLDR data from the development branch (“trunk”). I do wonder, though, if you could read the IANA registry in addition to CLDR and use IANA as a fallback for the English names when CLDR has no data yet. Then you would immediately get an English name for every language with an ISO 639 or IETF BCP 47 code, so you’d add support for a couple thousand languages at once.
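A rough sketch of that IANA fallback in Go; the registry is a plain-text file whose records are separated by “%%” lines (multi-line descriptions and subtag ranges are ignored here for brevity):

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	resp, err := http.Get("https://www.iana.org/assignments/" +
		"language-subtag-registry/language-subtag-registry")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	names := make(map[string]string) // subtag → first English description
	var subtag, desc string
	flush := func() {
		if subtag != "" && desc != "" {
			names[subtag] = desc
		}
		subtag, desc = "", ""
	}
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case line == "%%": // record separator
			flush()
		case strings.HasPrefix(line, "Subtag: "):
			subtag = strings.TrimPrefix(line, "Subtag: ")
		case strings.HasPrefix(line, "Description: ") && desc == "":
			desc = strings.TrimPrefix(line, "Description: ")
		}
	}
	flush() // don’t forget the last record
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Println(names["rm"]) // "Romansh"
}
```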
The easiest way to add a new language to CLDR is preparing ‘seed’ files in XML format;
Feb 28 2019
Feb 8 2019
Friendly ping, is there anything I can do to help with this ticket?
Feb 6 2019
Feb 2 2019
Jan 11 2019
@GerardM, is there anything I can do to help with this ticket? There’s a sizable Romansh dictionary whose data can be donated to Wikidata, but this is currently blocked on this ticket. (Try an exact search for a few German words, e.g. “Hund” or “Gelbsucht”, to see how the words differ across the various Romansh variants.)
For languages that have no language code yet, perhaps Lingua Libre could use “mis-x-Q12345” (where Q12345 would be the Wikidata item for the language of the pronunciation audio). That would be a syntactically valid IETF BCP 47 tag, and you wouldn’t lump unrelated languages into the same category. Once the language does get a code, some bot could change the categories of uploaded files on Wikimedia Commons. @GerardM, what do you think?
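As a quick sanity check that such a tag is well-formed, here’s a tiny Go sketch using golang.org/x/text/language; Q12345 is of course a placeholder:

```go
package main

import (
	"fmt"

	"golang.org/x/text/language"
)

func main() {
	// "mis" is the registered ISO 639 code for “uncoded languages”;
	// Q12345 stands in for the Wikidata item ID of the actual language.
	tag, err := language.Parse("mis-x-q12345")
	if err != nil {
		// A language.ValueError would mean the library doesn’t know a
		// subtag even though the tag is syntactically well-formed.
		fmt.Println("note:", err)
	}
	fmt.Println(tag) // mis-x-q12345
}
```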
Sorry, here’s the correct link to the Unicode FAQ about Zawgyi: https://www.unicode.org/faq/myanmar.html