User Details
- User Since
- Nov 6 2014, 11:07 PM (323 w, 6 d)
- Availability
- Available
- IRC Nick
- Kaldari
- LDAP User
- Kaldari
- MediaWiki User
- Kaldari [ Global Accounts ]
Fri, Dec 25
Oops, my CSS is missing a space!
Wed, Dec 23
Observations by the end of November 2020
After turning off IP editing on ptwiki, we saw:
- an increase in active registered editors
- an increase in new accounts
- a decrease in total edits
- a decrease in reverts
- a decrease in non-reverted edits
- a decrease in blocks
However, the ORES model is known for being biased against IP editors.
@jwang - This seems like an extremely important detail. Can you elaborate on it? I skimmed through the paper you cited, but didn't see anything about it.
Tue, Dec 22
I changed it to "Dankeschön senden (für andere sichtbar)?" which is 19 characters shorter.
Maybe the real problem here is just that the string in German is way too long.
@Addshore - Both Amir and I have tried to ping @Lydia_Pintscher by Phabricator and email several times over the last six months to see if she had any remaining objections to merging https://gerrit.wikimedia.org/r/602412 and https://gerrit.wikimedia.org/r/602422. Neither of us has heard back, so I'm going to assume that silence means consent in this context. Unless you know something that we don't, it seems like we should move ahead with these patches, as there is community consensus in favor of this approach. (See RFC and Wishlist proposal.) As I mentioned above, it seems like Lydia's previous objections have been resolved by the redirect badge and templates like {{Wikidata redirect}}. What do you think?
Dec 21 2020
@matthiasmullie - Thanks for the thorough investigation! Yeah, it sounds like we should revisit this once it is decided how MediaSearch will be utilized on Commons. Depending on how that goes, it may make sense to just drop the depicts auto-suggestions.
Dec 16 2020
@matthiasmullie - Which code repo actually controls this feature? Is it a hook in WikibaseMediaInfo or something implemented directly in CirrusSearch or something else?
Dec 15 2020
Fixed. You can see the new version of the button in action at https://test.wikipedia.org/wiki/User:Kaldari.
Dec 14 2020
A new test using Test OCR document 2.jpg
Engine | Formatting Errors | Character Errors | Whitespace Errors | Curly Quotes Preserved | Other Notes |
Tesseract 4.1.1 | none | 15 | 0 | yes | 'Lancaster.'→'———', 'I should'→'1 sheuld', period changed to comma, 'a'→'a_', 'negro'→'necro' |
Tesseract 4.1.1 (eng+Latin) | none | 13 | 1 | yes | 'Lancaster.'→'enge', 'I should'→'1 sheuld', period changed to comma |
Google OCR (English) | none | 3 | 0 | no | 'I' deleted, 'inflict'→'indlict', em dash changed to space |
Indic OCR | none | 1 | 0 | no | em dash changed to hyphen |
"Character Errors" means errors other than not detecting diacritics or curly quotes.
@Samwilson @aezell - Now that we have Tesseract 4.1.1 on Toolforge, I went back and tested with it. Interestingly, the accuracy was greatly improved by specifying the languages to apply (even for the English part), suggesting to me that Tesseract doesn't have good language detection (a problem that merlijn.wajer at the Internet Archive is apparently working on).
Engine | Formatting Errors | Character Errors | Whitespace Errors | Diacritics Preserved | Curly Quotes Preserved | Other Notes |
Internet Archive | none | 4 | 0 | no | yes | confused by opening caps and ç, converted most diacritics to correct character without diacritics |
Internet Archive (French) | none | 11 | 0 | yes | yes | confused by opening caps, changed w to m, changed ; to j , changed l to i, etc. |
Tesseract 4.0.0-beta.1 | none | 8 | 1 | only é | yes | "Alice"→"Aitice", changed l’ to P, confused by diacritics other than é |
Tesseract 4.1.1 | none | 13 | 1 | only é | yes | "Alice"→"Aitice", all other errors in the French part |
Tesseract 4.1.1 (eng+fra+Latin) | none | 2 | 1 | yes | yes | 2 apostrophes missing in the French part |
Google OCR (English) | extensive errors | 0 | 2 | yes | sometimes | no paragraph breaks, only line breaks |
Indic OCR | none | 2 | 4 | yes | sometimes | changed ? into ., omitted a quotation mark |
"Character Errors" means errors other than not detecting diacritics or curly quotes.
Test file: Test OCR document.jpg
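As a rough illustration of the "specify the languages" point above, here is a minimal Python sketch that builds a Tesseract command line with an explicit `-l` value. The helper names (`build_tesseract_command`, `run_ocr`) are made up for this example, and whether a model named `Latin` resolves depends on which trained data files are installed locally, so treat the language list as an assumption copied from the table row label.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, languages, output_base="stdout"):
    """Build a Tesseract CLI call that names the languages explicitly."""
    if not languages:
        raise ValueError("at least one language is required")
    # Tesseract accepts several languages joined with '+' via the -l option.
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def run_ocr(image_path, languages):
    """Run Tesseract if it is installed; return the text, or None if it is not."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


command = build_tesseract_command("Test OCR document.jpg", ["eng", "fra", "Latin"])
print(" ".join(command))
```

Passing `stdout` as the output base makes Tesseract print the recognized text instead of writing a file, which is convenient for quick comparisons like the ones in the tables.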
Dec 10 2020
For future reference, here are some ways to check if pages are disambiguation pages on save...
@Xover - Have you tried it with a newly uploaded file? That should let us know if it's a caching issue or just not working at all.
I think we should push out a MediaWiki-wide fix for this ASAP. TagItemWidget is prominently used in the Preferences interface, the Search interface, in ContentTranslation, UploadWizard, etc. Once this hits Wikipedia, a lot of people are going to notice.
Rather, I think the focus on the IP information is a bit too short-term and is a distraction from the issue that really we just have very bad counter-vandalism tooling as soon as someone signs up. This is a problem today as well. I think it's worth focussing on that and exploring more the space of how we can empower patrollers and admins to do more with less.
@Krinkle - That's exactly what the Anti-Harassment Tools team is working on. You can read more about it at https://meta.wikimedia.org/wiki/IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation#Tools and https://meta.wikimedia.org/wiki/IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation/Improving_tools.
Instead of a cloak flag what if it was just a new user right (e.g. hide-ips) that could be assigned to any user group, like extended confirmed users? That way each wiki could tailor it to their specific anti-vandalism needs and capacities.
Dec 9 2020
I've updated the Developers/Maintainers list.
I approve as well in case it matters ;)
Note that this has been made into a proposal at the Community Wishlist Survey: https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2021/Wikidata/Link_Wikipedia_redirects_to_Wikidata_items
We’ve actually wanted to do this from the start, but have not pursued this because of the dual licensing.
The licensing is not an issue for any data that would be automatically filled in by UploadWizard. None of that data is copyrightable. Bots have been adding this data from the template content since at least January 2020 with no controversy.
Dec 7 2020
@Iniquity - If you expect anyone to work on this bug, you need to provide some specifics. When you say "The player's launch buttons are slightly not adapted to the accessibility recommendations" what are you actually talking about specifically?
Note that this is a cosmetic issue as the buttons are still clickable and functional.
@WDoranWMF @AMooney - This was being worked on by a volunteer a year ago, but never made it through code review and is now stalled. At the very least we are going to need CheckUser to use the actor table in order to move forward with IP masking. Is this something that Core Platform could pick up and carry across the finish line?
Dec 4 2020
@kchapman - This was being worked on by a volunteer a year ago, but never made it through code review and is now stalled. At the very least we are going to need CheckUser to use the actor table in order to move forward with IP masking. Is this something that Core Platform could pick up and carry across the finish line?
Dec 3 2020
I would strongly favor full extension of externalLinks.json via a page like Mediawiki:KartographerExternalLinks.js. There's nothing wrong with doing this, especially now that the MediaWiki namespace is locked down to a very small number of editors. Other MediaWiki extensions and services allow similar on-wiki configuration (e.g. WikiLove, PageCuration, Citoid, etc.).
Dec 2 2020
@sdkim - Would love to learn more about this task.
Note that the LandingCheck extension only does its own geolocation lookup as a fallback (if Geo.country was not passed to it within the link).
Nov 9 2020
@Xover - What would be the effect of just deleting all the caches? Tesseract has been upgraded since most of those caches were generated anyway.
Nov 4 2020
Just to give some context, AntiSpoof has two main use cases (which don't always align perfectly well):
Oct 28 2020
Yes, since this feature has already been implemented and deployed (although not to the bigger Wikipedias yet), I think we can close the RFC. See https://meta.wikimedia.org/wiki/Community_Tech/Watchlist_Expiry and Expiring-Watchlist-Items for further updates.
Oct 27 2020
@Ammarpad - Ah, that makes sense! Thanks for clearing up the mystery!
I wonder why you are phrasing this as a response to what I wrote? Personally I agree that increasing the length of the column is a bad idea.
I think we are misunderstanding each other. I was saying that using a hash would bloat the table as we would be storing 32 bytes per row rather than 6-16 bytes (in practice). Changing the column to varbinary(50) would have no effect on the size of the table or keys (assuming that none of our wikis have any plans to switch to collations with extra long names).
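To make the size argument concrete, here is a small sketch comparing the approximate per-value cost of storing a collation name directly against storing a 32-byte digest. Several things here are assumptions: the hash is taken to be a raw SHA-256 digest (the thread does not name one), the collation names are just illustrative examples in the 6-16 byte range, and the "length plus one prefix byte" rule is the usual rough accounting for a variable-length binary column with a declared maximum of 255 bytes or less.

```python
import hashlib

# Illustrative collation names only; not a complete or authoritative list.
EXAMPLE_COLLATIONS = [
    "uca-sv",
    "numeric",
    "identity",
    "uppercase",
    "uca-default",
    "uca-default-u-kn",
]


def varbinary_bytes(value: bytes) -> int:
    """Approximate storage for one value: the data plus a 1-byte length prefix."""
    return len(value) + 1


for name in EXAMPLE_COLLATIONS:
    raw = name.encode("utf-8")
    digest = hashlib.sha256(raw).digest()  # always 32 bytes
    print(f"{name}: name={varbinary_bytes(raw)} bytes, hash={varbinary_bytes(digest)} bytes")
```

Because the cost follows the actual value length rather than the declared maximum, widening the column to varbinary(50) leaves these numbers unchanged, while a fixed-size digest is larger for every name shown.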
@ssastry - Recently I discovered that the reason that pronunciation links on Wikipedia are backwards (the speaker icon links to the file page, while the text links to the actual audio) is because of this 12-year-old parser bug. It looks like the bug got integrated into Parsoid as well in order to match the Parser output. The purpose of the Media pseudo-namespace is to easily allow linking to files directly, but for some reason this was never implemented correctly for the link parameter in File links. @Bawolff came tantalizingly close to fixing this seven years ago, but sadly it was never followed up on. Just wanted to bring it to your attention so it could finally be properly triaged.
Oct 21 2020
@patilise - Because of the nature of the problem, we unfortunately can't share many details right now. We consider this issue high priority and are actively working to resolve it, but the problem has turned out to be more complicated than initially expected. We are still hoping to be able to re-enable the Score extension in safe mode once a couple more bugs are fixed and we have completed a security audit of Score and Lilypond. Unfortunately, some features will not work under safe mode, so even that will only be a partial solution. (Some of the features that are disabled under safe mode are listed in the description of T174413.)
Oct 20 2020
If we create them on write (i.e. when an actor ID needs to be inserted somewhere), the user will be detached from their contribution if the user agent does not persist the session (e.g. browsing with cookies disabled) without us being able to detect it beforehand and warn them.
I don't imagine that a warning would have much effect anyway. Any user worried about getting detached from their contributions would presumably create an account.
Oct 19 2020
No, there is no production/external API for Tesseract, and we would not want to call Toolforge from production. If Tesseract is needed, we'll need to request that Platform Engineering build a service for that. Since Platform Engineering already has a large backlog, we should make that request as soon as we are sure that Tesseract would be needed in production, as it could take a long time (a year or more) to get such a service up and running in production.
@ifried, @Samwilson, @aezell - I talked with Alexandros Kosiaris about how we could communicate with Google's OCR API from a production extension (similar to what Content Translation is already doing). He informed me that all you have to do is proxy the API requests through the HTTP proxy specified by $wgCopyUploadProxy. Thus it should be relatively easy to move Wikisource OCR into a MediaWiki extension, if we decide we want to do that.
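For readers unfamiliar with the general pattern, here is a language-neutral sketch in Python of routing outbound API requests through a configured HTTP proxy. This is not MediaWiki code: the proxy address is a hypothetical placeholder standing in for whatever $wgCopyUploadProxy is set to, and `build_proxied_opener` is a name invented for this example.

```python
import urllib.request

# Hypothetical proxy address, for illustration only; in production the value
# would come from configuration such as $wgCopyUploadProxy.
EXAMPLE_PROXY = "http://proxy.example.internal:8080"


def build_proxied_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(EXAMPLE_PROXY)
# Calling opener.open(some_api_url) would now go via the proxy.
# No request is made in this sketch.
```

The point is only that the calling code stays the same and the proxy is a configuration detail, which is why moving the OCR calls into an extension should be relatively straightforward.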
@akosiaris - I updated the documentation at Manual:$wgAllowCopyUploads. Feel free to tweak further.
@akosiaris - Thanks for that info! That's super helpful!
Oct 17 2020
The UX still works fine for me, including for audio files. I guess we'll have to agree to disagree.
Oct 16 2020
@akosiaris - Thanks for the reply and clearing up my misunderstandings. We have no need to keep the OCR service on Toolforge and would like to eventually move it into a MediaWiki extension (which would also make it easier for all the Wikisource projects to utilize). But it won't make any sense for us to do that unless there is an API proxy available in production that can communicate with the Google Vision API. What API proxy is cxserver.wikimedia.org utilizing to communicate with Google? Would it be possible for us to use that as well or have a similar proxy set up for this service?
I am not sure that T258622: Poor display of media on Special:NewFiles is personal opinion :)
Regardless of whether it's an opinion or not, it's purely a cosmetic issue. I don't think that should block deployment, personally.
Oct 14 2020
This was causing breakage on all the group1 wikis, so Dan rolled back the deployment. I imagine you may need to do a 2-step change to the CSS in order to accommodate server-cached HTML.
Oct 10 2020
See also T265187 (Commons search auto-suggest for "files depicting..." should filter out articles).
Oct 9 2020
@ST47 - Sorry I didn't notice this task until now. I've marked the patch as -2 as we explicitly chose not to use the Unicode confusables list in Equivset for several reasons. Most importantly, confusables.txt and Equivset have different purposes and are not intended to be utilized in the same way. Confusables.txt is intended to be used to see if two strings can be confused with each other, but it isn't intended to be used to create filter strings like we do in AbuseFilter, nor does it handle casefolding as Equivset does. Equivset's casefolding allows you to filter with a single string like "POOP" instead of "Pp|Oo|Oo|Pp" in AbuseFilter. It also results in significantly different mappings. For example, we don't map capital I to lowercase L even though they are confusable. Otherwise you would have to filter out the word "idiot" with "LDLOT" instead of "IDIOT" in AbuseFilter. Secondly, confusables.txt is much bigger than our list with lots of obscure symbols mapping to other obscure symbols which we don't actually care about (for the most part). For AbuseFilter, we have to run every character of every edit through the entire Equivset list, and the longer that list is the more of a performance hit it is on saving an edit.
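To illustrate the idea, here is a toy sketch of normalizing text to one canonical character per equivalence set, with casefolding built in, so that a single filter string can match many spellings. The mapping table below is invented for this example and is not the real Equivset data; `normalize` and `matches` are likewise hypothetical helper names, not AbuseFilter functions.

```python
# Toy mapping, made up for illustration; the real Equivset list differs.
TOY_EQUIVSET = {
    "0": "O",
    "\u041e": "O",  # Cyrillic capital O
    "\u043e": "O",  # Cyrillic small o
    "\u0420": "P",  # Cyrillic capital Er
    "\u0440": "P",  # Cyrillic small er
}


def normalize(text):
    """Map each character to its canonical form, uppercasing anything unmapped."""
    out = []
    for ch in text:
        mapped = TOY_EQUIVSET.get(ch)
        if mapped is None:
            mapped = ch.upper()  # casefolding step: 'p' and 'P' both become 'P'
        out.append(mapped)
    return "".join(out)


def matches(filter_string, text):
    """Check whether the normalized text contains the (already canonical) filter."""
    return filter_string in normalize(text)


print(normalize("p0\u043ep"))      # POOP
print(matches("IDIOT", "idiot"))   # True
```

Note that in this toy table, as in the comment above, I and L are kept distinct, so "idiot" normalizes to "IDIOT" and the filter string stays readable.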
Oct 2 2020
Since we're moving from OOUI to Vue, it seems likely that OOUI will be deprecated before jQueryUI is. At some point we may want to convert WikiLove to Vue, but converting it to OOUI seems like it would be counter-productive at the moment.
Sep 30 2020
@dmaza - Can you provide a summary of what ended up being deployed with the 1.35.0 release?
Sep 28 2020
@Mvolz - I would be fine with that, but there are a couple caveats:
- The refToolbar code isn't foolproof, and in fact, there is no foolproof way to handle the WorldCat author data since its formatting is just too inconsistent.
- It currently handles "Jr." and "Sr.", but not suffixes or prefixes that may exist in other languages.
If you want to use it, the most recent version of the code is at https://github.com/alexz-enwp/reftoolbar/blob/master/lookup.php#L139.
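For a sense of what the suffix caveat means in practice, here is a deliberately simple sketch. It is not a port of the refToolbar code linked above. It assumes names arrive in "First Middle Last" order with an optional trailing suffix, which WorldCat data does not guarantee, and `split_author` and `KNOWN_SUFFIXES` are names invented for this example.

```python
# Only the suffixes mentioned above; other languages would need more entries.
KNOWN_SUFFIXES = {"jr.", "sr."}


def split_author(name):
    """Split 'First Middle Last[, Jr.]' into first, last and suffix parts."""
    parts = name.replace(",", " ").split()
    suffix = ""
    if len(parts) > 1 and parts[-1].lower() in KNOWN_SUFFIXES:
        suffix = parts.pop()
    if not parts:
        return {"first": "", "last": "", "suffix": suffix}
    return {"first": " ".join(parts[:-1]), "last": parts[-1], "suffix": suffix}


print(split_author("Martin Luther King, Jr."))
print(split_author("Jane Doe"))
```

A name supplied as "King, Martin Luther" would be split incorrectly by this sketch, which is exactly the kind of inconsistency that makes the real data hard to handle reliably.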