User Details
- User Since
- Mar 2 2015, 10:04 AM (477 w, 6 d)
- Availability
- Available
- IRC Nick
- MisterSynergy
- LDAP User
- MisterSynergy
- MediaWiki User
- MisterSynergy
Feb 6 2024
Oct 3 2023
Aug 27 2023
May 6 2023
Apr 9 2023
To add background to this request: we have recently had ~2400 deleted Wikidata items that were still used by SDC; many of these deletions took place years ago. It is very inconvenient to check whether an item is being used by SDC, so practically no Wikidata admin includes this check in their deletion routine. This check needs to be made much, much simpler.
Mar 24 2023
I do experience the same problem: I occasionally can't save notebooks any more after some time, although I can still run them. This started pretty much exactly when PAWS switched to JupyterLab. (My setup: latest Firefox on Windows 10.)
Mar 22 2023
Some remarks:
- We should consider these canonical HTTP URIs to be names in the first place, which are unique worldwide and issued by the Wikidata project as the "owner" [1] of the wikidata.org domain. The purpose of these names is to identify things.
- Following linked data principles, it is no coincidence that these names happen to be valid URIs. These are meant to be used to look up information about the named entity. It is okay to redirect a canonical URI to another location, including of course to a secure HTTPS location.
- Pretty much every external project (i.e. outside Wikimedia) that has aligned its content with Wikidata in the past 10+ years uses these canonical HTTP URIs. While the canonical HTTP URIs are not very present within Wikidata (but still relevant e.g. in WDQS and hardcoded in plenty of tools/bots), external usage is huge—not necessarily to look information up, but primarily to express identity with names issued by others for the same entity [2].
- To my understanding, HSTS can be used to secure all but the first request of a client (that supports HSTS).
- Canonical HTTP URIs are still widespread in many other linked data resources, since many projects started issuing them before everything transitioned to HTTPS. Some projects have since transitioned to canonical HTTPS URIs, however, with GND doing so in 2019 being a prominent example [3].
Feb 17 2023
The tool has just been migrated to k8s, and the cronjob on the grid engine has been deleted. Thanks for the new mariadb docker image.
Feb 1 2023
Nov 30 2022
Nov 28 2022
Nov 24 2022
Nov 23 2022
Thank you for the advice. I have tried to implement as much as my capabilities allow; I am not a frontend dev, so please verify :-)
Nov 20 2022
Nov 7 2022
I have already proposed a bot task that would deal with exactly such cases here: Wikidata:Requests for permissions/Bot/MsynBot 10.
Oct 25 2022
Thanks for looking into this. Some more context:
Oct 17 2022
Another status update:
Oct 13 2022
Tool maintainer here.
Sep 5 2022
Aug 22 2022
Can we please also quickly reset the counters to actual values?
Aug 10 2022
In order to keep things simple, I'd like to mention that the community will in any case operate a (daily?) bot that manages these badges:
- Add "sitelink to redirect" badge where it is missing
- Remove "sitelink to redirect" badge and "intentional sitelink to redirect" badge from sitelinks to non-redirects
Pages on client wikis can be turned into redirects (and vice versa) without any changes to Wikidata, so we need to keep updating these badges anyway. There is no way to keep this in sync using sitelink editing constraints on Wikidata alone.
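The badge-management rules above can be sketched as pure decision logic. This is a minimal sketch: the badge item IDs below are placeholders (not the real Wikidata badge items), and an actual bot would of course also talk to the Wikibase API.

```python
# Placeholder badge item IDs -- the real bot would use the actual
# Wikidata items for these badges.
REDIRECT_BADGE = "Q_REDIRECT_BADGE"          # "sitelink to redirect"
INTENTIONAL_BADGE = "Q_INTENTIONAL_BADGE"    # "intentional sitelink to redirect"

def updated_badges(current_badges: set, page_is_redirect: bool) -> set:
    """Return the badge set a daily sync bot would write for one sitelink."""
    badges = set(current_badges)
    if page_is_redirect:
        # Add the plain redirect badge where it is missing, unless the
        # sitelink is already marked as intentionally pointing to a redirect.
        if INTENTIONAL_BADGE not in badges:
            badges.add(REDIRECT_BADGE)
    else:
        # Target is not (or no longer) a redirect: both badges must go.
        badges.discard(REDIRECT_BADGE)
        badges.discard(INTENTIONAL_BADGE)
    return badges
```

The function is deliberately idempotent, so the bot can re-run it daily over all sitelinks without special-casing pages that flipped between redirect and article in the meantime.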
Jul 21 2022
Status update: the backlog of sitelinks to nonexistent pages is cleared, except for:
- Sitelinks to wikis that have been closed (their status is undetermined anyway; the number of cases is unknown)
- Sitelinks to Special pages, which appear nonexistent in some contexts but actually exist (these should not happen per guidelines, but there are currently ~1000 such sitelinks in Wikidata)
- Sitelinks to User pages where the user has a gendered namespace prefix on the client wiki; these pages appear nonexistent in some scenarios as well (~10 cases)
I do not plan to touch these at the moment.
Jul 12 2022
Status update: in the past days, I have removed deleted sitelinks for the "easy" cases where the reason is relatively obvious. This has reduced the number of open cases from ~60k to ~6k (i.e. a 90% reduction). Findings:
- Around 6k cases resulted from "move without redirect" scenarios on client wikis. This is much less than what I anticipated earlier, yet still a substantial amount.
- Around 40k cases resulted from scenarios where a user batch-deleted plenty of pages on the client wiki at a high rate, either using Special:Nuke or a custom deletion bot script. Since admins on client wikis usually enjoy noratelimit privileges on the client wiki but not on Wikidata, this causes ratelimit issues when removing the sitelinks from Wikidata items. Since this is by far the most important reason why a deleted page might remain as a sitelink on a Wikidata item, it might be valuable to consider optimizations for this scenario.
- Another 8k "deleted sitelinks" were not actually deleted; their namespaces were renamed (on srwikinews and lmowiki only). I have simply updated the sitelinks so that this is no longer an issue. There are more such cases waiting for a fix among the remaining 6k cases.
In the coming days, I will have a look at the remaining "deleted sitelinks" in order to fix them as well. I will also set up a regularly executed bot task to keep the backlog short.
Jul 8 2022
I don't think "User:Hoo bot" has much influence here, as this bot has not edited Wikidata since 2016-10. While many cases are a couple of years old, they are not *that* old in fact. As far as I am aware, nobody has taken care of this for a long time now (but I am determined to do so…)
@Manuel: I have looked into this again. As of now, I have this list of potential reasons for sitelink update failures:
Jun 22 2022
- I got a bot task approved that allows me to tidy up these sitelinks regularly (i.e. remove them from the item if the page does not exist on the client wiki). This can itself be considered a "dirty" solution to the problem, and clearly not the best one.
- However, it has not been executed yet due to a lack of time for Wikidata on my side in recent months.
- AFAIR, the main issue currently is that the evaluation workflow is quite demanding in terms of memory usage. While drafting the code on PAWS with its 3 GB memory limit, I offloaded parts of the evaluation for larger wikis to my local machine, which has sufficient memory available. For a fully automated deployment on Toolforge, this is of course not possible; there may even be stricter memory limits on Toolforge than on PAWS.
- Why does it need so much memory? My approach queries "all pages per client wiki" (from the client's page table) and "all sitelinks in Wikidata" (from Wikidata's wb_items_per_site table) into separate Pandas DataFrames and subsequently looks for differences using Python. In other words: I avoid checking millions of cases individually by sitelink and instead use a fairly quick per-client-wiki approach, which requires me to hold all information for a given client wiki in memory.
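The per-client-wiki comparison might look roughly like this. This is a sketch with made-up column names and toy data; the real input comes from the client's page table and Wikidata's wb_items_per_site table.

```python
import pandas as pd

# Toy stand-ins for the two real data sources:
# - all page titles that exist on one client wiki
# - all Wikidata sitelinks pointing at that wiki
client_pages = pd.DataFrame({"title": ["Alpha", "Beta", "Gamma"]})
sitelinks = pd.DataFrame({
    "item": ["Q1", "Q2", "Q3"],
    "title": ["Alpha", "Beta", "Delta"],  # "Delta" no longer exists on the wiki
})

# Left-anti join: keep only sitelinks whose target page is missing
# on the client wiki, instead of checking each sitelink individually.
merged = sitelinks.merge(client_pages, on="title", how="left", indicator=True)
dangling = merged[merged["_merge"] == "left_only"][["item", "title"]]
print(dangling.to_dict("records"))  # [{'item': 'Q3', 'title': 'Delta'}]
```

The memory cost comes from holding both full DataFrames for a wiki at once, which is exactly the trade-off described above: one bulk set operation per wiki instead of millions of per-sitelink lookups.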
May 16 2022
Mar 23 2022
Feb 19 2022
Dec 16 2021
I came across some of these cases and thought the situation could require some tidying, so I wrote a script which lists sitelinks to nonexistent client wiki pages in order to process them. Some patterns I noticed after closely looking at dewiki, ptwiki, and cawiki:
Dec 11 2021
Nov 16 2021
Good to see this problem being addressed. Some remarks:
- As far as I am aware, we do not fail the classification job completely. It is the P279/subclass-of hierarchy, which some refer to as the "Wikidata ontology", that is problematic: it is generic in topic, global in reach, and does not closely resemble any other ontology, so we cannot strictly build it on sources. I suggest limiting modifications to P279 claims.
- Main reasons for the poor P279 ontology, from daily Wikidata editing experience over several years:
- Requires high level of knowledge and experience. We leave editors pretty much alone to learn the necessary skills.
- Poor tooling; simple edits in the P279 hierarchy can have severe adverse effects that are difficult to project even for experienced users.
- Lack of awareness; editors often modify P279 claims to fix something else, such as a constraint violation in another item (it would be better to fix the item, leave the constraint violation for others to fix, or sometimes fix the constraint definition).
- Also: often there is not a clear "correct" or "incorrect" approach when classifying data items, and some situations are arguably not easy to resolve. This needs more community discussion and probably also an explicit definition of the term "Wikidata ontology", its purposes, and its design principles.
- In general, I think we should rather restrict the ability to add, modify, or remove P279 main values by introducing a new user group "ontologist" (or so). This would be similar to "property creator", which is another user group based on technical skills and experience in a certain field. The community could then elect or assign the right to interested, qualified users. My only concern is that this might not scale well.
Aug 29 2021
I know that the revtag table is definitely required, but I am not exactly sure about the other ones due to the incomplete documentation. I think all permanent/non-temporary content that is not sensitive should be accessible in the replicas, in order to allow maximum possibilities.
Aug 18 2021
This is more or less complete. Results:
I am still working on this task, but I think I am choosing a very different path than @Theklan did. I am basically doing an import from Olympedia with a Python crawler script.
Aug 15 2021
All 339 identifiers are matched now.
Aug 12 2021
Almost 4800 identifiers have meanwhile been added via this catalog; only a few currently remain:
This is a bit stalled at the moment since not all 339 events have a page at Olympedia yet. ~298 existing identifiers are currently matched to Wikidata items; some 41 are still waiting for page creation.
Aug 10 2021
More tasks:
- Add "Olympedia event ID" (P9055) to all of the 339 items of type "Olympic sporting event" for 2020. Related query: https://w.wiki/3qa4. (Only ~100 or so are still missing for 2020; this can be done manually.)
- Add participation data to athlete items using P1344 ("participant in"). Discussion is needed which qualifiers should be added there alongside. This would result in very useful infobox-able statements.
Aug 9 2021
The problem with P1532 ("country for sport") is that in some cases the required values are not of type "country". Think of the Refugee team, the ROC team (not Russia), or situations such as "Great Britain" which typically forms teams not exactly along country borders. There have been similar instances of non-country delegations at several past editions of the Olympics, and this happens at other international competitions as well. We are used to expect "national teams" that represent "countries", but exceptions are sort of the rule here.
The Olympedia mix'n'match catalog is available at https://mix-n-match.toolforge.org/#/catalog/4628. Particularly via https://mix-n-match.toolforge.org/#/list/4628/auto, users can review suggestions and add missing Olympedia identifiers to Wikidata items via "Confirm", or reject the suggestion via "Remove". There is plenty to do, and still many easy cases to process.
Aug 7 2021
Jul 4 2021
Jun 19 2021
Jun 12 2021
Jun 11 2021
May 18 2021
Continuation of the table above (numbers taken from the revision history of https://www.wikidata.org/wiki/User:MisterSynergy/itemstats):
May 4 2021
@DannyH and others: German Wikipedia uses "flagged revisions" on all pages; changes are only being displayed to readers if they have been flagged/reviewed by an experienced editor.
Apr 30 2021
Apr 25 2021
Apr 22 2021
The German Wikipedia community has not yet even discussed whether Wikidata descriptions should be dumped completely, or which milestone would be appropriate in case this should be done. Some remarks:
- In case this gets approved, there is a plan to add short descriptions from [[de:Vorlage:Personendaten]], which actually has systematic "short descriptions" for all biographies in German Wikipedia that would immediately qualify for use in SHORTDESC as well. With 870,000 transclusions, it alone would be sufficient to pass the proposed 850,000 requirement, leaving all non-biographies without descriptions.
- A couple of days ago, I queried the situation a bit. There were 2.56 million German Wikipedia articles (main namespace, no redirects), of which 2.13 million (83%) use a German description from Wikidata. This is considerably more than what we had three years ago for English Wikipedia.
You propose to ignore 1.3 million existing descriptions from Wikidata. This is way too much, particularly considering that quite a few editors have invested considerable time in adding Wikidata descriptions, given the way they have been used and exposed to readers over the past six years or so. It is also unclear at this point whether the desire to dump Wikidata descriptions entirely is as strong as it was in English Wikipedia three years ago.
Mar 30 2021
I read the announcement and I am pretty excited about the improvements. The query-preview servers do not seem to have the problem that I have reported here, but I am not sure right now whether you have reloaded the entities there as well.
Mar 29 2021
Mar 28 2021
Mar 5 2021
The bot job is on hold for a couple of days to see where this goes. If this ticket stalls, I will continue, since strictly speaking this is not a problem with my bot or its job.
Mar 3 2021
Week ending | Total skipped items | Weekly increase |
5 December 2020 | 8303905 | +1256459 |
12 December 2020 | 8351248 | +47343 |
19 December 2020 | 8380252 | +29004 |
26 December 2020 | 8420623 | +40371 |
2 January 2021 | 8431286 | +10663 |
9 January 2021 | 8459454 | +28168 |
16 January 2021 | 8473979 | +14525 |
23 January 2021 | 8487360 | +13381 |
30 January 2021 | 8505173 | +17813 |
6 February 2021 | 8514746 | +9573 |
13 February 2021 | 8524740 | +9994 |
20 February 2021 | 8535448 | +10708 |
27 February 2021 | 8542979 | +7531 |
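As a sanity check, the "weekly increase" column is simply the difference between consecutive totals, which is easy to verify with the first few rows of the table:

```python
# Totals taken from the first five rows of the table above.
totals = [8303905, 8351248, 8380252, 8420623, 8431286]

# Each weekly increase is the difference between consecutive totals.
increases = [b - a for a, b in zip(totals, totals[1:])]
print(increases)  # [47343, 29004, 40371, 10663]
```

The computed differences match the +47343, +29004, +40371, and +10663 entries reported above.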
Jan 13 2021
More: only 77 items have been created with Relator in the past 30 days (per the recentchanges table), 67 of them by User:Animalparty [1]. This user also recently created some items related to "Edwin M. Post" [2][3][4]. The other Relator users are User:Ayack (5 creations on Dec 21/22 and Jan 05), User:Miraclepine (1 creation on Dec 23), User:Andrew_Gray (1 creation on Jan 10), and User:Ldhank (3 creations on Jan 13). I think the tool is to blame here, but maybe you want to interview them to understand their workflow and ask about suspicious tool behavior they might have experienced.
Re. "Maggie Rogers": the request does not create an item since the input is not formatted correctly. If I throw this exact input to the API using pywikibot's editentity function [1], I get this error message: "WARNING: API error not-recognized-language: The supplied language code was not recognized." No idea whether this wastes a QID.
Jan 12 2021
@Lucas_Werkmeister_WMDE: We have not seen excessive phases of QID skipping since the week ending December 5. Since then we have been skipping around 10k–50k QIDs per week, which seems pretty "normal".
Dec 5 2020
New numbers from the past week, in order to keep the momentum here:
Nov 24 2020
Following my report in T44362#6638174, I looked into this a little more. From Wikidata's mediawiki database, I queried page creation times for the items created during the reported time period (14 Nov, 1:42 to 21 Nov, 1:42) and quickly plotted Q-ID vs. item creation timestamp.
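The preparation for such a plot can be sketched as follows. The rows here are toy data; the real values come from the page table, and the 14-digit MediaWiki timestamp format (YYYYMMDDHHMMSS) is assumed.

```python
from datetime import datetime

# Toy rows of (page_title, page creation timestamp), standing in for the
# query result from Wikidata's `page` table.
rows = [("Q101", "20201114014200"), ("Q105", "20201115120000")]

# Turn each row into a numeric (QID number, datetime) pair suitable
# for plotting Q-ID against item creation time.
points = [
    (int(title[1:]), datetime.strptime(ts, "%Y%m%d%H%M%S"))
    for title, ts in rows
]
print(points[0])  # (101, datetime.datetime(2020, 11, 14, 1, 42))
```

Plotting these pairs makes gaps in the QID sequence (skipped IDs) visible as vertical jumps in an otherwise near-linear curve.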
Nov 21 2020
Reminder that this is still a thing:
Nov 11 2020
Nov 3 2020
Sep 24 2020
Aug 3 2020
Jul 20 2020
Regarding the proposed solution:
- "flooders" should be treated the same as bots
- I would also like a way to limit the edit rates of all other unlimited users (sysops, apparently global rollbackers, ...). When I use my sysop account with QuickStatements and run a single batch, it can easily reach 400 or 500 edits per minute and I have no way to slow it down. That is not fair to users without such elevated rights.
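A self-imposed throttle for such batches could look like this. This is a hypothetical sketch of the pacing logic, not actual QuickStatements code.

```python
class EditThrottle:
    """Space out edits so a batch stays below a chosen edits-per-minute cap,
    even for accounts that are exempt from server-side rate limits."""

    def __init__(self, max_edits_per_minute: int):
        self.min_interval = 60.0 / max_edits_per_minute
        self.last_edit = None  # timestamp of the most recent edit, if any

    def wait_time(self, now: float) -> float:
        """Seconds to sleep before the next edit is allowed."""
        if self.last_edit is None:
            return 0.0
        return max(0.0, self.min_interval - (now - self.last_edit))

    def record(self, now: float) -> None:
        """Remember that an edit happened at `now`."""
        self.last_edit = now

throttle = EditThrottle(max_edits_per_minute=60)  # i.e. one edit per second
throttle.record(now=100.0)
print(round(throttle.wait_time(now=100.4), 3))  # 0.6
```

In a real batch runner, the loop would call `time.sleep(throttle.wait_time(time.time()))` before each API request and `throttle.record(time.time())` afterwards.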
Jun 7 2020
@Lydia_Pintscher: Can we please push this a little more? According to this Grafana chart, combined use of preferred and deprecated ranks went up significantly during the past year, from ~720,000 claims (June 2019) to ~15,000,000 claims (June 2020). My personal impression is that the use of ranks has been discussed much more controversially in recent months; for instance, many users prefer to remove deprecated claims because the deprecation is barely recognizable.
Feb 6 2020
Jan 20 2020
@Lea_Lacroix_WMDE @Lydia_Pintscher: Can you please have a look at this problem? I think it is really important that it does not get lost here.
Jan 12 2020
If an editor cannot edit an entity due to page protection, I think it would be best to replace all the "edit" links (in the terms box, all statement boxes, and the sitelinks box) on the protected entity page with something useful, such as a link to an appropriate page, or a tooltip explaining the situation and indicating the editor's options. Currently those "edit" links are simply missing if one is not allowed to edit the page, which massively breaks the usual workflow and makes it difficult to figure out what options one still has.
Jan 9 2020
Jan 7 2020
Dec 13 2019
I confirm: I have the very same problem here using the latest Firefox and Chrome, and it is very annoying.
Dec 10 2019
Just want to mention that not all users are affected:
Nov 27 2019
Nov 6 2019
There is also https://wikitech.wikimedia.org/wiki/Wikidata_query_service#Manually_updating_entities with a description about that shell script.
@Lydia_Pintscher: because you asked for this phab topic at https://www.wikidata.org/w/index.php?title=Wikidata:Contact_the_development_team&diff=1045986007&oldid=1045871780
Oct 14 2019
If we add those badges, we should remove the mechanism that prevents adding redirects as sitelinks (per 2017 redirect RfC). I can remember one situation in German Wikipedia where a user got in trouble because they disabled/re-enabled too many redirects to link them to Wikidata, and some community members including an admin considered that as vandalism that had to be stopped.
Aug 7 2019
In this edit, a user at German Wikipedia complains that the feature is visible again when using the mobile skin (minerva), and I can confirm that this is actually the case. This became apparent after CSS code was removed from MediaWiki:Mobile.css which set display:none; for .read-more-container (the container which contained the tool output).
- Is this intentional, i.e. should the feature be enabled in mobile view in German Wikipedia?
- Was the feature activated before the misconfiguration was initially deployed?
Aug 2 2019
German Wikipedia user here who wants to use the feature.
Jul 18 2019
Same as T224669 if I understand correctly.
Jun 14 2019
Jun 4 2019
May 30 2019
May 8 2019
Thanks, but T216605 is access restricted and I cannot see any of its content. I feel pretty lost with this problem.