User Details
- User Since
- Dec 30 2014, 5:46 PM (571 w, 6 d)
- Availability
- Available
- LDAP User
- WolfgangFahl
- MediaWiki User
- Seppl2013 [ Global Accounts ]
Sun, Dec 14
The second copy on my server fur is also running; the tools ran smoothly. Note that the size of the main file is not available remotely, so one has to fill in the value manually and rerun jnlget --cat.
bash ./jnlget --cat
✓ md5sums.txt exists
➜ reassembling downloads into 2025-12-07-wikidata-data.jnl.gz
⚠ 2025-12-07-wikidata-data.jnl.gz exists will not override
✓ reassembled 2025-12-07-wikidata-data.jnl.gz (438103694405 bytes)
➜ checking md5 for 2025-12-07-wikidata-data.jnl.gz of size 438103694405 to be ad0006a38103efd715c782a539a6f482
✓ 2025-12-07-wikidata-data.jnl.gz size and hash
@Harej this was just in time and much appreciated for https://www.wikidata.org/wiki/Wikidata:Scholia/Events/2025_12#scholia_and_snapquery
Sat, Dec 13
curl -H "User-Agent: TestBot/1.0 (test@example.com)" \
> "https://www.wikidata.org/w/api.php?action=query&list=recentchanges&format=json&rcprop=title|timestamp|ids&rclimit=1"
{"batchcomplete":"","continue":{"rccontinue":"20251213230327|2517185297","continue":"-||"},"query":{"recentchanges":[{"type":"edit","ns":0,"title":"Q8589960","pageid":8563160,"revid":2442092221,"old_revid":2390459973,"rcid":2517185298,"timestamp":"2025-12-13T23:03:27Z"}]}}
This works from inside the docker container.
see https://wiki.bitplan.com/index.php/Wikidata_Import_2025-12-13 for the full story.
We now have:
22:52:44.153 [main] ERROR org.wikidata.query.rdf.tool.Update - Error during updater run.
java.lang.RuntimeException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'Please': was expecting ('true', 'false' or 'null')
at [Source: (org.apache.http.conn.EofSensorInputStream); line: 1, column: 8]
at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.fetchRecentChanges(WikibaseRepository.java:237)
at org.wikidata.query.rdf.tool.change.RecentChangesPoller.doFetchRecentChanges(RecentChangesPoller.java:325)
at org.wikidata.query.rdf.tool.change.RecentChangesPoller.fetchRecentChanges(RecentChangesPoller.java:314)
at org.wikidata.query.rdf.tool.change.RecentChangesPoller.batch(RecentChangesPoller.java:338)
at org.wikidata.query.rdf.tool.change.RecentChangesPoller.firstBatch(RecentChangesPoller.java:162)
at org.wikidata.query.rdf.tool.change.RecentChangesPoller.firstBatch(RecentChangesPoller.java:38)
at org.wikidata.query.rdf.tool.Updater.run(Updater.java:152)
at org.wikidata.query.rdf.tool.Update.run(Update.java:174)
at org.wikidata.query.rdf.tool.Update.main(Update.java:98)
./jnlget --cat
✓ md5sums.txt exists
➜ reassembling downloads into 2025-12-07-wikidata-data.jnl.gz
57.7GiB 0:01:06 [ 110MiB/s] [ 895MiB/s] [======> ] 14% ETA 0:06:40
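The JsonParseException above suggests the API answered with a plain-text body starting with "Please ..." (likely a rate-limit or maintenance notice) instead of JSON. A minimal sketch, not the updater's actual code, of how a caller could surface the raw body instead of a parser stack trace:

```python
# Sketch: fail with the raw response body when the API does not return JSON.
# parse_api_body is a hypothetical helper, not part of the WDQS updater.
import json

def parse_api_body(body):
    """Return parsed JSON, or raise with the raw body for easier debugging."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        raise RuntimeError(f"API returned non-JSON response: {body[:80]!r}")
```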
@Addshore I am not sure - the speed has increased a lot this afternoon.
wf@wikidata:/hd/gamma/wikidata2025-12-11$ date;du -sm .
Sat Dec 13 04:57:26 PM CET 2025
12701 .
wf@wikidata:/hd/gamma/wikidata2025-12-11$ date;du -sm .
Sat Dec 13 04:59:24 PM CET 2025
16953 .
Unfortunately the problems we had in June show up again. The download is very unreliable and slow. So far I only managed to download some 15 chunks. Any ideas how I can get a higher download throughput than the current less than 1 MBit/s?
Thu, Dec 11
@Harej - thx - the download with 4 parallel threads is underway, currently at 6 MByte/s, ETA +18 h, tomorrow afternoon our time. Will report on the md5 reassembly. Do I need an update of https://github.com/scatter-llc/private-wikidata-query (which I already forked) to get the streaming update problem fixed?
Thu, Dec 4
sparqlquery -qp queries_tiny_wikidatasubsets.yaml -qn Timezones --limit 3
[
{
"timezone": "http://www.wikidata.org/entity/Q116908814",
"timezoneLabel": "Q116908814"
},
{
"timezone": "http://www.wikidata.org/entity/Q6913416",
"timezoneLabel": "Bagdad Time zone"
},
{
"timezone": "http://www.wikidata.org/entity/Q2280365",
"timezoneLabel": "Calcutta time"
}
]
Wed, Dec 3
Just asked for participants on the wikidata mailing list.
Tue, Nov 25
@dr0ptp4kt - even wget is IMHO not up to the task: for 1 TB+ file transfers you need a reliable line for multiple hours, for larger datasets potentially even days, and even a simple MD5 check already takes hours, even if source and target disks are SSDs. I will happily try things out if the file to be tested with is juicy enough. The three blazegraph journal files certainly are :-)
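On the MD5 cost: for files in the hundreds of GB range the check is I/O-bound no matter what, but streaming in large chunks keeps memory flat. A generic sketch (plain hashlib, not the jnlget tooling):

```python
# Sketch: stream a huge file through MD5 in 64 MiB chunks so memory stays
# constant regardless of file size. md5_of is an illustrative helper name.
import hashlib

def md5_of(path, chunk_size=64 * 1024 * 1024):
    """Return the hex MD5 digest of a file, read chunk by chunk."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```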
Sun, Nov 23
@Addshore Adam - I really would appreciate this. What do you imagine would be the transfer logistics?
Sat, Nov 22
Would be great if we get this before the legacy full graph is switched off.
Tue, Nov 18
@Krinkle thanks a ton - very swift resolving. I love it!
Mon, Nov 17
Sep 4 2025
Sep 3 2025
May 19 2025
May 7 2025
We could host this on Toolforge.
May 4 2025
May 3 2025
Query for ISBN items available in wikidata https://w.wiki/Dywk - currently around 3 million
- 📚 Public ISBN Databases
IMHO there is previous work at https://github.com/rdmpage/wikidata-bibliographic-data
Apr 19 2025
Still open and no pointer in the documentation. Anybody willing to work on this in Istanbul at the 2025 Hackathon?
Mar 5 2025
Unfortunately the feedback on this task seems to have had no effect and the Variables extension functionality has still not been restored.
What can be done to revert the current tradeoff decisions in favor of the end-user needs that have been described here?
Jan 11 2025
Still I am in limbo about what to do and what the next steps are. How can we make use of this environment?
Nov 15 2024
What is the next step here? I do not know what it means that this ticket has been worked on.
Oct 31 2024
@bking - thanks for the hint, I'll try it out on the upcoming imports.
Oct 24 2024
Do you know a tool that will track CPU and memory load of a server over time (on an Ubuntu server)?
Currently we have 3% memory usage of 512 GB and 300% CPU load (of 16 cores) while the indexing runs. Hannah Bast reported the memory needs are not at 128 GB any more these days, but I do not know the peak. If we want to go trial and error we could go with the current max and run things for a test.
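On the monitoring question above: I cannot confirm a specific tool from this thread, but on Ubuntu the sysstat package (sar) or atop are the usual suggestions for recording CPU and memory over time. As a minimal DIY sketch, periodically sampling /proc is enough; the function names here are illustrative, not from any of the projects mentioned:

```python
# Sketch: sample Linux load average and memory figures from /proc.
# Assumption: a Linux /proc filesystem; parse helpers are split out so the
# parsing logic is testable without a live system.
import time

def parse_loadavg(text):
    """Parse the first three fields of /proc/loadavg into floats."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def parse_meminfo(text):
    """Return MemTotal and MemAvailable (in kB) from /proc/meminfo content."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("MemTotal", "MemAvailable"):
            values[key] = int(rest.split()[0])
    return values

def sample(loadavg_path="/proc/loadavg", meminfo_path="/proc/meminfo"):
    """Take one timestamped load/memory sample; call in a loop to track."""
    with open(loadavg_path) as f:
        load = parse_loadavg(f.read())
    with open(meminfo_path) as f:
        mem = parse_meminfo(f.read())
    return time.time(), load, mem
```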
@Harej
thank you for pointing out your infrastructure availability, which is sort of already "priced in" in the calculation. We are seeking a third mirror environment sponsored by the Wikimedia Foundation. As for the technical aspects: https://wiki.bitplan.com/index.php/Wikidata_Import_2024-10-17
Oct 23 2024
It's 4 TB of SSD and 128 GB RAM for starters, which will give you a single indexable QLever instance. If you want multiple instances, e.g. QLever and blazegraph in parallel, the footprint for indexing gets bigger. The runtime requirements are lower, so it might be possible to just copy the index result over, but that would not be the idea of the game.
Sep 25 2024
Hurray! Can we make [[ https://github.com/WolfgangFahl/snapquery | snapquery ]] an official middleware infrastructure? I assume that would be a separate ticket again.
Sep 23 2024
The snapquery concept https://github.com/WolfgangFahl/snapquery with demo at https://snapquery.bitplan.com/ will happily support much larger queries and give them a proper persistent identifier consisting
of Domain/Namespace/Name, and accept meta info such as title, url and other details such as purpose or rating.
Sep 7 2024
Given the upcoming graph split this issue is getting even more urgent. https://github.com/WolfgangFahl/snapquery has been developed at the Wikimedia Hackathon May 2024 in Tallinn to mitigate the issue. The Wikimedia Foundation currently does not actively participate in this infrastructure but we hope this will happen soon.
May 1 2024
May 23 2021
Any action yet?
May 16 2021
One special case of this is publications of proceedings of scientific conferences. I intend to have a Wikidata backend for my Proceedings Title Parser at ptp.bitplan.com anyway, so I could add an "ISBN" input mode.
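An "ISBN" input mode would presumably want to validate check digits before querying. A self-contained sketch of the standard ISBN-13 checksum (the helper name is hypothetical, not part of ptp.bitplan.com):

```python
# Sketch: ISBN-13 check-digit validation (alternating 1/3 weights, mod 10).
# is_valid_isbn13 is an illustrative name, not an existing API.
def is_valid_isbn13(isbn):
    """True if the string contains 13 digits whose ISBN-13 checksum holds."""
    digits = [c for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0
```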
Apr 9 2021
thx for pointing this out - so my issue was linked to the wrong Phabricator task. Is there already one for the "zombie" case I am experiencing where I can't delete a page?
https://www.openresearch.org/wiki/Undeletable/EUROCRYPT_2017 is an example for the problem.
Mar 24 2021
@Xqt - thx for the reply and asking - I am fine with this schedule - thank you very much.
Mar 23 2021
Thx for the explanation. And no - testing the script right now with just config2.py changed will be tough, since the three CI pipelines I am using all pull directly from PyPI. For my local environment it would be sort of feasible, but I fear the installation routines would override my local changes, as they did when I was trying things out for almost a day recently. My current project priorities are also different - I need to make sure I don't have showstoppers, and this one isn't any more.
Looks like this avoids the exception.
I do not know how to use the review tool. I logged in but have no clue what to do. I tried edit/marked as reviewed.
I am happy with my workaround for the time being. I also intend to change my app to a web-based app and use py2app to create an app that only communicates with the web-based app. That way I'll avoid any similar problems. Still, to fix things upstream, the default should be not to throw an exception but to simply log the problem. So os.listdir might be wrapped in a try/except block. Only if configured to specifically show startup issues should there be an exception.
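The suggested upstream behavior could look roughly like this; scan_config_dir and the strict flag are hypothetical names, not pywikibot API:

```python
# Sketch of the proposed default: log startup scan problems instead of
# raising, and only raise when startup issues are explicitly requested.
import logging
import os

def scan_config_dir(path, strict=False):
    """List a config directory; log (or raise, if strict) when it fails."""
    try:
        return os.listdir(path)
    except OSError as e:
        if strict:  # caller explicitly asked to see startup issues
            raise
        logging.warning("skipping config dir %s: %s", path, e)
        return []
```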
Mar 22 2021
The sample
% python
Python 3.8.8 (default, Mar 18 2021, 06:01:57)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from wikibot import wikibot
>>>
@Xqt https://github.com/WolfgangFahl/py-3rdparty-mediawiki/issues/43 has the issue about pycrypto...
see https://github.com/WolfgangFahl/py-3rdparty-mediawiki/commit/8ab0bd846d191fa93cfd661c89e2bc769d69f06b for trying this workaround
If you then do a
import pywikibot
I still have to use the awkward workaround
import os
os.environ["PYWIKIBOT_NO_USER_CONFIG"] = "1"
import pywikibot
Mar 13 2021
With MediaWiki 1.35 the situation got worse. Now the code from my own GitHub repository https://github.com/WolfgangFahl/S5SlideShow doesn't work any more, and I didn't even follow up on the integration into the MediaWiki world yet. http://wiki.bitplan.com/index.php/S5SlideShow tries to describe the issues before the 1.35 LTS changes that broke the extension again.
Feb 14 2021
I am sorry - I did not record the message. Today I tried with HitCounters-REL1_35-52da1b7.tar.gz and the problem didn't show up.
Jan 20 2021
Happened e.g. today when trying to install extension HitCounters on MediaWiki 1.35.
Jan 18 2021
See https://cr.bitplan.com/index.php/Main_Page for the example data shown here.
@Yaron_Koren - I'll certainly try the explanation and add more examples to make my point. As far as I understand the form definition approach, the only way to handle multiple instances is currently explained in https://www.mediawiki.org/wiki/Extension:Page_Forms/Defining_forms#Multiple-instance_templates. For this to work, the context of these multiple instances needs to be known in advance. So if I have pages of type:
- Open Source Project
- Git Repository
- Internal Project
...
and I'd love to use the "commit" section in all of these, I'd need to repeat the definition in the forms.
Could partial forms be made to work by putting the definition into a standalone context - e.g. defining a form page as partial and having the form definition be picked up automatically when a fitting macro/WikiSon notation is found?
I have never tried https://www.mediawiki.org/wiki/Extension:Page_Forms/Defining_forms#Partial_forms - does this feature still work?
Before getting rid of Page Forms altogether I was thinking we could have some improvements that would lead to a "workaround" for a start.
Jan 17 2021
Hi, @Yaron_Koren - thank you for looking into this.
Jan 16 2021
@Xqt - thanks for trying again. I still have some projects based on pywikibot, although I am trying to avoid it and use mwclient more these days due to its limited usability in intranet situations.
Jan 12 2021
It's not clear to me how to open a "documentation" ticket from what https://www.mediawiki.org/wiki/How_to_report_a_bug has to offer. I searched for the keyword "documentation" on the page and couldn't find it. What should I search for? I have never created a documentation ticket before and don't know what it looks like.
I don't know whether this is a documentation problem or a bug. It's clearly an unexpected change in behavior that makes the system unusable without any warning I'd know of, e.g. in a migration guide. So there are two options:
- fix the behavior
- change the documentation
IMHO it's a valid question, and it might be a bug if there is no proper documentation answering it. I couldn't find the proper documentation that describes why the behavior changed and what the reason for this is.
The workaround is:
# make sure we don't use "pretty urls"
$wgUsePathInfo = false;
Dec 22 2020
Dec 19 2020
https://github.com/WolfgangFahl/py-3rdparty-mediawiki/issues/35 will have to wait for the release on PyPI, see https://pypi.org/project/pywikibot/
Dec 18 2020
Looks great. How do I get rid of the "family and mylang are not set ..." messages?
Dec 14 2020
Wouldn't this be a good opportunity to generally add a link to a FAQ Wiki per error-message? I feel that following a link per error-message gives a lot more options for discussing individual situations in which an error might appear, linking to related Phabricator tasks and so on.
Dec 13 2020
It happens when I try to change the protocol for accessing a wiki. What I try is to change the Apache settings:
<VirtualHost *:80>
    ServerName server.domain.com
    Redirect permanent / https://server.domain.com/
</VirtualHost>
<VirtualHost *:443>
    ....
    ServerName server.domain.com
</VirtualHost>
Dec 12 2020
Dec 5 2020
For the extensions https://www.mediawiki.org/wiki/Extension:Diagrams and https://www.mediawiki.org/wiki/Extension:Piwo the approach discussed here might be very useful.
Nov 21 2020
Same on my main wiki - it showed up after some 3 weeks of migration to 1.35 LTS - it is a showstopper! Why is this low priority if it makes the wiki completely unusable?
Nov 2 2020
Unfortunately https://www.mediawiki.org/wiki/API:Login is inconsistent. E.g. the Python example uses login, but the text says you shouldn't use it any more and should use clientLogin instead. I could not find sufficient information on the cookie issue either.
Nov 1 2020
E.g. how would a curl access with cookie work?
Opening up the wiki for public read access:
One of the issues seems to be that I don't know how to get rid of the "You need read permission to use this module." message on MediaWiki 1.35, so that things would at least work for my public wikis.
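For a wiki that is meant to be fully publicly readable, the commonly documented LocalSettings.php setting is the following (a sketch; whether it fits the access model here is an assumption):

```php
# LocalSettings.php - allow anonymous read access, which also lifts the
# "readapidenied" error for unauthenticated API requests.
$wgGroupPermissions['*']['read'] = true;
```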
curl -s "http://test.bitplan.com/api.php?action=query&meta=siteinfo&siprop=general|namespaces|namespacealiases|statistics&format=json" | jq .
{
"error": {
"code": "readapidenied",
"info": "You need read permission to use this module.",
"*": "See http://test.bitplan.com/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."
}
}
Oct 31 2020
If you try to debug the situation and click on "site" in the LoginManager, a sequence of API calls is initiated ... which makes it even harder to find out what is going on ...
If checking the version is the issue, please use the information from the family file (and trust it) instead of trying to find out via the MediaWiki API with an HTTP request before being logged in ...
There is a
WARNING: API error readapidenied: You need read permission to use this module.
warning with no details on which API call it is. Debugging reveals that a meta=siteinfo request is tried and, if that fails, a login is performed. This doesn't make any sense if a proper user/password combination is given. Why try to use the API at all - and how can this be switched off? Of course you can't use the API if you are not logged in ...
What is the proper procedure to tell pywikibot the password for login via the API? I am using
# needs patch as outlined in https://phabricator.wikimedia.org/T248471
# self.site.login(password=self.getPassword())
lm = LoginManager(password=self.getPassword(), site=self.site, user=self.user)
lm.login()
and I get a WARNING:
With pywikibot 5.0.0 I have to start all over again ... sigh ...
Oct 1 2020
The same secondary message might appear in other contexts. In my case it is:
Sep 30 2020
I can't see where the problem is.
I tried debugging line 595492 mentioned in the error message. See my Stack Overflow answer to "How to get a few lines from a .gz compressed file without uncompressing".
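The technique from that answer can be sketched with the stdlib alone (a generic sketch, not the answer's exact code): gzip decompresses as a stream, so you can stop at the wanted line without unpacking the whole file:

```python
# Sketch: read specific lines from a .gz file by streaming decompression.
# get_lines is an illustrative helper name.
import gzip
from itertools import islice

def get_lines(path, start, count=1):
    """Return `count` lines starting at 1-based line `start` of a .gz file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return list(islice(f, start - 1, start - 1 + count))
```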
