- User Since
- Sep 16 2019, 11:47 AM (183 w, 13 h)
- LDAP User
- MediaWiki User
- So9q
Mon, Mar 13
Tue, Feb 21
Mon, Feb 20
I don’t need this task anymore as I found the real terminal and successfully edited the hidden file as I wanted.
Sun, Feb 19
Feb 14 2023
Thanks for working hard on this to get it sorted out and sharing the root cause analysis. :)
Jan 21 2023
I investigated this because I have trouble editing long lines in Firefox on iOS (iPhone 7).
I added a few fixes in my CSS (see https://www.wikidata.org/wiki/User:So9q/vector-2022.css), but I suggest switching to a textarea so the user can set a height and avoid scroll issues when editing lexemes on mobile.
Dec 2 2022
Nov 21 2022
Nov 12 2022
See the discussion here: https://github.com/openstreetmap/operations/issues/764#issuecomment-1312407828
Nov 11 2022
I had never heard of this project. Would it not be possible to move it to the Kubernetes cluster?
Oct 21 2022
Thanks for taking the time to submit this ticket. I support it.
Cradle is a good start, but it lacks many important features.
For example, it lacks the check for duplicates that is present e.g. in the Wikidata lexeme forms tool or through reconciliation in OpenRefine.
Jun 25 2022
Added Wikibase tags, as requested by Lydia.
Jun 22 2022
Any news on this? Is something hindering it from being triaged?
May 4 2022
I'm leaning towards forking pywikibot and removing the offending lines in bot.py causing the verbose log of files.
Apr 11 2022
It is happening again now.
Apr 8 2022
I would love to see this improved. I often get this generic error when an empty string is passed with a string property. I would also really like to know which property was related to the input error so I don't have to scan through tens of properties to find the anomaly manually.
Mar 31 2022
Mar 30 2022
Based on the discussion above I suggest closing this task.
Mar 18 2022
They look good. Thanks. I’m missing Wikibase-related infrastructure there though.
The Blazegraph issue is that WMDE and the search platform team are unsure whether the backend can handle any more triples without catastrophic failure. But it might not be possible to detect that anyway using graphs like these.
Mar 10 2022
Mar 9 2022
Feb 26 2022
@Hannah_Bast mentioned in the last WDQS scaling meeting that QLever could have 2 indexes to provide near-realtime queries. See https://github.com/ad-freiburg/qlever/wiki/QLever-support-for-SPARQL-1.1-Update
Feb 18 2022
Hangor and Ordia/Lexeme forms have this already. I use those to create lexemes because it is "safer" until this ticket is fixed. Unfortunately, neither Hangor nor Lexeme forms supports creating phrases or idioms yet.
Feb 13 2022
Related to https://phabricator.wikimedia.org/T260687. Maybe a duplicate?
Feb 10 2022
Is there a reason this issue has stalled?
Feb 8 2022
Terrible error messages like these push the user away. The system is unreliable: saving other statements works, sometimes. A system that cannot explain why it does not work as intended leads to bad UX.
FYI: I added https://www.wikidata.org/wiki/Q110853896 RDF Delta and Andy to Wikidata.
Jan 29 2022
I recommend using WikibaseIntegrator v0.12 instead (RC1 was recently released). It already supports most if not all of Wikibase and has nice APIs ;-)
See the notebooks here for a demonstration: https://github.com/LeMyst/WikibaseIntegrator/tree/rewrite-wbi/notebooks
The problem was in the query: everything was stuffed into one OPTIONAL clause.
Jan 28 2022
Here is their SPARQL evaluation strategy:
Actual Halyard Evaluation Strategy turns the previous model inside-out. I call it "PUSH Model". The SPARQL query is transformed into a chain (or tree) of pipes (Binding Set Pipe) and then it is asynchronously filled with data. An army of working threads periodically take requests with the highest priorities from the priority queue and perform them (usually by requesting the data from the underlying store and by processing them through the pipes). Each working thread can serve its own synchronous requests to the underlying storage system or process the data through the system almost independently of the others. There are two critical parts of the model implementation to make it really working. One hard part is synchronisation of the joints, where bad synchronisation leads to data corruption. And the second (with the same importance) is perfect balancing of the thread workers jobs. It was critical to design the system to do not let thread workers block each other. When most of the thread workers are blocked, it leads to the performance similar to the previous model. Halyard Strategy handles the worker threads jobs in a priority queue, where the priority is determined from the position in the parsed SPARQL query tree. Pipe iterations and active pumps are another methods to connect Halyard Strategy model with the original RDF4J API (or in some unfinished cases also with Iterations implemented in the original model).
For example your have a SPARQL query containing inner join. The request for data from left part of the join is enqueued with priority N. A worker thread that asynchronously delivers that data to the left pipe of the join also enqueues a request to receive relevant data from the right part of the join (with priority N+1). The higher priority of the right part here is very important to reflect the fact that once you get the right data, you can finish the join procedure and “reduce” the cached load and proceed down the pipes. However (based on the priority queue) the other worker threads can simultaneously prefetch more data for the left part of the join. In ideal situation you can see a continuous CPU load of all thread workers in a connected Java Profiler.
I should mention some numbers here. According to the experiments the Halyard Strategy has been approximately 250 times faster with 50 working threads and a SPARQL query containing 26 various joins. The effectivity of the Halyard Strategy is higher with more joins and unions. However feel free to compare my experimental measurements with your own. Both strategies can be individually selected for each Halyard repository. For an experiment you can set up two repositories (both pointing to the same data) with different SPARQL evaluation strategies.
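To make the priority idea in the quoted description a bit more concrete, here is a toy sketch of my own. It is not Halyard code and it is single-threaded, so it only shows the scheduling order, not the parallelism. The function name `run_join` and the dictionary standing in for the right side of the join are made up for the illustration. A request for a left row is enqueued with priority N, and serving it enqueues the matching right-side request with priority N+1, so the join for that row is finished before more left rows are processed.

```python
import heapq
import itertools


def run_join(left_rows, right_index, base_priority=0):
    """Toy, single-threaded simulation of priority-driven join scheduling.

    left_rows:   join keys coming from the left part of the join (hypothetical input)
    right_index: dict mapping a join key to its right-side matches (hypothetical input)
    Returns the order in which requests were served and the joined pairs.
    """
    counter = itertools.count()  # tie-breaker: FIFO order within one priority
    queue = []
    order = []
    results = []

    def enqueue(priority, kind, key):
        # heapq is a min-heap, so the priority is negated:
        # a larger priority number is served first.
        heapq.heappush(queue, (-priority, next(counter), kind, key))

    # All left-side requests start with priority N.
    for key in left_rows:
        enqueue(base_priority, "left", key)

    while queue:
        _, _, kind, key = heapq.heappop(queue)
        order.append((kind, key))
        if kind == "left":
            # Serving a left row enqueues the right-side request with priority N+1.
            enqueue(base_priority + 1, "right", key)
        else:
            # The right-side data arrived, so the join for this key can be finished.
            for match in right_index.get(key, []):
                results.append((key, match))
    return order, results


if __name__ == "__main__":
    served, joined = run_join(["a", "b"], {"a": [1], "b": [2, 3]})
    print(served)
    print(joined)
```

In the real system many worker threads pull from the queue at the same time, which is where the prefetching of left-side data described in the quote comes from.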
I researched this solution a little:
- Here is an overview of the mapping and explanation of the choice of hashing of triples and quads into columns and not storing anything in the value column. https://www.linkedin.com/pulse/inside-halyard-1-triples-keys-columns-values-upside-adam-sotona
Yes, this issue can be closed unless WMF wants to implement its own solution based on the linked paper.
Jan 25 2022
Jan 24 2022
@Lydia_Pintscher can you remember why this was declined back in 2015? What was the state of Jena back then compared to BlazeGraph?
I read the whole thread and just want to point out that Jena also supports SPARQL Update.
Jan 21 2022
Jan 19 2022
It did not. It still got killed and I more or less gave up on running this on k8s. I just tried running it on the bastion instead and got this:
I'm curious how this killing of processes is governed. On a shared webhost at Dreamhost I once had a job killed, but then there was a clear message, so I could see why it was killed (long-running wget jobs were not permitted).
Jan 16 2022
Lowered to saving every 1000 lines now. I hope that will solve the issue.
Still gets killed. Hm.
Still getting killed with 10000 as the limit. Hm.
Jan 15 2022
After lowering to saving every 15000 lines it still gets killed.
10000 was OK, so lowering to that.
Jan 14 2022
A job just got killed again.
This time I was extracting using this exact code: https://github.com/dpriskorn/WikidataMLSuggester/commit/8f411459cbf685852aea9a238719988e5ba0611e
Jan 13 2022
Note that this is low priority for me because I found a workaround: I simply output a pickle for every x lines. Afterwards I can easily join them all into one big dataframe.
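A minimal sketch of that workaround, assuming nothing about the actual extraction code linked above. The function names, the `chunk_*.pkl` file naming and the use of plain lists of dicts instead of dataframes are all made up for the illustration, so that it runs with only the standard library. With pandas, each chunk would be a DataFrame and the final join would be `pandas.concat`.

```python
import pickle
import tempfile
from pathlib import Path


def _dump(chunk, out_dir, index):
    """Write one chunk to a zero-padded, numbered pickle file (hypothetical naming)."""
    path = out_dir / f"chunk_{index:05d}.pkl"
    with path.open("wb") as handle:
        pickle.dump(chunk, handle)
    return path


def write_chunks(rows, out_dir, chunk_size):
    """Save rows in pickles of chunk_size rows each, so memory use stays bounded."""
    out_dir = Path(out_dir)
    paths = []
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            paths.append(_dump(chunk, out_dir, len(paths)))
            chunk = []
    if chunk:  # final, partially filled chunk
        paths.append(_dump(chunk, out_dir, len(paths)))
    return paths


def join_chunks(out_dir):
    """Load all chunk files in name order and join them into one list."""
    rows = []
    for path in sorted(Path(out_dir).glob("chunk_*.pkl")):
        with path.open("rb") as handle:
            rows.extend(pickle.load(handle))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = write_chunks(({"line": i} for i in range(25)), tmp, 10)
        print(len(files), "chunk files,", len(join_chunks(tmp)), "rows after joining")
```

If the process is killed halfway, the chunks already written stay usable, which is the point of the workaround.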
The process was killed while it was writing the output.
Evidence of the partial write of the output file: