
So9q (Dennis Priskorn)
User

User Details

User Since
Sep 16 2019, 11:47 AM (145 w, 2 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
So9q [ Global Accounts ]

Recent Activity

Sat, Jun 25

So9q added a comment to T35470: Create API for mass deleting pages (aka Special:Nuke).

Added Wikibase tags, as requested by Lydia.

Sat, Jun 25, 11:09 AM · Wikibase and Wikidata Architecture Overview, Wikibase (3rd party installations), MediaWiki-Action-API, MediaWiki-extensions-Nuke
So9q added projects to T35470: Create API for mass deleting pages (aka Special:Nuke): Wikibase (3rd party installations), Wikibase and Wikidata Architecture Overview.
Sat, Jun 25, 11:08 AM · Wikibase and Wikidata Architecture Overview, Wikibase (3rd party installations), MediaWiki-Action-API, MediaWiki-extensions-Nuke
So9q added a comment to T35470: Create API for mass deleting pages (aka Special:Nuke).

I am loading a large number of entities into a local Wikibase installation and I often need to reprocess the dataset.
A batch delete function to mass-delete the entities inserted by a bot user would be useful.
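Such an API does not exist yet (that is the point of this task); as a rough illustration of the workflow it would replace, here is a sketch that mass-deletes pages created by a bot user through the standard action API. The endpoint URL, account names, and password are placeholders, and the account needs delete rights:

import requests

# Placeholders: a local Wikibase action API endpoint and a bot account
# with delete rights (use a bot password for the login).
WIKI_API = "http://localhost:8181/w/api.php"
session = requests.Session()

# 1. Log in with a login token.
login_token = session.get(WIKI_API, params={
    "action": "query", "meta": "tokens", "type": "login", "format": "json",
}).json()["query"]["tokens"]["logintoken"]
session.post(WIKI_API, data={
    "action": "login", "lgname": "BotUser@batch", "lgpassword": "...",
    "lgtoken": login_token, "format": "json",
})

# 2. List the pages the bot user created.
contribs = session.get(WIKI_API, params={
    "action": "query", "list": "usercontribs", "ucuser": "BotUser",
    "ucshow": "new", "uclimit": "max", "format": "json",
}).json()["query"]["usercontribs"]

# 3. Delete them one by one with a CSRF token (one API call per page,
#    which is exactly the slow part a mass-delete API would replace).
csrf_token = session.get(WIKI_API, params={
    "action": "query", "meta": "tokens", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]
for contrib in contribs:
    session.post(WIKI_API, data={
        "action": "delete", "title": contrib["title"],
        "reason": "Batch delete of bot-inserted data",
        "token": csrf_token, "format": "json",
    })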

Sat, Jun 25, 8:35 AM · Wikibase and Wikidata Architecture Overview, Wikibase (3rd party installations), MediaWiki-Action-API, MediaWiki-extensions-Nuke

Wed, Jun 22

So9q added a comment to T287164: Improve bulk import via API.

Any news on this? Is something hindering it from being triaged?

Wed, Jun 22, 9:02 PM · Wikidata, wdwb-tech, Wikibase (3rd party installations)
So9q awarded T287164: Improve bulk import via API a Like token.
Wed, Jun 22, 9:01 PM · Wikidata, wdwb-tech, Wikibase (3rd party installations)

May 4 2022

So9q added a comment to T272088: Logging (pywiki module) always verbose if enabled.

I'm leaning towards forking pywikibot and removing the offending lines in bot.py that cause the verbose logging of files.

May 4 2022, 10:16 PM · Pywikibot
So9q added a comment to T272088: Logging (pywiki module) always verbose if enabled.

I see. The 'pywiki' logger will be initialized to level 11 and there is no easy way to change that. As a workaround you can modify the logger after it has been initialized, e.g.

import logging
import pywikibot
pywikibot.output('This will initialize the logger')
logger = logging.getLogger('pywiki')
logger.setLevel(logging.WARNING)
pywikibot.output('This message will not be logged anymore')
pywikibot.log('Also verbose logs are hidden')
pywikibot.warning('Warnings are still logged')
pywikibot.error('Errors are logged too.')

This workaround did not work for me :/

I commented out the whole content of bot.py: writelogheader() and that did the trick!
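An alternative to editing the installed file is monkey-patching; a minimal sketch, assuming writelogheader is still the function emitting the verbose header in your pywikibot version, and that the patch runs before the first output call initializes logging:

import pywikibot.bot

# Neutralize the function that writes the verbose log header before the
# first pywikibot.output() call initializes the log handlers.
pywikibot.bot.writelogheader = lambda *args, **kwargs: None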

@So9q, starting to work on this item now; by "did not work" do you mean you still see the ~300 lines of verbose log output? What platform are you running on?

May 4 2022, 10:10 PM · Pywikibot

Apr 11 2022

aborrero awarded T299039: All started jobs failed on Kubernetes during 24h with no visible error or output a Like token.
Apr 11 2022, 11:08 AM · Toolforge, Kubernetes
So9q closed T299039: All started jobs failed on Kubernetes during 24h with no visible error or output, a subtask of T285944: Toolforge: beta phase for the new jobs framework, as Resolved.
Apr 11 2022, 10:04 AM · Toolforge Jobs framework, cloud-services-team (Kanban)
So9q closed T299039: All started jobs failed on Kubernetes during 24h with no visible error or output as Resolved.

It now works again.

Apr 11 2022, 10:04 AM · Toolforge, Kubernetes

Apr 8 2022

So9q added a comment to T304943: Make Wikibase error message "Malformed input" more meaningful.

I would love to see this improved. I often get this generic error when an empty string is passed for a string property. I would also really like to know which property was related to the input error, so I don't have to manually scan through tens of properties to find the anomaly.

Apr 8 2022, 5:26 AM · Wikidata, Wikidata-Campsite, good first task

Mar 31 2022

So9q created T305117: UI usability bug when swiping right.
Mar 31 2022, 4:54 AM · Wikipedia-iOS-App-Backlog, iOS-app-Bugs

Mar 30 2022

So9q added a comment to T289561: Evaluate Apache Rya as alternative to Blazegraph.

Based on the discussion above I suggest closing this task.

Mar 30 2022, 4:44 AM · Wikidata, Wikidata-Query-Service

Mar 18 2022

So9q renamed T301635: Update the infrastructure diagrams detailing the interactions of the WMF wikibase and blazegraph stack from Create/publish PlanUML diagrams detailing the interactions of the WMF tech stack to Update the infrastructure diagrams detailing the interactions of the WMF wikibase and blazegraph stack.
Mar 18 2022, 10:06 AM · WMF-Architecture-Team
So9q added a comment to T301635: Update the infrastructure diagrams detailing the interactions of the WMF wikibase and blazegraph stack.

They look good, thanks. I'm missing the Wikibase-related infrastructure there, though.
The Blazegraph issue is the fact that WMDE and the search platform team are unsure if the backend can handle any more triples without catastrophic failure. But it might not be possible to detect that anyway using graphs like these.

Mar 18 2022, 10:03 AM · WMF-Architecture-Team

Mar 10 2022

So9q awarded T303488: Add a Javascript function to create a new portlet section similar to mw.util.addPortletLink a Like token.
Mar 10 2022, 7:17 AM · MediaWiki-General, JavaScript

Mar 9 2022

So9q created T303431: English Wikipedia Rest documentation does not show up.
Mar 9 2022, 5:52 PM · RESTBase-API

Feb 26 2022

So9q added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

@Hannah_Bast informed us in the last WDQS scaling meeting that QLever could have 2 indexes to provide near-realtime queries. See https://github.com/ad-freiburg/qlever/wiki/QLever-support-for-SPARQL-1.1-Update

Feb 26 2022, 12:21 PM · Wikidata, Wikidata-Query-Service

Feb 18 2022

So9q added a comment to T195469: Warning to avoid creating duplicates of Lexemes.

Hangor and Ordia/Lexeme forms have this already. I use those to create lexemes because it is "safer" until this ticket is fixed. Unfortunately, neither Hangor nor Lexeme forms support creating phrases or idioms yet.

Feb 18 2022, 6:12 AM · Wikidata Lexicographical data, Wikidata

Feb 13 2022

So9q created T301635: Update the infrastructure diagrams detailing the interactions of the WMF wikibase and blazegraph stack.
Feb 13 2022, 2:06 PM · WMF-Architecture-Team
So9q added a comment to T301227: Create RDF dataset for testing alternatives to Blazegraph.

Related to https://phabricator.wikimedia.org/T260687; maybe a duplicate?

Feb 13 2022, 2:03 PM · Wikidata

Feb 10 2022

So9q added a comment to T199197: [2.11] Integrate Citoid in Wikidata.

Is there a reason this issue has stalled?

Feb 10 2022, 8:28 PM · Knowledge-Integrity, Patch-For-Review, Citoid, WMF-Legal, Wikidata, Epic

Feb 8 2022

So9q renamed T301227: Create RDF dataset for testing alternatives to Blazegraph from Create test RDF dataset for evaluating alternatives to Blazegraph to Create RDF dataset for testing alternatives to Blazegraph.
Feb 8 2022, 2:49 PM · Wikidata
So9q added a comment to T301243: Wikibase Bug: Unclear error message "save has failed".

For wbstack I tracked this in https://github.com/wbstack/private/issues/4
Quoting from there...

Wiki: https://kbtestwikibase.wiki.opencura.com/wiki/Main_Page Reported by: Olaf

Adam, I set up a Wikibase via wbstack.com. First I created a property P1: Same as, of type URL. Then I created P2: Instance of. Then I wanted to add a P1 statement to P2 (same as P31 on Wikidata) and give it the value 'https://www.wikidata.org/entity/P31'. When I try to save that URL, it gives me an error message "Failed to Save". Any ideas what might be wrong?

Trying to add a value to a P1 statement on https://kbtestwikibase.wiki.opencura.com/wiki/Property:P2 of https://www.wikidata.org/entity/P31

API Response:

{
  "error": {
    "code": "failed-save",
    "info": "The save has failed.",
    "messages": [
      {
        "name": "wikibase-api-failed-save",
        "parameters": [],
        "html": {
          "*": "The save has failed."
        }
      }
    ],
    "*": "See https://kbtestwikibase.wiki.opencura.com/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."
  }
}

image.png (859×1 px, 318 KB)

Looking at the code paths it is probably coming from

https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/repo/includes/Api/EntitySavingHelper.php#L419

then hitting

https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/repo/includes/Api/EntitySavingHelper.php#L445

which ends up with the user.

So we are hitting EditEntity::ANY_ERROR and EditEntity is what is erroring.

Looking at EditEntity, that means it could be any one of these types of error (as these are the only time an error is set)

image.png (301×457 px, 23 KB)

Generally in the status returned the value of errorFlags seems to be set to something that would help us determine what is happening.

This status does make it into ApiErrorReporter::dieStatus, but it doesn't look like the value itself is returned? Could file an upstream issue for this...

And finally with some debugging...

After the above debugging I can get a log of:

[info] [WBSTACK] Wikibase\Repo\Api\EntitySavingHelper::handleStatus: {"errorFlags":32}

This relates to:

	/**
	 * Indicates that the content triggered an edit filter that uses
	 * the EditFilterMergedContent hook to supervise edits.
	 */
	/* public */ const FILTERED = 32;

FILTERED will be returned as a result of editFilterHookRunner hooks failing
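For quick log triage, errorFlags is a bitmask over EditEntity's error constants; a hypothetical helper (only FILTERED = 32 is confirmed above; the remaining bit values would need to be copied from Wikibase's EditEntity class) could name the set bits:

# Hypothetical helper for triaging logged errorFlags values such as
# {"errorFlags":32}. Only FILTERED (32) is confirmed above.
EDIT_ENTITY_FLAGS = {
    32: "FILTERED",  # edit stopped by an EditFilterMergedContent hook
}

def decode_error_flags(flags: int) -> list[str]:
    """Name every known bit set in the errorFlags bitmask."""
    names = [name for bit, name in EDIT_ENTITY_FLAGS.items() if flags & bit]
    return names or [f"unknown flags: {flags:#b}"]

print(decode_error_flags(32))  # ['FILTERED']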

Looking at kbtestwikibase I see that both I (when testing) and the user who reported the issue are not admins.
When I was testing on a second site I was using the default created admin user, and it was working.
On https://addshore-alpha.wiki.opencura.com/wiki/Item:Q1 I created a new user and then encountered the issue.

To investigate:

  • Which extension filter actually caused this?
  • What is the desired default behaviour here for site owners / users?

And then probably do one of:
1 - Improve the upstream error, so that the user knows what is going on
2 - Do not stop new users from adding links?

Feb 8 2022, 2:46 PM · wbstack, Wikibase (3rd party installations)
So9q updated the task description for T301244: Wbstack: URL property does not work.
Feb 8 2022, 1:31 PM · wbstack
So9q created T301244: Wbstack: URL property does not work.
Feb 8 2022, 1:30 PM · wbstack
So9q added a project to T301243: Wikibase Bug: Unclear error message "save has failed": wbstack.
Feb 8 2022, 1:22 PM · wbstack, Wikibase (3rd party installations)
So9q added a comment to T301243: Wikibase Bug: Unclear error message "save has failed".

Terrible error messages like these push the user away. The system feels unreliable: saving other statements works, sometimes. A system that cannot explain why it does not work as intended leads to bad UX.

Feb 8 2022, 1:18 PM · wbstack, Wikibase (3rd party installations)
So9q created T301243: Wikibase Bug: Unclear error message "save has failed".
Feb 8 2022, 1:17 PM · wbstack, Wikibase (3rd party installations)
So9q added a comment to T27909: Add a drop-down list for the tags in Special:Newpages, Special:Log and Special:Contributions.

Okay, so here's yet another attempt at this. I tried to re-use as much code as possible from the fancy dropdown in RCFilters, which will hopefully make this easier to review and approve than the past attempts.

Nice, definitely an improvement! This could be one of the earliest completed wishes of this year's survey.

Feb 8 2022, 12:31 PM · Community-Wishlist-Survey-2022, MW-1.38-notes (1.38.0-wmf.23; 2022-02-21), Growth-Team-Filtering, Platform Team Workboards (External Code Reviews), User-notice, Growth-Team, MediaWiki-Change-tagging
So9q added a comment to T290240: Evaluate whether RDF Delta is a good idea to have in the backend.
Feb 8 2022, 10:01 AM · Wikidata, Wikidata-Query-Service
So9q added a comment to T290240: Evaluate whether RDF Delta is a good idea to have in the backend.

We should probably start with a problem we're trying to solve. What would that be for this one?

Good idea.

So, as a data consumer I want to know which triples have changed between two dumps from Wikidata.

As an enterprise company I want to replicate Wikidata's triple store in-house and therefore consume the RDF Delta to run queries on our own infrastructure.
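As a naive illustration of the first use case (not RDF Delta itself): if both dumps are canonical N-Triples with one statement per line, the changed triples are a plain set difference. A sketch, assuming hypothetical file names and dumps that fit in memory:

def load_triples(path: str) -> set[str]:
    """One N-Triples statement per line; blank lines and comments skipped."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f
                if line.strip() and not line.startswith("#")}

old_dump = load_triples("wikidata-old.nt")  # hypothetical file names
new_dump = load_triples("wikidata-new.nt")

added = new_dump - old_dump
removed = old_dump - new_dump
print(f"{len(added)} triples added, {len(removed)} triples removed")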

Feb 8 2022, 9:54 AM · Wikidata, Wikidata-Query-Service
So9q added a subtask for T206560: [Epic] Evaluate alternatives to Blazegraph: T301227: Create RDF dataset for testing alternatives to Blazegraph.
Feb 8 2022, 9:50 AM · Discovery-Search (Current work), MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
So9q added a parent task for T301227: Create RDF dataset for testing alternatives to Blazegraph: T206560: [Epic] Evaluate alternatives to Blazegraph.
Feb 8 2022, 9:50 AM · Wikidata
So9q created T301227: Create RDF dataset for testing alternatives to Blazegraph.
Feb 8 2022, 9:50 AM · Wikidata
So9q updated subscribers of T299460: Evaluate the Apache Jena Framework.

@So9q: How would you like to serve everything from one place? It is normal to have replicas of data. One of the big bottlenecks is I/O. Or do I misunderstand something?

Feb 8 2022, 9:41 AM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
So9q added a comment to T299460: Evaluate the Apache Jena Framework.

FYI: I added RDF Delta (https://www.wikidata.org/wiki/Q110853896) and Andy to Wikidata.

Feb 8 2022, 9:31 AM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
So9q renamed T299460: Evaluate the Apache Jena Framework from Evaluate Apache Jena to Evaluate the Apache Jena Framework.
Feb 8 2022, 9:29 AM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
So9q updated the task description for T301220: Create new lexeme missing on mobile.
Feb 8 2022, 8:09 AM · Wikidata, Wikidata Mobile
So9q created T301220: Create new lexeme missing on mobile.
Feb 8 2022, 8:08 AM · Wikidata, Wikidata Mobile

Jan 29 2022

So9q created T300445: Remove the custom logging levels.
Jan 29 2022, 9:41 PM · Pywikibot
So9q updated subscribers of T222608: Should Wikidata Integrator and Pywikibot merge?.

I recommend using WikibaseIntegrator v0.12 instead (RC1 was recently released). It already supports most if not all of Wikibase and has nice APIs ;-)
See the notebooks here for a demonstration: https://github.com/LeMyst/WikibaseIntegrator/tree/rewrite-wbi/notebooks
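For context, a minimal sketch of item creation with the v0.12 rewrite, adapted from the style of the linked notebooks; the API URL, credentials, and property number are placeholders, and names may have shifted between release candidates:

from wikibaseintegrator import WikibaseIntegrator, wbi_login
from wikibaseintegrator.wbi_config import config as wbi_config
from wikibaseintegrator.datatypes import String

# Placeholders: point these at your own Wikibase and bot account.
wbi_config['MEDIAWIKI_API_URL'] = 'https://example-wikibase.example/w/api.php'
wbi_config['USER_AGENT'] = 'wbi-demo/0.1 (placeholder contact)'

login = wbi_login.Login(user='BotUser', password='...')
wbi = WikibaseIntegrator(login=login)

# Create a new item with an English label and one string statement.
item = wbi.item.new()
item.labels.set(language='en', value='New item written by WikibaseIntegrator')
item.claims.add(String(prop_nr='P1', value='hello world'))  # P1 is a placeholder
item.write(summary='WikibaseIntegrator v0.12 demo edit')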

Jan 29 2022, 9:38 PM · Pywikibot-Wikidata, Pywikibot, Wikimedia-Hackathon-2019
So9q added a comment to T272088: Logging (pywiki module) always verbose if enabled.

I see. The 'pywiki' logger will be initialized to level 11 and there is no easy way to change that. As a workaround you can modify the logger after it has been initialized, e.g.

import logging
import pywikibot
pywikibot.output('This will initialize the logger')
logger = logging.getLogger('pywiki')
logger.setLevel(logging.WARNING)
pywikibot.output('This message will not be logged anymore')
pywikibot.log('Also verbose logs are hidden')
pywikibot.warning('Warnings are still logged')
pywikibot.error('Errors are logged too.')
Jan 29 2022, 8:43 PM · Pywikibot
So9q awarded T272088: Logging (pywiki module) always verbose if enabled a Like token.
Jan 29 2022, 8:30 PM · Pywikibot
So9q added a comment to T300432: WDQS does not return all the descriptions.

The problem was in the query: stuffing everything into one OPTIONAL clause.
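To illustrate the pitfall: patterns inside a single OPTIONAL block must all match for the block to bind, so an item with a label but no description loses both. Splitting into one OPTIONAL per variable lets each bind independently; a sketch against the public WDQS endpoint:

import requests

# One OPTIONAL per variable: ?label and ?description bind independently,
# so items that have a label but no English description are still returned.
QUERY = """
SELECT ?item ?label ?description WHERE {
  ?item wdt:P31 wd:Q5 .
  OPTIONAL { ?item rdfs:label ?label . FILTER(LANG(?label) = "en") }
  OPTIONAL { ?item schema:description ?description . FILTER(LANG(?description) = "en") }
}
LIMIT 10
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "optional-clause-demo/0.1 (placeholder contact)"},
)
for row in response.json()["results"]["bindings"]:
    print(row["item"]["value"], row.get("description", {}).get("value", "(none)"))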

Jan 29 2022, 1:38 PM · Wikidata, Wikidata-Query-Service
So9q closed T300432: WDQS does not return all the descriptions as Invalid.
Jan 29 2022, 1:38 PM · Wikidata, Wikidata-Query-Service
So9q updated the task description for T300432: WDQS does not return all the descriptions.
Jan 29 2022, 1:31 PM · Wikidata, Wikidata-Query-Service
So9q renamed T300432: WDQS does not return all the descriptions from WDQS label service does not return all descriptions to WDQS does not return all the descriptions.
Jan 29 2022, 1:28 PM · Wikidata, Wikidata-Query-Service
So9q created T300432: WDQS does not return all the descriptions.
Jan 29 2022, 1:26 PM · Wikidata, Wikidata-Query-Service

Jan 28 2022

So9q added a comment to T289621: Evaluate Halyard as alternative to Blazegraph.

That's a bit disappointing b/c it does look like it can scale and has been run through some paces. https://www.linkedin.com/pulse/halyard-tipstricks-trillion-statements-challenge-adam-sotona/

Jan 28 2022, 7:02 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
So9q added a comment to T289621: Evaluate Halyard as alternative to Blazegraph.

That's a bit disappointing b/c it does look like it can scale and has been run through some paces. https://www.linkedin.com/pulse/halyard-tipstricks-trillion-statements-challenge-adam-sotona/

I'm actually trying to get this to compile with the latest versions, but a few things have changed since then, so it's a bit of a slog.

Jan 28 2022, 6:54 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
So9q updated subscribers of T289621: Evaluate Halyard as alternative to Blazegraph.

Here is their SPARQL evaluation strategy:

Actual Halyard Evaluation Strategy turns the previous model inside-out. I call it the "PUSH Model". The SPARQL query is transformed into a chain (or tree) of pipes (Binding Set Pipe) and then it is asynchronously filled with data. An army of worker threads periodically take the requests with the highest priorities from the priority queue and perform them (usually by requesting the data from the underlying store and by processing them through the pipes). Each worker thread can serve its own synchronous requests to the underlying storage system or process the data through the system almost independently of the others. There are two critical parts of the model implementation to make it really work. One hard part is synchronisation of the joints, where bad synchronisation leads to data corruption. And the second (of the same importance) is perfect balancing of the worker threads' jobs. It was critical to design the system so as not to let worker threads block each other. When most of the worker threads are blocked, it leads to performance similar to the previous model. Halyard Strategy handles the worker threads' jobs in a priority queue, where the priority is determined from the position in the parsed SPARQL query tree. Pipe iterations and active pumps are other methods to connect the Halyard Strategy model with the original RDF4J API (or in some unfinished cases also with Iterations implemented in the original model).

For example, you have a SPARQL query containing an inner join. The request for data from the left part of the join is enqueued with priority N. A worker thread that asynchronously delivers that data to the left pipe of the join also enqueues a request to receive the relevant data from the right part of the join (with priority N+1). The higher priority of the right part here is very important to reflect the fact that once you get the right data, you can finish the join procedure, "reduce" the cached load and proceed down the pipes. However (based on the priority queue) the other worker threads can simultaneously prefetch more data for the left part of the join. In an ideal situation you can see a continuous CPU load of all worker threads in a connected Java profiler.

I should mention some numbers here. According to the experiments, the Halyard Strategy has been approximately 250 times faster with 50 worker threads and a SPARQL query containing 26 various joins. The effectiveness of the Halyard Strategy is higher with more joins and unions. However, feel free to compare my experimental measurements with your own. Both strategies can be individually selected for each Halyard repository. For an experiment you can set up two repositories (both pointing to the same data) with different SPARQL evaluation strategies.

source: https://www.linkedin.com/pulse/inside-halyard-2-when-one-working-thread-enough-push-versus-sotona

Jan 28 2022, 6:25 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
So9q added a comment to T289621: Evaluate Halyard as alternative to Blazegraph.

I researched this solution a little:
https://merck.github.io/Halyard/img/architecture.png

architecture.png (1×2 px, 677 KB)

Jan 28 2022, 6:05 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
So9q closed T290082: Evaluate Apache HBase and RDF4J as alternative to Blazegraph, a subtask of T206560: [Epic] Evaluate alternatives to Blazegraph, as Declined.
Jan 28 2022, 5:30 PM · Discovery-Search (Current work), MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
So9q closed T290082: Evaluate Apache HBase and RDF4J as alternative to Blazegraph as Declined.
Jan 28 2022, 5:30 PM · Wikidata, Wikidata-Query-Service
So9q added a comment to T290082: Evaluate Apache HBase and RDF4J as alternative to Blazegraph.

Yes, this issue can be closed unless WMF wants to implement its own solution based on the linked paper.

Jan 28 2022, 5:29 PM · Wikidata, Wikidata-Query-Service
So9q added a comment to T104762: Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry).

With the current SPARQL setup it's easy to share queries, either by full URL or by short URL. I think we can close this one.

I disagree: one important part of this task, saving results, isn’t served at all by this. We want to be able to save query results and share them, and unlike on Quarry, it shouldn’t be possible to change those results later, even for the query author (who, on Quarry, can re-run the query, changing the results without assigning a new ID). Other than when privacy or legal concerns require the results to be deleted, the pages should be immutable.

Jan 28 2022, 10:16 AM · Epic, Quarry, patch-welcome, Wikidata-Query-Service, Discovery, Wikidata, VPS-Projects

Jan 25 2022

So9q created T300084: Help missing on filter form.
Jan 25 2022, 11:34 PM · Wikidata

Jan 24 2022

So9q added a comment to T288262: Estimate how many Wikidata items have low/no ORES score.

The analysis is done here (for Q-ids): Wikidata_Item_ORES_Score_Analysis

Jan 24 2022, 9:56 PM · Machine-Learning-Team, ORES, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
So9q added a comment to T288262: Estimate how many Wikidata items have low/no ORES score.

Yeah, I think the underlying question we came to with this was whether it would make sense to consider kicking out the low-quality Items from the Query Service for the disaster planning. The more I think about it the less I think we should, because the query service is such an important piece of infrastructure for the workflows to get exactly these low-quality Items improved.

Jan 24 2022, 9:08 PM · Machine-Learning-Team, ORES, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
So9q added a comment to T90112: Investigate Apache Jena for WDQ.

@Lydia_Pintscher can you remember why this was declined back in 2015? What was the state of Jena back then compared to BlazeGraph?

Jan 24 2022, 1:45 PM · Wikidata, Wikidata-Query-Service
So9q added a comment to T299460: Evaluate the Apache Jena Framework.

I read the whole thread and just want to point out that Jena supports SPARQL Update also.

Jan 24 2022, 1:44 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
So9q awarded T299460: Evaluate the Apache Jena Framework a Like token.
Jan 24 2022, 1:36 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service

Jan 21 2022

So9q placed T299752: Allow federation with the Wikipedia citations endpoint in WDQS up for grabs.
Jan 21 2022, 11:49 AM · Wikidata, Wikidata-Query-Service
So9q updated subscribers of T299752: Allow federation with the Wikipedia citations endpoint in WDQS.
Jan 21 2022, 10:57 AM · Wikidata, Wikidata-Query-Service
So9q updated subscribers of T299752: Allow federation with the Wikipedia citations endpoint in WDQS.
Jan 21 2022, 10:41 AM · Wikidata, Wikidata-Query-Service
So9q created T299752: Allow federation with the Wikipedia citations endpoint in WDQS.
Jan 21 2022, 10:41 AM · Wikidata, Wikidata-Query-Service

Jan 19 2022

So9q added a comment to T299121: Job getting killed on k8s.

It did not. It still got killed and I more or less gave up on running this on k8s. I just tried running it on the bastion instead and got this:

bild.png (152×1 px, 24 KB)

I'm curious how this killing of processes is governed. In a shared webhost at Dreamhost I once had a job killed, but then there was a clear message, so I could see why it was killed (long-running wget jobs were not permitted).

Jan 19 2022, 4:21 PM · Toolforge Jobs framework, Kubernetes

Jan 16 2022

So9q added a comment to T299121: Job getting killed on k8s.

Lowered to saving every 1000 lines now. I hope that will solve the issue.

Jan 16 2022, 1:17 PM · Toolforge Jobs framework, Kubernetes
So9q added a comment to T299121: Job getting killed on k8s.

still gets killed. hm

bild.png (689×940 px, 90 KB)

Jan 16 2022, 12:51 PM · Toolforge Jobs framework, Kubernetes
So9q added a comment to T299121: Job getting killed on k8s.

still getting killed with 10000 as limit. hm.

bild.png (670×926 px, 91 KB)

Jan 16 2022, 12:02 PM · Toolforge Jobs framework, Kubernetes

Jan 15 2022

So9q added a comment to T299121: Job getting killed on k8s.

bild.png (678×906 px, 91 KB)

After lowering to saving every 15000 lines it still gets killed.
10000 was OK, so I am lowering to that.

Jan 15 2022, 11:58 AM · Toolforge Jobs framework, Kubernetes

Jan 14 2022

So9q added a comment to T299121: Job getting killed on k8s.

A job just got killed again.
This time I was extracting using this exact code: https://github.com/dpriskorn/WikidataMLSuggester/commit/8f411459cbf685852aea9a238719988e5ba0611e

Jan 14 2022, 9:46 PM · Toolforge Jobs framework, Kubernetes

Jan 13 2022

So9q added a comment to T299121: Job getting killed on k8s.

Note this is low priority for me because I found a workaround: I simply output a pickle every x lines. Afterwards I can easily join them all into one big dataframe.
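The workaround pattern, for reference; a sketch assuming pandas and DataFrame chunks pickled as chunk_*.pkl:

from pathlib import Path
import pandas as pd

# Each chunk was written with df.to_pickle("chunk_NNN.pkl") every x lines,
# so a killed job only loses the chunk in progress.
chunk_paths = sorted(Path(".").glob("chunk_*.pkl"))
df = pd.concat((pd.read_pickle(p) for p in chunk_paths), ignore_index=True)
print(f"joined {len(chunk_paths)} chunks into {len(df)} rows")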

Jan 13 2022, 3:33 PM · Toolforge Jobs framework, Kubernetes
So9q added a comment to T299121: Job getting killed on k8s.
In T299121#7619402, @Majavah wrote:
Jan 13 2022, 3:30 PM · Toolforge Jobs framework, Kubernetes
So9q updated the task description for T299121: Job getting killed on k8s.
Jan 13 2022, 12:45 PM · Toolforge Jobs framework, Kubernetes
So9q added a comment to T299121: Job getting killed on k8s.

The process was killed while writing the output.
Evidence of the partial write of the output file:

bild.png (653×1 px, 131 KB)

Jan 13 2022, 12:44 PM · Toolforge Jobs framework, Kubernetes
So9q updated the task description for T299121: Job getting killed on k8s.
Jan 13 2022, 9:58 AM · Toolforge Jobs framework, Kubernetes
So9q created T299121: Job getting killed on k8s.
Jan 13 2022, 9:52 AM · Toolforge Jobs framework, Kubernetes

Jan 12 2022

So9q added a comment to T299039: All started jobs failed on Kubernetes during 24h with no visible error or output.

could you please paste here the concrete toolforge-jobs command line you are using to create the job?

Jan 12 2022, 5:50 PM · Toolforge, Kubernetes
So9q created T299039: All started jobs failed on Kubernetes during 24h with no visible error or output.
Jan 12 2022, 11:51 AM · Toolforge, Kubernetes

Nov 10 2021

NavinoEvans awarded T288418: Hard to find where to report bugs from ISA a Love token.
Nov 10 2021, 3:09 PM · ISA

Oct 19 2021

So9q added a comment to T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.

"percentage, number of scientific papers that are connected to non-scientific paper items in WD"
Quite a lot of scholarly papers are connected to a journal item, to one or more topic items, to a language item, and some to a notable author that is in Wikipedia (so we need the item in Wikidata). Currently, according to the statistics on Scholia, there are 14,211,431 topic links. Works may have multiple links, so perhaps fewer than 10,000,000 works have one or more topics. We should aim for most works to have a topic, so I suspect this number would grow.

Oct 19 2021, 6:48 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Oct 9 2021

So9q added a comment to T259105: Qurator: Data about Current Events.

@Manuel

  • The test dashboard is running here.
  • It will take some time before we begin to observe any differences between the 6h, 24h, 48h, and 72h tables;
  • As soon as this is evaluated, I will deploy to production on Wikidata Analytics.

Note to myself: correct Updated every minute.

Oct 9 2021, 4:56 PM · Wikidata Analytics, Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic
So9q added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

Oh, ok. Could you give an example of a query that has no "highly selective triples" so I can test it on QLever vs. BG?

Here is a relatively simple query without a highly selective triple. It asks for the 100 people with the most professions. It requires a JOIN of the first triple (around 9 million people) with the second triple (all people and their professions, around 8.5 million triples). And there is no easy way around computing the full join result because we want the people with the most professions in the end and you cannot know in advance which people these are. The query deliberately does not have a large query result. So if it takes long, it's not because the output is so large.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name (COUNT(?profession) AS ?count) WHERE {
  ?person wdt:P31 wd:Q5 .
  ?person wdt:P106 ?profession .
  ?person rdfs:label ?name .
  FILTER (LANG(?name) = "en")
}
GROUP BY ?person ?name
ORDER BY DESC(?count)
LIMIT 100

PS: Here you can see the performance on QLever: https://qlever.cs.uni-freiburg.de/wikidata/cYvT6w

Oct 9 2021, 11:17 AM · Wikidata, Wikidata-Query-Service

Oct 7 2021

So9q added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

I have now revised QLever's Quickstart page: https://github.com/ad-freiburg/qlever

It allows you to build the code, build an index for a given set of triples, and start the engine with just a few copy&paste operations. It comes with two example datasets: one small (120 Years of Olympics, 1.8M triples) and one large (the complete Wikidata, 12B triples). Building the index for the small dataset takes around 20 seconds; building it for the complete Wikidata takes around 20 hours, on a standard PC.

Oct 7 2021, 11:51 AM · Wikidata, Wikidata-Query-Service
So9q added a comment to T289561: Evaluate Apache Rya as alternative to Blazegraph.

🤩 big thanks for sharing this!

Oct 7 2021, 10:24 AM · Wikidata, Wikidata-Query-Service
So9q added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.
  1. Subqueries and predicate paths are also supported. Where is it written that they are not?
Oct 7 2021, 10:12 AM · Wikidata, Wikidata-Query-Service
So9q added a comment to T290839: Evaluate a double backend strategy for WDQS.

@So9q I have commented on your comments concerning Rya in the "Evaluate Apache Rya as alternative to Blazegraph": https://phabricator.wikimedia.org/T289561#7393732

I have commented on your questions concerning QLever in the "Evaluate QLever ..." thread: https://phabricator.wikimedia.org/T291903#7382766 https://phabricator.wikimedia.org/T291903#7393813

Concerning your wdt:P31/wdt:P279* query: Can you provide the original SPARQL query that you wanted to ask?

Oct 7 2021, 9:57 AM · Wikidata, Wikidata-Query-Service

Sep 28 2021

So9q added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

Given the limitations, many Scholia queries need BG to work, or they will need to be rewritten to avoid subqueries.

Sep 28 2021, 8:20 AM · Wikidata, Wikidata-Query-Service
So9q renamed T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster from Evaluating QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster to Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.
Sep 28 2021, 8:20 AM · Wikidata, Wikidata-Query-Service
So9q added a subtask for T290839: Evaluate a double backend strategy for WDQS: T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.
Sep 28 2021, 8:19 AM · Wikidata, Wikidata-Query-Service
So9q added a parent task for T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster: T290839: Evaluate a double backend strategy for WDQS.
Sep 28 2021, 8:19 AM · Wikidata, Wikidata-Query-Service
So9q created T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.
Sep 28 2021, 8:19 AM · Wikidata, Wikidata-Query-Service

Sep 27 2021

Restricted Application added a project to T209611: [Epic] Make ORES scores for wikidata available as a dump: wdwb-tech.
Sep 27 2021, 10:02 PM · wdwb-tech, Analytics-Radar, Wikidata, Dumps-Generation, Epic, ORES, revscoring, Machine-Learning-Team, artificial-intelligence
So9q added a comment to T289561: Evaluate Apache Rya as alternative to Blazegraph.

The streams mentioned in T291089 by Addshore could also be used to populate an Apache Rya backend, probably with little effort, as it is built on Apache Accumulo, which uses Hadoop.

Sep 27 2021, 9:57 PM · Wikidata, Wikidata-Query-Service
So9q renamed T289561: Evaluate Apache Rya as alternative to Blazegraph from Evaluate Rya as alternative to Blazegraph to Evaluate Apache Rya as alternative to Blazegraph.
Sep 27 2021, 9:56 PM · Wikidata, Wikidata-Query-Service
So9q awarded T291089: Proposal: Generate Wikidata JSON & RDF dumps from Hadoop a Love token.
Sep 27 2021, 9:52 PM · Analytics-Radar, wdwb-tech, Wikidata, Dumps-Generation
So9q added a comment to T290839: Evaluate a double backend strategy for WDQS.

can give us the short update delays that users expect

I am a user who rarely needs short update delays.
Didn't we just run a poll about what features of WDQS users prefer/want? Do we have the results of that, to see if a double backend strategy would satisfy users?

Sep 27 2021, 9:48 PM · Wikidata, Wikidata-Query-Service
So9q added a comment to T290839: Evaluate a double backend strategy for WDQS.

It's of course up to you (the Wikidata team) to decide this. But I wouldn't dismiss this idea so easily.

There is clearly a group of users who want to query the exact contents of the database at the point in time they are querying it. I assume that this group includes many Wikimedians and all kinds of statistics queries on Wikidata. But I am sure that there is also a large group of users who don't care if the version of Wikidata they are querying is a few hours old, but who care much more about convenience and efficiency (or getting results at all, which is clearly a problem with the current service).

Sep 27 2021, 8:24 PM · Wikidata, Wikidata-Query-Service