Skybristol (Sky)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 28 2023, 9:30 PM (60 w, 5 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Skybristol [ Global Accounts ]

Recent Activity

Feb 27 2024

Skybristol added a comment to T343034: item not returned in SPARQL query on geokb wikibase.cloud instance.

Ignore my last note on this. That was a problem in my query. I hadn't realized I didn't have all the alt labels in the data yet.

Feb 27 2024, 4:19 AM · Wikibase Cloud

Feb 24 2024

Skybristol added a comment to T343034: item not returned in SPARQL query on geokb wikibase.cloud instance.

Thanks to the team for all the work on this issue. I ended up changing some things on the items I had posted about here (different property for the URL associated with USGS staff). The problems I noted no longer seem to be an issue as those entities/claims/triples were all updated. I am, however, seeing a problem in a different area this morning. I was doing some work to resolve names of organizational units to identifiers instantiated in the Wikibase instance. I'm not turning up entities in that query that should be there. Here's a query:

Feb 24 2024, 3:57 PM · Wikibase Cloud

Jan 10 2024

Skybristol added a comment to T353624: 🟡 Rebuild Queryservice data for all production wikis.

Thank you for providing all the detail on this process. It's helpful to see that you're working through the issue and that there's a process for syncing everything across the WB environment. For me as a user, it's not so much that there are hiccups along the way. I expect that. Working on a shared platform like WB.c, though, it would be really useful to gain some visibility into logging to see where I might have issues if I've got unexpected query results. Seeing stuff in tickets is fine, or you might think about what aspects of the logs could be opened up for access by WB instance operators.

Jan 10 2024, 2:29 PM · Wikibase Cloud (Kanban board Q4 2023)

Dec 9 2023

Skybristol added a comment to T340074: Option to produce RDF dump on demand.

I've been able to dump a subset of items from geokb.wikibase.cloud as NT files, index them using Qlever's indexing engine (Docker on a cloud-based Ubuntu machine), spin up the Qlever SPARQL back-end on that index, and then query it using the Qlever UI (on the same machine). Both the back-end and UI are proxied via Nginx, and it seems to be working quite well. I don't yet know what my resource load is going to look like and what memory I'll need to allocate, but I noticed the Qlever instance of U. Freiburg going down a fair bit after they publicized it at the Wikidata modeling conference.

Dec 9 2023, 3:22 AM · Wikibase Cloud

Dec 2 2023

Skybristol added a comment to T340074: Option to produce RDF dump on demand.

I have my use case now. After seeing Hannah Bast and Johannes Kalmbach present on Qlever today, I went and spun up an instance of the back-end indexer/SPARQL service and the UI tool on a cloud VM. It's so cool when software actually works like it's supposed to! The approach to indexing and query federation there is fantastic, and I think I'll be able to jettison some of what I reorganized from sources that already conform to RDF fairly well and keep our wb.c instance focused on messier data. The Qlever SPARQL performance is truly outstanding compared to anything else I've tried.

Dec 2 2023, 10:03 PM · Wikibase Cloud

Nov 19 2023

Skybristol added a comment to T348439: 🟨 Increase Query Service Reliability.

Here's my vote for moving this issue up the queue if at all possible. The problem with the Blazegraph store not being synced or whatever it is that's going on is really hurting productivity on wikibase.cloud. We rely on SPARQL for just about everything. Every time I push a bunch of edits like adding thousands of claims with new identifier references to another source (something I just did), I have a situation where only some of the data winds up being returned via SPARQL while I can go to the UI or API and see that the information made it into the core data store. It's quite frustrating.

Nov 19 2023, 10:38 PM · Wikibase Cloud (Kanban board Q1 2024)

Nov 17 2023

Skybristol added a comment to T351288: Create something like the Wikidata TTL dumps, but for a Wikibase.Cloud instance.

I'd be interested to learn more about the use case for the full TTL-encoded dump. Maybe there's a cool usage pattern I could use. It seems like the SPARQL query service can be used to generate a full graph, and it supports offset/pagination to handle scale. Trying to work against a massive TTL encoding means a lot of data being loaded into memory somewhere just to operate on. And the SPARQL/Blazegraph capability does also have its own issues with scalability, especially on queries requiring a filter approach.

Nov 17 2023, 3:24 PM · Wikibase Cloud
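
The offset/pagination idea in the comment above could be sketched roughly as follows. This is a minimal illustration, not the author's actual workflow: the function name `paged_construct_queries` and its parameters are hypothetical, and it only builds the query strings, leaving submission to whichever SPARQL client is in use. The `ORDER BY` keeps pages stable between requests, but sorting a whole graph is itself expensive, which fits the scalability caveat in the comment.

```python
from typing import Iterator


def paged_construct_queries(page_size: int, max_pages: int) -> Iterator[str]:
    """Yield CONSTRUCT queries that walk a graph one page at a time.

    Hypothetical helper: each query returns at most `page_size` triples,
    starting at an increasing OFFSET.
    """
    for page in range(max_pages):
        offset = page * page_size
        yield (
            "CONSTRUCT { ?s ?p ?o } "
            "WHERE { ?s ?p ?o } "
            "ORDER BY ?s ?p ?o "
            f"LIMIT {page_size} OFFSET {offset}"
        )


if __name__ == "__main__":
    for query in paged_construct_queries(page_size=1000, max_pages=3):
        print(query)
```

A caller would stop requesting pages once a page comes back with fewer than `page_size` triples.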
Skybristol added a comment to T335448: Custom WDQS prefixes based on dashboard prefix option.

In a lot of ways, I'd be completely fine sticking with the Wikidata standard prefixes pointed at a specific wb.c instance's domain/paths. Or perhaps a slight tweak from "wd" to "wb". The big conceptual thing is distinguishing between terms/concepts that are a part of the Wikibase and larger frameworks (owl, rdfs, prov, etc.) and then the specific knowledge graph being queried. All Wikibase SPARQL queries share common characteristics because of the statement/reference/qualifier dynamic, and many of the examples online (and generated via LLMs) are going to refer to Wikidata, so it's reasonable to keep it a short step between those examples and what the query would look like for a specific Wikibase instance.

Nov 17 2023, 2:30 PM · Wikibase Cloud
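
As a rough sketch of what "Wikidata standard prefixes pointed at a specific instance" might look like, the snippet below generates a prefix header for a given domain. The helper name `wikibase_prefixes` is hypothetical, and the path layout (`/entity/`, `/prop/direct/`, and so on) is an assumption that mirrors Wikidata's URI scheme; it should be checked against the RDF the instance actually emits.

```python
def wikibase_prefixes(domain: str) -> str:
    """Build Wikidata-style SPARQL PREFIX lines for one Wikibase domain.

    Assumption: the instance uses the same path layout as Wikidata
    (entity/, prop/direct/, prop/, prop/statement/, prop/qualifier/).
    """
    base = f"https://{domain}"
    prefixes = {
        "wd": f"{base}/entity/",
        "wdt": f"{base}/prop/direct/",
        "p": f"{base}/prop/",
        "ps": f"{base}/prop/statement/",
        "pq": f"{base}/prop/qualifier/",
    }
    return "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())


if __name__ == "__main__":
    print(wikibase_prefixes("geokb.wikibase.cloud"))
```

With a header like this, a query written against Wikidata needs only its Q and P identifiers swapped to run against the instance.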

Oct 2 2023

Skybristol added a comment to T343034: item not returned in SPARQL query on geokb wikibase.cloud instance.

I still have information that is not being retrieved via SPARQL queries when I can see it visibly on items in the geokb.wikibase.cloud instance. Here's a query:

Oct 2 2023, 9:54 PM · Wikibase Cloud

Aug 5 2023

Skybristol added a comment to T343034: item not returned in SPARQL query on geokb wikibase.cloud instance.

Interestingly enough, I've been posting some one-off things to Wikidata as part of some research into Cherokee Nation Tribal leaders today. SPARQL queries there are not immediately showing results of adding things like start time/end time qualifiers on claims. Here's the query showing missing information I'm attempting to contribute. Maybe it's an inherent delay in Blazegraph?

Aug 5 2023, 11:47 PM · Wikibase Cloud

Aug 3 2023

Skybristol added a comment to T343034: item not returned in SPARQL query on geokb wikibase.cloud instance.

One other thing I tried doing is replacing and then recreating a P114 claim. If I simply use wikibaseintegrator with action_if_exists=REPLACE_ALL, this doesn't have any effect. However, if I first remove the P114 claim entirely and then run another operation to add it back in using wikibaseintegrator, a query on the ID turns up the expected result. This one, P114:70235876, was missing before I did that.

Aug 3 2023, 12:02 PM · Wikibase Cloud
Skybristol added a comment to T343034: item not returned in SPARQL query on geokb wikibase.cloud instance.

Thanks for the input. I probably provided too much detail. That's a useful query, but it wasn't the issue. Bottom line is that I have items that are not showing up in SPARQL queries that absolutely should be showing up, based on how they appear to be structured in the UI and how they are returned via the MediaWiki API. Here is a single concrete example:

Aug 3 2023, 11:38 AM · Wikibase Cloud

Jul 30 2023

Skybristol added a comment to T343034: item not returned in SPARQL query on geokb wikibase.cloud instance.

It does appear that certain triples may have been lost in the process of ingesting to Blazegraph. For a query like the following where I'm trying to get all of the items that have a particular instance of classification, I'm missing a bunch of items that are actually classified that way.

Jul 30 2023, 3:24 PM · Wikibase Cloud

Jul 28 2023

Skybristol added a comment to T343034: item not returned in SPARQL query on geokb wikibase.cloud instance.

I thought this was something different at first, but perhaps it is related. I noticed I had at least one item where the "what links here" page is showing things linked but without labels.

Jul 28 2023, 9:03 PM · Wikibase Cloud
Skybristol added a comment to T343034: item not returned in SPARQL query on geokb wikibase.cloud instance.

I do have other things missing in SPARQL queries now. The following is supposed to pull about 83K items (a whole tranche of publication items I just brought in):

Jul 28 2023, 7:33 PM · Wikibase Cloud
Skybristol created T343034: item not returned in SPARQL query on geokb wikibase.cloud instance.
Jul 28 2023, 6:41 PM · Wikibase Cloud

May 30 2023

Skybristol added a comment to T336844: geokb.wikibase.cloud instance clogged up with backlogged jobs again.

Hi @Evelien_WMDE. Thank you for connecting on this. I really appreciate what WMDE is working to build out with this infrastructure.

May 30 2023, 2:21 PM · Wikibase Cloud (Kanban board Q2 2023)

May 26 2023

Skybristol created T337565: Enable map preview in wikibase.cloud instances for GlobeCoordinate claims.
May 26 2023, 2:47 PM · Wikibase Cloud, wbstack

May 24 2023

Skybristol added a comment to T336844: geokb.wikibase.cloud instance clogged up with backlogged jobs again.

@Evelien_WMDE Yes. The GeoKB instance does seem to be working better now. Elasticsearch responses seem to be what I would expect, and SPARQL queries are returning what they should. Thank you! I was definitely having some performance problems there for a few days. I've continued to do work on aspects of the model since @Fring indicated things should be "safe," but I'm happy to have it humming along again!

May 24 2023, 7:44 PM · Wikibase Cloud (Kanban board Q2 2023)

May 19 2023

Skybristol added a comment to T336844: geokb.wikibase.cloud instance clogged up with backlogged jobs again.

Hi @Fring. Any progress on the SQL replica problems or other related issues? I'm definitely seeing some performance problems in the instance in work I'm trying to do today with a new class of entities I'm working through. Perhaps I need to just cease working with this until the problem is resolved?

May 19 2023, 6:47 PM · Wikibase Cloud (Kanban board Q2 2023)

May 17 2023

Skybristol added a comment to T336844: geokb.wikibase.cloud instance clogged up with backlogged jobs again.

Here's some visible behavior in the system arising from the problem. If you check out the WhatLinksHere for one of our science centers, you'll see a listing of people affiliated with that organization. Some of the items in the list show up without labels like this one. Those same entities are not returned in a SPARQL search such as trying to retrieve everyone employed by our organization.

May 17 2023, 11:56 AM · Wikibase Cloud (Kanban board Q2 2023)
Skybristol created T336844: geokb.wikibase.cloud instance clogged up with backlogged jobs again.
May 17 2023, 11:09 AM · Wikibase Cloud (Kanban board Q2 2023)

May 11 2023

Skybristol added a comment to T330389: Run Mediawiki Jobs in the background not at the end of requests.

Glad to see this working its way toward some kind of resolution. In the meantime, would someone mind kicking off a job to complete jobs for the geokb.wikibase.cloud instance? I've got some things backed up that are causing problems in SPARQL responses.

May 11 2023, 10:36 PM · Wikibase Cloud (Kanban board Q2 2023)

Apr 21 2023

Skybristol added a comment to T330389: Run Mediawiki Jobs in the background not at the end of requests.

@Fring - Fantastic! Thank you! I see they cleared out pretty quick, and search is all working as it should.

Apr 21 2023, 7:04 PM · Wikibase Cloud (Kanban board Q2 2023)
Skybristol added a comment to T330389: Run Mediawiki Jobs in the background not at the end of requests.

Well, I apologize, but I've once again started to clog up the system with edits. If someone can kick off some job processing for geokb.wikibase.cloud again, that would be great.

Apr 21 2023, 6:10 PM · Wikibase Cloud (Kanban board Q2 2023)

Apr 19 2023

Skybristol added a comment to T330389: Run Mediawiki Jobs in the background not at the end of requests.

Well, I managed to get a whole backlog of jobs going again in another wikibase.cloud instance. This one is showing over 50K jobs.

Apr 19 2023, 9:36 PM · Wikibase Cloud (Kanban board Q2 2023)

Apr 10 2023

Skybristol added a comment to T330389: Run Mediawiki Jobs in the background not at the end of requests.

Following up on my previous comment, I tried this type of write/read/write approach in the latest work for the https://eew-edgi.wikibase.cloud Wikibase instance. I needed to work through all the versions of the North American Industry Classification System (NAICS), creating items for the different industries to be linked to and support analysis of regulated facilities. The bot for this using WikibaseIntegrator works through the 6 different years/editions of the NAICS to develop logical items and their ExternalIDs. I first ran through and created the individual 1981 items and then worked over the items again to add in hierarchical relationships through "has part" and "part of" claims. This involved reading each item serially to add in additional claims, with the idea this might trigger something to complete the jobs.

Apr 10 2023, 1:33 PM · Wikibase Cloud (Kanban board Q2 2023)

Apr 4 2023

Skybristol added a comment to T332894: Some Items not fully reflected in query service of wikibase.cloud instance.

To try and answer this bit, you can get a count of the jobs backed up (though you won't know exactly what they are) from the API. Have a look at https://eew-edgi.wikibase.cloud/w/api.php?action=query&meta=siteinfo&siprop=statistics for example. Under the jobs key you should see the count of jobs. Hope that helps as a very short term signal!

Apr 4 2023, 3:20 PM · Wikibase Cloud
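
The job-count check described above could be scripted along these lines. This is a minimal sketch using only the Python standard library: the function names and the User-Agent string are hypothetical, and `format=json` is added to the URL from the comment so the response can be parsed. The count is best treated as a rough signal rather than an exact figure.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def statistics_url(domain: str) -> str:
    """Build the siteinfo statistics URL for a Wikibase domain."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"https://{domain}/w/api.php?{urlencode(params)}"


def pending_jobs(payload: dict) -> int:
    """Read the job count from a parsed siteinfo statistics response."""
    return int(payload["query"]["statistics"]["jobs"])


def fetch_pending_jobs(domain: str, timeout: float = 30.0) -> int:
    """Fetch the current job count (makes a network request)."""
    # The User-Agent value is a placeholder chosen for this sketch.
    request = Request(
        statistics_url(domain),
        headers={"User-Agent": "job-backlog-check/0.1"},
    )
    with urlopen(request, timeout=timeout) as response:
        return pending_jobs(json.load(response))
```

Calling `fetch_pending_jobs("eew-edgi.wikibase.cloud")` before and after a large batch of edits gives a quick sense of whether the queue is draining.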
Skybristol added a comment to T330389: Run Mediawiki Jobs in the background not at the end of requests.

I'm curious about this statement from the original post:

Apr 4 2023, 3:19 PM · Wikibase Cloud (Kanban board Q2 2023)
Skybristol added a comment to T332894: Some Items not fully reflected in query service of wikibase.cloud instance.

Thank you very much for filling in a few blanks for me. Search in the eew-edgi instance is turning up expected results now. I did also subscribe to that issue.

Apr 4 2023, 3:12 PM · Wikibase Cloud
Skybristol added a comment to T332894: Some Items not fully reflected in query service of wikibase.cloud instance.

I'm continuing to see some issues that appear to be related to incomplete jobs somewhere in the WBStack pipeline. It would be good to understand where these are and if there is work going on somewhere to resolve them. It would at least be good to know if there is any way to get visibility on when something is stuck re. the comments from @Tarrow on jobs in a queue somewhere.

Apr 4 2023, 1:13 PM · Wikibase Cloud

Mar 28 2023

Skybristol added a comment to T332894: Some Items not fully reflected in query service of wikibase.cloud instance.

I appreciate all the work digging into this issue.

Mar 28 2023, 1:22 PM · Wikibase Cloud

Mar 24 2023

Skybristol added a comment to T332894: Some Items not fully reflected in query service of wikibase.cloud instance.

My guess is this is a case similar to what I'd seen earlier where an update job somewhere is clogged or backed up. I just added a new property and a couple of new items in the GUI to set up some further work pulling in additional items. Here's the query I use to pull my classification scheme:

Mar 24 2023, 3:37 PM · Wikibase Cloud

Mar 23 2023

Skybristol created T332894: Some Items not fully reflected in query service of wikibase.cloud instance.
Mar 23 2023, 2:24 PM · Wikibase Cloud

Mar 22 2023

Skybristol added a comment to T309070: [Timebox: 18hrs] Frequent 502 responses when submitting edits.

I'm seeing similar issues in work I'm doing at eew-edgi.wikibase.cloud. In my case, it's frequent 504 errors thrown. I've been trying a number of things to better handle large build-out scenarios where I need to populate thousands of reference items such as U.S. states, counties, and cities. Parallel processing using something like Dask just seems to make matters worse. I continue to see problems I reported in a previous issue (https://phabricator.wikimedia.org/T330796) where full availability of an item's functionality is sometimes severely delayed. I see an item in recent changes and can get to it by QID, but I can't turn the item up in UI search or SPARQL queries.

Mar 22 2023, 6:20 PM · Wikibase Cloud

Mar 5 2023

Skybristol added a comment to T330796: Problems with search in newish wikibase instance.

This is continuing to be an issue for me on another use case I am working through. I'm building out information associated with the Facility Registration System operated by the US Environmental Protection Agency. This information is incompletely and inadequately organized into Wikidata, so I am working on methods to process source data and organize into items. I've started on the two classification systems in use which are also incompletely referenced in Wikidata.

Mar 5 2023, 2:51 PM · Wikibase Cloud (WB Cloud Sprint 15), Elasticsearch

Mar 2 2023

Skybristol added a comment to T330796: Problems with search in newish wikibase instance.

Thank you for taking a look at this issue. A bunch of items that I pushed yesterday evening seem to be coming up in search just fine as well. I assume that regular processing for the search index should happen automatically and be close to real time. I built infrastructure years ago between MongoDB and Elasticsearch where we had to deal with scaling issues of keeping an index up to date in real time with the data store. I haven't dug enough into your architecture here yet to understand it, but I can imagine it's a challenge.

Mar 2 2023, 10:30 AM · Wikibase Cloud (WB Cloud Sprint 15), Elasticsearch

Feb 28 2023

Skybristol created T330796: Problems with search in newish wikibase instance.
Feb 28 2023, 9:40 PM · Wikibase Cloud (WB Cloud Sprint 15), Elasticsearch