
tfmorris (Tom Morris)


User does not belong to any projects.



User Details

User Since
Feb 24 2016, 8:38 PM (405 w, 4 d)
MediaWiki User
Tfmorris1

Recent Activity

Jul 21 2023

tfmorris added a comment to T303677: Automatically generate descriptions for items based on their P31 (instance of) values.

I'm surprised that this hasn't received any attention in 15 months. As an update to @Nikki's numbers, there are now on the order of 2.5 BILLION of these bot-generated descriptions. The top 5 alone represent over 2 billion triples. That's a huge waste of resources!

Jul 21 2023, 6:44 PM · Wikidata-Campsite, Wikidata
tfmorris added a comment to T337021: [Analytics] Find out size of term subgraph.

@Manuel when you write:

Jul 21 2023, 5:48 PM · Wikidata Analytics (Kanban), Wikidata-Query-Service, Wikidata

Jul 20 2023

tfmorris added a comment to T337021: [Analytics] Find out size of term subgraph.

I have a theory as to where a big chunk of the machine-generated descriptions come from. They are the phrase "Wikimedia category" in hundreds of languages, a textual transcription of the triple instanceOf Q4167836. For example, Catégorie:Naissance à Seri Menanti has a single label in French and the P31 instanceOf claim, which together occupy 802 bytes. Then two bots (Mr.Ibrahembot and Emijrpbot) came along and added another 11.5K (!) of static text (not even anything templated) in 129 languages, none of which have labels for the category.

Jul 20 2023, 9:39 PM · Wikidata Analytics (Kanban), Wikidata-Query-Service, Wikidata
tfmorris added a comment to T337021: [Analytics] Find out size of term subgraph.

Is triple count the only important parameter? It seems likely that the descriptions could be larger, on average, than labels.

Jul 20 2023, 7:26 PM · Wikidata Analytics (Kanban), Wikidata-Query-Service, Wikidata

Mar 21 2023

tfmorris awarded T329093: sitelink encoding issues in the Wikibase REST API a Like token.
Mar 21 2023, 3:55 PM · Wikidata, Wikibase REST API (archieved)

Mar 14 2023

tfmorris added a comment to T329093: sitelink encoding issues in the Wikibase REST API.

How does one discover what the resolution was? (Apologies if this should be obvious, but I'm used to bug trackers which link the commits back to the issue.)

Mar 14 2023, 8:01 PM · Wikidata, Wikibase REST API (archieved)

Feb 9 2023

tfmorris added a comment to T329093: sitelink encoding issues in the Wikibase REST API.

I vote for full URLs. Also, HTTPS URLs should probably be used throughout in preference to HTTP URLs to save naive clients from the extra latency of a redirect.

Feb 9 2023, 3:02 AM · Wikidata, Wikibase REST API (archieved)

Jan 10 2023

tfmorris awarded T326650: pywikibot library should not attach logging handers a Like token.
Jan 10 2023, 5:30 PM · Pywikibot

Sep 10 2020

tfmorris added a comment to T262550: Toolforge returns HTTP 502 error.

and were also down about the same time (11:03 Eastern US).

Sep 10 2020, 4:18 PM · cloud-services-team (Kanban), Wikidata, Toolforge

Aug 21 2020

tfmorris updated subscribers of T244847: Future of the OpenRefine Wikidata reconciliation interface.

I'm surprised that a private third party proxying such a significant segment of the traffic to Wikidata hasn't prompted the Wikidata Engineering team to take this more seriously.

Aug 21 2020, 4:33 PM · Wikidata-Query-Service, User-Sandra_Fauconnier_WMSE, Reconciliation, WMSE-Content-partnerships-support-2021-Software-development, Wikidata, OpenRefine

Jul 4 2020

tfmorris added a comment to T240442: Design a continuous throttling policy for Wikidata bots.

Very broad idea, feel free to discard: I think using industry-standard throttling algorithms such as token bucket, leaky bucket, fixed-window counter, or sliding-window counter might help here.

One of the primary questions we need to answer is whether we want to keep doing this client-side self-throttling or switch to something more server-side.
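
To illustrate the first of the algorithms named above, here is a minimal token bucket sketch. It is a hypothetical example only, not taken from the task or from any existing Wikidata or Pywikibot code: the class name `TokenBucket` and the parameters `rate`, `capacity`, and `clock` are assumptions made for illustration. The same logic could run on the client (self-throttling) or on the server (rejecting requests when `allow()` returns `False`).

```python
import time


class TokenBucket:
    """Hypothetical token bucket: refills at `rate` tokens per second,
    holding at most `capacity` tokens (the allowed burst size)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum tokens the bucket can hold
        self.tokens = float(capacity)    # start with a full bucket
        self.clock = clock               # injectable clock, useful for testing
        self.last = clock()              # time of the most recent refill

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        # Refill according to elapsed time, never exceeding capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would create one bucket per bot or per account, for example `bucket = TokenBucket(rate=1.0, capacity=5.0)`, and check `bucket.allow()` before each edit. Short bursts up to `capacity` pass immediately, while the long-run average stays at `rate` edits per second.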

Jul 4 2020, 7:28 PM · Wikidata

Jun 30 2020

tfmorris added a comment to T240436: Unknown command INCRBYFLOAT in OpenRefine reconciliation interface.

The documentation claims INCRBYFLOAT was introduced in Redis 2.6.0.

Jun 30 2020, 5:56 PM · OpenRefine
tfmorris added a comment to T197588: Agree on a "manifest" format to expose the configuration of Wikibase instances.

If the manifest has to be constructed by hand, it seems like YAML would be a better format than JSON. They are equivalent from a structural and informational point of view, but YAML is much easier to edit without creating invalid documents.

Jun 30 2020, 5:54 PM · Wikibase - Automated Configuration Detection (WikibaseManifest), OpenRefine, Wikidata

Apr 30 2020

tfmorris added a comment to T148150: Primary Source tool shouldn't suggest claims without sources.

The so-called "Freebase" dataset is actually a mix of data from Freebase and a bunch of URLs that were pulled from Google web crawls by an intern as potential "evidence." They don't have anything to do with the provenance of the data that was in Freebase, which was recorded for every item of data that was written there. Of course it would be silly to suggest a blacklisted site, but I don't believe the intern was provided with a blacklist ahead of time, and as the blacklist was developed after the fact, it hasn't been used to filter what's presented to users.

Apr 30 2020, 5:56 PM · Wikidata-primary-sources, Wikidata
tfmorris added a comment to T188715: Compute the Freebase curation ratio per property.

It would seem like the 2018-03-13 spreadsheet should be adequate to call this task complete. I would recommend including some qualitative understanding of the source of the Freebase data in addition to just pure curation ratio when making judgements about how to use which data. Things like MusicBrainz IDs and ISFDB IDs went through a heavily QA'd reconciliation process and are going to be high quality. Films, and to a lesser extent TV shows, were an area of focus for the Freebase team, so will generally be both high quality and relatively complete.

Apr 30 2020, 5:44 PM · Wikidata-primary-sources, Wikidata
tfmorris updated the task description for T237925: Primary sources tool left without maintainers.
Apr 30 2020, 4:13 PM · Wikidata-primary-sources, Wikidata

Apr 18 2016

tfmorris added a comment to T126510: [Story] Allow adding additional languages in the terms box.

It seems bizarre that the utility of this is debated. The solution suggested by Bene sounds simple, straightforward, and useful.

Apr 18 2016, 7:47 PM · Design, WMDE-Design, Story, Wikidata, MediaWiki-extensions-WikibaseRepository

Feb 24 2016

tfmorris added a comment to T115911: TypeError: wikibase.dataTypeStore is undefined (PrimarySources).

I think these are likely two different bugs. Has anyone looked at either of them in the last 4 months?

Feb 24 2016, 10:42 PM · Wikidata