Today

AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

@Manuel, as mentioned on Mattermost, as of now there doesn't seem to be a good way of deriving agent_type for those tables that don't have it. We can get spider through a UDF, but automated isn't possible at the moment. This makes the final division between desktop and API users pretty difficult. An idea I had was checking uri_path = '/w/api.php'. Some information breakdowns for that follow, with the queries being generally the same as those found directly above.
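
A minimal sketch of the uri_path check mentioned above, assuming each request is available as a dict with a uri_path key. The row shape and the helper name is_api_request are hypothetical and only for illustration; this is not the query used in the task.

```python
# The endpoint path named in the comment above.
API_PATH = "/w/api.php"


def is_api_request(row: dict) -> bool:
    """Return True when the request's uri_path is the API endpoint."""
    # Hypothetical row shape: a dict with a "uri_path" key.
    return row.get("uri_path") == API_PATH


# Toy rows, not real data.
rows = [
    {"uri_path": "/w/api.php"},
    {"uri_path": "/wiki/Q42"},
]
api_count = sum(is_api_request(r) for r in rows)
```

In SQL the same idea would be a simple equality filter on uri_path; the sketch only shows the classification rule in isolation.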

Fri, Sep 22, 3:05 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T341589: Improve df_to_remarkup formatting for wmfdata-python.

Hey there, @nshahquinn-wmf! It's been one week since I started the new computer setup, and I've been using this as a means to test it all out 🚀 Has been fun! So here's where we're at:

Fri, Sep 22, 2:13 PMProduct-Analytics, Data-Engineering, Wmfdata-Python

Yesterday

AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

Some general notes on this: as we're working from wmf.pageview_actor and wmf_raw.mediawiki_private_cu_changes, there might be a way to leverage their expanded agent_type field such that for at least the former we have automated as an option within agent_type :) So for views we can do a more distinct division into mobile, desktop and API users by including agent_type in it. For edits it's a bit more difficult, but maybe there's a way to add in agent_type via a UDF on another field.

Thu, Sep 21, 1:57 PMWikidata Analytics (Kanban), Wikidata

Wed, Sep 20

AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

Here's the above referer breakdown for mobile for reference, with the big difference being that we have dramatically fewer "-" requests (which is good for thinking that these are APIs) and a lot of extension requests:

Wed, Sep 20, 5:12 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

@Manuel, re the question of what kind of referer values we have for "desktop" requests, the following query was used to get the results below.

Wed, Sep 20, 4:20 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T344052: [Analytics] Join data from all-access-tables (like webrequest) with edit-tables (like revision) .

Also you are explicitly filtering for is_pageview = True

Wed, Sep 20, 9:19 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T336361: [Analytics] Identify access from mobile vs. desktop devices.
Wed, Sep 20, 9:13 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T336361: [Analytics] Identify access from mobile vs. desktop devices.
Wed, Sep 20, 9:12 AMWikidata Analytics (Kanban), Wikidata

Fri, Sep 1

AndrewTavis_WMDE updated the task description for T336361: [Analytics] Identify access from mobile vs. desktop devices.
Fri, Sep 1, 1:01 PMWikidata Analytics (Kanban), Wikidata

Thu, Aug 31

AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

As far as totals for this task are concerned, @Manuel, what I'm getting is the following:

Thu, Aug 31, 4:13 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

@Manuel, I think we can throw out the idea of creating an edits subset of webrequests, sadly :( The following would be where we'd find the various actions that we'd need to collect to define as edits fully: https://www.wikidata.org/w/api.php. We know at the very least that we'd want uri_query LIKE '?action=edit%' and uri_query LIKE '?action=wbsetclaim%', but figuring out what else needs to be added seems to be prohibitive given the discrepancy:
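
As a rough illustration of the uri_query matching discussed above, here is a sketch that parses the query string instead of using LIKE. The set contains only the two actions named in the comment, and the helper name is_edit_query is hypothetical. Unlike LIKE '?action=edit%', this matches the action value exactly and does not require action to be the first parameter, so counts could differ from the SQL version.

```python
from urllib.parse import parse_qs

# Only the two actions named in the comment; the full list is not known here.
EDIT_ACTIONS = {"edit", "wbsetclaim"}


def is_edit_query(uri_query: str) -> bool:
    """Return True when the query string carries one of the known edit actions."""
    params = parse_qs(uri_query.lstrip("?"))
    return any(action in EDIT_ACTIONS for action in params.get("action", []))
```

For example, is_edit_query("?format=json&action=wbsetclaim") is true even though action is not the leading parameter.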

Thu, Aug 31, 2:34 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

Here are the tables that break down the device_family values, @Manuel :) As before:

Thu, Aug 31, 12:19 PMWikidata Analytics (Kanban), Wikidata

Wed, Aug 30

AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

Here are the values for Tizen as well, @Manuel:

Wed, Aug 30, 2:15 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated subscribers of T336361: [Analytics] Identify access from mobile vs. desktop devices.

And here are the finalized heuristics (@JAllemandou, tagging you as well). The following query is saved as a temporary view as df_requests_subset:

Wed, Aug 30, 2:04 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

We'd talked about "Tizen" a bit this morning, @Manuel, but let's not focus on it. Did a bit of Wikipedia research and since 2021 it's mostly in use in Samsung Smart TVs. That leaves us with Android and iOS for the predominant mobile os_family values, and if we want to include a Linux-based one it'd be KaiOS.
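
The os_family reasoning above could be expressed roughly as follows. This is a sketch only: the function name and the include_kaios switch are hypothetical, and the value strings are taken as written in the comment.

```python
# Predominant mobile os_family values named in the comment above.
MOBILE_OS_FAMILIES = {"Android", "iOS"}
# Optional Linux-based addition mentioned in the comment.
OPTIONAL_MOBILE_OS_FAMILIES = {"KaiOS"}


def is_mobile_os(os_family: str, include_kaios: bool = False) -> bool:
    """Classify an os_family value as mobile; Tizen is deliberately left out."""
    if os_family in MOBILE_OS_FAMILIES:
        return True
    return include_kaios and os_family in OPTIONAL_MOBILE_OS_FAMILIES
```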

Wed, Aug 30, 1:26 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

Here are the answers to the three questions we had from the daily, @Manuel:

Wed, Aug 30, 1:15 PMWikidata Analytics (Kanban), Wikidata

Aug 21 2023

AndrewTavis_WMDE added a comment to T340648: [Airflow] Setup Airflow instance for WMDE.

Thanks for the efforts on this, @Stevemunene! Please let us know if there's anything needed on our end :)

Aug 21 2023, 8:30 AMPatch-For-Review, Data-Platform-SRE

Aug 11 2023

AndrewTavis_WMDE awarded T336361: [Analytics] Identify access from mobile vs. desktop devices a Like token.
Aug 11 2023, 12:10 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

Or am I just jumping to the question in the description and we just want to figure out mobile edits and views over the period?

Aug 11 2023, 12:09 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

I guess I'm confused about what the goal here is then. As I understand it we're looking for users who are using the normal desktop UI on a mobile device. For the wmf.webrequest table we'd then use:

Aug 11 2023, 12:05 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

My understanding of access_method is that it's only related to user_agent for mobile apps:

Aug 11 2023, 11:54 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

I've already checked and there are differences between a python-user-agents derived device via user_agents.parse(ua_value).is_mobile and the access_method. Specifically, we are getting users where the device from .is_mobile is mobile but the access method is desktop, implying that they're not using an m. URL.
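
A small sketch of the mismatch described above, assuming the is_mobile flag has already been derived (for instance via user_agents.parse(ua_value).is_mobile) and stored next to access_method. The row shape and the helper name are hypothetical, and the rows are toy values.

```python
def is_mobile_device_on_desktop_site(is_mobile: bool, access_method: str) -> bool:
    """True when the device parses as mobile but the request used the desktop site."""
    return is_mobile and access_method == "desktop"


# Toy rows with a precomputed is_mobile flag; not real data.
rows = [
    {"is_mobile": True, "access_method": "desktop"},
    {"is_mobile": True, "access_method": "mobile web"},
    {"is_mobile": False, "access_method": "desktop"},
]
mismatches = [
    r for r in rows
    if is_mobile_device_on_desktop_site(r["is_mobile"], r["access_method"])
]
```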

Aug 11 2023, 11:44 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T336361: [Analytics] Identify access from mobile vs. desktop devices.
Aug 11 2023, 9:57 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T336361: [Analytics] Identify access from mobile vs. desktop devices.
Aug 11 2023, 9:45 AMWikidata Analytics (Kanban), Wikidata

Aug 10 2023

AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

@Manuel, I've been using python-user-agents and so far it's going ok in so far as the .is_mobile method seems to be working well. Are we trying then the combination of user_agent_var.is_mobile = True and access_method = "desktop" via the access_method column from wmf.webrequest? For this column:

Aug 10 2023, 3:12 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T336361: [Analytics] Identify access from mobile vs. desktop devices.
Aug 10 2023, 1:53 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T336361: [Analytics] Identify access from mobile vs. desktop devices.
Aug 10 2023, 1:36 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T336361: [Analytics] Identify access from mobile vs. desktop devices.
Aug 10 2023, 1:17 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T336361: [Analytics] Identify access from mobile vs. desktop devices.

@Manuel, just a note on using the wmf.webrequest table: now that I'm using Spark a bit more and can see the number of steps, it's definitely worth it to try to restrict the data based on the year and month as we've been doing. Selecting 30 days over two months takes dramatically longer than if we set the month column in the WHERE clause - roughly three times longer based on number of steps.
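
One way to keep the date restriction explicit is to generate the partition filter from a date range. The sketch below assumes integer year, month and day partition columns; the helper name partition_predicate is hypothetical, and it only builds a string without running any query.

```python
from datetime import date, timedelta


def partition_predicate(start: date, end: date) -> str:
    """Build a WHERE fragment on year/month/day covering start..end inclusive."""
    days_by_month = {}
    day = start
    while day <= end:
        days_by_month.setdefault((day.year, day.month), []).append(day.day)
        day += timedelta(days=1)
    # Days inside one month of a contiguous range are contiguous,
    # so BETWEEN min AND max covers them exactly.
    clauses = [
        f"(year = {y} AND month = {m} AND day BETWEEN {min(d)} AND {max(d)})"
        for (y, m), d in days_by_month.items()
    ]
    return " OR ".join(clauses)
```

A range inside a single month yields a single clause, while a range spanning two months yields two clauses joined with OR, which makes it visible when a query touches more than one month.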

Aug 10 2023, 12:49 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).

@Manuel, @dcausse: the metrics increased, but only by a very marginal amount where we're now a bit over 50% rather than a bit below. Let me know if anything else is needed!

Aug 10 2023, 10:43 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE moved T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses) from In progress to Needs product input on the Wikidata Analytics (Kanban) board.
Aug 10 2023, 10:41 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Aug 10 2023, 10:41 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Aug 10 2023, 10:40 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).

Thanks a lot for this, @dcausse! The reasoning of single column, relatively few rows for caching makes a lot of sense. I think that the problems I faced were from trying to cache df_wikidata_rdf. Just ran things through again with just sa_and_sasc_ids cached and it did seem to run through a bit better. With that being said, I did end up running the notebook multiple times and saving the outputs to variables as I went along before then restarting the kernel.

Aug 10 2023, 10:37 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T336361: [Analytics] Identify access from mobile vs. desktop devices.
Aug 10 2023, 10:05 AMWikidata Analytics (Kanban), Wikidata

Aug 9 2023

AndrewTavis_WMDE added a comment to T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).

Minor question on this, @dcausse: why aren't we caching df_wikidata_rdf and sa_and_sasc_ids above? My assumption is that we should given that we're using them in multiple later calculations, but then I just tried to cache them and then a calculation that normally would finish then lost resources and stalled with three separate stages running. Did you explicitly choose not to cache them, and if so why not? :)

Aug 9 2023, 4:47 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Aug 9 2023, 3:43 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).

Is what we were thinking too, @dcausse :) I'm realizing that where I had the .distinct() was incorrect though. Edit: never mind the prior comment. Not sure why it wasn't working within the parentheses at first...

Aug 9 2023, 1:20 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T336361: [Analytics] Identify access from mobile vs. desktop devices.
Aug 9 2023, 12:55 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T336361: [Analytics] Identify access from mobile vs. desktop devices.
Aug 9 2023, 12:52 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE moved T336361: [Analytics] Identify access from mobile vs. desktop devices from Prioritized backlog to In progress on the Wikidata Analytics (Kanban) board.
Aug 9 2023, 12:25 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE claimed T343690: [Analytics] Wikidata edits by UI variant.
Aug 9 2023, 12:21 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE claimed T336361: [Analytics] Identify access from mobile vs. desktop devices.
Aug 9 2023, 12:21 PMWikidata Analytics (Kanban), Wikidata

Aug 8 2023

AndrewTavis_WMDE added a comment to T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).

@dcausse, do you have an idea why we're not getting that direct triples for SAs and its subclasses and direct triples for non-SAs and subclasses add to the same amount? Was working out for the last notebook as you saw. Only major change I've made is now it's .where(col("object").isin(sa_and_sasc_qids)) rather than the equality where sa_and_sasc_qids is the hard coded QIDs from above including scholarly article's (I was getting some papers back when directly querying subclasses).

Aug 8 2023, 5:11 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Aug 8 2023, 2:09 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Aug 8 2023, 2:08 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Aug 8 2023, 2:04 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Aug 8 2023, 1:09 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Aug 8 2023, 1:09 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).

Looking at this further, it seems that AKhatun focussed more on scholarly articles and was just listing subclasses in the report itself as examples. Reference for this is this part of the report.

Aug 8 2023, 1:03 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated subscribers of T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).

@Manuel, @dcausse: I have the classes from AKhatun and the subclasses of scholarly article listed in the task now. I figured it'd be good to get them all here so we know what we're talking about :)

Aug 8 2023, 12:38 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Aug 8 2023, 12:35 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Aug 8 2023, 10:51 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE moved T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article) from In progress to Needs product input on the Wikidata Analytics (Kanban) board.
Aug 8 2023, 10:34 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

Notes from the call that @dcausse and I had:

Aug 8 2023, 10:33 AMWikidata Analytics (Kanban), Wikidata

Aug 7 2023

AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

The above LATERAL VIEW EXPLODE method came up with 40,529,640 scholarly articles via the claims, @dcausse. I think that that's close enough to the amount from discovery.wikibase_rdf that we don't need to dig more into expanding the WHERE clause :) Thanks again for your help!
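
For readers less familiar with LATERAL VIEW EXPLODE, the idea is to turn each element of an array column into its own row and then filter those rows. The pure-Python sketch below mimics that on toy data. The claim shape (a property ID and a value QID) is a simplified, hypothetical stand-in and not the actual schema of the claims column.

```python
SCHOLARLY_ARTICLE = "Q13442814"
INSTANCE_OF = "P31"


def explode_claims(entities):
    """Yield one (entity_id, claim) pair per claim, one 'row' per array element."""
    for entity in entities:
        for claim in entity["claims"]:
            yield entity["id"], claim


# Toy entities with a simplified, hypothetical claim shape.
entities = [
    {"id": "Q1", "claims": [{"property": "P31", "value": "Q13442814"}]},
    {"id": "Q2", "claims": [{"property": "P31", "value": "Q5"}]},
    {"id": "Q3", "claims": [
        {"property": "P50", "value": "Q2"},
        {"property": "P31", "value": "Q13442814"},
    ]},
]

# Distinct entities that have an instance-of claim pointing at scholarly article.
scholarly_ids = {
    entity_id
    for entity_id, claim in explode_claims(entities)
    if claim["property"] == INSTANCE_OF and claim["value"] == SCHOLARLY_ARTICLE
}
```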

Aug 7 2023, 10:48 AMWikidata Analytics (Kanban), Wikidata

Aug 5 2023

AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

Thanks, @dcausse! Really appreciate the detailed explanation :) I totally agree that serializing the full claim would be problematic, and that your method is much better. Need a bit more practice with lateral view explode so that it becomes more natural for me to use. I'll implement the above at the start of the week and see if it works properly 😊

Aug 5 2023, 6:52 AMWikidata Analytics (Kanban), Wikidata

Aug 4 2023

AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

The UDF is up and running now, but we may need to discuss my limits as running what I'd assume to be a fairly simple UDF over wmf.wikidata_entity wasn't finishing (@dcausse, @JAllemandou). Even if it does finish, I'm fairly regularly getting:

Aug 4 2023, 2:23 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

As far as the Spark UDF issues are concerned, let me just sketch out the process here as it's in a separate notebook from the main one just linked. The general goal in this is to explore using UDFs to easily derive data via the claims column of wmf.wikidata_entity. We can easily find out how many scholarly articles we have via the discovery.wikibase_rdf table as in the example notebook I linked on people.wikimedia.org, but then the goal was to do something similar via wmf.wikidata_entity.claims so I can have a claims exploration example to work from later :)

Aug 4 2023, 11:36 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T341589: Improve df_to_remarkup formatting for wmfdata-python.

@nshahquinn-wmf, just FYI I do have this on my radar. Sorry it's taking so long... I'm in the process of waiting for a new computer and then I'll have my full VS Code setup up and running. I'll update you when I start to work on this :)

Aug 4 2023, 10:50 AMProduct-Analytics, Data-Engineering, Wmfdata-Python
AndrewTavis_WMDE updated the task description for T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).
Aug 4 2023, 10:04 AMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

@dcausse, just finished the people.wikimedia.org upload. An HTML for the notebook can be found at:

Aug 4 2023, 10:04 AMWikidata Analytics (Kanban), Wikidata

Aug 3 2023

AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

@dcausse, glad to help :) Maybe doing a call to check all of this might make sense? If you have availability tomorrow I'm basically free, or if not then next week for say 25 min sometime?

Aug 3 2023, 4:57 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

I'll write some more details of the problems I'm facing tomorrow 😊

Aug 3 2023, 4:48 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

Aggregations have been added to the task description :) We'll upload the work for this to GitHub or GitLab once we have our repo set up, and I'd be happy to do a call if someone besides @Manuel wants an explanation :) Also happy to put the notebook on people.wikimedia.org for an interim presentation of the work.

Aug 3 2023, 4:48 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).
Aug 3 2023, 4:45 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).
Aug 3 2023, 4:44 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

Will check the following with @Manuel later today, but here are the metrics I'm getting from the 20230717 dated data from discovery.wikibase_rdf (note that I don't have access to later ones given permission restrictions that are documented in T342416):

Aug 3 2023, 10:26 AMWikidata Analytics (Kanban), Wikidata

Aug 2 2023

AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

Thank you for the information here, @dcausse! Nice to have all this in one place where I can reference it when I need a recap 😊😊

Aug 2 2023, 2:17 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

Checking another concept with you all:

Aug 2 2023, 10:23 AMWikidata Analytics (Kanban), Wikidata

Aug 1 2023

AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

Great to hear, @mpopov! I guess the distinction between HiveQL queries run with wmfdata.spark.run for scientists/analysts vs. dot notation for software engineering makes sense. Nice to hear that I'll be at home writing some Hive :)

Aug 1 2023, 11:39 AMWikidata Analytics (Kanban), Wikidata

Jul 31 2023

AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

Good to know, this is definitely a lot lower than I expected, thanks!

Jul 31 2023, 3:30 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

Also for everyone's information, the number of duplicate triple values in discovery.wikibase_rdf is very small, as seen in the following snippet/output:

Jul 31 2023, 1:13 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

@dcausse, a general point on my end is that when I'm trying to run the code that you sent along via an HTML on people.wikimedia.org I'm getting the following as an output of Spark runs repeated over and over again:

Jul 31 2023, 1:09 PMWikidata Analytics (Kanban), Wikidata

Jul 27 2023

AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

@Manuel, looking into cases where Q13442814 (scholarly article) is either the subject or object of a triple, it looks like we can verify that the relationships are only being saved in one way as they should be:

Jul 27 2023, 2:42 PMWikidata Analytics (Kanban), Wikidata

Jul 25 2023

AndrewTavis_WMDE awarded T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article) a Like token.
Jul 25 2023, 7:41 AMWikidata Analytics (Kanban), Wikidata

Jul 24 2023

AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Jul 24 2023, 4:29 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Jul 24 2023, 4:28 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Jul 24 2023, 4:28 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).

@Manuel, could you give a bit more context to "# of Items" above? Is this all distinct Wikidata entities (QIDs and PIDs), or just QIDs? The wmf.wikidata_entity table for this only has those two entity types in it, so if we're looking for other parts of the graph we'll need to look in other places.

Jul 24 2023, 4:17 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T334558: [Analytics] Unique user-agents accessing Wikidata's REST API for Q2/2023.
  • Let's discuss what we can do to make the metric more robust and reliable (e.g. exclude browser user agents)
Jul 24 2023, 4:09 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T334558: [Analytics] Unique user-agents accessing Wikidata's REST API for Q2/2023.
Jul 24 2023, 4:08 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).

@Manuel, based on the query provided in https://w.wiki/77FU (I took out the French comment at the end and regenerated the short link), it looks like the ontology is relatively clean if we keep it to the base subclasses with wdt:P279, but not if we go beyond that to the full graph with wdt:P279*. A summary:
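
The difference between direct subclasses (wdt:P279) and the full subclass graph (wdt:P279*) can be illustrated with a small traversal over a toy mapping. The class names below are made up and the function names are hypothetical. Note that the SPARQL `*` path also matches the root itself, whereas this sketch returns only classes one or more hops away.

```python
from collections import deque


def direct_subclasses(subclass_of: dict, root: str) -> set:
    """Classes whose subclass-of parents include root (one hop)."""
    return {child for child, parents in subclass_of.items() if root in parents}


def all_subclasses(subclass_of: dict, root: str) -> set:
    """Classes reachable from root by following subclass-of edges downwards."""
    found, queue = set(), deque([root])
    while queue:
        current = queue.popleft()
        for child in direct_subclasses(subclass_of, current):
            # Skip already visited classes so cycles cannot loop forever.
            if child not in found and child != root:
                found.add(child)
                queue.append(child)
    return found


# Toy ontology: child -> set of parents. Not real Wikidata classes.
toy_subclass_of = {
    "A": {"root"},
    "B": {"A"},
    "C": {"B"},
    "D": {"other"},
}
```

Here direct_subclasses(toy_subclass_of, "root") gives only A, while all_subclasses also picks up B and C, which is where less clean parts of an ontology would enter.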

Jul 24 2023, 3:42 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Jul 24 2023, 3:29 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Jul 24 2023, 3:27 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Jul 24 2023, 3:26 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).
Jul 24 2023, 3:25 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE moved T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses) from Incoming to In progress on the Wikidata Analytics (Kanban) board.
Jul 24 2023, 2:52 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE claimed T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses).
Jul 24 2023, 2:52 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T340718: Investigate prior WMDE analytics tables / assets.

@Manuel, moved this to needs product input as I think that we have everything that we could map out (within reason). Let me know how you'd like to prioritize things from here :)

Jul 24 2023, 8:02 AMWikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE moved T340718: Investigate prior WMDE analytics tables / assets from In progress to Needs product input on the Wikidata Analytics (Kanban) board.
Jul 24 2023, 8:00 AMWikidata, Wikidata Analytics (Kanban)

Jul 21 2023

AndrewTavis_WMDE awarded T340648: [Airflow] Setup Airflow instance for WMDE a Like token.
Jul 21 2023, 1:15 PMPatch-For-Review, Data-Platform-SRE
AndrewTavis_WMDE added a comment to T340648: [Airflow] Setup Airflow instance for WMDE.

Great to hear, @Stevemunene! Thanks for the support :) @Manuel and I would be the admins per prior conversations we've had.

Jul 21 2023, 1:12 PMPatch-For-Review, Data-Platform-SRE
AndrewTavis_WMDE added a comment to T337021: [Analytics] Find out size of term subgraph.

Updated the totals given the most recent dump to test my connection to it in relation to T342416. As expected, no major changes in terms of percentages :)

Jul 21 2023, 11:38 AMWikidata Analytics (Kanban), Wikidata-Query-Service, Wikidata
AndrewTavis_WMDE updated the task description for T337021: [Analytics] Find out size of term subgraph.
Jul 21 2023, 11:36 AMWikidata Analytics (Kanban), Wikidata-Query-Service, Wikidata
AndrewTavis_WMDE added a comment to T337021: [Analytics] Find out size of term subgraph.

Thanks for writing, @tfmorris! :)

Jul 21 2023, 9:34 AMWikidata Analytics (Kanban), Wikidata-Query-Service, Wikidata
AndrewTavis_WMDE created T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf).
Jul 21 2023, 9:22 AMDiscovery-Search (Current work), Data-Engineering, Wikidata, Wikidata-Query-Service

Jul 20 2023

AndrewTavis_WMDE updated the task description for T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article).
Jul 20 2023, 12:15 PMWikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T337021: [Analytics] Find out size of term subgraph.

Great, @Manuel! Let me know what you want to do for the documentation of this. Happy to set up a repo for us on GitHub in the coming days if that would help :)

Jul 20 2023, 9:36 AMWikidata Analytics (Kanban), Wikidata-Query-Service, Wikidata
AndrewTavis_WMDE updated the task description for T337021: [Analytics] Find out size of term subgraph.
Jul 20 2023, 8:57 AMWikidata Analytics (Kanban), Wikidata-Query-Service, Wikidata