GoranSMilovanovic (GoranSM)
User

User Details

User Since
Mar 20 2017, 3:58 PM (113 w, 5 d)
Availability
Available
LDAP User
GoranSMilovanovic
MediaWiki User
GoranSMilovanovic [ Global Accounts ]

Recent Activity

Thu, May 23

GoranSMilovanovic moved T195702: track quality of all/top 10000 Wikidata items over time from Prioritized to Current/Deprioritized on the User-GoranSMilovanovic board.
Thu, May 23, 5:54 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES
GoranSMilovanovic added a comment to T195702: track quality of all/top 10000 Wikidata items over time.

Ok, here's what I've got:

Thu, May 23, 5:06 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES

Wed, May 22

GoranSMilovanovic moved T195702: track quality of all/top 10000 Wikidata items over time from Incoming to Prioritized on the User-GoranSMilovanovic board.
Wed, May 22, 10:28 AM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES

Mon, May 20

GoranSMilovanovic added a comment to T203366: Replace {maptpx} for Topic Modeling in WDCM.

Following thorough experiments with Python Gensim and Apache Spark, which both use the same online LDA estimation, these are the conclusions:

Mon, May 20, 6:51 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

@Christine_Domgoergen_WMDE Your first follow-up report (May 13 - May 19):

Mon, May 20, 3:45 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)

Thu, May 16

GoranSMilovanovic moved T220977: Investigate surprising rise in mobile page views for wikidata from Prioritized to Current/Deprioritized on the User-GoranSMilovanovic board.
Thu, May 16, 2:35 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

@JAllemandou Thanks for feedback!

Thu, May 16, 1:46 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic updated subscribers of T223444: Update geo-editors job to use tags and report desktop/mobile edits.
Thu, May 16, 1:44 PM · Product-Analytics, Analytics
GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

@Lea_WMDE Here we go:

Thu, May 16, 12:16 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic moved T220977: Investigate surprising rise in mobile page views for wikidata from Current/Deprioritized to Prioritized on the User-GoranSMilovanovic board.
Thu, May 16, 7:19 AM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic moved T195702: track quality of all/top 10000 Wikidata items over time from Technical Wishlist to Incoming on the User-GoranSMilovanovic board.
Thu, May 16, 7:19 AM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES
GoranSMilovanovic updated subscribers of T195702: track quality of all/top 10000 Wikidata items over time.
Thu, May 16, 7:19 AM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES
GoranSMilovanovic claimed T195702: track quality of all/top 10000 Wikidata items over time.
Thu, May 16, 7:18 AM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES

Wed, May 15

GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

@Lea_WMDE Yes, we do have a more or less steady increase in mobile edits on Wikidata:

Wed, May 15, 10:40 AM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

@Lea_WMDE Here's what was happening with the mobile edits since the beginning of the year. Note: the last data point is May 2019, it's incomplete of course.

Wed, May 15, 9:59 AM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering

Tue, May 14

GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

@JAllemandou You're the man, I see now that revision_tags is a new field since the 2019-04 (April 2019) snapshot of mediawiki_history:
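A minimal sketch of how such a tags field could be used to split edits into mobile and desktop. The input rows here are hypothetical (month, tags) pairs, not actual mediawiki_history output, and the tag name "mobile edit" is an assumption that should be checked against the tags present in the snapshot:

```python
from collections import Counter

# Assumption: mobile edits carry this change tag; verify against real data.
MOBILE_TAG = "mobile edit"


def count_edits_by_platform(revisions):
    """Count edits per (month, platform) from (month, tags) pairs.

    An edit counts as "mobile" if its tag list contains MOBILE_TAG,
    otherwise as "desktop". A missing tag list (None) is treated as empty.
    """
    counts = Counter()
    for month, tags in revisions:
        platform = "mobile" if MOBILE_TAG in (tags or []) else "desktop"
        counts[(month, platform)] += 1
    return counts


# Hypothetical rows for illustration only.
example_revisions = [
    ("2019-04", ["mobile edit", "mobile web edit"]),
    ("2019-04", []),
    ("2019-05", None),
]
example_counts = count_edits_by_platform(example_revisions)
```

Revisions from snapshots older than 2019-04 would have no tags at all in this scheme, so they would all fall into the "desktop" bucket and should be excluded rather than compared.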

Tue, May 14, 4:36 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic committed rAWWP3b00c1df12a5: initial (authored by GoranSMilovanovic).
initial
Tue, May 14, 1:08 PM
GoranSMilovanovic closed T219844: Update all WMDE Analytics documents as Resolved.
Tue, May 14, 12:31 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T219844: Update all WMDE Analytics documents.

WD Percent Usage Dashboard: done.

Tue, May 14, 12:31 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

@Lea_WMDE from Analytics/Data Lake/Edits Wikitech documentation page:

Tue, May 14, 11:31 AM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic added a comment to T217994: WDCM Dashboards Maintenance.
  • fixed.
Tue, May 14, 11:14 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T217994: WDCM Dashboards Maintenance.
Tue, May 14, 10:50 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T219844: Update all WMDE Analytics documents.
Tue, May 14, 8:56 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic

Mon, May 13

GoranSMilovanovic moved T223117: WD Languages Landscape: essential properties and classes from Technical Wishlist to Incoming on the User-GoranSMilovanovic board.
Mon, May 13, 3:53 PM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic moved T223118: WD Languages Landscape: fundamental data sets from Technical Wishlist to Incoming on the User-GoranSMilovanovic board.
Mon, May 13, 3:53 PM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic moved T223119: WD Languages Landscape: fundamental statistics from Technical Wishlist to Incoming on the User-GoranSMilovanovic board.
Mon, May 13, 3:52 PM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic created T223119: WD Languages Landscape: fundamental statistics.
Mon, May 13, 3:50 PM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic created T223118: WD Languages Landscape: fundamental data sets.
Mon, May 13, 3:49 PM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic created T223117: WD Languages Landscape: essential properties and classes.
Mon, May 13, 3:47 PM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic moved T221965: Wikidata Languages Landscape from Incoming to Prioritized on the User-GoranSMilovanovic board.
Mon, May 13, 2:38 PM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic moved T222066: Create daily tracking reports for republica project from Current/Deprioritized to Prioritized on the User-GoranSMilovanovic board.
Mon, May 13, 2:38 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic closed T219843: Pyspark/R procedures to process the copy of the WD Dump in the Data Lake as Resolved.
Mon, May 13, 2:31 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T219843: Pyspark/R procedures to process the copy of the WD Dump in the Data Lake.
  • essential properties/classes for languages - for the Wikidata Languages Landscape #T221965 --> this will be transferred as a sub-task to #T221965.
Mon, May 13, 2:30 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T219843: Pyspark/R procedures to process the copy of the WD Dump in the Data Lake.
  • essential properties/classes for taxa (WDCM) - DONE.
Mon, May 13, 2:27 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T219843: Pyspark/R procedures to process the copy of the WD Dump in the Data Lake.
  • essential properties/classes for organizations (WDCM) - DONE.
  • essential properties/classes for geographical objects (WDCM) - DONE.
Mon, May 13, 2:15 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic moved T220977: Investigate surprising rise in mobile page views for wikidata from Prioritized to Current/Deprioritized on the User-GoranSMilovanovic board.
Mon, May 13, 7:32 AM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic moved T222066: Create daily tracking reports for republica project from Prioritized to Current/Deprioritized on the User-GoranSMilovanovic board.
Mon, May 13, 7:32 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

Daily report(s) for re:publica 2019 conference:

Mon, May 13, 7:30 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)

Fri, May 10

GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

Daily report(s) for re:publica 2019 conference:

Fri, May 10, 9:50 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)

Wed, May 8

GoranSMilovanovic updated subscribers of T220977: Investigate surprising rise in mobile page views for wikidata.

@Milimetric Would you happen to know if there is a convenient method to differentiate between (a) edits made from mobile vs. (b) edits made from desktop, in a particular project (say, Wikidata)? Thank you.

Wed, May 8, 12:06 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

Another possibility would be to parse the X-Analytics field of the wmf.webrequest table and look into the values of the mf-m key:
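As a rough illustration of that parsing step, here is a small sketch that assumes the X-Analytics value is a semicolon-separated string of key=value pairs. The sample string and the function names are hypothetical, and reading the rows out of wmf.webrequest is left out:

```python
def parse_x_analytics(header):
    """Split an X-Analytics string such as 'ns=0;mf-m=b' into a dict."""
    fields = {}
    for part in (header or "").split(";"):
        key, sep, value = part.partition("=")
        # Skip empty fragments and fragments without a key=value shape.
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def mf_m_value(header):
    """Return the value of the mf-m key, or None if the key is absent."""
    return parse_x_analytics(header).get("mf-m")


# Hypothetical header value for illustration only.
sample_header = "ns=0;page_id=123;mf-m=b"
sample_value = mf_m_value(sample_header)
```

Requests without the mf-m key return None here, which keeps them distinguishable from requests where the key is present but empty.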

Wed, May 8, 12:01 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

Unfortunately, our edits data currently do not encompass any fields that would allow us to separate edits made from mobile vs. desktop.

Wed, May 8, 11:32 AM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

Daily report for re:publica 2019 conference, May 7, 2019:

Wed, May 8, 11:20 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)

Tue, May 7

GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

If the growth is natural, would it be valid to assume that the editing behavior for mobile also increased?

Tue, May 7, 11:27 AM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic moved T220977: Investigate surprising rise in mobile page views for wikidata from Current/Deprioritized to Prioritized on the User-GoranSMilovanovic board.
Tue, May 7, 11:13 AM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

Daily report for re:publica 2019 conference, May 6, 2019:

Tue, May 7, 11:11 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)

Fri, May 3

GoranSMilovanovic closed T222359: Install Gensim for Python3 on stat1007 as Resolved.
Fri, May 3, 2:22 PM · WMDE-Analytics-Engineering, Analytics
GoranSMilovanovic closed T222359: Install Gensim for Python3 on stat1007, a subtask of T203366: Replace {maptpx} for Topic Modeling in WDCM, as Resolved.
Fri, May 3, 2:22 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a parent task for T222359: Install Gensim for Python3 on stat1007: T203366: Replace {maptpx} for Topic Modeling in WDCM.
Fri, May 3, 2:22 PM · WMDE-Analytics-Engineering, Analytics
GoranSMilovanovic added a subtask for T203366: Replace {maptpx} for Topic Modeling in WDCM: T222359: Install Gensim for Python3 on stat1007.
Fri, May 3, 2:22 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T222359: Install Gensim for Python3 on stat1007.

... to create your own Python virtual environment on stat1007 and pip install the package ...

Fri, May 3, 2:22 PM · WMDE-Analytics-Engineering, Analytics
GoranSMilovanovic added a comment to T222359: Install Gensim for Python3 on stat1007.

@Ottomata Hey, Otto

Fri, May 3, 12:57 PM · WMDE-Analytics-Engineering, Analytics

Thu, May 2

GoranSMilovanovic created T222359: Install Gensim for Python3 on stat1007.
Thu, May 2, 11:24 AM · WMDE-Analytics-Engineering, Analytics
GoranSMilovanovic moved T203366: Replace {maptpx} for Topic Modeling in WDCM from Incoming to Prioritized on the User-GoranSMilovanovic board.
Thu, May 2, 11:10 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic moved T219843: Pyspark/R procedures to process the copy of the WD Dump in the Data Lake from Current/Deprioritized to Prioritized on the User-GoranSMilovanovic board.
Thu, May 2, 11:09 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T219843: Pyspark/R procedures to process the copy of the WD Dump in the Data Lake.
  • These operations are meant to replace all R-orchestrated, massive, and time-consuming Wikidata API/WDQS SPARQL calls from WDCM and related dashboard back-ends;
  • The following datasets were produced until now:
    • WD labels for the top 15 languages per number of speakers (essential for the WDCM system);
    • Q5 (Human): all items + the essential properties for our WD statistical systems:
Thu, May 2, 11:03 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic moved T209055: WMDE Banner Campaigns Dashboard from Prioritized to Current/Deprioritized on the User-GoranSMilovanovic board.
Thu, May 2, 10:51 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

@Christine_Domgoergen_WMDE Test complete, all fine:

Thu, May 2, 9:33 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

@Christine_Domgoergen_WMDE Testing now.

Thu, May 2, 9:08 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

I have inspected all mobile pageviews of Wikidata for April 2019.

Thu, May 2, 1:14 AM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering

Wed, May 1

GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

Can we make a registration test just before our meeting? I would make a test registration and you could then tell me whether you received the data. Is that okay for you?

Wed, May 1, 11:01 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

It's definitely not Googlebot (Smartphone), I've checked the wmf.webrequest for a sample:

Wed, May 1, 10:25 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic moved T220977: Investigate surprising rise in mobile page views for wikidata from Prioritized to Current/Deprioritized on the User-GoranSMilovanovic board.
Wed, May 1, 9:39 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic updated subscribers of T220977: Investigate surprising rise in mobile page views for wikidata.

0. First of all, I started by replicating the Wikistats data (https://stats.wikimedia.org/v2/#/wikidata.org/reading/total-page-views/normal|bar|2-Year|access~mobile-web) that you have shared in the ticket description, to ensure that we are looking at the same data. I have used the Projectview hourly Hive table and the replication is perfect.

Wed, May 1, 9:36 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic added a comment to T221965: Wikidata Languages Landscape.
  • Fundamental dataset (language code x language code similarity/distance matrix) produced from WD JSON dump, hdfs copy w. Pyspark;
  • PoC completed; can be done;
  • details forthcoming.
Wed, May 1, 12:15 AM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic

Tue, Apr 30

GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

@Christine_Domgoergen_WMDE Well, it makes sense to be fast on this since it's a holiday in Germany tomorrow and our test should take place the day after, right?

Tue, Apr 30, 11:18 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.
  • I think that I understand everything, and
  • I have only two questions:
Tue, Apr 30, 10:22 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

@Christine_Domgoergen_WMDE Of course we can test before our meeting!

Tue, Apr 30, 8:49 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)

Mon, Apr 29

GoranSMilovanovic moved T222066: Create daily tracking reports for republica project from Technical Wishlist to Prioritized on the User-GoranSMilovanovic board.
Mon, Apr 29, 12:40 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic added projects to T222066: Create daily tracking reports for republica project: User-GoranSMilovanovic, WMDE-Analytics-Engineering.
Mon, Apr 29, 12:40 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic moved T221103: Have specific URLs for different parts of the dashboard from Current/Deprioritized to WDCM on the User-GoranSMilovanovic board.
Mon, Apr 29, 10:29 AM · User-GoranSMilovanovic, WMDE-Analytics-Engineering

Apr 26 2019

GoranSMilovanovic created T221965: Wikidata Languages Landscape.
Apr 26 2019, 1:20 PM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic

Apr 25 2019

GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.

@Lydia_Pintscher \o/ Oh, yes, please do :)

Apr 25 2019, 8:45 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata

Apr 23 2019

GoranSMilovanovic lowered the priority of T204440: analyze and visualize the identifier landscape of Wikidata from High to Low.
Apr 23 2019, 7:56 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata

Apr 22 2019

GoranSMilovanovic committed rAWWI8de6d3274e23: final (authored by GoranSMilovanovic).
final
Apr 22 2019, 11:06 PM
GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.
  • all computations that could have been moved to the back-end are now there;
  • the dashboard is not fully client-side dependent anymore (it wasn't realistic in the first place, mea culpa; too large datasets).
Apr 22 2019, 6:43 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata

Apr 21 2019

GoranSMilovanovic committed rAWWI3dc1581ed904: minor (authored by GoranSMilovanovic).
minor
Apr 21 2019, 5:56 PM
GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.
  • The aesthetics are back:
    • {igraph} MDS layout deprecated in favor of
    • {igraph} Fruchterman-Reingold algorithm;
    • this solution is slower and I will try to optimize it as much as I can, but
    • the result is stunning.
Apr 21 2019, 12:29 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata

Apr 20 2019

GoranSMilovanovic updated subscribers of T204440: analyze and visualize the identifier landscape of Wikidata.

@Lydia_Pintscher

  • Everything else takes place once the WD JSON dump copy to HDFS (T209655) is in production, and the Analytics-Engineering team tells me that this is going to take a while.
  • I think we should consider investing a bit more of my time here to optimize the dashboard (large datasets --> heavy on client-side processing). Please let me know what you think.
Apr 20 2019, 2:39 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic committed rAWWI7b0d72ef74cc: re-engineered datasets (authored by GoranSMilovanovic).
re-engineered datasets
Apr 20 2019, 1:43 AM

Apr 19 2019

GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.
  • Implementing changes in the WD external identifier class visualizations: DONE;
  • in relation to T204440#5097057, a compromise was introduced:
    • the WD identifier class network is generated to encompass all identifiers that belong to the class either by P31 or by P279 paths;
    • the table to the right of the network visualization will list only identifiers that belong to the class in a P31 sense.
Apr 19 2019, 9:58 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata

Apr 18 2019

GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.
  • implement the suggestion by @Pintoch (see: T204440#5097057)
    • data structure: DONE
    • implementing changes in the WD external identifier class visualizations now;
  • Next:
    • check why 20 - 30% of data are not delivered from Spark; most probable cause: failures due to I/O operations;
Apr 18 2019, 8:20 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata

Apr 17 2019

GoranSMilovanovic updated subscribers of T204440: analyze and visualize the identifier landscape of Wikidata.
  • revive the Overlap Network visualization - DONE
Apr 17 2019, 9:16 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.
  • there are some minor interventions that need to take place in the visualizations code. - DONE
  • Next steps:
    • revive the Overlap Network visualization;
    • check why 20 - 30% of data are not delivered from Spark; most probable cause: failures due to I/O operations.
Apr 17 2019, 8:22 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.
  • Back-end re-factored; dashboard online, not all functionality complete:
    • the Overlap Network tab will have to wait until I figure out why we don't get all of the data from Spark;
    • there are some minor interventions that need to take place in the visualizations code.
Apr 17 2019, 1:00 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.

@Envlh It is not fixed yet. I am getting more data, but not all of it, see: T204440#5116460

Apr 17 2019, 12:04 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.

Overview/Status Report:

Apr 17 2019, 10:12 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata

Apr 16 2019

GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.

@agray The SPARQL queries indeed select distinct items per property x property intersection... back to the drawing board: what is missing in my data?
I will have to dig deep to find out; the Pyspark ETL code for this dashboard is already looking into all statements, claims, and references. I fear it might be related to non-deterministic operations in Spark that could have affected the completeness of the datasets, but I've done everything to prevent such effects. I'm a bit puzzled, but I will find out the cause one way or another. Thank you very much for testing.
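To make the counting logic under discussion concrete, here is a minimal sketch of distinct-item counts per property and per property x property intersection. It is an illustration on hypothetical (item, property, identifier) triples, not the dashboard's Pyspark ETL code:

```python
from itertools import combinations


def property_overlaps(statements):
    """Count distinct items per property and per pair of properties.

    `statements` is an iterable of (item, property, identifier) triples.
    Items are kept in sets, so an item that carries several identifiers
    for the same property is counted only once.
    """
    items_by_property = {}
    for item, prop, _identifier in statements:
        items_by_property.setdefault(prop, set()).add(item)

    overlaps = {}
    for p1, p2 in combinations(sorted(items_by_property), 2):
        shared = items_by_property[p1] & items_by_property[p2]
        if shared:
            overlaps[(p1, p2)] = len(shared)
    return items_by_property, overlaps


# Hypothetical triples for illustration only.
example_statements = [
    ("Q1", "P1614", "id-a"),
    ("Q1", "P1614", "id-b"),  # second identifier on the same item
    ("Q1", "P214", "id-x"),
    ("Q2", "P214", "id-y"),
]
example_items, example_overlaps = property_overlaps(example_statements)
```

Comparing these per-property item counts against the counts returned by a SPARQL query for the same property is one way to see which properties are short of data.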

Apr 16 2019, 8:07 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.

@Jheald Given that we discard all cases of multiple use of the same identifier with a particular item, does the number seem reasonable to you?

Apr 16 2019, 3:07 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic moved T221103: Have specific URLs for different parts of the dashboard from Technical Wishlist to Current/Deprioritized on the User-GoranSMilovanovic board.
Apr 16 2019, 3:04 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering
GoranSMilovanovic claimed T221103: Have specific URLs for different parts of the dashboard.
Apr 16 2019, 3:04 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering
GoranSMilovanovic moved T204440: analyze and visualize the identifier landscape of Wikidata from Current/Deprioritized to Prioritized on the User-GoranSMilovanovic board.
Apr 16 2019, 1:23 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic moved T220977: Investigate surprising rise in mobile page views for wikidata from Technical Wishlist to Prioritized on the User-GoranSMilovanovic board.
Apr 16 2019, 1:20 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic claimed T220977: Investigate surprising rise in mobile page views for wikidata.
Apr 16 2019, 1:07 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
VIGNERON awarded T204440: analyze and visualize the identifier landscape of Wikidata a Love token.
Apr 16 2019, 12:52 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.

Overview/Status report for this task:

Apr 16 2019, 12:26 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.
  • Data engineering test: success;
  • tSNE running now.
Apr 16 2019, 8:26 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata

Apr 15 2019

GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.
  • Data engineering procedures code re-factored and in place;
  • testing now.
Apr 15 2019, 10:20 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.

@agray Got it. Thank you. Still working on the overlap dataset.

Apr 15 2019, 5:50 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata
GoranSMilovanovic added a comment to T204440: analyze and visualize the identifier landscape of Wikidata.

I can confirm that the numbers on the tables seem a bit off for some other properties. I've been looking at P1614 (History of Parliament), which is complete and fairly stable. It currently has 21428 IDs on 17942 items (there are a lot of items with two/three IDs) and hasn't had any big changes since I finished matching in mid-2018.

Apr 15 2019, 5:17 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata