GoranSMilovanovic (GoranSM)
User

User Details

User Since
Mar 20 2017, 3:58 PM (130 w, 5 d)
Availability
Available
LDAP User
GoranSMilovanovic
MediaWiki User
GoranSMilovanovic [ Global Accounts ]

Recent Activity

Yesterday

GoranSMilovanovic updated subscribers of T195702: track quality of all/top 10000 Wikidata items over time.

@Lydia_Pintscher @RazShuty @WMDE-leszek Here's a glimpse of what we've found out thus far:

Fri, Sep 20, 8:27 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES

Thu, Sep 19

GoranSMilovanovic updated subscribers of T232332: Create Tracking Report: Email Campaign Donors for New Editors 2019.
Thu, Sep 19, 9:36 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic updated subscribers of T232335: Create Tracking Report: Social Media Campaign Type-Test 2019.
Thu, Sep 19, 9:36 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T232335: Create Tracking Report: Social Media Campaign Type-Test 2019.

@Stefan_Schneider_WMDE I think we should meet:

Thu, Sep 19, 9:35 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T232332: Create Tracking Report: Email Campaign Donors for New Editors 2019.

@Stefan_Schneider_WMDE I think we should meet:

Thu, Sep 19, 9:33 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic

Tue, Sep 17

GoranSMilovanovic moved T231975: WDCM is giving really high numbers to ruwikisource and cewiki from Technical Wishlist to WDCM on the User-GoranSMilovanovic board.
Tue, Sep 17, 10:03 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering
GoranSMilovanovic added a comment to T231975: WDCM is giving really high numbers to ruwikisource and cewiki.

@Theklan As of the following

Tue, Sep 17, 10:02 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering
GoranSMilovanovic updated subscribers of T231975: WDCM is giving really high numbers to ruwikisource and cewiki.
Tue, Sep 17, 9:58 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering
GoranSMilovanovic claimed T231975: WDCM is giving really high numbers to ruwikisource and cewiki.

@Theklan Hello, I am the maintainer of the WDCM system. Ok, let's inspect what is happening here:

Tue, Sep 17, 9:57 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering
GoranSMilovanovic added a comment to T195702: track quality of all/top 10000 Wikidata items over time.
  • working on analytics/visualizations now;
  • next steps: dashboard.
Tue, Sep 17, 9:10 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES
GoranSMilovanovic removed a project from T196193: Create KNIME nodes to interact with Wikidata: User-GoranSMilovanovic.
Tue, Sep 17, 9:06 PM · patch-welcome, Wikidata
GoranSMilovanovic moved T232332: Create Tracking Report: Email Campaign Donors for New Editors 2019 from Technical Wishlist to Current/Deprioritized on the User-GoranSMilovanovic board.
Tue, Sep 17, 8:40 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic removed projects from T232332: Create Tracking Report: Email Campaign Donors for New Editors 2019: WMDE-Fundraising-Tech, WMDE-FUN-Team.
Tue, Sep 17, 8:39 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic claimed T232332: Create Tracking Report: Email Campaign Donors for New Editors 2019.
Tue, Sep 17, 8:39 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic

Mon, Sep 9

GoranSMilovanovic lowered the priority of T217994: WDCM Dashboards Maintenance from High to Normal.
Mon, Sep 9, 11:10 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T217994: WDCM Dashboards Maintenance.
  • Following the latest WDCM update ruwiki is back;
  • still cannot pinpoint what exactly happened with the previous update.
Mon, Sep 9, 11:10 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic moved T232335: Create Tracking Report: Social Media Campaign Type-Test 2019 from Technical Wishlist to Current/Deprioritized on the User-GoranSMilovanovic board.
Mon, Sep 9, 11:09 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic claimed T232335: Create Tracking Report: Social Media Campaign Type-Test 2019.

@Aklapper Thanks for noticing this.

Mon, Sep 9, 11:09 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added projects to T232335: Create Tracking Report: Social Media Campaign Type-Test 2019: User-GoranSMilovanovic, WMDE-Analytics-Engineering.
Mon, Sep 9, 11:06 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic

Fri, Sep 6

GoranSMilovanovic raised the priority of T217994: WDCM Dashboards Maintenance from Normal to High.

Check out what is happening with:

Fri, Sep 6, 7:39 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic

Wed, Sep 4

GoranSMilovanovic added a comment to T195702: track quality of all/top 10000 Wikidata items over time.

@Halfak Thank you, Aaron.

Wed, Sep 4, 4:32 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES

Tue, Sep 3

GoranSMilovanovic committed rAWCM9e2e20322e5c: Biases (authored by GoranSMilovanovic).
Biases
Tue, Sep 3, 10:46 PM

Tue, Aug 27

GoranSMilovanovic updated subscribers of T195702: track quality of all/top 10000 Wikidata items over time.

Amir, right before our meeting, what we need here is simple:

Tue, Aug 27, 12:48 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES
GoranSMilovanovic updated subscribers of T195702: track quality of all/top 10000 Wikidata items over time.
Tue, Aug 27, 12:43 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES

Fri, Aug 23

GoranSMilovanovic added a comment to T223119: WD Languages Landscape: fundamental statistics.

Something to begin with:

Fri, Aug 23, 7:43 PM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T223118: WD Languages Landscape: fundamental data sets.
  • the Jaccard similarity and distance matrices: testing, the procedure is memory efficient but slow (subsetting the dgCMatrix class matrix...):
  • DONE. We can have the Jaccard distances here too.
Fri, Aug 23, 7:37 PM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic

Aug 20 2019

GoranSMilovanovic added a comment to T223118: WD Languages Landscape: fundamental data sets.
  • Batch processing over sparse matrices (dgCMatrix class) is now employed to compute
    • the co-occurrence data set: success, using approximately an order of magnitude fewer resources than the previously employed procedure, and
    • the Jaccard similarity and distance matrices: testing, the procedure is memory efficient but slow (subsetting the dgCMatrix class matrix...).
Aug 20 2019, 3:30 AM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic
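
The batch idea in the comment above can be illustrated with a minimal sketch. It is plain Python rather than the R/dgCMatrix procedure actually used, and the toy data, the function names `batched` and `jaccard_in_batches`, and the batch size are all assumptions made for illustration. Each column (for example, a language) is stored as the set of row ids with a non-zero entry, which mirrors how a sparse matrix stores only the non-zero cells. Columns are then processed a few at a time, so each batch only reads the sets it needs.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def jaccard_in_batches(columns, batch_size=2):
    """Pairwise Jaccard similarity between sparse binary columns.

    `columns` maps a column name to the set of row ids that are non-zero.
    Returns {(a, b): similarity} for every pair with a < b.
    """
    names = sorted(columns)
    result = {}
    for batch in batched(names, batch_size):
        for a in batch:
            for b in names:
                if a >= b:
                    continue  # each unordered pair is computed once
                shared = len(columns[a] & columns[b])  # co-occurrence count
                union = len(columns[a] | columns[b])
                result[(a, b)] = shared / union if union else 0.0
    return result


# Hypothetical toy data: which items carry a label in which language.
labels = {
    "en": {1, 2, 3, 4},
    "de": {2, 3, 4},
    "sr": {4, 5},
}

similarity = jaccard_in_batches(labels)
distance = {pair: 1.0 - s for pair, s in similarity.items()}
```

The intersection size computed for each pair is the co-occurrence count, so the same pass could in principle produce both the co-occurrence data set and the Jaccard distances.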

Aug 9 2019

GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.

@Lea_WMDE Hm, this might be the solution - dygraph set to dyLegend(show = 'follow'), please check: http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/
Note. This was the initial solution and there is one thing I don't like about it in spite of the fact that it solves the problem that you were facing (focus overlapping title).
Let me know if you like this approach better, please.

Aug 9 2019, 11:27 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata

Aug 8 2019

GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.

Strange.
Lea, please let me know which browser you are using. I have tested the dashboard on Chromium and Mozilla Firefox under Ubuntu; the WDCM system, which uses the same front-end technology (RStudio Shiny), was tested over an even broader range of browsers (including macOS), and at this point I cannot really tell what is causing the problem - but I will do my best to figure it out.

Aug 8 2019, 9:40 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata

Aug 7 2019

GoranSMilovanovic added a comment to T223118: WD Languages Landscape: fundamental data sets.
  • Given how often stat1007 is used by us analysts,
  • it barely has the resources for the computations that we need here (the languages x languages contingency table; takes at least ~25 GB to compute);
  • Apache Spark cannot help (large number of categories -> Spark's dataframe functions cannot support contingencies in such situations).
Aug 7 2019, 4:57 PM · Wikidata, WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.

Could you add the info why wikistats2 data differs from these graphs to the explanatory text?

Done: dashboard.

Aug 7 2019, 12:10 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata

Aug 4 2019

GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.

@Lea_WMDE The Total Average is back: dashboard.

Aug 4 2019, 11:38 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata

Aug 1 2019

GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.

@Milimetric Thanks for the clarification, Dan.
@Lea_WMDE This implies that

Aug 1 2019, 8:39 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata
GoranSMilovanovic updated subscribers of T208567: Count Wikidata page views per page type.

@Lea_WMDE Ok, here is a direct test (Pyspark code against the wmf.pageviews_hourly table):

Aug 1 2019, 3:49 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata
GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.

@Lea_WMDE So that is one order of magnitude, and it looks outright impossible. Please let me check.
I guess the difference of this magnitude could not be a consequence of the fact that we have picked only four namespaces (Entity, Property, Lexeme, and EntitySchema)? Please confirm.

Aug 1 2019, 3:13 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata
GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.

@Lea_WMDE Take a look, please: http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/

Aug 1 2019, 12:45 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata

Jul 29 2019

GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.

@Lea_WMDE I am on it.

Jul 29 2019, 11:17 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata

Jul 24 2019

GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.
  • On the vertical axes the dashboard now uses K, M, and B for thousands, millions, and billions of pageviews, respectively.
Jul 24 2019, 3:49 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata
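
The K/M/B axis labelling described above can be sketched as follows. This is an illustrative Python function, not the dashboard's actual R/Shiny code; the name `format_pageviews` and the choice of one decimal place are assumptions.

```python
def format_pageviews(value):
    """Abbreviate a non-negative count with a K, M, or B suffix."""
    scales = ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K"))
    for threshold, suffix in scales:
        if value >= threshold:
            scaled = value / threshold
            # Keep one decimal place, but drop a trailing ".0".
            text = f"{scaled:.1f}".rstrip("0").rstrip(".")
            return text + suffix
    return str(value)


# Hypothetical tick values for a pageviews axis.
ticks = [999, 1_500, 12_000, 2_300_000, 1_000_000_000]
tick_labels = [format_pageviews(t) for t in ticks]
```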
GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.
  • The dashboard is now running a regular daily update;
  • fixing the axis labels now.
Jul 24 2019, 3:44 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata
GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.

@Lea_WMDE I am on it, putting the dashboard on regular updates + fixing the labels to include decimal points.

Jul 24 2019, 10:15 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata

Jul 23 2019

GoranSMilovanovic added a comment to T227905: Public Data Review Needed.

As stated in the README.txt:

Jul 23 2019, 10:37 AM · Analytics, WMDE-Analytics-Engineering

Jul 15 2019

GoranSMilovanovic updated subscribers of T227701: Quantify additional information available via external identifiers.

@Lydia_Pintscher

That's "our" information. And then we have links/external identifiers to say 3 libraries that also have information about X. We want to somehow quantify the latter for all of Wikidata's entities.

Q. Do I understand correctly: you would like to have some sort of comparison (a "ratio" of some form) between (a) knowledge on X in Wikidata and (b) knowledge on X in other (linked from Wikidata) databases?

Jul 15 2019, 11:41 AM · Wikidata
GoranSMilovanovic added a comment to T227701: Quantify additional information available via external identifiers.

We should try to find ways to quantify this information.

Jul 15 2019, 9:52 AM · Wikidata

Jul 14 2019

GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.

@Lea_WMDE You can now test your new dashboard: http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/

Jul 14 2019, 8:41 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata

Jul 12 2019

GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.
  • data set review requested from Analytics in T227905;
  • next steps:
    • visualizations + dashboard;
    • test, deploy.
Jul 12 2019, 6:10 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata
GoranSMilovanovic added a subtask for T208567: Count Wikidata page views per page type: T227905: Public Data Review Needed.
Jul 12 2019, 6:09 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata
GoranSMilovanovic added a parent task for T227905: Public Data Review Needed: T208567: Count Wikidata page views per page type.
Jul 12 2019, 6:09 PM · Analytics, WMDE-Analytics-Engineering
GoranSMilovanovic created T227905: Public Data Review Needed.
Jul 12 2019, 6:08 PM · Analytics, WMDE-Analytics-Engineering

Jul 8 2019

GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.
  • data set production - completed.
Jul 8 2019, 10:11 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata
GoranSMilovanovic added a comment to T208567: Count Wikidata page views per page type.

@Lea_WMDE I guess 640 is the EntitySchema namespace (figured this out from this Gerrit patch, since it is not documented in the Wikidata namespaces), right?

Jul 8 2019, 9:44 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata

Jul 3 2019

GoranSMilovanovic claimed T208567: Count Wikidata page views per page type.
Jul 3 2019, 11:03 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata
GoranSMilovanovic moved T208567: Count Wikidata page views per page type from Technical Wishlist to Current/Deprioritized on the User-GoranSMilovanovic board.
Jul 3 2019, 11:00 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, Wikidata-Termbox, Wikidata
GoranSMilovanovic moved T187393: Wikidata items touched by humans per class from Current/Deprioritized to WDCM on the User-GoranSMilovanovic board.
Jul 3 2019, 10:10 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic closed T217997: WDCM_Process.R/WDCM_UpdateLabs.R Deprecation as Resolved.
  • Deployed everything.
Jul 3 2019, 10:09 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic closed T217997: WDCM_Process.R/WDCM_UpdateLabs.R Deprecation, a subtask of T217994: WDCM Dashboards Maintenance, as Resolved.
Jul 3 2019, 10:09 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic moved T209055: WMDE Banner Campaigns Dashboard from Current/Deprioritized to New Editors/Campaigns on the User-GoranSMilovanovic board.
Jul 3 2019, 6:20 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic

Jul 1 2019

GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

@Christine_Domgoergen_WMDE Then please close this ticket and open a new one when the campaign is ready. It will take me no time to set up the analytics for the new campaign if only parameters like landing pages are different. Thank you.

Jul 1 2019, 2:51 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

@Christine_Domgoergen_WMDE Will we be using exactly the same parameters as in this campaign, or are we talking about a new, different campaign altogether?

Jul 1 2019, 2:45 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic moved T222066: Create daily tracking reports for republica project from Prioritized to Current/Deprioritized on the User-GoranSMilovanovic board.
Jul 1 2019, 11:18 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

2019/06/21: no pageviews, no user registrations.
2019/06/22: no pageviews, no user registrations.
2019/06/23: no pageviews, no user registrations.
2019/06/24: no pageviews, no user registrations.
2019/06/25: no pageviews, no user registrations.
2019/06/26: no pageviews, no user registrations.
2019/06/27: no pageviews, no user registrations.

Jul 1 2019, 11:18 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic added a comment to T226000: Disable and re-enable 2FA for user GoranSMilovanovic on Wikitech/Horizon.

@bd808

Jul 1 2019, 11:12 AM · cloud-services-team (Kanban)

Jun 21 2019

GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

2019/06/14: pageviews: staa = 2, strz = 1 , user registrations: 0.
2019/06/15: pageviews: strz = 1 , user registrations: 0.
2019/06/16: pageviews: none, user registrations: 0.
2019/06/17: pageviews: none, user registrations: 0.
2019/06/18: pageviews: none, user registrations: 0.
2019/06/19: pageviews: strz = 1, user registrations: 0.
2019/06/19: pageviews: strz = 1, user registrations: 0.

Jun 21 2019, 12:04 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)

Jun 18 2019

GoranSMilovanovic created T226000: Disable and re-enable 2FA for user GoranSMilovanovic on Wikitech/Horizon.
Jun 18 2019, 10:14 AM · cloud-services-team (Kanban)

Jun 17 2019

GoranSMilovanovic added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

@Lea_WMDE Do we have any additional requirements here or shall we resolve the ticket?

Jun 17 2019, 9:55 AM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
GoranSMilovanovic moved T222066: Create daily tracking reports for republica project from Current/Deprioritized to Prioritized on the User-GoranSMilovanovic board.
Jun 17 2019, 9:54 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic moved T195702: track quality of all/top 10000 Wikidata items over time from Current/Deprioritized to Prioritized on the User-GoranSMilovanovic board.
Jun 17 2019, 9:46 AM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata, Scoring-platform-team, ORES
GoranSMilovanovic added a comment to T217997: WDCM_Process.R/WDCM_UpdateLabs.R Deprecation.
  • The only two dashboards that now depend upon WDCM_updateLabs.R are
Jun 17 2019, 9:20 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic closed T217996: Refactor WDCM Biases Dashboard updates from WDCM_Process.R/WDCM_UpdateLabs.R as Resolved.
  • Refactored.
Jun 17 2019, 9:13 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic closed T217996: Refactor WDCM Biases Dashboard updates from WDCM_Process.R/WDCM_UpdateLabs.R, a subtask of T217994: WDCM Dashboards Maintenance, as Resolved.
Jun 17 2019, 9:13 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic closed T214586: Remove goransm.wdcm_maintable from the Data Lake as Resolved.
  • wdcm_maintable removed from hdfs;
  • all WDCM dashboards now running Apache Spark supported update engines;
  • resolved.
Jun 17 2019, 9:12 AM · User-GoranSMilovanovic
GoranSMilovanovic added a comment to T217994: WDCM Dashboards Maintenance.

The WDCM Biases dashboard is now back on updates:

  • it will be updated once a month, and
  • its update depends on the most recent version of the WD dump copy in hdfs (see T209655);
  • the dashboard is now fully client-side dependent,
  • and because it now uses Spark as its ETL back-end we can finally remove
  • the huge wdcm_maintable from HDFS that previously supported our ETL procedures for the WDCM system (Hive; see T214586).
Jun 17 2019, 9:04 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic

Jun 13 2019

GoranSMilovanovic added a comment to T217994: WDCM Dashboards Maintenance.
  • Goodbye Advanced Search Extension dashboard :(
Jun 13 2019, 7:22 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic

Jun 12 2019

GoranSMilovanovic updated subscribers of T217994: WDCM Dashboards Maintenance.
  • archive the Advanced Search Extension dashboard (tracking).
Jun 12 2019, 11:38 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic

Jun 7 2019

GoranSMilovanovic added a comment to T209655: Copy Wikidata dumps to HDFs.

@JAllemandou Thanks for the recent 20190603 dump copy in HDFS.

Jun 7 2019, 11:02 PM · Research-Backlog, Wikidata, Analytics
GoranSMilovanovic added a comment to T217994: WDCM Dashboards Maintenance.
  • Pyspark ETL procedures for WDCM Biases completed;
  • next step: re-factor R code to work with Spark outputs.
Jun 7 2019, 10:58 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic updated subscribers of T217994: WDCM Dashboards Maintenance.
Jun 7 2019, 6:58 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T217994: WDCM Dashboards Maintenance.
  • The WDCM Biases Dashboard's back-end has to switch to Apache Spark;
  • it maintains large data sets and depends heavily upon WDQS, sending queries that time out every now and then; beyond that,
  • it relies on geo-coordinates which, when fetched together with items from WDQS, produce long vectors that cannot be converted from raw to character() in R (and which then need to be parsed by {jsonlite} into readable data.frame types for processing).
Jun 7 2019, 6:37 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T217997: WDCM_Process.R/WDCM_UpdateLabs.R Deprecation.
  • Next step: completely remove WDCM_updateLabs.R from the WDCM pipeline.
Jun 7 2019, 5:50 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T217997: WDCM_Process.R/WDCM_UpdateLabs.R Deprecation.
  • WDCM_Process.R is now deprecated.
Jun 7 2019, 5:49 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic closed T217995: Refactor WDCM Geo Dashboard updates from WDCM_Process.R/WDCM_UpdateLabs.R, a subtask of T217994: WDCM Dashboards Maintenance, as Resolved.
Jun 7 2019, 5:47 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic closed T217995: Refactor WDCM Geo Dashboard updates from WDCM_Process.R/WDCM_UpdateLabs.R as Resolved.
  • The dashboard is now fully client-side dependent;
  • update_labs module - hopefully to be fully deprecated soon - updated.
Jun 7 2019, 5:47 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T225280: stat1007 disk space warning.

@elukey I was able to reduce the disk usage in my home directory on stat1007 by approx. 40 GB.
My apologies: there were some old, raw data sets there that I had forgotten to clean up...

Jun 7 2019, 2:05 PM · Analytics-Kanban, Analytics
GoranSMilovanovic added a comment to T225280: stat1007 disk space warning.

@elukey I'm on it.

Jun 7 2019, 1:54 PM · Analytics-Kanban, Analytics

Jun 6 2019

GoranSMilovanovic moved T222066: Create daily tracking reports for republica project from New Editors/Campaigns to Current/Deprioritized on the User-GoranSMilovanovic board.
Jun 6 2019, 12:18 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)

Jun 5 2019

GoranSMilovanovic moved T217996: Refactor WDCM Biases Dashboard updates from WDCM_Process.R/WDCM_UpdateLabs.R from WDCM to Prioritized on the User-GoranSMilovanovic board.
Jun 5 2019, 6:56 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic moved T217997: WDCM_Process.R/WDCM_UpdateLabs.R Deprecation from WDCM to Prioritized on the User-GoranSMilovanovic board.
Jun 5 2019, 6:56 PM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

Of course. Please state exactly (FROM - TO) the dates for the remaining weekly reports. Thanks.

Jun 5 2019, 11:40 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic moved T217995: Refactor WDCM Geo Dashboard updates from WDCM_Process.R/WDCM_UpdateLabs.R from WDCM to Prioritized on the User-GoranSMilovanovic board.
Jun 5 2019, 11:00 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T214586: Remove goransm.wdcm_maintable from the Data Lake.
  • The WDCM Biases Dashboard is the only remaining dashboard whose back-end relies on this Hive table.
  • As soon as the changes are implemented there, the table will be removed from hdfs.
Jun 5 2019, 11:00 AM · User-GoranSMilovanovic
GoranSMilovanovic moved T214586: Remove goransm.wdcm_maintable from the Data Lake from Current/Deprioritized to Prioritized on the User-GoranSMilovanovic board.
Jun 5 2019, 10:59 AM · User-GoranSMilovanovic
GoranSMilovanovic added a comment to T222066: Create daily tracking reports for republica project.

@Christine_Domgoergen_WMDE Can we close this ticket now?

Jun 5 2019, 10:59 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic, WMDE-New-Editors-Banner-Campaigns (Reportings)
GoranSMilovanovic closed T203366: Replace {maptpx} for Topic Modeling in WDCM as Resolved.
Jun 5 2019, 10:56 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T203366: Replace {maptpx} for Topic Modeling in WDCM.
  • All changes implemented: {text2vec} procedures replacing {maptpx};
  • Deploying now; the test run was successful so the dashboards are already updated from the new ML back-end;
  • first run in production scheduled for tomorrow, June 7/2019;
  • behavior: too many topics in the optimal model from perplexity based model-selection;
  • action: we will switch the WDCM model selection procedures from perplexity based to coherence based (as in WDCM Sitelinks and Titles).
Jun 5 2019, 10:56 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic

May 31 2019

GoranSMilovanovic added a comment to T217994: WDCM Dashboards Maintenance.
  • WDCM Geo is back on updates, re-designed, fully client-side dependent, and does not rely on the wdcm_maintable anymore (see T214586):

http://wmdeanalytics.wmflabs.org/WDCM_GeoDashboard/

May 31 2019, 11:12 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic

May 30 2019

GoranSMilovanovic added a comment to T214586: Remove goransm.wdcm_maintable from the Data Lake.
  • WDCM Geo Dashboard does not depend upon this table anymore.
May 30 2019, 8:28 AM · User-GoranSMilovanovic

May 27 2019

GoranSMilovanovic lowered the priority of T217994: WDCM Dashboards Maintenance from High to Normal.
May 27 2019, 11:10 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T217994: WDCM Dashboards Maintenance.
May 27 2019, 9:39 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T217994: WDCM Dashboards Maintenance.

@Lydia_Pintscher I got it:

May 27 2019, 9:38 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic added a comment to T217994: WDCM Dashboards Maintenance.
  • Correction. It is not the Apache Sqoop procedure in WDCM_Sqoop_Clients.R that fails (luckily!);
  • explanation: I stumbled upon an unfinished log file when I was checking this... and some databases, naturally, were missing - because the procedure had not completed all the passes yet.
  • Continue with: inspect why the usage number reported on WD_percentUsageDashboard 'oscillates'.
May 27 2019, 9:30 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic
GoranSMilovanovic raised the priority of T217994: WDCM Dashboards Maintenance from Low to High.
  • the WDCM_Sqoop_Clients.R procedure fails for some databases;
  • however, this seems to be happening only on occasion;
  • inspect and solve.
May 27 2019, 8:07 AM · WMDE-Analytics-Engineering, User-GoranSMilovanovic