Yesterday

JAllemandou moved T230312: Add literal transcoding to media file properties UDF from Ready to Deploy to Done on the Analytics-Kanban board.
Thu, Aug 22, 4:05 PM · Patch-For-Review, Analytics-Kanban, StructuredDataOnCommons, Analytics, Tool-Pageviews
JAllemandou moved T227896: Make oozie swift upload emit event to Kafka about swift object upload complete from Ready to Deploy to Done on the Analytics-Kanban board.
Thu, Aug 22, 12:01 PM · Patch-For-Review, Analytics-Kanban, Research-Backlog, Operations, Discovery, Analytics
JAllemandou moved T228291: Refine should accept principal name for hive2 jdbc connection for DDL from Ready to Deploy to Done on the Analytics-Kanban board.
Thu, Aug 22, 12:01 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou created T231002: Refactor quenename into HQL hive2 action oozie jobs.
Thu, Aug 22, 11:40 AM · Patch-For-Review, Analytics

Wed, Aug 21

JAllemandou updated subscribers of T230853: Turnilo : Issue with Is Deleted and Is Reverted dimensions when added as Split.

Hi @Mayakp.wiki - Thanks for reporting!
I can't say whether it's related to the new Turnilo version Luca just deployed (thanks @elukey :) but it seems fixed for me.
I assume the issue was a rounding problem with different scales: with 1 decimal, 12.4k appears as 0.0m.
Can you double check now?
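
To illustrate the rounding effect described above, here is a minimal sketch. It is not Turnilo's formatting code; the helper name `format_with_unit` and its parameters are made up for the example. It only shows how the same value looks at two scales when rendered with 1 decimal.

```python
def format_with_unit(value, divisor, suffix, decimals=1):
    """Scale value by divisor and render it with a fixed number of decimals."""
    return f"{value / divisor:.{decimals}f}{suffix}"

value = 12_400
print(format_with_unit(value, 1_000, "k"))      # thousands scale: 12.4k
print(format_with_unit(value, 1_000_000, "m"))  # millions scale: 0.0m
```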

Wed, Aug 21, 3:40 PM · Analytics

Tue, Aug 20

JAllemandou closed T229143: Access to HUE for Mayakpwiki as Resolved.
Tue, Aug 20, 7:38 PM · Operations, Analytics
JAllemandou added a comment to T229143: Access to HUE for Mayakpwiki.

Action has been taken that should have granted access to shell username Mayakpwiki.
@Mayakp.wiki can you test please? :)

Tue, Aug 20, 7:01 PM · Operations, Analytics

Jul 3 2019

JAllemandou added a comment to T220507: Decide: start_timestamp for mediawiki history.

Adding a comment:

Jul 3 2019, 11:41 AM · Analytics-Kanban, Analytics

Jul 1 2019

JAllemandou moved T221825: Mediawiki-history release - Snapshot 2019-06 from In Code Review to Done on the Analytics-Kanban board.
Jul 1 2019, 8:40 PM · Analytics-Kanban, Analytics
JAllemandou moved T220507: Decide: start_timestamp for mediawiki history from In Code Review to Done on the Analytics-Kanban board.
Jul 1 2019, 8:40 PM · Analytics-Kanban, Analytics
JAllemandou moved T221338: Many revision events in mediawiki_history have missing page and namespace information from In Code Review to Done on the Analytics-Kanban board.
Jul 1 2019, 8:40 PM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Product-Analytics
JAllemandou moved T190434: Issues with page deleted dates on data lake from In Code Review to Done on the Analytics-Kanban board.
Jul 1 2019, 8:40 PM · Patch-For-Review, Analytics, Analytics-Kanban
JAllemandou moved T214490: page_creation_timestamp not always correct in mediawiki_history from In Code Review to Done on the Analytics-Kanban board.
Jul 1 2019, 8:40 PM · Analytics-Kanban, Product-Analytics, Analytics-Data-Quality, Analytics
JAllemandou moved T225247: Update bot user check in mediawiki-user-history-checker to use historical bot values from Ready to Deploy to Done on the Analytics-Kanban board.
Jul 1 2019, 8:40 PM · Analytics-Kanban, Analytics
JAllemandou moved T205594: mediawiki_history missing page events from Ready to Deploy to Done on the Analytics-Kanban board.
Jul 1 2019, 8:40 PM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Contributors-Analysis, Product-Analytics
JAllemandou renamed T221825: Mediawiki-history release - Snapshot 2019-06 from Mediawiki-history release - Snapshot 2019-05 to Mediawiki-history release - Snapshot 2019-06.
Jul 1 2019, 8:39 PM · Analytics-Kanban, Analytics
JAllemandou added a comment to T220507: Decide: start_timestamp for mediawiki history.

We have not implemented the proposal defined here for the page-create event timestamp definition. I'll let @Milimetric explain (either here or in the sync-up meeting, which might be easier face to face).

Jul 1 2019, 8:39 PM · Analytics-Kanban, Analytics
JAllemandou added a comment to T221338: Many revision events in mediawiki_history have missing page and namespace information.

This is solved from snapshot 2019-05 onward thanks to the rebuild of the page-history reconstruction algorithm:

Jul 1 2019, 8:32 PM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Product-Analytics
JAllemandou added a comment to T190434: Issues with page deleted dates on data lake .

Improved greatly by the last page-history reconstruction refactor:

Jul 1 2019, 8:19 PM · Patch-For-Review, Analytics, Analytics-Kanban
JAllemandou added a comment to T214490: page_creation_timestamp not always correct in mediawiki_history.

This is solved in snapshot 2019-05 onward.
Some explanation:

  • The page_first_edit_timestamp is the field containing the interesting value, not page_creation_timestamp, as the latter should reflect the timestamp of the first create event. Most of the time they are equal, but they can differ for pages with complicated histories involving deletes and restores.
  • The page_first_edit_timestamp is not always equal to the timestamp of the revision having parent_page_id = 0, as the dataset also uses archived revisions (therefore the first revision can be an archived one), and because complex histories can also lead to multiple revisions having parent_page_id = 0 in their history.
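
The second point can be sketched on toy data. This is only an illustration under assumptions: the `Revision` class, its field names and the sample history are hypothetical and do not reflect the actual mediawiki_history schema or reconstruction algorithm.

```python
from dataclasses import dataclass

@dataclass
class Revision:
    timestamp: str   # ISO-8601 strings sort chronologically
    parent_id: int   # 0 means the revision has no parent
    archived: bool   # True if the revision comes from the archive (deleted) side

def first_edit_timestamp(revisions):
    """Earliest revision timestamp, archived revisions included."""
    return min(r.timestamp for r in revisions)

def parentless_timestamps(revisions, include_archived=False):
    """Timestamps of revisions with no parent, optionally counting archived ones."""
    return sorted(
        r.timestamp
        for r in revisions
        if r.parent_id == 0 and (include_archived or not r.archived)
    )

# Hypothetical page: created, deleted (its revisions archived), then recreated.
history = [
    Revision("2005-03-01T10:00:00", 0, True),
    Revision("2005-03-02T09:00:00", 101, True),
    Revision("2007-06-15T12:00:00", 0, False),
    Revision("2007-06-16T08:30:00", 201, False),
]

print(first_edit_timestamp(history))                          # the archived first edit
print(parentless_timestamps(history))                         # only the live parentless revision
print(parentless_timestamps(history, include_archived=True))  # two parentless revisions
```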
Jul 1 2019, 7:00 AM · Analytics-Kanban, Product-Analytics, Analytics-Data-Quality, Analytics

Jun 28 2019

JAllemandou moved T215863: Coarse alarm on data quality for refined data based on entrophy calculations from Ready to Deploy to Done on the Analytics-Kanban board.
Jun 28 2019, 7:48 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou moved T225792: Exclude doc.wikimedia.org from pageview definition from Ready to Deploy to Done on the Analytics-Kanban board.
Jun 28 2019, 7:48 PM · Analytics-Kanban, Analytics

Jun 27 2019

JAllemandou moved T205594: mediawiki_history missing page events from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Jun 27 2019, 7:56 AM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Contributors-Analysis, Product-Analytics
JAllemandou added a comment to T205594: mediawiki_history missing page events.

Results confirmed after page-history algorithm refactor. Marking as done :)

Jun 27 2019, 7:56 AM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Contributors-Analysis, Product-Analytics
JAllemandou placed T186559: Provide data dumps in the Analytics Data Lake up for grabs.
Jun 27 2019, 7:46 AM · Analytics
JAllemandou placed T188265: Active Editors metric per project family up for grabs.
Jun 27 2019, 7:44 AM · Analytics, Analytics-Wikistats
JAllemandou placed T204965: Create report for "articles with most contributors" in Wikistats2 up for grabs.
Jun 27 2019, 7:44 AM · Patch-For-Review, Analytics-Wikistats, Analytics
JAllemandou raised the priority of T190434: Issues with page deleted dates on data lake from Normal to High.
Jun 27 2019, 7:43 AM · Patch-For-Review, Analytics, Analytics-Kanban
JAllemandou placed T218824: A few alterblocks events have event_timestamps from before 2001 up for grabs.
Jun 27 2019, 7:43 AM · Analytics, Analytics-Data-Quality, Product-Analytics
JAllemandou raised the priority of T226338: Drop of editor numbers for earlier months from Normal to High.
Jun 27 2019, 7:42 AM · Analytics-Kanban, Analytics
JAllemandou moved T226338: Drop of editor numbers for earlier months from In Progress to Done on the Analytics-Kanban board.
Jun 27 2019, 7:41 AM · Analytics-Kanban, Analytics
JAllemandou removed a project from T188265: Active Editors metric per project family : Analytics-Kanban.
Jun 27 2019, 7:41 AM · Analytics, Analytics-Wikistats
JAllemandou removed the point value for T204965: Create report for "articles with most contributors" in Wikistats2.
Jun 27 2019, 7:40 AM · Patch-For-Review, Analytics-Wikistats, Analytics
JAllemandou added a comment to T221482: Identify imported revisions in mediawiki_history.

Some information in that respect is provided as part of T221825 with the new field page_is_from_before_page_creation. But this is incomplete as it only accounts for pages imported before the page creation, not after.

Jun 27 2019, 7:16 AM · Analytics, Product-Analytics
JAllemandou removed a project from T218824: A few alterblocks events have event_timestamps from before 2001: Analytics-Kanban.
Jun 27 2019, 7:14 AM · Analytics, Analytics-Data-Quality, Product-Analytics
JAllemandou added a comment to T218824: A few alterblocks events have event_timestamps from before 2001.

I haven't had time to fix this with this bunch of changes. Keeping it in the backlog of things to do for mediawiki_history.

Jun 27 2019, 7:14 AM · Analytics, Analytics-Data-Quality, Product-Analytics
JAllemandou added a comment to T211627: Mediawiki history has no data on IP blocks.

Actually I haven't had time to tackle this issue in this round of changes, sorry about that :(
Keeping the task in the backlog of things to do for mediawiki-history.

Jun 27 2019, 7:13 AM · Anti-Harassment, Product-Analytics, Analytics
JAllemandou added a comment to T226338: Drop of editor numbers for earlier months.

Done! Sorry for the delay.

Jun 27 2019, 7:11 AM · Analytics-Kanban, Analytics
JAllemandou moved T225247: Update bot user check in mediawiki-user-history-checker to use historical bot values from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Jun 27 2019, 6:52 AM · Analytics-Kanban, Analytics

Jun 25 2019

Samat awarded T226338: Drop of editor numbers for earlier months a Like token.
Jun 25 2019, 5:27 PM · Analytics-Kanban, Analytics
JAllemandou added a comment to T226338: Drop of editor numbers for earlier months.

Thanks a lot @Samat for the details.
Indeed, you were right: the difference is accounted for by a methodological change. I'm sorry not to have noticed right away.
From the month 2019-05 onward, we changed the way editors are computed by removing the edits on deleted pages.
We did this to be more homogeneous, as other metrics (edits and edited-pages for instance) were already computed with deleted-edits removal.
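
A minimal sketch of why this changes the numbers, using made-up edits. The `count_editors` helper and the `page_is_deleted` key are assumptions for the example, not the production metric code.

```python
def count_editors(edits, exclude_deleted_pages):
    """Count distinct users, optionally ignoring edits made on deleted pages."""
    return len({
        e["user"]
        for e in edits
        if not (exclude_deleted_pages and e["page_is_deleted"])
    })

# Hypothetical edits for one month.
edits = [
    {"user": "A", "page_is_deleted": False},
    {"user": "A", "page_is_deleted": True},
    {"user": "B", "page_is_deleted": True},   # B only edited a deleted page
    {"user": "C", "page_is_deleted": False},
]

print(count_editors(edits, exclude_deleted_pages=False))  # old method: 3
print(count_editors(edits, exclude_deleted_pages=True))   # new method: 2
```

Editors whose only edits are on deleted pages drop out of the count under the new method, which is the kind of decrease discussed here.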

Jun 25 2019, 1:54 PM · Analytics-Kanban, Analytics

Jun 24 2019

JAllemandou added a comment to T226227: Keep webrequest_sampled_128's druid segments for more than a week.

Correct (see https://druid.apache.org/docs/latest/tutorials/tutorial-delete-data.html, paragraph "How to permanently delete data"). We can also use API calls to mark segments as unused if we prefer not to use rules.

Jun 24 2019, 6:52 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou added a comment to T226227: Keep webrequest_sampled_128's druid segments for more than a week.

With a better, more precise explanation:

  • In order for data to be dropped from deep storage, it needs to be unloaded from historical nodes. This can be done in 2 ways: disabling a full datasource, or disabling segments using rules.
  • Once segments are disabled, you can run the kill task to drop them.

Given the need to use rules to disable segments from historical, I'd rather keep the max data in hadoop (no storage issue so far).
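
The two-step order described above (disable first, then kill) can be sketched with a toy in-memory model. This does not call Druid; the `Segment` class and both helpers are hypothetical and only mimic the constraint that a kill run affects disabled segments.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    day: str            # e.g. "2019-06-01"
    used: bool = True   # True while the segment is still enabled

def disable_older_than(segments, cutoff_day):
    """Step 1: mark segments before the cutoff as unused (what a rule would do)."""
    for s in segments:
        if s.day < cutoff_day:
            s.used = False

def kill_unused(segments):
    """Step 2: return what remains in deep storage after dropping unused segments."""
    return [s for s in segments if s.used]

segments = [Segment("2019-04-01"), Segment("2019-05-01"), Segment("2019-06-01")]

# Running the kill step before disabling anything removes nothing.
assert len(kill_unused(segments)) == 3

disable_older_than(segments, "2019-05-01")
remaining = kill_unused(segments)
print([s.day for s in remaining])  # ['2019-05-01', '2019-06-01']
```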

Jun 24 2019, 6:24 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou added a comment to T226227: Keep webrequest_sampled_128's druid segments for more than a week.

@Nuria: We set it up this way on purpose, in order to facilitate loading data in druid in case it is needed (data present in deep storage for 60 days) while still keeping space on druid.
Having agreed we should keep 1 month of data in druid, I still recommend using rules to unload data after 1 month and keep 60 days in deep storage, as 2 months means 2 TB per server in druid, probably too much.

Jun 24 2019, 5:39 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou added a comment to T226338: Drop of editor numbers for earlier months.

Hi @Samat, thanks for reaching out.
It would be interesting if you could upload the files again, and also possibly confirm the URL you downloaded data from, as my tests/checks don't show differences that big.
I have checked the number of user-only editors for huwiki over 4 years, looking for differences in our last 3 snapshots (we call monthly recomputations snapshots), and while there is a very small deletion drift (a difference due to pages being deleted, as they are excluded from statistics computation), it is really not a 5%/10% change, more like -0.05% to -0.10%, and only for the 3 or 4 months before last month.
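
For reference, the percentages above are relative changes between two snapshots. A small sketch with made-up counts (the numbers and the `relative_change_pct` helper are illustrative only, not actual huwiki figures):

```python
def relative_change_pct(old, new):
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100

previous_snapshot = 2000  # made-up editor count in the previous snapshot
latest_snapshot = 1999    # made-up count for the same month in the latest snapshot

print(f"{relative_change_pct(previous_snapshot, latest_snapshot):.2f}%")  # -0.05%
```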

Jun 24 2019, 7:57 AM · Analytics-Kanban, Analytics

Jun 21 2019

JAllemandou moved T225247: Update bot user check in mediawiki-user-history-checker to use historical bot values from Next Up to In Code Review on the Analytics-Kanban board.
Jun 21 2019, 8:21 AM · Analytics-Kanban, Analytics
JAllemandou set the point value for T225247: Update bot user check in mediawiki-user-history-checker to use historical bot values to 3.
Jun 21 2019, 8:21 AM · Analytics-Kanban, Analytics
JAllemandou moved T205594: mediawiki_history missing page events from In Progress to In Code Review on the Analytics-Kanban board.
Jun 21 2019, 8:19 AM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Contributors-Analysis, Product-Analytics
JAllemandou moved T214490: page_creation_timestamp not always correct in mediawiki_history from In Progress to In Code Review on the Analytics-Kanban board.
Jun 21 2019, 8:19 AM · Analytics-Kanban, Product-Analytics, Analytics-Data-Quality, Analytics
JAllemandou moved T190434: Issues with page deleted dates on data lake from In Progress to In Code Review on the Analytics-Kanban board.
Jun 21 2019, 8:19 AM · Patch-For-Review, Analytics, Analytics-Kanban
JAllemandou moved T221338: Many revision events in mediawiki_history have missing page and namespace information from In Progress to In Code Review on the Analytics-Kanban board.
Jun 21 2019, 8:18 AM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Product-Analytics
JAllemandou moved T221825: Mediawiki-history release - Snapshot 2019-06 from In Progress to In Code Review on the Analytics-Kanban board.
Jun 21 2019, 8:18 AM · Analytics-Kanban, Analytics
JAllemandou moved T220507: Decide: start_timestamp for mediawiki history from In Progress to In Code Review on the Analytics-Kanban board.
Jun 21 2019, 8:18 AM · Analytics-Kanban, Analytics

Jun 18 2019

JAllemandou moved T225178: New directories created under /wmf/data/event_sanitized and /wmf/data/event_sanitized are owned by yarn:analytics from Ready to Deploy to Done on the Analytics-Kanban board.
Jun 18 2019, 4:03 PM · Analytics-Kanban, Patch-For-Review, Analytics

Jun 17 2019

JAllemandou added a comment to T225786: Increased number of webrequest sequence-numbers alarms (mostly) on upload webrequest-source.

We can easily get data for older days if needed (we don't drop statistic-data).

Jun 17 2019, 11:36 AM · Traffic, Operations, Analytics

Jun 14 2019

JAllemandou added a comment to T225538: Request for a large request data set for caching research and tuning.

Hi @Nuria - Can you confirm the above request is correct for generating the data?

Jun 14 2019, 12:47 PM · Analytics
JAllemandou updated the task description for T225786: Increased number of webrequest sequence-numbers alarms (mostly) on upload webrequest-source.
Jun 14 2019, 10:59 AM · Traffic, Operations, Analytics
JAllemandou renamed T225786: Increased number of webrequest sequence-numbers alarms (mostly) on upload webrequest-source from Investigate varnish behavior change since new ATS-change in upload to Investigate varnish behavior change since new ATS-change in webrequest upload.
Jun 14 2019, 10:58 AM · Traffic, Operations, Analytics
Restricted Application added a project to T225786: Increased number of webrequest sequence-numbers alarms (mostly) on upload webrequest-source: Operations.
Jun 14 2019, 8:53 AM · Traffic, Operations, Analytics

Jun 13 2019

JAllemandou moved T225342: Empty hostnames trigger Refine eventlogging failures from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Jun 13 2019, 4:19 PM · Analytics-Kanban, Patch-For-Review, Analytics

Jun 11 2019

JAllemandou added a comment to P8605 SWAP incorrect AQS results.

I found a workaround URL:

"http://aqs1004.eqiad.wmnet:7232/analytics.wikimedia.org/v1/edited-pages/new/all-projects/all-editor-types/content/monthly/20190501/20190601"
Jun 11 2019, 6:40 PM
JAllemandou renamed T225247: Update bot user check in mediawiki-user-history-checker to use historical bot values from Remove bot user check from userHistory in mediawiki-history-checker to Update bot user check in mediawiki-user-history-checker to use historical bot values.
Jun 11 2019, 12:32 PM · Analytics-Kanban, Analytics

Jun 10 2019

Groceryheist awarded T186559: Provide data dumps in the Analytics Data Lake a Love token.
Jun 10 2019, 7:52 PM · Analytics
JAllemandou added a comment to T221338: Many revision events in mediawiki_history have missing page and namespace information.

Thanks for offering @Neil_P._Quinn_WMF :)
I'm still working on changing the algorithm, so no need from you as of now.
I'll let you know once I have a test dataset.

Jun 10 2019, 4:53 PM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Product-Analytics

Jun 8 2019

JAllemandou created T225343: Refine failure alert seems broken - No alert email sent while jobs were failing.
Jun 8 2019, 8:22 AM · Analytics-Kanban, Analytics
JAllemandou moved T225342: Empty hostnames trigger Refine eventlogging failures from Next Up to In Code Review on the Analytics-Kanban board.
Jun 8 2019, 8:14 AM · Analytics-Kanban, Patch-For-Review, Analytics
JAllemandou claimed T225342: Empty hostnames trigger Refine eventlogging failures.
Jun 8 2019, 8:13 AM · Analytics-Kanban, Patch-For-Review, Analytics
JAllemandou added a comment to T225342: Empty hostnames trigger Refine eventlogging failures.

Issue pinpointed in the new TransformFunction applied to drop non-mediawiki data: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/TransformFunctions.scala#L105

Jun 8 2019, 7:58 AM · Analytics-Kanban, Patch-For-Review, Analytics
JAllemandou added a comment to T209655: Copy Wikidata dumps to HDFs.

@GoranSMilovanovic : You're welcome :) At some point I'll manage to have that productionized ;)

Jun 8 2019, 7:21 AM · Research-Backlog, Wikidata, Analytics

Jun 7 2019

JAllemandou added a comment to T224957: Pyspark shell shut down automatically.

The Spark driver is not launched from the notebook but from the kernel, and its configuration is not updatable on the fly, so I'm not surprised it doesn't work.
The solution is to bump driver-memory at the kernel level (see my ping to Andrew and Luca in the previous comment).
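
As a rough sketch of what setting driver memory at the kernel level can look like, the snippet below builds a Jupyter kernel spec that passes `--driver-memory` through the `PYSPARK_SUBMIT_ARGS` environment variable. How the SWAP kernels are actually defined is an assumption here; the function name, display name and the 4g value are illustrative only.

```python
import json

def pyspark_kernel_spec(driver_memory):
    """Build a kernel spec dict whose env fixes driver memory before the JVM starts."""
    return {
        "display_name": f"PySpark (driver {driver_memory})",
        "language": "python",
        "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "env": {
            # Read when PySpark launches the JVM, so it must be set before the kernel starts.
            "PYSPARK_SUBMIT_ARGS": f"--driver-memory {driver_memory} pyspark-shell",
        },
    }

print(json.dumps(pyspark_kernel_spec("4g"), indent=2))
```

The driver JVM is sized when it starts, which is why a change made from inside an already running notebook has no effect.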

Jun 7 2019, 5:46 PM · Analytics
JAllemandou added a comment to T224957: Pyspark shell shut down automatically.

I have reproduced the error. The problem comes from driver-memory I think. I have been able to make the computation succeed for 1 day in python-notebook, and for 1 month in CLI with higher driver memory.

Jun 7 2019, 11:41 AM · Analytics
JAllemandou moved T225178: New directories created under /wmf/data/event_sanitized and /wmf/data/event_sanitized are owned by yarn:analytics from Next Up to In Code Review on the Analytics-Kanban board.
Jun 7 2019, 10:58 AM · Analytics-Kanban, Patch-For-Review, Analytics
JAllemandou added a project to T225178: New directories created under /wmf/data/event_sanitized and /wmf/data/event_sanitized are owned by yarn:analytics: Analytics-Kanban.
Jun 7 2019, 10:57 AM · Analytics-Kanban, Patch-For-Review, Analytics
JAllemandou claimed T225178: New directories created under /wmf/data/event_sanitized and /wmf/data/event_sanitized are owned by yarn:analytics.
Jun 7 2019, 10:57 AM · Analytics-Kanban, Patch-For-Review, Analytics
JAllemandou added a comment to T225178: New directories created under /wmf/data/event_sanitized and /wmf/data/event_sanitized are owned by yarn:analytics.

Issue found by manual test of DataFrameToHive (I added logging and created a small class using DataFrameToHive to test) on that line: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-spark/src/main/scala/org/wikimedia/analytics/refinery/spark/connectors/DataFrameToHive.scala#L234

Jun 7 2019, 10:57 AM · Analytics-Kanban, Patch-For-Review, Analytics

Jun 6 2019

JAllemandou claimed T224957: Pyspark shell shut down automatically.
Jun 6 2019, 5:56 PM · Analytics
JAllemandou created T225232: Backfill EL new schemas sanitization after ownership issue fixed.
Jun 6 2019, 4:44 PM · Analytics-Kanban, Analytics

May 23 2019

JAllemandou created T224221: Contributor ID field has empty instances in 2019-05-01 dumps (was 0 in previous month).
May 23 2019, 1:09 PM · MW-1.34-notes (1.34.0-wmf.8; 2019-06-04), Dumps-Generation

May 21 2019

JAllemandou updated subscribers of T223929: Wikistats Bug: Top editors counts and time selection are not displayed correctly.

Following your path, I confirm I have the same problem you do.
Thanks a lot for reporting @Formatierer!

May 21 2019, 10:11 AM · Analytics-Kanban, Analytics, Analytics-Wikistats

May 20 2019

JAllemandou added a comment to T223929: Wikistats Bug: Top editors counts and time selection are not displayed correctly.

Hi @Formatierer - While I definitely see the snapshot, I can't reproduce on wikistats :(

May 20 2019, 6:49 PM · Analytics-Kanban, Analytics, Analytics-Wikistats

May 17 2019

JAllemandou added a comment to T222254: Pyspark on SWAP: Py4JJavaError: Import Error: no module named pyarrow.

NO WAY !!!! I'm super sorry for having derailed that :(

May 17 2019, 6:48 PM · Analytics, Analytics-Cluster
JAllemandou moved T222603: Fix oozie banner_impression monthly job from Ready to Deploy to Done on the Analytics-Kanban board.
May 17 2019, 6:06 PM · Analytics-Kanban, Analytics
JAllemandou claimed T223653: Fix mediawiki_wikitext_history SLA.
May 17 2019, 6:06 PM · Analytics-Kanban, Analytics
JAllemandou set the point value for T223653: Fix mediawiki_wikitext_history SLA to 1.
May 17 2019, 6:06 PM · Analytics-Kanban, Analytics
JAllemandou moved T223653: Fix mediawiki_wikitext_history SLA from Next Up to In Code Review on the Analytics-Kanban board.
May 17 2019, 6:06 PM · Analytics-Kanban, Analytics
JAllemandou created T223653: Fix mediawiki_wikitext_history SLA.
May 17 2019, 6:05 PM · Analytics-Kanban, Analytics

May 16 2019

JAllemandou added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

A lot trickier :)
We have the wmf_raw.mediawiki_private_cu_changes table in hive, allowing us to compute geo-editors (editors-by-country, aggregated). This table only contains 3 months of data for PII removal reasons. It's probably not enough for what you're after, but I have nothing better (see https://github.com/wikimedia/analytics-refinery/blob/master/oozie/mediawiki/geoeditors/monthly/insert_geoeditors_monthly_data.hql for an example).
I've just created T223444 to submit the general idea of having geo-editors stats split by desktop/mobile.

May 16 2019, 1:09 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering
JAllemandou created T223444: Update geo-editors job to use tags and report desktop/mobile edits.
May 16 2019, 1:09 PM · Product-Analytics, Analytics
JAllemandou updated the task description for T218819: Investigate discrepancies in editor metrics between Data Lake and MediaWiki replica pipelines .
May 16 2019, 7:48 AM · Product-Analytics
JAllemandou added a parent task for T220456: Many small wikis missing from mediawiki_history dataset: T221825: Mediawiki-history release - Snapshot 2019-06.
May 16 2019, 7:47 AM · Patch-For-Review, Analytics-Kanban, Analytics-Data-Quality, Analytics, Product-Analytics
JAllemandou added a subtask for T221825: Mediawiki-history release - Snapshot 2019-06: T220456: Many small wikis missing from mediawiki_history dataset.
May 16 2019, 7:47 AM · Analytics-Kanban, Analytics
JAllemandou added a comment to T221824: Mediawiki History Release - 2019-04 snapshot.

Ping @JAllemandou the tasks not closed on 2019-04 snapshot should probably be moved to 2019-05 snapshot cc @fdans

May 16 2019, 7:46 AM · Patch-For-Review, Product-Analytics, Analytics-Kanban, Analytics
JAllemandou removed a parent task for T220456: Many small wikis missing from mediawiki_history dataset: T221824: Mediawiki History Release - 2019-04 snapshot.
May 16 2019, 7:46 AM · Patch-For-Review, Analytics-Kanban, Analytics-Data-Quality, Analytics, Product-Analytics
JAllemandou removed a subtask for T221824: Mediawiki History Release - 2019-04 snapshot: T220456: Many small wikis missing from mediawiki_history dataset.
May 16 2019, 7:46 AM · Patch-For-Review, Product-Analytics, Analytics-Kanban, Analytics

May 14 2019

JAllemandou added a comment to T220977: Investigate surprising rise in mobile page views for wikidata.

Hi @Lea_WMDE and @GoranSMilovanovic - I think your problem is solved in this month's snapshot with the revision_tags field of mediawiki_history:

May 14 2019, 4:03 PM · User-GoranSMilovanovic, Wikidata, WMDE-Analytics-Engineering

May 13 2019

JAllemandou moved T220111: Refactor druid data deletion script from In Code Review to Ready to Deploy on the Analytics-Kanban board.
May 13 2019, 2:54 PM · Analytics-Kanban, Analytics
JAllemandou moved T222603: Fix oozie banner_impression monthly job from Done to Ready to Deploy on the Analytics-Kanban board.
May 13 2019, 9:22 AM · Analytics-Kanban, Analytics

May 7 2019

JAllemandou moved T220507: Decide: start_timestamp for mediawiki history from Ready to Deploy to In Progress on the Analytics-Kanban board.
May 7 2019, 5:33 PM · Analytics-Kanban, Analytics
JAllemandou updated the task description for T222425: Fix jobs after mediawiki-history refactor.
May 7 2019, 5:33 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou moved T213770: Remove Zero support in analytics from In Code Review to Ready to Deploy on the Analytics-Kanban board.
May 7 2019, 5:28 PM · Patch-For-Review, Analytics-Kanban, Technical-Debt, Analytics