JAllemandou (joal)
Data Engineer

User Details

User Since
Feb 11 2015, 6:02 PM (174 w, 4 d)
Availability
Available
IRC Nick
joal
LDAP User
Unknown
MediaWiki User
JAllemandou (WMF)

Recent Activity

Today

JAllemandou moved T197281: Fix failing webrequest hours (upload and text 2018-06-14-11) from In Progress to In Code Review on the Analytics-Kanban board.
Mon, Jun 18, 7:47 AM · Patch-For-Review, Analytics-Kanban

Sat, Jun 16

JAllemandou added a comment to T197281: Fix failing webrequest hours (upload and text 2018-06-14-11).

I did another quick check this morning: there are some valid user-agent strings longer than 512 characters in our faulty hour (9 out of 64). The other 55 are identical, each of length 2035.
I also successfully parsed the user-agents of the faulty hour with a length limit of 1024, and double-checked, over another full day of raw webrequest data, how many user-agents would not have been parsed under various limits:

  • Total number of rows for that day: 3626986512
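As an illustration of the count described above, here is a minimal Python sketch. For each candidate length limit, it counts how many user-agent strings are longer than that limit, which are the ones a length-capped parser would skip. The function name `count_over_limits` and the sample inputs are hypothetical. The actual check ran over raw webrequest data, and its per-limit results are not reproduced here.

```python
from typing import Dict, Iterable, Sequence


def count_over_limits(user_agents: Iterable[str], limits: Sequence[int]) -> Dict[int, int]:
    """For each length limit, count user-agent strings strictly longer than it.

    Hypothetical helper: strings over a limit are the ones a parser capped
    at that limit would refuse to parse.
    """
    # Compute lengths once so the input may be a one-shot iterator.
    lengths = [len(ua) for ua in user_agents]
    return {limit: sum(1 for n in lengths if n > limit) for limit in limits}


if __name__ == "__main__":
    # Made-up sample: one short UA, one of length 600, two identical long ones.
    sample = ["a" * 100, "b" * 600, "c" * 2035, "c" * 2035]
    print(count_over_limits(sample, [512, 1024, 2048]))  # prints {512: 3, 1024: 2, 2048: 0}
```

On real data, the same per-limit counts would be computed with a distributed job rather than in memory.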
Sat, Jun 16, 11:31 AM · Patch-For-Review, Analytics-Kanban

Fri, Jun 15

JAllemandou added a comment to T197281: Fix failing webrequest hours (upload and text 2018-06-14-11).

One problem is related to user-agent parsing for very long strings:

sudo -u hdfs spark2-shell --master yarn --conf spark.dynamicAllocation.maxExecutors=256 --jars /srv/deployment/analytics/refinery/artifacts/refinery-job.jar
Fri, Jun 15, 2:00 PM · Patch-For-Review, Analytics-Kanban
JAllemandou moved T197281: Fix failing webrequest hours (upload and text 2018-06-14-11) from Next Up to In Progress on the Analytics-Kanban board.
Fri, Jun 15, 7:23 AM · Patch-For-Review, Analytics-Kanban
JAllemandou added a project to T197281: Fix failing webrequest hours (upload and text 2018-06-14-11): Analytics-Kanban.
Fri, Jun 15, 7:23 AM · Patch-For-Review, Analytics-Kanban
JAllemandou claimed T197281: Fix failing webrequest hours (upload and text 2018-06-14-11).
Fri, Jun 15, 7:22 AM · Patch-For-Review, Analytics-Kanban
JAllemandou created T197281: Fix failing webrequest hours (upload and text 2018-06-14-11).
Fri, Jun 15, 7:21 AM · Patch-For-Review, Analytics-Kanban

Thu, Jun 14

JAllemandou moved T192483: Add data-quality check on mediawiki-history-reduced before druid indexation from Next Up to In Progress on the Analytics-Kanban board.
Thu, Jun 14, 6:10 PM · Analytics, Analytics-Kanban, Analytics-Wikistats
JAllemandou moved T196912: Scoop jars , automate generation at the beginning of job from In Progress to In Code Review on the Analytics-Kanban board.
Thu, Jun 14, 6:05 PM · Patch-For-Review, Analytics, Analytics-Kanban
JAllemandou claimed T196912: Scoop jars , automate generation at the beginning of job.
Thu, Jun 14, 4:35 PM · Patch-For-Review, Analytics, Analytics-Kanban
JAllemandou moved T196912: Scoop jars , automate generation at the beginning of job from Next Up to In Progress on the Analytics-Kanban board.
Thu, Jun 14, 4:34 PM · Patch-For-Review, Analytics, Analytics-Kanban
JAllemandou moved T196737: Fix issue with prod/labs jars for sqoop from In Progress to Done on the Analytics-Kanban board.
Thu, Jun 14, 4:34 PM · Analytics-Kanban
JAllemandou changed the point value for T196737: Fix issue with prod/labs jars for sqoop from 5 to 3.
Thu, Jun 14, 4:34 PM · Analytics-Kanban

Tue, Jun 12

JAllemandou moved T192481: Add Mediawiki-History data-quality check stage in oozie using statistics from In Progress to In Code Review on the Analytics-Kanban board.
Tue, Jun 12, 2:58 PM · Patch-For-Review, Analytics, Analytics-Kanban, Analytics-Wikistats

Fri, Jun 8

JAllemandou claimed T196737: Fix issue with prod/labs jars for sqoop.
Fri, Jun 8, 11:54 AM · Analytics-Kanban
JAllemandou moved T196737: Fix issue with prod/labs jars for sqoop from Next Up to In Progress on the Analytics-Kanban board.
Fri, Jun 8, 11:54 AM · Analytics-Kanban
JAllemandou created T196737: Fix issue with prod/labs jars for sqoop.
Fri, Jun 8, 11:54 AM · Analytics-Kanban

Tue, Jun 5

JAllemandou moved T192481: Add Mediawiki-History data-quality check stage in oozie using statistics from Ready to Deploy to In Progress on the Analytics-Kanban board.
Tue, Jun 5, 3:02 PM · Patch-For-Review, Analytics, Analytics-Kanban, Analytics-Wikistats
JAllemandou moved T195882: Update oozie druid loading job to facilitate test indexation and prevent prod indexation by mistake from In Progress to Done on the Analytics-Kanban board.
Tue, Jun 5, 3:01 PM · Analytics-Kanban

Mon, Jun 4

JAllemandou updated the task description for T176815: Investigate the full-text search pattern on mobile web.
Mon, Jun 4, 7:28 AM · Product-Analytics, Discovery-Analysis (Current work)
JAllemandou added a comment to T196318: Error when accessing webrequest on hue.wikimedia.org.

For Hive to read JSON files with one record per line, the hcatalog jar must be explicitly added in the session (see https://github.com/wikimedia/analytics-refinery/blob/master/hive/webrequest/create_webrequest_raw_table.hql#L19). I assume Hue doesn't do this by default. Let's keep this ticket open to see whether we can do anything about it.
On a related but different matter: wmf_raw shouldn't be used by regular users. wmf is preferred, as its data is stored as Parquet, among other things.

Mon, Jun 4, 6:48 AM · Analytics

Tue, May 29

JAllemandou added a comment to T195837: Requesting access to analytics-privatedata-users for gilles.

@Gilles: Feel free to ping me once you're in if you want some help with the data or how to work with it.

Tue, May 29, 6:33 PM · Performance-Team (Radar), Patch-For-Review, Analytics, Operations, SRE-Access-Requests
JAllemandou moved T195882: Update oozie druid loading job to facilitate test indexation and prevent prod indexation by mistake from Next Up to In Progress on the Analytics-Kanban board.
Tue, May 29, 5:44 PM · Analytics-Kanban
JAllemandou claimed T195882: Update oozie druid loading job to facilitate test indexation and prevent prod indexation by mistake.
Tue, May 29, 4:18 PM · Analytics-Kanban
JAllemandou created T195882: Update oozie druid loading job to facilitate test indexation and prevent prod indexation by mistake.
Tue, May 29, 4:17 PM · Analytics-Kanban

Tue, May 22

JAllemandou moved T192481: Add Mediawiki-History data-quality check stage in oozie using statistics from Next Up to In Progress on the Analytics-Kanban board.
Tue, May 22, 12:04 PM · Patch-For-Review, Analytics, Analytics-Kanban, Analytics-Wikistats
JAllemandou added a project to T194741: Productionize monthly article quality prediction datasets: Analytics.
Tue, May 22, 12:01 PM · Analytics, Scoring-platform-team, artificial-intelligence, draftquality-modeling
JAllemandou added a comment to T194741: Productionize monthly article quality prediction datasets.

Adding analytics tag :)

Tue, May 22, 12:01 PM · Analytics, Scoring-platform-team, artificial-intelligence, draftquality-modeling

May 17 2018

JAllemandou moved T192464: Update ua-parser package. Both uap-java and uap-core from In Code Review to Ready to Deploy on the Analytics-Kanban board.
May 17 2018, 4:11 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou moved T192529: Update version of ua-parser in eventlogging from Ready to Deploy to In Code Review on the Analytics-Kanban board.
May 17 2018, 4:11 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou moved T192529: Update version of ua-parser in eventlogging from In Code Review to Ready to Deploy on the Analytics-Kanban board.
May 17 2018, 4:11 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou moved T193387: Add druid datasources as configuration parameter in AQS from In Code Review to Done on the Analytics-Kanban board.
May 17 2018, 4:09 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
JAllemandou moved T194427: Deploy Turnilo (possible pivot replacement) from Ready to Deploy to Done on the Analytics-Kanban board.
May 17 2018, 4:07 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou moved T194309: Add nyc.wikimedia to pageviews whitelist from Ready to Deploy to Done on the Analytics-Kanban board.
May 17 2018, 4:03 PM · Analytics-Kanban, Patch-For-Review, Analytics, Pageviews-API

May 15 2018

JAllemandou moved T193388: Index by-snapshot mediawiki-history-reduced in druid from In Code Review to Done on the Analytics-Kanban board.
May 15 2018, 10:38 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
JAllemandou moved T192482: Make mediawiki-history-reduced table permanent (snapshot partitioning) from In Code Review to Done on the Analytics-Kanban board.
May 15 2018, 10:38 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

May 10 2018

JAllemandou moved T194309: Add nyc.wikimedia to pageviews whitelist from In Code Review to Ready to Deploy on the Analytics-Kanban board.
May 10 2018, 6:52 PM · Analytics-Kanban, Patch-For-Review, Analytics, Pageviews-API
JAllemandou moved T194309: Add nyc.wikimedia to pageviews whitelist from Next Up to In Code Review on the Analytics-Kanban board.
May 10 2018, 6:54 AM · Analytics-Kanban, Patch-For-Review, Analytics, Pageviews-API
JAllemandou claimed T194309: Add nyc.wikimedia to pageviews whitelist.
May 10 2018, 6:53 AM · Analytics-Kanban, Patch-For-Review, Analytics, Pageviews-API

May 8 2018

JAllemandou moved T194075: 2018-03 snapshot still broken from In Progress to Done on the Analytics-Kanban board.
May 8 2018, 5:59 PM · Analytics-Kanban, Analytics
JAllemandou added a comment to T194075: 2018-03 snapshot still broken.

Fixed today.

select snapshot, event_entity, count(*) from mediawiki_history where snapshot in ('2018-03', '2018-02', '2018-04') group by snapshot, event_entity;
May 8 2018, 5:59 PM · Analytics-Kanban, Analytics
JAllemandou moved T194075: 2018-03 snapshot still broken from Next Up to In Progress on the Analytics-Kanban board.
May 8 2018, 3:31 PM · Analytics-Kanban, Analytics

May 2 2018

JAllemandou moved T192841: Wikistats Bug: all but 2018 data missing? from In Code Review to Done on the Analytics-Kanban board.
May 2 2018, 4:10 PM · Patch-For-Review, Analytics-Kanban, Analytics, Analytics-Wikistats
JAllemandou moved T188556: Add a --dry-run option to the sqoop script from Ready to Deploy to Done on the Analytics-Kanban board.
May 2 2018, 4:10 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou moved T191714: Add Ecosia and Startpage to list of search engines from Ready to Deploy to Done on the Analytics-Kanban board.
May 2 2018, 4:10 PM · Analytics-Kanban, Patch-For-Review, Analytics

Apr 30 2018

JAllemandou moved T193230: EventBus HTTP Proxy service does not report errors to logstash from Radar to Operational Excellence on the Analytics board.
Apr 30 2018, 4:38 PM · Services (done), Analytics-Kanban, Wikimedia-Logstash, EventBus, Analytics
JAllemandou moved T193388: Index by-snapshot mediawiki-history-reduced in druid from Next Up to In Code Review on the Analytics-Kanban board.
Apr 30 2018, 11:13 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
JAllemandou created T193388: Index by-snapshot mediawiki-history-reduced in druid.
Apr 30 2018, 11:11 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
JAllemandou moved T193387: Add druid datasources as configuration parameter in AQS from In Progress to In Code Review on the Analytics-Kanban board.
Apr 30 2018, 10:50 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
JAllemandou moved T193387: Add druid datasources as configuration parameter in AQS from Next Up to In Progress on the Analytics-Kanban board.
Apr 30 2018, 10:50 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
JAllemandou created T193387: Add druid datasources as configuration parameter in AQS.
Apr 30 2018, 10:50 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Apr 27 2018

JAllemandou awarded T193257: Hadoop HDFS Namenode shutdown on 26/04/2018 a Y So Serious token.
Apr 27 2018, 5:09 PM · Patch-For-Review, Analytics, Analytics-Kanban

Apr 26 2018

JAllemandou added a comment to T136732: Puppetize job that saves old versions of Maxmind geoIP database.

+1 for weekly on Wednesday. Thanks @fdans and @faidon :)

Apr 26 2018, 1:01 PM · Puppet, Patch-For-Review, Analytics-Kanban

Apr 25 2018

JAllemandou claimed T192841: Wikistats Bug: all but 2018 data missing?.
Apr 25 2018, 1:44 PM · Patch-For-Review, Analytics-Kanban, Analytics, Analytics-Wikistats
JAllemandou moved T192841: Wikistats Bug: all but 2018 data missing? from Done to In Code Review on the Analytics-Kanban board.
Apr 25 2018, 1:44 PM · Patch-For-Review, Analytics-Kanban, Analytics, Analytics-Wikistats

Apr 24 2018

JAllemandou added a comment to T192841: Wikistats Bug: all but 2018 data missing?.

The job finished and the data is up to date. Thanks @Nuria and @Milimetric for spotting the problem and fixing it quickly!

Apr 24 2018, 12:07 PM · Patch-For-Review, Analytics-Kanban, Analytics, Analytics-Wikistats
JAllemandou added a comment to T192841: Wikistats Bug: all but 2018 data missing?.

@Nuria's actions fixed the problem for data up to 2018-02. I restarted a job ending in 2018-03, as the problem is not related to snapshots but to an erroneous indexation during testing. I will follow up later today when back from my day off.

Apr 24 2018, 8:10 AM · Patch-For-Review, Analytics-Kanban, Analytics, Analytics-Wikistats
JAllemandou updated subscribers of T192840: Wikistats: Data Regression.

@Milimetric and @Nuria: This problem is due to my testing the new mediawiki-reduced job without disabling its indexation step :( I used fake data to test the job, so the fake data got indexed as well.
I'm super sorry about that. I have launched a manual reindexation job; this should be fixed during the day.

Apr 24 2018, 7:01 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats, Analytics

Apr 23 2018

JAllemandou added a comment to T192348: SparkR on Spark 2.3.0 - Testing on Large Data Sets.

Hi @GoranSMilovanovic,
The problem I see in your code is that you instantiate the dataframe as an R structure and then convert it to Spark.
The first step creates the dataframe and loads it in R, which happens only on the driver. Since your driver has 4 GB of RAM, you get a memory error.
When dealing with big datasets, you should use Spark's reading functions (they don't load the full dataset into the driver).
I found

df <- read.df(csvPath, "csv", header = "true", inferSchema = "true", na.strings = "NA")

in https://spark.apache.org/docs/latest/sparkr.html
Maybe you could try that?
Cheers

Apr 23 2018, 2:51 PM · User-GoranSMilovanovic, Analytics-Kanban, Patch-For-Review, WMDE-Analytics-Engineering
JAllemandou added a comment to T164008: Update druid to 0.10.

+1 for commenting the global check :)

Apr 23 2018, 9:53 AM · Analytics-Kanban, User-Elukey, Analytics, Patch-For-Review

Apr 20 2018

JAllemandou set the point value for T192482: Make mediawiki-history-reduced table permanent (snapshot partitioning) to 3.
Apr 20 2018, 6:56 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
JAllemandou moved T192482: Make mediawiki-history-reduced table permanent (snapshot partitioning) from Next Up to In Code Review on the Analytics-Kanban board.
Apr 20 2018, 6:56 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
JAllemandou moved T188669: Update user_history and page_history column naming convention from In Progress to Done on the Analytics-Kanban board.
Apr 20 2018, 6:24 PM · Analytics-Kanban, Analytics
JAllemandou added a comment to T188669: Update user_history and page_history column naming convention.

Done.

Apr 20 2018, 6:23 PM · Analytics-Kanban, Analytics
JAllemandou moved T188669: Update user_history and page_history column naming convention from Next Up to In Progress on the Analytics-Kanban board.
Apr 20 2018, 5:41 PM · Analytics-Kanban, Analytics
JAllemandou added a comment to T188669: Update user_history and page_history column naming convention.

Because https://gerrit.wikimedia.org/r/#/c/388265/ had not been reflected in wmf.mediawiki_user_history and wmf.mediawiki_page_history, we expected field-definition issues.

Apr 20 2018, 5:31 PM · Analytics-Kanban, Analytics
JAllemandou claimed T188669: Update user_history and page_history column naming convention.
Apr 20 2018, 5:15 PM · Analytics-Kanban, Analytics
JAllemandou added a comment to T164008: Update druid to 0.10.

After some memory tweaks from @elukey, both Hadoop indexation and realtime indexation went fine (without any other change, incredible).
Let's plan an update of the druid-analytics cluster for next week.

Apr 20 2018, 5:05 PM · Analytics-Kanban, User-Elukey, Analytics, Patch-For-Review

Apr 19 2018

JAllemandou added a comment to T164008: Update druid to 0.10.

First step of testing confirmed on labs with druid 0.9.2:

  • Indexation from hadoop
  • Realtime indexation with tranquility
Apr 19 2018, 9:46 PM · Analytics-Kanban, User-Elukey, Analytics, Patch-For-Review
JAllemandou added a comment to T192348: SparkR on Spark 2.3.0 - Testing on Large Data Sets.

Hi @GoranSMilovanovic,
I am not fluent in SparkR, but here are a few thoughts:

Apr 19 2018, 1:48 PM · User-GoranSMilovanovic, Analytics-Kanban, Patch-For-Review, WMDE-Analytics-Engineering

Apr 18 2018

JAllemandou added a comment to T177965: Wikistats 2 Backend: Resiliency, Rollback and Deployment of Data.

Another round of discussion with the team:

  • Quality checks should happen before data gets loaded into druid
  • Since T155507, we now have statistics over the data generated by the Mediawiki-history reconstruction job. The first layer of data-quality checking should happen there (subtask: T192481)
  • Another layer of data-quality checking should be done over the mediawiki-history-reduced dataset. This implies keeping the data instead of deleting it after druid indexation (subtask: T192482). A new job step would then check data similarity between the previous and current snapshots (subtask: T192483).
  • Once those checks pass, we can index the data in druid; cache warming and the datasource swap should then follow (no task yet).
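The snapshot-similarity check planned above (T192483) could take many forms. As a minimal Python sketch, assume each snapshot is summarized as counts per metric. One option is to flag any metric whose relative change exceeds a threshold, or that appears in only one snapshot. The function name, the metric keys and the 5% default threshold are hypothetical and do not describe the actual job's logic.

```python
from typing import Dict, List


def flag_snapshot_changes(
    previous: Dict[str, int],
    current: Dict[str, int],
    max_relative_change: float = 0.05,  # hypothetical threshold
) -> List[str]:
    """Return, sorted, the metrics that differ too much between two snapshots."""
    flagged = []
    for metric in set(previous) | set(current):
        if metric not in previous or metric not in current:
            # A metric present in only one snapshot is suspicious by itself.
            flagged.append(metric)
            continue
        before, after = previous[metric], current[metric]
        if before == 0:
            # Avoid dividing by zero: any growth from zero gets flagged.
            if after != 0:
                flagged.append(metric)
        elif abs(after - before) / before > max_relative_change:
            flagged.append(metric)
    return sorted(flagged)


if __name__ == "__main__":
    prev = {"revision": 1000, "page": 200}
    cur = {"revision": 1030, "page": 150, "user": 10}
    print(flag_snapshot_changes(prev, cur))  # prints ['page', 'user']
```

In this sketch, Druid indexation would only proceed when the returned list is empty.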
Apr 18 2018, 6:51 PM · Analytics-Kanban, Analytics-Wikistats
JAllemandou created T192483: Add data-quality check on mediawiki-history-reduced before druid indexation.
Apr 18 2018, 6:51 PM · Analytics, Analytics-Kanban, Analytics-Wikistats
JAllemandou created T192482: Make mediawiki-history-reduced table permanent (snapshot partitioning).
Apr 18 2018, 6:48 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
JAllemandou created T192481: Add Mediawiki-History data-quality check stage in oozie using statistics.
Apr 18 2018, 6:45 PM · Patch-For-Review, Analytics, Analytics-Kanban, Analytics-Wikistats
JAllemandou renamed T177965: Wikistats 2 Backend: Resiliency, Rollback and Deployment of Data from Beta Release: Resiliency, Rollback and Deployment of Data to Wikistats 2 Backend: Resiliency, Rollback and Deployment of Data.
Apr 18 2018, 4:56 PM · Analytics-Kanban, Analytics-Wikistats
JAllemandou moved T177965: Wikistats 2 Backend: Resiliency, Rollback and Deployment of Data from In Progress to Parent Tasks on the Analytics-Kanban board.
Apr 18 2018, 4:56 PM · Analytics-Kanban, Analytics-Wikistats
JAllemandou edited projects for T177965: Wikistats 2 Backend: Resiliency, Rollback and Deployment of Data, added: Analytics-Kanban; removed Analytics.
Apr 18 2018, 4:55 PM · Analytics-Kanban, Analytics-Wikistats

Apr 16 2018

JAllemandou set the point value for T189449: Improve mediwiki-history performance to 21.
Apr 16 2018, 8:26 AM · Patch-For-Review, Analytics-Kanban
JAllemandou claimed T185419: Mediacounts missing top1000 files after 2018-01-01.
Apr 16 2018, 8:25 AM · Patch-For-Review, Analytics-Kanban, Datasets-Webstatscollector, Datasets-Archiving, Analytics-Cluster
JAllemandou moved T185419: Mediacounts missing top1000 files after 2018-01-01 from Paused to Done on the Analytics-Kanban board.
Apr 16 2018, 8:25 AM · Patch-For-Review, Analytics-Kanban, Datasets-Webstatscollector, Datasets-Archiving, Analytics-Cluster
JAllemandou renamed T164020: Use spark to split webrequest on tags from Use hive dynamic partitioning to split webrequest on tags to Use spark to split webrequest on tags.
Apr 16 2018, 7:04 AM · Patch-For-Review, Analytics-Kanban
JAllemandou moved T190058: Make 'metric' field not a partition in mediawiki_metrics from In Code Review to Done on the Analytics-Kanban board.
Apr 16 2018, 7:04 AM · Patch-For-Review, Analytics-Kanban
JAllemandou moved T159962: Spark 2 as cluster default (working with oozie) from In Code Review to Done on the Analytics-Kanban board.
Apr 16 2018, 7:04 AM · Patch-For-Review, Analytics-Kanban
JAllemandou moved T155507: Meta-statistics on MediaWiki history reconstruction process from In Code Review to Done on the Analytics-Kanban board.
Apr 16 2018, 7:04 AM · Analytics-Kanban
JAllemandou moved T189449: Improve mediwiki-history performance from In Code Review to Done on the Analytics-Kanban board.
Apr 16 2018, 7:03 AM · Patch-For-Review, Analytics-Kanban

Apr 12 2018

JAllemandou added a comment to T184576: Make an Analytics Data Lake table to provide meta info about wikis .

Quick note: given the domain of any project, it's relatively easy to extract the project family and the language (if any).
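To illustrate the note above, here is a minimal Python sketch. It assumes domains of the form `<language>.<family>.org`, optionally with a mobile `m.` label (e.g. `en.m.wikipedia.org`). The function name and the set of non-language subdomains are hypothetical and not exhaustive. Hosts under wikimedia.org, such as commons or meta, are treated as having no language, and their family is reported simply as "wikimedia".

```python
from typing import Optional, Tuple

# Hypothetical, non-exhaustive set of subdomains that are not language codes.
NON_LANGUAGE_SUBDOMAINS = {"www", "commons", "meta", "species", "incubator"}


def split_project_domain(domain: str) -> Tuple[str, Optional[str]]:
    """Return (project_family, language) for a Wikimedia-style domain."""
    labels = domain.lower().strip(".").split(".")
    # Drop the mobile marker: en.m.wikipedia.org -> en.wikipedia.org
    labels = [label for label in labels if label != "m"]
    if len(labels) < 2:
        raise ValueError(f"not a project domain: {domain!r}")
    family = labels[-2]  # the label right before the top-level domain
    if len(labels) >= 3 and labels[-3] not in NON_LANGUAGE_SUBDOMAINS:
        return family, labels[-3]
    return family, None


if __name__ == "__main__":
    print(split_project_domain("fr.m.wiktionary.org"))  # prints ('wiktionary', 'fr')
```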

Apr 12 2018, 9:16 AM · Product-Analytics, Analytics, Contributors-Analysis

Apr 6 2018

JAllemandou moved T188025: Create refinery-spark package from Next Up to In Code Review on the Analytics-Kanban board.
Apr 6 2018, 11:17 AM · Patch-For-Review, Analytics-Kanban
JAllemandou claimed T188025: Create refinery-spark package.
Apr 6 2018, 11:17 AM · Patch-For-Review, Analytics-Kanban

Apr 5 2018

JAllemandou moved T190058: Make 'metric' field not a partition in mediawiki_metrics from In Progress to In Code Review on the Analytics-Kanban board.
Apr 5 2018, 7:19 PM · Patch-For-Review, Analytics-Kanban
JAllemandou added a comment to T191412: Add raw sites table to Analytics Data Lake.

Hi @Neil, would wmf_raw.mediawiki_project_namespace_map satisfy the need? This table is updated every month (snapshot partition), and its definition is explained here on GitHub.

Apr 5 2018, 4:47 PM · Analytics, Contributors-Analysis
JAllemandou moved T190058: Make 'metric' field not a partition in mediawiki_metrics from Next Up to In Progress on the Analytics-Kanban board.
Apr 5 2018, 9:16 AM · Patch-For-Review, Analytics-Kanban
JAllemandou claimed T190058: Make 'metric' field not a partition in mediawiki_metrics.
Apr 5 2018, 9:16 AM · Patch-For-Review, Analytics-Kanban
JAllemandou moved T164020: Use spark to split webrequest on tags from Paused to In Progress on the Analytics-Kanban board.
Apr 5 2018, 9:15 AM · Patch-For-Review, Analytics-Kanban

Mar 30 2018

JAllemandou added a comment to T190409: Checklist for geowiki pipeline.

@Milimetric: I have modified https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Delete_segments_from_deep_storage to better explain the issue you encountered.
I think you were trying to delete data that was still available on historical nodes, and druid doesn't let you do that.
I first disabled the datasource in the coordinator UI, then ran the command you pasted, adding a parameter to skip the datasource-availability check (since I had disabled it).

/srv/deployment/analytics/refinery/bin/refinery-drop-druid-deep-storage-data -d 1 -v mediawiki-geowiki-daily --no-datasource-check

It worked as far as I can tell:

hdfs dfs -ls /user/druid/deep-storage/mediawiki-geowiki-daily
ls: `/user/druid/deep-storage/mediawiki-geowiki-daily': No such file or directory
Mar 30 2018, 9:48 AM · Patch-For-Review, Analytics-Kanban

Mar 29 2018

JAllemandou moved T184541: Update AQS pageview-top definition from Ready to Deploy to Done on the Analytics-Kanban board.
Mar 29 2018, 9:00 AM · Analytics-Kanban, Services (done), Patch-For-Review, RESTBase-API
JAllemandou moved T188113: Oozie job to compute geowiki on top of sqooped data from Ready to Deploy to Done on the Analytics-Kanban board.
Mar 29 2018, 8:59 AM · Patch-For-Review, Analytics-Kanban

Mar 28 2018

JAllemandou updated the task description for T190459: Phasing away one of the mobile apps session metrics jobs..
Mar 28 2018, 6:04 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou moved T190459: Phasing away one of the mobile apps session metrics jobs. from Ready to Deploy to Done on the Analytics-Kanban board.
Mar 28 2018, 6:03 PM · Patch-For-Review, Analytics-Kanban, Analytics
JAllemandou moved T189448: Correct mediawiki-reduced loading job from Ready to Deploy to Done on the Analytics-Kanban board.
Mar 28 2018, 6:01 PM · Patch-For-Review, Analytics-Kanban
JAllemandou moved T189740: unique devices data for january not in cassandra from Ready to Deploy to Done on the Analytics-Kanban board.
Mar 28 2018, 5:58 PM · Patch-For-Review, Analytics-Kanban