fkaelin
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 12 2020, 6:16 PM (21 w, 2 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
FKaelin (WMF)

Recent Activity

Thu, Apr 8

fkaelin added a comment to T278217: Release image data for training.

Update on the competition images dataset:

  • downloaded 300px thumbnails from Swift
  • 250 GB of Avro files in total on HDFS, under /user/fab/images/competition/all/pixels/
  • 6711755 images
  • 32200 images couldn't be downloaded (0.48%); see /user/fab/images/competition/all/swift_errors/
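
As a quick sanity check on the 0.48% figure, here is a minimal sketch of the arithmetic. The update does not say whether the 6711755 count includes the failed downloads, so both readings are computed; the variable and function names are only illustrative.

```python
# Counts taken from the update above.
reported_images = 6_711_755  # images reported for the dataset
failed_images = 32_200       # images that could not be downloaded


def failure_rate_percent(failed: int, total: int) -> float:
    """Return failed / total expressed as a percentage."""
    return 100.0 * failed / total


# Reading A (assumption): the reported count is the denominator.
rate_vs_reported = failure_rate_percent(failed_images, reported_images)

# Reading B (assumption): the denominator is every attempted download,
# i.e. the reported images plus the failed ones.
rate_vs_attempted = failure_rate_percent(
    failed_images, reported_images + failed_images
)

print(f"{rate_vs_reported:.2f}% / {rate_vs_attempted:.2f}%")
```

Both readings round to 0.48%, so the reported rate holds under either interpretation.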
Thu, Apr 8, 3:20 AM · Research (FY2020-21-Research-January-March)

Wed, Mar 31

fkaelin added a comment to T272313: Newpytyer python spark kernels.

Thanks @Ottomata. I can also confirm that the certificates work now too, i.e. a request with verify=False now fails on the workers as well.

Wed, Mar 31, 7:37 PM · Patch-For-Review, Analytics

Thu, Mar 25

fkaelin updated the task description for T278441: Memory errors in Spark.
Thu, Mar 25, 5:51 PM · Analytics-Kanban, Analytics
fkaelin created T278451: NullPointerException at beginning of spark job.
Thu, Mar 25, 4:07 PM · Analytics
fkaelin created T278441: Memory errors in Spark.
Thu, Mar 25, 3:08 PM · Analytics-Kanban, Analytics

Wed, Mar 24

fkaelin added a comment to T215001: Revisions missing from mediawiki_revision_create.

Thanks @Milimetric for the background on IRC about where the special:tag info is stored. Since it is in the refined revisions and not in the raw revisions table used so far, pulling it in is a bit more work. That said, just looking at the revisions that are actually missing does seem to indicate that multiple specific scenarios don't result in Kafka events being created.

Wed, Mar 24, 4:41 PM · Analytics-Kanban, Growth-Team, Product-Analytics, Analytics
fkaelin added a comment to T215001: Revisions missing from mediawiki_revision_create.

After going a little overboard, there are still no easy answers. I did slice and dice the data based on the query that @Milimetric provided above.

Wed, Mar 24, 4:53 AM · Analytics-Kanban, Growth-Team, Product-Analytics, Analytics

Tue, Mar 16

fkaelin added a comment to T184744: Improve access to Commons image data for research and development.

I picked this up again last week and ran a more substantial test: a Spark job using 50 workers to download ~1 million Commons images (400px thumbnails). Some more questions before I run a job on the full dataset (~53M image files). Looking at the Grafana dashboard,

  • what does the increase in PUT 201 in the "object state-changing" graph mean? Cache misses for the thumbnails that get filled?

Possible, but hard to say from that graph. When did the job start/finish? I'm assuming ~23:30 to ~1:40, but best to confirm.
Something else to check for thumbnailing activity is the Thumbor dashboard (for the same timeframe):
https://grafana.wikimedia.org/d/Pukjw6cWk/thumbor?orgId=1&from=1615330800000&to=1615341600000

Tue, Mar 16, 7:00 PM · User-ArielGlenn
fkaelin added a comment to T184744: Improve access to Commons image data for research and development.

Messed up the link to the Grafana dashboard above (edited); adding it as a screenshot.

Tue, Mar 16, 4:37 PM · User-ArielGlenn
fkaelin added a comment to T184744: Improve access to Commons image data for research and development.

I picked this up again last week and ran a more substantial test: a Spark job using 50 workers to download ~1 million Commons images (400px thumbnails). Some more questions before I run a job on the full dataset (~53M image files). Looking at the Grafana dashboard,

Tue, Mar 16, 4:33 PM · User-ArielGlenn

Mon, Mar 15

fkaelin added a comment to T276791: Configure the Hadoop cluster to use the GPUs available on some workers.

I don't think splitting the GPU machines from the YARN cluster is a far-fetched idea, especially given the hurdles of making this work with YARN, though I am not familiar with Alluxio. Another option is to create a Kubernetes cluster that could make use of these GPUs, which would be in line with the technology stack used by other ML infra projects currently being built (ML platform, search infra). These GPUs are a good example of the gap that I perceive between the analytics infrastructure and the ongoing efforts to build ML infrastructure. I created a doc to discuss the larger question from the perspective of the research team as a user of ML infrastructure.

Mon, Mar 15, 3:37 PM · Analytics, Machine-Learning-Team
fkaelin added a comment to T275551: Kubeflow on stat machines.

I created a separate document to discuss some of the bigger questions around orchestration within analytics that arise from the very specific use case of 'kubeflow on stat machines'. Any input is much appreciated. On this phab task, I would like to continue discussing our short/near-term options.

Mon, Mar 15, 2:12 PM · Analytics-Radar, Machine-Learning-Team, SRE

Mar 2 2021

fkaelin added a comment to T272313: Newpytyer python spark kernels.

I second Isaac's comment. I reviewed the GitHub PR and tested it successfully.

Mar 2 2021, 4:49 PM · Patch-For-Review, Analytics

Mar 1 2021

fkaelin added a comment to T224658: Newpyter - SWAP Juypter Rewrite.

Another observation: I attempted to use wmfdata to avoid replicating spark session code. The wmf base conda env contains an older version, and upgrading it fails with

Mar 1 2021, 5:51 PM · Analytics-Kanban, Patch-For-Review, Analytics

Feb 23 2021

fkaelin created T275551: Kubeflow on stat machines.
Feb 23 2021, 7:55 PM · Analytics-Radar, Machine-Learning-Team, SRE

Feb 11 2021

fkaelin added a comment to T182351: Make HTML dumps available.

To summarize my understanding:

  • for research, the HTML history is interesting because it expands templates and Lua modules
  • for a revision of page p created at time t, we prefer to store the HTML that a reader was served at that time (i.e. what WikiHist does), rather than the HTML rendered with the version of the templates at some time in the future (i.e. by calling the MediaWiki API during a batch export)
  • however, if at time t+1 a template that is used by page p changes, then the reader is served different HTML on Wikipedia but there is no new revision for page p. Only once there is a new revision for page p at time t+2 will the template change from time t+1 be reflected in the history. In fact, if page p is not edited after time t, the template change will never be reflected in the HTML history of the page.
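
To make the last point concrete, here is a minimal sketch of the timeline described above. It is a toy model and not how WikiHist or any dump job works: times are abstract integers, a page uses a single template, and every function name is hypothetical.

```python
from bisect import bisect_right


def template_version_at(time, template_times):
    """Index of the latest template revision created at or before `time`."""
    return bisect_right(template_times, time) - 1


def stored_history(page_times, template_times):
    """(page revision, template version) pairs captured when each page
    revision was created, i.e. one HTML snapshot per page revision."""
    return [
        (i, template_version_at(t, template_times))
        for i, t in enumerate(page_times)
    ]


def served_states(page_times, template_times):
    """Every distinct (page revision, template version) pair a reader
    could have been served, in chronological order."""
    states = []
    for t in sorted(set(page_times) | set(template_times)):
        page_idx = bisect_right(page_times, t) - 1
        if page_idx < 0:
            continue  # the page did not exist yet at this time
        state = (page_idx, template_version_at(t, template_times))
        if state not in states:
            states.append(state)
    return states


def never_stored(page_times, template_times):
    """Served states that have no snapshot in the stored history."""
    stored = set(stored_history(page_times, template_times))
    return [s for s in served_states(page_times, template_times) if s not in stored]


# Page p edited at t=0 and t=2, template changed at t=1.
print(never_stored([0, 2], [0, 1]))
# Page p never edited after t=0, template changed at t=1.
print(never_stored([0], [0, 1]))
```

In both cases the state "page revision 0 rendered with template version 1" is served to readers but has no snapshot. In the first case the template change at least shows up with the next page revision; in the second it never appears in the stored history.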
Feb 11 2021, 9:54 PM · Research, Analytics-Radar, Datasets-Archiving

Feb 10 2021

fkaelin added a comment to T182351: Make HTML dumps available.

@ArielGlenn, the dataset should contain the rendered HTML for all revisions, rendered with the MediaWiki version at the time the revision was created. The motivation for this is described in @tizianopiccardi's paper.

Feb 10 2021, 11:54 PM · Research, Analytics-Radar, Datasets-Archiving
fkaelin claimed T182351: Make HTML dumps available.
Feb 10 2021, 7:34 PM · Research, Analytics-Radar, Datasets-Archiving

Feb 1 2021

fkaelin updated subscribers of T272973: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance.
Feb 1 2021, 1:42 PM · Analytics-Kanban, Analytics

Jan 22 2021

fkaelin added a comment to T184744: Improve access to Commons image data for research and development.

Thanks for the pointers @fgiunchedi and @Miriam. This approach works well, including scaling the download on spark.

Jan 22 2021, 3:11 PM · User-ArielGlenn

Jan 18 2021

fkaelin created T272313: Newpytyer python spark kernels.
Jan 18 2021, 4:52 PM · Patch-For-Review, Analytics

Jan 14 2021

fkaelin added a comment to T184744: Improve access to Commons image data for research and development.

Thanks for the information @fgiunchedi.

Jan 14 2021, 6:22 AM · User-ArielGlenn

Jan 13 2021

fkaelin added a comment to T267817: Requesting access to analytics-privatedata-users and wmf LDAP for fkaelin.

@elukey, thanks for the background and for adding my user to Hue - I was able to login.

Jan 13 2021, 2:05 PM · SRE, SRE-Access-Requests

Jan 12 2021

fkaelin updated subscribers of T267817: Requesting access to analytics-privatedata-users and wmf LDAP for fkaelin.

I am trying to access Hue. After looking at the tasks requesting Hue access, T271602 and T252703, I am not sure whether there is a template task for requesting access. If I understand @elukey's comment, there is a newish way to request UI credentials only, instead of the more involved ssh access. However, I would imagine that most people requesting ssh access will also end up using the UI, so would it make sense to create the UI-based creds as part of this task as well?

Jan 12 2021, 10:07 PM · SRE, SRE-Access-Requests

Jan 5 2021

fkaelin added a comment to T184744: Improve access to Commons image data for research and development.

Hi, revisiting this subject! With T220081 the swift cluster is reachable from analytics, does this allow us to proceed with one or both of the options described?

Jan 5 2021, 9:32 PM · User-ArielGlenn

Nov 20 2020

fkaelin created T268365: Kerberos identity for fkaelin.
Nov 20 2020, 6:45 PM · Analytics

Nov 18 2020

fkaelin closed T267817: Requesting access to analytics-privatedata-users and wmf LDAP for fkaelin as Resolved.

Thanks for the explanation, it makes sense now.

Nov 18 2020, 8:57 PM · SRE, SRE-Access-Requests
fkaelin reopened T267817: Requesting access to analytics-privatedata-users and wmf LDAP for fkaelin as "Open".

Thanks for the assistance! I am trying to connect, and the key seems to have propagated, but I see the following error:

$ ssh -v bast1002.eqiad.wmnet
[snip]
debug1: Next authentication method: publickey
debug1: Offering public key: /home/fab/.ssh/wmf_prod ED25519 SHA256:iIFh8ZfJOewuqKKZgStkfmPejgsYEgZC0a9FutV860M explicit agent
debug1: Server accepts key: /home/fab/.ssh/wmf_prod ED25519 SHA256:iIFh8ZfJOewuqKKZgStkfmPejgsYEgZC0a9FutV860M explicit agent
debug1: Authentication succeeded (publickey).
Authenticated to bast1002.wikimedia.org ([208.80.154.86]:22).
debug1: channel_connect_stdio_fwd bast1002.eqiad.wmnet:22
debug1: channel 0: new [stdio-forward]
debug1: getpeername failed: Bad file descriptor
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: network
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
channel 0: open failed: administratively prohibited: open failed
stdio forwarding failed
kex_exchange_identification: Connection closed by remote host
Nov 18 2020, 6:25 PM · SRE, SRE-Access-Requests

Nov 16 2020

fkaelin added a comment to T267817: Requesting access to analytics-privatedata-users and wmf LDAP for fkaelin.

Also, I noticed that there is a previous, outdated entry for me in that yaml file. https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/admin/data/data.yaml$1733

Nov 16 2020, 2:25 PM · SRE, SRE-Access-Requests
fkaelin added a comment to T267817: Requesting access to analytics-privatedata-users and wmf LDAP for fkaelin.

Thanks. I did create a separate task for the analytics-privatedata-users group, which seemingly wasn't necessary. https://phabricator.wikimedia.org/T267816

Nov 16 2020, 2:17 PM · SRE, SRE-Access-Requests

Nov 13 2020

fkaelin updated subscribers of T267817: Requesting access to analytics-privatedata-users and wmf LDAP for fkaelin.

@Ottomata and @leila, I think your approvals are needed for this review, thank you!

Nov 13 2020, 1:46 AM · SRE, SRE-Access-Requests
fkaelin updated subscribers of T267816: Requesting access to analytics-privatedata-users for fkaelin.

@Ottomata and @leila, I think your approvals are needed for this review, thank you!

Nov 13 2020, 1:46 AM · SRE, SRE-Access-Requests
fkaelin created T267817: Requesting access to analytics-privatedata-users and wmf LDAP for fkaelin.
Nov 13 2020, 1:35 AM · SRE, SRE-Access-Requests
fkaelin created T267816: Requesting access to analytics-privatedata-users for fkaelin.
Nov 13 2020, 1:35 AM · SRE, SRE-Access-Requests