Shilad (Shilad Sen)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 23 2015, 3:27 AM (473 w, 2 d)
Availability
Available
LDAP User
Shilad Sen
MediaWiki User
Shilad

Recent Activity

Oct 22 2020

Shilad added a comment to T264269: Check home/HDFS leftovers of shiladsen.

You can trash everything! Sorry for the delayed response.

Oct 22 2020, 4:26 PM · Analytics

Nov 29 2019

Shilad added a comment to T174796: Productionize navigation vectors.

Hi @Aklapper, Thanks for asking! I think this got stuck in code review. I'm happy to step in and move it forward once folks have time to code review it.

Nov 29 2019, 3:39 PM · Data-Engineering-Radar, Analytics-Clusters, Analytics-Radar, Patch-For-Review

Mar 18 2019

Shilad added a comment to T217922: Migrate Wikilabels from labsdb1004 to clouddb1002.

@Shilad ^ any opinions on this database?

Mar 18 2019, 2:14 PM · Patch-For-Review, Machine-Learning-Team (Active Tasks), Wikilabels, Cloud-VPS

Feb 15 2019

Shilad added a comment to T216226: GPU upgrade for stat1005.

I've been lurking on this issue, and just wanted to chime in with one bit of information I learned through experience. I'm not sure what model of Dell server you are using, but there are some subtle issues with fitting GPUs in a 2U server (and maybe others). In my case, an Nvidia GeForce card (which I know you aren't considering) technically fit in my 2U server, but the location of the card's power connector meant it protruded beyond the enclosure.

Feb 15 2019, 12:18 PM · Analytics, hardware-requests, SRE

Jan 9 2018

Shilad added a comment to T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models.

That makes sense. There are plenty of other avenues I can explore without a GPU.

Jan 9 2018, 7:55 PM · Analytics-Radar, Patch-For-Review, User-Elukey, SRE, Research-management
Shilad added a comment to T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models.

I have availability this week and next, but I think @Ottomata is right. It will be tough to do this work if it's attached to a production machine. How feasible is it to move it to one?

Jan 9 2018, 5:31 PM · Analytics-Radar, Patch-For-Review, User-Elukey, SRE, Research-management

Jan 8 2018

Shilad added a comment to T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models.

@dr0ptp4kt Thanks for chiming in, and I hope the leave was restful!

Jan 8 2018, 8:42 PM · Analytics-Radar, Patch-For-Review, User-Elukey, SRE, Research-management

Jan 6 2018

Shilad added a comment to T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models.

Hi all, I’m in a position to put a GPU to use now and am happy to help if I can. I want to make sure I understand where things stand: is the issue with the encumbered kernel driver or the OpenCL library? If it’s the former, the situation is much more difficult.

Jan 6 2018, 4:23 PM · Analytics-Radar, Patch-For-Review, User-Elukey, SRE, Research-management

Nov 28 2017

Shilad reopened T161554: Provide large disk space to WikiBrain for memory-mapped file, a subtask of T76375: [DO NOT USE] New Labs project requests (tracking) [superseded by #cloud-vps-project-requests], as Open.
Nov 28 2017, 3:50 AM · User-bd808, Tracking-Neverending, Cloud-Services
Shilad reopened T161554: Provide large disk space to WikiBrain for memory-mapped file as "Open".

Hi all, I am reopening this. Hooray :)

Nov 28 2017, 3:50 AM · Cloud-VPS (Project-requests), artificial-intelligence
Shilad updated the task description for T174796: Productionize navigation vectors.
Nov 28 2017, 3:47 AM · Data-Engineering-Radar, Analytics-Clusters, Analytics-Radar, Patch-For-Review

Oct 25 2017

Shilad added a comment to T161554: Provide large disk space to WikiBrain for memory-mapped file.

Yes! I think this is all set. I'm currently working on T174796 to create data needed for these instances.

Oct 25 2017, 2:25 AM · Cloud-VPS (Project-requests), artificial-intelligence
Shilad updated the task description for T174796: Productionize navigation vectors.
Oct 25 2017, 2:25 AM · Data-Engineering-Radar, Analytics-Clusters, Analytics-Radar, Patch-For-Review

Sep 13 2017

Shilad added a subtask for T158972: Spark job to produce clickstream dataset: T174796: Productionize navigation vectors.
Sep 13 2017, 4:51 AM · Analytics-Kanban, Research
Shilad added a parent task for T174796: Productionize navigation vectors: T158972: Spark job to produce clickstream dataset.
Sep 13 2017, 4:51 AM · Data-Engineering-Radar, Analytics-Clusters, Analytics-Radar, Patch-For-Review
Shilad updated the task description for T174796: Productionize navigation vectors.
Sep 13 2017, 4:50 AM · Data-Engineering-Radar, Analytics-Clusters, Analytics-Radar, Patch-For-Review

Sep 12 2017

Shilad added a comment to T171988: NDA, MOU and LDAP (analytics cluster) for Shilad Sen.

Indeed, I now have Yarn access! Thanks @elukey!

Sep 12 2017, 4:06 PM · Analytics-Radar, Patch-For-Review, SRE, SRE-Access-Requests, Research-collaborations, Research

Sep 2 2017

Shilad added a comment to T158972: Spark job to produce clickstream dataset.

@JAllemandou, thanks for the pointers! I think there's a little confusion on this, though. I volunteered to productionize Navigation Vectors (see T174796). I'm happy to also work on clickstream once this is done, but I think it will take several months to wrap up Navigation Vectors because of my teaching commitments.

Sep 2 2017, 11:10 AM · Analytics-Kanban, Research

Aug 28 2017

Shilad closed T171988: NDA, MOU and LDAP (analytics cluster) for Shilad Sen as Resolved.

Everything looks good now! Thanks for your quick help, @Ottomata! I'm going to close this ticket and get to work :)

Aug 28 2017, 8:42 PM · Analytics-Radar, Patch-For-Review, SRE, SRE-Access-Requests, Research-collaborations, Research
Shilad closed T171988: NDA, MOU and LDAP (analytics cluster) for Shilad Sen, a subtask of T158972: Spark job to produce clickstream dataset, as Resolved.
Aug 28 2017, 8:42 PM · Analytics-Kanban, Research

Aug 25 2017

Shilad added a comment to T171988: NDA, MOU and LDAP (analytics cluster) for Shilad Sen.

One follow-up: The Navigation Vectors project uses Hive queries, so I think I also need the analytics-privatedata-users role. Is this correct? If so, should I start a new ticket, or can that also be added to this ticket?

Aug 25 2017, 9:06 AM · Analytics-Radar, Patch-For-Review, SRE, SRE-Access-Requests, Research-collaborations, Research

Aug 23 2017

Shilad added a comment to T171988: NDA, MOU and LDAP (analytics cluster) for Shilad Sen.

Yes! I updated Help:SSH to indicate that DSA is being phased out.

Aug 23 2017, 8:34 PM · Analytics-Radar, Patch-For-Review, SRE, SRE-Access-Requests, Research-collaborations, Research
Shilad added a comment to T171988: NDA, MOU and LDAP (analytics cluster) for Shilad Sen.

@herron I am having some trouble logging in. I can get to bastion but not beyond. I'm suspicious that the key I gave you is a DSS key, not a DSA key. I requested a DSA key, but the public key starts with ssh-dss instead.

Aug 23 2017, 3:25 AM · Analytics-Radar, Patch-For-Review, SRE, SRE-Access-Requests, Research-collaborations, Research

Aug 17 2017

Shilad reassigned T171988: NDA, MOU and LDAP (analytics cluster) for Shilad Sen from Shilad to RobH.
Aug 17 2017, 9:37 PM · Analytics-Radar, Patch-For-Review, SRE, SRE-Access-Requests, Research-collaborations, Research
Shilad added a comment to T171988: NDA, MOU and LDAP (analytics cluster) for Shilad Sen.

@RobH, I've signed the L3. My wikitech username is "Shilad Sen" and my preferred shell username is shiladsen. I created a new public SSH key for the production environment, and it is below. I think that should be everything!

Aug 17 2017, 9:36 PM · Analytics-Radar, Patch-For-Review, SRE, SRE-Access-Requests, Research-collaborations, Research

Jul 12 2017

Shilad added a comment to T161554: Provide large disk space to WikiBrain for memory-mapped file.

Also, I'll probably be using Docker images (we have a WikiBrain docker image). I presume that it's better to run the Docker image in a VM rather than on the host, but please let me know if that's not correct.

Jul 12 2017, 5:42 AM · Cloud-VPS (Project-requests), artificial-intelligence
Shilad updated the task description for T161554: Provide large disk space to WikiBrain for memory-mapped file.
Jul 12 2017, 5:40 AM · Cloud-VPS (Project-requests), artificial-intelligence
Shilad updated the task description for T161554: Provide large disk space to WikiBrain for memory-mapped file.
Jul 12 2017, 5:40 AM · Cloud-VPS (Project-requests), artificial-intelligence
Shilad added a comment to T161554: Provide large disk space to WikiBrain for memory-mapped file.

Thanks @Halfak and @Andrew! This is exciting!

Jul 12 2017, 5:33 AM · Cloud-VPS (Project-requests), artificial-intelligence

Apr 25 2017

Shilad added a comment to T163788: Implement clickstream & navigation vectors as a regular job.

I am happy to help with engineering on this if we can find a way to make that work. I've set up navigation-based word2vec pipelines in similar environments (PySpark, Oozie, etc.) in the past.

Apr 25 2017, 3:01 PM · Analytics

Apr 24 2017

Shilad added a comment to T161554: Provide large disk space to WikiBrain for memory-mapped file.

Great! 24GB of memory and 4 cores would be great if that works for you.

Apr 24 2017, 5:30 AM · Cloud-VPS (Project-requests), artificial-intelligence

Apr 21 2017

Shilad added a comment to T161554: Provide large disk space to WikiBrain for memory-mapped file.

Good questions! The big files are statistical models, so they take a while to build (a day or two), but they can be easily recreated. I think your suggestion of swapping the VMs over time seems reasonable. My only thought is whether we could have a little more wiggle room, perhaps 300GB. That would substantially reduce the rate at which we have to turn over the images.

Apr 21 2017, 3:21 PM · Cloud-VPS (Project-requests), artificial-intelligence
Shilad added a comment to T161554: Provide large disk space to WikiBrain for memory-mapped file.

I think Aaron was saying that although 200GB would probably work right now, it wouldn't hold Wikipedia for very long. 500GB would definitely last for 5 years. Somewhere in between those sizes would work for a few years. Two of the large storage VMs would be plenty initially.

Apr 21 2017, 4:24 AM · Cloud-VPS (Project-requests), artificial-intelligence

Apr 20 2017

Shilad added a comment to T161554: Provide large disk space to WikiBrain for memory-mapped file.

Just to follow up on this: Aaron's estimates are pretty accurate. The disk-cached data structures require about 200GB for larger language editions right now. We would likely expand to 500GB over time (or if we require "more advanced" WikiBrain features). Is this possible?

Apr 20 2017, 4:20 AM · Cloud-VPS (Project-requests), artificial-intelligence

Jan 26 2017

Shilad added a comment to T155853: Article similarity scorer.

I have spent quite a bit of time on this over the past few years. I do have a service that I could make available as an endpoint. HOWEVER, from what I've seen in my projects, a much better approach is combining the work of Ellery Wulczyn on navigation vectors (https://meta.wikimedia.org/wiki/Research:Wikipedia_Navigation_Vectors) with the "standard" content-based approaches from Wikipedia.

Jan 26 2017, 8:55 PM · Technical-Tool-Request, artificial-intelligence

Apr 28 2015

Shilad added a comment to T96950: WikiBrain.

Perfect. Thanks!

Apr 28 2015, 5:51 PM · Cloud-Services
Shilad added a comment to T96950: WikiBrain.

Hello! I believe I listed the wrong Wikitech username. It should be "Shilad Sen" instead of just "Shilad". Are you able to change this? Sorry for the mistake!

Apr 28 2015, 5:11 PM · Cloud-Services

Apr 25 2015

Shilad added a comment to T96950: WikiBrain.

Thank you!

Apr 25 2015, 7:41 AM · Cloud-Services

Apr 24 2015

Shilad added a comment to T96950: WikiBrain.

As Aaron said, this software will push the limits of your largest VM (16GB). I'd feel much safer if I knew our system was sandboxed and had no possibility of affecting other software running on tools.wmflabs.org.

Apr 24 2015, 11:01 AM · Cloud-Services

Apr 23 2015

Shilad created T96950: WikiBrain.
Apr 23 2015, 3:31 AM · Cloud-Services