RBrounley_WMF (Ryan)
Product Manager on Okapi

User Details

User Since
May 29 2020, 11:52 PM (39 w, 5 h)
Availability
Available
LDAP User
Unknown
MediaWiki User
RBrounley (WMF) [ Global Accounts ]

Recent Activity

Thu, Feb 11

RBrounley_WMF added a comment to T182351: Make HTML dumps available.

Hi @fkaelin - it's nice to meet you, sounds like there are a lot of overlaps in your thinking and ours. On Okapi, in general, we are working on some things that may be relevant as well as others that may not be.

Thu, Feb 11, 7:34 PM · Research, Analytics-Radar, Datasets-Archiving

Tue, Feb 2

RBrounley_WMF added a comment to T273585: Host OKAPI HTML dumps on public-facing labstore servers.

Thanks for setting this up Ariel, I am working with my team to get a better idea of timeline as well as total file sizes.

Tue, Feb 2, 6:33 PM · Datasets-Archiving, Okapi, cloud-services-team (Kanban), Dumps-Generation

Fri, Jan 29

RBrounley_WMF moved T263087: Okapi: Components for UI Work from In Progress to Done on the Okapi board.
Fri, Jan 29, 11:01 PM · Okapi
RBrounley_WMF moved T263885: Okapi: Fresher -> Safer Spectrum, please review!! from In Progress to Backlog on the Okapi board.
Fri, Jan 29, 11:01 PM · Okapi

Dec 8 2020

RBrounley_WMF created T269686: Create three Okapi sub-domains (okapi*.wikimedia.org).
Dec 8 2020, 3:44 PM · DNS, Okapi, SRE, Traffic

Oct 1 2020

RBrounley_WMF added a comment to T263910: ORES redis: max number of clients reached....

We’re working to patch up our end: switching to streams, and falling back to querying the ORES API when streams fail. Sorry, will update soon.

Oct 1 2020, 9:32 PM · User-Ladsgroup, Sustainability (Incident Followup), Patch-For-Review, Okapi, serviceops, SRE, ORES, Machine-Learning-Team

Sep 30 2020

RBrounley_WMF added a comment to T263910: ORES redis: max number of clients reached....

Hey, connecting with folks on the ORES team around this today. Sorry, we were given advice that the ORES stream may have some data integrity issues when implementing, since we need the whole corpus. Pulling our engineers into the conversation to elaborate. We'll dig back into the streams and discuss on the call today to move things over.

Sep 30 2020, 4:40 PM · User-Ladsgroup, Sustainability (Incident Followup), Patch-For-Review, Okapi, serviceops, SRE, ORES, Machine-Learning-Team
RBrounley_WMF moved T263090: Okapi: Build Wiki Export from In Progress to Done on the Okapi board.
Sep 30 2020, 3:52 PM · Okapi
RBrounley_WMF moved T263084: Okapi Sprint Sep16 - Sep30 from In Progress to Done on the Okapi board.
Sep 30 2020, 3:52 PM · Okapi

Sep 28 2020

RBrounley_WMF added a comment to T263885: Okapi: Fresher -> Safer Spectrum, please review!!.

Does this "trust" expire?

For example it's certainly possible an anon user could be making good edits "today", but then tomorrow, next week, month or whatever time later are making bad edits (change of user using the IP etc)

No, right now we haven't built in hindsight beyond the check to see if a user has been blocked whilst the edit was in a holding pattern.

Sep 28 2020, 8:57 PM · Okapi
RBrounley_WMF added a comment to T263885: Okapi: Fresher -> Safer Spectrum, please review!!.

One thing maybe worth considering is replacing the editcount requirement with something closer to how the autoconfirmed group is defined (a combination of edit count and account age). Another might be to check if other recent edits of the user were reverted (revert detection has landed recently in core: T152434).
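
A minimal sketch of the suggestion above, combining edit count, account age, and recent reverts into one trust check. The thresholds are assumptions modelled on English Wikipedia's autoconfirmed defaults (10 edits, 4 days); the `Editor` fields and the `is_trusted` helper are hypothetical and not taken from the Okapi code base.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed thresholds, modelled on English Wikipedia's autoconfirmed defaults.
# Other wikis configure different values.
MIN_EDITS = 10
MIN_ACCOUNT_AGE = timedelta(days=4)
MAX_RECENT_REVERTS = 0  # hypothetical: any recent revert withholds trust


@dataclass
class Editor:
    edit_count: int
    registered: datetime
    recent_reverts: int = 0
    blocked: bool = False


def is_trusted(editor: Editor, now: datetime) -> bool:
    """Return True if the editor passes all of the illustrative checks."""
    if editor.blocked:
        return False
    if editor.edit_count < MIN_EDITS:
        return False
    if now - editor.registered < MIN_ACCOUNT_AGE:
        return False
    return editor.recent_reverts <= MAX_RECENT_REVERTS


now = datetime(2020, 9, 28, tzinfo=timezone.utc)
veteran = Editor(edit_count=500, registered=now - timedelta(days=900))
newcomer = Editor(edit_count=50, registered=now - timedelta(days=1))
print(is_trusted(veteran, now), is_trusted(newcomer, now))
```

Evaluating the check at export time rather than at edit time means the result reflects the editor's current state, which is one way to address the question of whether trust expires.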

Sep 28 2020, 8:54 PM · Okapi

Sep 25 2020

RBrounley_WMF updated subscribers of T263016: Add documentation what Okapi does, where to find further info, where to find a code base.
Sep 25 2020, 8:32 PM · Documentation, Okapi
RBrounley_WMF moved T263090: Okapi: Build Wiki Export from Backlog to In Progress on the Okapi board.
Sep 25 2020, 8:31 PM · Okapi
RBrounley_WMF moved T263087: Okapi: Components for UI Work from Backlog to In Progress on the Okapi board.
Sep 25 2020, 8:31 PM · Okapi
RBrounley_WMF updated subscribers of T263885: Okapi: Fresher -> Safer Spectrum, please review!!.
Sep 25 2020, 8:30 PM · Okapi
RBrounley_WMF updated the task description for T263885: Okapi: Fresher -> Safer Spectrum, please review!!.
Sep 25 2020, 8:27 PM · Okapi
RBrounley_WMF added a comment to T263885: Okapi: Fresher -> Safer Spectrum, please review!!.

Feel free to add more subscribers, we want more opinions on this!

Sep 25 2020, 8:26 PM · Okapi
RBrounley_WMF updated the task description for T263885: Okapi: Fresher -> Safer Spectrum, please review!!.
Sep 25 2020, 8:25 PM · Okapi
RBrounley_WMF updated the task description for T263885: Okapi: Fresher -> Safer Spectrum, please review!!.
Sep 25 2020, 8:25 PM · Okapi
RBrounley_WMF updated subscribers of T263885: Okapi: Fresher -> Safer Spectrum, please review!!.
Sep 25 2020, 8:24 PM · Okapi
RBrounley_WMF updated the task description for T263885: Okapi: Fresher -> Safer Spectrum, please review!!.
Sep 25 2020, 8:22 PM · Okapi
RBrounley_WMF updated the task description for T263885: Okapi: Fresher -> Safer Spectrum, please review!!.
Sep 25 2020, 8:18 PM · Okapi
RBrounley_WMF created T263885: Okapi: Fresher -> Safer Spectrum, please review!!.
Sep 25 2020, 8:16 PM · Okapi
RBrounley_WMF added a comment to T262479: Oversighted Revisions on HTML Exports.

Ok heard back from Legal on this - response below from Tony S:

Sep 25 2020, 8:05 PM · Okapi

Sep 24 2020

RBrounley_WMF added a comment to T263090: Okapi: Build Wiki Export.

Meaning English Wikipedia rather than eng.wikipedia.org?

Sep 24 2020, 12:18 AM · Okapi

Sep 17 2020

RBrounley_WMF updated RBrounley_WMF.
Sep 17 2020, 8:52 PM
RBrounley_WMF added a comment to T262479: Oversighted Revisions on HTML Exports.

So we are really focused on the "best last revision" of articles across the wikis and are not adding historical revisions to the exports (dumps). Thus, whatever version of a revision we have should not include, now or ever, a revision that is sensitive. If we were dumping historical revisions I think a record would make sense; or, if we were providing historical dumps (which, as of now, we aren't), we could just download a "non-sensitive" view and come back later to do it again. Some of the past exports could live on machines, though, and could potentially contain something that was oversighted after we compiled the dump...

Sep 17 2020, 8:43 PM · Okapi
RBrounley_WMF added a comment to T262479: Oversighted Revisions on HTML Exports.

Talked with Tony and he's double checking, but since it's just exposing an event that happened - it might be a bigger liability to have these "bad revisions" live in our end exports (dumps) without noting they were suppressed. He's checking with a few folks and I'll sync back here with his findings.

Sep 17 2020, 7:57 PM · Okapi
RBrounley_WMF added a comment to T262479: Oversighted Revisions on HTML Exports.

Awesome! Thanks @Ottomata -- checking with legal on this.

Sep 17 2020, 4:37 PM · Okapi
RBrounley_WMF created T263090: Okapi: Build Wiki Export.
Sep 17 2020, 1:44 AM · Okapi
RBrounley_WMF updated the task description for T263087: Okapi: Components for UI Work.
Sep 17 2020, 1:12 AM · Okapi
RBrounley_WMF created T263087: Okapi: Components for UI Work.
Sep 17 2020, 1:11 AM · Okapi
RBrounley_WMF added a subtask for T263084: Okapi Sprint Sep16 - Sep30: T262479: Oversighted Revisions on HTML Exports.
Sep 17 2020, 1:03 AM · Okapi
RBrounley_WMF added a parent task for T262479: Oversighted Revisions on HTML Exports: T263084: Okapi Sprint Sep16 - Sep30.
Sep 17 2020, 1:03 AM · Okapi
RBrounley_WMF created T263084: Okapi Sprint Sep16 - Sep30.
Sep 17 2020, 1:02 AM · Okapi
RBrounley_WMF moved T262476: Okapi Sprint Sep02 - Sep16 from In Progress to Done on the Okapi board.
Sep 17 2020, 12:22 AM · Okapi

Sep 9 2020

RBrounley_WMF added a comment to T257480: Sample HTML Dumps - Request for feedback.

Split this oversighted revision conversation into T262479 to continue the conversation.

Sep 9 2020, 9:43 PM · Analytics-Radar, Dumps-Generation
RBrounley_WMF created T262479: Oversighted Revisions on HTML Exports.
Sep 9 2020, 9:40 PM · Okapi
RBrounley_WMF added a comment to T254275: HTML Dumps - June/2020.

Hey all - I'm starting to post our sprint overviews here to improve Okapi's dialogue on phabricator. I will add tickets in the Okapi board, feel free to subscribe. First one is at T262476.

Sep 9 2020, 9:32 PM · Okapi, Analytics-Radar, Platform Engineering, Dumps-Generation
RBrounley_WMF moved T262476: Okapi Sprint Sep02 - Sep16 from Backlog to In Progress on the Okapi board.
Sep 9 2020, 9:14 PM · Okapi
RBrounley_WMF created T262476: Okapi Sprint Sep02 - Sep16.
Sep 9 2020, 9:13 PM · Okapi

Aug 27 2020

RBrounley_WMF added a comment to T261324: LDAP access to wmf for Ryan Brounley.

Yay, thank you!

Aug 27 2020, 4:59 PM · SRE, LDAP-Access-Requests

Aug 26 2020

RBrounley_WMF created T261324: LDAP access to wmf for Ryan Brounley.
Aug 26 2020, 4:02 PM · SRE, LDAP-Access-Requests

Aug 18 2020

RBrounley_WMF added a comment to T254275: HTML Dumps - June/2020.

Thanks all - added!

Aug 18 2020, 4:42 PM · Okapi, Analytics-Radar, Platform Engineering, Dumps-Generation

Aug 13 2020

RBrounley_WMF added a project to T254275: HTML Dumps - June/2020: Okapi.
Aug 13 2020, 9:42 PM · Okapi, Analytics-Radar, Platform Engineering, Dumps-Generation
RBrounley_WMF added a member for Okapi: MNadrofsky.
Aug 13 2020, 9:40 PM
RBrounley_WMF added a member for Okapi: R.zhurba.
Aug 13 2020, 9:40 PM
RBrounley_WMF created T260385: Create Okapi Project Page.
Aug 13 2020, 7:22 PM · Project-Admins

Jul 14 2020

RBrounley_WMF added a comment to T257480: Sample HTML Dumps - Request for feedback.

English Wiki has 15m articles (I believe)
a full enwiki dump is clocking in at 944gb or something insanely large

I'm pretty sure a large part of this issue is based on how you handle redirects really and not compression format. Enwiki has 9.3M redirects. Right now the HTML of an article is fully reproduced for a redirect (i.e. not just redirect to [[article]] but the full-text of that article that the reader would see). English Wikipedia has just over 6M articles in the classic sense, so reproducing the full article text in the redirects would probably be what explodes it to 15M full articles and a very large file (as opposed to 6M full articles and ~9M very tiny files that just indicate that they are redirects).
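
The redirect argument above can be checked with back-of-envelope arithmetic. This sketch uses only the approximate figures quoted in the thread (15M pages, 944 GB, 9.3M redirects). It assumes redirects are on average as large as ordinary articles, and the 200-byte stub size is a made-up placeholder, so the result is an order-of-magnitude estimate only.

```python
TOTAL_PAGES = 15_000_000   # "15m articles (I believe)", from the thread
DUMP_GB = 944              # "944gb or something", from the thread
REDIRECTS = 9_300_000      # "Enwiki has 9.3M redirects", from the thread
STUB_BYTES = 200           # hypothetical size of a tiny redirect stub


def estimate_dump_gb(total_pages, redirects, dump_gb, stub_bytes):
    """Estimate dump size if redirects were stored as small stubs."""
    avg_bytes = dump_gb * 1e9 / total_pages   # assumed average page size
    articles = total_pages - redirects        # pages that keep full HTML
    return (articles * avg_bytes + redirects * stub_bytes) / 1e9


estimate = estimate_dump_gb(TOTAL_PAGES, REDIRECTS, DUMP_GB, STUB_BYTES)
print(f"Estimated size with redirect stubs: {estimate:.0f} GB")
```

Under these assumptions the dump shrinks to well under half its quoted size, which is consistent with the point that redirect handling matters more here than the compression format.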

Jul 14 2020, 2:25 PM · Analytics-Radar, Dumps-Generation

Jul 10 2020

RBrounley_WMF added a comment to T257480: Sample HTML Dumps - Request for feedback.

Couple quick thoughts about the format: it would be good for the articles to be written into subdirectories for the larger wikis, so that we don't have hundreds of thousands of files (or millions!) in one directory. See https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/DumpHTML/+/refs/heads/master/dumpHTML.inc#477 for way back when these were produced by extension (in 2008), I think they used three levels of subdirs as the default back then but this could be adjustable depending on the size of the wiki.

Although the large tech partners that will consume these dumps will probably be fine with one large gz tarball, we want these to be easily usable by volunteers and researchers too, so I'd consider providing them also in a format that makes parallel processing of the dumps possible, such as bz2 multistream format with 100 or 1000 pages per 'stream', maybe without any tarring up at all. It might be nice to have a close html tag too, the sample articles I looked at didn't have it.
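
A rough sketch of the multistream idea above, using Python's standard `bz2` module. Each group of pages is compressed as its own bz2 stream and the streams are concatenated, so a reader that knows a stream's byte offset can decompress just that group. The function names, the in-memory layout, and the offset index are illustrative assumptions, not the format of any existing dump.

```python
import bz2


def write_multistream(pages, pages_per_stream=100):
    """Compress pages in groups; return the data and each stream's offset."""
    out = bytearray()
    offsets = []
    for start in range(0, len(pages), pages_per_stream):
        chunk = "".join(pages[start:start + pages_per_stream]).encode("utf-8")
        offsets.append(len(out))      # byte position where this stream begins
        out += bz2.compress(chunk)    # one independent bz2 stream per group
    return bytes(out), offsets


def read_stream(data, offset):
    """Decompress only the single stream that starts at the given offset."""
    decompressor = bz2.BZ2Decompressor()  # stops at the end of one stream
    return decompressor.decompress(data[offset:]).decode("utf-8")


pages = [f"<html><body>page {i}</body></html>\n" for i in range(250)]
data, offsets = write_multistream(pages)
print(len(offsets), "streams")
print(read_stream(data, offsets[1])[:40])
```

Because every stream is independent, workers can each take a slice of the offset index and process their streams in parallel, while `bz2.decompress(data)` still reads the whole file as one document.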

Jul 10 2020, 3:31 PM · Analytics-Radar, Dumps-Generation

Jul 8 2020

RBrounley_WMF updated subscribers of T257480: Sample HTML Dumps - Request for feedback.
Jul 8 2020, 6:25 PM · Analytics-Radar, Dumps-Generation
RBrounley_WMF added a subtask for T254275: HTML Dumps - June/2020: T257480: Sample HTML Dumps - Request for feedback.
Jul 8 2020, 4:50 PM · Okapi, Analytics-Radar, Platform Engineering, Dumps-Generation
RBrounley_WMF added a parent task for T257480: Sample HTML Dumps - Request for feedback: T254275: HTML Dumps - June/2020.
Jul 8 2020, 4:50 PM · Analytics-Radar, Dumps-Generation
RBrounley_WMF created T257480: Sample HTML Dumps - Request for feedback.
Jul 8 2020, 4:50 PM · Analytics-Radar, Dumps-Generation

Jul 7 2020

RBrounley_WMF closed T255524: HTML Dumps 429 error on RESTBase endpoints, a subtask of T254275: HTML Dumps - June/2020, as Resolved.
Jul 7 2020, 12:59 AM · Okapi, Analytics-Radar, Platform Engineering, Dumps-Generation
RBrounley_WMF closed T255524: HTML Dumps 429 error on RESTBase endpoints as Resolved.
Jul 7 2020, 12:58 AM · Traffic, SRE

Jun 24 2020

RBrounley_WMF added a comment to T254275: HTML Dumps - June/2020.

Yep, sorry about the delay here @Sj. @Kelson, this is interesting to learn about. I’d love to learn more about your work and how we might best collaborate with each other and fill some of the technical gaps. I'll ping you off-phab with some questions once I've done some more reading, and if you're available earlier than your (great-sounding) techtalk I'd love to have a quick video-chat meeting with you. And thank you for your patience whilst I'm digging into the many years of history here!

Jun 24 2020, 5:40 PM · Okapi, Analytics-Radar, Platform Engineering, Dumps-Generation

Jun 16 2020

RBrounley_WMF added a comment to T254275: HTML Dumps - June/2020.

Great, thanks @CDanis - cited you here on the sub-task related to the 429 errors we're getting. https://phabricator.wikimedia.org/T255524

Jun 16 2020, 3:16 AM · Okapi, Analytics-Radar, Platform Engineering, Dumps-Generation
RBrounley_WMF added a subtask for T254275: HTML Dumps - June/2020: T255524: HTML Dumps 429 error on RESTBase endpoints.
Jun 16 2020, 3:15 AM · Okapi, Analytics-Radar, Platform Engineering, Dumps-Generation
RBrounley_WMF added a parent task for T255524: HTML Dumps 429 error on RESTBase endpoints: T254275: HTML Dumps - June/2020.
Jun 16 2020, 3:15 AM · Traffic, SRE
RBrounley_WMF created T255524: HTML Dumps 429 error on RESTBase endpoints.
Jun 16 2020, 3:14 AM · Traffic, SRE

Jun 15 2020

RBrounley_WMF added a comment to T254275: HTML Dumps - June/2020.

@ArielGlenn - oh great, yeah I misunderstood that. So the first run is obviously expensive on RESTBase to grab all of the pages but we're thinking about listening to Kafka through this endpoint below or something similar. Then just changing it via an upsert type approach using RESTBase only on the changes... @Ottomata, @Milimetric - want to make sure I have this right from our call. For now, we're running these bi-weekly and still designing the second dumps out haha.

https://stream.wikimedia.org/?doc#/Streams/get_v2_stream_recentchange
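
The upsert approach described above could look roughly like this sketch. It parses the `data:` lines of a server-sent-events feed such as the recentchange stream and refreshes only the pages that changed. The `fetch_html` callable stands in for a RESTBase request and is hypothetical, as are the in-memory `store` and the function names; the `wiki`, `title`, and `type` fields are assumed to match the recentchange event payload. Multi-line `data:` payloads and reconnection are not handled.

```python
import json


def iter_events(lines):
    """Yield decoded JSON payloads from the 'data:' lines of an SSE feed."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip 'event:', 'id:' and blank keep-alive lines
        try:
            yield json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore malformed or partial payloads


def apply_changes(store, events, fetch_html):
    """Upsert fresh HTML for each edited or newly created page."""
    for event in events:
        if event.get("type") not in ("edit", "new"):
            continue  # log entries and categorisation events carry no new HTML
        key = (event["wiki"], event["title"])
        store[key] = fetch_html(event["wiki"], event["title"])
    return store


sample = [
    "event: message",
    'data: {"wiki": "enwiki", "title": "Okapi", "type": "edit"}',
    'data: {"wiki": "enwiki", "title": "Giraffe", "type": "log"}',
]
store = {("enwiki", "Okapi"): "<html>old</html>"}
apply_changes(store, iter_events(sample), lambda wiki, title: f"<html>{title}</html>")
print(store)
```

After the expensive first full run, only pages that appear in the stream trigger a fetch, so the load on RESTBase scales with the edit rate rather than with the size of the wiki.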
Jun 15 2020, 10:23 PM · Okapi, Analytics-Radar, Platform Engineering, Dumps-Generation
RBrounley_WMF updated subscribers of T254275: HTML Dumps - June/2020.

Hey all -

Jun 15 2020, 10:09 PM · Okapi, Analytics-Radar, Platform Engineering, Dumps-Generation

Jun 2 2020

RBrounley_WMF created T254275: HTML Dumps - June/2020.
Jun 2 2020, 7:27 PM · Okapi, Analytics-Radar, Platform Engineering, Dumps-Generation