Thu, Feb 11
Hi @fkaelin - it's nice to meet you. It sounds like there is a lot of overlap between your thinking and ours. On Okapi, in general, we are working on some things that may be relevant, as well as others that may not be.
Tue, Feb 2
Thanks for setting this up, Ariel. I am working with my team to get a better idea of the timeline as well as the total file sizes.
Fri, Jan 29
Dec 8 2020
Oct 1 2020
We’re working to patch up our end: switching to the streams, and falling back to querying the ORES API when the streams fail. Sorry, will update soon.
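To make the "streams first, API on failure" idea concrete, here is a minimal sketch. It is not our actual implementation: `stream_scores` stands in for whatever scores have already been collected from the stream, and `fetch_from_api` is a hypothetical callable that would wrap a request to the ORES API.

```python
from typing import Callable, Dict


def score_for_revision(
    rev_id: int,
    stream_scores: Dict[int, dict],
    fetch_from_api: Callable[[int], dict],
) -> dict:
    """Return the score for rev_id, preferring the stream over the API."""
    score = stream_scores.get(rev_id)
    if score is not None:
        # The stream already delivered this revision, so no API call is needed.
        return score
    # The stream missed it (or failed): fall back to the API and remember
    # the result so the same revision is not requested twice.
    score = fetch_from_api(rev_id)
    stream_scores[rev_id] = score
    return score
```

The point of the fallback is that gaps in the stream do not turn into gaps in the corpus; the API is only hit for revisions the stream did not cover.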
Sep 30 2020
Hey, connecting with folks on the ORES team around this today. Sorry, we were given advice that the ORES stream may have some data integrity issues for our implementation, since we need the whole corpus. Pulling our engineers into the conversation to elaborate. We'll dig back into the streams and discuss on today's call how to move things over.
Sep 28 2020
One thing maybe worth considering is replacing the editcount requirement with something closer to how the autoconfirmed group is defined (a combination of edit count and account age). Another might be to check if other recent edits of the user were reverted (revert detection has landed recently in core: T152434).
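A rough sketch of what such a combined check could look like. Everything here is illustrative: the `UserSnapshot` fields, the function name, and the thresholds are assumptions (the defaults of 10 edits and 4 days mirror how autoconfirmed is set on English Wikipedia, and the revert ratio is made up).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class UserSnapshot:
    edit_count: int
    registered: datetime
    recent_edits_total: int
    recent_edits_reverted: int


def passes_threshold(
    user: UserSnapshot,
    now: datetime,
    min_edits: int = 10,
    min_age: timedelta = timedelta(days=4),
    max_revert_ratio: float = 0.5,
) -> bool:
    """Combine edit count, account age, and recent revert history."""
    enough_edits = user.edit_count >= min_edits
    old_enough = now - user.registered >= min_age
    if user.recent_edits_total == 0:
        # No recent edits means there is nothing to hold against the user.
        revert_ok = True
    else:
        ratio = user.recent_edits_reverted / user.recent_edits_total
        revert_ok = ratio <= max_revert_ratio
    return enough_edits and old_enough and revert_ok
```

Compared with a bare edit count, this would reject a brand-new account with many edits as well as an older account whose recent edits were mostly reverted.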
Sep 25 2020
Feel free to add more subscribers, we want more opinions on this!
Ok heard back from Legal on this - response below from Tony S:
Sep 24 2020
Meaning English Wikipedia rather than eng.wikipedia.org?
Sep 17 2020
So we are really focused on the "best last revision" of articles across the wikis and are not adding historical revisions to the exports (dumps). Thus, whatever revision we have should not include, now or ever, anything that is sensitive. If we were dumping historical revisions I think a record would make sense. Since we aren't providing historical dumps as of now, consumers can just download a "non-sensitive" view and come back later to do it again. Some of the past exports could live on machines though, which could potentially have something that was oversighted after we compiled the dump...
Talked with Tony and he's double checking, but since it's just exposing an event that happened - it might be a bigger liability to have these "bad revisions" live in our end exports (dumps) without noting they were suppressed. He's checking with a few folks and I'll sync back here with his findings.
Awesome! Thanks @Ottomata -- checking with legal on this.
Sep 9 2020
Split this oversighted revision conversation into T262479 to continue the conversation.
Hey all - I'm starting to post our sprint overviews here to improve Okapi's dialogue on phabricator. I will add tickets in the Okapi board, feel free to subscribe. First one is at T262476.
Aug 27 2020
Yay, thank you!
Aug 26 2020
Aug 18 2020
Thanks all - added!
Aug 13 2020
Jul 14 2020
Jul 10 2020
Jul 8 2020
Jul 7 2020
Jun 24 2020
Yep, sorry about the delay here @Sj. @Kelson Learning about this is interesting. I’d love to learn more about your work and how we might best collaborate with each other and fill some of the technical gaps. I'll ping you off-phab with some questions once I've done some more reading, and if you're available earlier than your (great-sounding) techtalk I'd love to have a quick video-chat meeting with you. And thank you for your patience whilst I'm digging into the many years of history here!
Jun 16 2020
Jun 15 2020
@ArielGlenn - oh great, yeah I misunderstood that. So the first run is obviously expensive on RESTBase since it grabs all of the pages, but we're thinking about listening to Kafka through this endpoint below or something similar. Then we would just update the dump via an upsert-type approach, using RESTBase only for the changes... @Ottomata, @Milimetric - want to make sure I have this right from our call. For now, we're running these bi-weekly and still designing how the second dumps will work haha.
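For anyone following along, the upsert idea could be sketched roughly like this. It is an illustration only: the event fields (`title`, `rev_id`, `deleted`) are an assumed shape rather than the real Kafka schema, and `fetch_html` is a hypothetical stand-in for a RESTBase request.

```python
from typing import Callable, Dict


def apply_change(
    store: Dict[str, dict],
    event: dict,
    fetch_html: Callable[[str, int], str],
) -> None:
    """Apply one change event to the page store built by the first full run."""
    title = event["title"]
    if event.get("deleted"):
        # Page was deleted: drop it from the export.
        store.pop(title, None)
        return
    current = store.get(title)
    if current is not None and current["rev_id"] >= event["rev_id"]:
        # Duplicate or out-of-order event: keep the newer copy we already have.
        return
    # Insert a new page or replace the stored one with the newer revision.
    store[title] = {
        "rev_id": event["rev_id"],
        "html": fetch_html(title, event["rev_id"]),
    }
```

With something like this, only pages that actually changed trigger a RESTBase request after the initial load, and replaying the same event twice leaves the store unchanged.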
Hey all -