Added @Protsack.stephan and @HShaikh who are more qualified to figure out what's going on here :).
Apr 25 2023
Mar 21 2023
Hello all - apologies for the delay. Back from holiday today.
Feb 15 2023
Hello - good news here! @Kelson and I met last week and discussed the opportunity to move some of the MWOffliner systems to Wikimedia Enterprise HTML dumps. At the outset it seems feasible, but we need to discuss and dive deeper before we have a clear answer or timeline. The two initial areas of concern are our use of Parsoid web HTML instead of mobile HTML, and information on the namespaces we cover.
Dec 8 2022
Approved. (if needed)
Nov 11 2022
Aug 22 2022
A nice-to-have, @Marcelo.castillo, would be the ability to run this same process easily in the future. We imagine this type of testing will become much more common as the "credibility signals" work grows, so designing this to be repeatable would be helpful.
Jul 18 2022
Jul 13 2022
Fair point - I'm not actually sure of the size. Some of my assumptions here came from @ArielGlenn's past experience working on those dumps (and since they're on the ticket, I'm tagging them). Do we have an idea of the number of HTML entities in Commons?
I like it - flagging with our team.
Hey @Mitar, it is in the future plans and will be part of some of the work we have coming around "credibility signals" in particular. I'll follow up once we know what this will look like timeline-wise in the dumps themselves.
Hey @Mitar, sorry for the late response. We are working through some process-oriented work to make sure we triage this type of thing more quickly.
Jun 27 2022
Jun 17 2022
Jun 9 2022
Jun 7 2022
Hey @Mitar - open source code is published here + we'll be releasing a trial feature later this month that will allow you to pull the public endpoint. Thanks for your patience here.
May 19 2022
Yes, sorry @BPirkle - I wrote this ticket much more prescriptively about the solution than I should have. The bug report on our side can be found here on the Wikimedia Enterprise phab board. We are hardcoding "Article" because the value is returned empty, which confused us at the time, when in fact it is "(Main)". An oversight on my part.
May 18 2022
Hi - yes thanks for the clarification. On our side, we'll map canonical "" -> "Main". Do you know if it is called something other than "Main" across language projects (beyond the language translation) - is that queryable / accessible somewhere in documentation?
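For illustration only, here is a minimal sketch of the mapping described above, assuming namespace entries shaped like the Action API siteinfo output, where namespace 0 comes back with an empty name. The function name `namespace_label`, the `MAIN_NAMESPACE_LABEL` constant, and the sample data are hypothetical and not our actual implementation.

```python
from typing import Mapping

# Assumption: "Main" is the fallback label we chose for the unnamed namespace 0.
MAIN_NAMESPACE_LABEL = "Main"


def namespace_label(ns: Mapping[str, object],
                    main_label: str = MAIN_NAMESPACE_LABEL) -> str:
    """Return a display label for one namespace entry.

    Reads the localized name from "name" (or "*" in the older JSON format)
    and substitutes main_label when namespace 0 has an empty name.
    """
    name = ns.get("name") or ns.get("*") or ""
    if ns.get("id") == 0 and not name:
        return main_label
    return str(name)


# Hypothetical sample shaped like a siteinfo namespaces response (German wiki).
sample = {
    "0": {"id": 0, "name": ""},
    "1": {"id": 1, "name": "Diskussion", "canonical": "Talk"},
    "14": {"id": 14, "name": "Kategorie", "canonical": "Category"},
}

labels = {int(key): namespace_label(entry) for key, entry in sample.items()}
```

Passing `main_label` explicitly would let each language project supply its own name for namespace 0 if one turns out to exist.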
May 11 2022
May 5 2022
Hey @Nemo_bis - thanks for this, helpful feedback. After investigating further: we're using this Action API endpoint, which appears to leave the name for namespace 0 out, and I believe we are hardcoding it as "Article". Let me check with some teams about that and get back to this, but we should definitely rename according to language.
Apr 25 2022
Apr 8 2022
Thanks for adding this @jberkel - we'll look into it!
Mar 14 2022
This makes sense, and thanks for sharing that ticket. We'll change them all to http://, since we intend to use them as identifiers. Thanks @Addshore for catching this; it wasn't on my radar.
Mar 9 2022
Hey @Addshore - thanks for flagging. We actually use https for our output. Could you elaborate on the main differences? We could change this.
Feb 22 2022
Feb 8 2022
Feb 7 2022
Added more context to this description - if any questions, feel free to ping me.
Feb 2 2022
Wooot! Was this ticket open for exactly one year?
Jan 20 2022
Jan 12 2022
This is interesting - thanks for posting interest in this. I think this makes sense and I'm glad you're finding the JSON schema useful! It might take a little bit to plan this sort of thing out but I agree having this available does make sense especially sitting alongside the dumps.
Hey @Mitar - thanks for making this ticket and appreciate your response on the wikitech-l thread. Super helpful feedback.
Dec 21 2021
Nov 25 2021
Nov 1 2021
Oct 18 2021
Amazing :), excited to see how it does!
Sep 1 2021
Aug 12 2021
@Protsack.stephan @Sashah2 - where are we on the md5 hashes?
Aug 5 2021
Hah, meant to add @Protsack.stephan as a subscriber, not owner.
Jul 5 2021
Jun 23 2021
Update here - we are currently onboarding folks: DevOps-focused Sr Software Engineers. Some of them will pick this ticket up, so moving this to the backlog for now.
Jun 22 2021
@Sashah2 - checking in here, were you able to allow-list these IPs? After that we should be good to go?
Jun 21 2021
Jun 15 2021
@Ladsgroup @Tgr - cool ticket, thanks for flagging this. Been following peripherally.
Jun 14 2021
Nice, sounds good. @ArielGlenn - does that feel like enough space on your end?
Jun 9 2021
Jun 8 2021
@Protsack.stephan, maybe we can repurpose this ticket with our comments from today's standup? I'll tag relevant people.
Jun 3 2021
No problem, thanks @BBlack!
Checking in here @Eugene.chernov, any blockers?
Checked in with @nskaggs this morning. Looks like we are in good shape to get this moving.