Jan 20 2021
As @wikitrent's manager, I approve this access.
Dec 18 2020
Dec 16 2020
Dec 10 2020
Since we are working in Labs, I'd strongly recommend we look at standing up a Celery job queue to use for this and other queue needs.
Dec 9 2020
Nov 25 2020
I've patched T254988 which should fix this problem. The patch is here: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/TheWikipediaLibrary/+/643518
Nov 18 2020
Thanks @sbassett. I approve the expedited release and take ownership of any risk involved.
Nov 17 2020
Approved by me, Tran's manager, as well.
Oct 7 2020
@Jdforrester-WMF I'm curious if you have any knowledge about how accessing the MaxMind database works. If so, could you share some guidance?
Oct 5 2020
Sep 30 2020
Sep 29 2020
After some discussions with @eprodromou about an API, we've decided for ease and speed's sake to use the flatfile available on the application servers.
Sep 18 2020
Sep 15 2020
Sep 2 2020
@MarcoAurelio Thanks for the question. During final QA and user testing in the second phase outlined above, some bugs and suggestions came up that seemed important to address before we get this in front of more users. Given the sensitivity of the feature and the data within the tool, we want to be cautious with any potential issues. Combine that with a few weeks that didn't have train releases, and the project is delayed, as you've noticed.
Jul 27 2020
Jul 22 2020
After reading about this in various blogs and talking to some colleagues at other companies (both focused on privacy and surveillance), I'm convinced that MediaWiki should collect this data only when it is necessary.
Jul 15 2020
@Marostegui I just want to add that I apologize for being so flippant about the risks here. I was wrong. I appreciate your patience with explaining it to me.
@Marostegui Thanks for that perspective. I was replying more to the idea of bringing even more people into the conversation without a specific question for them to answer.
Is there a specific piece of feedback we are waiting on? If it's just a general heads up, we should proceed as planned. This project can't wait another week on this topic if there's not something specifically being addressed.
Jun 30 2020
Jun 4 2020
Jun 3 2020
May 27 2020
May 20 2020
May 14 2020
We will put together the information about which queries need a review, what the scale of typical results is, what the outlier number of results might be, and what application features/details we are using the data for. We hope that will help the DBAs more efficiently review the queries and their impact.
May 13 2020
May 12 2020
I wanted to clarify that this is just in the experiment and investigation stage.
May 11 2020
I was thinking that we could add a format "flag" to the existing API endpoint for the data. Then, we can request HTML, JSON, or Wikitext. Or, maybe all three at the same time. Is there a place in the code where we could do something like that?
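To make the format "flag" idea concrete, here is a minimal sketch in Python rather than the extension's own code. Everything in it is an assumption for illustration: the `render` function, the `RENDERERS` table, the row shape with a `title` key, and the pipe-separated flag value are all hypothetical and not taken from the existing endpoint.

```python
import html
import json


def to_json(rows):
    """Serialize the rows as a JSON string."""
    return json.dumps(rows)


def to_html(rows):
    """Render the rows as an HTML list, escaping each title."""
    items = "".join(f"<li>{html.escape(row['title'])}</li>" for row in rows)
    return f"<ul>{items}</ul>"


def to_wikitext(rows):
    """Render the rows as a wikitext bullet list of page links."""
    return "\n".join(f"* [[{row['title']}]]" for row in rows)


# Hypothetical registry: one renderer per supported format name.
RENDERERS = {"json": to_json, "html": to_html, "wikitext": to_wikitext}


def render(rows, formats="json"):
    """Return {format name: rendered string} for each requested format.

    `formats` is a pipe-separated list such as "html|wikitext"; that
    separator is an assumption made for this sketch.
    """
    requested = formats.split("|")
    unknown = [name for name in requested if name not in RENDERERS]
    if unknown:
        raise ValueError(f"unsupported format(s): {', '.join(unknown)}")
    return {name: RENDERERS[name](rows) for name in requested}
```

Calling `render(rows, "html|wikitext")` would return both renderings in one response, which covers the "all three at the same time" case without needing a separate endpoint per format.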
Thanks for knocking this out, @Samwilson.
May 7 2020
We've chosen to avoid changing the existing star implementation as much as possible. That is, we recognize that Desktop Refresh work is happening and we don't want to re-engineer in OOUI an interface which is likely to change. As for the design itself, I'm sure @Prtksxna can give some context about his thinking.
May 1 2020
@dbarratt Thanks for checking that. Interesting stuff.
Apr 30 2020
Ahhh. I misunderstood. Thanks for clarifying.
@Ladsgroup When we were talking about this earlier, @dbarratt made the point that we could generate the wikitext in the PHP code of the extension. We are already pulling the data that's in the table through the PHP code so it shouldn't be that much more effort to make it available.
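As a rough illustration of how little is involved once the rows are already in hand, here is a sketch of turning them into a wikitext table. It is written in Python only to show the shape of the idea; the real work would live in the extension's PHP. The `rows_to_wikitext` helper and the sample column names are hypothetical, and the sketch assumes cell values contain no pipe characters (no escaping is attempted).

```python
def rows_to_wikitext(headers, rows):
    """Build a wikitext table from a header list and a list of row lists.

    Assumes cell values contain no "|" characters; escaping is omitted.
    """
    lines = ['{| class="wikitable"']
    # Header cells share one line, separated by "!!".
    lines.append("! " + " !! ".join(headers))
    for row in rows:
        lines.append("|-")  # row separator
        # Data cells share one line, separated by "||".
        lines.append("| " + " || ".join(str(cell) for cell in row))
    lines.append("|}")
    return "\n".join(lines)


# Hypothetical sample data, only to show the call.
table = rows_to_wikitext(["User", "Edits"], [["Example", 3], ["Sample", 7]])
```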
Apr 28 2020
I think we agree that even a better error message is a path forward. I'll leave it to @ifried to determine the priority.
This is an implementation constraint, AFAIK. I vaguely remember the discussion when we built the tool. We decided that nested TSPANs represented a technical challenge greater than their prevalence in the wild.
Apr 27 2020
I'm inclined not to change this until it proves to be a problem. It seems straightforward to change these queries later if we need to.
Apr 20 2020
Thanks for putting this together. Lots of stuff to consider.
This is good stuff. One thought I have is that maybe we separate this work into two categories: 1) the formatting issues and 2) the reliability and scalability issues.
I think the Partial Blocks project that @Niharika created is what's useful to our team. As she said, we do not own all of Blocks but we do have an expertise in it. We also have a desire to help others contribute to the patterns that we tried to introduce in the code.
We aren't getting nearly as many unavailability reports from the monitoring as we used to.
Apr 15 2020
Any idea why this changed? I think I saw an email about some network changes in Toolforge. Is that what happened?
All those loops make my spidey-sense go off.
Apr 10 2020
Apr 8 2020
Hi @AS, thanks for validating that. I'm glad that there's not a bug we introduced but I'm bummed that this is a problem.
Thanks for this report @AS. We aren't aware of that CSS class actually existing in the code.
Apr 6 2020
Thanks to everyone for working to find a solution to this gap we seem to have between environments.
Apr 3 2020
@Nuria Our team is in a position of needing to test some new queries or changes to queries that we believe could potentially be costly. Given Grant and Toby's guidance to take special care about site stability risk, we need a place to test them. @Catrope suggested that we go this route as a safer, if not completely safe, place to prove the performance of these queries instead of production.
Apr 2 2020
I approve this access for these staff that report to me.
Apr 1 2020
@Mooeypoo If you create a new one with the format suggested above, I will provide approval.
Mar 26 2020
Thanks for summarizing that @dbarratt.
It seems to me that the core issue here is that we are in a world where the browsers are going to increasingly stop telling web sites so much about their users. This is on top of whatever data regulations may come into effect in various locales.
Mar 19 2020
Very cool. Thanks for all the research @kaldari!
Mar 18 2020
I'm happy to +2 this. Seems like we raised the issue about Python but feel that it's an acceptable approach here.
Given that we are trying to minimize risky changes, this feels like something we should wait on.
Wow. That's a big shift. I was actually going to update this task to say that with all the idiosyncrasies with the various platforms, we were going to focus on the front-end workflow experience.
Mar 11 2020
Thanks so much for doing this, @kaldari!
Mar 10 2020
I think the next steps here are to test the existing IA APIs for conversion and see what features they might make available to us.
I found this quote from Kovid on an old bug report about temp files: "temp files are guaranteed to be deleted only when calibre is shutdown cleanly."
Mar 4 2020
It seems clear even this early that we will still want to provide access to as many of these OCR tools as we can reliably maintain. They each have strengths and weaknesses.
We'd be more comfortable reviewing things in Gerrit and seeing the outcome of the testing, as you mention.
@Reedy You're right. That happened before I got here and I was not aware.
While the Anti-Harassment Team hasn't worked on Anti-Spoof before, it's possible we could help with code review once the patches are available in Gerrit.
Feb 29 2020
I think parsing the date string should be OK. Not sure how that might work with localization though. Maybe moment.js has some helpers for this? I think it's already available in MW?
Thanks @kaldari! That's great.
Feb 26 2020
Feb 25 2020
That makes sense to me. It'll make it easier to estimate both (or reestimate if we choose) and make the acceptance criteria clearer.
I think the answer to that is the latter. It seems to me that the highlighting is informed by a reading of the results by the investigator. Those hunches or discernments would change for each investigation as the results would be different.
Excellent. Thanks for clarifying. This sounds like a reasonable expectation to me but I'll leave it to @Niharika to unpack all the use cases.
Thanks @Trizek for this report. Just to make sure I understand the situation, I have a few questions.
My opinion is that the highlighting should be as ephemeral as possible. It seems like that would be the easiest for the user to reason about. That is, if I have to remember that opening a new tab sometimes keeps highlights and sometimes doesn't depending on which link I click on, that's a lot of overhead for the user. Instead, just making it as simple as possible, if slightly less effective, seems like a less problematic path.
I approve these credentials for these staff.
Feb 24 2020
I'm worried that the complexities that keep cropping up around using this token method might overwhelm the value of using that particular method. What would it look like if we just used the POST as we'd previously described? I feel like these kinds of changes become much easier with the POST.
Feb 21 2020
I'm going to add some complexity here. @kaldari is connecting me with some folks at Internet Archive to discuss the possibility of our integrating more closely with their OCR workflow. The general idea would be to make the upload to IA and creation of pages in Wikisource more automated.
Feb 20 2020
I like this idea of using the same table with a specific cul_type. I agree with David that it provides flexibility and continuity for whatever tools, people, processes might be using the data in this table.
@Samwilson Could you lay out some of the details and risks associated with this choice? I don't know enough about the tradeoffs to form a helpful viewpoint.
Feb 19 2020
Cool. I wasn't aware of the distinctions. That's good news.
Another new table? I wonder if we'll run into resistance for that?
Feb 18 2020
I don't think we've abandoned the idea, we just haven't been explicit about it. We can certainly talk about it some more.
Feb 14 2020
How will the relative bits work with translation and localization if this is truly configurable per wiki?
Feb 13 2020
This is a good question. My first instinct would be to do single inserts inside a database transaction.
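To show what I mean by single inserts inside a transaction, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for the real database layer. The `log_entries` table, its columns, and the `insert_rows` helper are hypothetical names made up for this example.

```python
import sqlite3


def insert_rows(conn, rows):
    """Insert each row with its own INSERT, all inside one transaction."""
    # Using the connection as a context manager commits if the block
    # succeeds and rolls back if any statement raises.
    with conn:
        for user, action in rows:
            conn.execute(
                "INSERT INTO log_entries (user, action) VALUES (?, ?)",
                (user, action),
            )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_entries (user TEXT NOT NULL, action TEXT NOT NULL)"
)
insert_rows(conn, [("alice", "check"), ("bob", "check")])
```

The appeal of this approach is that the batch is all-or-nothing: if one insert fails, none of the rows from that batch are kept, so we never end up with a partially written set.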
What would the variable look like? Does it include the UI language and the actual "offset" value like number of days from now to expire? Is this an array then?
I think I like that idea. Any downsides we can foresee?
Feb 11 2020
Their response: "Are you using an up to date version of calibre?" Eh....
If we uncover what we think might be bugs or inconsistencies with Calibre's renderings, it would be cool to create issues or bug reports for them. We probably can't fix it but we can help with a bit of research.
Feb 10 2020
I salute you!
Awesome. I hope this shows benefits for the reliability. Initial signs look good.
Feb 7 2020
@Marostegui Thanks for your help with this!