Wed, Jun 19
Hi all. There isn't further work on this. Right now the Multimedia team is focused on the Structured Data on Commons (SDC) project, for which the grant runs until the end of the current calendar year. For now, anything beyond SDC, urgent bug fixes, and added test coverage is generally lower priority. Tagging @Ramsey-WMF for any further prioritization or work decomposition if time becomes available in the new calendar year.
Tue, Jun 18
Mon, Jun 17
Thank you, @chelsyx, great work! Is there a Phabricator paste or Jupyter notebook with the queries and results for our future selves?
Fri, Jun 7
Hi @eranroz. Heads up, I'm on time off so my reply may be delayed a couple weeks.
May 23 2019
I was wondering, how well does the parameters name-based approach apply to the set of TemplateData-backed templates themselves?
May 22 2019
One of my hopes is that people will be concentrated together on topics. I'd prefer single track with a well structured format if at all possible.
May 16 2019
May 15 2019
Thanks @kzimmerman. All the signals I've seen and that we could think of seem to suggest there's a positive traffic increase attributable to the intervention. My gut instinct given typical user behavior on search engine result pages and the nature of this intervention is that there's very little cannibalization, no detrimental cannibalization, and in fact there's a real boost to content availability and consumption.
May 13 2019
@elukey thanks for the follow up here. No need to block on me for the GPU. Fully agreed on the need for a secure supply chain.
May 3 2019
As an example, https://upload.wikimedia.org/wikipedia/commons/thumb/7/7f/NY_308_in_Rhinebeck_4.jpg/800px-NY_308_in_Rhinebeck_4.jpg maps to https://ms-fe.svc.eqiad.wmnet/wikipedia/commons/thumb/7/7f/NY_308_in_Rhinebeck_4.jpg/800px-NY_308_in_Rhinebeck_4.jpg in the cluster if fetching directly from stat1005.
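The mapping in that example amounts to swapping the hostname and keeping the path. A minimal sketch, assuming the rule is exactly that and nothing more (the function name `to_internal_url` is made up for illustration, and the internal host is only reachable from inside the cluster):

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames taken from the example above; treating the mapping as a plain
# host swap is an assumption, not a documented rule.
PUBLIC_HOST = "upload.wikimedia.org"
INTERNAL_HOST = "ms-fe.svc.eqiad.wmnet"


def to_internal_url(public_url: str) -> str:
    """Rewrite a public upload URL to its in-cluster equivalent (hypothetical helper)."""
    parts = urlsplit(public_url)
    if parts.netloc != PUBLIC_HOST:
        raise ValueError(f"not an {PUBLIC_HOST} URL: {public_url}")
    # Keep scheme, path, query and fragment; only the host changes.
    return urlunsplit(parts._replace(netloc=INTERNAL_HOST))


if __name__ == "__main__":
    print(to_internal_url(
        "https://upload.wikimedia.org/wikipedia/commons/thumb/7/7f/"
        "NY_308_in_Rhinebeck_4.jpg/800px-NY_308_in_Rhinebeck_4.jpg"
    ))
```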
May 1 2019
Apr 30 2019
To clarify, was that via the internet or internal cluster?
Apr 19 2019
Apr 10 2019
@Yurik, in addition to pulling both versions of the JS for pages bearing both v1.5 and v2 syntax, the other thing is to encourage the syntax being updated, is that right?
Apr 9 2019
Totally understood on the user observed performance (at least when it's on the critical rendering path) and static asset payload, just wanted to make sure if it's the case that theoretically JS-only provides all of the functionality (for JS clients) and there wasn't some additional use case.
Apr 8 2019
Hi, I'm requesting access to gpu-testers as well in order to begin validating model building.
Thanks. Confirmed it works.
Mar 28 2019
Okay, this has been sitting in draft for too long, so I'm going to provide this simply so that we have it here and people have access to some queries in case they're looking around later. I was intending to share some numbers then follow up about product aspects on a separate post. But I've been rather busy with planning and anticipate continuing to be quite busy with more planning. We'll be discussing potential product / support aspects separately in a forthcoming meeting.
@JKatzWMF @JMinor would you support us addressing T142090: Add hover-card like summary (og:description) to open graph meta data printing plain summary as time permits here in Q4 or perhaps as a Q1 FY 19-20 project, to bridge the gap between now and bigger social sharing work?
Mar 27 2019
Mar 25 2019
- Based on 1A, the volume of visits coming from integrated translated results in Google search is similar to the visits explicitly requesting a translation (e.g., by going to Google Translate or clicking the "translate this page" option in the search results).
Correct. And note that this graph only shows the traffic when the translation target language is Indonesian. You may also notice that from January to February, visits from integrated translated results are much higher, and that they have dropped since March. I asked @dr0ptp4kt and he thinks Google probably changed their algorithm.
Mar 21 2019
It seems like in the screenshot from @Peter it's now fairly clear about the proxying. Obviously the visual nudge is to actually turn on the feature, but it seems like the terminology is pretty clear. In this regard, this makes it more on par with something like Opera.
Mar 20 2019
Thank you, @Krinkle.
I don't have my Android device handy, but this is not applying Google Web Light, correct? Would it be possible for someone to post screenshots of the resultant treatment for our pages?
Mar 11 2019
Yes, @Legoktm, thanks!
Mar 10 2019
Thanks @Legoktm. Yes, please, Joe Walsh should be added, too.
Mar 8 2019
Thanks. Looking forward to confirmation.
^ Well, I intended for that to be on email. But it stands: I think Olga put this in terms that I could understand - and as I've said in other places, I think the implementation is non-trivial even if the consequences can be studied sufficiently to be well understood. That said, what is this "exploring sharing entire articles or portions of articles" part about?
Thanks. That's cool, although what should we do to reinstate my privileges in the short run? The specific request right now is to get Natalia H granted group membership, although I want to ensure that both Joe W and I will have owner privs more generally in the apps repos.
Great framing, nice job! One question, though, what's this part about and how does that tie into the conversation?
Mar 7 2019
@Krenair created task here. Previously I had sufficient permissions as I recall, but it seems like there's been an update (TBH I may have missed a note somewhere).
Hi there - re-opening, although please let me know if I should open a fresh task.
Mar 5 2019
Mar 4 2019
@chelsyx As it is, the "Access the translated funnel" line makes the other parts of the funnel look compressed in the "Number of events when target language is Indonesian, by action type" graph. It's a true representation of the magnitude, of course, but I was wondering if you had an approach that might aid visual interpretation of the data (e.g., two y-axes, non-constant scale, percentile fluctuation, etc.).
Mar 1 2019
Thanks @Pginer-WMF. I've put a HOLD on the calendar for March 6 to get the Varnish patch up a little bit ahead of this, although will adjust as needed for any change in the activation for the extension (or the schedule of @BBlack and myself the day prior to the activation of the extension).
Feb 28 2019
Thanks, @santhosh. When you say "context detection code", I take that to mean inclusive of this init code we're referring to for the "Desktop" footer link removal.
Feb 26 2019
Thanks, @BBlack, will give the heads up once the date is set.
Feb 25 2019
Would it be possible to clarify the wording on "There is no existing FLOSS software that provides the same functionality"? I believe the intent here is about surveying the FLOSS ecosystem for well crafted, well maintained, architecturally compatible FLOSS software that provides comparable functionality before specifying and building new non-trivial standalone services.
Feb 22 2019
I don't see this as an urgent priority, although planning it as a small piece of work for a future quarter would be fine. We could then share this with the mailing lists and contacts we have at places where people are employing these sorts of algorithms in their own code.
This sort of algorithm is in use in several prominent high scale media properties, but people are recreating the work in their specific cases, as opposed to having one easy-to-call API that reflects this line of thinking. The idea was to expose something that, given a title, produces the correct revision. I strongly agree that it should also take into consideration whether that last correct revision is reportedly non-damaging (and scrub backwards further if it is not), as sometimes humans can't keep up with the backlog.
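To make the "scrub backwards" idea concrete, here is a rough sketch. It assumes a revision history that is already ordered newest first and a per-revision `damaging` flag from some scoring source; the `Revision` type and `latest_good_revision` function are hypothetical names for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    rev_id: int
    damaging: bool  # hypothetical flag: True if the revision is reportedly damaging


def latest_good_revision(history: Sequence[Revision]) -> Optional[Revision]:
    """Return the newest revision not reported as damaging.

    `history` is assumed to be ordered newest first. Returns None when
    every revision in the history is flagged (or the history is empty).
    """
    for rev in history:
        if not rev.damaging:
            return rev
    return None


if __name__ == "__main__":
    history = [
        Revision(rev_id=5, damaging=True),
        Revision(rev_id=4, damaging=True),
        Revision(rev_id=3, damaging=False),
        Revision(rev_id=2, damaging=False),
    ]
    print(latest_good_revision(history))  # the rev_id=3 entry
```

A real service would also need to decide how far back it is willing to walk and what to return when nothing qualifies; the sketch just returns `None`.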
Feb 21 2019
@BBlack in https://gerrit.wikimedia.org/r/490120 I checked in with @Pginer-WMF today. Pau said deploying this the day or two prior to ExternalGuidance being activated for the source wiki of enwiki (for Indonesian) would be ideal.
Feb 20 2019
Okay, so based on @TheDJ's feedback, @JMinor, it seems the issue may surface in fleeting edge case scenarios. Of course some would say making those errors visible is a feature, not a bug. Anyway, obviously this is your area for prioritization.
Thanks. I'm not sure if something changed in a Scribunto module or somewhere in extension land, but it doesn't seem like it's really turning up on enwiki source, at least - there are some Village Pump discussions on this.
Feb 14 2019
Hi all - I was aware of this task but hadn't been following it. But it was brought to my attention as having some momentum, so here I am! I have some information I can dredge up that I think may help shed some light on some paths forward. I also want to check in with some product and design people about any sense on forthcoming product interventions in the area of interactive or, for that matter, materialized graphs.
Feb 13 2019
For those following along, I ran a query to get a sense of global usage of the "Desktop" link when going through Google Translate. On 11 February 2019 there were only 89 such requests globally, about 2/3 of them with enwiki as the source wiki. This figure is not a perfect predictor of desktop user behavior, since receiving the mobile treatment will be new for desktop users with enwiki as the source wiki. But it probably suggests that, in addition to the rationale @Pginer-WMF provides about no longer showing broken stuff, the mobile read view is okay for consumptive purposes in general.
Thanks, @santhosh !
Feb 12 2019
@BBlack ^ would you please review the enwiki VCL patch? We'll only want to merge it after ExternalGuidance has been tested with simplewiki and @Pginer-WMF has given the greenlight, but I figured it best if we go through review ahead of that.
@santhosh ^ would you please review and verify it has the intended effect? I need to reset my Vagrant stuff, but figured this was simple looking enough to post a patch (we'll see if I'm right!).
Feb 11 2019
Heads up @chelsyx: for simplewiki access via the Google Translate proxy the traffic pattern is now mobile web based even for desktop UAs. The same will happen with enwiki when we make that change later. I thought I should make this clear for any intervention analysis.
@santhosh and @Gilles the footer list containing the "Desktop" link and other list items places the dot character between elements using an li::after pseudo-element. Do you think we should just use JS to remove the "Desktop" <li> instead of using a CSS rule? Setting the opacity to 0 like the other hidden elements would leave the dot character for any preceding bullets in place, which looks unusual because it leaves a dot at the end of the list. If we use JS is there a preferred segment of the JS code to do so to avoid any performance issues?
Feb 10 2019
Feb 8 2019
Jan 18 2019
Heads up @phuedx . @BBlack and I spoke yesterday and we'll go with a simpler patch instead of the fuller refactor, given the plan to have the Varnish stuff in maintenance mode and switch to ATS (i.e., don't fix it if it ain't broken).
Okay, @BBlack, now it's ready for review.
@BBlack hold that thought, one more condition to add.
@BBlack patch posted for your review ^. Would you please review and let me know on patch for any additions?
Jan 9 2019
Hi @BBlack , any suggestion here?
Jan 8 2019
@Tbayer what do you have in mind? Heads up, T208795 captures the first concrete case where the full transcoding indeed goes all the way through the Wikimedia servers and stuff is already counted as a pageview but there's an X-Analytics key-value made available for query purposes.
Jan 7 2019
Paraphrasing a dialogue with @BBlack: immediate edge-side HTTP redirects based on header/regex might be feasible without fragmenting caches/backends.
Dec 5 2018
Nov 28 2018
Nov 15 2018
Nov 11 2018
Nov 9 2018
Nov 7 2018
@Nuria thanks. You understood the question well. Okay, so my read of sessionInSample and randomTokenMatch is that the populationSize values between different schemas would need to have a common base value so that they divide cleanly in order to guarantee intersection, as it's a divisor in a modulo calculation. Do I have that right?
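To illustrate the divisibility point, here is a small sketch. It assumes sampling works by taking the leading hex digits of a session token as an integer and checking that it is divisible by `populationSize`; that is my reading of `randomTokenMatch`, not a verified copy of it, and `in_sample` is a stand-in name.

```python
def in_sample(token: str, population_size: int) -> bool:
    """Assumed sampling rule: first 8 hex digits of the token, modulo population size."""
    return int(token[:8], 16) % population_size == 0


# Synthetic tokens for the demonstration (real session IDs are random hex strings).
tokens = [format(n, "08x") for n in range(1000)]

# Population sizes 100 and 10 divide cleanly: every session sampled at
# 1-in-100 is also sampled at 1-in-10.
nested = all(in_sample(t, 10) for t in tokens if in_sample(t, 100))

# Population sizes 15 and 10 do not divide cleanly: some sessions sampled at
# 1-in-15 fall outside the 1-in-10 sample.
stragglers = [t for t in tokens if in_sample(t, 15) and not in_sample(t, 10)]

if __name__ == "__main__":
    print(nested)
    print(len(stragglers) > 0)
```

Under this assumed rule, two schemas overlap fully only when one population size is a multiple of the other; otherwise the overlap is limited to tokens divisible by the least common multiple of the two sizes.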
Nov 6 2018
The question of whether you can sample events per session with stickiness is a different one, and the answer to that is yes: as of today you can do that deterministically and decide that event 1 and event 2 are always going to be sampled for session "25". Session here means "identifier assigned to your browser until you close it down". This identifier is sent in eventlogging events but it is not sent in general requests. It will be reset when you restart your browser.
- IE11+ (6.8%)
- Safari 5.1-11.2 (1.7%)
- iOS Safari 8-11.3
- After a brief peek at pageviews_daily in Turnilo, this looks like ~0.7%
Yes, we discussed collision avoidance as part of T201124 and increased the length of mw.user.sessionId() to a value that should be safe for all foreseeable scenarios (see in particular T201124#4521002). I'm not quite sure what salting and hashing have to do with that though.
- For unique device:
- (an example that can include both scenarios you mentioned above) Any kind of experiment or data collection that requires asking the same unique device multiple questions across a period of time. For example, when we want to learn about how users "learn" on Wikipedia, we need to be able to interfere with their experience on Wikipedia in multiple stages of their interaction and ask them questions. Not being able to say which unique device has answered the first batch of questions is a blocker for this line of research.
Thanks for the review. The User-Agent field is that of the end user's device.
Nov 5 2018
Nov 2 2018
Oct 31 2018
Follow up here: Kosta and I spoke, and we don't need the token, as logging should take place on a per-user basis, not just on a per-session basis. So the key will be constructed by hashing two non-sensitive items. This is an okay approach in my view given the requirements.
@Bawolff I added a question in the patchset about getToken(). Basically, although the cost of computing a rainbow table to reverse engineer the hashed values of getToken() in case of someone spilling Redis keynames is moderately high, I wanted to check whether there's even a risk if an attacker does so. If there's a risk if an attacker does so, I'm thinking we should instead take just a portion of the token (I'm working from the assumption this is tied to something fixed between the client and the server - a cookie issued post login) and the user's numerical ID and concatenate those and then hash that concatenated value for setting the keyname - that would still be basically collision free for keynaming purposes.
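For discussion purposes, the concatenate-then-hash idea could look roughly like the sketch below. The function name, the `example` namespace, the 16-character prefix length, and the choice of SHA-256 are all placeholders I picked for illustration; none of them come from the patchset.

```python
import hashlib


def redis_key_name(user_id: int, token: str, prefix_len: int = 16,
                   namespace: str = "example") -> str:
    """Build a key name from the numeric user ID plus only a portion of the token.

    Hypothetical helper: using just a prefix of the token means a reversed
    hash would not reveal the full token value.
    """
    material = f"{user_id}:{token[:prefix_len]}"
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return f"{namespace}:{digest}"


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    print(redis_key_name(12345, "0123456789abcdef0123456789abcdef"))
```

The same user ID and token always yield the same key name, and different user IDs yield different key names even when the token prefix happens to match.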
Oct 30 2018
@leila to clarify, which of the following do you desire?
Oct 23 2018
Oct 21 2018
@Tbayer @Neil_P._Quinn_WMF @chelsyx @mpopov @nettrom_WMF curious about your thinking here for session overlap between events that are sent at the global (perhaps per-project, if we need that) default and those that are oversampled for the sessions.
Oct 18 2018
@phuedx do you think it might be sensible to simply make sendBeacon a pre-requisite at this point for client side event logging?
@Ottomata I agree with @phuedx on your question that opt-in (eventually via SCS) makes sense. After all, for feature teams or feature clusters where session sampling as the norm would be wanted, they could follow some convention of their own to make it simple.