Sat, Nov 11
Thu, Nov 9
@Tgr: I think your use case would work in Hadoop, since edit-related data is available since the beginning of time. Now, it is true that Hadoop at this time does not give you the ability to join with any MediaWiki table.
Wed, Nov 8
What types of selects were you doing?
Per conversation in person: let's make sure that when/if we remove the event capsule from meta (so it no longer exists in schema form), we document the de-facto capsule of EventLogging schemas in the same fashion we document Hive tables, in the Wikitech documentation.
@JMinor I find yours quite an unfair representation of events.
Please be so kind as to invite us to the meeting you have with Legal in this regard.
Tue, Nov 7
Updated link to data: https://analytics.wikimedia.org/datasets/archive/public-datasets/analytics/caching/
This ticket should be a talk, really.
Mon, Nov 6
@Neil_P._Quinn_WMF We followed @ezachte 's lead when it comes to metrics that our communities think are important, and we are now vetting them against Wikistats 1.0 results. Please take a look at the preliminary results (we are vetting these, as they differ by about (I think) 5% from the Wikistats 1.0 metrics): https://stats.wikimedia.org/v2/#/eu.wikipedia.org
Right now when a user logs out of MediaWiki, a significant amount of state can stay behind spanning both the logged-in and logged-out browsing session, which is likely unexpected from a user perspective.
This would be true of state that alters your interactions with the site. We certainly would like analytics cookies to remain after logout; they do not in any way affect the user's interactions with the site. (Ex: WMF-Last-Access)
Thu, Nov 2
I agree that doing it in EL makes the most sense, totally; the field should not be coupled to the storage's ability (or inability) to handle a nested object.
It doesn't seem that json_tuple is very friendly in a WHERE clause, right? I just cannot get it to work in something like:
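For reference, a sketch of the two patterns that usually do work in Hive (the `events` table and the `payload`/`action`/`ts` fields here are hypothetical): `json_tuple` has to go through a `LATERAL VIEW`, and you then filter on the view's output columns; alternatively, `get_json_object` can be used inline in the WHERE clause.

```sql
-- Pattern 1: json_tuple via LATERAL VIEW; filter on the view's columns.
SELECT evt.action, evt.ts
FROM events
LATERAL VIEW json_tuple(events.payload, 'action', 'ts') evt AS action, ts
WHERE evt.action = 'click';

-- Pattern 2: get_json_object works directly in the WHERE clause.
SELECT payload
FROM events
WHERE get_json_object(payload, '$.action') = 'click';
```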
Fri, Oct 27
Thu, Oct 26
Wed, Oct 25
Docs updated; pinging @elukey about dropping the table in MySQL.
Ok, table PageContentSaveComplete_5588433 can be dropped from MySQL; it is now archived in Hadoop.
Tue, Oct 24
Edit_13457736_15423246 can also be dropped
Mon, Oct 23
@chelsyx That makes sense, thank you.
Is the user-versus-bot percentage an overall figure? I am not sure that is of value for quantifying usage as of 2017, right? See the time series of uploads by bots/users at https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm (scroll down)
Please keep in mind that metrics for Commons exist at https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm ; let's make sure those are looked at while this work is taking place.
Ticket can be closed.
Approved on my end.
Oct 18 2017
@Ejegg: Can you explain (out of curiosity) how clicks on banners can be identified in Hive data?
We have not heard from these folks for a while now; it looks like this project is not going to happen, so I am going to close the ticket.
Yes, pinging @bd808 that those two will be deleted soon.
Let's please test on beta before sending to prod
Oct 17 2017
As for permanent blocks versus temporary blocks, I really have no words of advice. I can see how a temporary block can be used to avoid more drastic measures when they are not needed; it would be excellent if Phabricator supported it. Since it doesn't, perhaps the CoC committee and @Aklapper can consult informally when a block is applied?
The Phabricator etiquette lists "you must follow the CoC", so it seems that is the basis and the Phabricator-specific points are add-ons?
@Ejegg understood. It is worth noting that we already have infrastructure in place that processes events at scale, so maybe on your end you can start a project (geared towards next year) to use it to your benefit. We can help, and we can also get jobs started that measure click-through so you have those available; those are easy ones for us to tackle.
Docs look good, thanks.
MobileWikiAppToCInteraction_10375484_15423246 is ready to be dropped; it is in the archive db on HDFS. cc @elukey
@Ejegg, right, got it; that is why I said "estimate", but in this case it might not be correct at all.
Sorry, I forgot to mention: we join to some of the tables in CiviCRM/Drupal to determine how many of the clicks actually lead to donations.
You can "estimate" this with requests to donate.wiki, right? Using utm_source, utm_campaign and click_id, correct?
None of that stuff should be too tough to replicate in SQL, if we're able to use the qs-parsing functions that are available in Hive.
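As a rough sketch of what that could look like with Hive's built-in `parse_url` (the `wmf.webrequest` table and the `uri_host`/`uri_path`/`uri_query` columns are assumptions here, and the host filter is hypothetical):

```sql
-- Count clicks per utm_source/utm_campaign by pulling the parameters
-- out of the request query string. parse_url(url, 'QUERY', key)
-- extracts a single query-string parameter from a full URL.
SELECT parse_url(full_url, 'QUERY', 'utm_source')   AS utm_source,
       parse_url(full_url, 'QUERY', 'utm_campaign') AS utm_campaign,
       COUNT(*)                                     AS clicks
FROM (
  SELECT concat('http://', uri_host, uri_path, uri_query) AS full_url
  FROM wmf.webrequest
  WHERE uri_host = 'donate.wikimedia.org'      -- hypothetical filter
    AND uri_query LIKE '%utm_campaign=%'
) reqs
GROUP BY parse_url(full_url, 'QUERY', 'utm_source'),
         parse_url(full_url, 'QUERY', 'utm_campaign');
```

The GROUP BY repeats the expressions rather than the aliases, since older Hive versions do not allow grouping by a select-list alias.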
You can, of course, and you can also use Spark.