Fri, Sep 21
We should probably link to this page somewhere in the meta page that hosts the metric definition: https://meta.wikimedia.org/wiki/Article_counts_revisited
See issue with table view.
A couple of things in this regard:
However, when querying data from before the rename, the old column is the one that holds the data; on more recent rows, it's the new column. This is less than ideal.
Right, the renaming in the schema is seen by the persistence storage as the "removal" of one column and the "addition" of one column. Our recommendation is to keep schema changes backwards compatible so they can be persisted easily; while JSON Schema gives you a lot of freedom, you are limited by what you can persist in Hive. A removal of a field is not backwards compatible.
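To illustrate the point above, here is a minimal sketch (field names are hypothetical) of why a rename looks like a removal plus an addition from the storage's point of view, and therefore fails the backwards-compatibility check:

```python
# Hive sees a schema as a set of named columns; a rename is a removal
# plus an addition, so older rows keep data only in the old column.
# Field names below are hypothetical, for illustration only.
old_fields = {"user_id", "page_title"}
new_fields = {"user_id", "page_name"}  # "page_title" renamed to "page_name"

removed = old_fields - new_fields  # {"page_title"}
added = new_fields - old_fields    # {"page_name"}

# A change is backwards compatible only if no field was removed.
is_backwards_compatible = not removed
print(is_backwards_compatible)
```

Adding the new field while keeping the old one (and deprecating it later) would keep `removed` empty and the change persistable.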
Thu, Sep 20
using EventLogging does afford us some limited follow-on analysis with tools like Druid/Superset
The point I was trying to make above is that neither Druid nor Superset is a good fit for error logging, as errors are quite useless w/o stack traces. Druid is a tool to analyze timeseries data; it cannot deal with free text, and it is really not well suited for schema-less data. I suggest you use Hive to analyze this data if you are set on collecting it. I think once you start a 1% collection you might find that many of your events are not persisted due to validation issues; I might be wrong about that, so we shall see. One sure thing is that validating client-side errors against a schema might not be the best usage of CPU cycles.
+1 to great work @sahil505 !
Tue, Sep 18
Let's document results on wikitech
Mon, Sep 17
Fri, Sep 14
I will leave this ticket open until next month. We have started computing pageviews for usability.wikimedia, and you can see there are pageviews available for the daily range here: https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/strategy.wikimedia.org/all-access/user/daily/2018090100/2018091400
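For reference, URLs like the one above follow the documented AQS path pattern `/metrics/pageviews/aggregate/{project}/{access}/{agent}/{granularity}/{start}/{end}`. A small sketch of building one:

```python
# Build an AQS pageviews aggregate URL from its path parameters.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"

def pageviews_url(project, access, agent, granularity, start, end):
    """Assemble the aggregate-pageviews REST endpoint for the given range."""
    return f"{BASE}/{project}/{access}/{agent}/{granularity}/{start}/{end}"

url = pageviews_url("strategy.wikimedia.org", "all-access", "user",
                    "daily", "2018090100", "2018091400")
print(url)
```

The timestamps are in `YYYYMMDDHH` format, as in the link above.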
usability data is still not showing on wikistats
Nice. Something we need to figure out is how to test events in beta for most schemas. I added some docs on how to consume from the eventlogging Kafka topic directly, as from now on all-events.log will only contain the whitelisted schemas.
To be deployed next week.
Total daily pagecounts seen over the last month do not really give you information when you visualize them (variation is small, given they are running totals). I think we agreed early on that this metric only makes sense if the time range is hardcoded to "all" and there is no possibility of switching to a narrower time range. Ping @mforns to see what he thinks, as he was doing CR.
Ping @JAllemandou and @Pchelolo: has this change been deployed? From looking at https://wikimedia.org/api/rest_v1/#!/Edits_data/get_metrics_edits_aggregate_project_editor_type_page_type_granularity_start_end it does not seem to be the case.
Thu, Sep 13
Wed, Sep 12
@Ottomata can you explain what is missing?
@Cirdan: the harshness of your responses is really uncalled for.
After talking with @Ottomata on IRC about the reasons why eventlogging is not well suited to do error logging, I have written a wikitech page in this regard: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/NotErrorLogging#Eventlogging_is_not_Well_suited_to_do_error_logging
Tue, Sep 11
Completely agree for the reasons you mention! Right now, however, we are keen to document the total number of client-side errors we have, to build a business case for further exploration.
In my opinion, absolute numbers are of little help in this case: a mild error that still lets you use the page, showing up in Chrome, will dwarf any other errors from browsers that are less used. I would look, at minimum, into error rates per browser.
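A tiny sketch, with made-up counts, of why rates per browser tell a different story than absolute counts:

```python
# Hypothetical counts: Chrome dominates absolute errors simply because
# it has the most users, yet its per-view error rate is the lowest.
errors = {"Chrome": 9000, "Firefox": 400, "Safari": 300}
pageviews = {"Chrome": 3_000_000, "Firefox": 50_000, "Safari": 25_000}

# Errors per pageview, by browser.
rates = {browser: errors[browser] / pageviews[browser] for browser in errors}

worst = max(rates, key=rates.get)  # browser with the highest error rate
print(worst, rates[worst])
```

With these numbers, Safari has the worst rate despite the smallest absolute count, which is exactly the signal absolute totals would hide.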
Super fun and all, I'm just saying that I think all this info is in the MySQL consumer log, as we have those for the past month.
Mon, Sep 10
Let's message analytics@ list when we get this work started.
Looks like all tables in the MySQL db also need to be deleted.
Let's 1) stop purging, 2) drop all echo tables on the events and events_sanitized databases, 3) start purging again.
Ah, my mistake totally, wrong ticket for my reply.
In such an event, if it was detected, we'd turn the sampling rate down to 0 until the problem was fixed, but I'm not sure if that's feasible?
With caching of our JS resources the answer is no, it is really not feasible: there will be a time period in which events are sent regardless of the config setting, as it has been cached on the client.
That is also another reason why we really do not think EL should be used for error reporting: bursty (and unpredictable) traffic.
@Jdlrobson +1 to @Ottomata's suggestion. I do not think sending this schema to MySQL is a viable option. EL is really not the best tool to do error logging, as 1) it is impossible to anticipate a sampling rate (errors are bursty), 2) EL is not designed to manipulate long streams of text, and 3) EL lacks grouping of errors, which you need to see the most prevalent occurrences.
Negative bytes are due to deletion of revisions/pages. Is that your question?
Sat, Sep 8
The data is not public, as there are no public stats of edits per country. Your edit history does not include the geographical locations where edits were made.
Fri, Sep 7
@Niedzielski please ping us on channel for issues such as this one. Eventlogging on beta needed to be restarted, but I can see events now coming into the log.
ping @Johan for any thoughts
Wait, no... I see nb.wikipedia.org does not exist as a project...
@JMinor: Turnilo just requires LDAP access, just like Piwik; please do file a ticket if you cannot access it.
You can see here (this is sampled 1/100) the amount of traffic from the app that is met with a 404 versus the 200 traffic; the ratio is significant, especially the numbers on September 3rd: https://bit.ly/2wPLgLe
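Since the dashboard above is sampled at 1/100, here is a quick sketch (with made-up counts) of how sampled counts scale back to totals and how the 404 ratio is read:

```python
# 1/100 sampling: multiply sampled counts by the sampling factor
# to estimate totals. Counts below are hypothetical.
SAMPLING_FACTOR = 100
sampled = {"200": 5000, "404": 1200}

estimated_total = {status: n * SAMPLING_FACTOR for status, n in sampled.items()}

# The 404/200 ratio is unaffected by uniform sampling.
ratio_404 = sampled["404"] / sampled["200"]
print(estimated_total, ratio_404)
```

Note the ratio itself needs no scaling; uniform sampling cancels out in the division.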
Thu, Sep 6
Having never seen that before using the same browser, I think maybe a browser restart is in order? Let us know if you still see it after a restart.
Never seen that before. Which browser do you see this in, Chrome?
@bmansurov we get data via Varnish, which does not see the POST body, only the URL.
Wed, Sep 5
Tue, Sep 4
What I think is happening: the URL-decoded Farsi text from internal referrers does not match (encoding?) any page by name, and thus those internal referrers are being lost.
What percentage of global human pageviews (i.e. those with agent_type = 'user', a core metric we report to the board on a monthly basis) are going to be reclassified as spider pageviews?
We are reclassifying very few requests: a ~0.25% increase in bot traffic over prior numbers. You can use the provided notebook to extract more precise answers should you want to do so.
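For clarity, a sketch (with hypothetical counts) of how such a reclassified share is computed as a percentage of pageviews:

```python
# Hypothetical counts: share of pageviews whose agent_type flips
# from "user" to "spider" under the new classification.
total_user_pageviews = 1_000_000
reclassified_as_spider = 2_500

increase_pct = 100 * reclassified_as_spider / total_user_pageviews
print(increase_pct)  # percentage of prior user pageviews now labeled spider
```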