Nuria (Nuria)
User

User Details

User Since
Nov 26 2014, 3:04 AM (251 w, 5 d)
Availability
Available
LDAP User
Nuria
MediaWiki User
NRuiz (WMF) [ Global Accounts ]

Recent Activity

Sat, Sep 21

Nuria assigned T233504: add whether an edit happened on cloud VPS to geoeditors-daily dataset to JAllemandou.
Sat, Sep 21, 11:38 PM · Patch-For-Review, Analytics-Kanban, Analytics, Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria added a comment to T233504: add whether an edit happened on cloud VPS to geoeditors-daily dataset .

Can someone from cloud-services-team confirm we want the dashboard with this data to be public? If the request is for a private dashboard, after adding a column we could set one up in Superset in 5 minutes.

Sat, Sep 21, 11:08 PM · Patch-For-Review, Analytics-Kanban, Analytics, Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria added a comment to T233504: add whether an edit happened on cloud VPS to geoeditors-daily dataset .

We have a UDF that @bd808 worked on a while back that can classify IPs as coming from cloud: https://github.com/wikimedia/analytics-refinery-source/blob/06e6c0a1d63d31236638934129c5d5d0344dc677/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/IpUtil.java

Sat, Sep 21, 11:06 PM · Patch-For-Review, Analytics-Kanban, Analytics, Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
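
As a rough illustration of the kind of IP classification the comment above refers to, here is a minimal Python sketch. It is not the logic of the linked IpUtil.java: the function name `is_cloud_ip` is hypothetical, and the network ranges are documentation placeholders (TEST-NET-1 and the IPv6 documentation prefix), not the real Cloud VPS ranges.

```python
import ipaddress

# ASSUMPTION: placeholder ranges reserved for documentation,
# standing in for whatever ranges the real cloud network uses.
CLOUD_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_cloud_ip(ip: str) -> bool:
    """Return True if `ip` falls inside one of the configured networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Unparseable input is treated as "not cloud" rather than raising.
        return False
    # Membership tests between IPv4 and IPv6 objects simply return False.
    return any(addr in net for net in CLOUD_NETWORKS)
```

A boolean like this could populate a per-edit column in a dataset, which is the shape of the change the task title describes.
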
Nuria created T233504: add whether an edit happened on cloud VPS to geoeditors-daily dataset .
Sat, Sep 21, 11:06 PM · Patch-For-Review, Analytics-Kanban, Analytics, Cloud-Services, Developer-Advocacy (Jul-Sep 2019)

Fri, Sep 20

Nuria added a comment to T214093: Modern Event Platform: Schema Guidelines and Conventions.

Perhaps we should add them into the event schema, even if they are never set by event producers?

The capsule is coming back!

Fri, Sep 20, 10:18 PM · Analytics-Kanban, CPT Initiatives (Modern Event Platform (TEC2)), Analytics, Better Use Of Data, Patch-For-Review, Product-Analytics, Goal, Services (watching), Analytics-EventLogging, EventBus
Nuria added a comment to T231984: NDA Request from WMDE employee Raja.

@raja_wmde you should have access to both http://turnilo.wikimedia.org and http://superset.wikimedia.org

Fri, Sep 20, 7:26 PM · Operations, LDAP-Access-Requests

Thu, Sep 19

Nuria added a comment to T231858: Archive data on eventlogging MySQL to analytics replica before decomisioning .

Sqoop can map column types but there are gotchas: column types with '.' need to be quoted, and many columns need an explicit utf8 conversion or casting (see example command below). I am not saying the sqoops cannot be scripted, they totally can be, but after each one of them we need to verify that data for all columns made it over as we expected. It seems like it would be less work to just drop the data that is not needed and archive on mysql (but if you disagree I am open to considering that).

Thu, Sep 19, 6:39 PM · Analytics-Kanban, Analytics, Analytics-EventLogging
Nuria closed T231856: Cleanup refinery artifacts folder from unneeded jars as Resolved.
Thu, Sep 19, 6:32 PM · Analytics-Kanban, Analytics
Nuria closed T225911: Add new mediatypes to media classification refinery code, a subtask of T228149: Load media requests data into cassandra , as Resolved.
Thu, Sep 19, 6:30 PM · Analytics-Kanban, Analytics, Tool-Pageviews
Nuria closed T225911: Add new mediatypes to media classification refinery code as Resolved.
Thu, Sep 19, 6:30 PM · Analytics-Kanban, Analytics
Nuria closed T232226: wmf_netflow cube in Turnilo missing bytes and packets measures as Resolved.
Thu, Sep 19, 6:29 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria closed T225297: Create a spicerack recipe to reboot the hadoop worker nodes as Resolved.
Thu, Sep 19, 6:28 PM · Analytics-Kanban, Analytics, User-Elukey
Nuria closed T232122: Decomission eventlogging-service-eventbus and clean up related configs and code, a subtask of T201068: Modern Event Platform: Stream Intake Service, as Resolved.
Thu, Sep 19, 6:27 PM · Analytics, Core Platform Team Legacy (Watching / External), Services (watching), Analytics-EventLogging, EventBus
Nuria closed T232122: Decomission eventlogging-service-eventbus and clean up related configs and code as Resolved.
Thu, Sep 19, 6:27 PM · MW-1.34-notes (1.34.0-wmf.23; 2019-09-17), Analytics-Kanban, Analytics, Core Platform Team Legacy (Watching / External), Services (watching), Analytics-EventLogging, EventBus
Nuria moved T208612: Release edit data lake data as a public json dump /mysql dump, other? from Done to Ready to Deploy on the Analytics-Kanban board.
Thu, Sep 19, 6:27 PM · Patch-For-Review, Analytics-Kanban, Research-Backlog, Analytics
Nuria closed T228557: third party domain data is getting refined as Resolved.
Thu, Sep 19, 6:27 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria closed T212854: Upgrade ua parser to latest version for both java and python, a subtask of T200630: Eventlogging's processors stopped working, as Resolved.
Thu, Sep 19, 6:26 PM · Analytics, Patch-For-Review, Analytics-Kanban
Nuria closed T212854: Upgrade ua parser to latest version for both java and python as Resolved.
Thu, Sep 19, 6:26 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria added a comment to T212854: Upgrade ua parser to latest version for both java and python.

Super nice @JAllemandou Thanks for the docs

Thu, Sep 19, 6:26 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria added a comment to T231858: Archive data on eventlogging MySQL to analytics replica before decomisioning .

It is actually a lot of work to sqoop this data as it is not doable in bulk; every table needs to have special mappings. I did 4/5 tables some quarters back and it was a lot of menial work. In many instances it *seemed* that tables had been properly imported but in some columns data was mingled. Every column of every table would need to be queried to make sure imports are correct.

Thu, Sep 19, 4:45 PM · Analytics-Kanban, Analytics, Analytics-EventLogging
Nuria added a comment to T231858: Archive data on eventlogging MySQL to analytics replica before decomisioning .

One question I have is whether you guys want to keep that database online (as in: it could be queryable) or just keep it as a backup in case it needs to be restored for some emergency/very concrete case.

It should be queryable as there is data there that predates hadoop and, while used sparingly, needs to be accessible. Now, data from the last 1.5 years (approx., I can check for exact dates) on mysql is also in hadoop so, before moving, we can drop all data that exists in both storages. That should reduce the size a bunch.

Thu, Sep 19, 3:52 PM · Analytics-Kanban, Analytics, Analytics-EventLogging

Wed, Sep 18

Nuria added a comment to T229682: Add more dimensions to netflow's druid ingestion specs.

Wed, Sep 18, 11:31 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria added a comment to T229682: Add more dimensions to netflow's druid ingestion specs.

note to self, turnilo needs double quotes and no spaces:

Wed, Sep 18, 11:24 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria added a comment to T231616: Request access to Analytics cluster for Urbanecm.

The idea that access for a formal collaboration with an external non-wikimedia group is an acceptable use of resources but access for wikimedians is somehow a significantly bigger problem deserves some scrutiny imo.

(my last post on this regard)
Again, our resources are limited and while this policy might not be perfect, it is certainly clear. The analytics team does not run a platform intended for wide community access; to do so we would need many times the resources we have in terms of people and infrastructure. Sorry this answer is disappointing, but analytics serves the community by making publicly accessible as much data as we can about the movement (and this is a goal towards which we work every day). We do not provide a publicly accessible computation platform. We simply cannot do both.

Wed, Sep 18, 7:01 PM · Patch-For-Review, Operations, SRE-Access-Requests
Nuria added a comment to T233238: add agent-type dimension to pageviews per country endpoint.

We can load this data into a "shadow" table (v2) in cassandra and afterwards swap the current table for the new one.

Wed, Sep 18, 4:33 PM · Analytics
Nuria triaged T233238: add agent-type dimension to pageviews per country endpoint as Normal priority.
Wed, Sep 18, 4:31 PM · Analytics
Nuria created T233238: add agent-type dimension to pageviews per country endpoint.
Wed, Sep 18, 4:31 PM · Analytics
Nuria added a comment to T231616: Request access to Analytics cluster for Urbanecm.

@Urbanecm I have corrected the information about data access, sorry about that. From what I can tell, the query that led to you requesting access can be run against the labs databases; very little of the info on the analytics replicas is not present on the labs databases.

Wed, Sep 18, 3:39 PM · Patch-For-Review, Operations, SRE-Access-Requests
Nuria added a comment to T231616: Request access to Analytics cluster for Urbanecm.

Sorry this is disappointing, but given our very limited resources we really cannot support ad-hoc data access for community members. The best way we have found to have a policy around granting access has to do with employment or active collaborations with the research team. I have added a note to this effect to the wikitech docs. Again, my apologies; in this case the ticket that prompted this request seems to have been resolved.

Wed, Sep 18, 5:10 AM · Patch-For-Review, Operations, SRE-Access-Requests
Nuria added a comment to T229682: Add more dimensions to netflow's druid ingestion specs.

@ayounsi if you can provide a map like:

map: {"32":"URG", "16":"ACK","8":"PSH", "4":"RST", "2":"SYN", "1":"FIN"}
Wed, Sep 18, 1:36 AM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, Sep 17

Nuria added a comment to T229682: Add more dimensions to netflow's druid ingestion specs.
tcp:
            extractionFn:
                type: lookup
                lookup:
                    type: map
                    map: {"32":"URG", "16":"ACK","8":"PSH", "4":"RST", "2":"SYN", "1":"FIN"}
                    retainMissingValue: true
Tue, Sep 17, 11:18 PM · Patch-For-Review, Analytics-Kanban, Analytics
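
To make the behaviour of the map lookup in the comment above concrete, here is a small Python sketch of what a map-type lookup with `retainMissingValue: true` does to a value: mapped keys are replaced, unmapped keys pass through unchanged. The function names are hypothetical, and `decode_flags` is an illustrative extension that is not part of the posted config; it shows how a combined bitmask such as 18 (which the map alone would leave as "18") breaks down into individual flags.

```python
# Same mapping as in the posted extractionFn config.
TCP_FLAG_NAMES = {"32": "URG", "16": "ACK", "8": "PSH",
                  "4": "RST", "2": "SYN", "1": "FIN"}


def lookup(value, mapping=TCP_FLAG_NAMES, retain_missing_value=True):
    """Mimic a map lookup: replace known keys, optionally keep unknown ones."""
    if value in mapping:
        return mapping[value]
    # With retain_missing_value the original value survives; otherwise it is dropped.
    return value if retain_missing_value else None


def decode_flags(value, mapping=TCP_FLAG_NAMES):
    """Expand a combined TCP flags bitmask into flag names (highest bit first)."""
    bits = int(value)
    pairs = sorted(((int(k), name) for k, name in mapping.items()), reverse=True)
    return [name for bit, name in pairs if bits & bit]


print(lookup("2"))         # SYN
print(lookup("18"))        # 18 (not in the map, retained as-is)
print(decode_flags("18"))  # ['ACK', 'SYN']
```
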
Nuria added a comment to T229682: Add more dimensions to netflow's druid ingestion specs.

ya, flags can be done in the same way, though a bunch of them seem to be null

Tue, Sep 17, 11:16 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria added a comment to T229682: Add more dimensions to netflow's druid ingestion specs.

ok, i figured out how to replace values using druid lookup transform functions which is a real long way to say "map"

Tue, Sep 17, 11:09 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria added a comment to T226663: Develop a tool or integrate feature in existing one to visualize WMCS edits data.

@mforns because geoeditors daily is updated from cu_changes (in hadoop), which is the only table that has IPs, so it is the only one that would be able to tell you whether an edit came from cloud internal IPs. Makes sense?

Tue, Sep 17, 9:15 PM · Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria added a comment to T231616: Request access to Analytics cluster for Urbanecm.

Also, the ticket referenced has already been closed, right?

Tue, Sep 17, 8:48 PM · Patch-For-Review, Operations, SRE-Access-Requests
Nuria closed T232664: Membership in "researchers" group for Srishti Sethi as Resolved.
Tue, Sep 17, 8:36 PM · Patch-For-Review, SRE-Access-Requests, Operations, Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria closed T232664: Membership in "researchers" group for Srishti Sethi, a subtask of T226663: Develop a tool or integrate feature in existing one to visualize WMCS edits data, as Resolved.
Tue, Sep 17, 8:35 PM · Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria added a comment to T232664: Membership in "researchers" group for Srishti Sethi.

@srishakatux do sync up with @bd808 about the steps to go forward to have a public dashboard. I think that in light of the data in hadoop we probably no longer need the script.

Tue, Sep 17, 8:35 PM · Patch-For-Review, SRE-Access-Requests, Operations, Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria added a comment to T226663: Develop a tool or integrate feature in existing one to visualize WMCS edits data.

We have a UDF that @bd808 worked on a while back that can classify IPs as coming from cloud: https://github.com/wikimedia/analytics-refinery-source/blob/06e6c0a1d63d31236638934129c5d5d0344dc677/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/IpUtil.java

Tue, Sep 17, 8:34 PM · Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria added a comment to T226663: Develop a tool or integrate feature in existing one to visualize WMCS edits data.

We have a UDF that @bd808 worked on a while back that can classify IPs as coming from cloud: https://github.com/wikimedia/analytics-refinery-source/blob/06e6c0a1d63d31236638934129c5d5d0344dc677/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/IpUtil.java

Tue, Sep 17, 8:33 PM · Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria updated subscribers of T226663: Develop a tool or integrate feature in existing one to visualize WMCS edits data.
Tue, Sep 17, 8:32 PM · Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria updated subscribers of T226663: Develop a tool or integrate feature in existing one to visualize WMCS edits data.

Moving from other ticket: I think we can probably add a column in geoeditors-daily that says whether the edit happened on cloud. We can add a column to the table below, and they can compute aggregates via reportupdater or oozie and serve them with a dashiki dashboard. cc @mforns and @joal for comments

Tue, Sep 17, 8:32 PM · Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria added a comment to T231616: Request access to Analytics cluster for Urbanecm.

@MMiller_WMF and @Urbanecm

Tue, Sep 17, 8:26 PM · Patch-For-Review, Operations, SRE-Access-Requests
Nuria moved T225578: EventLogging needs to enque events to avoid draining users' battery on mobile from In Code Review to Paused on the Analytics-Kanban board.
Tue, Sep 17, 4:05 PM · Patch-For-Review, Analytics-Kanban, Performance-Team (Radar), Analytics-EventLogging, Analytics
Nuria moved T232122: Decomission eventlogging-service-eventbus and clean up related configs and code from Ready to Deploy to Done on the Analytics-Kanban board.
Tue, Sep 17, 4:03 PM · MW-1.34-notes (1.34.0-wmf.23; 2019-09-17), Analytics-Kanban, Analytics, Core Platform Team Legacy (Watching / External), Services (watching), Analytics-EventLogging, EventBus

Mon, Sep 16

Nuria added a comment to T226663: Develop a tool or integrate feature in existing one to visualize WMCS edits data.

FYI, wikistats2 is probably not the best place for this data. https://stats.wikimedia.org/v2/#/all-projects displays data aggregated for all wikis around some classical metrics like "active editors". Edits in labs seem a better fit for a dataset of its own.
If we want a public dashboard, let's make one from the data stored in the csv @bd808 has been creating (dashiki is probably best for public data; for internal data we have superset). We can do a fast tryout once the files are moved to a publicly accessible endpoint.

Mon, Sep 16, 10:07 PM · Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria added a comment to T226663: Develop a tool or integrate feature in existing one to visualize WMCS edits data.

Sounds like Bryan has a csv with counts of wiki, total, wmcs edits. I think we can easily do a dashboard on top of that csv, and further updates to it can come from data in hadoop. @bd808 we can do a fast tryout with the data you have if you copy the data to a publicly accessible directory. See: https://wikitech.wikimedia.org/wiki/Analytics/Ad_hoc_datasets

Mon, Sep 16, 9:04 PM · Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria updated subscribers of T232664: Membership in "researchers" group for Srishti Sethi.

Also, question to @mforns and @JAllemandou: it seems that since the cloud-services-team just wants to see edits from those IPs, we could add a column to this dataset about _is_cloud_edit and they could compute their aggregates using this table, right? Does that seem like a good idea?

Mon, Sep 16, 7:06 PM · Patch-For-Review, SRE-Access-Requests, Operations, Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria moved T232843: Can we add ORES data so it can be easily retrieved per revision present on mediawiki history? from Machine Learning Platform to Radar on the Analytics board.
Mon, Sep 16, 5:47 PM · Analytics
Nuria updated the task description for T232843: Can we add ORES data so it can be easily retrieved per revision present on mediawiki history?.
Mon, Sep 16, 5:47 PM · Analytics
Nuria renamed T232843: Can we add ORES data so it can be easily retrieved per revision present on mediawiki history? from Can we add Parsoid and ORES data to mediawiki history? to Can we add ORES data so it can be easily retrieved per revision present on mediawiki history?.
Mon, Sep 16, 5:45 PM · Analytics
Nuria added a comment to T232795: We are not capturing IPs of original requests for proxied requests from operamini and googleweblight. x-forwarded-for is null and client-ip is the same as IP on Webrequest data .

@BBlack
Let me add more context here: we are trying to improve the data in our pageview dataset by tagging "automated" data. That is, "entities" that do very spiky requests in our data, say 30 pageviews per minute (pageviews, not requests). In order to do that we need a means to identify "entities", and having an IP that is not an umbrella IP (as is the case with a proxy) helps.

Mon, Sep 16, 5:39 PM · Operations, Traffic, Analytics
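
The "spiky entity" idea in the comment above can be sketched as a per-minute count per entity. This is only an illustration under stated assumptions, not the team's actual method: the function name `flag_automated` is hypothetical, the entity identifier is whatever key is available (for example an IP combined with a user agent), and the threshold of 30 is the example figure from the comment.

```python
from collections import Counter
from datetime import datetime

THRESHOLD = 30  # example figure from the comment: pageviews per minute


def flag_automated(pageviews, threshold=THRESHOLD):
    """Return the entities with at least `threshold` pageviews in one minute.

    `pageviews` is an iterable of (entity_id, datetime) pairs.
    """
    # Bucket every pageview by (entity, minute) and count per bucket.
    counts = Counter(
        (entity, ts.replace(second=0, microsecond=0))
        for entity, ts in pageviews
    )
    return {entity for (entity, _minute), n in counts.items() if n >= threshold}
```

When many clients share one proxy IP, a key based on that IP lumps them into a single entity, which is why the comment asks for the original client IP.
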
Nuria added a comment to T232664: Membership in "researchers" group for Srishti Sethi.

SRE-Access-Requests please give @srishakatux ssh permits for the cluster if she does not have those already

Mon, Sep 16, 5:34 PM · Patch-For-Review, SRE-Access-Requests, Operations, Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria added a comment to T232664: Membership in "researchers" group for Srishti Sethi.

I see, this data also exists in hadoop, not only on the analytics replicas. The replicas are a cluster where different wikis are located on different hosts, whereas hadoop has that same data already nicely available for you to query, and it is already denormalized by wiki. Please see:

Mon, Sep 16, 4:53 PM · Patch-For-Review, SRE-Access-Requests, Operations, Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria reassigned T232821: Check home leftovers of atgomez from Nuria to elukey.
Mon, Sep 16, 4:25 PM · Analytics
Nuria added a comment to T232821: Check home leftovers of atgomez.

Ya, +1 to removing all

Mon, Sep 16, 4:24 PM · Analytics
Nuria added a comment to T232844: Release wikimedia history dumps sorted by user ID and page ID.

This sorting makes total sense to facilitate user-focused research

Mon, Sep 16, 3:31 PM · Analytics
Nuria moved T231856: Cleanup refinery artifacts folder from unneeded jars from In Code Review to Done on the Analytics-Kanban board.
Mon, Sep 16, 3:07 PM · Analytics-Kanban, Analytics

Sat, Sep 14

Nuria added a comment to T232664: Membership in "researchers" group for Srishti Sethi.

Hardly any data can be found in mysql; we are deprecating that storage this quarter as we moved all data collection to hadoop a while back. If you point me to the script I can recommend the best path.

Sat, Sep 14, 9:45 PM · Patch-For-Review, SRE-Access-Requests, Operations, Cloud-Services, Developer-Advocacy (Jul-Sep 2019)

Fri, Sep 13

Nuria added a comment to T232795: We are not capturing IPs of original requests for proxied requests from operamini and googleweblight. x-forwarded-for is null and client-ip is the same as IP on Webrequest data .

Right, I see the UA issue, but in the absence of IPs being provided by the proxy owners themselves, what I am doing to retrieve them is just looking at UA data in the webrequest table, so effectively it is the same thing. Not sure what else we can do to have a more trustworthy list.

Fri, Sep 13, 6:40 PM · Operations, Traffic, Analytics
Nuria added a comment to T232843: Can we add ORES data so it can be easily retrieved per revision present on mediawiki history?.

I do not disagree that ORES scores would be useful in a table accessible by revision, +1 to that. I just do not think that the process that retrieves and maintains them should be related to the mediawiki history one. I think the use case here is to be able to retrieve those in an easier format than is possible now, which makes sense.

Fri, Sep 13, 5:02 PM · Analytics
Nuria updated subscribers of T232795: We are not capturing IPs of original requests for proxied requests from operamini and googleweblight. x-forwarded-for is null and client-ip is the same as IP on Webrequest data .

ping @Ottomata and @JAllemandou for thoughts on this regard

Fri, Sep 13, 4:58 PM · Operations, Traffic, Analytics
Nuria added a comment to T232679: Images served with text/html content type.

I have started another ticket that, as you mentioned, better explains the rationale behind having "trusted proxies". We really do not need them if we can capture the original IP: https://phabricator.wikimedia.org/T232795

Fri, Sep 13, 4:58 PM · Traffic, Analytics, Operations
Nuria renamed T232795: We are not capturing IPs of original requests for proxied requests from operamini and googleweblight. x-forwarded-for is null and client-ip is the same as IP on Webrequest data from Client_IP and Ip are always the same , even for proxied requests for opera mini or googleweblight to We are not capturing IPs of original requests for proxied requests from operamini and googleweblight. x-forwarded-for is null and client-ip is the same as IP on Webrequest data .
Fri, Sep 13, 4:56 PM · Operations, Traffic, Analytics
Nuria added a comment to T224459: Recommend the best format to release public data lake as a dump.

Can we move the info to wikitech so we can access it easily when this ticket is closed?

Fri, Sep 13, 4:00 PM · Research, Analytics
Nuria renamed T232795: We are not capturing IPs of original requests for proxied requests from operamini and googleweblight. x-forwarded-for is null and client-ip is the same as IP on Webrequest data from Client_IP and Ip are always the same , even for proxied requests for opera mini to Client_IP and Ip are always the same , even for proxied requests for opera mini or googleweblight.
Fri, Sep 13, 3:47 PM · Operations, Traffic, Analytics
Nuria moved T229674: Set up automatic deletion for netflow datasource in Druid from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Fri, Sep 13, 3:25 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria moved T232226: wmf_netflow cube in Turnilo missing bytes and packets measures from In Code Review to Done on the Analytics-Kanban board.
Fri, Sep 13, 3:24 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria added a comment to T232843: Can we add ORES data so it can be easily retrieved per revision present on mediawiki history?.

I do not think it would be a wise decision (even if we could do it technically) to have mediawiki history be the monolith of all the things. With revision ids you should be able to easily retrieve ORES scores and content.

Fri, Sep 13, 3:22 PM · Analytics
Nuria moved T228557: third party domain data is getting refined from Ready to Deploy to Done on the Analytics-Kanban board.
Fri, Sep 13, 2:16 AM · Patch-For-Review, Analytics-Kanban, Analytics

Thu, Sep 12

Nuria added a comment to T232664: Membership in "researchers" group for Srishti Sethi.

@srishakatux hello, can you be a bit more specific about what data you are after? Are you thinking eventlogging?

Thu, Sep 12, 11:55 PM · Patch-For-Review, SRE-Access-Requests, Operations, Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
Nuria added a comment to T232795: We are not capturing IPs of original requests for proxied requests from operamini and googleweblight. x-forwarded-for is null and client-ip is the same as IP on Webrequest data .

Verifiable with this query:

Thu, Sep 12, 11:10 PM · Operations, Traffic, Analytics
Nuria created T232795: We are not capturing IPs of original requests for proxied requests from operamini and googleweblight. x-forwarded-for is null and client-ip is the same as IP on Webrequest data .
Thu, Sep 12, 11:09 PM · Operations, Traffic, Analytics
Nuria updated the task description for T232679: Images served with text/html content type.
Thu, Sep 12, 11:02 PM · Traffic, Analytics, Operations
Nuria updated subscribers of T232679: Images served with text/html content type.

cc @Ottomata just in case he can do the change too

Thu, Sep 12, 10:21 PM · Traffic, Analytics, Operations
Nuria updated subscribers of T232679: Images served with text/html content type.

We need to add googleweblight to the proxy list to make sure it is treated appropriately. I think misc/trusted_proxies.json is outside my boundaries, so possibly @BBlack or @ema can do it.

Thu, Sep 12, 10:20 PM · Traffic, Analytics, Operations
Nuria renamed T212854: Upgrade ua parser to latest version for both java and python from Upgrade python ua parser to latest version to Upgrade n ua parser to latest version for both java and python.
Thu, Sep 12, 5:12 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria added a comment to T232707: Requesting access to analytics cluster for Martin Gerlach.

Approved on my end as well.

Thu, Sep 12, 4:12 PM · Analytics, Operations, SRE-Access-Requests
Nuria added a project to T232123: Parse wikidumps and extract redirect information for 1 small wiki, romanian : Research.
Thu, Sep 12, 1:13 PM · Research, Analytics

Wed, Sep 11

Nuria added a comment to T232679: Images served with text/html content type.

I think we need to add proxy=googleweblight to x-analytics

Wed, Sep 11, 10:34 PM · Traffic, Analytics, Operations
Nuria added a comment to T232679: Images served with text/html content type.

This has the effect that these images are being considered content pageviews when they are just asset requests

Wed, Sep 11, 10:06 PM · Traffic, Analytics, Operations
Nuria created T232679: Images served with text/html content type.
Wed, Sep 11, 10:03 PM · Traffic, Analytics, Operations
Nuria added a comment to T211248: Modern Event Platform: Stream Intake Service: Migrate eventlogging-service-eventbus events to eventgate-main.

Let's do the wave!!!!!

Wed, Sep 11, 9:50 PM · CPT Initiatives (Modern Event Platform (TEC2)), Patch-For-Review, Core Platform Team Workboards (Clinic Duty Team), MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Services (watching), Analytics-EventLogging, EventBus, Analytics-Kanban
Nuria closed T211248: Modern Event Platform: Stream Intake Service: Migrate eventlogging-service-eventbus events to eventgate-main as Resolved.
Wed, Sep 11, 9:49 PM · CPT Initiatives (Modern Event Platform (TEC2)), Patch-For-Review, Core Platform Team Workboards (Clinic Duty Team), MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Services (watching), Analytics-EventLogging, EventBus, Analytics-Kanban
Nuria closed T211248: Modern Event Platform: Stream Intake Service: Migrate eventlogging-service-eventbus events to eventgate-main, a subtask of T201068: Modern Event Platform: Stream Intake Service, as Resolved.
Wed, Sep 11, 9:49 PM · Analytics, Core Platform Team Legacy (Watching / External), Services (watching), Analytics-EventLogging, EventBus
Nuria closed T229436: Add --skip-trash arg to refinery-drop-older-than calls in data_purge.pp as Resolved.
Wed, Sep 11, 9:49 PM · Analytics-Kanban, Analytics
Nuria closed T229817: Create new mediarequests table, a subtask of T228149: Load media requests data into cassandra , as Resolved.
Wed, Sep 11, 9:49 PM · Analytics-Kanban, Analytics, Tool-Pageviews
Nuria closed T229817: Create new mediarequests table as Resolved.
Wed, Sep 11, 9:49 PM · Analytics-Kanban, Analytics, Tool-Pageviews
Nuria set the point value for T229817: Create new mediarequests table to 3.
Wed, Sep 11, 9:49 PM · Analytics-Kanban, Analytics, Tool-Pageviews
Nuria closed T231002: Refactor quenename into HQL hive2 action oozie jobs as Resolved.
Wed, Sep 11, 9:48 PM · Analytics-Kanban, Analytics
Nuria closed T230312: Add literal transcoding to media file properties UDF , a subtask of T229817: Create new mediarequests table, as Resolved.
Wed, Sep 11, 9:47 PM · Analytics-Kanban, Analytics, Tool-Pageviews
Nuria closed T230312: Add literal transcoding to media file properties UDF as Resolved.
Wed, Sep 11, 9:47 PM · Patch-For-Review, Analytics-Kanban, StructuredDataOnCommons, Analytics, Tool-Pageviews
Nuria set the point value for T230312: Add literal transcoding to media file properties UDF to 5.
Wed, Sep 11, 9:46 PM · Patch-For-Review, Analytics-Kanban, StructuredDataOnCommons, Analytics, Tool-Pageviews
Nuria added a comment to T231677: Superset + Turnilo access for Verena Lindner + Raja Gumienny (WMDE).

Has @Verena also signed an NDA?

Wed, Sep 11, 5:17 PM · Analytics
Nuria added a comment to T229682: Add more dimensions to netflow's druid ingestion specs.

Add the netflow kafka supervisor job somewhere in refinery

Can you explain a bit what we need this for?

Wed, Sep 11, 3:43 PM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, Sep 10

Nuria added a comment to T229682: Add more dimensions to netflow's druid ingestion specs.

I just thought I can easily set up turnilo to decode tcp_flags so they are not ints, let me give it a try

Tue, Sep 10, 11:23 PM · Patch-For-Review, Analytics-Kanban, Analytics
Nuria moved T217848: Sqoop: remove cuc_comment and join to comment table from Ops Week to Smart Tools for Better Data on the Analytics board.
Tue, Sep 10, 9:11 PM · Analytics
Nuria removed a project from T217848: Sqoop: remove cuc_comment and join to comment table: Analytics-Kanban.
Tue, Sep 10, 9:10 PM · Analytics
Nuria added a comment to T217848: Sqoop: remove cuc_comment and join to comment table.

The task to follow is https://phabricator.wikimedia.org/T232531, which already ccs analytics; that refactor has not started yet

Tue, Sep 10, 8:37 PM · Analytics
Nuria added a comment to T166732: Refactor comment storage in the database and abstract access in MediaWiki.

@WDoranWMF not blocking, we just need to know because changes such as these affect our history reconstruction algorithms and we need to adjust them. If you create a task and tag Analytics so we can follow progress, that would be excellent, many thanks.

Tue, Sep 10, 8:32 PM · MediaWiki-Comment-backend, MW-1.31-release-notes (WMF-deploy-2018-02-06 (1.31.0-wmf.20)), MediaWiki-Platform-Team (MWPT-Q1-Jul-Sep-2017), MW-1.30-release-notes (WMF-deploy-2017-09-05 (1.30.0-wmf.17)), Patch-For-Review, Wikimedia-Rdbms