
lexnasser (Lex Nasser)
Analytics Software Engineering Intern

Projects

User does not belong to any projects.

User Details

User Since
Sep 30 2019, 9:17 PM (45 w, 20 h)
Availability
Available
IRC Nick
lexnasser
LDAP User
Lex Nasser
MediaWiki User
Lexnasser [ Global Accounts ]

Recent Activity

May 14 2020

lexnasser added a comment to T252363: Check home leftovers of lexnasser.

I think that the following should be saved:

  • stat1007: api, byc, refinery
  • notebook1003: Search_Engine_Testing.ipynb, Geoeditors.ipynb
  • hive: lex.webrequest_subset, lex.geoeditors_public_monthly
May 14 2020, 12:33 AM · Analytics

May 1 2020

lexnasser moved T244597: Create intermediate table that holds public data for geoeditors dataset so it can be used to load cassandra from In Progress to In Code Review on the Analytics-Kanban board.
May 1 2020, 3:17 AM · Analytics-Kanban, Analytics
lexnasser moved T248289: Configure Oozie job for loading geoeditors data into Cassandra from In Progress to In Code Review on the Analytics-Kanban board.
May 1 2020, 3:17 AM · Analytics-Kanban, Analytics
lexnasser reassigned T238365: Add editors per country data to AQS API (geoeditors) from lexnasser to Milimetric.

Handing the remainder of this task off to Dan.

May 1 2020, 3:15 AM · Product-Analytics, Patch-For-Review, Analytics-Kanban, Analytics

Apr 27 2020

lexnasser closed T245468: Pageviews missing for titles with emojis since April 23, 2019 as Resolved.

Did some final verification of pageviews for characters above 0xFFFF, and looks like everything's working! Marking as resolved.

Apr 27 2020, 4:00 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API

Apr 22 2020

lexnasser added a comment to T245468: Pageviews missing for titles with emojis since April 23, 2019.

Success! Will do some more testing to ensure that more cases are valid.

Apr 22 2020, 1:24 AM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API

Apr 16 2020

lexnasser added a comment to T245468: Pageviews missing for titles with emojis since April 23, 2019.

Thanks for the suggestions!

Apr 16 2020, 9:02 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API
lexnasser added a comment to T245468: Pageviews missing for titles with emojis since April 23, 2019.

Just submitted a patch with the fix and some new tests: https://gerrit.wikimedia.org/r/589383

Apr 16 2020, 6:12 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API

Apr 13 2020

lexnasser added a comment to T245468: Pageviews missing for titles with emojis since April 23, 2019.

On @Milimetric's suggestion, I tested all 3 methods against each other to verify their consistency, and found they all behaved the same over the whole Unicode range.

Apr 13 2020, 11:39 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API
lexnasser added a comment to T245468: Pageviews missing for titles with emojis since April 23, 2019.

This is the code I'm using: Pattern.compile("^[ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_a-z~\\x{80}-\\x{10FFFF}\\+]+$");
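
To make the behaviour of the pattern quoted above concrete, here is a minimal sketch that compiles it verbatim and checks a few titles, including one containing a character above 0xFFFF. The class name TitlePatternDemo, the method isValidTitle, and the sample titles are hypothetical and chosen only for illustration; this is not necessarily how the submitted patch is structured.

```java
import java.util.regex.Pattern;

final class TitlePatternDemo {
    // Pattern copied verbatim from the comment above. The range
    // \x{80}-\x{10FFFF} admits every non-ASCII code point, including
    // supplementary characters (above 0xFFFF) such as emoji.
    static final Pattern TITLE = Pattern.compile(
        "^[ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_a-z~\\x{80}-\\x{10FFFF}\\+]+$");

    // Hypothetical helper: true when every character of the title is allowed.
    static boolean isValidTitle(String title) {
        return TITLE.matcher(title).matches();
    }

    public static void main(String[] args) {
        // U+1F600 is built from its code point to avoid source-encoding issues.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(isValidTitle("Main_Page"));      // ASCII title
        System.out.println(isValidTitle("Face_" + emoji));  // title with an emoji
        System.out.println(isValidTitle("Foo#bar"));        // '#' is not in the class
    }
}
```

java.util.regex matches on code points, so a surrogate pair in the input is compared against the \x{80}-\x{10FFFF} range as a single character rather than as two separate UTF-16 units.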

Apr 13 2020, 12:48 AM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API

Apr 12 2020

lexnasser added a comment to T245468: Pageviews missing for titles with emojis since April 23, 2019.

Thanks again for all your feedback!

Apr 12 2020, 9:29 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API

Apr 11 2020

lexnasser added a comment to T249923: Users having issues with presto sqllab on superset.

@elukey I don't see the "Unknown error" message anymore. Nothing in the JS console either.

Apr 11 2020, 7:29 PM · Product-Analytics, Analytics-Kanban, Analytics
lexnasser added a comment to T249923: Users having issues with presto sqllab on superset.

@elukey I can see https://superset.wikimedia.org/superset/dashboard/73/, but still get the same "Unknown error" for https://superset.wikimedia.org/superset/sqllab.

Apr 11 2020, 3:51 PM · Product-Analytics, Analytics-Kanban, Analytics

Apr 10 2020

lexnasser added a comment to T245468: Pageviews missing for titles with emojis since April 23, 2019.

Thanks everyone for your input! I'm a bit busy right now, but I'll be sure to address each of your points later today.

Apr 10 2020, 8:10 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API
lexnasser added a comment to T245468: Pageviews missing for titles with emojis since April 23, 2019.

Just wanted to write everything I figured out in the past 3 days. I would love your feedback!

Apr 10 2020, 2:28 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API

Mar 23 2020

lexnasser moved T248289: Configure Oozie job for loading geoeditors data into Cassandra from Next Up to In Progress on the Analytics-Kanban board.
Mar 23 2020, 5:28 AM · Analytics-Kanban, Analytics
lexnasser created T248289: Configure Oozie job for loading geoeditors data into Cassandra.
Mar 23 2020, 5:28 AM · Analytics-Kanban, Analytics

Jan 13 2020

lexnasser closed T239625: Improve quality of external referer data, a subtask of T235780: Literature review of external reuse of Wikimedia content, as Resolved.
Jan 13 2020, 9:55 PM · Research
lexnasser closed T239625: Improve quality of external referer data as Resolved.

Deployed with the help of @Milimetric! Hope you find these changes helpful!

Jan 13 2020, 9:55 PM · Product-Analytics, Analytics-Kanban, Research, Analytics
lexnasser added a comment to T239625: Improve quality of external referer data.

One last thing to resolve: There are a few Google Translate referers with the parameter prev=/search... (ex. prev=/search%3Fq%3DBARON%2BDE%2BHIRSCH%26hl%3Del%26rlz%3D1T4GGLL_elGR398GR398%26prmd%3Divns). Should these also be classified under the Google Translate search engine purview?
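
To show what the prev value in the question above actually contains, here is a small sketch that percent-decodes the example and checks whether it points at a search path. The class PrevParamDemo and the methods decode and pointsToSearch are hypothetical names, and the check is only an illustration of the question, not the classification rule that was adopted.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class PrevParamDemo {
    // Percent-decodes a query parameter value as UTF-8.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical check: does the decoded prev value start with /search?
    static boolean pointsToSearch(String prevValue) {
        return decode(prevValue).startsWith("/search");
    }

    public static void main(String[] args) {
        // Example value taken from the comment above.
        String prev = "/search%3Fq%3DBARON%2BDE%2BHIRSCH%26hl%3Del"
            + "%26rlz%3D1T4GGLL_elGR398GR398%26prmd%3Divns";
        System.out.println(decode(prev));
        System.out.println(pointsToSearch(prev));
    }
}
```

Decoded, the example reads /search?q=BARON+DE+HIRSCH&hl=el&rlz=1T4GGLL_elGR398GR398&prmd=ivns, so the translated page was reached from a search results page.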

Jan 13 2020, 4:35 PM · Product-Analytics, Analytics-Kanban, Research, Analytics

Jan 6 2020

lexnasser added a comment to T239625: Improve quality of external referer data.

Hi @Isaac, I got to final testing and found an issue.

Jan 6 2020, 10:36 PM · Product-Analytics, Analytics-Kanban, Research, Analytics

Jan 3 2020

lexnasser added a comment to T239625: Improve quality of external referer data.
  1. Quick Status Update
Jan 3 2020, 7:56 AM · Product-Analytics, Analytics-Kanban, Research, Analytics

Dec 21 2019

lexnasser created T241295: Unusual unknown external webrequest referers.
Dec 21 2019, 9:10 AM · Analytics

Dec 13 2019

lexnasser added a comment to T225538: Request for a large request data set for caching research and tuning.

Thanks for the question! Nuria is more aware of the intricacies of the source of the data than I, but I believe the main other factor that limits the amount of data from the text cache is that the text data is filtered by is_pageview.

Dec 13 2019, 7:51 AM · Analytics-Kanban, Analytics

Dec 9 2019

lexnasser added a comment to T239625: Improve quality of external referer data.

Started looking into the referer class, and had a few questions:

Dec 9 2019, 5:08 PM · Product-Analytics, Analytics-Kanban, Research, Analytics

Dec 8 2019

ema awarded T225538: Request for a large request data set for caching research and tuning a Like token.
Dec 8 2019, 9:25 AM · Analytics-Kanban, Analytics

Dec 5 2019

lexnasser added a comment to T225538: Request for a large request data set for caching research and tuning.

The data has been released!

Dec 5 2019, 10:49 PM · Analytics-Kanban, Analytics

Dec 2 2019

lexnasser updated subscribers of T225538: Request for a large request data set for caching research and tuning.

Updated Wikitech (LINK) once again with a description about the text data. Let me know if you see any last-minute issues.

Dec 2 2019, 10:28 PM · Analytics-Kanban, Analytics

Nov 21 2019

lexnasser added a comment to T225538: Request for a large request data set for caching research and tuning.

Checking in again.

Nov 21 2019, 7:55 PM · Analytics-Kanban, Analytics

Nov 12 2019

lexnasser added a comment to T225538: Request for a large request data set for caching research and tuning.

I'm not sure if there's a public-facing way to check the frequency of submit queries. Will have to defer to @Nuria about that.

Nov 12 2019, 5:43 PM · Analytics-Kanban, Analytics
lexnasser added a comment to T225538: Request for a large request data set for caching research and tuning.

Is this for a text_cache or for an upload_cache (like cp5006)? I expect that only text caches (like cp5008) would see submit queries.

Nov 12 2019, 5:01 PM · Analytics-Kanban, Analytics
lexnasser added a comment to T225538: Request for a large request data set for caching research and tuning.

The only difference would be the save column, which is 1 if uri_query %like% "action=submit" and 0 otherwise.

The upload(.wikimedia.org) uri_query field does not contain an action=submit parameter for any entry.
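
The save column described above can be sketched as follows. This assumes that the %like% test amounts to a plain substring check on uri_query; the class SaveFlagDemo, the method saveFlag, and the sample query strings are hypothetical and are not taken from the actual dataset code.

```java
final class SaveFlagDemo {
    // Returns 1 when the query string contains action=submit, otherwise 0.
    // A null query is treated as "no submit action" (an assumption).
    static int saveFlag(String uriQuery) {
        return (uriQuery != null && uriQuery.contains("action=submit")) ? 1 : 0;
    }

    public static void main(String[] args) {
        // Made-up query strings for illustration only.
        System.out.println(saveFlag("?title=Example&action=submit"));
        System.out.println(saveFlag("?title=Example&action=history"));
        System.out.println(saveFlag(""));
    }
}
```

Because no upload entry carries an action=submit parameter, a flag computed this way would be 0 for every row of the upload dataset.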

Nov 12 2019, 4:47 PM · Analytics-Kanban, Analytics

Nov 11 2019

lexnasser added a comment to T225538: Request for a large request data set for caching research and tuning.

> Are we narrowing the query to a single server, e.g., via WHERE x_cache like '%cp3033%'?

Yes. I’m using WHERE x_cache like '%cp5006%'.

> Which server are we using? Ideally we'd actually create two datasets, one for a cache_text and one for a cache_upload server, but since the ATS deployment (replacing Varnish) I can't figure out the right x_cache query.

As above, I’m using 5006, which is for images only via upload.wikimedia.org.

> I'm afraid that we'll have too much data, as Nuria previously pointed out. The x_cache field is one of the largest, we had this in the last dataset and no researcher / paper (afaik) used it. I think we can drop the x_cache column in the output (but keep it in the where clause).

To confirm, the remaining fields are: relative_unix, hashed_host_path_query, image_type, response_size, time_firstbyte. Is that proper?

> How are we limiting the response size? It would be great to cover a longer time (say 4 weeks) period.

I’m not sure what you mean by limiting the response size - I currently have no filters on the response size. I’ll have to consider the longer time period.

Nov 11 2019, 3:54 PM · Analytics-Kanban, Analytics

Nov 7 2019

lexnasser added a comment to T225538: Request for a large request data set for caching research and tuning.

Also, I saw in your 2016 dataset request (link) that you wanted a separate query field for a save flag.

Nov 7 2019, 9:16 PM · Analytics-Kanban, Analytics

Nov 6 2019

lexnasser added a comment to T225538: Request for a large request data set for caching research and tuning.

Hi @Danielsberger,
I'm almost finished compiling the data. This is what the dataset would look like:

Nov 6 2019, 9:08 PM · Analytics-Kanban, Analytics

Nov 1 2019

lexnasser added a comment to T225538: Request for a large request data set for caching research and tuning.

Hi @Danielsberger, thanks for the thorough response. I'm currently reviewing all the different configurations of the features of the dataset and will try to accommodate your needs as much as practical. And yes, the underlying timestamp uses second-granularity.

Nov 1 2019, 11:41 PM · Analytics-Kanban, Analytics

Oct 28 2019

lexnasser added a comment to T225538: Request for a large request data set for caching research and tuning.

Hi @Danielsberger, I’m working on compiling this new public dataset for your caching research. I had a few questions that I hope you could answer so that I could get a better understanding of your specific wants and needs for this new release:

Oct 28 2019, 8:48 PM · Analytics-Kanban, Analytics

Oct 23 2019

lexnasser added a comment to T235688: SSH access for Lex Nasser, analytics intern.

Here's another public ED25519 key: AAAAC3NzaC1lZDI1NTE5AAAAIOBTDDmL8isvso6xqOJB5qkk3n8xuM0XxFc1Q34ZnZRj

Oct 23 2019, 4:56 PM · Analytics, Operations, SRE-Access-Requests

Oct 18 2019

lexnasser added a comment to T235688: SSH access for Lex Nasser, analytics intern.

@RStallman-legalteam Just sent an email

Oct 18 2019, 9:07 PM · Analytics, Operations, SRE-Access-Requests
lexnasser updated lexnasser.
Oct 18 2019, 3:16 PM

Oct 17 2019

lexnasser added a comment to T235688: SSH access for Lex Nasser, analytics intern.

@Dzahn It shows that I signed L3 Wednesday (image attached). Let me know if I am mistaken. Will have to defer to @Nuria regarding which groups.

Oct 17 2019, 11:19 PM · Analytics, Operations, SRE-Access-Requests

Oct 16 2019

lexnasser updated the task description for T235688: SSH access for Lex Nasser, analytics intern.
Oct 16 2019, 6:15 PM · Analytics, Operations, SRE-Access-Requests
lexnasser added a comment to T235688: SSH access for Lex Nasser, analytics intern.

Approving as the relevant Wikimedia Foundation employee.

Oct 16 2019, 6:03 PM · Analytics, Operations, SRE-Access-Requests
lexnasser claimed T225538: Request for a large request data set for caching research and tuning.
Oct 16 2019, 5:51 PM · Analytics-Kanban, Analytics