
Iflorez (Irene)
User

User Details

User Since
May 14 2019, 10:44 PM (76 w, 1 d)
Availability
Available
LDAP User
Iflorez
MediaWiki User
IFlorez (WMF)

Recent Activity

Aug 4 2020

Iflorez added a comment to T255028: Move the stat1004-6-7 hosts to Debian Buster.

Thank you, @elukey !

Aug 4 2020, 4:43 PM · Analytics-Kanban, Analytics-Clusters

Jun 18 2020

Iflorez closed T235889: Pull 2018 Hindi and Malayalam data for Project Tiger and Wiki Asia Month as Resolved.
Jun 18 2020, 10:08 PM · GLOW
Iflorez closed T235894: Survey GLOW project hosts (country level) for total monthly avg devices accessing Wikipedia as Resolved.
Jun 18 2020, 10:07 PM · GLOW
Iflorez closed T235888: Define evaluation plan for GLOW project as Resolved.
Jun 18 2020, 10:07 PM · GLOW
Iflorez closed T233804: Measure Tiger 1.0 for Punjabi against other community campaigns as Resolved.
Jun 18 2020, 10:07 PM · GLOW
Iflorez closed T233808: Discuss the use of Wikidata items in articles tracking as Resolved.
Jun 18 2020, 10:06 PM · GLOW
Iflorez closed T238938: GLOW Team access to superset as Resolved.
Jun 18 2020, 10:05 PM · GLOW
Iflorez closed T233806: Prepare a preliminary metric plan for GLOW , a subtask of T235888: Define evaluation plan for GLOW project, as Resolved.
Jun 18 2020, 9:58 PM · GLOW
Iflorez closed T233806: Prepare a preliminary metric plan for GLOW as Resolved.
Jun 18 2020, 9:58 PM · GLOW
Iflorez closed T237581: external search engine traffic CTR on superset? as Resolved.
Jun 18 2020, 9:57 PM · GLOW
Iflorez closed T249077: Measure number of surviving new GLOW Project Tiger 2 articles also submitted to other contests as Resolved.
Jun 18 2020, 9:57 PM · GLOW
Iflorez closed T245543: Identify GLOW articles created or edited with the translation tool as Resolved.
Jun 18 2020, 9:57 PM · GLOW
Iflorez closed T247568: Measure how many entries were picked from the list of suggestions as Resolved.
Jun 18 2020, 9:57 PM · GLOW
Iflorez closed T248140: Collect content quality metrics for articles submitted in GLOW Project Tiger 2.0 contest as Resolved.
Jun 18 2020, 9:57 PM · GLOW

Jun 2 2020

Iflorez added a comment to T249752: Decomission notebook hosts .

I deleted all files on nb3 and shut down the server.
I rsynced all files from nb4 and shut down the server.
Thank you!

Jun 2 2020, 3:02 AM · Analytics-Kanban, Analytics-Clusters, Patch-For-Review

May 21 2020

Iflorez added a comment to T234701: "Content" equivalent of pageviews daily or edits_hourly available to use in Turnilo and Superset.

These were interesting and helpful metrics to review for GLOW India articles:

  • Namespace (or just main/not main?)
  • Project
  • Age
  • Number of editors
  • Number of edits
  • Length/size
  • Number of watchers
  • Time since last edit
  • Links

May 21 2020, 6:43 PM · Epic, Product-Analytics

May 14 2020

Iflorez added a comment to T249752: Decomission notebook hosts .

Hi @elukey I'll transfer files and shut down notebooks over the next few days. I'll check in on Tuesday with an update or questions if any.
Thank you!

May 14 2020, 5:19 PM · Analytics-Kanban, Analytics-Clusters, Patch-For-Review

Apr 21 2020

Iflorez added a comment to T247768: Code review: Review rec list results.

thank you @nettrom_WMF! sorry for the delay

Apr 21 2020, 12:39 AM · Product-Analytics (Kanban), GLOW

Apr 1 2020

Iflorez updated the task description for T247768: Code review: Review rec list results.
Apr 1 2020, 5:09 AM · Product-Analytics (Kanban), GLOW
Iflorez added a comment to T249077: Measure number of surviving new GLOW Project Tiger 2 articles also submitted to other contests.

Loading neighboring contest articles:
https://github.com/IreneFlorez/GLOW/blob/article_suggestions/scripts/data_wrangling/1d_load_neighboring_contests.ipynb

Apr 1 2020, 1:48 AM · GLOW
Iflorez created T249077: Measure number of surviving new GLOW Project Tiger 2 articles also submitted to other contests.
Apr 1 2020, 1:45 AM · GLOW

Mar 27 2020

Iflorez added a comment to T248140: Collect content quality metrics for articles submitted in GLOW Project Tiger 2.0 contest.

Data wrangling code to pull the items in this task can be found here: https://github.com/IreneFlorez/GLOW/tree/article_suggestions/scripts/data_wrangling

Mar 27 2020, 10:49 PM · GLOW
Iflorez updated the task description for T248140: Collect content quality metrics for articles submitted in GLOW Project Tiger 2.0 contest.
Mar 27 2020, 10:48 PM · GLOW

Mar 24 2020

Iflorez added a comment to T245543: Identify GLOW articles created or edited with the translation tool .

articles that were edited using a translation tool (by type):
expanded 113 (expanded total 1418)
new 3602 (new total 7445)


Expanded articles edited using a translation tool:
7.96%
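
The percentage above follows from the counts in this comment. Here is a minimal sketch of the arithmetic, using only the four numbers reported here. The function and variable names are illustrative and are not taken from the GLOW scripts.

```python
def share(part: int, total: int) -> float:
    """Return part/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total

# Counts as reported in the comment above.
expanded_with_tool, expanded_total = 113, 1418
new_with_tool, new_total = 3602, 7445

expanded_share = share(expanded_with_tool, expanded_total)
new_share = share(new_with_tool, new_total)

print(f"expanded: {expanded_share:.2f}%")
print(f"new: {new_share:.2f}%")
```

Rounded to two decimals, the expanded share comes to about 7.97%, which matches the 7.96% quoted above if that figure was truncated rather than rounded. The same calculation for new articles gives roughly 48.4%.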

Mar 24 2020, 5:05 PM · GLOW

Mar 20 2020

Iflorez created T248140: Collect content quality metrics for articles submitted in GLOW Project Tiger 2.0 contest.
Mar 20 2020, 12:44 AM · GLOW

Mar 18 2020

Iflorez updated the task description for T247768: Code review: Review rec list results.
Mar 18 2020, 4:19 PM · Product-Analytics (Kanban), GLOW
Iflorez added a comment to T247768: Code review: Review rec list results.

@mpopov maybe the faulty link was related to a bug? I'm receiving bug reports related to this ticket. Would it make sense to create a new ticket?

Mar 18 2020, 4:08 PM · Product-Analytics (Kanban), GLOW
Iflorez added a comment to T247768: Code review: Review rec list results.

Sorry about that, I just updated the ticket with a functional task link.

Mar 18 2020, 4:06 PM · Product-Analytics (Kanban), GLOW
Iflorez updated the task description for T247768: Code review: Review rec list results.
Mar 18 2020, 4:04 PM · Product-Analytics (Kanban), GLOW
Iflorez updated the task description for T247768: Code review: Review rec list results.
Mar 18 2020, 3:53 PM · Product-Analytics (Kanban), GLOW

Mar 16 2020

Iflorez added a comment to T245373: Optimization tips and feedback.

In an effort to run these queries from a Python3 notebook without needing to change the notebook type, I've switched these queries to run as Spark queries using the wmfdata package's spark.run function. I'm now able to run the queries. For example, here's the code for the translation query:

Mar 16 2020, 6:12 PM · Analytics-Radar, GLOW
Iflorez added a comment to T245373: Optimization tips and feedback.

Thank you. Yes, I can confirm that I had run kinit and entered my kerberos credentials in a notebook-terminal.

Mar 16 2020, 4:13 PM · Analytics-Radar, GLOW
Iflorez created T247768: Code review: Review rec list results.
Mar 16 2020, 3:59 PM · Product-Analytics (Kanban), GLOW
Iflorez added a comment to T245373: Optimization tips and feedback.

@JAllemandou I tried running these Spark queries over the weekend on a small batch of articles and they timed out.
Might you have tips or insights? I didn't receive any error messages; the queries simply took a very long time, and eventually I stopped the kernel.
Given that behavior, I also tried running the queries as Hive queries and had similar issues.

Mar 16 2020, 3:53 PM · Analytics-Radar, GLOW
Iflorez added a project to T245373: Optimization tips and feedback: GLOW.
Mar 16 2020, 12:54 AM · Analytics-Radar, GLOW
Iflorez updated the task description for T245543: Identify GLOW articles created or edited with the translation tool .
Mar 16 2020, 12:45 AM · GLOW

Mar 12 2020

Iflorez updated the task description for T247568: Measure how many entries were picked from the list of suggestions.
Mar 12 2020, 9:46 PM · GLOW
Iflorez updated the task description for T247568: Measure how many entries were picked from the list of suggestions.
Mar 12 2020, 9:45 PM · GLOW
Iflorez updated the task description for T247568: Measure how many entries were picked from the list of suggestions.
Mar 12 2020, 9:45 PM · GLOW
Iflorez added a comment to T247568: Measure how many entries were picked from the list of suggestions.

Total values in full rec list: 34295


total recs in translation: 14155
total recs in editing: 20102


Mar 12 2020, 9:44 PM · GLOW
Iflorez created T247568: Measure how many entries were picked from the list of suggestions.
Mar 12 2020, 9:35 PM · GLOW

Feb 18 2020

Iflorez updated the task description for T245543: Identify GLOW articles created or edited with the translation tool .
Feb 18 2020, 6:14 PM · GLOW
Iflorez created T245543: Identify GLOW articles created or edited with the translation tool .
Feb 18 2020, 5:52 PM · GLOW
Iflorez added a comment to T245375: Issues running revactor_rev when joining page, revision, revision_actor_temp, and actor tables.

Thank you for the quick replies and the feedback re: EXPLAIN and the group by line.

Feb 18 2020, 5:45 PM · Platform Team Workboards (Clinic Duty Team), GLOW
Iflorez closed T245375: Issues running revactor_rev when joining page, revision, revision_actor_temp, and actor tables as Resolved.
Feb 18 2020, 5:41 PM · Platform Team Workboards (Clinic Duty Team), GLOW

Feb 17 2020

Iflorez added a comment to T245373: Optimization tips and feedback.

Thank you for the feedback!
I've updated the date handling in the pageviews query and added event_entity, revision_is_identity_reverted, and revision_is_deleted_by_page_deletion to the fields used in the revision tags mediawiki_history table query.
These are performing much better now.
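
As a rough illustration of how those three fields might appear in a mediawiki_history query, here is a sketch of a Python helper that only builds the SQL string. Only the three field names come from the comment above. The table name wmf.mediawiki_history, the snapshot and wiki_db values, the selected columns, the choice to use the fields as filters rather than selected columns, and the helper name are all assumptions for illustration.

```python
def build_revision_tags_query(snapshot: str, wiki_db: str) -> str:
    """Build a SQL string selecting revision tags for live, non-reverted revisions."""
    # Basic guard so the values are safe to interpolate into the string.
    for value in (snapshot, wiki_db):
        if not value.replace("-", "").replace("_", "").isalnum():
            raise ValueError(f"unexpected characters in {value!r}")
    return f"""
SELECT page_id, revision_id, revision_tags
FROM wmf.mediawiki_history
WHERE snapshot = '{snapshot}'
  AND wiki_db = '{wiki_db}'
  AND event_entity = 'revision'
  AND NOT revision_is_identity_reverted
  AND NOT revision_is_deleted_by_page_deletion
""".strip()

# Hypothetical snapshot and wiki values, for illustration only.
query = build_revision_tags_query("2020-01", "hiwiki")
print(query)
```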

Feb 17 2020, 10:18 PM · Analytics-Radar, GLOW

Feb 16 2020

Iflorez added a project to T245375: Issues running revactor_rev when joining page, revision, revision_actor_temp, and actor tables: GLOW.
Feb 16 2020, 10:51 PM · Platform Team Workboards (Clinic Duty Team), GLOW
Iflorez created T245375: Issues running revactor_rev when joining page, revision, revision_actor_temp, and actor tables.
Feb 16 2020, 10:51 PM · Platform Team Workboards (Clinic Duty Team), GLOW
Iflorez created T245373: Optimization tips and feedback.
Feb 16 2020, 10:46 PM · Analytics-Radar, GLOW

Feb 14 2020

Iflorez added a comment to T244176: Request LDAP access to the WMF group for Edna M.

Hi All! I work with Edna on the Partnerships team.
I am a Partnerships data analyst focusing on analyzing the data coming out of the GLOW project.
Edna and I are on the Partnerships and Global Reach team; she focuses on the Latin America region.

Feb 14 2020, 6:14 PM · LDAP-Access-Requests, Operations

Feb 7 2020

Iflorez moved T238938: GLOW Team access to superset from Backlog to Done on the GLOW board.
Feb 7 2020, 10:29 PM · GLOW

Feb 4 2020

Iflorez updated the task description for T238938: GLOW Team access to superset.
Feb 4 2020, 5:57 PM · GLOW

Jan 24 2020

Iflorez added a comment to T243197: Make wiki comparison tool public.

Hi all,
Aeryn Palmer from legal has taken a look and now this is on to James Fishback for the second part of the new privacy review process, a security review.
According to Aeryn, much of this should be okay to publish, although we'd need to exclude very small wikis. Security may help identify that threshold.

Jan 24 2020, 7:18 PM · Product-Analytics (Kanban)

Jan 22 2020

Iflorez added a comment to T158889: Explore getting articles ranked by sitelinks before anything else.

Hi @JAllemandou is this now possible via spark? I'm querying wikidata for GLOW analysis and wondering if there's an update on the hadoop version of wikidata that I should consider or keep in mind. At present I'm setting up SPARQL queries via the SWAP notebooks.

Jan 22 2020, 7:10 PM · Recommendation-API
Iflorez updated the task description for T238938: GLOW Team access to superset.
Jan 22 2020, 4:35 PM · GLOW

Jan 21 2020

Iflorez added a comment to T243197: Make wiki comparison tool public.

I sent out a request to legal on Thu, Jan 9. I will check in on that request today and will post updates here.

Jan 21 2020, 6:00 PM · Product-Analytics (Kanban)

Jan 18 2020

Iflorez added a comment to T243103: Request for LDAP access to the WMF group for Rudolph Ampofo.

We were told that for Superset access he needed to create a ticket using T160662 as a sample.

@Iflorez: Might be better to bookmark or document https://phabricator.wikimedia.org/project/profile/1564/ in your team's onboarding docs instead of that ticket. :)

Jan 18 2020, 1:34 AM · SRE-Access-Requests, LDAP-Access-Requests, Operations
Iflorez added a comment to T243103: Request for LDAP access to the WMF group for Rudolph Ampofo.

Hoorah! Thank you @Dzahn!

Jan 18 2020, 1:33 AM · SRE-Access-Requests, LDAP-Access-Requests, Operations
Iflorez added a comment to T221566: Update and fix wiki segmentation dataset.

@nshahquinn-wmf The wiki comparison sheet is now updated with Dec 2019 data.
You can now add formatting magic :)

Jan 18 2020, 1:25 AM · Product-Analytics (Kanban), Better Use Of Data, Epic

Jan 17 2020

Iflorez added a comment to T242490: Viewing Santali and Javanese characters on SWAP via Chrome only displays Tofu signs.

Hi @Aklapper, on my screen, I see tofu characters for the characters on the Javanese script page, essentially just boxes.

Jan 17 2020, 11:07 PM · Analytics-SWAP, Analytics, GLOW
Iflorez added a comment to T243103: Request for LDAP access to the WMF group for Rudolph Ampofo.

Hi all,
I am a data analyst working on GLOW. Here's the GLOW project phab board and the meta page.
I work with Rudolph on the Partnerships and Global Reach team. Rudolph is requesting access to Superset. He is not requesting additional Analytics access. We were told that for Superset access he needed to create a ticket using T160662 as a sample.

Jan 17 2020, 9:36 PM · SRE-Access-Requests, LDAP-Access-Requests, Operations

Jan 11 2020

Iflorez added a comment to T240890: Make wmfdata work with Kerberos.

I see, thank you. Your clarification is helpful.

Jan 11 2020, 1:03 AM · wmfdata-python, Product-Analytics (Kanban)
Iflorez created T242490: Viewing Santali and Javanese characters on SWAP via Chrome only displays Tofu signs.
Jan 11 2020, 12:51 AM · Analytics-SWAP, Analytics, GLOW

Jan 10 2020

Iflorez moved T242448: Particular MariaDB queries not working on SWAP from Backlog to In Progress on the GLOW board.
Jan 10 2020, 7:08 PM · wmfdata-python, Product-Analytics (Kanban), GLOW
Iflorez added a project to T242448: Particular MariaDB queries not working on SWAP: Analytics.
Jan 10 2020, 7:08 PM · wmfdata-python, Product-Analytics (Kanban), GLOW
Iflorez updated subscribers of T242448: Particular MariaDB queries not working on SWAP.
Jan 10 2020, 7:07 PM · wmfdata-python, Product-Analytics (Kanban), GLOW
Iflorez added a comment to T240890: Make wmfdata work with Kerberos.

Given the above suggestion, I propose a longer timer.
I stop the Spark session at the end of the day or when I finish the task, but I have had periods where I need to spend, say, 30 minutes reading documentation to address an issue before proceeding with the task at hand in the Spark session, so I could use a longer timer.

Jan 10 2020, 6:57 PM · wmfdata-python, Product-Analytics (Kanban)
Iflorez updated the task description for T242448: Particular MariaDB queries not working on SWAP.
Jan 10 2020, 6:20 PM · wmfdata-python, Product-Analytics (Kanban), GLOW
Iflorez created T242448: Particular MariaDB queries not working on SWAP.
Jan 10 2020, 6:18 PM · wmfdata-python, Product-Analytics (Kanban), GLOW

Jan 9 2020

Iflorez closed T241170: Access to DataGrip refused as Resolved.
Jan 9 2020, 8:32 PM · User-Elukey, Analytics, GLOW
Iflorez added a comment to T241170: Access to DataGrip refused.

Thank you @elukey
I appreciate your troubleshooting and assessment

Jan 9 2020, 8:31 PM · User-Elukey, Analytics, GLOW

Jan 7 2020

Iflorez updated the task description for T238938: GLOW Team access to superset.
Jan 7 2020, 5:14 PM · GLOW

Jan 6 2020

Iflorez added a comment to T241170: Access to DataGrip refused.


No luck with the updated URL

Jan 6 2020, 11:50 PM · User-Elukey, Analytics, GLOW
Iflorez added a comment to T238560: Doubts and questions about Kerberos and Hadoop.

Thank you @Nuria!
I've been using notebooks along with the wmfdata package.
I aim to shift to using notebooks with Spark and will reach out if I have any issues. Rereading the SWAP documentation is helpful.

Jan 6 2020, 11:42 PM · Analytics
Iflorez added a comment to T90240: Could it be that the geo IP matching is not accurate for Africa?.

Is this still an ongoing issue? Is this something to keep in mind for project GLOW when we begin evaluating Nigeria specific data?

Jan 6 2020, 10:20 PM · Analytics
Iflorez updated the task description for T238938: GLOW Team access to superset.
Jan 6 2020, 10:07 PM · GLOW
Iflorez updated the task description for T238938: GLOW Team access to superset.
Jan 6 2020, 10:06 PM · GLOW
Iflorez added a comment to T238560: Doubts and questions about Kerberos and Hadoop.

I'm experiencing the same as @MMiller_WMF . Hue was working fine on Friday and today it's funky.
Today, I'm unable to see any tables and I am seeing error messages. On the left sidebar, the error message says "Error loading databases."
On the top right, I am intermittently getting this error message in red font:
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
If I already have a correct/working/functional query, I can paste it in and get results but I cannot unfurl the database icon to see any tables. As DataGrip is not working, I'm hoping to use Hue to test out queries and see what tables are available. To be sure, Hue has been problematic for me (there are some tables that it never shows me)...so if we can ultimately get DataGrip to work, I will be prioritizing that tool.

Jan 6 2020, 9:54 PM · Analytics
Iflorez updated the task description for T238938: GLOW Team access to superset.
Jan 6 2020, 6:24 PM · GLOW
Iflorez updated the task description for T238938: GLOW Team access to superset.
Jan 6 2020, 6:24 PM · GLOW
Iflorez updated the task description for T238938: GLOW Team access to superset.
Jan 6 2020, 6:21 PM · GLOW
Iflorez updated the task description for T238938: GLOW Team access to superset.
Jan 6 2020, 6:19 PM · GLOW
Iflorez updated the task description for T238938: GLOW Team access to superset.
Jan 6 2020, 6:14 PM · GLOW
Iflorez updated the task description for T238938: GLOW Team access to superset.
Jan 6 2020, 6:10 PM · GLOW

Dec 24 2019

Iflorez added a comment to T241170: Access to DataGrip refused.

Hi @elukey,
No luck with the steps included in that link. I went through all of the steps listed for DataGrip. Here's a screenshot that was taken after I tested the connection.

Dec 24 2019, 12:13 AM · User-Elukey, Analytics, GLOW

Dec 20 2019

Iflorez updated subscribers of T241170: Access to DataGrip refused.

Update: Worked with @elukey just now to gauge the issue.
Per the DataGrip documentation, the URL is filled in automatically and should look like jdbc:hive2://localhost:10000/default
To see if it would fix the issue, we tried updating the URL to jdbc:hive2://localhost:10000/default;principal=hive/an-coord1001.eqiad.wmnet@WIKIMEDIA
The connection test failed with the updated URL.
With the updated URL that we tested, DataGrip now doesn't even let me try to execute a query...the buttons outside of the config window are 'frozen'. A restart did not fix the freeze.
We will touch base about this next week.
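
The two URL forms discussed in this comment differ only in the ;principal=... suffix. Here is a small sketch that assembles both. The helper name and signature are hypothetical. It only builds strings; it does not open a connection or indicate which form DataGrip needs.

```python
from typing import Optional

def hive_jdbc_url(host: str, port: int, database: str,
                  principal: Optional[str] = None) -> str:
    """Build a HiveServer2 JDBC URL, optionally with a Kerberos principal."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        # Session variables follow the database, separated by semicolons.
        url += f";principal={principal}"
    return url

default_url = hive_jdbc_url("localhost", 10000, "default")
kerberos_url = hive_jdbc_url(
    "localhost", 10000, "default",
    principal="hive/an-coord1001.eqiad.wmnet@WIKIMEDIA",
)
print(default_url)
print(kerberos_url)
```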

Dec 20 2019, 6:06 PM · User-Elukey, Analytics, GLOW
Iflorez updated the task description for T241170: Access to DataGrip refused.
Dec 20 2019, 5:42 PM · User-Elukey, Analytics, GLOW

Dec 19 2019

Iflorez added a project to T241170: Access to DataGrip refused: GLOW.
Dec 19 2019, 8:22 PM · User-Elukey, Analytics, GLOW
Iflorez created T241170: Access to DataGrip refused.
Dec 19 2019, 8:22 PM · User-Elukey, Analytics, GLOW

Dec 16 2019

Iflorez updated the task description for T240253: Get historical article counts for GLOW wikis.
Dec 16 2019, 6:51 PM · GLOW
Iflorez closed T240253: Get historical article counts for GLOW wikis as Resolved.
Dec 16 2019, 1:43 AM · GLOW
Iflorez added a comment to T240253: Get historical article counts for GLOW wikis.
#consider https://meta.wikimedia.org/wiki/List_of_Wikipedias/Table to be the ultimate source of truth on wiki counts
Dec 16 2019, 1:43 AM · GLOW
Iflorez added a comment to T221566: Update and fix wiki segmentation dataset.
In T221566#5739287, @Neil_P._Quinn_WMF wrote:

Data Issues:
#2 - yes, 2017 readership metrics do have identical data to the June 2018 tab. I'm looking into this. For now, can you please remove that sheet from the Wiki Comparison notebook? No dates were hardcoded into the individual queries in the code repository. Instead, there's one cell at the top that defines dates. So I'm not sure how this came about, it seems unusual that one section has different data variables than another...this definitely needs to be fixed.

Yeah, I had seen that in the notebook; obviously there is something weird and unexpected going on! But agreed that removing it for now (as you've already done) is the best solution.

Anyway, I think we're actually pretty close to good! What do you think about these ideas:

  • I can reformat the Nov 2019 tab as I did with the Dec 2018 one.

excellent! Again, I highly recommend recording a macro as it should just take two clicks and then can be deployed with a shortcut on the next sheet. The process would be clicking

tools > macros > Record macros

and sheets will put your actions into code that it saves as a macro.
but, of course, it's up to you :)

Dec 16 2019, 1:38 AM · Product-Analytics (Kanban), Better Use Of Data, Epic
Iflorez added a comment to T221566: Update and fix wiki segmentation dataset.

About the data Issues:
#1 I will need a little more insight into gathering cumulative content edits and content pages with the API. For historical cumulative content pages, the Wikistats 2.0 API notes that the total article count can be derived from the Pages Created count by taking its cumulative value. However, I'm not clear on how this addresses deleted content. I looked at this in T240253 for GLOW India and ended up pulling data from https://meta.wikimedia.org/wiki/List_of_Wikipedias/Table. I would appreciate more detail on how to pull accurate data for these two measures. In the short term, I recommend having blank columns for any historical-year wiki comparison snapshots. For example, if we decide to create a 2017 tab, it would have blank columns for cumulative content edits and content pages.
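
To make the deleted-content question concrete, here is a minimal sketch of the cumulative calculation with and without a deletion adjustment. The monthly numbers are made up for illustration and are not Wikistats data. Whether the Pages Created metric already excludes deleted pages is not stated here, so the adjustment is only one possible reading.

```python
from itertools import accumulate

# Illustrative monthly counts only; these are NOT real Wikistats figures.
pages_created = [120, 95, 140, 110]
pages_deleted = [10, 5, 20, 0]

# Cumulative pages created, as described for the Wikistats 2.0 API.
cumulative_created = list(accumulate(pages_created))

# One possible adjustment: subtract deletions each month before accumulating.
net_new = [c - d for c, d in zip(pages_created, pages_deleted)]
cumulative_net = list(accumulate(net_new))

print(cumulative_created)  # [120, 215, 355, 465]
print(cumulative_net)      # [110, 200, 320, 430]
```

The gap between the two series grows with every deletion, which is why a plain cumulative sum of created pages could overstate the article count.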

Dec 16 2019, 1:23 AM · Product-Analytics (Kanban), Better Use Of Data, Epic

Dec 12 2019

Iflorez added a comment to T221566: Update and fix wiki segmentation dataset.

Data Issues:
#2 - yes, 2017 readership metrics do have identical data to the June 2018 tab. I'm looking into this. For now, can you please remove that sheet from the Wiki Comparison notebook? No dates were hardcoded into the individual queries in the code repository. Instead, there's one cell at the top that defines dates. So I'm not sure how this came about, it seems unusual that one section has different data variables than another...this definitely needs to be fixed.

Dec 12 2019, 5:21 PM · Product-Analytics (Kanban), Better Use Of Data, Epic

Dec 11 2019

Iflorez added a comment to T221566: Update and fix wiki segmentation dataset.

Thank you @Neil_P._Quinn_WMF

Dec 11 2019, 5:37 PM · Product-Analytics (Kanban), Better Use Of Data, Epic

Dec 9 2019

Iflorez updated the task description for T240253: Get historical article counts for GLOW wikis.
Dec 9 2019, 8:27 PM · GLOW
Iflorez updated the task description for T240253: Get historical article counts for GLOW wikis.
Dec 9 2019, 8:27 PM · GLOW
Iflorez updated the task description for T240253: Get historical article counts for GLOW wikis.
Dec 9 2019, 8:26 PM · GLOW
Iflorez updated the task description for T240253: Get historical article counts for GLOW wikis.
Dec 9 2019, 8:05 PM · GLOW