User Details
- User Since
- Jan 6 2022, 11:29 AM (74 w, 1 d)
- Availability
- Available
- LDAP User
- Snwachukwu
- MediaWiki User
- Unknown
Wed, Jun 7
Thank you @hashar
Wed, May 24
I am currently getting the error below when running refine jobs with Spark 3. I have not found a solution yet, but will update once I do.
Tue, May 16
Thu, May 11
May 10 2023
Apr 17 2023
Apr 14 2023
Apr 13 2023
I got a similar error when deploying analytics refinery:
Apr 12 2023
Apr 11 2023
Apr 6 2023
Apr 5 2023
Mar 30 2023
Mar 21 2023
Feb 22 2023
Feb 21 2023
Feb 16 2023
Feb 15 2023
Feb 14 2023
Feb 13 2023
See wikitech documentation here.
Feb 9 2023
Here is a Google doc documenting the data loss.
Feb 8 2023
Feb 6 2023
Regarding the new column, I would like suggestions on what to name the new field. I am thinking of referer_data. Does anyone have a better name?
@Mayakp.wiki I ran an analysis of the UDF that will populate the new field and posted the results in the parent ticket T309769, where there is a comment thread on it.
Jan 31 2023
Jan 26 2023
I ran the UDF on a day's data and extracted the top 1000 referers for that day to show the impact of GetRefererDataUDF on referers. You can check the spreadsheet and a short doc on it.
Jan 23 2023
Traffic: can you please confirm that there were cases of pages served in eqsin but not reported in webrequest logs?
@taavi Done.
@bd808 and @Platonides, I now have access to the cloud bastion. Here is the result.
Jan 19 2023
@Platonides Here is the result when I run on a production host.
@Mayakp.wiki We are introducing a new column of struct type to the wmf.webrequest table. It will contain the same data as the existing referer_class column, plus the referer's name. The referer_class column won't be removed now; it will only be removed after all downstream consumers have been changed.
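A minimal sketch of the transition described above, using plain Python dicts in place of table rows. The struct field names `referer_class` and `referer_name` inside the new column are assumptions for illustration, as is the helper `add_referer_data`; the column name `referer_data` is only the one proposed in this thread, not a confirmed schema.

```python
from typing import Optional


def add_referer_data(row: dict, referer_name: Optional[str]) -> dict:
    """Return a copy of `row` with a struct-like `referer_data` field added.

    The existing `referer_class` column is left in place so that downstream
    consumers keep working until they have been migrated.
    """
    new_row = dict(row)  # shallow copy; the input row is not modified
    new_row["referer_data"] = {
        # Hypothetical struct field names, chosen for illustration only.
        "referer_class": row["referer_class"],
        "referer_name": referer_name,
    }
    return new_row


row = {"referer_class": "external (search engine)"}
migrated = add_referer_data(row, "Google")
print(migrated["referer_class"])  # old column still present
print(migrated["referer_data"])   # new struct carries class and name
```

Keeping both fields side by side is what allows the old column to be dropped later without a coordinated cutover.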
Jan 18 2023
I have not SSHed into any Cloud or Toolforge instance before now. Is there another verification method?
Jan 17 2023
I have been unable to log in to my wikitech account and make an important edit because of this issue. I would appreciate any assistance, as this is urgent.
Jan 16 2023
Jan 12 2023
Jan 11 2023
@Aklapper I am unable to run 'ssh bastion.wmcloud.org' or 'ssh login.toolforge.org'.
Dec 23 2022
Dec 20 2022
- Indeed, this will alter the referer_class field: some rows previously labelled external will now be labelled external (media sites).
Dec 13 2022
In the current patch we have updated our referer classifier to include an "external (media sites)" class representing the list of sites to track. This is in addition to the previous classes: unknown, internal, external (search engine) and external. The classifier also identifies the name of the site if it is a search engine or a media site (e.g. YouTube, Facebook, etc.).
Next steps:
- Test for performance and optimise to include caching if necessary.
- Create a new UDF that identifies the names of search engines and media sites by using the referer classifier.
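The classification described above can be sketched roughly as follows. This is an illustration only, not the refinery implementation: the lookup tables, the list of internal domains, and the function name `classify_referer` are all assumptions, and matching on hostname labels is a simplification.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical lookup tables; the real lists live in the classifier patch.
SEARCH_ENGINES = {"google": "Google", "bing": "Bing", "duckduckgo": "DuckDuckGo"}
MEDIA_SITES = {"youtube": "YouTube", "facebook": "Facebook"}
# Assumed set of internal domains, for illustration only.
INTERNAL_DOMAINS = ("wikipedia.org", "wikimedia.org")


def classify_referer(referer: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (referer class, site name or None) for a referer URL."""
    if not referer:
        return ("unknown", None)
    try:
        host = urlparse(referer).hostname
    except ValueError:
        # urlparse raises on some malformed inputs (e.g. a broken IPv6 host).
        return ("unknown", None)
    if not host:
        return ("unknown", None)
    if any(host == d or host.endswith("." + d) for d in INTERNAL_DOMAINS):
        return ("internal", None)
    # Simplified matching: look for a known site among the hostname labels.
    for label in host.split("."):
        if label in SEARCH_ENGINES:
            return ("external (search engine)", SEARCH_ENGINES[label])
        if label in MEDIA_SITES:
            return ("external (media sites)", MEDIA_SITES[label])
    return ("external", None)


for url in ("https://www.google.com/search?q=x",
            "https://m.youtube.com/watch?v=1",
            "https://example.com/"):
    print(url, classify_referer(url))
```

Under this sketch, a YouTube referer that would previously have fallen through to external now gets the external (media sites) class, which is the change to referer_class noted in the thread.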
Nov 15 2022
Nov 14 2022
Nov 1 2022
Sep 30 2022
Sep 29 2022
Sep 8 2022
Sep 5 2022
There were 2 fixes made in this repo:
Aug 30 2022
Sure @JArguello-WMF, I can take it. I will sync with @BTullis.
Aug 29 2022
Aug 25 2022
Aug 24 2022
Aug 22 2022
The HdfsArchiver operator fails to run on Skein because Scala 2.12.10 is not yet installed on the workers. For now, Scala 2.12.10 is provided by the Spark 3 assembly, which can only be found on an-launcher.
Aug 18 2022
The Airflow DAGs have been updated to version 2.3.2. However, before this change can be merged, we'll need to:
- make a Puppet change to the airflow.cfg file.
- upgrade the Airflow deb used by all Airflow instances.
There will be separate tickets to track the steps above.