User Details
- User Since
- Jan 6 2022, 11:29 AM (126 w, 6 d)
- Availability
- Available
- LDAP User
- Snwachukwu
- MediaWiki User
- SNwachukwu (WMF)
Yesterday
I drafted a plan for deploying Eventgate-wikimedia and Eventstreams services to staging and production.
Tue, Jun 11
Thanks!
Fri, Jun 7
Sure. I will try to do that right away.
Tue, Jun 4
I have removed the non-PipelineLib repos from this task. Migrating those will be tracked in T366611: Migrate Data Engineering NodeJS library repos to GitLab.
Thu, May 30
Wed, May 29
@Jdforrester-WMF thanks for the response. How do we go about this? Do we submit a ticket to Release Engineering, or do we do it ourselves? Would you mind pointing me to a guide for the process while I confirm with the Data Engineering team that this is what we want?
Regarding node-rdkafka-statsd: @thcipriani, what does archive mean for Release Engineering? Will the repo be deleted, or will it be kept as read-only?
Thu, May 16
Wed, May 15
I have imported the following repos to GitLab:
May 13 2024
Apr 30 2024
Based on the MediaWiki History checker use case, the RowLevelSchemaValidator has some limitations that may prevent us from using it for our use cases:
- It only provides String, Integer, Decimal and Timestamp column definitions. We need other column definitions, such as Double and Map.
- Even if we decide to use the Decimal column definition, it does not support customising the maximum and minimum values of the column. In the MediaWiki History checker, the growth column needs to have a max and min value set.
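To illustrate the second limitation, here is a minimal plain-Python sketch of what a bounded decimal column definition could look like. It is not Deequ's API: the names `DecimalColumn` and `split_rows`, and the bounds used in the usage note, are hypothetical and only show the kind of min/max customisation the growth column would need.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass(frozen=True)
class DecimalColumn:
    """Hypothetical column definition: a decimal column with optional bounds."""
    name: str
    min_value: Optional[Decimal] = None
    max_value: Optional[Decimal] = None
    nullable: bool = False

    def is_valid(self, row: dict) -> bool:
        value = row.get(self.name)
        if value is None:
            return self.nullable
        try:
            number = Decimal(str(value))
        except InvalidOperation:
            # The value cannot be parsed as a decimal at all.
            return False
        if not number.is_finite():
            # Reject NaN and infinities before comparing against the bounds.
            return False
        if self.min_value is not None and number < self.min_value:
            return False
        if self.max_value is not None and number > self.max_value:
            return False
        return True


def split_rows(rows, columns):
    """Return (valid_rows, invalid_rows), checking each row against every column."""
    valid, invalid = [], []
    for row in rows:
        target = valid if all(c.is_valid(row) for c in columns) else invalid
        target.append(row)
    return valid, invalid
```

For example, `split_rows(rows, [DecimalColumn("growth", Decimal("-1"), Decimal("1"))])` would separate rows whose growth value falls outside the assumed range from those inside it.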
Apr 17 2024
Apr 16 2024
Apr 4 2024
Apr 2 2024
Mar 28 2024
In the code implementation for change-in-size detection, I compare the previous snapshot with the current snapshot by reading the previous snapshot into a dataframe.
One TODO will be to use the AWS Deequ anomaly detection and filesystem repository capabilities to implement this check.
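The snapshot comparison described above can be sketched roughly as follows. This is a simplified illustration in plain Python, not the actual implementation: the function names and the thresholds are hypothetical, and in a Spark job the two counts would come from something like `DataFrame.count()` on each snapshot.

```python
def relative_size_change(previous_count: int, current_count: int) -> float:
    """Relative change in row count from the previous to the current snapshot."""
    if previous_count <= 0:
        raise ValueError("previous snapshot must contain at least one row")
    return (current_count - previous_count) / previous_count


def size_change_ok(
    previous_count: int,
    current_count: int,
    max_decrease: float = 0.0,   # assumed: a snapshot should not shrink
    max_increase: float = 0.05,  # assumed: at most 5% growth between snapshots
) -> bool:
    """Return True when the change in size stays within the allowed band."""
    change = relative_size_change(previous_count, current_count)
    return -max_decrease <= change <= max_increase
```

With these assumed thresholds, a snapshot growing from 100 to 103 rows passes, while one growing to 110 rows or shrinking to 99 rows is flagged.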
Mar 8 2024
Mar 6 2024
Feb 6 2024
We added the following:
Feb 4 2024
Jan 15 2024
The suggested approach is to use Spark to run the queries and save the results on the cluster. However, Spark saves its output files in a folder, and we don't want a different folder for each query result. We want to put all the output files (reports) in one location, which is already rsynced to the report server. We would therefore use our hdfsarchive operator to move the generated output from the Spark output path to the final destination.
To start, we would migrate the queries in the browser folder.
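As a rough illustration of the move step, the sketch below uses the local filesystem as a stand-in for HDFS. It is not the hdfsarchive operator itself: the function name `archive_single_output` and the paths are hypothetical. It relies only on the fact that Spark writes a directory containing `part-*` files.

```python
import shutil
from pathlib import Path


def archive_single_output(spark_output_dir, destination_file) -> Path:
    """Move the single part file of a Spark output folder to one final file."""
    source_dir = Path(spark_output_dir)
    part_files = sorted(source_dir.glob("part-*"))
    if len(part_files) != 1:
        # The report is expected to be written as exactly one part file.
        raise ValueError(
            f"expected exactly one part file in {source_dir}, found {len(part_files)}"
        )
    destination = Path(destination_file)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(part_files[0]), str(destination))
    # Remove the now-unneeded Spark output folder, including marker files.
    shutil.rmtree(source_dir)
    return destination
```

Each query would write to its own temporary folder, and the resulting file would then be moved into the shared report location under a per-query file name.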
Jan 10 2024
Jan 8 2024
Update a patch containing 2 HQL files required to create and update the Iceberg version of the browser_general tables, respectively.
Jan 2 2024
Jun 22 2023
Jun 21 2023
Jun 19 2023
RefineSanitize was still failing on Friday, so Ben and Joseph helped revert it to spark2 by rolling the refinery jar version back from v0.2.16 to v0.1.15 (https://gerrit.wikimedia.org/r/930765), so that it would work over the weekend.
Jun 12 2023
Jun 7 2023
Thank you @hashar
May 24 2023
I am currently getting the error below when running refine jobs with spark3. I have not found a solution yet, but will update once I do.
May 16 2023
May 11 2023
May 10 2023
Apr 17 2023
Apr 14 2023
Apr 13 2023
I got a similar error when deploying analytics refinery:
Apr 12 2023
Apr 11 2023
Apr 6 2023
Apr 5 2023
Mar 30 2023
Mar 21 2023
Feb 22 2023
Feb 21 2023
Feb 16 2023
Feb 15 2023
Feb 14 2023
Feb 13 2023
See wikitech documentation here.
Feb 9 2023
Here is a Google Doc documenting the data loss.
Feb 8 2023
Feb 6 2023
Regarding the new column, I would like suggestions on the name to use for the new field. I am thinking of referer_data. Does anyone have a better name?
@Mayakp.wiki I ran an analysis on the UDF that would be used to populate the new field and posted the result in the parent ticket T309769, where there is a comment thread on it.
Jan 31 2023
Jan 26 2023
I ran the UDF on a day's data and extracted the top 1000 referers for that day to show the impact of the GetRefererDataUDF on referers. You can check the spreadsheet and a short doc on it.
Jan 23 2023
Traffic: can you please confirm that there were cases of pages served in eqsin but not reported in the webrequest logs?
@taavi Done.
@bd808 and @Platonides, I now have access to the cloud bastion. Here is the result.
Jan 19 2023
@Platonides Here is the result when I run on a production host.
@Mayakp.wiki We are introducing a new column of a struct data type to the wmf.webrequest table. It would contain the same data as the existing referer_class column, as well as the referer’s name. However, the referer_class column won't be removed now. It would only be removed after all the downstream consumers have been changed.
Jan 18 2023
I have not SSHed into any Cloud or Toolforge instance before now. Is there another verification method?
Jan 17 2023
I have been unable to log in to my Wikitech account and make an important edit because of this issue. I would appreciate any form of assistance, as this is urgent.
Jan 16 2023
Jan 12 2023
Jan 11 2023
@Aklapper I am unable to 'ssh bastion.wmcloud.org' or 'ssh login.toolforge.org'.