User Details
- User Since
- Mar 14 2023, 12:16 PM (58 w, 4 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- ROdonnell-WMF
Fri, Apr 26
I think we covered the intro text in our chat last night. @creynolds do you have what you need?
Can we survey our clients to see what programming languages they are using to connect to our APIs? It's all well and good if they use Golang, and that may be the case with Google. But I suspect the majority of devs will use Java, Python, or Node. A Go SDK is only useful for a small set of users; not many people are Go developers.
Thu, Apr 25
Wed, Apr 24
As a demo for the presentation, maybe show MR API calls to create a dataset that is used in the LLM RAG demo? See Python code and readme: https://gitlab.enterprise.wikimedia.com/wikimedia-enterprise/experiments/for-blog-llm-rag
I added an HTML-to-markdown converter for lists, which could solve the last bullet about lists in the "Free trade agreements of the European Union" article. I've added it to the decision log and asked Stephanie and Ricardo to decide what they want in abstract lists.
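For illustration only, here is a minimal sketch of what an HTML-to-markdown conversion for lists could look like. It is not the converter mentioned above: the class name `ListToMarkdown` and the function `html_lists_to_markdown` are hypothetical, it uses only Python's standard-library `html.parser`, and it assumes that text outside `<li>` elements can be ignored and that nested lists are indented by four spaces.

```python
from html.parser import HTMLParser


class ListToMarkdown(HTMLParser):
    """Hypothetical helper: turns <ul>/<ol> lists into markdown list lines."""

    def __init__(self):
        super().__init__()
        self.lines = []      # finished markdown lines
        self.stack = []      # one [tag, item_count] entry per open list
        self.current = None  # text pieces of the <li> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            # A nested list starts: emit the parent item's text first.
            self._flush()
            self.stack.append([tag, 0])
        elif tag == "li" and self.stack:
            self._flush()
            self.stack[-1][1] += 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "li":
            self._flush()
        elif tag in ("ul", "ol") and self.stack:
            self._flush()
            self.stack.pop()

    def handle_data(self, data):
        # Only text inside a list item is kept in this sketch.
        if self.current is not None:
            self.current.append(data)

    def _flush(self):
        """Write the pending list item, if any, as one markdown line."""
        if self.current is None or not self.stack:
            self.current = None
            return
        text = " ".join("".join(self.current).split())
        kind, count = self.stack[-1]
        marker = f"{count}." if kind == "ol" else "-"
        depth = len(self.stack) - 1
        if text:
            self.lines.append("    " * depth + f"{marker} {text}")
        self.current = None


def html_lists_to_markdown(html: str) -> str:
    parser = ListToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Calling `html_lists_to_markdown("<ul><li>First</li><li>Second</li></ul>")` yields one `- ` line per item. Whether abstracts should keep lists as markdown at all is the open decision noted above.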
Tue, Apr 23
The "Aldi" pronunciation defect is a known issue. Removing the pronunciation symbols was a design decision. In most other articles the pronunciation sits inside parentheses and is not part of the sentence, so it should not appear in the Abstract. In the Aldi article, the pronunciation is part of a sentence, so the text looks strange once it is removed. We have no way of knowing whether the pronunciation is part of a sentence; it's up to the article editors to choose how they use that Wikitext template. I consider this a known issue: we remove pronunciation for all our Abstracts.
Thu, Apr 18
Wed, Apr 17
Tue, Apr 16
@REsquito-WMF Is the scope of this ticket to create the new repo for "Structured-Contents-Parser"?
The only feature we don't include in SC API is "GetTemplates". Should I remove it from the code?
There is a new repo in the Experiments group: for-blog-LLM-RAG
Umbrella ticket for unplanned tasks:
I have a working POC that I shared with Chuck.
Mon, Apr 15
v1 is ingested into compact topics
v2 is not ingested
Thu, Apr 11
I didn't change the broker EC2 instance types, because I wanted to check the added topics first. The team decision was to keep 4 brokers and see how DV copes with the load.
Python deploy logs are in GD in folder Misc/kafka_deployer.out
Wed, Apr 10
Mon, Apr 8
Wed, Apr 3
Ticket is a duplicate of another ticket and is already fixed in Prod.
Mar 28 2024
I've reviewed the MRD and added some comments/questions
Mar 27 2024
Updating Main API too, as it uses Config submodule
Mar 26 2024
Mar 25 2024
Mar 24 2024
- I need more guidance on bullet 2, updating titles.csv. The call to wmf.GetAllPages() pages through all the projects and articles. I need to run this in dev and save a new `titles.csv` with the new totals for all projects, then revert the code changes once I have the CSV. Running this locally would be slow.
- On bullet 3, updating partitionsv2.csv, what algorithm do we use for the manual partitioning? I can't see a pattern or trend in the partition changes in the CSV.
- There is a config package dependency in API/main. Should I update the submodule ref there too?
- I've read the "Bulk Ingestion Runbook v2". Does this ticket scope include modification of code in Structured-Data and On-Demand repos?
Mar 22 2024
Thanks @prabhat, I've removed the specials and the new ones are these, which include the 4 above from the last comment.
For bullet 1:
Mar 12 2024
Mar 11 2024
Mar 6 2024
DEV is QA-tested and passes my shakedown tests. I documented my tests and saved them to Google Drive as a Postman collection
Mar 5 2024
Unblocked. We'll use a "fuzzy match" to compare the output of the Summary API and the WME getAbstract call. The threshold is 5 character differences: a difference of up to 5 characters is considered a match, and more than 5 is considered a non-match. This fuzzy character matching adjusts our metrics testing so the final statistics better represent the real-world matches for the summary/abstract text.
For Abstract coverage metrics, we're comparing the WMF Summary API with the WME GetAbstract() call.
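As a rough sketch of the fuzzy match described above: the comments don't say how "character differences" are counted, so this assumes Levenshtein edit distance (insertions, deletions, substitutions) and treats a distance of exactly 5 as a match. The names `edit_distance`, `is_fuzzy_match`, and `MAX_CHAR_DIFFERENCES` are hypothetical and are not taken from the actual metrics code.

```python
MAX_CHAR_DIFFERENCES = 5  # threshold from the comment above


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed metric)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def is_fuzzy_match(summary_text: str, abstract_text: str,
                   threshold: int = MAX_CHAR_DIFFERENCES) -> bool:
    """True when the two texts differ by at most `threshold` edits."""
    return edit_distance(summary_text, abstract_text) <= threshold
```

With this definition, two texts that differ only by a trailing period or a couple of stray spaces count as a match, while a missing sentence does not.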
Tickets are merged to DEV. Will QA test this afternoon.
I'm moving this to "Paused".
Mar 4 2024
Feb 29 2024
After reviewing the code changes, I've changed this ticket status to declined.