Thu, Dec 7
Working on this ticket to create sample JSON.
Tue, Dec 5
Report in "RCA - Abstract Defects"
Mon, Dec 4
Thu, Nov 30
One difference between GetAbstract and GetSections is that parenthesized text is stripped from the abstracts. Example: "Freda Josephine Baker (née McDonald; June 3, 1906 – April 12, 1975),..." in the text is output as "Freda Josephine Baker,...". Should we also remove parenthesized text from the sections text?
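If we decide to apply the same rule to sections, a helper along these lines could do it. This is a minimal sketch, not the actual GetAbstract code: stripParenthesized is a hypothetical name, and it assumes we want to drop every parenthesized span (including nested ones) together with the space before it.

```go
package main

import (
	"fmt"
	"strings"
)

// stripParenthesized removes every parenthesized span, including nested ones,
// along with the spaces immediately before it. A stray ')' with no matching
// '(' is kept; an unclosed '(' drops the rest of the string.
func stripParenthesized(s string) string {
	var b strings.Builder
	depth := 0
	for _, r := range s {
		switch {
		case r == '(':
			if depth == 0 {
				// Drop the space(s) written just before the opening parenthesis.
				trimmed := strings.TrimRight(b.String(), " ")
				b.Reset()
				b.WriteString(trimmed)
			}
			depth++
		case r == ')' && depth > 0:
			depth--
		case depth == 0:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(stripParenthesized("Freda Josephine Baker (née McDonald; June 3, 1906 – April 12, 1975), was an entertainer"))
}
```

Applied to the Josephine Baker sentence, this yields "Freda Josephine Baker, was an entertainer", matching the abstract output above.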
Mon, Nov 27
We use TLS encryption to connect to Redis, but RedisCommander's REDIS_HOSTS environment variable doesn't let us enable TLS for those connections.
Alex updated the bastion script to use make and fixed the port number.
Sat, Nov 25
See "Domain Specific Research Knowledge Graph" for the Schema.org research.
Tue, Nov 21
Sat, Nov 18
Wed, Nov 15
It was tested in Dev, but there is a problem connecting to it from the desktop. Alex thinks we need a firewall change to allow dynamic addresses and will investigate later today.
Tue, Nov 14
Nov 8 2023
Nov 7 2023
You are right. For the community, we don't offer push. Sorry for my miscommunication.
Hi from the Enterprise Dev Team!
Nov 6 2023
Waiting for sign-off. The deck and report are described in the engineering channel (message from Monday, Nov 6).
Nov 2 2023
We reviewed the GoDropBox 3rd party library, but it doesn't do what we need for server-side round-robin scheduling. It does client-side connection pooling: it keeps a set of persistent connections and schedules client requests across them round-robin.
Oct 31 2023
Oct 26 2023
Minor things I'd like to see in this doc:
- A guide on code patterns where logging is required, where it is optional, and where logs should not be used
- The same guide for tracing
- Ways to dynamically change the log level and to filter logs or traces by tag, for example with an OpenTelemetry Collector filter processor:
    processors:
      filter/1:
        metrics:
          include:
            match_type: regexp
            metric_names:
              - prefix/.*
              - prefix_.*
            resource_attributes:
              - key: container.name
                value: app_container_1
          exclude:
            match_type: regexp
            metric_names:
              - kafka-latency-.*
Oct 25 2023
Oct 24 2023
Oct 22 2023
This is an example of research on entity type identification from Wikipedia articles. They get about 80-90% accuracy on their News and Politics subdomains, but that is for English; when they expand to German, Spanish, Portuguese and Japanese, their accuracy drops to 50-60%. That is with just 4-5 target entity labels (Person, Organisation, Government Title-work, facility), a considerably smaller label set than the one we're considering for WME.
Oct 20 2023
Feedback from GroundedAI
Oct 19 2023
Potential classification system using spaCy: https://colab.research.google.com/github/wandb/examples/blob/master/colabs/spacy/SpaCy_v3_and_W%26B.ipynb#scrollTo=krVWm1YRFbHc
If we aim to output a single "Entity Type" for an article, then Rosette will not help us.
Oct 18 2023
Oct 15 2023
Tests show the French infobox is returned when I call /v2/structured-contents/Josephine_Baker in dev.
I need 2 hours to smoke-test dev and will do it Monday morning. Then I will let Saphanie know she can run her QA tests on French infoboxes.
@Protsack.stephan Any thoughts on what I should do with this global variable that holds the baseURL state?
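One common option is to move the base URL out of the global and into a client struct built at startup, so dev, prod and tests can each construct their own. A minimal sketch; Client, NewClient, StructuredContentsURL and the host are hypothetical names, not the existing code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Client carries its own base URL instead of reading a package-level global.
type Client struct {
	baseURL string
}

// NewClient validates rawBaseURL and stores it without a trailing slash.
func NewClient(rawBaseURL string) (*Client, error) {
	u, err := url.Parse(rawBaseURL)
	if err != nil {
		return nil, err
	}
	if u.Scheme == "" || u.Host == "" {
		return nil, fmt.Errorf("base URL %q must include scheme and host", rawBaseURL)
	}
	return &Client{baseURL: strings.TrimRight(rawBaseURL, "/")}, nil
}

// StructuredContentsURL builds the /v2/structured-contents/{name} URL for an article.
func (c *Client) StructuredContentsURL(name string) string {
	return c.baseURL + "/v2/structured-contents/" + url.PathEscape(name)
}

func main() {
	c, err := NewClient("https://api.example.org/") // hypothetical host
	if err != nil {
		panic(err)
	}
	fmt.Println(c.StructuredContentsURL("Josephine_Baker"))
}
```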
Merged the latest Parser main branch
MR code comments resolved
Oct 9 2023
Started a POC that uses SPARQL queries to Wikidata and then parses HTML tables/lists in Wikipedia. It also adds up to 10 artist/band images from Wikimedia Commons.
Oct 3 2023
Need to fix the CI build error in GitLab
Added an RFC team conclusions section
Oct 1 2023
I will add a few new sub-tasks:
- Add Proxy logic to the api/main project to include sections and references
- Add an env variable to enable/disable this in dev and prod
I've added parsing of references to this ticket. There are some open questions that I'll add to the RFC document.
Sep 30 2023
I found the issue with French infoboxes. French pages use infobox_v3 as the class name, not infobox as other languages do. See the infobox here: https://fr.wikipedia.org/wiki/Jos%C3%A9phine_Baker
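Since an HTML class attribute is a space-separated list of tokens, one fix is to accept either token when deciding whether a table is an infobox. A minimal sketch; isInfobox and infoboxClasses are hypothetical names, and the extra tokens in the examples (vcard, large) are illustrative. With a CSS-selector based parser, the equivalent would be a selector like ".infobox, .infobox_v3".

```go
package main

import (
	"fmt"
	"strings"
)

// infoboxClasses lists the class tokens that mark an infobox table.
var infoboxClasses = map[string]bool{
	"infobox":    true, // most language editions
	"infobox_v3": true, // French Wikipedia
}

// isInfobox reports whether a class attribute value contains an infobox token.
// Tokens are matched exactly, so "infobox-subbox" does not count.
func isInfobox(classAttr string) bool {
	for _, token := range strings.Fields(classAttr) {
		if infoboxClasses[token] {
			return true
		}
	}
	return false
}

func main() {
	for _, c := range []string{"infobox vcard", "infobox_v3 large", "wikitable"} {
		fmt.Printf("%q -> %v\n", c, isInfobox(c))
	}
}
```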
Sep 27 2023
Maybe look at more template synonyms for these edit revisions, for example Template:Citation needed span.
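Matching synonyms reliably needs name normalization first: in MediaWiki titles, underscores and spaces are interchangeable and (on most wikis) the first letter is case-insensitive. A minimal sketch; the function names are hypothetical, and the synonym table is illustrative only (on English Wikipedia, {{cn}} and {{fact}} are commonly used redirects to {{Citation needed}}, which should be verified against the wiki's redirect list). Grouping Citation needed span under the same key is our assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

const templatePrefix = "Template:"

// templateSynonyms maps a normalized template name to the name we track.
// Illustrative entries, not a complete list.
var templateSynonyms = map[string]string{
	"Citation needed":      "Citation needed",
	"Cn":                   "Citation needed",
	"Fact":                 "Citation needed",
	"Citation needed span": "Citation needed",
}

// normalizeTemplateName strips an optional "Template:" prefix, treats
// underscores as spaces, collapses whitespace and uppercases the first letter.
func normalizeTemplateName(name string) string {
	name = strings.TrimSpace(name)
	if len(name) >= len(templatePrefix) && strings.EqualFold(name[:len(templatePrefix)], templatePrefix) {
		name = name[len(templatePrefix):]
	}
	name = strings.Join(strings.Fields(strings.ReplaceAll(name, "_", " ")), " ")
	r, size := utf8.DecodeRuneInString(name)
	if size == 0 {
		return ""
	}
	return string(unicode.ToUpper(r)) + name[size:]
}

// canonicalTemplate returns the tracked name for a template, if it is known.
func canonicalTemplate(name string) (string, bool) {
	c, ok := templateSynonyms[normalizeTemplateName(name)]
	return c, ok
}

func main() {
	for _, n := range []string{"Template:citation_needed_span", "cn", "Clarify"} {
		c, ok := canonicalTemplate(n)
		fmt.Printf("%q -> %q %v\n", n, c, ok)
	}
}
```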
Sep 26 2023
@JArguello-WMF I've moved this into this sprint since it's now unblocked. I hope to get time during the sprint to complete the task.