
Precompute Article Summaries
Closed, Resolved · Public · 3 Estimated Story Points

Description

Background

User story

  • As a designer running a user test to verify summarization, I have a set of summaries I can include in my examples

Requirements

  • We have identified articles to generate summaries for (consulting with Justin/Olga)
  • We have generated summaries for these articles and recorded the method we used to generate them
  • We have put these summaries into a spreadsheet that design can use

This task was created by Version 1.2.0 of the Web team task template using phabulous

Event Timeline

I suggest we use the Dopamine article. It is a verified "good article" and has a relatively complex introductory paragraph with a lot of technical language. It is also of broad general interest and has a counterpart in Simple English Wikipedia.

OK! I finally got @MGerlach's Jupyter notebook tutorial up and running on the stat1008 box.

I ran the model on the "Dopamine" article. Here are the original and the output:

original
Dopamine (DA, a contraction of 3,4-dihydroxyphenethylamine) is a neuromodulatory molecule that plays several important roles in cells. It is an organic chemical of the catecholamine and phenethylamine families. Dopamine constitutes about 80% of the catecholamine content in the brain. It is an amine synthesized by removing a carboxyl group from a molecule of its precursor chemical, L-DOPA, which is synthesized in the brain and kidneys. Dopamine is also synthesized in plants and most animals. In the brain, dopamine functions as a neurotransmitter—a chemical released by neurons (nerve cells) to send signals to other nerve cells. Neurotransmitters are synthesized in specific regions of the brain, but affect many regions systemically. The brain includes several distinct dopamine pathways, one of which plays a major role in the motivational component of reward-motivated behavior. The anticipation of most types of rewards increases the level of dopamine in the brain, and many addictive drugs increase dopamine release or block its reuptake into neurons following release. Other brain dopamine pathways are involved in motor control and in controlling the release of various hormones. These pathways and cell groups form a dopamine system which is neuromodulatory.
simplified
Dopamine is a chemical found in the brain. It is a neurotransmitter. Dopamine is released by neurons to send signals to other nerve cells. Dopamine is a neurotransmitter. Dopamine is a chemical that is released by neurons to send signals to other nerve cells. Dopamine is a neurotransmitter. It is a chemical released by neurons to send signals to other nerve cells. Neurotransmitters are made in specific regions of the brain, but affect many regions systemically. The brain includes several distinct dopamine pathways. One of these plays a major role in the motivational component of reward-motivated behavior. The anticipation of most types of rewards increases the level of dopamine in the brain. Many addictive drugs increase dopamine release or increase dopamine levels.

@Jdrewniak Can we engineer the prompt to eliminate extraneous sentences? The simplified summary repeats itself several times in its first few sentences. If you rearrange them so that similar sentences are grouped together, you get a kind of concrete poetry:

It is a neurotransmitter.
Dopamine is a neurotransmitter.
Dopamine is a neurotransmitter.

Dopamine is released by neurons to send signals to other nerve cells.
Dopamine is a chemical that is released by neurons to send signals to other nerve cells.
It is a chemical released by neurons to send signals to other nerve cells.
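Besides prompt changes, this kind of repetition could also be caught after generation. A minimal sketch, assuming a naive regex sentence splitter and `difflib.SequenceMatcher` similarity with an arbitrary 0.8 threshold (both are our illustrative choices, not part of the notebook): a sentence is dropped when it is too similar to one already kept.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def normalize(sentence: str) -> str:
    """Lowercase and strip punctuation so trivial differences don't count."""
    return re.sub(r"[^a-z0-9 ]", "", sentence.lower())


def drop_near_duplicates(text: str, threshold: float = 0.8) -> str:
    """Keep each sentence only if it is not too similar to one already kept."""
    kept: list[str] = []
    for sentence in split_sentences(text):
        norm = normalize(sentence)
        if any(SequenceMatcher(None, norm, normalize(k)).ratio() >= threshold for k in kept):
            continue  # near-duplicate of an earlier sentence: skip it
        kept.append(sentence)
    return " ".join(kept)
```

The threshold is a guess and would need tuning on more articles; set it too low and legitimately related sentences start disappearing.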

Some clarification: it turns out that the notebook originally posted in the task description is intended for simplification, not summarization.
The approach we should go with is better reflected in this notebook:
https://public-paws.wmcloud.org/User:MGerlach%20(WMF)/text-simplification/section-gists_v01.ipynb
where (for experimentation purposes) we just use the Cohere API to generate article summaries.

OK, so Jan and I walked through this. First of all, we're using Aya by Cohere, not our custom in-house model, per guidance from @MGerlach and @NBaca-WMF.

Our goal is to get simple summaries for as many good-quality English articles as possible, prioritizing the ones extension users are most likely to see.

What we'll propose, for now, is to build an article pool from:

  • the most popular articles, screened for "good quality" using an API (do you have a link for this, @Jdrewniak?)
  • vital articles
  • featured articles

That should give us an article pool in the tens of thousands.
Then we rank those by popularity.
Jan will take the 10K most popular articles from that pool and bulk-generate simple summaries for them on his local machine to avoid API key limits. We chose 10K articles because users will have to download these summaries along with the extension, and Jan thought this would keep the extension at an acceptable download size.
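The pool-and-rank step could look roughly like this. It is a minimal sketch, assuming the three sources are already available as lists of titles and popularity is a title-to-pageviews mapping; the function names and toy numbers are hypothetical. It also measures the serialized size of the summaries, since the 10K cap is driven by download size.

```python
import json


def build_article_pool(popular_good, vital, featured):
    """Union of the three candidate sources, de-duplicated by title."""
    return set(popular_good) | set(vital) | set(featured)


def top_n_by_popularity(pool, pageviews, n=10_000):
    """Return the n most-viewed titles in the pool.

    Titles missing from `pageviews` count as 0 views; ties are broken
    alphabetically so the selection is deterministic.
    """
    return sorted(pool, key=lambda title: (-pageviews.get(title, 0), title))[:n]


def payload_bytes(summaries):
    """Size in bytes of a {title: summary} dict as compact UTF-8 JSON."""
    encoded = json.dumps(summaries, ensure_ascii=False, separators=(",", ":"))
    return len(encoded.encode("utf-8"))


if __name__ == "__main__":
    # Toy inputs (hypothetical); real lists would come from the quality API,
    # the vital-articles list and the featured-articles list.
    pool = build_article_pool(["Dopamine", "Paris"], ["Paris", "Water"], ["Dopamine"])
    views = {"Dopamine": 900, "Paris": 500, "Water": 100}
    print(top_n_by_popularity(pool, views, n=2))  # ['Dopamine', 'Paris']
```

Running `payload_bytes` over the generated summaries gives a direct check on whether 10K articles actually fits the size budget.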

The outputs from Aya are far better in quality than those of our previous models. We still need to refine the prompt a bit to give the output a more encyclopedic tone.

We tweaked the prompt because Aya tends to use exclamations and sounds too excited for an encyclopedic tone.

Preamble

You are writing encyclopedic articles in the style of Wikipedia using simplified language.
Temperature = 0.1
## Instructions
Summarize the first section of the Wikipedia article linked below for a 7th grader in English. Use an encyclopedic tone. Just return the summary.
## Input text
https://en.wikipedia.org/wiki/Dopamine
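For reference, the prompt above maps onto a chat request roughly as follows. This is a sketch only: the field names follow Cohere's v1 chat endpoint (`message`, `preamble`, `temperature`, `model`) as we understand it, and the model ID is a hypothetical placeholder, since the exact Aya model used is not recorded here.

```python
# Hypothetical placeholder: the exact Aya model ID used in testing is not recorded.
AYA_MODEL_ID = "YOUR_AYA_MODEL_ID"

PREAMBLE = (
    "You are writing encyclopedic articles in the style of Wikipedia "
    "using simplified language."
)

INSTRUCTIONS = (
    "Summarize the first section of the Wikipedia article linked below "
    "for a 7th grader in English. Use an encyclopedic tone. "
    "Just return the summary."
)


def build_chat_request(input_text: str, model: str = AYA_MODEL_ID) -> dict:
    """Assemble preamble, instructions and input text into one request body."""
    message = f"## Instructions\n{INSTRUCTIONS}\n## Input text\n{input_text}"
    return {
        "model": model,
        "preamble": PREAMBLE,  # system-style framing for the encyclopedic tone
        "message": message,
        "temperature": 0.1,    # low temperature reduces sampling randomness
    }


if __name__ == "__main__":
    request = build_chat_request("https://en.wikipedia.org/wiki/Dopamine")
    print(request["message"])
```

With the `cohere` Python SDK, this dict would be passed as keyword arguments to the client's `chat` method; check the current SDK documentation before relying on the field names.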

@Jdrewniak, @JScherer-WMF: I'm curious what our final summary output was. Could you paste it here?

Here's the output we used in testing:

Dopamine is a neurotransmitter, a chemical messenger that carries signals between brain cells. It plays a vital role in several brain functions, including emotion, motivation, and movement. When we experience something enjoyable or receive a reward, our brain releases dopamine, creating a sense of pleasure and reinforcement. This neurotransmitter also helps us focus and stay motivated by influencing our behavior and thoughts. Dopamine imbalance has been associated with various disorders, such as depression and Parkinson's disease, highlighting its importance in maintaining overall brain health and function.