<APP: Commons> Wikimedia Israel GLAMs Analytics Dashboard Support
Open, In Progress, Medium, Public

Description

Request Status: New Request
Request Type: project support request
Related OKRs/APP Priority: 2021-2022 KaaS Commons

Request Title: GLAMs Analytics Dashboard Support

  • Request Description: https://glamwikidashboard.org/ is requesting that we support the data required to run their GLAMs Analytics Dashboard as AWS costs continue to increase and it becomes challenging to maintain the software.

The current amount is mainly used for backing up the Wikimedia Commons media view count information that we download daily from Wikipedia servers, and for data processing services. Our database backup grows every day and each time an institution is added, and it requires more storage than we have available.
Already today, the storage available to us from AWS (at a special cost for NPOs) does not meet demand, and it limits our ability to add further GLAM institutions that approach us and ask to use the dashboard.
To onboard additional GLAM institutions and respond to the demand we are experiencing, the storage volume must be expanded. This expansion cannot be covered by the limited storage volume that AWS offers to NPOs, so it would have to be paid for at market prices, which we estimate may reach about 2000 dollars per month. As a chapter, we cannot take on this level of cost, which reflects global usage of the tool.
We are interested in transferring the storage of the dashboard to the WMF servers. This move will allow us to keep encouraging GLAM institutions that already use the dashboard, and many new ones, to release media files to Wikimedia Commons, as we have seen in practice since we started operating it. The dashboard has proven to be an efficient, friendly, accessible and easy-to-use tool, and it provides added value to the institutions that use it (such as marketing, research, management statistics and information for stakeholders, to name a few).
Such a move will also make it possible to build a variety of collaborations within the movement to continue developing the tool, improving it and expanding the uses it can offer.

  • Indicate Priority Level: Medium
  • Main Requestors: Wikimedia Israel
  • Ideal Delivery Date: ASAP
  • Stakeholders:

Request Documentation

Document Type                         Required?   Document/Link
Related PHAB Tickets                  Yes         <add link here>
Product One Pager                     Yes         <add link here>
Product Requirements Document (PRD)   Yes         <add link here>
Product Roadmap                       No          <add link here>
Product Planning/Business Case        No          <add link here>
Product Brief                         No          <add link here>
Other Links                           No          https://github.com/yonathan06/cassandra-GLAM-tools

Event Timeline

DAbad triaged this task as Medium priority. Oct 26 2022, 3:23 PM
DAbad created this task.

Following our session with their tech lead at the end of last week, our agreed next steps are:

  • Yonatan (Wikimedia Israel tech lead) to explore loading the data from the MediaRequests API instead of using the mediacounts dumps, which should reduce some of the data crunching needed.
  • Yonatan to investigate the performance implications of progressively loading some of the historical views directly from the API for large or newer institutions, to reduce the size of the backend database.
  • Us (Data Platform) to explore producing a custom API endpoint for mediarequests by category, to reduce or remove the need for data transformation and storage on the dashboard side. (Need to confirm whether we should wait for the AQS 2.0 release.)
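
For anyone following along, here is a minimal sketch of what reading per-file counts from the API, rather than from the mediacounts dumps, might look like. The function names are hypothetical and are not taken from the dashboard's code. The URL layout, the default `all-referers` / `user` / `daily` values and the `items` / `requests` response fields reflect my understanding of the public Analytics REST API and should be checked against its documentation before relying on them. The last helper only illustrates the client-side grouping that a by-category endpoint could make unnecessary.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed base URL of the public per-file mediarequests endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"


def per_file_url(file_path, start, end,
                 referer="all-referers", agent="user", granularity="daily"):
    """Build the request URL for one file; start/end are YYYYMMDD strings."""
    # The file path goes into a single URL segment, so "/" must be encoded too.
    encoded = quote(file_path, safe="")
    return f"{BASE}/{referer}/{agent}/{encoded}/{granularity}/{start}/{end}"


def fetch_json(url, user_agent="glam-dashboard-sketch/0.1 (hypothetical)"):
    """Fetch and decode one API response (network call, not used in the tests)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def total_requests(payload):
    """Sum the request counts over all items of one decoded response."""
    return sum(item.get("requests", 0) for item in payload.get("items", []))


def category_totals(files_by_category, totals_by_file):
    """Roll per-file totals up to categories (files with no data count as 0)."""
    return {
        category: sum(totals_by_file.get(name, 0) for name in files)
        for category, files in files_by_category.items()
    }
```

Loading history on demand would then amount to calling `fetch_json(per_file_url(...))` for the date range a user asks for, instead of keeping every daily count in the backend database.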

Update:

  • Yonatan (Wikimedia Israel tech lead) explored using the MediaRequests API and confirmed that it can replace the S3 storage they are using. However, it will take significant refactoring on his side before it reduces the size of the Postgres instance (which is the main cost driver), and he is unable to do any major refactors in the coming months.
  • As a stopgap solution we are looking to migrate the project from AWS to Cloud Services. This will, however, require a minimum of 2 x 200GB instances. We need to request quota increases from the Cloud Services team before proceeding down this route.
EChetty changed the task status from Open to In Progress. Jan 18 2023, 11:56 AM

Just a note – I know that typically VPS Cloud is not accustomed to hosting services this large, but these metrics (even if imperfect) are of tremendous value to folks in the GLAM wiki community. In the last decade, "open access" and wiki work has become an accepted norm for the largest GLAM institutions in the world. We made the call with our movement strategy:

"By 2030, Wikimedia will become the essential infrastructure of the ecosystem of free knowledge, and anyone who shares our vision will be able to join us."

They have joined us, at scale. That's great news.

However, we are failing at the various stages of our GLAM Wiki work: ingestion, enrichment, and measuring impact. Of these, metrics and measurement is perhaps the toughest part. We haven't updated our approach, our tools, our support, or our planning for this new reality.

Getting GLAMwikidashboard working on our own servers where resources are not constrained by commercial fees would be great. Thanks.

I am Wikimedian in Residence at the School of Data Science at the University of Virginia and have been developing Wikipedia content as a full-time salaried communication professional since 2012.

My university wants this and so does every other university, museum, research institute, government agency, and cultural partner.

In another thread a Wikimedia Foundation staffperson asked for testimony from institutional collaborators as to the importance of communication metrics in Wikimedia partnerships. I confirm that these metrics are critical. The briefest evidence that I can give for this is that when communication professionals develop either institutional websites or social media platforms like Facebook, Twitter, YouTube, Insta, or anything else, they uniformly and universally provide communication metrics of their impact.

The Wikimedia Foundation is the only major tech organization which lacks a communication impact interface to provide the communication industry's standard metrics for partners. In general, these metrics would simply report that if an organization were to invest in content development, then that content would reach readers. This is not an unusual request. This is an industry sector with hundreds of thousands of workers and countless millions of amateurs who all want communication metrics.

The future of Wikipedia requires stable and reliable access to metrics. The community has requested and demanded this for a decade. For Wikimedia projects to align with the nearly universal expectation of communications professionals, metrics must become accessible and reliable.

I'm a Wikimedian-in-Residence at the Brigham Young University library, and accurate image views are important to me too! They are so useful in showing the success of open culture work to other library professionals. They were a major part in my convincing the administration where I work to change my position from 1/2-time to 3/4-time (although, it was a little embarrassing to explain the limitations of the metrics at the same time).

Hi @Fuzheado,

I know that typically VPS Cloud is not accustomed to hosting services this large, but these metrics (even if imperfect) are of tremendous value to folks in the GLAM wiki community. In the last decade, "open access" and wiki work has become an accepted norm for the largest GLAM institutions in the world.

Having vetted the project with the SRE folks, it's larger than what we support by default, but nowhere near the limits of what VPS can handle.

Re:

We are failing at the various stages of our GLAM Wiki work: ingestion, enrichment, and measuring impact. Of these, metrics and measurement is perhaps the toughest part. We haven't updated our approach, our tools, our support, or our planning for this new reality.

I couldn't agree more. This specific piece of work is a tiny stopgap in a much larger conversation the movement and the Foundation need to have about prioritising the work to support these use cases. As articulated by @Bluerasberry and @RachelHelpsBYU in this thread, the user needs are clear enough to start. The next steps are about getting the larger pieces of work scoped and prioritised, and engaging in conversations that allow us to align on which parts of this work are most important. I know leadership is in the midst of annual planning at the moment. @FRomeo_WMF can you advise on where the best place to have this conversation would be so it can be captured?

  • Us (Data Platform) to explore producing a custom API endpoint for mediarequests by category to reduce/remove the need for data transformation and storage from the dashboard. (Need to confirm if we should wait for AQS 2.0 release)

Is this being tracked somewhere (Phabricator or wiki) for those of us who are interested in following the development?

Following up on @EChetty's last post, @Udehb-WMF created a page to document GLAM Metrics: https://commons.wikimedia.org/wiki/Commons:Product_and_technical_support_for_Commons_2022-23/GLAM_Metrics_Needs

It is under the 'Product and technical support for Commons 2022-23' page and is visible to the team working on that program.

Aklapper added a subscriber: EChetty.

Removing inactive assignee (please do so as part of team offboarding!).