Add logging to the mediawiki-services-chromium-render service
Closed, Resolved · Public · 2 Estimated Story Points
Authored By

	phuedx
	Nov 15 2017, 2:14 PM

Description

@pmiazga reports that he's needed to add logging to the service while testing changes for T178501: Limit resources used by Chromium in order to make the chromium-render service ready for production.

The chromium-render library looks like a really good candidate to replace the old Electron service. First, we need to do proper performance testing and check how this project handles production traffic. Without proper logging, it will be very difficult to monitor the service's health.

Each log entry should allow us to identify the job (by URL or task ID). When we get many rejected/failed jobs, we will be able to determine what the problem is (service overloaded vs. problematic articles).

For now, let's keep the logs very detailed, as this will help us determine the server requirements.

Questions:

Do we need a queue health status?

Log on these actions:

Queue health status

Might be helpful; log once every hour
Level: info
Include:

queue size
number of running tasks
memory usage

Reasoning: an indicator that everything works as expected and the service is up and running, even if there are no incoming requests.

DONE
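A minimal sketch of what the hourly health entry could look like, written in Python for illustration only. The logger name, the event name and the field names are assumptions, not the service's actual ones, and the memory figure is passed in by the caller rather than measured here.

```python
import json
import logging

# Hypothetical logger name; not taken from the service.
logger = logging.getLogger("chromium-render")


def health_status_entry(queue_size, running_tasks, memory_bytes):
    """Build a health entry with the three fields listed above."""
    return {
        "level": "info",
        "event": "queue_health",  # assumed event name
        "queue_size": queue_size,
        "running_tasks": running_tasks,
        "memory_bytes": memory_bytes,
    }


def log_health_status(queue_size, running_tasks, memory_bytes):
    """Log the entry as one JSON line and return it for inspection."""
    entry = health_status_entry(queue_size, running_tasks, memory_bytes)
    logger.info(json.dumps(entry))
    return entry
```

A scheduler would call log_health_status once per hour, so an entry shows up even when no requests come in.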

Job got queued

Level: info
Include:

ID
URL
size of the queue

Reasoning: an indicator that the queue works properly; the size of the queue will help us determine the max_waiting_timeout. When every task has to wait, it means we should allow more concurrent requests (if CPU and memory resources allow that). From this point on, we will use the ID to search for all log entries related to the job.

DONE
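To illustrate how the ID ties the entries of one job together, here is a small sketch in Python. The field names, the event name and the use of a random UUID as the job ID are all assumptions made for the example.

```python
import uuid


def queued_entry(url, queue_size, job_id=None):
    """Build the 'job got queued' entry; generate an ID if none is given."""
    return {
        "level": "info",
        "event": "job_queued",  # assumed event name
        "id": job_id or uuid.uuid4().hex,
        "url": url,
        "queue_size": queue_size,
    }


def entries_for_job(entries, job_id):
    """Return every log entry that carries the given job ID."""
    return [entry for entry in entries if entry.get("id") == job_id]
```

Every later entry for the same job (started, finished, failed, connection closed) would reuse the same ID, so a single search returns the whole history of the job.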

Job got rejected because queue is full

Level: warning
Include:

URL

Reasoning: an indicator that the queue is full and we need to throw more resources at the service. The URL will help us find which books weren't rendered (just in case someone creates a ticket saying that every time they try to create a book it fails).

DONE
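The rejection path can be sketched as a bounded queue that logs a warning instead of accepting the job. This is an illustration in Python, not the service's code; the class name, the event names and the in-memory log list are hypothetical.

```python
from collections import deque


class BoundedQueue:
    """A queue that rejects new jobs once max_size jobs are waiting."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.jobs = deque()
        self.log = []  # stand-in for a real logger

    def enqueue(self, job_id, url):
        """Queue the job, or record a warning and return False when full."""
        if len(self.jobs) >= self.max_size:
            self.log.append({
                "level": "warning",
                "event": "job_rejected",  # assumed event name
                "url": url,
                "queue_size": len(self.jobs),
            })
            return False
        self.jobs.append((job_id, url))
        self.log.append({
            "level": "info",
            "event": "job_queued",  # assumed event name
            "id": job_id,
            "url": url,
            "queue_size": len(self.jobs),
        })
        return True
```

The rejected entry keeps the URL so that a failed book can be traced back from a user report.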

Job started

Level: info
Include:

ID
URL
time spent in queue (in microseconds)
size of the queue

Reasoning: an indicator that the queue is working and the concurrency is correct. If the queue size at start time is always the same as or bigger than it was when the task was queued, it means everything goes into the queue and waits there. The queue should work as a buffer, not as a place where each job has to stay for a couple of seconds.

DONE
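A sketch of how the time spent in the queue and the queue-size comparison from the reasoning above might be computed. It is written in Python under assumptions: timestamps come from a monotonic nanosecond clock, and the function and field names are invented for the example.

```python
import time


def started_entry(job_id, url, queued_at_ns, queue_size, now_ns=None):
    """Build the 'job started' entry with the queue time in microseconds."""
    if now_ns is None:
        now_ns = time.monotonic_ns()
    return {
        "level": "info",
        "event": "job_started",  # assumed event name
        "id": job_id,
        "url": url,
        "queue_time_us": (now_ns - queued_at_ns) // 1_000,
        "queue_size": queue_size,
    }


def queue_never_drains(size_pairs):
    """True if, for every job, the queue was at least as long at start time
    as at queue time. size_pairs holds (size_when_queued, size_when_started)
    tuples."""
    if not size_pairs:
        return False
    return all(started >= queued for queued, started in size_pairs)
```

If queue_never_drains holds over a long run of jobs, the queue is acting as a waiting room rather than a buffer, and the concurrency setting is worth revisiting.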

Job finished

Level: info
Include:

ID
time spent while rendering (in microseconds)
size of the queue

Reasoning: an indicator that Chromium rendering is working correctly, and also a good indicator of how much time it takes to render an article (for tuning the queue/rendering timeout configuration).

DONE
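One possible way to turn the logged rendering times into a timeout value is to take a high percentile and add some headroom. The sketch below is in Python; the 95th percentile and the 1.5 headroom factor are illustrative assumptions, not values from this task.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggest_render_timeout_us(render_times_us, pct=95, headroom=1.5):
    """Suggest a timeout: the chosen percentile times a headroom factor."""
    return int(percentile(render_times_us, pct) * headroom)
```

For example, with ten logged rendering times from 100 to 1000 microseconds in steps of 100, the suggestion is 1500 microseconds.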

Job failed

Level: error
Include:

ID
URL
time spent while rendering (in microseconds)
reason

Reasoning: an indicator that Chromium rendering is not working correctly; from the URL we will be able to find which articles break Chromium rendering.

DONE
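Grouping failure entries by URL is one way to tell problematic articles apart from a generally overloaded service. A minimal Python sketch follows, with assumed event and field names.

```python
from collections import Counter


def failed_entry(job_id, url, render_time_us, reason):
    """Build the 'job failed' entry with the fields listed above."""
    return {
        "level": "error",
        "event": "job_failed",  # assumed event name
        "id": job_id,
        "url": url,
        "render_time_us": render_time_us,
        "reason": reason,
    }


def failures_by_url(entries):
    """Count failed jobs per URL, ignoring all other entries."""
    return Counter(
        entry["url"] for entry in entries if entry.get("event") == "job_failed"
    )
```

A URL that dominates the counts points at a problematic article; failures spread evenly across many URLs point at the service itself.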

Connection closed while job is queued or in progress

Level: info
Include:

ID
total time spent in the service (queue + rendering)
task state (queue|render)

Reasoning: an indicator that users don't want to wait so long for a PDF.

DONE
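A sketch of the connection-closed entry in Python. It assumes the total time is reported in microseconds like the other timings (the list above does not state a unit) and that the state is one of the two strings named above; the function and field names are hypothetical.

```python
VALID_STATES = ("queue", "render")


def connection_closed_entry(job_id, queued_at_ns, closed_at_ns, state):
    """Build the entry for a client that disconnected before getting a PDF."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown task state: {state!r}")
    return {
        "level": "info",
        "event": "connection_closed",  # assumed event name
        "id": job_id,
        # Time since the job was queued, so it covers queue + rendering.
        "total_time_us": (closed_at_ns - queued_at_ns) // 1_000,
        "state": state,
    }
```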

Stats

It would also be nice to have a Grafana dashboard to visualize the chromium-pdf traffic. Stats we would like to track:

rejected jobs
queued jobs
size of the queue when new job comes in
number of rendered jobs
number of failed renderings
time each job spends in the queue
time each job spends in the rendering state
might be helpful: generated pdf size
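The stats listed above boil down to a handful of counters and timings. The Python sketch below keeps them in memory purely to show the shape of the data; a real setup would send them to a metrics backend, and the class and metric names are assumptions.

```python
from collections import Counter, defaultdict


class RenderStats:
    """In-memory counters and timings for the render queue."""

    def __init__(self):
        self.counters = Counter()
        self.timings = defaultdict(list)

    def increment(self, name):
        """Count one occurrence, e.g. 'rejected', 'queued', 'rendered', 'failed'."""
        self.counters[name] += 1

    def timing(self, name, value_us):
        """Record one duration in microseconds, e.g. 'queue_time', 'render_time'."""
        self.timings[name].append(value_us)

    def mean(self, name):
        """Average of the recorded durations, or None if nothing was recorded."""
        values = self.timings.get(name)
        return sum(values) / len(values) if values else None
```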

Details

	Subject	Repo	Branch	Lines +/-
	Log important events	mediawiki/services/chromium-render	master	+135 -45
	Make chromium testing easier	mediawiki/services/chromium-render	master	+20 -4

Related Objects

Status	Assigned	Task
Resolved	• mobrovac	T159922 pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003
Invalid	None	T172815 Improve stability and maintainability of our browser-based PDF render service
Resolved	ovasileva	T181079 [GOAL] Provide an expanded reading experience by improving the ways that users can download articles of interest for later consumption
Resolved	None	T181084 [EPIC] Deploy the mediawiki-services-chromium-render service (Proton)
Resolved	phuedx	T181118 [EPIC] Build a Chromium-based PDF renderer service
Resolved	phuedx	T180601 Add logging to the mediawiki-services-chromium-render service

Event Timeline

phuedx created this task. · Nov 15 2017, 2:14 PM

Restricted Application added a subscriber: Aklapper. · Nov 15 2017, 2:14 PM

phuedx renamed this task from Add logging to the mediawiki-services-chromium-render to Add logging to the mediawiki-services-chromium-render service. · Nov 15 2017, 2:14 PM

pmiazga updated the task description. · Nov 15 2017, 4:06 PM

phuedx removed pmiazga as the assignee of this task. · Nov 15 2017, 4:08 PM

Jdlrobson moved this task from Incoming to Upcoming on the Web-Team-Backlog board. · Nov 15 2017, 6:19 PM

MBinder_WMF moved this task from Upcoming to Needs Prioritization on the Web-Team-Backlog board. · Nov 15 2017, 6:30 PM

pmiazga mentioned this in T178501: Limit resources used by Chromium in order to make the chromium-render service ready for production. · Nov 16 2017, 5:46 PM

Change 392051 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/services/chromium-render@master] Make chromium testing easier

https://gerrit.wikimedia.org/r/392051

gerritbot added a project: Patch-For-Review. · Nov 17 2017, 2:54 PM

pmiazga mentioned this in rMSCRe2c73cc02195: Make chromium testing easier. · Nov 17 2017, 2:57 PM

Change 392051 merged by Bmansurov:
[mediawiki/services/chromium-render@master] Make chromium testing easier

https://gerrit.wikimedia.org/r/392051

The task looks clear.

ovasileva triaged this task as High priority. · Nov 20 2017, 6:11 PM

ovasileva moved this task from Triaged but Future to Upcoming on the Web-Team-Backlog board.

pmiazga set the point value for this task to 2. · Nov 21 2017, 5:15 PM

phuedx added a parent task: T181118: [EPIC] Build a Chromium-based PDF renderer service. · Nov 22 2017, 6:01 AM

Sorry for the task linking spam, y'all. It occurred to me that we didn't actually have a task tracking building the new service!

ovasileva added a project: Readers-Web-Kanbanana-Board-Old. · Nov 23 2017, 12:44 PM

• bmansurov claimed this task. · Nov 27 2017, 2:06 PM

• bmansurov moved this task from To Do to Doing on the Readers-Web-Kanbanana-Board-Old board.

Jdlrobson moved this task from Upcoming to 2017-18 Q2 on the Web-Team-Backlog board. · Nov 27 2017, 8:32 PM

Change 393664 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[mediawiki/services/chromium-render@master] Log important events

https://gerrit.wikimedia.org/r/393664

• bmansurov mentioned this in rMSCRc83da4e37733: Log important events. · Nov 27 2017, 9:23 PM

• bmansurov moved this task from Doing to Needs Code Review on the Readers-Web-Kanbanana-Board-Old board. · Nov 28 2017, 1:00 AM

• bmansurov mentioned this in rMSCR26b4ca635847: Log important events. · Nov 28 2017, 11:45 PM

• bmansurov mentioned this in rMSCRd1ea1e1a5098: Log important events. · Nov 28 2017, 11:56 PM

• bmansurov mentioned this in rMSCR0334b286e3af: Log important events. · Nov 28 2017, 11:58 PM

This can skip QA! \o/

phuedx removed • bmansurov as the assignee of this task. · Nov 29 2017, 6:33 AM

• bmansurov mentioned this in rMSCR4584a448bfff: Log important events. · Nov 29 2017, 5:25 PM

pmiazga moved this task from Ready for Signoff to Doing on the Readers-Web-Kanbanana-Board-Old board. · Nov 29 2017, 6:06 PM

• bmansurov mentioned this in rMSCRe68164848c89: Log important events. · Nov 29 2017, 6:07 PM

• bmansurov moved this task from Doing to Needs Code Review on the Readers-Web-Kanbanana-Board-Old board. · Nov 29 2017, 7:14 PM

Jdlrobson added a comment. · Nov 29 2017, 8:24 PM

This comment was removed by Jdlrobson.

Jdlrobson assigned this task to • bmansurov. · Nov 29 2017, 8:27 PM

Change 393664 merged by Mobrovac:
[mediawiki/services/chromium-render@master] Log important events

https://gerrit.wikimedia.org/r/393664

• bmansurov removed • bmansurov as the assignee of this task. · Nov 30 2017, 1:00 PM

• bmansurov moved this task from Needs Code Review to Ready for Signoff on the Readers-Web-Kanbanana-Board-Old board.

Per the discussion in the Sprint Kickoff - Reading Web meeting, I'm happy to sign this off if y'all (@pmiazga, @bmansurov, and @pmiazga) have checked over the service's logs and they're readable and easily understood (which I believe to be the case).

IIRC we noted that there may be a few minor tweaks that need to be made as part of T178278: Performance test the service.
