RecordLintJob is sometimes too big
Closed, Resolved · Public

Description

For example, for the revision https://uk.wikipedia.org/w/index.php?title=Вікіпедія:Шафа&oldid=21337479 the RecordLintJob serialized as JSON takes 5.6 MB, and this is not the largest example.

For the Kafka #JobQueue this is just too big, but I think it's too big for any queue implementation.

Can we, perhaps, limit the number of linting errors per job and split a huge job into smaller jobs when it exceeds the limit?
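
To make the splitting idea concrete, here is a minimal sketch in Python (not the Linter extension's actual PHP code). The function name `split_into_jobs` and the `max_per_job` default are hypothetical, chosen only for illustration; the real limit would have to be tuned against what the queue can accept.

```python
from typing import Iterator


def split_into_jobs(errors: list[dict], max_per_job: int = 100) -> Iterator[list[dict]]:
    """Yield consecutive batches of at most max_per_job lint errors.

    Each batch would become the payload of one smaller job
    (hypothetical helper, not part of the Linter extension).
    """
    if max_per_job < 1:
        raise ValueError("max_per_job must be at least 1")
    for start in range(0, len(errors), max_per_job):
        yield errors[start:start + max_per_job]
```

Each yielded batch would be enqueued as its own job, so no single job carries more than `max_per_job` errors. This bounds the error count per job but not the serialized size, which still depends on how large each individual error is.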

Event Timeline

Restricted Application added a subscriber: Aklapper. · Aug 17 2018, 11:10 PM

Do you have a copy of the JSON?

We limit it to 20 errors per category, so theoretically it shouldn't be possible to have millions of errors. I'm assuming that the parameters/text part of the LintError is what took up too much room, but would like to confirm that...

Do you have a copy of the JSON?

Here's the event as it appears in an EventBus log entry. Because it's wrapped in the log entry it's escaped, but you get the idea.

Legoktm claimed this task. · Aug 18 2018, 3:19 AM

Thanks, that was helpful. Our 20-error limit is applied inside the job itself, just before writing to the database, which means all of those errors still have to be inserted into the job queue first. Oops.
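
The difference between capping inside the job and capping before the job is enqueued can be sketched as follows. This is an illustrative Python sketch, not the extension's PHP code: the function name `drop_excess_errors` and the `"type"` key used as the category are assumptions, and only the limit of 20 per category comes from this thread.

```python
from collections import defaultdict

# Per-category limit mentioned in this thread.
MAX_ERRORS_PER_CATEGORY = 20


def drop_excess_errors(errors: list[dict], limit: int = MAX_ERRORS_PER_CATEGORY) -> list[dict]:
    """Keep at most `limit` errors per category, preserving input order.

    Assumes each error is a dict whose "type" key names its category
    (a hypothetical shape, used here only for illustration).
    """
    kept = []
    seen = defaultdict(int)  # category -> number of errors kept so far
    for error in errors:
        category = error["type"]
        if seen[category] < limit:
            seen[category] += 1
            kept.append(error)
    return kept
```

Running this kind of filter before the job is created means the queue only ever sees the capped list, instead of receiving every error and discarding most of them later.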

Change 453563 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/extensions/Linter@master] Drop excess events at the API layer

https://gerrit.wikimedia.org/r/453563

Change 453563 merged by jenkins-bot:
[mediawiki/extensions/Linter@master] Drop excess events at the API layer

https://gerrit.wikimedia.org/r/453563

Legoktm closed this task as Resolved. · Aug 27 2018, 5:55 PM

This was deployed last week; please re-open if Linter is still causing problems :)