The knowledge gaps pipeline depends on a number of data sources to provide metrics per (wiki_db, gap, category, time). Some of these sources don't filter for canonical wikis, which results in non-canonical wikis appearing in the output datasets.
Specifically, the culprit seems to be the page revision history logic, as article_created and revision_count are the only non-null metrics for these wikis. The filter step needs to be added there.
For example, the query below for srwikiquote shows values only for article_created and revision_count:
spark.table("knowledge_gaps.by_category").where("wiki_db='srwikiquote'").select("metrics.*").distinct().show()
+---------------+-------------+--------------+----------------+----------------------+-------------+--------------+
|article_created|pageviews_sum|pageviews_mean|standard_quality|standard_quality_count|quality_score|revision_count|
+---------------+-------------+--------------+----------------+----------------------+-------------+--------------+
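|              2|         null|          null|            null|                  null|         null|             6|
|              8|         null|          null|            null|                  null|         null|            13|
+---------------+-------------+--------------+----------------+----------------------+-------------+--------------+
To filter for Wikipedia projects, see Isaac's comment here.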
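A minimal sketch of what such a filter step could look like, assuming a canonical wikis DataFrame is available with a `database_code` column (matching `wiki_db`) and a `database_group` column whose value is `'wikipedia'` for Wikipedia projects. The function name and those column names are assumptions for illustration; they are not taken from the pipeline or from the linked comment.

```python
def keep_wikipedias(metrics_df, wikis_df):
    """Keep only rows of metrics_df whose wiki_db is a canonical Wikipedia.

    Both arguments are expected to be PySpark DataFrames.
    Assumed (hypothetical) schema of wikis_df: database_code, database_group.
    """
    wikipedias = (
        wikis_df
        .where("database_group = 'wikipedia'")          # assumed group label
        .withColumnRenamed("database_code", "wiki_db")  # align the join key
        .select("wiki_db")
    )
    # A left semi join keeps the metrics rows that have a match
    # and adds no columns from the canonical wikis table.
    return metrics_df.join(wikipedias, on="wiki_db", how="left_semi")
```

Applied to the page revision history DataFrame before aggregation, this should drop wikis such as srwikiquote. The same call on `spark.table("knowledge_gaps.by_category")` could serve as a quick check that the output no longer contains them. The name of the canonical wikis table is left open here, since it depends on what the linked comment recommends.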