gmodena (GModena (WMF))
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 2 2020, 1:15 PM (46 w, 1 d)
Availability
Available
LDAP User
Gmodena
MediaWiki User
GModena (WMF) [ Global Accounts ]

Recent Activity

Mon, Sep 13

gmodena updated the task description for T287274: [SPIKE][PLACEHOLDER] we need to estimate the effort required to migrate Similarusers' backend to Cassandra.
Mon, Sep 13, 7:06 AM · Platform Team Workboards (Green)

Thu, Sep 9

gmodena added a comment to T290664: Agree on a repository structure for Airflow-related code.

Hey @mforns thanks for starting this.

Thu, Sep 9, 7:18 PM · Analytics

Aug 6 2021

gmodena closed T288114: Spark download url in build config needs update as Resolved.
Aug 6 2021, 8:55 AM · Platform Team Workboards (Image Suggestion API)

Aug 4 2021

gmodena added a comment to T288114: Spark download url in build config needs update.

PR at https://github.com/mirrys/ImageMatching/pull/28
All checks (main branch) are green again.

Aug 4 2021, 3:57 PM · Platform Team Workboards (Image Suggestion API)
gmodena updated the task description for T288114: Spark download url in build config needs update.
Aug 4 2021, 3:54 PM · Platform Team Workboards (Image Suggestion API)
gmodena moved T288114: Spark download url in build config needs update from In progress to In review on the Platform Team Workboards (Image Suggestion API) board.
Aug 4 2021, 3:53 PM · Platform Team Workboards (Image Suggestion API)
gmodena created T288114: Spark download url in build config needs update.
Aug 4 2021, 3:31 PM · Platform Team Workboards (Image Suggestion API)

Aug 3 2021

gmodena added a comment to T284225: Create airflow instances for Platform Engineering and Research.

Many thanks for this! Just wanted to give an ack that login on the host worked.

Aug 3 2021, 5:52 PM · Patch-For-Review, Analytics-Kanban, Research, Platform Engineering, Analytics

Jul 26 2021

gmodena updated the task description for T287274: [SPIKE][PLACEHOLDER] we need to estimate the effort required to migrate Similarusers' backend to Cassandra.
Jul 26 2021, 4:11 PM · Platform Team Workboards (Green)

Jul 23 2021

gmodena created T287274: [SPIKE][PLACEHOLDER] we need to estimate the effort required to migrate Similarusers' backend to Cassandra.
Jul 23 2021, 6:35 PM · Platform Team Workboards (Green)

Jul 14 2021

gmodena updated subscribers of T286036: Ingest user similarity data for June 2021.

The June run successfully completed on 2021-07-13 at 16:00 UTC (18:00 CEST).

Jul 14 2021, 11:43 AM · Data-Persistence (Consultation), Platform Team Workboards (Green)
gmodena moved T286036: Ingest user similarity data for June 2021 from Doing to Done on the Platform Team Workboards (Green) board.
Jul 14 2021, 11:40 AM · Data-Persistence (Consultation), Platform Team Workboards (Green)
gmodena moved T286036: Ingest user similarity data for June 2021 from Backlog to Doing on the Platform Team Workboards (Green) board.
Jul 14 2021, 11:40 AM · Data-Persistence (Consultation), Platform Team Workboards (Green)

Jul 13 2021

gmodena added a comment to T285816: Add an image: generate static file of suggestions.

I have a couple of questions re integration:

Jul 13 2021, 10:36 AM · Growth-Team (Current Sprint), Platform Team Workboards (Image Suggestion API), Image-Suggestions, Image-Suggestion-API, Growth-Structured-Tasks

Jul 2 2021

gmodena closed T284424: Ingest user similarity data for May 2021 as Resolved.
Jul 2 2021, 12:36 PM · Platform Team Workboards (Green)
gmodena closed T284424: Ingest user similarity data for May 2021, a subtask of T265722: New Service Request: Sockpuppet Detection, as Resolved.
Jul 2 2021, 12:36 PM · Platform Team Workboards (Green)
gmodena created T286036: Ingest user similarity data for June 2021.
Jul 2 2021, 12:36 PM · Data-Persistence (Consultation), Platform Team Workboards (Green)
gmodena added a comment to T284258: Knowledge store data model.

Disclaimer: total MW noob here :).

Jul 2 2021, 8:20 AM · tech-decision-forum

Jul 1 2021

gmodena moved T285494: We should use a unique spelling for neigbhor/neighbour from Ready to Waiting for Review on the Platform Team Workboards (Green) board.
Jul 1 2021, 2:06 PM · Platform Team Workboards (Green)

Jun 24 2021

gmodena updated the task description for T285494: We should use a unique spelling for neigbhor/neighbour.
Jun 24 2021, 6:31 PM · Platform Team Workboards (Green)
gmodena updated the task description for T285494: We should use a unique spelling for neigbhor/neighbour.
Jun 24 2021, 6:30 PM · Platform Team Workboards (Green)
gmodena moved T285494: We should use a unique spelling for neigbhor/neighbour from Next Sprint to Ready on the Platform Team Workboards (Green) board.
Jun 24 2021, 6:26 PM · Platform Team Workboards (Green)
gmodena moved T285494: We should use a unique spelling for neigbhor/neighbour from Backlog to Next Sprint on the Platform Team Workboards (Green) board.
Jun 24 2021, 6:25 PM · Platform Team Workboards (Green)
gmodena assigned T285494: We should use a unique spelling for neigbhor/neighbour to codebug.
Jun 24 2021, 6:23 PM · Platform Team Workboards (Green)
gmodena created T285494: We should use a unique spelling for neigbhor/neighbour.
Jun 24 2021, 6:16 PM · Platform Team Workboards (Green)

Jun 23 2021

gmodena added a comment to T283501: Data size estimates for v1 image rec wikis.

The datasets we generated for PoC are available at https://analytics.wikimedia.org/published/datasets/one-off/platform-imagematching/api/.

Jun 23 2021, 4:56 PM · Platform Team Workboards (Image Suggestion API), Image-Suggestion-API, Image-Suggestions

Jun 21 2021

gmodena added a comment to T280385: Apache Beam go prototype code for DP evaluation.

Small update: the Apache Beam Go SDK is about to officially become stable. Privacy on Beam's 1.0.0 release can be considered stable as soon as that happens.

Jun 21 2021, 3:02 PM · Analytics, Research, Privacy Engineering, Privacy, Data-release

Jun 10 2021

gmodena added a comment to T284630: `spark.memory.driver` option does not get applied with "client" deployment mode..

I'd prefer not to have to maintain a fork, but this change is not on the critical path for now. No rush :).

Jun 10 2021, 8:25 AM · Product-Analytics, wmfdata-python

Jun 9 2021

gmodena updated subscribers of T284424: Ingest user similarity data for May 2021.

The May training/ingestion run completed successfully.

Jun 9 2021, 3:27 PM · Platform Team Workboards (Green)
gmodena added a comment to T284630: `spark.memory.driver` option does not get applied with "client" deployment mode..

I would have opened a PR for this, but I wanted to validate one of our use cases first.

Jun 9 2021, 8:51 AM · Product-Analytics, wmfdata-python
gmodena added a comment to T284630: `spark.memory.driver` option does not get applied with "client" deployment mode..

wmfdata helper methods set `spark.driver.memory` via `SparkSession`'s builder. This setting is ignored when Spark runs in client mode, which is the default for the configs you ship. In client deployment mode the driver JVM is already running by the time the builder is called, so `spark.driver.memory` must be set before the JVM starts. For example, we should pass the value to spark-submit, as in `spark-submit --driver-memory <size>`.
You can find more about this behaviour in Spark's documentation: https://spark.apache.org/docs/latest/configuration.html.

Jun 9 2021, 8:37 AM · Product-Analytics, wmfdata-python
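
A minimal sketch of the workaround described above, assuming the job is launched with spark-submit on YARN. The helper names (`build_spark_submit_cmd`, `pyspark_submit_args`) are hypothetical and not part of wmfdata. Only the `--driver-memory` flag and the `PYSPARK_SUBMIT_ARGS` environment variable are standard Spark/PySpark mechanisms for sizing the driver before its JVM starts.

```python
from typing import Dict, List, Optional


def build_spark_submit_cmd(app: str, driver_memory: str, master: str = "yarn",
                           deploy_mode: str = "client",
                           extra_conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a spark-submit argv that fixes driver memory at launch time.

    In client mode the driver runs in the JVM started by spark-submit, so its
    heap size must come from --driver-memory rather than from
    SparkSession.builder.config(), which executes after that JVM is up.
    """
    cmd = [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--driver-memory", driver_memory,
    ]
    # Other settings can still be passed as --conf key=value pairs.
    for key, value in (extra_conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


def pyspark_submit_args(driver_memory: str) -> str:
    """Value for PYSPARK_SUBMIT_ARGS when a plain Python process starts its own JVM.

    PySpark reads this variable when it launches the gateway JVM; the trailing
    'pyspark-shell' token is required.
    """
    return f"--driver-memory {driver_memory} pyspark-shell"
```

For notebook-style usage, `os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args("4g")` has to run before the first `SparkSession` is created in that Python process; once the JVM exists, changing the variable has no effect.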
gmodena created T284630: `spark.memory.driver` option does not get applied with "client" deployment mode..
Jun 9 2021, 8:31 AM · Product-Analytics, wmfdata-python

Jun 8 2021

gmodena added a comment to T272973: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance.
  • DAG dir and distribution

We'll need to set a directory in which airflow scheduler will look for DAG files. Perhaps we can just add an airflow/dags directory in refinery and configure airflow scheduler to look there?
This will be determined per instance. For now we are using refinery/airflow/dags for the analytics instance.

Jun 8 2021, 8:28 AM · Patch-For-Review, Analytics-Kanban, Analytics
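
In Airflow, the directory the scheduler scans is set per instance by the `dags_folder` option in the `[core]` section of `airflow.cfg` (or the `AIRFLOW__CORE__DAGS_FOLDER` environment variable), so a path like `refinery/airflow/dags` would be configured there. The sketch below approximates how candidate DAG files are picked from that folder. It is not Airflow's code: it mimics the idea behind DAG discovery "safe mode" (only parse `.py` files mentioning both "airflow" and "dag") and ignores `.airflowignore` and zipped DAGs.

```python
from pathlib import Path


def might_contain_dag(path: Path) -> bool:
    """Cheap pre-filter: only files mentioning both 'airflow' and 'dag' (case-insensitive)."""
    content = path.read_bytes().lower()
    return b"airflow" in content and b"dag" in content


def list_dag_files(dags_folder: str) -> list:
    """Return candidate DAG files under dags_folder, as sorted relative paths."""
    root = Path(dags_folder)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*.py")
        if p.is_file() and might_contain_dag(p)
    )
```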

Jun 7 2021

gmodena triaged T284424: Ingest user similarity data for May 2021 as Low priority.
Jun 7 2021, 11:13 AM · Platform Team Workboards (Green)
gmodena closed T282992: Ingest user similarity data for April 2021, a subtask of T265722: New Service Request: Sockpuppet Detection, as Resolved.
Jun 7 2021, 11:12 AM · Platform Team Workboards (Green)
gmodena closed T282992: Ingest user similarity data for April 2021 as Resolved.
Jun 7 2021, 11:12 AM · Platform Team Workboards (Green)
gmodena added a comment to T281687: 📊wikidata instance labels should be extracted based on the wiki language.

Adding a summary of https://github.com/mirrys/ImageMatching/pull/26#issuecomment-849519963 for posterity:

Jun 7 2021, 8:18 AM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)
gmodena added a comment to T282992: Ingest user similarity data for April 2021.

Excellent! Thanks for the ping :)

Jun 7 2021, 7:36 AM · Platform Team Workboards (Green)
gmodena created T284424: Ingest user similarity data for May 2021.
Jun 7 2021, 7:35 AM · Platform Team Workboards (Green)

Jun 3 2021

gmodena added a comment to T280678: Crunch and delete many old dumps logs.

I had a chat with @ArielGlenn today; I can help with a one-off analysis, but I'd need to understand needs and scope. Before moving forward let's make sure we would not be replicating analysis work already made available by @Addshore.

Jun 3 2021, 11:00 AM · Analytics-Kanban, Analytics

May 31 2021

gmodena added a watcher for Image-Suggestions: gmodena.
May 31 2021, 6:19 PM

May 27 2021

gmodena added a comment to T281687: 📊wikidata instance labels should be extracted based on the wiki language.

We spent some time fine-tuning the Spark job (not the code itself, but the cluster that executes it) and troubleshooting out-of-memory errors caused by extracting labels for all languages. Our findings are here: https://github.com/mirrys/ImageMatching/pull/26#issuecomment-849519963

May 27 2021, 6:54 PM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)
gmodena moved T280800: [SPIKE] 📊 Research options for real-time processing from In progress to In review on the Platform Team Workboards (Image Suggestion API) board.
May 27 2021, 6:43 PM · Platform Team Workboards (Image Suggestion API)
gmodena added a comment to T280800: [SPIKE] 📊 Research options for real-time processing.

PoC code developed for this spike can be found at https://github.com/gmodena/wmf-streaming-imagematching/pulls. The stated goal of demoing was not achievable within the budgeted effort; a functionally equivalent aggregation query has been provided instead.
This PoC shows basic approaches to packaging a Flink application, and the moving parts required for deploying Flink clusters atop YARN on WMF's Hadoop cluster.

May 27 2021, 6:42 PM · Platform Team Workboards (Image Suggestion API)
gmodena added a comment to T280800: [SPIKE] 📊 Research options for real-time processing.

The Image Matching model is trained with a monthly schedule. During the month, the state of a recommendation can change. For example:

  • A recommendation has been rejected and should not be offered again.
  • A recommendation has been accepted, a page is now illustrated and should not receive further recommendations.
  • A page has been illustrated by a workflow external to ImageMatching and should not receive further recommendations.

With our current setup, we'll need to wait until the next training run completes to see changes reflected in the data.

May 27 2021, 6:37 PM · Platform Team Workboards (Image Suggestion API)
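
To make the three cases above concrete, here is a minimal sketch of how events arriving between monthly training runs could invalidate recommendations. The event shape and kind names ("rejected", "accepted", "illustrated") are hypothetical, and this is a plain-Python batch replay; in a streaming setup such as the Flink PoC the same logic would live in a stateful operator.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    # Hypothetical event: kind is "rejected", "accepted" or "illustrated".
    kind: str
    page_id: int
    image: str = ""


def apply_events(recommendations: Dict[int, List[str]],
                 events: Iterable[Event]) -> Dict[int, List[str]]:
    """Return the recommendations still valid after replaying events since the last run."""
    rejected: Set[Tuple[int, str]] = set()
    illustrated: Set[int] = set()
    for ev in events:
        if ev.kind == "rejected":
            # A rejected (page, image) pair must not be offered again.
            rejected.add((ev.page_id, ev.image))
        elif ev.kind in ("accepted", "illustrated"):
            # The page now has an image, whether via ImageMatching or another workflow.
            illustrated.add(ev.page_id)
    valid = {}
    for page_id, images in recommendations.items():
        if page_id in illustrated:
            continue
        remaining = [img for img in images if (page_id, img) not in rejected]
        if remaining:
            valid[page_id] = remaining
    return valid
```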

May 21 2021

gmodena added a comment to T272973: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance.

@Ottomata thanks for the summary & overview of the .deb status.

May 21 2021, 6:38 PM · Patch-For-Review, Analytics-Kanban, Analytics

May 17 2021

gmodena updated subscribers of T282992: Ingest user similarity data for April 2021.

@Marostegui today we ran Similarusers ingestion of April data. Some stats:

May 17 2021, 3:35 PM · Platform Team Workboards (Green)
gmodena renamed T282992: Ingest user similarity data for April 2021 from Ingest user similarity data for March 2021 to Ingest user similarity data for April 2021.
May 17 2021, 7:19 AM · Platform Team Workboards (Green)
gmodena closed T279640: Ingest user similarity data for March 2021, a subtask of T265722: New Service Request: Sockpuppet Detection, as Resolved.
May 17 2021, 7:19 AM · Platform Team Workboards (Green)
gmodena set the point value for T282992: Ingest user similarity data for April 2021 to 1.
May 17 2021, 7:19 AM · Platform Team Workboards (Green)
gmodena closed T279640: Ingest user similarity data for March 2021 as Resolved.
May 17 2021, 7:19 AM · Data-Persistence (Consultation), Platform Team Workboards (Green)
gmodena created T282992: Ingest user similarity data for April 2021.
May 17 2021, 7:18 AM · Platform Team Workboards (Green)

May 6 2021

gmodena renamed T270613: [SPIKE] can we orchestrate the Similarusers pipeline with airflow? from [SPIKE] can we orchestrate (parts of) ETL with airflow? to [SPIKE] can we orchestrate the Similarusers pipeline with airflow?.
May 6 2021, 5:58 AM · Patch-For-Review, Platform Team Workboards (Green)
gmodena added a parent task for T270613: [SPIKE] can we orchestrate the Similarusers pipeline with airflow?: T282033: Airflow collaborations.
May 6 2021, 5:57 AM · Patch-For-Review, Platform Team Workboards (Green)
gmodena added a subtask for T282033: Airflow collaborations: T270613: [SPIKE] can we orchestrate the Similarusers pipeline with airflow?.
May 6 2021, 5:57 AM · Platform Team Workboards (Image Suggestion API), Analytics

May 5 2021

gmodena moved T280800: [SPIKE] 📊 Research options for real-time processing from Ready to In progress on the Platform Team Workboards (Image Suggestion API) board.
May 5 2021, 12:19 PM · Platform Team Workboards (Image Suggestion API)

May 4 2021

gmodena added a comment to T280042: New database request: image_matching.

Life would be easier if we could reach RESTBase Cassandra from the Hadoop network.

For the right use case I imagine access could be authorised - we currently have firewall rules in place that allow access from Analytics->AQS, which are on the prod network outside of the analytics cluster. As has been mentioned, this pattern is quite similar to that of AQS in general.

May 4 2021, 1:45 PM · Platform Engineering
gmodena added a comment to T281687: 📊wikidata instance labels should be extracted based on the wiki language.

Right now it is due to memory constraints. We encountered a number of out-of-memory errors when trying to retrieve a large set of labels from enwiki.
There are a few things we can do to fine-tune the memory footprint of the algorithm, but first we experimented with restricting the result set.

May 4 2021, 1:38 PM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)
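
A minimal sketch of restricting the result set to the wiki's own language, which is the idea behind T281687. Both helpers are hypothetical: `wiki_language` is a naive db-name-to-language mapping (some wikis' content language differs from their db name, so a real job should use an explicit mapping), and rows are assumed to be flattened `(item_id, language, label)` tuples. In the Spark job the equivalent would be a filter applied before any join or collect.

```python
from typing import Dict, Iterable, Tuple


def wiki_language(wiki_db: str) -> str:
    """Naive mapping from a wiki db name to a language code, e.g. 'enwiki' -> 'en'."""
    if not wiki_db.endswith("wiki"):
        raise ValueError(f"expected a db name ending in 'wiki': {wiki_db!r}")
    # Underscores in db names correspond to hyphens in language codes.
    return wiki_db[: -len("wiki")].replace("_", "-")


def labels_for_wiki(rows: Iterable[Tuple[str, str, str]], wiki_db: str) -> Dict[str, str]:
    """Keep one label per Wikidata item: the one in the wiki's language.

    Filtering early means only one label per item is held in memory,
    instead of one label per item per language.
    """
    lang = wiki_language(wiki_db)
    return {item: label for item, language, label in rows if language == lang}
```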

May 3 2021

gmodena updated the task description for T280800: [SPIKE] 📊 Research options for real-time processing.
May 3 2021, 7:28 PM · Platform Team Workboards (Image Suggestion API)
gmodena moved T280800: [SPIKE] 📊 Research options for real-time processing from Backlog to Ready on the Platform Team Workboards (Image Suggestion API) board.
May 3 2021, 7:27 PM · Platform Team Workboards (Image Suggestion API)
gmodena moved T281680: 📊Instances of "point in time with respect to recurrent timeframe" should be filtered from the API datasets from In review to Ready on the Platform Team Workboards (Image Suggestion API) board.
May 3 2021, 7:27 PM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena renamed T280800: [SPIKE] 📊 Research options for real-time processing from [Placeholder] 📊 Research options for real-time processing to [SPIKE] 📊 Research options for real-time processing.
May 3 2021, 7:26 PM · Platform Team Workboards (Image Suggestion API)
gmodena added a comment to T281680: 📊Instances of "point in time with respect to recurrent timeframe" should be filtered from the API datasets.

The following instances should be added to our filter list:

May 3 2021, 5:54 PM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena moved T281687: 📊wikidata instance labels should be extracted based on the wiki language from Backlog to In review on the Platform Team Workboards (Image Suggestion API) board.
May 3 2021, 10:44 AM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)
gmodena updated the task description for T281687: 📊wikidata instance labels should be extracted based on the wiki language.
May 3 2021, 10:44 AM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)
gmodena added a comment to T281687: 📊wikidata instance labels should be extracted based on the wiki language.

PR at https://github.com/mirrys/ImageMatching/pull/24

May 3 2021, 10:44 AM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)
gmodena created T281687: 📊wikidata instance labels should be extracted based on the wiki language.
May 3 2021, 10:42 AM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)
gmodena added a comment to T281680: 📊Instances of "point in time with respect to recurrent timeframe" should be filtered from the API datasets.

PR at https://github.com/mirrys/ImageMatching/pull/25

May 3 2021, 8:49 AM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena moved T281680: 📊Instances of "point in time with respect to recurrent timeframe" should be filtered from the API datasets from In progress to In review on the Platform Team Workboards (Image Suggestion API) board.
May 3 2021, 8:48 AM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena moved T281680: 📊Instances of "point in time with respect to recurrent timeframe" should be filtered from the API datasets from Ready to In progress on the Platform Team Workboards (Image Suggestion API) board.
May 3 2021, 8:48 AM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena moved T281680: 📊Instances of "point in time with respect to recurrent timeframe" should be filtered from the API datasets from Backlog to Ready on the Platform Team Workboards (Image Suggestion API) board.
May 3 2021, 8:48 AM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena set the point value for T281680: 📊Instances of "point in time with respect to recurrent timeframe" should be filtered from the API datasets to 1.
May 3 2021, 8:47 AM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena created T281680: 📊Instances of "point in time with respect to recurrent timeframe" should be filtered from the API datasets.
May 3 2021, 8:47 AM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)

Apr 29 2021

gmodena added a comment to T280794: [SPIKE] 📊 Import data into local Cassandra development db.

Local Cassandra docker-compose PoC (under review): https://github.com/gmodena/wmf-cassandra-imagematching

Apr 29 2021, 7:25 PM · Platform Team Workboards (Image Suggestion API)
gmodena created T281518: 📊[PLACEHOLDER] production data format should be adjusted to fit into Cassandra .
Apr 29 2021, 7:24 PM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena created T281517: 📊[PLACEHOLDER] We should implement a data loader for Cassandra.
Apr 29 2021, 7:18 PM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena added a comment to T277688: [Decision Needed] Long-term Target Datastore System.

This discussion, with relevant stakeholders, is ongoing at https://phabricator.wikimedia.org/T280042

Apr 29 2021, 7:12 PM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)
gmodena added a comment to T280042: New database request: image_matching.

The full dataset for ImageMatching, generated on 321 wikis, is 2.6GB. It contains 23585365 records.

To be clear, a record as it is referred to here is one globally unique primary key, and the corresponding columns, yes?

Apr 29 2021, 12:18 PM · Platform Engineering
gmodena added a comment to T280042: New database request: image_matching.

Maybe premature optimisation, but this dataset stores text fields (part of a potential primary key) that can be relatively long (page titles, image names). Do we have guidelines for hashing/storing long keys?

Below are some summary stats and percentiles on three fields with long text. @Eevans a few degenerate records are longer than Cassandra's max_key_size. I need to do some more validation here; a page title 755993 chars long might indicate something wrong with the export process.

The limit on key names (and cluster column values) is 64KB, and for something that is meant to be an identifier...that is quite large. :) I can't see how either of page titles or image names would need to be part of the primary key, BUT, I also think there must be a problem with the reporting; It looks like page names are limited to 255 chars in MediaWiki (https://www.mediawiki.org/wiki/Manual:Page_table#page_title)

Apr 29 2021, 11:36 AM · Platform Engineering
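
A minimal sketch of the two checks discussed above: flagging titles that cannot be valid MediaWiki titles, and deriving a fixed-length surrogate key when a long text value must participate in a key. Both limits are in bytes, not characters: Cassandra caps key components at 65535 bytes (the 64KB mentioned above), and MediaWiki's `page_title` column is `varbinary(255)`. The SHA-256 surrogate key is an illustrative choice, not an agreed guideline; since a hash is one-way, the original text would still need to be stored in a regular column.

```python
import hashlib
from typing import Iterable, List

CASSANDRA_MAX_KEY_BYTES = 65535  # 64KB key / clustering value limit
MW_MAX_TITLE_BYTES = 255         # page.page_title is varbinary(255)


def utf8_len(value: str) -> int:
    return len(value.encode("utf-8"))


def fits_cassandra_key(value: str) -> bool:
    return utf8_len(value) <= CASSANDRA_MAX_KEY_BYTES


def surrogate_key(value: str) -> str:
    """Fixed-length key (64 hex chars) derived from an arbitrarily long text value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def suspicious_titles(titles: Iterable[str]) -> List[str]:
    """Titles longer than MediaWiki allows; these likely point to an export problem."""
    return [t for t in titles if utf8_len(t) > MW_MAX_TITLE_BYTES]
```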
gmodena updated the task description for T280794: [SPIKE] 📊 Import data into local Cassandra development db.
Apr 29 2021, 11:14 AM · Platform Team Workboards (Image Suggestion API)
gmodena moved T280794: [SPIKE] 📊 Import data into local Cassandra development db from In progress to In review on the Platform Team Workboards (Image Suggestion API) board.
Apr 29 2021, 11:14 AM · Platform Team Workboards (Image Suggestion API)

Apr 26 2021

gmodena added a comment to T280042: New database request: image_matching.

Maybe premature optimisation, but this dataset stores text fields (part of a potential primary key) that can be relatively long (page titles, image names). Do we have guidelines for hashing/storing long keys?

Apr 26 2021, 2:46 PM · Platform Engineering
gmodena moved T280794: [SPIKE] 📊 Import data into local Cassandra development db from Ready to In progress on the Platform Team Workboards (Image Suggestion API) board.
Apr 26 2021, 2:17 PM · Platform Team Workboards (Image Suggestion API)
gmodena moved T280834: Preprocess unmatched pages from In progress to In review on the Platform Team Workboards (Image Suggestion API) board.
Apr 26 2021, 2:17 PM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena added a comment to T280042: New database request: image_matching.

The full dataset for ImageMatching, generated on 321 wikis, is 2.6GB. It contains 23585365 records. In prod we might want to store multiple snapshots (prev/current months), and possibly variants (to satisfy ad-hoc clients or A/B testing).

Apr 26 2021, 1:48 PM · Platform Engineering
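
A back-of-the-envelope sizing based on the figures above: 2.6GB over 23585365 records is roughly 110 bytes per record. Reading "2.6GB" as 2.6 × 10^9 bytes is an assumption (with GiB it would be about 118 bytes). The snapshot, variant and replication multipliers are illustrative parameters, not agreed values.

```python
DATASET_BYTES = 2.6e9   # "2.6GB", read as decimal gigabytes (assumption)
RECORDS = 23_585_365


def avg_record_bytes(total_bytes: float, records: int) -> float:
    return total_bytes / records


def raw_storage_bytes(total_bytes: float, snapshots: int = 2, variants: int = 1,
                      replication_factor: int = 1) -> float:
    """Uncompressed estimate: one full copy per snapshot and variant, times replicas.

    Ignores compression, indexes, tombstones and compaction headroom.
    """
    return total_bytes * snapshots * variants * replication_factor
```

For example, keeping the previous and current month (`snapshots=2`) with no variants and no replication gives 5.2GB; each additional variant or replica multiplies that figure.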

Apr 21 2021

gmodena added a comment to T280042: New database request: image_matching.

@Marostegui @Eevans thanks for the input!
I should have stats re dataset sizes of the 300+ wikis towards the end of this week. Crunching is still in progress; it takes a while to cycle through all languages.

Apr 21 2021, 8:17 PM · Platform Engineering
gmodena set the point value for T280834: Preprocess unmatched pages to 5.
Apr 21 2021, 7:41 PM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena moved T280794: [SPIKE] 📊 Import data into local Cassandra development db from In progress to Ready on the Platform Team Workboards (Image Suggestion API) board.
Apr 21 2021, 7:37 PM · Platform Team Workboards (Image Suggestion API)

Apr 19 2021

gmodena updated the task description for T280585: 📊Image Matching experiments should be deterministic .
Apr 19 2021, 7:15 PM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena updated the task description for T280585: 📊Image Matching experiments should be deterministic .
Apr 19 2021, 7:15 PM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena created T280585: 📊Image Matching experiments should be deterministic .
Apr 19 2021, 7:04 PM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena added a comment to T277745: 📊The list of exported wikis should be available as standalone file..

PR at https://github.com/mirrys/ImageMatching/pull/22

Apr 19 2021, 6:56 PM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)
gmodena updated the task description for T277745: 📊The list of exported wikis should be available as standalone file..
Apr 19 2021, 6:47 PM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)

Apr 15 2021

gmodena updated subscribers of T280042: New database request: image_matching.
Apr 15 2021, 8:29 PM · Platform Engineering
gmodena updated subscribers of T280042: New database request: image_matching.

Thanks for the detailed reply and constructive feedback.

Apr 15 2021, 8:28 PM · Platform Engineering

Apr 14 2021

gmodena claimed T277745: 📊The list of exported wikis should be available as standalone file..
Apr 14 2021, 9:54 AM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)
gmodena moved T277745: 📊The list of exported wikis should be available as standalone file. from Ready to In progress on the Platform Team Workboards (Image Suggestion API) board.
Apr 14 2021, 9:54 AM · Image-Suggestions, Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)

Apr 13 2021

gmodena created T280042: New database request: image_matching.
Apr 13 2021, 3:52 PM · Platform Engineering
gmodena added a comment to T279640: Ingest user similarity data for March 2021.

The job has successfully completed at 2021-04-13 15:37:22,710.
Some stats for the ingested datasets:

Apr 13 2021, 1:48 PM · Data-Persistence (Consultation), Platform Team Workboards (Green)
gmodena added a comment to T279640: Ingest user similarity data for March 2021.

The ingestion part of the data pipeline kicked off at 2021-04-13 09:05:37,296.
It is configured with:

SIMILARUSERS_BATCH_SIZE=7000
SIMILARUSERS_THROTTLE_MS=1000

Apr 13 2021, 7:08 AM · Data-Persistence (Consultation), Platform Team Workboards (Green)
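
A minimal sketch of batched, throttled ingestion, assuming `SIMILARUSERS_BATCH_SIZE` is the number of rows per write and `SIMILARUSERS_THROTTLE_MS` is a pause between consecutive batches. The actual Similarusers loader is not shown here, so `ingest` and `batches` are hypothetical helpers and `write_batch` stands in for whatever performs the database write.

```python
import os
import time
from typing import Callable, Iterator, Optional, Sequence


def batches(records: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def ingest(records: Sequence, write_batch: Callable[[Sequence], None],
           batch_size: Optional[int] = None, throttle_ms: Optional[int] = None,
           sleep: Callable[[float], None] = time.sleep) -> int:
    """Write records in fixed-size batches, pausing throttle_ms between batches.

    Unset arguments fall back to SIMILARUSERS_BATCH_SIZE / SIMILARUSERS_THROTTLE_MS,
    then to the values used for the March run (7000 rows, 1000 ms).
    """
    if batch_size is None:
        batch_size = int(os.environ.get("SIMILARUSERS_BATCH_SIZE", "7000"))
    if throttle_ms is None:
        throttle_ms = int(os.environ.get("SIMILARUSERS_THROTTLE_MS", "1000"))
    written = 0
    for i, chunk in enumerate(batches(records, batch_size)):
        if i > 0:
            sleep(throttle_ms / 1000)  # pause between consecutive batches only
        write_batch(chunk)
        written += len(chunk)
    return written
```

Under these assumptions the settings above cap the write rate at roughly 7000 rows per second, since every batch after the first is preceded by a one-second pause (the time spent writing only lowers the effective rate further).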