[Task] Investigate how and where data model objects are instanciated in our code base
Closed, ResolvedPublic

Bene created this task.Sep 17 2015, 3:01 PM
Bene updated the task description. (Show Details)
Bene raised the priority of this task from to High.
Bene claimed this task.
Bene added subscribers: Jonas, thiemowmde, JeroenDeDauw and 6 others.
Bene added a comment.Sep 17 2015, 6:13 PM

Some statistics (test files excluded):

  • SnakObject/PropertyValueSnak constructor calls (4 usages):
    • repo: SnakFactory
    • others: SnakDeserializer, DerivedPropertyValueSnak, LegacySnakDeserializer
  • SiteLink constructor calls (19 usages):
    • client: ClientHooks, PropertyParserFunction\Runner, WikibaseLuaBindings, DeletePageNoticeCreator, InfoActionHookHandler, MovePageNotice, OtherProjectsSidebarGenerator, UpdateRepo
    • lib: SiteLinkTable
    • repo: LinkTitles, UpdateRepoOnMoveJob, SiteLinkUniquenessValidator
    • others: SiteLinkDeserializer, SiteLinkList, LegacySiteLinkListDeserializer
  • Term constructor calls (11 usages):
    • lib: TermIndexEntry, LanguageFallbackLabelDescriptionLookup
    • repo: EntitySearchHelper
    • others: TermDeserializer, LanguageLabelDescriptionLookup, Fingerprint, TermList, LegacyFingerprintDeserializer
  • AliasGroup constructor calls (4 usages):
    • others: AliasGroupListDeserializer, AliasGroupList, Fingerprint, LegacyFingerprintDeserializer.php
Bene moved this task from Doing to Review on the Wikidata-Sprint-2015-09-15 board.Sep 17 2015, 6:23 PM
JanZerebecki moved this task from incoming to in progress on the Wikidata board.Sep 18 2015, 12:42 PM

Thanks for looking at this. Those findings are not surprising - such construction ought to occur primarily in the data access layer and deserialization code.

Any reason you did not include EntityId and derivatives?

Why did you exclude the tests? If I'm not mistaken the reason for having this investigation task is to get an idea of the cost of making breaking changes. That cost applies just as well to test code as to production code.

What do you mean with "our codebase"? Which code did you include in this analysis? Is it everything that needs to be maintained by the Wikidata team, or just Client, Lib and Repository?

I think it's useful to have separate numbers for the tests, but I agree that we should know those numbers too, since we need to fix these instances too.

Bene added a comment.Sep 21 2015, 8:43 PM

Thanks for looking at this. Those findings are not surprising - such construction ought to occur primarily in the data access layer and deserialization code.

Indeed, our code seems to handle the construction of value objects in a very nice way.

Any reason you did not include EntityId and derivatives?

Those aren't important for the current RFC that is adressed by this investigation because we already have an abstract base interface with EntityId.

Why did you exclude the tests? If I'm not mistaken the reason for having this investigation task is to get an idea of the cost of making breaking changes. That cost applies just as well to test code as to production code.

Yes, but fixing the tests should be very easy in most cases by replacing the constructor calls to some random implementation. However, it would be nice if all the tests would cover all implementations of the data model that we are going to implement. Just for record: There are of course a lot of constructor calls in the tests.

What do you mean with "our codebase"? Which code did you include in this analysis? Is it everything that needs to be maintained by the Wikidata team, or just Client, Lib and Repository?

For now, I only looked into the Wikibase.git repository and its dependencies but we also need to check our other extensions. We should add the usages from there to the list.

Those aren't important for the current RFC

I was under the impression the proposal was to have interfaces for all value objects (and presumably collections) in DataModel. If that is the case, we should consider the cost of changes to all of them. If not, then I'm rather concerned about making such disruptive changes for just part of the DataModel, forcing us to do more breaking of stuff as soon as we want another alternate implementation.

Yes, but fixing the tests should be very easy

Do you see a difference between the tests and the production code?

Tobi_WMDE_SW set Security to None.
Tobi_WMDE_SW moved this task from Proposed to Review on the Wikidata-Sprint-2015-09-29 board.

This should be extended to everything that is included in the Wikidata build (e.g. serializers, etc..). We also need this for our tests (ideally separated).

Tobi_WMDE_SW removed Bene as the assignee of this task.Oct 13 2015, 1:10 PM
Tobi_WMDE_SW moved this task from Review to Proposed on the Wikidata-Sprint-2015-10-13 board.
Tobi_WMDE_SW assigned this task to thiemowmde.
thiemowmde added a comment.EditedNov 12 2015, 4:43 PM

I did regex searches for new Class( instantiations as well as static Class::…( calls in my local copy of the codebase, including all dependencies (ValueView, DataModel, all DataValues, and so on). I did not included other extensions like PropertySuggester or WikibaseQuality. I listed all usages no matter in which (sub) component they are.

Value objects

  • AliasGroup
    • Instantiations in production: AliasGroupList, AliasGroupListDeserializer, Fingerprint, LegacyFingerprintDeserializer
    • Usage in tests: 122
  • AliasGroupFallback
    • Usage in production: 0
    • Usage in tests: AliasGroupSerializerTest, AliasGroupTest
  • AliasGroupList
    • Instantiations in production: AliasGroupListDeserializer, Entity, Fingerprint, FingerprintPatcher, LegacyFingerprintDeserializer
    • Usage in tests: 96
  • ByPropertyIdArray
    • Instantiations in production: ChangeOpStatement
    • Usage in tests: 0
  • Claims
    • Usage: 0
  • DerivedPropertyValueSnak
    • Usage: 0
  • EntityIdValue
    • Instantiations in production: EntityIdValueParser
    • Usage in tests: 44
  • EntityRedirect
    • Instantiations in production: EntityContent, EntityContentDataCodec, EntityDataRequestHandler, RedirectCreationInteractor
    • Usage in tests: 52
  • Fingerprint
    • Instantiations in production: EntityRetrievingTermLookup, Fingerprint, Item, LegacyFingerprintDeserializer, Property
    • Usage in tests: 72
  • HashableObjectStorage
    • Usage: 0
  • Item
    • Instantiations in production: createBlacklistedItems, GenericEntityInfoBuilder, Item, ItemContent, ItemDeserializer, ItemDiffer, ItemHandler, LegacyItemDeserializer, LinkTitles, SpecialNewItem
    • Usage in tests: 429
  • ItemId
    • Instantiations in production: BasicEntityIdParser, CachingSiteLinkLookup, createBlacklistedItems, DiffView, ExternalChangeFactory, HashSiteLinkStore, ItemHandler, ItemIdParser, LegacySiteLinkListDeserializer, MergeItems, ModifyEntity, SiteLinkListPatcher, SpecialPagesWithBadges, SpecialSetSiteLink, UpdateRepoJob, WikibaseLuaBindings
    • Static calls in production: ChangesSubscriptionTableBuilder, EntityPerPageTable, Item, LegacyIdInterpreter, SiteLinkTable, SiteLinkUniquenessValidator, SiteLinkUsageLookup
    • Usage in tests: 1034
  • ItemIdSet
    • Instantiations in production: SiteLink
    • Usage in tests: SiteLinkListTest, SiteLinkTest
  • Property
    • Instantiations in production: LegacyPropertyDeserializer
    • Static calls in production: importProperties, PropertyContent, PropertyDeserializer, PropertyHandler, SpecialNewProperty
    • Usage in tests: 114
  • PropertyId
    • Instantiations in production: BasicEntityIdParser, ByPropertyIdArray, ByPropertyIdGrouper, CallbackFactory, changePropertyDataType, PropertyHandler, PropertyIdResolver, SnakObject, WikibaseRepo
    • Static calls in production: LegacyIdInterpreter, Property, PropertyInfoTableBuilder, SnakObject, SpecialListProperties, SqlEntityInfoBuilder
    • Usage in tests: 545
  • PropertyNoValueSnak
    • Instantiations in production: LegacySnakDeserializer, SnakDeserializer, SnakFactory
    • Usage in tests: 432
  • PropertySomeValueSnak
    • Instantiations in production: LegacySnakDeserializer, SnakDeserializer, SnakFactory
    • Usage in tests: 110
  • PropertyValueSnak
    • Instantiations in production: DerivedPropertyValueSnak, LegacySnakDeserializer, SnakDeserializer, SnakFactory
    • Usage in tests: 264
  • Reference
    • Instantiations in production: LegacyStatementDeserializer, ReferenceDeserializer, ReferenceList, SetReference
    • Usage in tests: 143
  • ReferenceList
    • Instantiations in production: LegacyStatementDeserializer, ReferenceListDeserializer, Statement, StatementList
    • Usage in tests: 70
  • SiteLink
    • Instantiations in production: DeletePageNoticeCreator, InfoActionHookHandler, LegacySiteLinkListDeserializer, LinkTitles, MovePageNotice, OtherProjectsSidebarGenerator, Runner, SiteLinkDeserializer, SiteLinkList, SiteLinkTable, SiteLinkUniquenessValidator, UpdateRepo, UpdateRepoOnMoveJob, WikibaseLuaBindings
    • Usage in tests: 155
  • SiteLinkList
    • Instantiations in production: Item, LegacySiteLinkListDeserializer, LinkTitles, SetSiteLink, SiteLinkListPatcher
    • Usage in tests: 54
  • SnakList
    • Instantiations in production: LegacySnakListDeserializer, Reference, SnakListDeserializer, Statement, StatementList
    • Usage in tests: 144
  • Statement
    • Instantiations in production: LegacyStatementDeserializer, StatementDeserializer, StatementGroupListView, StatementList
    • Usage in tests: 212
  • StatementByGuidMap
    • Usage: 0
  • StatementGuid
    • Instantiations in production: StatementGuidParser
    • Usage in tests: StatementGuidParserTest
  • StatementList
    • Instantiations in production: ChangeOpStatement, EntityChangeFactory, Item, LegacyItemDeserializer, Property, PropertyDiffer, ResultBuilder, StatementListDeserializer, StatementListPatcher
    • Usage in tests: 138
  • Term
    • Instantiations in production: EntitySearchHelper, Fingerprint, LanguageLabelDescriptionLookup, LegacyFingerprintDeserializer, TermDeserializer, TermIndexEntry, TermList
    • Usage in tests: 197
  • TermFallback
    • Instantiations in production: LanguageFallbackLabelDescriptionLookup
    • Usage in tests: EntityIdHtmlLinkFormatterTest, TermIndexSearchInteractorTest, TermListSerializerTest, TermSerializerTest, TermTest
  • TermList
    • Instantiations in production: Entity, Fingerprint, FingerprintPatcher, LegacyFingerprintDeserializer, TermListDeserializer
    • Usage in tests: 130
  • TypedSnak
    • Usage in production: 0
    • Usage in tests: SerializerFactoryTest, TypedSnakSerializerTest

Services

  • BasicEntityIdParser
    • Instantiations in production: dumpEntities, EntityChange, SqlEntityInfoBuilder
    • Static calls in production: WikibaseClient, WikibaseRepo
    • Usage in tests: 75
  • DispatchingEntityIdParser
    • Instantiations in production: BasicEntityIdParser, WikibaseClient, WikibaseRepo
    • Usage in tests: 0
  • EntityIdParsingException
    • Instantiations in production: DispatchingEntityIdParser, ItemIdParser, populateChangesSubscription, SuffixEntityIdParser
    • Usage in tests: EntityIdDeserializerTest
  • ItemIdParser
    • Usage: 0
  • LegacyIdInterpreter
    • Static calls in production: EntityContentDataCodec, EntityIdValue, EntityPerPageTable, LegacyEntityIdDeserializer, SqlEntityInfoBuilder, TermIndexEntry
    • Usage in tests: TermIndexEntryTest
  • MapValueHasher
    • Instantiations in production: HashableObjectStorage, HashArray, SnakList
    • Usage in tests: 0
  • ReferencedStatementFilter
    • Usage in production: 0
    • Usage in tests: StatementListTest

Thanks for the overview Thiemo!

Note that you included some services, while this task is about value objects, aggregates and entities.

The task description talked about "data model objects", so I simply did the same steps for all classes in the current Wikibase-DataModel 4.4.0 (dev).

@Thiemo: I'm missing numbers for the Term, TermList, and TermFallback classes. Did these slip through the cracks somehow?

Not sure how this happened. AliasGroupList, Fingerprint, and all Term classes where missing. I updated my list above.

thiemowmde closed this task as Resolved.Sep 28 2017, 3:26 PM