We need an SQLite3 storage back-end in order to support lightweight users and test development/test environments.
Description
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Duplicate | None | T88190 Implement alternative storage modules, for example SQLite3 | |||
Resolved | • Pchelolo | T107162 Enable restbase testing with SQLite backed | |||
Resolved | • mobrovac | T107447 Switch to SQLite as the RESTBase back-end in MW-Vagrant | |||
Resolved | • Pchelolo | T88191 SQLite3 storage module back-end for RESTBase | |||
Resolved | • Pchelolo | T89888 Support 'static' columns in SQLite3 module back-end for RESTBase | |||
Resolved | • Pchelolo | T103406 move restbase-mod-table-cassandra tests and spec into a separate module | |||
Resolved | Hardikj | T99696 cleanup and refactor test/index.js |
Event Timeline
Current work on this is hosted here https://github.com/hardikj/restbase-mod-table-sqlite
Right now, i am trying to fix the PUT path in the module so it matches the put path of restbase-cassandra module.
In cassandra we use "insert" only for 'if not exists', or when no non-primary-key attributes are specified and "update" for other cases (upsert). Since "update" query also perform insert this is handled easily by cassandra. But, In sqlite3 a "update" query don't perform insert if the row doesn't exists.
So to solve the situation we have two options
- To use insert or replace, http://sqlite.org/lang_conflict.html.
- first perform a select and do an update
- have a parameter in req which specify this is update query.
My vote is to the first option here, this includes
- use a insert or replace for upsert
- check if a 'if' attribute is not 'not exists' and contain a condition do a update
Actually, if not exists is used only for table creation inside the Cassandra module. However, since this is a purely-Cassandra construct, I wouldn't rely on it.
- To use insert or replace, http://sqlite.org/lang_conflict.html.
Hmmm, replace is too destructive, IMHO, as it triggers cascading deletes in case a PK exists.
- first perform a select and do an update
I'd vote for this option, definitely. While it seems complicated, this is the way SQL logic should go. An alternative to that is to preform a SELECT first to check if a certain row is present. In either case, using transactions is a must. We should also evaluate whether we strive for data correctness, in which case tables should be also locked when inserting a new record.
- have a parameter in req which specify this is update query.
This implies a user's knowledge of the data's structure and contents, which makes the module much harder to use (and to some extent incompatible with the Cassandra one).
To use insert or replace, http://sqlite.org/lang_conflict.html.
Hmmm, replace is too destructive, IMHO, as it triggers cascading deletes in case a PK exists.
@mobrovac I agree that replace triggers cascading deletes but how about considering this option as defined here http://stackoverflow.com/questions/418898/sqlite-upsert-not-insert-or-replace.
In an IRC discussion we decided it's best to go the insert + update (or viceversa) way for the sake of correctness. Alternative approaches are to be investigated. Specifically, we should try to figure out the exact write patterns used by RESTBase, i.e. the ratio of inserts vs updates. This will allow us to optimise the queries (and their ordering perhaps as well).
We actually have OR IGNORE in sqlite3 that allow same functionality as 'if not exists' so we can use this in sqlite module as well.
I think we should try to keep the user inputs (request) and the outer layer similar to what we give to cassandra. Since the tests that will run on this module should eventually pass on cassandra.
Actually, if not exists is used only for table creation inside the Cassandra module. However, since this is a purely-Cassandra construct, I wouldn't rely on it.
It is actually part of the table storage interface, it just sounds similar to the cassandra feature.
We actually have OR IGNORE in sqlite3 that allow same functionality as 'if not exists' so we can use this in sqlite module as well.
I think we should try to keep the user inputs (request) and the outer layer similar to what we give to cassandra. Since the tests that will run on this module should eventually pass on cassandra.
+1. You can translate the if: 'exists' case in requests to a plain update (without OR IGNORE) in sqlite.
Current Status: Create table, Get and Put path seems to work fine for the most used cases.
Next Task: refine secondary index creation
Put path do not support queries with " if : { body: { "eq": new Buffer("<p>Service Oriented Architecture</p>") } " yet. Since sqlite do not support queries like
update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ? where "key" = ? AND "tid" = ? if "body" = ?
I am looking for a good solution to solve this case.
one way is that we handle such condition by running a select query.
Also, I am not sure if this is supported
update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ? where "key" = ? AND "tid" = ? AND "body" = ?
Cassandra seems to lists them as lightweight transaction
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html
Do you have a branch to look at? Are you using SQLite's built-in secondary indexes?
Put path do not support queries with " if : { body: { "eq": new Buffer("<p>Service Oriented Architecture</p>") } " yet. Since sqlite do not support queries like
update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ? where "key" = ? AND "tid" = ? if "body" = ?I am looking for a good solution to solve this case.
one way is that we handle such condition by running a select query.
Also, I am not sure if this is supportedupdate "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ? where "key" = ? AND "tid" = ? AND "body" = ?
Should be.
Cassandra seems to lists them as lightweight transaction
In SQLite things are much simpler as it's not a distributed system. Each update query is normally executed in its own transaction. IIRC at least with some drivers this happens implicitly, but if there are multiple queries (batch) all of them should be wrapped in a single transaction. This also improves performance.
I'd like to see the progress as well :)
As for secondary indices, SQLite supports multiple indices to be defined on a table, so no additional logic should be needed.
Put path do not support queries with " if : { body: { "eq": new Buffer("<p>Service Oriented Architecture</p>") } " yet. Since sqlite do not support queries like
update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ? where "key" = ? AND "tid" = ? if "body" = ?I am looking for a good solution to solve this case.
one way is that we handle such condition by running a select query.
Also, I am not sure if this is supportedupdate "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ? where "key" = ? AND "tid" = ? AND "body" = ?Should be.
You can use the latter query, as it is roughly a literal translation of the CQL above. However, note that this means that an additional index should be present, encompassing key, tid and body. This is not mandatory, but it greatly improves performance.
Cassandra seems to lists them as lightweight transaction
In SQLite things are much simpler as it's not a distributed system. Each update query is normally executed in its own transaction. IIRC at least with some drivers this happens implicitly, but if there are multiple queries (batch) all of them should be wrapped in a single transaction. This also improves performance.
Right. SQLite runs in autocommit mode by default. However, having a separate transaction for multiple queries whenever there is at least one UPDATE or INSERT is recommended (for consistency, but also performance).
I posted the link to the repo for the current work in my first comment here - https://phabricator.wikimedia.org/T88191#1016968
Here's the repo link - https://github.com/hardikj/restbase-mod-table-sqlite
Do you guys want me to move this to some other place?
I think we should have a well-defined workflow. My suggestion is to:
- fork the repo into the wikimedia space
- identify next mini-tasks and open PRs for them in the wikimedia repo so as to make it easier to follow progress and look over new code
What do you guys think? Makes sense?
For longer-running collaborative work outside of master I am normally in favor of using a remote branch in the wikimedia repository. We can then work with smaller PRs against that branch instead of one huge PR at the end.
The basic version has been implemented: https://github.com/wikimedia/restbase-mod-table-sqlite
Any following bugs/optimisations will be tracked under separate tickets.