Page MenuHomePhabricator

SQLite3 storage module back-end for RESTBase
Closed, ResolvedPublic

Description

We need an SQLite3 storage back-end in order to support lightweight users and test development/test environments.

Event Timeline

mobrovac assigned this task to Hardikj.
mobrovac raised the priority of this task from to Medium.
mobrovac updated the task description. (Show Details)
mobrovac added a project: RESTBase.
Aklapper renamed this task from SQLite3 storage module to SQLite3 storage module back-end for RESTBase.Feb 2 2015, 6:30 PM
Aklapper set Security to None.

Current work on this is hosted here https://github.com/hardikj/restbase-mod-table-sqlite

Right now, i am trying to fix the PUT path in the module so it matches the put path of restbase-cassandra module.

In cassandra we use "insert" only for 'if not exists', or when no non-primary-key attributes are specified and "update" for other cases (upsert). Since "update" query also perform insert this is handled easily by cassandra. But, In sqlite3 a "update" query don't perform insert if the row doesn't exists.

So to solve the situation we have two options

  1. To use insert or replace, http://sqlite.org/lang_conflict.html.
  2. first perform a select and do an update
  3. have a parameter in req which specify this is update query.

My vote is to the first option here, this includes

  1. use a insert or replace for upsert
  2. check if a 'if' attribute is not 'not exists' and contain a condition do a update

In cassandra we use "insert" only for 'if not exists', or when no non-primary-key attributes are specified and "update" for other cases (upsert). Since "update" query also perform insert this is handled easily by cassandra. But, In sqlite3 a "update" query don't perform insert if the row doesn't exists.

Actually, if not exists is used only for table creation inside the Cassandra module. However, since this is a purely-Cassandra construct, I wouldn't rely on it.

  1. To use insert or replace, http://sqlite.org/lang_conflict.html.

Hmmm, replace is too destructive, IMHO, as it triggers cascading deletes in case a PK exists.

  1. first perform a select and do an update

I'd vote for this option, definitely. While it seems complicated, this is the way SQL logic should go. An alternative to that is to preform a SELECT first to check if a certain row is present. In either case, using transactions is a must. We should also evaluate whether we strive for data correctness, in which case tables should be also locked when inserting a new record.

  1. have a parameter in req which specify this is update query.

This implies a user's knowledge of the data's structure and contents, which makes the module much harder to use (and to some extent incompatible with the Cassandra one).

To use insert or replace, http://sqlite.org/lang_conflict.html.

Hmmm, replace is too destructive, IMHO, as it triggers cascading deletes in case a PK exists.

@mobrovac I agree that replace triggers cascading deletes but how about considering this option as defined here http://stackoverflow.com/questions/418898/sqlite-upsert-not-insert-or-replace.

In an IRC discussion we decided it's best to go the insert + update (or viceversa) way for the sake of correctness. Alternative approaches are to be investigated. Specifically, we should try to figure out the exact write patterns used by RESTBase, i.e. the ratio of inserts vs updates. This will allow us to optimise the queries (and their ordering perhaps as well).

In cassandra we use "insert" only for 'if not exists', or when no non-primary-key attributes are specified and "update" for other cases (upsert). Since "update" query also perform insert this is handled easily by cassandra. But, In sqlite3 a "update" query don't perform insert if the row doesn't exists.

Actually, if not exists is used only for table creation inside the Cassandra module. However, since this is a purely-Cassandra construct, I wouldn't rely on it.

We actually have OR IGNORE in sqlite3 that allow same functionality as 'if not exists' so we can use this in sqlite module as well.
I think we should try to keep the user inputs (request) and the outer layer similar to what we give to cassandra. Since the tests that will run on this module should eventually pass on cassandra.

Actually, if not exists is used only for table creation inside the Cassandra module. However, since this is a purely-Cassandra construct, I wouldn't rely on it.

It is actually part of the table storage interface, it just sounds similar to the cassandra feature.

We actually have OR IGNORE in sqlite3 that allow same functionality as 'if not exists' so we can use this in sqlite module as well.
I think we should try to keep the user inputs (request) and the outer layer similar to what we give to cassandra. Since the tests that will run on this module should eventually pass on cassandra.

+1. You can translate the if: 'exists' case in requests to a plain update (without OR IGNORE) in sqlite.

Current Status: Create table, Get and Put path seems to work fine for the most used cases.
Next Task: refine secondary index creation

Put path do not support queries with " if : { body: { "eq": new Buffer("<p>Service Oriented Architecture</p>") } " yet. Since sqlite do not support queries like

update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ?  where "key" = ? AND "tid" = ? if "body" = ?

I am looking for a good solution to solve this case.
one way is that we handle such condition by running a select query.
Also, I am not sure if this is supported

update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ?  where "key" = ? AND "tid" = ? AND "body" = ?

Cassandra seems to lists them as lightweight transaction
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html

Current Status: Create table, Get and Put path seems to work fine for the most used cases.
Next Task: refine secondary index creation

Do you have a branch to look at? Are you using SQLite's built-in secondary indexes?

Put path do not support queries with " if : { body: { "eq": new Buffer("<p>Service Oriented Architecture</p>") } " yet. Since sqlite do not support queries like

update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ?  where "key" = ? AND "tid" = ? if "body" = ?

I am looking for a good solution to solve this case.
one way is that we handle such condition by running a select query.
Also, I am not sure if this is supported

update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ?  where "key" = ? AND "tid" = ? AND "body" = ?

Should be.

Cassandra seems to lists them as lightweight transaction

In SQLite things are much simpler as it's not a distributed system. Each update query is normally executed in its own transaction. IIRC at least with some drivers this happens implicitly, but if there are multiple queries (batch) all of them should be wrapped in a single transaction. This also improves performance.

Current Status: Create table, Get and Put path seems to work fine for the most used cases.
Next Task: refine secondary index creation

Do you have a branch to look at? Are you using SQLite's built-in secondary indexes?

I'd like to see the progress as well :)

As for secondary indices, SQLite supports multiple indices to be defined on a table, so no additional logic should be needed.

Put path do not support queries with " if : { body: { "eq": new Buffer("<p>Service Oriented Architecture</p>") } " yet. Since sqlite do not support queries like

update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ?  where "key" = ? AND "tid" = ? if "body" = ?

I am looking for a good solution to solve this case.
one way is that we handle such condition by running a select query.
Also, I am not sure if this is supported

update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ?  where "key" = ? AND "tid" = ? AND "body" = ?

Should be.

You can use the latter query, as it is roughly a literal translation of the CQL above. However, note that this means that an additional index should be present, encompassing key, tid and body. This is not mandatory, but it greatly improves performance.

Cassandra seems to lists them as lightweight transaction

In SQLite things are much simpler as it's not a distributed system. Each update query is normally executed in its own transaction. IIRC at least with some drivers this happens implicitly, but if there are multiple queries (batch) all of them should be wrapped in a single transaction. This also improves performance.

Right. SQLite runs in autocommit mode by default. However, having a separate transaction for multiple queries whenever there is at least one UPDATE or INSERT is recommended (for consistency, but also performance).

I posted the link to the repo for the current work in my first comment here - https://phabricator.wikimedia.org/T88191#1016968

Here's the repo link - https://github.com/hardikj/restbase-mod-table-sqlite
Do you guys want me to move this to some other place?

I think we should have a well-defined workflow. My suggestion is to:

  • fork the repo into the wikimedia space
  • identify next mini-tasks and open PRs for them in the wikimedia repo so as to make it easier to follow progress and look over new code

What do you guys think? Makes sense?

For longer-running collaborative work outside of master I am normally in favor of using a remote branch in the wikimedia repository. We can then work with smaller PRs against that branch instead of one huge PR at the end.

For longer-running collaborative work outside of master I am normally in favor of using a remote branch in the wikimedia repository. We can then work with smaller PRs against that branch instead of one huge PR at the end.

Yup. That was my general idea. Ok, I'm forking the sqlite mod repo into wikimedia.

OK, forked it to wikimedia . @GWicke you will have to connect it to travis, though. It seems I cannot add it there from my travis account.

I have deleted wikimedia/restbase-mod-table-sqlite. @Hardikj please transfer your repo to me or @GWicke so we can put it under wikimedia (go to settings > transfer in your repo).

The basic version has been implemented: https://github.com/wikimedia/restbase-mod-table-sqlite
Any following bugs/optimisations will be tracked under separate tickets.