SQLite3 storage module back-end for RESTBase
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	• mobrovac
	Jan 31 2015, 12:18 AM

Description

We need an SQLite3 storage back-end in order to support lightweight users and test development/test environments.

Related Objects
Search...

Status	Assigned	Task
Duplicate	None	T88190 Implement alternative storage modules, for example SQLite3
Resolved	• Pchelolo	T107162 Enable restbase testing with SQLite backed
Resolved	• mobrovac	T107447 Switch to SQLite as the RESTBase back-end in MW-Vagrant
Resolved	• Pchelolo	T88191 SQLite3 storage module back-end for RESTBase
Resolved	• Pchelolo	T89888 Support 'static' columns in SQLite3 module back-end for RESTBase
Resolved	• Pchelolo	T103406 move restbase-mod-table-cassandra tests and spec into a separate module
Resolved	Hardikj	T99696 cleanup and refactor test/index.js

Event Timeline

• mobrovac created this task.Jan 31 2015, 12:18 AM

• mobrovac assigned this task to Hardikj.

• mobrovac raised the priority of this task from to Medium.

• mobrovac updated the task description. (Show Details)

• mobrovac added a project: RESTBase.

• mobrovac added subscribers: • Jdouglas, Hardikj, • GWicke and 2 others.

Aklapper renamed this task from SQLite3 storage module to SQLite3 storage module back-end for RESTBase.Feb 2 2015, 6:30 PM

Aklapper set Security to None.

Current work on this is hosted here https://github.com/hardikj/restbase-mod-table-sqlite

Right now, i am trying to fix the PUT path in the module so it matches the put path of restbase-cassandra module.

In cassandra we use "insert" only for 'if not exists', or when no non-primary-key attributes are specified and "update" for other cases (upsert). Since "update" query also perform insert this is handled easily by cassandra. But, In sqlite3 a "update" query don't perform insert if the row doesn't exists.

So to solve the situation we have two options

To use insert or replace, http://sqlite.org/lang_conflict.html.
first perform a select and do an update
have a parameter in req which specify this is update query.

My vote is to the first option here, this includes

use a insert or replace for upsert
check if a 'if' attribute is not 'not exists' and contain a condition do a update

In T88191#1016968, @Hardikj wrote:

In cassandra we use "insert" only for 'if not exists', or when no non-primary-key attributes are specified and "update" for other cases (upsert). Since "update" query also perform insert this is handled easily by cassandra. But, In sqlite3 a "update" query don't perform insert if the row doesn't exists.

Actually, if not exists is used only for table creation inside the Cassandra module. However, since this is a purely-Cassandra construct, I wouldn't rely on it.

To use insert or replace, http://sqlite.org/lang_conflict.html.

Hmmm, replace is too destructive, IMHO, as it triggers cascading deletes in case a PK exists.

first perform a select and do an update

I'd vote for this option, definitely. While it seems complicated, this is the way SQL logic should go. An alternative to that is to preform a SELECT first to check if a certain row is present. In either case, using transactions is a must. We should also evaluate whether we strive for data correctness, in which case tables should be also locked when inserting a new record.

have a parameter in req which specify this is update query.

This implies a user's knowledge of the data's structure and contents, which makes the module much harder to use (and to some extent incompatible with the Cassandra one).

• mobrovac moved this task from Backlog to In progress on the RESTBase board.Feb 5 2015, 5:58 AM

To use insert or replace, http://sqlite.org/lang_conflict.html.

Hmmm, replace is too destructive, IMHO, as it triggers cascading deletes in case a PK exists.

@mobrovac I agree that replace triggers cascading deletes but how about considering this option as defined here http://stackoverflow.com/questions/418898/sqlite-upsert-not-insert-or-replace.

• GWicke mentioned this in T1228: RESTbase deployment.Feb 5 2015, 11:15 PM

In an IRC discussion we decided it's best to go the insert + update (or viceversa) way for the sake of correctness. Alternative approaches are to be investigated. Specifically, we should try to figure out the exact write patterns used by RESTBase, i.e. the ratio of inserts vs updates. This will allow us to optimise the queries (and their ordering perhaps as well).

In T88191#1017030, @mobrovac wrote:

In cassandra we use "insert" only for 'if not exists', or when no non-primary-key attributes are specified and "update" for other cases (upsert). Since "update" query also perform insert this is handled easily by cassandra. But, In sqlite3 a "update" query don't perform insert if the row doesn't exists.

Actually, if not exists is used only for table creation inside the Cassandra module. However, since this is a purely-Cassandra construct, I wouldn't rely on it.

We actually have OR IGNORE in sqlite3 that allow same functionality as 'if not exists' so we can use this in sqlite module as well.
I think we should try to keep the user inputs (request) and the outer layer similar to what we give to cassandra. Since the tests that will run on this module should eventually pass on cassandra.

Actually, if not exists is used only for table creation inside the Cassandra module. However, since this is a purely-Cassandra construct, I wouldn't rely on it.

It is actually part of the table storage interface, it just sounds similar to the cassandra feature.

We actually have OR IGNORE in sqlite3 that allow same functionality as 'if not exists' so we can use this in sqlite module as well.
I think we should try to keep the user inputs (request) and the outer layer similar to what we give to cassandra. Since the tests that will run on this module should eventually pass on cassandra.

+1. You can translate the if: 'exists' case in requests to a plain update (without OR IGNORE) in sqlite.

Current Status: Create table, Get and Put path seems to work fine for the most used cases.
Next Task: refine secondary index creation

Put path do not support queries with " if : { body: { "eq": new Buffer("Service Oriented Architecture") } " yet. Since sqlite do not support queries like

update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ?  where "key" = ? AND "tid" = ? if "body" = ?

I am looking for a good solution to solve this case.
one way is that we handle such condition by running a select query.
Also, I am not sure if this is supported

update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ?  where "key" = ? AND "tid" = ? AND "body" = ?

Cassandra seems to lists them as lightweight transaction
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html

In T88191#1023016, @Hardikj wrote:

Current Status: Create table, Get and Put path seems to work fine for the most used cases.
Next Task: refine secondary index creation

Do you have a branch to look at? Are you using SQLite's built-in secondary indexes?

Put path do not support queries with " if : { body: { "eq": new Buffer("Service Oriented Architecture") } " yet. Since sqlite do not support queries like
update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ? where "key" = ? AND "tid" = ? if "body" = ?
I am looking for a good solution to solve this case.
one way is that we handle such condition by running a select query.
Also, I am not sure if this is supported
update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ? where "key" = ? AND "tid" = ? AND "body" = ?

Should be.

Cassandra seems to lists them as lightweight transaction

In SQLite things are much simpler as it's not a distributed system. Each update query is normally executed in its own transaction. IIRC at least with some drivers this happens implicitly, but if there are multiple queries (batch) all of them should be wrapped in a single transaction. This also improves performance.

In T88191#1023093, @GWicke wrote:

In T88191#1023016, @Hardikj wrote:

Current Status: Create table, Get and Put path seems to work fine for the most used cases.
Next Task: refine secondary index creation

Do you have a branch to look at? Are you using SQLite's built-in secondary indexes?

I'd like to see the progress as well :)

As for secondary indices, SQLite supports multiple indices to be defined on a table, so no additional logic should be needed.

Put path do not support queries with " if : { body: { "eq": new Buffer("Service Oriented Architecture") } " yet. Since sqlite do not support queries like
update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ? where "key" = ? AND "tid" = ? if "body" = ?
I am looking for a good solution to solve this case.
one way is that we handle such condition by running a select query.
Also, I am not sure if this is supported
update "local_test_cassandra_restbase_T_simplejVX8RkRC4w_data" set "body" = ? where "key" = ? AND "tid" = ? AND "body" = ?
Should be.

You can use the latter query, as it is roughly a literal translation of the CQL above. However, note that this means that an additional index should be present, encompassing key, tid and body. This is not mandatory, but it greatly improves performance.

Cassandra seems to lists them as lightweight transaction

In SQLite things are much simpler as it's not a distributed system. Each update query is normally executed in its own transaction. IIRC at least with some drivers this happens implicitly, but if there are multiple queries (batch) all of them should be wrapped in a single transaction. This also improves performance.

Right. SQLite runs in autocommit mode by default. However, having a separate transaction for multiple queries whenever there is at least one UPDATE or INSERT is recommended (for consistency, but also performance).

I posted the link to the repo for the current work in my first comment here - https://phabricator.wikimedia.org/T88191#1016968

Here's the repo link - https://github.com/hardikj/restbase-mod-table-sqlite
Do you guys want me to move this to some other place?

I think we should have a well-defined workflow. My suggestion is to:

fork the repo into the wikimedia space
identify next mini-tasks and open PRs for them in the wikimedia repo so as to make it easier to follow progress and look over new code

What do you guys think? Makes sense?

Yup, that makes sense to me.

For longer-running collaborative work outside of master I am normally in favor of using a remote branch in the wikimedia repository. We can then work with smaller PRs against that branch instead of one huge PR at the end.

In T88191#1025137, @GWicke wrote:

For longer-running collaborative work outside of master I am normally in favor of using a remote branch in the wikimedia repository. We can then work with smaller PRs against that branch instead of one huge PR at the end.

Yup. That was my general idea. Ok, I'm forking the sqlite mod repo into wikimedia.

OK, forked it to wikimedia . @GWicke you will have to connect it to travis, though. It seems I cannot add it there from my travis account.

I have deleted wikimedia/restbase-mod-table-sqlite. @Hardikj please transfer your repo to me or @GWicke so we can put it under wikimedia (go to settings > transfer in your repo).

Aklapper added a parent task: T22257: [DO NOT USE] SQLite support (tracking).Feb 19 2015, 11:54 AM

• GWicke reassigned this task from Hardikj to Eevans.Jun 20 2015, 12:17 AM

• GWicke merged a task: T88190: Implement alternative storage modules, for example SQLite3.

Eevans mentioned this in T103406: move restbase-mod-table-cassandra tests and spec into a separate module.Jun 22 2015, 7:01 PM

Eevans added a subtask: T103406: move restbase-mod-table-cassandra tests and spec into a separate module.Jun 22 2015, 7:35 PM

• Pchelolo closed subtask T103406: move restbase-mod-table-cassandra tests and spec into a separate module as Resolved.Jul 20 2015, 4:49 PM

Eevans reassigned this task from Eevans to • Pchelolo.Jul 20 2015, 4:50 PM

• Pchelolo mentioned this in T89888: Support 'static' columns in SQLite3 module back-end for RESTBase.Jul 20 2015, 4:51 PM

• Pchelolo closed subtask T89888: Support 'static' columns in SQLite3 module back-end for RESTBase as Resolved.

• Pchelolo added a parent task: T107162: Enable restbase testing with SQLite backed.Jul 28 2015, 4:23 PM

• mobrovac mentioned this in T107447: Switch to SQLite as the RESTBase back-end in MW-Vagrant.Jul 30 2015, 5:48 PM

• mobrovac added a parent task: T107447: Switch to SQLite as the RESTBase back-end in MW-Vagrant.

The basic version has been implemented: https://github.com/wikimedia/restbase-mod-table-sqlite
Any following bugs/optimisations will be tracked under separate tickets.

• Pchelolo closed this task as Resolved.Jul 30 2015, 5:49 PM

tstarling added a project: SQLite.May 19 2017, 8:20 AM

tstarling removed a parent task: T22257: [DO NOT USE] SQLite support (tracking).May 19 2017, 8:28 AM

SQLite3 storage module back-end for RESTBaseClosed, ResolvedPublicActions

Description

Related ObjectsSearch...

Event Timeline

SQLite3 storage module back-end for RESTBase
Closed, ResolvedPublic
Actions

Related Objects
Search...