Page MenuHomePhabricator

Solve database schema (SQL/NoSQL) related issues
Closed, ResolvedPublic

Description

Currently stuck with "No support for ALTER of constraints in SQLite dialect")
NotImplementedError: No support for ALTER of constraints in SQLite dialect error with SQLite in SQLAlchemy. Discussing plans to switch over to either SQL or NoSQL.

http://stackoverflow.com/a/2581859
http://stackoverflow.com/questions/2559411/sql-mysql-vs-nosql-couchdb

Current schema is
CREATE TABLE roles
(
id INT,
name VARCHAR (64) UNIQUE,
defaultpermission BOOLEAN DEFAULT False,
permissions INT,
PRIMARY KEY (id)
);

CREATE TABLE reviewers
(
id INT,
email VARCHAR(64) UNIQUE,
username VARCHAR(64) UNIQUE,
password_hash VARCHAR(128),
role_id INT,
confirmed BOOLEAN,
agreement FLOAT,
reputation FLOAT,
PRIMARY KEY (id),
FOREIGN KEY (role_id) REFERENCES roles(id)
);

Event Timeline

Could we not move from SQLite to MySQL? The error is only due to SQLite's limitation, which will not move over to MySQL.

Absolutely, @prnk28, I am completely convinced that vlermv (please see
http://pythonhosted.org/vlermv/ and https://thomaslevine.com/!/vlermv/
) will:

(1) vastly simplify your code;
(2) make both general database administration and DBA scripts (such as
for archiving old records) much easier;
(3) work across any platform;
(4) not require SQLAlchemy or that other tool for changing schemas as
you add features;
(5) integrate perfectly with Python objects, so you can easily move to
other databases in production;
(6) scale to hundreds of millions of records and tens of thousands of
concurrent users;
(7) require far less memory and CPU overhead than any other SQL and
almost all other noSQL solutons; and
(8) let you prototype and experiment with code changes much faster.

If you have any reason to doubt any of these, please let me know ASAP.
I am currently writing a very small flask app with vlermv.

There is a disadvantage, though: "I chose the name 'vlermv' by banging
on the keyboard; this is how I have been naming things now that I have
discovered Dada." -- http://pythonhosted.org/vlermv/history/

Lol.

Are the links provided the only place to learn more about this database? I searched the web but couldn't find any other sites for reference, including stackoverflow. Although the db might be good enough for all the points that you listed, I'm afraid there isn't an active developer community built around it. Where do I look to if I run into errors? And what is the guarantee that this db will have support and maintenance in the future? Or is there good enough community support that I'm not aware of? Experimenting with a software that is at its nascent stage for something as big as a database might be a little too risky. Your thoughts?

Since it is open source, you can debug from the source, but the simplicity
makes it very robust. The author wrote me a lengthy email. Let me forward
your questions to him and ask his thoughts....

The author coincidentally has a Wikimedia account, so I'll post all of my answers directly.

First email

Calling it a "database" might have been a bad choice on my part because
people get disappointed and sometimes offended by the simplicity.
On the other hand, that is exactly the point; complex databases are so
versatile that people often use them when much simpler alternatives
would suffice.

Anyway, I now consider vlermv a "thing that lets you pretend that your
filesystem is a dictionary", or a "dict-filesystem mapper".

Are the links provided the only place to learn more about this database?
I searched the web but couldn't find any other sites for reference,
including stackoverflow.

Below is the only documentation that I know of.

The last of these provides simple examples of it in use. I can send more
examples if you think it would help.

Although the db might be good enough for all the points that
you listed, I'm afraid there isn't an active developer community built
around it.

I'm the only developer as far as I know; anyone else has extended it has
kept the extensions private.

Where do I look to if I run into errors?

You can email or call me.

And what is the guarantee that this db will have support and
maintenance in the future?

There is no guarantee; or, the guarantee comes about because vlermv is
so simple. The whole point of vlermv, and much of my stuff, is that you
should be able to figure out how your program worked once you have lost
the program, the documentation, and the source code and are left only
with the user data.

Here are some very bad notes on this ideal and a related mindset.
https://thomaslevine.com/!/how-i-write-data-analysis-software/#legibility

Vlermv helps you name your files meaningfully, and it stores your data
as files with whatever serialization you specify. The default
serialization is pickle, which is included in standard Python.
If you are working with data that can be serialized legibly in a text
format, you can change this to a more obvious format.

I suspect that part of your project will involve downloading stuff from
Wikipedia and doing substantial processing on them. For that I would
probably store only the HTTP responses, and I would store them as
pickles. If you would prefer a text format, I can add a feature that
I have been meaning to add for years: Serializing and deserializing
python-requests objects as JSON. Testing libraries already have this
feature, but I, amusingly, would mostly have to do a lot of testing to
make sure that it doesn't throw away any important information.

If you tell me more about how you are thinking about using vlermv,
I can give you more specific advice on how to develop a legible
filesystem that will be easier to maintain and support without knowledge
of how vlermv works.

Here is a link to a description of your project, for my memory.
https://phabricator.wikimedia.org/T89416

I should warn you that your concern for long-term support of software,
and Vlermv's design for simplicity and legibility, can be a major
problem in many situations. The purpose of much software development,
especially in business, is to maintain the illusion that one is engaged
in challenging and important work that one is uniquely qualified for;
and to inflict pain on oneself so that one may feel worthy of any good
things that happen to oneself. Most software is never used, so it often
doesn't even matter whether the software works; and of the software that
is used, most of it is completely unnecessary because better things
already existed. The tech industry is fundamentally the ecosystem
surrounding the idea that the world has big problems and that computers
will save us; success in a career is about indulging this faith and
ignoring the long-term consequences. Robust software thus has very
little place in business, so if you are considering a career in tech,
I recommend that you suspend your appreciation for maintainability,
voicing it only at politically opportune moments.
https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD896.html
https://en.wikipedia.org/wiki/Protestant_work_ethic

Or is there good enough community support that I'm not aware of?
Experimenting with a software that is at its nascent stage for
something as big as a database might be a little too risky. Your
thoughts?

Another option is to write your own, even simpler version of vlermv that
has only the features you need. This might in fact be the most common
way of using vlermv. Hmm I suppose I could set up an email list around
this topic. These two documentation sections might be helpful.
https://docs.python.org/3.5/library/collections.abc.html#collections.abc.MutableMapping
https://thomaslevine.com/!/vlermv/#parametrized-decorators

Some more notes, since you propose to develop a dynamic website and
I don't really know what you're considering doing with vlermv

If you use vlermv for things like user credentials and recording of
reviews, you should consider the database guarantees. This consideration
can often be skipped for relational databases because their guarantees
are so strong.
https://pythonhosted.org/vlermv/acid/

Following my preference towards things that work after I have lost the
source code and am left only with the data, I would record stuff like
this in a log that you can replay. You could do that in vlermv by
putting the datetime in the filename, but it could be slow to replay
if you wind up with lots of files. I have used LevelDB for this, as you
might be able to decipher in this package, but I have had trouble with
LevelDBs getting corrupted, so I try to avoid this.
http://dada.pink/scott2/levelhistory/

And now I am thinking, "Maybe I want to add a file-based log thingy to
vlermv".

Feel free to send me more questions. I am quite happy to answer
questions privately over email because I eventually work my answers into
the public-facing documentation.

Second email

Oh, now I see this.
https://phabricator.wikimedia.org/T137739

If you are only using the database for user stuff, you might be consider
a library focused specifically on that.

I have used bottle-beaker, and that with bottle-cork might be
appropriate for you. Both of these libraries abstract the data storage,
so you won't have to concern yourself with the schema if these already
work for you.
http://bottlepy.org/docs/dev/plugins/index.html

Of course, you are not using bottle, so you would have to find something
equivalent that works with flask or general WSGI.

Here is an example of using beaker with WebOb. You should be able to do
something similar with Flask, but I would look for a Flask-specific
beaker library first.
http://dada.pink/scott2/scott/server/session.py
http://dada.pink/scott2/scott/server/main.py

More comments
I conversed with once with John Vandenberg about vlermv and a related library of mine that preceded it. It seems that he too is involved in GSoC wiki bots.

Vlermv is better than shelve, too, because it supports concurrency
automatically without having to call sync()

Completely shifted over to file based data management.