Wednesday, December 28, 2005

LooSeQL

I've done quite a bit of work with SQL and there are two things that always occur to me when I'm using it: 1) it's really quite simple in terms of what each command does and how it does it and 2) it is really ugly; it is syntactically inelegant, inconsistent and inefficient. So what should I do? Yes, re-create it or at least a workable sub-set of it!

Lisp should be good for making a working SQL-alike. I can use bottom up design to abstract myself from list access to SQL's set like operators and I can get some practice using macros to replicate SQL's 3i syntax. And it'll give me something to do over the holidays, of course I'm currently unemployed so every day is a holiday! ;(

Layer 1: Basic database structure.
What does a database, table, column, row look like? How can I get tables from a database, columns or rows from a table, fields from a row etc. Here's what I decided on (in my own unique BNF that may or may not bear any resemblance to real BNF)
database -> (database (table*))
table -> (table (column*) (row*))
column -> (column column-name column-number)
row -> (row (datum*))

At this stage I can create items and get at their contents but other than that they are in the same table there is no relation between rows and columns.

Layer 2a: Adding/Removing items.
This layer just adds/removes tables to databases and columns and rows to tables.
It also maintains some basic data consistency like database, table and column name uniqueness and that rows have the correct number of elements for the tables they are added to.
I'm feeling pretty good about this because it allows me to do something that normal database systems generally can't: add and remove columns from tables in a live database while maintaining data consistency.

Layer 2b: Getting and setting data.
Using only table column and row objects (or their unique names) I can get and set at each data item uniquely.

Layer 3: Proto-SQL.
This is where things start to get interesting. I decided to make the SELECT statement and the row eliminator (WHERE clause functionality) return a temporary table. This seems like it could be pretty horrible for performance but it makes my life very easy because all the other operators can accept a temporary table just as easily as one that's connected to a database.
This layer also has the first macro the so called parse-where and my first ever use of eval-when for its sidekick expand-where.
This is also where the first major limitation occurs, consider:
WHERE (string= "COLUMN1" "VALUE1")
In normal SQL column names are double quoted and strings are single quoted. You can't do this (easily) in lisp because of #'quote/' so no database value can have the same value as a database, table or column name. I do plan to get around this using symbols at some point but not right now.

Layer 4: Real SQL-like.
This part is all macros that re-arrange, de-reference and error check the 4 implemented statements: SELECT, INSERT, UPDATE and DELETE-FROM (delete's already taken).
There are some real limitations: no order by, no joins, no sub-selects and some departures from SQL that I think make it better: where clauses are lisp expressions and the set clause for an update statement is a list of column-names and new values.


(in-package :looseql)
(defun test ()
(setf *databases* nil)
(let* ((database (make-database "DB1")))
(add-table (make-table "TABLE1") database)
(add-column (make-column "COLUMN0") (get-table-by-name database "TABLE1"))
(add-column (make-column "COLUMN1") (get-table-by-name database "TABLE1"))
(add-column (make-column "COLUMN2") (get-table-by-name database "TABLE1"))
(add-database database)
(setq *current-db* database)

(print 'inserts)
(insert INTO "TABLE1" VALUES ("INS00" "INS01" "INS02"))
(insert INTO "TABLE1" VALUES ("INS10" "INS11" "INS12"))
(insert INTO "TABLE1" VALUES ("INS20" "INS21" "INS22"))
(insert INTO "TABLE1" VALUES ("INS30" "INS31" "INS32"))
(insert INTO "TABLE1" VALUES ("INS40" "INS41" "INS42"))
(insert INTO "TABLE1" VALUES ("INS50" "INS51" "INS52"))


(print 'inserts-into-columns)
(insert INTO "TABLE1"("COLUMN0" "COLUMN1") VALUES ("INS0" "INS1"))
(insert INTO "TABLE1"("COLUMN1" "COLUMN2") VALUES ("INS1" "INS2"))
(insert INTO "TABLE1"("COLUMN0" "COLUMN2") VALUES ("INS0" "INS2"))

(print 'databases-after-inserts)
(print *databases*)

(print 'select-without-where)
(print (select ("COLUMN1" "COLUMN2") from "TABLE1"))

(print 'select-with-where)
(print (select ("COLUMN1" "COLUMN2") from "TABLE1" where
(or (string= "COLUMN0" "INS0")
(null "COLUMN1"))))

(print 'select-*)
(print (select * from "TABLE1"))

(print 'select-into)
(let (var)
(select ("COLUMN0") INTO (var) FROM "TABLE1"
WHERE (string= "COLUMN0" "INS50"))
(print var))

(print 'update)
(update "TABLE1" SET (("COLUMN1" "UPD1")
("COLUMN2" "UPD2"))
WHERE (not (or (string= "COLUMN0" "INS0")
(string= "COLUMN0" "INS00")
(string= "COLUMN0" "INS10")
(string= "COLUMN0" "INS20"))))

(print *databases*)

(print 'delete-from-with-where)
(delete-from "TABLE1" WHERE (null "COLUMN0"))

(print *databases*)

(print 'delete-from)
(delete-from "TABLE1")
;;...
))

I think that for three days work its not too shabby.
I also think that the name's pretty cool LooSeQL: Lisp SQL, loose because it's not only loosely SQL and also its loosely typed and finally I hear SQL refered to as 'sequel' a lot, I think 'squil' would be more accurate, so SeQL is better because we can all agree that it sounds like 'sequel'. It would be better if I used CLOS then I could say that it was Lisp Object Oriented SeQL, maybe next version.

Get the full source here.

5 comments:

Anonymous said...

dead sexy

Faré said...

You might be interested in AP5, and maybe combining it with AllegroCache, BKNR, Elephant, etc., etc.

Anonymous said...

Main problem with AP5 is that it doesn't seem to be open source, unfortunately - it has no licence I could find, so under current (rather idiotic/evil, but hey) law it defaults to source-available-closed, as copyright is currently automatic.

Anonymous said...

You might have an easier job implementing relational operators (which are functional) in Lisp for SELECTs at least. All the base relvars can be retrieved from the database:
(setf my-table (get-table (get-database dnname) tablename))

then all the relational operators are functions:
(restrict (project my-table "column1" "column2") "key" 2)
This would evaluate to a tuple (which could be a list, for added lispy-ness).

As you probably know, in relational algebra, INSERTs, DELETEs and UPDATEs are (roughly) A = A U r, A = A - r, and A = A - r + r'.
These will be a b*tch ("interesting problem" which has probably been worked on extensively) to 'compile' to SQL.

Fundamentally functional programming matches nicely with relational theory, I think. What do you think?

charlieb said...

Wow, anon3 you obviously know a lot more about database theory than I. I just have like 5 years using SQL every day and very little in the way of academic theory. I've often thought, as I said, that SQL wouldn't be too difficult to replicate and I was right, at least from a nieve, non-compiling, not caring about performance perspective.

I'm really not sure what you're trying to tell me. Could you explain further in layman's terms or point me at some resources?
I am learning prolog from the rather excellent "The Art of Prolog" which seems to touch on some of these ideas in it's database programming chapter but clearly it's more logic than database oriented.

Thanks for the interest, maybe I just need to formalise my knowledge a bit before I can be involved in a proper discussion of databases and set theory.