How to make a DB? [Draft]
It will built in C as most the db in the world, and using b-tree or b+ tree to store. the keypoint here are:
- db store index and data seperately on filesystem
- then ISAM GET/PUT to manipulate a record’s field
- SQL statement will translate to a set of GET/PUT
- a log of statment for transaction
Let first simply the question by store all data is a file. Assume a record takes 100 byte (which is defined by scheme type)
FirstName(30) | LastName(20) | Address(50) |
---|---|---|
foo1 | bar1 | foobar1 |
foo2 | bar2 | foobar2 |
foo3 | bar3 | foobar3 |
Q: How can I get the second record of firstname A: easy , lseek(file, sizeof(firstname),100)
Q: How can I get record of “bar1”? A: sequential search? ofc not. that is how INDEX comes in. If I already I have the POS of the record, then fetching it will be no brain.
so let create a index on “LastName”, so search the record ==> search the index in b-tree, and lseek the data file
bar2
130
/ \
bar1 bar3
30 230
then a table will have a data file and many index files(I am going to index every column)
Read
- after have stored data and index files, using a ISAM http://en.wikipedia.org/wiki/ISAM, then I could GET/PUT
- then I need a SQL parser that translate SQL statement into a sequence of GETs.
- If two table is joined, then a optimizer is need to decide which table to read first.
Write
- Insert parse into PUT statement
- multiple INSERTs => transacation manager to make sure atomic
- then a log of command if recored.
- backup/restore….replicaion tools to read transcation log and replicate the transcation
- it will need some connection manager the handle connections, authenticate, then run the request
- commerical db direct write to raw disk because of the cacheing issue.
B-tree
- one node can have multi value to reduce the layer#,
- the size of one node depends on page size.
- parent size is a, then it got a+1 children
Distributed
If all data is in one machine, the problem of network partition didn’t show up. But if now we are can’t handle the request, what should I do?
- bought a power machine (do you smell expensive?)
- distribute database.
If I distribute data in multi database.
- first we have a Consistency issue, read after write may have wrong result
- => lock down and sync
- then we have Availability issue, if all db are waiting for sync
- =>message offline node about updates or hire a run around boy
- then what if the db network is partitioned?
- if I block read before sync is secured, I sacrificed A to C
- if I let read happen before sync is secured, I sacrificed C to A
- conflict? vector-clock or app-specifi logic
REF:
- http://www.sqlite.org/arch.html
- http://sqlite.org/atomiccommit.html
- http://www.sqlite.org/fileformat.html