How to make a DB? [Draft]

It will built in C as most the db in the world, and using b-tree or b+ tree to store. the keypoint here are:

  • db store index and data seperately on filesystem
  • then ISAM GET/PUT to manipulate a record’s field
  • SQL statement will translate to a set of GET/PUT
  • a log of statment for transaction

Let first simply the question by store all data is a file. Assume a record takes 100 byte (which is defined by scheme type)

FirstName(30) LastName(20) Address(50)
foo1 bar1 foobar1
foo2 bar2 foobar2
foo3 bar3 foobar3

Q: How can I get the second record of firstname A: easy , lseek(file, sizeof(firstname),100)

Q: How can I get record of “bar1”? A: sequential search? ofc not. that is how INDEX comes in. If I already I have the POS of the record, then fetching it will be no brain.

so let create a index on “LastName”, so search the record ==> search the index in b-tree, and lseek the data file

                     bar2
                      130
                   /     \                   
                 bar1    bar3
                 30       230

then a table will have a data file and many index files(I am going to index every column)

Read

  • after have stored data and index files, using a ISAM http://en.wikipedia.org/wiki/ISAM, then I could GET/PUT
  • then I need a SQL parser that translate SQL statement into a sequence of GETs.
  • If two table is joined, then a optimizer is need to decide which table to read first.

Write

  • Insert parse into PUT statement
  • multiple INSERTs => transacation manager to make sure atomic
  • then a log of command if recored.
  • backup/restore….replicaion tools to read transcation log and replicate the transcation
  • it will need some connection manager the handle connections, authenticate, then run the request
  • commerical db direct write to raw disk because of the cacheing issue.

B-tree

  • one node can have multi value to reduce the layer#,
  • the size of one node depends on page size.
  • parent size is a, then it got a+1 children

Distributed

If all data is in one machine, the problem of network partition didn’t show up. But if now we are can’t handle the request, what should I do?

  1. bought a power machine (do you smell expensive?)
  2. distribute database.

If I distribute data in multi database.

  1. first we have a Consistency issue, read after write may have wrong result
  2. => lock down and sync
  3. then we have Availability issue, if all db are waiting for sync
  4. =>message offline node about updates or hire a run around boy
  5. then what if the db network is partitioned?
    • if I block read before sync is secured, I sacrificed A to C
    • if I let read happen before sync is secured, I sacrificed C to A
      • conflict? vector-clock or app-specifi logic

REF:

  • http://www.sqlite.org/arch.html
  • http://sqlite.org/atomiccommit.html
  • http://www.sqlite.org/fileformat.html
Written on February 1, 2015