ntoll.org


FluidDB is not CouchDB (and FluidDB's secret sauce)

Friday 05 February 2010 (10:11PM)


My last post resulted in the following question from John Erickson.

"I've only just discovered FluidDB and the reoccurring question is, 'how is FluidDB different/better than CouchDB??'"

I'll try to answer as best I can.

The usual caveat applies when using a loaded word like "better": it depends on what you want/need. As I can't speak for anyone else, I'll simply compare and contrast the two.

Let's concentrate on the similarities first… both databases are flat collections of items ("documents" in CouchDB, "objects" in FluidDB) with a structure based upon the classic subject/predicate/value triple:

CouchDB: Documents have fields with values.
FluidDB: Objects are tagged (often with values).

Documents <=> Objects (both are identified by a UUID),
Fields <=> Tags (both are dynamically typed),
Values <=> Values

A simple difference in terminology.

(Actually, in FluidDB an object/tag pair does not need to have an associated value. Simply associating a tag with an object can provide enough information. For example, the object that represents the book Dune might have the "ntoll/has_read" tag associated with it. As I am the only person within FluidDB with permission to associate this tag with objects, you can infer from the name of the tag that I once read the referenced book. I'm getting ahead of myself here, but it's important to point out that a tag-value is optional in FluidDB.)
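To make the optional tag-value concrete, here is a minimal in-memory sketch in Python. It is not FluidDB's API: `ToyObject`, `NO_VALUE` and the `tag`/`has` methods are hypothetical names invented for illustration, and the only things taken from the text are that objects are identified by a UUID and that a tag may be attached with or without a value.

```python
import uuid

# Sentinel meaning "the tag is present but carries no value".
# This is a convention for the sketch only, not FluidDB's representation.
NO_VALUE = object()


class ToyObject:
    """An item identified by a UUID whose tags may or may not have values."""

    def __init__(self):
        self.id = str(uuid.uuid4())
        self.tags = {}  # tag path -> value, or NO_VALUE

    def tag(self, path, value=NO_VALUE):
        """Attach a tag, optionally with a value."""
        self.tags[path] = value

    def has(self, path):
        """Report whether the tag is attached, regardless of any value."""
        return path in self.tags


dune = ToyObject()
dune.tag("ntoll/has_read")   # no value: the tag's presence is the information
dune.tag("sally/rating", 5)  # a tag with a value

print(dune.has("ntoll/has_read"))
print(dune.tags["ntoll/has_read"] is NO_VALUE)
```

A sentinel is used rather than `None` so that a tag with no value stays distinguishable from a tag whose value happens to be empty.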

The other obvious similarity is that both DBs have a RESTful API (hence the potential for CouchApps and Flow – web-applications as data hosted within their own database).

Now for the differences in no particular order:

  1. There is only one FluidDB floating "up there" in the cloud and it holds everyone's data. With CouchDB, companies and individuals are responsible for hosting their own instance(s), and such instances are usually created for a specific purpose/application.
  2. The predicate part of the triple is very different. Fields in CouchDB are simply names of values. In FluidDB tags can be organised with namespaces. You start with an empty root namespace named after your username and create tags and namespaces underneath this. Just to be clear here, you can't associate an object with a tag that isn't yet defined – and yes, tags and namespaces are also objects in FluidDB so you can meta-tag. ;-)
  3. Querying data is very different. CouchDB's "Views" are pre-defined map/reduce based algorithms. FluidDB provides an uber-minimalist (yet still evolving) SQL-like declarative language.
  4. Security and permissions in CouchDB are document-centric: they define who can read data and what validation steps a user needs to pass before they can write data to a document (CouchDB also has the concept of an admin account). The "model of control" (as Terry Jones [creator of FluidDB] calls it) is FluidDB's killer feature: permissions apply only to tags, namespaces and tag-values. All objects are public and can be tagged by anyone.
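Point two can be sketched in a few lines of Python. This is a toy model under stated assumptions, not FluidDB's implementation: `ToyNamespace`, `is_defined` and `tag_object` are hypothetical names, and the "books" sub-namespace is made up. It shows only the two rules described above: tags live in namespaces rooted at a username, and a tag must be defined before it can be attached to an object.

```python
class ToyNamespace:
    """A namespace holding tag names and child namespaces (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.tags = set()
        self.children = {}

    def create_namespace(self, name):
        """Create (or fetch) a child namespace."""
        return self.children.setdefault(name, ToyNamespace(name))

    def create_tag(self, name):
        """Define a tag directly under this namespace."""
        self.tags.add(name)


def is_defined(roots, path):
    """Return True if a path such as 'ntoll/books/rating' names a defined tag."""
    *namespaces, tag = path.split("/")
    if not namespaces or namespaces[0] not in roots:
        return False
    node = roots[namespaces[0]]
    for part in namespaces[1:]:
        if part not in node.children:
            return False
        node = node.children[part]
    return tag in node.tags


def tag_object(roots, obj_tags, path, value=None):
    """Attach a tag to an object, refusing tags that have not been defined."""
    if not is_defined(roots, path):
        raise KeyError(f"tag {path!r} is not defined")
    obj_tags[path] = value


# Each user starts with an empty root namespace named after their username.
roots = {"ntoll": ToyNamespace("ntoll")}
roots["ntoll"].create_tag("has_read")
roots["ntoll"].create_namespace("books").create_tag("rating")

dune = {}  # tag path -> value
tag_object(roots, dune, "ntoll/has_read")
tag_object(roots, dune, "ntoll/books/rating", 5)
print(dune)
```

Calling `tag_object(roots, dune, "ntoll/never_created")` raises `KeyError`, mirroring the rule that an undefined tag cannot be used.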

The implications of point four are not obvious at first, but it is definitely this feature that sets FluidDB apart from CouchDB and every other database that I know of. Because of this, I'll spend the rest of this post illustrating why the model of control is so important.

Because all objects can be tagged by anyone, lots of interesting information and behaviour becomes possible and begins to emerge.


Remember the "Dune" example? Because only I have permission to associate the "ntoll/has_read" tag with an object, you can be pretty certain that any object carrying this tag was tagged by me, and so represents something I say I have read. I can also control who sees my tags and their values, and even give other trusted users the ability to add, edit, and delete namespaces, tags and tag-values (in a similar way to setting permissions on a filesystem). This is important because it allows people to collaborate: users who are food enthusiasts could all agree to use the "foodies/rating" and "foodies/review" tags on objects representing restaurants. Only those users enthusiastic/trusted enough would be allowed into the group with permission to associate such tags. Furthermore, as "insiders" they might also have created a tag called "foodies/discount" that is only visible to them and, when attached to an object representing a restaurant, explains how to get a discount.
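The foodies example can be sketched as follows. This is a much simplified toy, not FluidDB's permission system: `ToyTagPermissions`, `tag_object`, `visible_tags` and the users alice, bob and carol are all hypothetical. The point it illustrates is that the checks hang off the tag, while the restaurant object itself has no owner and no access list.

```python
class ToyTagPermissions:
    """Who may attach a tag and who may see it (hypothetical, much simplified)."""

    def __init__(self, writers, readers=None):
        self.writers = set(writers)
        # readers=None means the tag is readable by everyone.
        self.readers = None if readers is None else set(readers)

    def can_write(self, user):
        return user in self.writers

    def can_read(self, user):
        return self.readers is None or user in self.readers


foodies = {"alice", "bob"}  # hypothetical members of the group
permissions = {
    "foodies/rating": ToyTagPermissions(writers=foodies),
    "foodies/discount": ToyTagPermissions(writers=foodies, readers=foodies),
}

restaurant = {}  # tag path -> value; the object itself is public


def tag_object(user, obj, path, value):
    """Attach a tag-value if the tag's permissions allow this user to write."""
    if not permissions[path].can_write(user):
        raise PermissionError(f"{user} may not use {path}")
    obj[path] = value


def visible_tags(user, obj):
    """Return only the tags on the object that this user is allowed to read."""
    return {p: v for p, v in obj.items() if permissions[p].can_read(user)}


tag_object("alice", restaurant, "foodies/rating", 4)
tag_object("bob", restaurant, "foodies/discount", "Mention the group at the till")

print(visible_tags("alice", restaurant))  # an insider sees both tags
print(visible_tags("carol", restaurant))  # an outsider sees only the rating
```

An outsider such as carol can still tag the same restaurant object with tags from her own namespace; she is only barred from the foodies tags.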

Even though some objects have an optional "fluiddb/about" tag (holding a unique value and set by the object's creator to provide some guidance as to what the object is supposed to represent), the only way to find out what an object really represents is to look at what tags and values are associated with it. The tags and associated values cast a sort of "data-shadow" that outlines the object's referent.

For example, the object with the fluiddb/about tag-value "book:DUNE" might have the following subset of tags/values associated with it:


{
fluiddb/about: "book:DUNE",
ntoll/has_read,
tim_oreilly/has_read,
amazon.com/type: "Paperback Book",
amazon.com/title: "Dune",
amazon.com/author: "Frank Herbert",
amazon.com/isbn: "ISBN123456789",
amazon.com/price: "$9.99",
amazon.com/genre: "Sci-Fi",
amazon.com/cover: (BINARY DATA FOR A PNG FILE),
tom/comment: "I really like the Sandworms",
dick/opinion: "Far fetched and obtuse",
sally/rating: 5,
books/other_editions: ['UUID for object x', 'UUID for object y', 'UUID for object z'],
books/isbn: "ISBN 123456789",
books/title: "Dune",
books/author: ['UUID of object representing Frank Herbert'],
books/publisher: "Chilton Books",
books/first_published: "1965"
...
}

Now, consider the following:

Another side-effect of the FluidDB model of control is that completely unrelated applications will be able to share data. This is already happening: Terry has written an application called Tickery that has imported lots of information from Twitter under the "twitter.com" namespace in FluidDB. Because this data is open for everyone to read, I can make use of it from within Flow and carry out exactly the same searches as Tickery does (e.g. "has twitter.com/friends/jack and has twitter.com/friends/ev").
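What a search such as "has twitter.com/friends/jack and has twitter.com/friends/ev" asks for can be shown with a small in-memory sketch. It does not parse or run FluidDB's query language: `has`, `both` and `search` are hypothetical helpers, and the object identifiers and their tags are made-up sample data.

```python
def has(tag):
    """Build a predicate that is true for objects carrying the given tag."""
    return lambda tags: tag in tags


def both(left, right):
    """Combine two predicates so that both must hold (the query's 'and')."""
    return lambda tags: left(tags) and right(tags)


def search(objects, predicate):
    """Return the ids of all objects whose tag set satisfies the predicate."""
    return sorted(oid for oid, tags in objects.items() if predicate(tags))


# Made-up sample data: object id -> set of tag paths attached to it.
objects = {
    "object-1": {"twitter.com/friends/jack", "twitter.com/friends/ev"},
    "object-2": {"twitter.com/friends/jack"},
    "object-3": {"twitter.com/friends/ev", "ntoll/has_read"},
}

query = both(has("twitter.com/friends/jack"), has("twitter.com/friends/ev"))
print(search(objects, query))  # ['object-1']
```

Because the query only mentions tag paths, any application that can read those tags can run it, whichever application wrote them.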

Suddenly the potential for mashing up data becomes huge and very interesting – especially as anyone can add further data to the objects Tickery and Flow have tagged.

In conclusion, we're all familiar with social networks. FluidDB is simply a social database "in the cloud" with its model of control as the secret magic sauce.