
Drogulus - Questions and Clarifications

Last weekend I gave a very short (15 minute) talk on the drogulus: a programmable peer-to-peer data store that I've been working on in my own time. Pretty much all my answers in the short Q/A that followed were a variation of "I don't know". I consider this a success since it revealed avenues for future investigation that others have proposed for the drogulus (one of the purposes of giving the talk).

I prefer to say "I don't know", then think about the problem and write a considered response. I've had a couple of days to ponder the questions and comments from the Q/A (and later discussions in the bar), and in this post I'll attempt to answer, clarify or admit that, even upon reflection, I still don't know.

I'll assume you've read the blog post of the talk before reading the following.

What about flooding the network with Logos jobs? (or) How do you solve the problem of the tragedy of the commons?

Upon reflection I think this is a solvable problem. Put simply, there must be costs for misbehaviour and rewards for collaboration. I think a mechanism that uses both carrot and stick could be a solution (note that I didn't say it is the answer). My guess is that the specifics of a solution will become clearer if/when the drogulus gets used.

The drogulus is a peer-to-peer system: by virtue of the way the system works, peers have evidence of how the others behave. The ultimate punishment is to ostracise misbehaving nodes from the network, cutting them off from data and the latent computing power within the drogulus.

Therefore, in order to run Logos jobs, nodes must have shown evidence that they are "good citizens" of the network. Furthermore, there is the threat that evidence of "bad" behaviour will result in punishment.

To some extent this feature already exists within the drogulus: every node maintains a simple data structure called a routing table, which is its means of keeping track of peers on the network. To get into another node's routing table you must have been in contact with that node and provided some useful information in a timely fashion. The number of available slots in the routing table is limited by a constant called K. Only the most reliable nodes get included in other nodes' routing tables, and those that do not maintain good performance are removed and quickly replaced.

In this way, the distributed hash table's nodes attempt to use the most reliable peers to maintain the system's performance. Furthermore, if a node is found to propagate a value that fails the cryptographic checks it is immediately removed from routing tables no matter how reliable its prior performance.
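To make the behaviour described above concrete, here is a minimal sketch of such a table. It is not the drogulus implementation: it uses a single flat table rather than whatever structure the real routing table has, and the value of K, the `max_failures` threshold and all the class and method names are assumptions made for illustration.

```python
from dataclasses import dataclass, field

K = 20  # assumed value; the post only says the limit is a constant called K


@dataclass
class Peer:
    node_id: str
    failures: int = 0  # consecutive missed or late replies


@dataclass
class RoutingTable:
    max_failures: int = 3  # hypothetical eviction threshold
    peers: dict = field(default_factory=dict)
    blocked: set = field(default_factory=set)

    def seen(self, node_id):
        """Record a timely, useful reply. Returns True if the peer is in the table."""
        if node_id in self.blocked:
            return False
        if node_id in self.peers:
            self.peers[node_id].failures = 0
            return True
        if len(self.peers) < K:
            self.peers[node_id] = Peer(node_id)
            return True
        return False  # table is full; the peer has to wait for a free slot

    def failed(self, node_id):
        """Record a missed or late reply and evict the peer after too many."""
        peer = self.peers.get(node_id)
        if peer is None:
            return
        peer.failures += 1
        if peer.failures >= self.max_failures:
            del self.peers[node_id]

    def bad_signature(self, node_id):
        """Evict and block immediately, whatever the peer's prior record."""
        self.peers.pop(node_id, None)
        self.blocked.add(node_id)
```

The point of the sketch is the asymmetry: poor performance costs a peer its slot gradually, whereas a failed cryptographic check costs it the slot at once.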

Something similar could be achieved for running Logos scripts. For example, peers may only run Logos jobs from remote nodes that have already run a Logos job for them (you scratch my back, I'll scratch yours) or from nodes that have existed within the routing table and fulfilled a certain number of successful interactions with the local node.
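The two conditions in that example could be sketched as a small admission policy. This is speculation in code form rather than anything that exists in the drogulus: `JobPolicy`, its method names and the `MIN_INTERACTIONS` threshold are all hypothetical.

```python
from collections import Counter

MIN_INTERACTIONS = 10  # hypothetical threshold, not taken from the drogulus


class JobPolicy:
    """Decides whether the local node will run a Logos job for a remote peer."""

    def __init__(self):
        self.jobs_run_for_us = Counter()          # jobs each peer has run for us
        self.successful_interactions = Counter()  # useful, timely replies per peer

    def record_job_run_for_us(self, node_id):
        self.jobs_run_for_us[node_id] += 1

    def record_interaction(self, node_id):
        self.successful_interactions[node_id] += 1

    def will_run_job_from(self, node_id):
        # "You scratch my back, I'll scratch yours."
        if self.jobs_run_for_us[node_id] > 0:
            return True
        # Otherwise the peer must have built up a record of good behaviour.
        return self.successful_interactions[node_id] >= MIN_INTERACTIONS
```

The second condition matters because pure reciprocity has a bootstrapping problem: if nobody runs a job until someone has run one for them, nobody ever starts.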

My aim is simply to think up a mechanism by which it costs nothing to be a good citizen yet is fatally expensive to be disruptive.

What happens if a third party attempts to block by IP address?

As I mentioned in the talk, areas of the key space are covered by many different nodes. The IP address of a node has nothing to do with the key space it covers. I presume a third party would be attempting to block access to a key/value item stored in the drogulus rather than a specific machine. To block an area of the key space a third party would have to take down all nodes containing the target key/value item.

Unfortunately for such a third party, this is easier said than done because:

  • There is no central list tracking which nodes contain what values. There's a pretty good chance you won't discover them all at any single point in time.
  • Nodes are constantly joining and leaving the drogulus and replicating data between each other. The set of nodes containing a certain item is constantly changing. Πάντα ῥεῖ.
  • If nodes are blocked then the drogulus quickly routes around the failing nodes through the mechanism of the routing table (see above).

I imagine some of the properties of the drogulus are like a swarming flock of starlings: a dynamic system consisting of a multitude of independent parts that are constantly acting on and reacting to each other.

Swarm of starlings

What is the etymology of "drogulus" and "logos"?

A drogulus is an entity whose presence is unverifiable, because it has no physical effects. The atheist philosopher A.J.Ayer coined it as a way of ridiculing the belief system of his friend, the Jesuit philosopher, Frederick Copleston.

In 1949 Ayer and Copleston took part in a radio debate about the existence of God. The debate then went back and forth, until Ayer came up with the following as a way of illustrating the point that Copleston's metaphysics had no content because there was no way of testing the truth of metaphysical assertions. He said:

"I say, 'There's a "drogulus" over there,' and you say, 'What?' and I say, 'drogulus' and you say 'What's a drogulus?' Well, I say, 'I can't describe what a drogulus is, because it's not the sort of thing you can see or touch, it has no physical effects of any kind, but it's a disembodied being.' And you say, 'Well how am I to tell if it's there or it's not there?' and I say, 'There's no way of telling. Everything's just the same if it's there or it's not there. But the fact is it's there. There's a drogulus there standing just behind you, spiritually behind you.' Does that makes sense?"

Of course, the natural answer Ayer was waiting for was "No, of course it doesn't make sense." Therefore, the implication would be that metaphysics is like the "drogulus": a being which cannot be seen and has no perceptible effects. If Ayer can get to that point, he can claim that any kind of belief in the Christian God or in metaphysical principles in general is really contrary to our logical and scientific understanding of the world.

This appeals greatly to my sense of humour and I've always thought it'd be a fun name for a software project. Especially a project like this one. :-)

Portrait of A.J.Ayer
A.J.Ayer

Logos (λόγος) is a term used by the pre-Socratic philosopher, Heraclitus, to mean several different things: account, explanation, reason, organising principle, wisdom, nature or saying. It's the etymological root of the modern English word "logic". I do not use it in the biblical sense where it means the word of God (I refer readers to the explanation of "drogulus" above).

It seems to me an appropriate choice of name for a computer language.

It's a complex problem and you don't know what you're doing!

I won't contest that!

It's a fun personal project. If it is useful then people will use it. At the very least it's a helpful learning exercise for me (which is, in itself, a positive outcome).

However, a complex problem may not entail a complex solution. Rather, the solution only needs to work. Furthermore, while thinking about the drogulus I've attempted to work out the simplest possible solution to whatever abstract problem I've needed to solve.

As computing pioneer Tony Hoare explains,

"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."

I'm aiming for Hoare's first method of constructing software.

What's being sent down the wire?

Dictionary-like objects (that are themselves valid statements in Logos) are encoded using msgpack and sent as a netstring to remote nodes over TCP/IP.
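For anyone unfamiliar with netstrings, the framing is simply the payload's length in decimal, a colon, the payload itself and a trailing comma. The sketch below shows only that framing, using the standard library. The function names are my own for this illustration, and the payload is a placeholder standing in for the bytes msgpack would produce (for example from `msgpack.packb`).

```python
def encode_netstring(payload: bytes) -> bytes:
    """Frame payload as b"<length>:<payload>,"."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(data: bytes):
    """Return (payload, remaining bytes) for the first netstring in data."""
    length_part, sep, rest = data.partition(b":")
    if not sep or not length_part.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_part)
    if len(rest) < length + 1:
        raise ValueError("incomplete netstring")
    if rest[length:length + 1] != b",":
        raise ValueError("missing trailing comma")
    return rest[:length], rest[length + 1:]


# Placeholder payload; in the drogulus this would be msgpack-encoded data.
frame = encode_netstring(b"hello world!")
payload, remaining = decode_netstring(frame)
```

Because the length comes first, a receiver reading from a TCP stream knows exactly how many bytes to wait for before handing the payload to the decoder.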

How does the cryptographic signing work?

The relevant code can be found in the crypto.py module. I use the popular PyCrypto library for all cryptographic functionality.

Put simply, each item encompassing a key/value pair includes the following fields:

  • value - the value to store
  • timestamp - a UNIX timestamp representing when the item was created (so it's easy to discern the latest version of an item)
  • expires - a UNIX timestamp beyond which the item is expired and should be ignored
  • name - a meaningful name for the key
  • meta - a list of tuples containing key/value strings for user defined metadata about the item
  • public_key - the originator's public key
  • sig - a cryptographic signature created using the originator's private key with the value, timestamp, expires, name and meta values
  • key - the SHA-512 hash of the compound key (based upon the public_key and name fields) used as the actual key on the distributed hash table

The public_key field is used to validate the sig value. If this is OK then the compound key is checked using the public_key and name fields, which are by now known to be valid.

This ensures both the provenance of the data and that it hasn't been tampered with.

Any items that don't pass the cryptographic checks are ignored and nodes that propagate them are punished by being blocked.
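The order of those checks can be sketched as follows. This is not the code in crypto.py: the way the public_key and name fields are combined before hashing is an assumption, and `verify_signature` is a hypothetical callable standing in for the real public-key signature check, so that the sketch needs only the standard library.

```python
import hashlib

SIGNED_FIELDS = ("value", "timestamp", "expires", "name", "meta")


def compound_key(public_key: str, name: str) -> str:
    """SHA-512 over public_key and name.

    Assumption: the two fields are simply concatenated before hashing;
    the real construction may differ.
    """
    digest = hashlib.sha512()
    digest.update(public_key.encode("utf-8"))
    digest.update(name.encode("utf-8"))
    return digest.hexdigest()


def is_valid(item: dict, verify_signature) -> bool:
    """Apply the two checks in the order described above.

    verify_signature(public_key, signed_values, sig) is a hypothetical
    stand-in for the real signature verification.
    """
    signed_values = [item[name] for name in SIGNED_FIELDS]
    # 1. The signature must match the signed fields under the given public key.
    if not verify_signature(item["public_key"], signed_values, item["sig"]):
        return False
    # 2. Only then is the DHT key recomputed from public_key and name.
    return item["key"] == compound_key(item["public_key"], item["name"])
```

A node receiving an item for which `is_valid` returns False would ignore the item and block the peer that sent it.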

How is it licensed?

Under the GNU Affero General Public License version 3. I usually license my code under a more liberal license (e.g. the MIT license) but decided to use the AGPLv3 because this is the reference version of the drogulus and should always remain open (if anything comes of the drogulus, I hope many different implementations will exist).

Do you want or need help?

YES!

Image credits: Swarm of Starlings © 2008 Gail Johnson (under a creative commons license). Portrait of A.J.Ayer sourced from Wikipedia as fair use.