mindshare: Unique Unique ID's

Unique Unique ID's

Mindshare, being the unique thing that it is, needs a unique ID format. Each group member generates their own member ID. These IDs need to be unique, which is the usual requirement. Because the number of IDs created is small and scoped to a group, the IDs do not need to be that large. The 128-bit UUID format is probably overkill, and Java can't generate true (time-based) UUIDs anyway because the JVM lacks access to several components of that format (a MAC address and a high-resolution timer). My initial response to this requirement was to Base64-encode some random bits.

The other requirements are more subtle. Mindshare saves everyone's files to disk and cares about who owns those files, something no other P2P program cares about. The user ID may be called upon to disambiguate files when two files have the same name, or to store each user's files in a separate directory. This puts severe restrictions on the characters that can be used in IDs. For starters, Base64 can't be used for file or directory names because its alphabet includes the '/' character, which separates directories on Unix platforms and in URIs and is therefore illegal inside a file name.

Then there are the case-preserving, case-insensitive filesystems of the two popular desktop OSes. Base64 is case-sensitive, so raw Base64 can't be used reliably: two different Base64 strings might end up as the same file name.
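To make the case problem concrete, here is a small sketch using the modern java.util.Base64 class (which postdates this post). The class name CaseCollisionDemo and the helper b64 are made up for illustration; the byte values were picked by hand so that the two encodings differ only in case.

```java
import java.util.Base64;

final class CaseCollisionDemo {
    // Hypothetical helper: Base64-encode a few byte values given as ints.
    static String b64(int... values) {
        byte[] raw = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            raw[i] = (byte) values[i];
        }
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        String first = b64(0x00, 0x00, 0x00);   // encodes to "AAAA"
        String second = b64(0x69, 0xA6, 0x9A);  // encodes to "aaaa"

        // Different bytes, different strings...
        System.out.println(first.equals(second));           // false
        // ...but the same name on a case-insensitive filesystem.
        System.out.println(first.equalsIgnoreCase(second)); // true
    }
}
```

Two members with those raw IDs would fight over the same directory on a case-insensitive filesystem.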

Finally, Base64 uses the '+' and '=' characters. The '=' is easy to avoid by choosing an input length that is a multiple of three bytes, so that no padding is needed. The '+' character falls into the punctuation group and is thus not suitable for use in the authority section of a URI.
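The padding rule is easy to check. The following is only a sketch, again assuming the modern java.util.Base64 encoder; PaddingDemo and encodeZeros are illustrative names, and all-zero input is used just to keep the output predictable.

```java
import java.util.Base64;

final class PaddingDemo {
    // Hypothetical helper: Base64-encode byteCount zero bytes.
    static String encodeZeros(int byteCount) {
        return Base64.getEncoder().encodeToString(new byte[byteCount]);
    }

    public static void main(String[] args) {
        // 15 is a multiple of 3: 20 characters, no '=' filler.
        System.out.println(encodeZeros(15));
        // 16 leaves one byte over: 24 characters, ending in "==".
        System.out.println(encodeZeros(16));
    }
}
```

Any multiple of three bytes works; the length only needs to be chosen so that the last 3-byte group is full.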

So for now Mindshare IDs will be 15 bytes of secure random data, encoded with Base64, resulting in a 20-character ID. All capital letters will be replaced with their lowercase equivalents (id.toLowerCase()). The characters '+' and '/' will be replaced with the more URI-friendly '-' and '_' respectively. There is still the possibility of collision in this scheme by virtue of the loss of capitalisation: folding case leaves a little over 100 of the original 120 bits of entropy. I consider a collision to be unlikely, or at least as nebulous as the possibility of generating identical random bits because Java doesn't have access to accurate spatial or chronological information. At some future time an encoder can be built that encodes raw bytes directly in the alphabet [a-z][0-9][-_] without changes to the protocol.
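For the record, the scheme above might look something like this. It is a sketch, not the actual Mindshare code: the class and method names (MindshareIds, encode, newId) are invented here, and it leans on java.util.Base64, which only arrived in later Java releases.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Locale;

final class MindshareIds {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Base64-encode, fold to lowercase, then swap '+' and '/' for '-' and '_'.
    static String encode(byte[] raw) {
        String b64 = Base64.getEncoder().encodeToString(raw);
        return b64.toLowerCase(Locale.ROOT)
                  .replace('+', '-')
                  .replace('/', '_');
    }

    // 15 random bytes encode to exactly 20 characters with no '=' padding.
    static String newId() {
        byte[] raw = new byte[15];
        RANDOM.nextBytes(raw);
        return encode(raw);
    }

    public static void main(String[] args) {
        System.out.println(newId());
    }
}
```

Locale.ROOT is passed to toLowerCase so that the result does not depend on the machine's default locale (the Turkish dotless i being the classic trap). The result only ever contains characters from [a-z][0-9][-_], so a later purpose-built encoder could produce IDs of the same shape.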

Basically, this is a post to say that I have thought about this issue but am too busy/lazy to do the right thing at this point in time. It gets put off until 0.2, when we do crypto and security.

Posted by Gareth at January 5, 2005 03:48 PM
