zkof_spec3.py

# synopsis spec of zero knowledge application framework.

# this is a draft. no code exists. just trying to make the interface intuitive
# and effective.  But I already wish this had existed 5 years ago when I was
# writing the original SpiderOak!

# this demos client side operations, so everything is from the perspective of
# code running in a browser.  We'll demo the ways a developer would use the
# framework to build apps, rather than show the internal details of what the
# framework is doing behind the scenes.  We will attempt to describe those
# behind the scenes details (mostly about crypto, protocol, and internal data
# structures) in comments.

# we're using pseudocode rather than correct javascript, because for purposes
# of understanding and iterating the concept quickly, I find this easier to
# read.  Also, the spec is intended to be language agnostic, such that clients
# could be developed in a variety of languages, and all interoperate.

# for similar reasons, this is written using synchronous code.  In a real app
# an async approach with callbacks and such would usually be needed, since
# many of the methods involve blocking network access (although the framework
# may do behind the scenes caching so blocking is reduced.)  Since we're just
# trying to demo capabilities here and most people find synchronous code
# easier to read, we'll stick with that.

# === generating a new account (which implies creating a crypto context for
# object storage and communication with peers.)

# bob signs up for an account in our OMG awesome zero knowledge diary
# application.

# this locally generates an object containing root level keys, including an
# outer level session key derived from the pass phrase using a key derivation
# function.  Also creates public/private keypair, a HMAC key, a couple of salt
# strings, and a challenge key that can later be used for auth between
# client/server via a zero knowledge password proof.
account = zkaf.generate_account('pass phrase')
# account names can be whatever strings the server's policy allows. that's up
# to the server. account names are plain text.
account.name = 'account_name'
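# a minimal sketch of the KDF step (my assumption: PBKDF2 via hashlib; the
# real framework might pick scrypt or another memory-hard KDF instead):

```python
import hashlib
import os

def derive_session_key(pass_phrase, salt=None, iterations=200_000):
    # the salt is generated once at account creation and stored alongside
    # the account record; salts are not secret
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac('sha256', pass_phrase.encode('utf-8'),
                              salt, iterations)
    return key, salt
```

# the same pass phrase and salt always reproduce the same outer session key,
# which is what lets a returning client decrypt its stored root keys.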
# this establishes a new account on the server with the given name. that
# name might not be available so check the result.
result = account.save()
if result.status != 'ok':
    if result.error != 'dupe':
        alert("can't create this account: " + result.error_message)
        return
    account.name = 'another account_name' # try again with a name that isn't taken
    result = account.save()

# now we have an account. we can securely store objects and data.

# === authenticating into an established account (establishing a crypto session
# in the context of an existing account.)
# this does a zero knowledge password proof auth to the server for the
# specified account, then retrieves the cipher text of our keys, decrypts them
# with the key from the KDF and the pass phrase, and therefore establishes a
# crypto session with access to our keys and authorization to retrieve our
# storage containers.
session = zkaf.auth('account_name', 'pass phrase')
if not session:
    # we don't necessarily get a clear message on why this failed, because
    # that would leak information
    raise RuntimeError("bad account name or password")
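# a true zero knowledge password proof (e.g. SRP) is more involved; this toy
# HMAC challenge-response only illustrates the shape of the exchange, where
# the client proves knowledge of the challenge key without ever sending it:

```python
import hashlib
import hmac
import os

def make_challenge():
    # server generates a fresh random challenge per auth attempt
    return os.urandom(32)

def respond(challenge_key, challenge):
    # client answers with a MAC over the challenge; the key itself never
    # crosses the wire
    return hmac.new(challenge_key, challenge, hashlib.sha256).digest()

def verify(challenge_key, challenge, response):
    # constant-time comparison to avoid timing side channels
    return hmac.compare_digest(respond(challenge_key, challenge), response)
```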

# === persisting a session
# caching the crypto session locally (HTML5 local storage?) so we don't have to
# prompt for password and derive keys again (which is expensive.)
session_as_string = session.serialize() # save this somewhere

# resurrect it later
session = zkaf.session.from_string(session_as_string)
# make sure that server believes this is still a valid account/session. (like
# if it's been persisted for a long time..)
result = session.ping()
# for example, the session might be invalid if the password has changed since
# it was saved, or if the account has been deleted or disabled server side
# (such as for non-payment.)
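# one possible shape for serialize()/from_string(), assuming the session's
# keys are raw bytes (hypothetical internals, just to show the round trip):

```python
import base64
import json

def serialize_session(session_keys):
    # base64 the raw key bytes so the whole session fits in one printable
    # JSON string suitable for HTML5 local storage
    return json.dumps({name: base64.b64encode(key).decode('ascii')
                       for name, key in session_keys.items()})

def deserialize_session(blob):
    # invert the encoding to recover the raw key bytes
    return {name: base64.b64decode(value)
            for name, value in json.loads(blob).items()}
```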

# === changing a pass phrase for an account
session.account.passphrase = 'new pass phrase'
result = session.account.save()
# check the result -- you might get a version error if you don't have the
# newest version of the account at the time you try to save. (more on
# versioning later.)
if result == 'refresh':
    old_version = session.account.version()
    session.account.refresh()
    new_version = session.account.version()
    result = session.account.save() 

# === introducing object storage via containers 
# we're going to store our app's data using objects, like a traditional object
# database.  Many people are already familiar with object databases such as
# ZODB.  Our object database is a similar concept, while making zero knowledge
# privacy guarantees and also making it possible to selectively (and privately)
# share and collaborate with others.

# Objects are stored in containers.  Containers are always associated with a
# specific account (and unreadable by other accounts, unless explicitly shared
# with a peer or a group.)  Containers are a way of partitioning the data
# required for object storage. an app with relatively smallish data storage
# requirements might just use one container, and store every object in that
# container. other apps may use many separate containers.  A container must be
# fetched in its full length from the server and decrypted client side for the
# objects inside it to be read.  

container = session.load('diary') # load this container from the server to
                                  # local, making access to all objects
                                  # available without blocking.  If the
                                  # container is already available in cache,
                                  # this returns immediately.

# containers are identified by keys such as 'diary' above. any string is
# allowed. From the server's perspective, container names (and of course their
# contents) are unreadable.
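# one hypothetical way to keep container names unreadable server side: the
# client addresses containers by an HMAC of the name (using the account's
# HMAC key generated at signup), so the server only ever sees an opaque
# identifier:

```python
import hashlib
import hmac

def server_container_id(hmac_key, container_name):
    # deterministic, so the same name always maps to the same identifier,
    # but the server can't invert it back to the plaintext name
    return hmac.new(hmac_key, container_name.encode('utf-8'),
                    hashlib.sha256).hexdigest()
```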

# If the data set is going to grow very large, partitioning data across
# containers and lazy loading them as you need them can help with app load
# time.  Behind the scenes, the framework does some local caching for the
# contents of frequently used storage containers, so it doesn't always have to
# download the full container when the app needs it.

# One simple tactic for partitioning data among containers is to keep app
# metadata all in one container, and keep bulky binary data (images, videos,
# long text strings, whatever) in many other containers.  Metadata is usually
# very small, and can load quickly.

# For example, a diary app might have a single metadata container with a list
# of entries. Each entry has a title, date, keywords, and other basic
# attributes.  Average size is probably under 500 bytes.  So one entry per day
# for 10 years would only be 1.8 MB.  In 10 years, 1.8 MB will be like 1.8
# KB today, so storing all this in one container is fine.  Each entry would
# reference other containers to find the text of the entry and media
# attachments like images, videos, etc.
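# the sizing estimate above checks out:

```python
entry_size = 500               # bytes, rough average per metadata entry
entries_per_year = 365         # one entry per day
total = entry_size * entries_per_year * 10
# total is 1,825,000 bytes, i.e. about 1.8 MB for 10 years of metadata
```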


diary_entries = container.get("entries")
draft_entries = container.get("drafts")
# these objects are like regular javascript objects that we can store data
# into.
new_entry = { id: get_uuid(), title: "Adventures with Crypto" }
draft_entries.push(new_entry)
# atomically save all modified objects back to the container. if we had made
# changes to either diary_entries or draft_entries, both would be saved. the
# default parameters for saving objects preserves object history (i.e. previous
# versions of the object are still reachable) and uses diffing where
# appropriate to minimize the total size.
result = container.save()

# let's add some more content to this entry.  We'll store the text content
# separately from the metadata, in its own one-off container.
text = "we're having fun here in object storage land...."

new_entry.text_container_name = 'text_for_' + new_entry.id
text_container = session.new_container(new_entry.text_container_name)
text_container.add(new_entry.id, text)
text_container.save() # save our text to the one-off container
container.save() # and save our changes to draft_entries from above to the main
                 # container. (later we'll show how to do both of these
                 # atomically in one op.)

# here's how we would iterate over all the entries in the diary, retrieving
# text and media attachments from their own containers.
for entry in diary_entries:
    text_container = session.load(entry.text_container_name)
    diary_text = text_container.get(entry.id)
    if entry.attachment_container_name:
        attachment_container = session.load(entry.attachment_container_name)
        list_of_attachments = attachment_container.get(entry.id)

# general container operations
list_of_object_keys = container.keys()
myobject = container.new('mynewobjectname', {})
myobject = container.get('myobjectname')
myobject = container.delete('myobjectname') # it's still available in the
# container's history, until the next compaction. more on compaction later.
myobject.new_property = 'abc'
myobject.save()
myobject.delete()

myobject = container.get('myobjectname')
mylist = []
myobject['list_of_stuff'] = mylist
mylist.push(1)



# see which objects have been locally modified and not yet saved
modified_keys = container.modified_keys()
modified_objects = container.modified_objects()

# atomically save an object, multiple objects, or all modified objects
result = myobject.save()
result = container.save([myobject])
result = container.save_keys(['myobjectname']) # save by key; some optional
                                               # parameters are available
result = container.save() # save everything that's been locally modified
result = container.save(  # give the container more direction on finding 
                          # modified objects. by default it will do a deep
                          # comparison of every object that's been retrieved
                          # from the container with .get(), which is guaranteed
                          # to be accurate, but can be slow if there are many
                          # objects or very large objects.
                        { shallow: 1, deep: 1, levels: 5 })
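# the default deep-comparison strategy could be sketched like this
# (hypothetical internals, not necessarily how the framework would do it):

```python
import copy

class TrackingContainer:
    # snapshot each object at get() time, then find modified objects later
    # by deep-comparing the live object against its snapshot
    def __init__(self, objects):
        self._objects = objects
        self._snapshots = {}

    def get(self, key):
        obj = self._objects[key]
        self._snapshots[key] = copy.deepcopy(obj)
        return obj

    def modified_keys(self):
        return [key for key, snapshot in self._snapshots.items()
                if self._objects[key] != snapshot]
```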

# atomically save across multiple containers using a transaction:
tx = session.tx()
tx.save(myobject)  # add a particular object to a transaction 
tx.save([myobject1, myobject2]) # add a list of objects
tx.save(container1, [another_object, and_another_object]) # add a list of objects
tx.save(container3, [another_object, and_another_object])
result = tx.commit()

if result == 'refresh':
    # refresh and deal with the changes.

# == modifying objects, concurrency, and versioning


# containers have version identifiers. the identifiers are opaque strings
# provided by the server. they aren't sortable client side.  a container's
# version identifier changes whenever modifications to a container and/or the
# objects in a container are saved.  When a container is loaded, it normally
# includes the complete change history of all objects in the container (but see
# below about compaction.)  This means that clients can see the history of
# objects in a container, and find the differences between them.
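# a toy illustration of the diff concept for flat objects (a hypothetical
# diff format; the spec doesn't pin down the real diff algorithm):

```python
_REMOVED = object()  # sentinel marking a deleted key in a diff

def make_diff(old, new):
    # changed or added keys map to their new values; deleted keys map to
    # the sentinel
    diff = {}
    for key in old.keys() | new.keys():
        if key not in new:
            diff[key] = _REMOVED
        elif key not in old or old[key] != new[key]:
            diff[key] = new[key]
    return diff

def apply_diff(old, diff):
    # materialize the newer version from an older version plus a diff
    result = dict(old)
    for key, value in diff.items():
        if value is _REMOVED:
            del result[key]
        else:
            result[key] = value
    return result
```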

# refreshing objects, finding changes between times.

old_identifier = container.version_identifier()
container.refresh()
new_identifier = container.version_identifier()
# getting many version identifiers at once
version_identifier_list = session.version_identifiers([container1, container2])

# we can see which things have changed between versions of a container.
if not old_identifier == new_identifier:
    list_of_changed_keys = container.modified_keys(old_identifier, new_identifier)
    # for a given key, we can get a list of one or more diff objects that shows
    # us specifically what changed between the intervening version(s)
    list_of_diffs = container.diffs(key, old_identifier, new_identifier)
    # or we could compare the objects ourselves
    old_object = container.get_version(key, old_identifier)
    new_object = container.get_version(key, new_identifier)
    # or we could see the whole history
    object_history = container.get_history(key)
    for entry in object_history:
        # we can look through the attributes of the entry

        # these properties are added by the server, and are guaranteed to be
        # correct.
        entry.version_identifier # the new version of the container at the time
                                 # this object was changed. (note that many
                                 # objects in the container may share the same
                                 # version identifiers, since the identifier
                                 # applies to the entire container and all the
                                 # objects it changes; a transaction modifying
                                 # several objects would give the new versions
                                 # of those objects each the same new version
                                 # identifier.)
        entry.author    # the account name the change came from. 
        entry.timestamp # this is server time from whenever .save() was called.
                        # not the time the object actually changed. (also
                        # guaranteed to be accurate by the server.)
        entry.diffsize  # the size of the binary storage of the diff (after 
                        # compression, encryption, etc.)  In other words, the
                        # length of the ciphertext of the diff.

        # for these, the server cannot guarantee anything about the contents, since
        # they are unreadable to the server.

        entry.get_serialized_diff() # get the serialize (as a string) of the 
                                    # diff (the plaintext.)

        entry.get_diff() # get the actual diff object.

        old_object = entry.get_object() # just materialize the object at this
                                        # point in history.

        old_object.abc = "changed"  # this change is temporary and unsavable

        old_object.save() # throws an exception; historical objects are
                          # immutable.
        
    
# === zero knowledge sharing of containers/objects with peers.
# you can share with an individual peer or with defined groups (below.) sharing
# happens by making objects available to peers or groups.  The objects continue
# to be unreadable to the server or to anyone other than the peers the objects
# are shared with.

# obligatory scenario of alice and bob wanting to communicate with crypto :)

alice = session.get_peer("alice")
if not alice:
    alert("peer alice not available")

# share an entire container (including all objects and their history since the
# last compaction) with alice.  share implies reads only, not writes.  alice
# will be able to read the historical and current state of all objects in this
# container as they are updated and changed until it is deleted or unshared.
container.share(alice)
# appropriate warnings about info theory here!
container.unshare()       # remove all sharing with all peers
container.unshare(alice)  # just remove alice
result = container.save() # still have to check for conflicts/retry, even
                          # though you're not changing any data.

# what really happens when bob unshares a container with alice? 
#  - we tell the server to no longer make the contents of the container
#    readable to alice (i.e. so that alice is disallowed by the server 
#    from retrieving the encrypted data stream of the container.)
#  - we re-key the container such that all further writes to the container
#    continue to be unreadable to alice, even if alice happens to be Mallory's
#    girlfriend (and mallory is an evil system administrator employed by the
#    service operator, with administrative and physical access to the server(s)
#    the container is stored on.)
#  (information theory of course means that any data previously shared with
#  alice, alice already knows. so there's not much point in re-encrypting that
#  data.)

# share only a specific object with alice. like above, object will remain
# shared and the peer can see changes/updates/history.  this is more expensive
# than just sharing a whole container, but not excessively so.  From alice's
# perspective, it seems like the container only contains the specific objects
# that are shared.
myobject.share(alice)
myobject.unshare(alice)
myobject.save()

# how alice gets to this data (from her own session)
bob = session.get_peer("bob")
# get a container object from a peer
bob_container = bob.load("container_name")
# bob_container may have every key available to us (if bob shared the whole
# container), or just the specific keys that bob shared with us.  from alice's
# perspective, she can't tell the difference between a container that's fully
# shared vs. one that just has some shared objects in it.
bob_object = bob_container.get('bobs_object')

# now we could read bob's diary as above, just as if it was one of our own
# objects. We just can't save changes back to it.  (But more on that later...)

# === secure real time messages. this is not messages in the
# sense of email, but messages in the sense of message queues, or message
# oriented application development.  still, the inbox metaphor mostly fits.
# messages are async; the recipients don't have to be connected/online right
# now to receive these. they'll notice new messages whenever they next poll
# their inbox.

# allow our peer alice to send us stuff.
session.inbox.allow_messages(alice, 
    { max_unread_messages: 1000, 
         max_message_size: 10000, 
          max_unread_size: 1000000 })
# TODO: resolve this question: should it be possible to blanket allow messages
# from anyone? maybe a limit of just a few small messages?
# proposed answer: make it easy to allow a small number of messages from
# anyone, and have that limit automatically go away if you send a message back.

# sending and checking message queues. we can send messages to ourselves, or to
# our peers that have set allow_messages from us.
# messages have these properties: id, from, to, timestamp, size, header, body, ttl
# messages are immutable -- they are only created, delivered, retrieved, and
#   deleted.
# they are never modified (not even in the sense that they are "marked as
#   read").  In that sense, in message queue terms, they have "at least once"
#   delivery semantics.
# header and body are encrypted such that only recipient can read them.
# the size property is the length of the ciphertext of headers + body
# headers may optionally be retrieved when the inbox contents are listed.
# headers are limited to 4k in size per message.
# the headers are really just an object. any javascript JSON object is allowed.
#   the size limitation is applied to the serialized, compressed, and encrypted
#   form of the object.
# the body is also an object, but without a hard limit.
# headers and body must be defined. it's conventional to send false as a value
#   when you don't need to send a particular value.
# searching or filtering through the listed messages by headers necessarily
#   takes place on the client side, since the server cannot decrypt them for
#   us.
# headers and body together count towards maximum message sizes that a peer
#   might be willing to receive from another peer (as configured above.)
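# the admission check implied by these limits might look like this (field
# names are my assumption, matching the allow_messages example above):

```python
def message_allowed(policy, inbox_stats, ciphertext_len):
    # policy: the limits a peer configured via allow_messages
    # inbox_stats: current unread count/size for messages from that peer
    # ciphertext_len: encrypted headers + body of the incoming message
    if ciphertext_len > policy['max_message_size']:
        return False
    if inbox_stats['unread_count'] + 1 > policy['max_unread_messages']:
        return False
    if inbox_stats['unread_size'] + ciphertext_len > policy['max_unread_size']:
        return False
    return True
```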
session.inbox.poll()                # return when new messages are available
session.inbox.messages_by_peer()    # get a map of peer -> number of messages
session.inbox.list()                # get list of unread messages (metadata only)
tx = session.tx() # a transaction, so message deletes can batch with object changes
for msg in session.inbox.list():
    # properties: msg.id, msg.from, msg.to, msg.timestamp, msg.size, msg.ttl
    msg.get_headers()
    msg.get_body()
    msg.delete() # it's faster to build up a list of IDs and delete them all at
                 # once (as below.)
    tx.add(msg.delete)  # add deleting this message to a running transaction, 
                        # such that deleting the message can happen atomically
                        # in combination with changes to objects.
tx.commit() # remember to commit our transaction

# list with filtering. all parameters optional. specifying no filtering
# parameters gets same results as list, except that the inclusion of headers in
# the result set can be controlled.
session.inbox.filter(peer = bob, 
                     after = earliest_time, 
                     before = latest_time,
                     include_headers = True,
                     header_filter = my_filter_function,
                     limit = max_messages_to_return)
session.inbox.get([list_of_message_ids])  # get a list of message headers and bodies
session.inbox.delete([list_of_message_ids]) # atomically remove a list of messages
session.inbox.clear() # flippantly delete all messages, seen or unseen

message_id = peer.send_message(header, body)
if message_id == 'error':
    # handle the failure -- e.g. the peer hasn't allowed messages from us,
    # or the message exceeds one of the limits they configured.

# sharing with groups instead of just individual peers:
# groups are about membership.
# group members can be individuals or other groups, recursively to a
#   reasonably high limit.
# group membership always must happen by invitation, such that an existing
#   member (with read privileges) invites new members.
# a group has a creator.
# members may be given permission (individually) to 
#  - read data intended for the group
#  - author data as the group
#  - invite new members
#  - revoke existing members
#  - give a member the privilege to invite
#  - give a member the privilege to revoke
# crypto structure for a group:
#  - a group has a set of keys like an individual
#   - read key
#   - sign key
#  - whenever a new member joins the group, the read key for the group
#    is encoded to that member's keys.
#  - when a member leaves the group, the groups keys are rotated:
#    - new group keys are generated
#    - all new keys are encrypted to all the continuing members of the group.
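# the rotation steps above, sketched with a stand-in for the real public-key
# wrapping (wrap_for_member here is a hypothetical hook representing
# encryption to a member's keypair):

```python
import os

def rotate_group_keys(continuing_members, wrap_for_member):
    # generate fresh group keys, then wrap (encrypt) each one for every
    # continuing member; the departed member never sees the new keys
    new_keys = {'read': os.urandom(32), 'sign': os.urandom(32)}
    wrapped = {member: {name: wrap_for_member(member, key)
                        for name, key in new_keys.items()}
               for member in continuing_members}
    return new_keys, wrapped
```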

# - can create a group.
# - can invite others to the group as:
#   - viewers (can load containers/messages shared to the group)
#   - writers (can add new containers/messages)
#   - administrators (can change privileges of others)
#   - owners (can destroy the group)
# - peers have to accept an invitation to join a group, although they could
#   also set an auto-accept setting.
# - storage for the group object is outside the billable storage amount for any
#   particular account.
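# since groups can contain other groups, resolving actual membership is
# recursive; a sketch with a depth limit (16 is an arbitrary choice standing
# in for the "reasonably high limit" above):

```python
def resolve_members(group, groups, depth=0, max_depth=16):
    # flatten nested group membership down to individual accounts
    if depth > max_depth:
        raise ValueError('group nesting too deep')
    members = set()
    for member in groups[group]:
        if member in groups:          # this member is itself a group
            members |= resolve_members(member, groups, depth + 1, max_depth)
        else:
            members.add(member)
    return members
```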

# TODO: standardize result objects. 
#   result.status, result.error, result.error_message

# TODO: expose the internal container data for sharing history. like, to see
# the peers you've shared, the objects you've shared with them, etc.

# TODO: add optional automatic notifications of shared object access -- where
# the system sends you messages when a peer accesses one of your shared
# objects. so apps can implement auditing requirements. (so the messages
# wouldn't originate from a peer, but from the server itself, in response to
# object retrieval.)

# TODO multi-user read-write shared containers
# TODO container compaction