Fabian Lindfors

Building an object store with FoundationDB

Apple recently open-sourced FoundationDB, three years after acquiring the company and depriving the world of some promising technology. FoundationDB is a distributed key-value store featuring ACID transactions, painless scaling and easy fault tolerance, all tested with ridiculous thoroughness. In short, a seriously impressive feat of engineering. Before the acquisition its creators often touted the concept of “layers”, meaning stateless programs which add new features to the otherwise simple database. A SQL layer could make FoundationDB behave like a relational database while an AMQP layer could make it function as a message broker.

In this post we’ll explore how to build a simple object store (like Amazon S3) as a stateless web server which persists data to FoundationDB. The final code is available on GitHub. Some familiarity with Go is recommended to follow along!

Connecting and serving

We’ll be building our service in Go and if you want to follow along there are two dependencies you need to have installed. The first is the official FoundationDB Go bindings which can be installed by following the instructions here. We are also going to be using the Gin web framework so make sure you have that available as well. Let’s jump in!

The service we are building will be super simple and have only two features: uploading files and downloading them. Because all files will be persisted to FoundationDB it will be a breeze to scale this out to fit a huge amount of data with great redundancy and performance. Our first step is to set up a simple web server. Start by creating a directory for the project and adding a main.go file containing:

package main

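import (
    // Go refuses to compile unused imports, so start with Gin only and
    // add the following as later steps need them:
    //   "bytes", "io", "strings"
    //   "github.com/apple/foundationdb/bindings/go/src/fdb"
    //   "github.com/apple/foundationdb/bindings/go/src/fdb/tuple"
    "github.com/gin-gonic/gin"
)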

func main() {
    router := gin.Default()

    router.GET("/object/*name", func(c *gin.Context) {
        name := c.Param("name")

        c.String(200, "Getting file with name %s", name)
    })

    router.POST("/object/*name", func(c *gin.Context) {
        name := c.Param("name")

        c.String(200, "Saving file with name %s", name)
    })

    router.Run()
}

This code can be run with go run main.go which starts a simple web server responding to GET and POST requests for our object endpoints. By defining the endpoints as /object/*name we allow names containing slashes such as path/to/image.png. Our store won’t have any notion of directories and hierarchy but this structure allows it to be simulated, similar to Amazon S3.

Saving to the database

Next up: connecting to FoundationDB and saving data. We’ll use a single FDB connection so let’s add a variable at the start of our file to hold it. In the main function a connection will be established and assigned to the variable:

// ...

var db fdb.Database

func main() {
    // An API version needs to be specified
    fdb.MustAPIVersion(510)
    db = fdb.MustOpenDefault() 

    // ...
}

Gin exposes the convenient c.FormFile() to handle file uploads. This returns a file object whose contents can be read using file.Open(). Add the following code to the POST endpoint:

// Content type will be needed to enable downloads later
contentType := c.PostForm("content_type")
file, err := c.FormFile("file")
if err != nil {
    c.AbortWithError(400, err)
    return
}

// Open the uploaded file for reading
reader, err := file.Open()
if err != nil {
    c.AbortWithError(500, err)
    return
}
defer reader.Close()

With our file uploaded and ready we are all set to save it to our object store. We’ll add a function saveFile(name string, contentType string, reader io.Reader) which will read the data from reader (the file handle) and save it with name to the database. All keys and values in FoundationDB are simple byte strings but oftentimes we need to structure keys hierarchically. For this FoundationDB has the notion of a tuple layer allowing keys to be specified as tuples which later will be encoded to byte strings. Our data will be structured as (name, "content-type") and (name, "data").

When storing large blobs of data under a single key there are a few things to keep in mind. Most importantly, FoundationDB limits values to 100kB (100,000 bytes) and recommends keeping them below 10kB for best performance. The documentation recommends splitting blob data over multiple keys which can be joined on retrieval, and that is exactly how we will store our file contents. All interactions with FoundationDB are handled through atomic transactions, so we add one to our saveFile.

db.Transact(func(tr fdb.Transaction) (ret interface{}, e error) {
    return
})

Using the reader we can step through the file contents incrementally with a buffer of the size we choose. By setting the buffer size to 10kB, Go can handle the data splitting previously mentioned.

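db.Transact(func(tr fdb.Transaction) (ret interface{}, e error) {
    // Allocate a 10kB buffer
    buffer := make([]byte, 10000)
    i := 0

    for {
        // Fill the buffer; n is the number of bytes actually read
        n, err := io.ReadFull(reader, buffer)

        if n > 0 {
            // Save only the bytes read to a key of the form (name, "data", index)
            tr.Set(tuple.Tuple{name, "data", i}, buffer[:n])
            i++
        }

        // Either error means the end of the file has been reached
        if err == io.EOF || err == io.ErrUnexpectedEOF {
            break
        }
        if err != nil {
            // Any other read error aborts the transaction
            return nil, err
        }
    }

    // Save content type to object
    tr.Set(tuple.Tuple{name, "content-type"}, []byte(contentType))

    return
})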

The transaction steps through the file contents 10kB at a time and saves the data to individual keys. A simple counter is used to index the data, ensuring that the keys end up in the correct order. The final step is calling the save function from the POST endpoint: saveFile(name, contentType, reader). Start the server again and try uploading a file with cURL (or my favorite, HTTPie). Don’t forget to pass content_type as well.
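If you would rather stay in Go, the upload can be sketched with the standard library as well. This is only an illustration: newUploadRequest is a hypothetical helper, the object path is made up, and the URL assumes the server is running locally on Gin’s default port 8080. The form field names match the ones the POST handler reads.

```go
package main

import (
    "bytes"
    "fmt"
    "mime/multipart"
    "net/http"
)

// newUploadRequest is a hypothetical helper which builds a multipart POST
// carrying the two form fields the handler reads: "content_type" and "file".
func newUploadRequest(url, filename, contentType string, data []byte) (*http.Request, error) {
    var body bytes.Buffer
    w := multipart.NewWriter(&body)

    if err := w.WriteField("content_type", contentType); err != nil {
        return nil, err
    }
    part, err := w.CreateFormFile("file", filename)
    if err != nil {
        return nil, err
    }
    if _, err := part.Write(data); err != nil {
        return nil, err
    }
    // Close writes the final multipart boundary.
    if err := w.Close(); err != nil {
        return nil, err
    }

    req, err := http.NewRequest(http.MethodPost, url, &body)
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", w.FormDataContentType())
    return req, nil
}

func main() {
    // Assumption: the server from this post is listening on localhost:8080.
    req, err := newUploadRequest("http://localhost:8080/object/path/to/hello.txt",
        "hello.txt", "text/plain", []byte("hello"))
    if err != nil {
        panic(err)
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        fmt.Println("upload failed:", err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.Status)
}
```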

Reading and returning

Now that our data is safely stored in FoundationDB it’s time to get it back. We’ll start by adding a function getFile which takes an object name and returns the file data and content type. For this we define a simple File struct wrapping the content type and data.

type File struct {
    Data        []byte
    ContentType string
}

func getFile(name string) *File {
    return nil
}

Once again we want to interact with FoundationDB, which means creating a transaction. We’ll start by retrieving the content type and if it doesn’t exist we’ll assume there is no such file. To check if a key exists we can perform a nil check when retrieving its value, like this:

file, _ := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
    contentType := tr.Get(tuple.Tuple{name, "content-type"}).MustGet()
    if contentType == nil {
        return nil, nil
    }

    return &File{Data: nil, ContentType: string(contentType)}, nil
})

The actual file data is stored under the key (name, "data", index). What we want is to retrieve all the keys under (name, "data") and join them into a single byte array. This can be achieved with a prefix range where we specify a prefix tuple and then get a range to query which includes all subkeys. From this a slice of key-value pairs can be retrieved. Add the following code to the transaction:

start, end := tuple.Tuple{name, "data"}.FDBRangeKeys()
// "range" is a reserved word in Go, so the variable needs another name
keyRange := fdb.KeyRange{Begin: start, End: end}
kvSlice := tr.GetRange(keyRange, fdb.RangeOptions{}).GetSliceOrPanic()

Our next step is joining all the values retrieved into a single buffer. Go’s standard library has a Buffer data type built in to the bytes package and it’s dead simple to use:

var b bytes.Buffer
for _, kv := range kvSlice {
    b.Write(kv.Value)
}
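The loop relies on the range read handing back the chunks in index order. FoundationDB returns keys sorted by their bytes, and the tuple layer encodes integers so that byte order matches numeric order. The sketch below shows why that matters, using a simplified stand-in (a fixed-width big-endian integer) rather than the tuple layer’s actual encoding. The helper names are made up for the illustration.

```go
package main

import (
    "encoding/binary"
    "fmt"
    "sort"
)

// textKey formats the index as decimal text, e.g. "data/10".
func textKey(i int) string { return fmt.Sprintf("data/%d", i) }

// binaryKey appends the index as a fixed-width big-endian integer, whose
// byte order matches numeric order. This is a simplified stand-in, not the
// tuple layer's real encoding.
func binaryKey(i int) string {
    var b [8]byte
    binary.BigEndian.PutUint64(b[:], uint64(i))
    return "data/" + string(b[:])
}

// sortedIndexes sorts the keys for indexes 0..n-1 bytewise, the way a
// key-value store orders them, and returns the indexes in that order.
func sortedIndexes(n int, key func(int) string) []int {
    byKey := make(map[string]int, n)
    keys := make([]string, 0, n)
    for i := 0; i < n; i++ {
        k := key(i)
        byKey[k] = i
        keys = append(keys, k)
    }
    sort.Strings(keys)
    order := make([]int, 0, n)
    for _, k := range keys {
        order = append(order, byKey[k])
    }
    return order
}

func main() {
    fmt.Println(sortedIndexes(12, textKey))   // chunk 10 sorts before chunk 2
    fmt.Println(sortedIndexes(12, binaryKey)) // chunks come back in numeric order
}
```

Had the index been stored as decimal text, a file with more than ten chunks would be reassembled out of order.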

Put together we now have a function which can retrieve all the pieces from the database and put them back together into a file. Because the transaction returns a generic interface{} we need a type assertion to get back to a *File. The resulting function should look like this:

func getFile(name string) *File {
    file, _ := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
        contentType := tr.Get(tuple.Tuple{name, "content-type"}).MustGet()
        if contentType == nil {
            return nil, nil
        }

        // Retrieve the split data using a prefix query
        // ("range" is a reserved word in Go, hence keyRange)
        start, end := tuple.Tuple{name, "data"}.FDBRangeKeys()
        keyRange := fdb.KeyRange{Begin: start, End: end}
        kvSlice := tr.GetRange(keyRange, fdb.RangeOptions{}).GetSliceOrPanic()

        // Combine the retrieved file data into a buffer
        var b bytes.Buffer
        for _, kv := range kvSlice {
            b.Write(kv.Value)
        }

        return &File{Data: b.Bytes(), ContentType: string(contentType)}, nil
    })

    // A nil result means no content type was found, so there is no such file
    if file == nil {
        return nil
    }

    return file.(*File)
}
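The nil check before the type assertion matters because asserting on a nil interface{} without the two-value form panics. As a small standalone sketch, the hypothetical helper asFile below folds both steps into the "comma ok" form of the assertion. It is an alternative way to write the same conversion, not what the final code does.

```go
package main

import "fmt"

type File struct {
    Data        []byte
    ContentType string
}

// asFile is a hypothetical helper which converts the untyped result of a
// transaction back into a *File. It returns nil when the result is nil or
// holds some other type.
func asFile(result interface{}) *File {
    file, ok := result.(*File)
    if !ok {
        return nil
    }
    return file
}

func main() {
    var missing interface{} // what the transaction returns for an unknown name
    found := interface{}(&File{ContentType: "image/png"})

    fmt.Println(asFile(missing) == nil)
    fmt.Println(asFile(found).ContentType)
}
```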

Finally we need to invoke getFile in our GET handler and return the file as an HTTP download, which Gin makes really easy to do. We’ll establish a file name for the download by splitting the file name (or “path”) by slashes and using the last part.

router.GET("/object/*name", func(c *gin.Context) {
    name := c.Param("name")
    file := getFile(name)

    if file == nil {
        c.AbortWithStatus(404)
        return
    }

    // Split file path by slash to get file name
    splitName := strings.Split(name, "/")

    c.Header("Content-Description", "File Transfer")
    c.Header("Content-Disposition", "attachment; filename="+splitName[len(splitName)-1])
    c.Data(200, file.ContentType, file.Data)
})
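Concatenating the file name straight into the header works for simple names, but names containing spaces or quotes would need quoting. One possible variation, sketched here with a hypothetical helper contentDisposition, is to take the last path element with path.Base and let mime.FormatMediaType from the standard library build the header value.

```go
package main

import (
    "fmt"
    "mime"
    "path"
)

// contentDisposition is a hypothetical helper which builds the header value
// for downloading the object stored under name.
func contentDisposition(name string) string {
    // path.Base returns the last element of a slash-separated path.
    filename := path.Base(name)
    // FormatMediaType quotes the parameter when it contains spaces or
    // other characters that are not allowed in a bare token.
    return mime.FormatMediaType("attachment", map[string]string{"filename": filename})
}

func main() {
    fmt.Println(contentDisposition("/path/to/image.png"))
    fmt.Println(contentDisposition("/reports/annual report.pdf"))
}
```

In the handler this would replace the string concatenation: c.Header("Content-Disposition", contentDisposition(name)).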

That’s it! Try uploading a file using your favorite HTTP client and then visit the same path in your browser. Your file should come right back like any regular download, only this time fetched from a distributed key-value store.

Final words

This is a very rudimentary implementation lacking a lot of necessary error handling and validation. It also has some serious limitations: for example, FoundationDB limits transaction sizes to 10MB, so files over that size won’t be saved.
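A quick back-of-the-envelope check shows roughly where that ceiling sits. The sketch below is an estimate under stated assumptions: the helper names and the 30 byte key length are made up for illustration, and only written keys and values are counted, so the real limit may be reached somewhat earlier.

```go
package main

import "fmt"

const (
    chunkSize      = 10000    // bytes per value, as in saveFile
    maxTransaction = 10000000 // the 10MB transaction limit, in bytes
)

// chunkCount returns how many chunkSize pieces a file of size bytes needs.
func chunkCount(size int) int {
    return (size + chunkSize - 1) / chunkSize
}

// estimatedWriteBytes is a rough estimate of what one upload writes: the
// file data plus one key of keyLen bytes per chunk. keyLen is an assumed
// average, since the real key length depends on the object name.
func estimatedWriteBytes(size, keyLen int) int {
    return size + chunkCount(size)*keyLen
}

func main() {
    for _, size := range []int{5000000, 12000000} {
        est := estimatedWriteBytes(size, 30)
        fmt.Printf("%d byte file: %d chunks, about %d bytes written, fits: %v\n",
            size, chunkCount(size), est, est < maxTransaction)
    }
}
```

Larger files would have to be spread over several transactions, at the cost of the upload no longer being atomic.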

I do hope this serves as a proof of concept for the power of stateless layers coupled with FoundationDB. Stateless services are easy to write, reason about, and deploy. By leaving complex things involving persistence, scaling, and fault tolerance to stable software such as FoundationDB, building complex services becomes a cakewalk. The prospects are exciting and I can’t wait to see what the open source community will put together.

The final code is available on GitHub.