February 24, 2010
Riak has a notion of “links” as part of the metadata of its objects. We talk about traversing, or “walking”, links, but what do the queries for doing so actually look like?
Let's put four objects in Riak:
$ curl -X PUT -H "content-type:text/plain" \
    -H "Link: </riak/hb/second>; riaktag=\"foo\", </riak/hb/third>; riaktag=\"bar\"" \
    http://localhost:8098/riak/hb/first --data "hello"

$ curl -X PUT -H "content-type: text/plain" \
    -H "Link:</riak/hb/fourth>; riaktag=\"foo\"" \
    http://localhost:8098/riak/hb/second --data "the second"

$ curl -X PUT -H "content-type: text/plain" \
    -H "Link:</riak/hb/fourth>; riaktag=\"foo\"" \
    http://localhost:8098/riak/hb/third --data "the third"

$ curl -X PUT -H "content-type: text/plain" \
    http://localhost:8098/riak/hb/fourth --data "the fourth"
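The Link header in these requests follows a regular pattern, so it can be generated instead of typed by hand. The following is a minimal sketch, not part of any Riak client: `link_header` is a hypothetical helper name, and the `/riak` prefix is simply copied from the URLs above.

```python
def link_header(links, prefix="/riak"):
    """Format (bucket, key, tag) triples as a Link header value.

    Assumption: bucket, key, and tag contain no characters that need
    escaping; this sketch does no URL-encoding or quoting.
    """
    parts = []
    for bucket, key, tag in links:
        parts.append('<%s/%s/%s>; riaktag="%s"' % (prefix, bucket, key, tag))
    return ", ".join(parts)


# The header value used when storing hb/first above.
print(link_header([("hb", "second", "foo"), ("hb", "third", "bar")]))
```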
Now, say we wanted to start at hb/first, and follow all of its outbound links. The easiest way to do this is with the link-walker URL syntax:
$ curl http://localhost:8098/riak/hb/first/_,_,_
The response will be a multipart/mixed body with two parts: the hb/second object in one, and the hb/third object in the other:
--N2gzGP3AY8wpwdQY0jio62L9nJm
Content-Type: multipart/mixed; boundary=3ai6VRl4aLli3dKw8tG9unUeznT

--3ai6VRl4aLli3dKw8tG9unUeznT
X-Riak-Vclock: a85hYGBgzGDKBVIsTKLLozOYEhnzWBn+H/h5hC8LAA==
Location: /riak/hb/third
Content-Type: text/plain
Link: </riak/hb>; rel="up", </riak/hb/fourth>; riaktag="foo"
Etag: 5Fs0VskZWx7Y25tf1oQsvS
Last-Modified: Wed, 24 Feb 2010 15:25:51 GMT

the third
--3ai6VRl4aLli3dKw8tG9unUeznT
X-Riak-Vclock: a85hYGBgzGDKBVIsLEHbN2YwJTLmsTLMPvDzCF8WAA==
Location: /riak/hb/second
Content-Type: text/plain
Link: </riak/hb>; rel="up", </riak/hb/fourth>; riaktag="foo"
Etag: 2ZKEJ2gaT57NT7xhLDPCQz
Last-Modified: Wed, 24 Feb 2010 15:24:11 GMT

the second
--3ai6VRl4aLli3dKw8tG9unUeznT--

--N2gzGP3AY8wpwdQY0jio62L9nJm--
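A nested multipart body like this is awkward to pick apart by hand. As a rough sketch, Python's standard `email` package can do the splitting. `parse_link_walk` is a hypothetical helper, and it assumes you already have the response's Content-Type header value and its body as strings.

```python
from email import message_from_string


def parse_link_walk(content_type, body):
    """Return (location, payload) pairs for each object in the response."""
    # The parser finds the outer boundary from a Content-Type header,
    # so the HTTP header value is put back in front of the body.
    msg = message_from_string("Content-Type: " + content_type + "\n\n" + body)
    results = []
    for part in msg.walk():
        # Skip the multipart containers; only leaf parts hold objects.
        if part.is_multipart():
            continue
        results.append((part["Location"], part.get_payload().strip()))
    return results
```

For the response shown above, this should yield one pair for /riak/hb/third and one for /riak/hb/second, in the order they appear in the body.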
It's also possible to express the same query directly in map/reduce:
$ curl -X POST -H "content-type:application/json" \
    http://localhost:8098/mapred --data @-
{"inputs":[["hb","first"]],"query":[{"link":{}},{"map":{"language":"javascript","source":"function(v) { return [v]; }"}}]}
^D
That's the exact same query. The content type of the response is different. It's now a JSON array with two elements: a JSON encoding of the hb/second object, and a JSON encoding of the hb/third object. (Pretty-printed here, for clarity.)
[
  {
    "bucket": "hb",
    "key": "second",
    "vclock": "a85hYGBgzGDKBVIsLEHbN2YwJTLmsTLMPvDzCF8WAA==",
    "values": [
      {
        "metadata": {
          "Links": [ ["hb","fourth","foo"] ],
          "X-Riak-VTag": "2ZKEJ2gaT57NT7xhLDPCQz",
          "content-type": "text/plain",
          "X-Riak-Last-Modified": "Wed, 24 Feb 2010 15:24:11 GMT",
          "X-Riak-Meta": []
        },
        "data": "the second"
      }
    ]
  },
  {
    "bucket": "hb",
    "key": "third",
    "vclock": "a85hYGBgzGDKBVIsTKLLozOYEhnzWBn+H/h5hC8LAA==",
    "values": [
      {
        "metadata": {
          "Links": [ ["hb","fourth","foo"] ],
          "X-Riak-VTag": "5Fs0VskZWx7Y25tf1oQsvS",
          "content-type": "text/plain",
          "X-Riak-Last-Modified": "Wed, 24 Feb 2010 15:25:51 GMT",
          "X-Riak-Meta": []
        },
        "data": "the third"
      }
    ]
  }
]
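Since the map/reduce result is plain JSON, pulling out the interesting fields takes only a few lines. This is a small sketch that assumes the response has the shape shown above; `summarize` is a hypothetical helper name.

```python
import json


def summarize(mapred_body):
    """Return (bucket, key, data) for every value of every object."""
    rows = []
    for obj in json.loads(mapred_body):
        # "values" is a list in the response above, so loop over it
        # rather than assuming there is exactly one entry.
        for value in obj["values"]:
            rows.append((obj["bucket"], obj["key"], value["data"]))
    return rows
```

Applied to the response above, this should return the two rows for hb/second and hb/third with their stored text.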
Another interesting query is “follow only links that are tagged foo.” For that, just add a tag field to the link phase spec:
$ curl -X POST -H "content-type:application/json" \
    http://localhost:8098/mapred --data @-
{"inputs":[["hb","first"]],"query":[{"link":{"tag":"foo"}},{"map":{"language":"javascript","source":"function(v) { return [v]; }"}}]}
^D
Here you should get a JSON array with one element: a JSON encoding of the hb/second object. The link to the hb/third object was tagged bar, so that link was not followed. The equivalent URL syntax is:
$ curl http://localhost:8098/riak/hb/first/_,foo,_
It's also possible to filter links by bucket by adding a bucket field to the link phase spec, or by replacing the first underscore with a bucket name in the URL format. But all of our example links point to the same bucket, so hb is the only interesting setting here.
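The two notations carry the same bucket and tag filters, which a short sketch can make concrete. The helper names `link_phase`, `walk_segment`, and `walk_url` are hypothetical. The third field of each URL segment is always written as an underscore here because that is how every URL in this post writes it.

```python
import json


def link_phase(bucket=None, tag=None):
    """Build a map/reduce link phase; None means no filter on that field."""
    spec = {}
    if bucket is not None:
        spec["bucket"] = bucket
    if tag is not None:
        spec["tag"] = tag
    return {"link": spec}


def walk_segment(bucket=None, tag=None):
    """Build the matching URL segment; an underscore is the wildcard."""
    return "%s,%s,_" % (bucket or "_", tag or "_")


def walk_url(object_url, steps):
    """Append one segment per (bucket, tag) step to an object URL."""
    return object_url + "".join("/" + walk_segment(b, t) for b, t in steps)


print(json.dumps(link_phase(tag="foo")))
print(walk_url("http://localhost:8098/riak/hb/first", [(None, "foo")]))
```

Passing several steps to `walk_url` produces one URL segment per step.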
Link phases may also be chained together (or put after other phases if those phases produce bucket/key lists). For example, we could follow the links all the way from hb/first to hb/fourth with:
$ curl -X POST -H "content-type:application/json" \
    http://localhost:8098/mapred --data @-
{"inputs":[["hb","first"]],"query":[{"link":{}},{"link":{}},{"map":{"language":"javascript","source":"function(v) { return [v]; }"}}]}
^D
(Notice the added link phase.) If you run that, you'll find that you get two copies of the hb/fourth object in the response. This is because we didn't bother deduplicating the results of the link extraction, and both hb/second and hb/third link to hb/fourth. A reduce phase that removes the duplicates is fairly easy to add:
$ curl -X POST -H "content-type:application/json" \
    http://localhost:8098/mapred --data @-
{"inputs":[["hb","first"]],"query":[{"link":{}},{"link":{}},{"reduce":{"language":"erlang","module":"riak_mapreduce","function":"reduce_set_union"}},{"map":{"language":"javascript","source":"function(v) { return [v]; }"}}]}
^D
The resource handling the URL link-walking format does just this:
$ curl http://localhost:8098/riak/hb/first/_,_,_/_,_,_
That should get you just one copy of the hb/fourth object.
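To see why the duplicate appears and why a set-style reduce removes it, here is a toy in-memory model of the four example objects. It is only an illustration and says nothing about how Riak executes link phases. `follow_links` and `unique` are hypothetical names, and `unique` merely plays the role that reduce_set_union plays in the query above; it is not that function's implementation.

```python
# Links stored by the four PUT requests: source -> [(bucket, key, tag)].
LINKS = {
    ("hb", "first"): [("hb", "second", "foo"), ("hb", "third", "bar")],
    ("hb", "second"): [("hb", "fourth", "foo")],
    ("hb", "third"): [("hb", "fourth", "foo")],
    ("hb", "fourth"): [],
}


def follow_links(inputs, tag=None):
    """Follow matching links from every input; duplicates are kept."""
    out = []
    for source in inputs:
        for bucket, key, link_tag in LINKS.get(source, []):
            if tag is None or link_tag == tag:
                out.append((bucket, key))
    return out


def unique(pairs):
    """Drop repeated bucket/key pairs, keeping first-seen order."""
    seen = set()
    result = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


two_hops = follow_links(follow_links([("hb", "first")]))
print(two_hops)          # [('hb', 'fourth'), ('hb', 'fourth')]
print(unique(two_hops))  # [('hb', 'fourth')]
```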
So why choose one over the other? URL syntax is much simpler and more compact if you're starting from just one object, just want the objects at the ends of the links, and can handle multipart/mixed encoding. Map/reduce with link phases should be your choice if you want to start from multiple objects at once, want some processed or aggregated form of the objects, or want the result to be JSON-encoded.
Riak version 0.8 note: In Riak 0.8, the result of 'link' map/reduce phases could not be transformed into JSON. This meant that it was not possible to put a JavaScript reduce phase right after a link phase, nor to end an HTTP map/reduce query with a link phase. Those issues have been resolved in the tip of the source repository, and the fixes will be part of the 0.9 release.