MARCH 10, 2011
In my previous article, I took a superficial look at how Redis starts up and prepares itself to process commands. In this article, I’ll follow a
GET and a
SET command as they move from client through the server and back. The
GET will be for a key that doesn’t exist, and the
SET will set that key. Then I’ll look quickly at a subsequent
GET and how it differs.
As before, I’m exploring Redis with the source code open in my editor and indexed with a
tags file, and the Redis server running under
gdb in another terminal.
Caveat: this article was written against the codebase of Redis 2.2.1. With respect to my previous article, for a list of what has changed in Redis since I wrote it, see this comment on HN.
Edit: I made two minor changes based on feedback—Redis keys are not Redis objects, they are
sds strings; and you don’t have to hack the Makefile to compile without optimizations.
Let’s execute the command
get users:1234 on Redis.
If you inspect certain variables under
gdb, you might not get what you want. Instead you see a message like “.” This is because the compiler has optimized the machine code in such a fashion that the portion of memory you wanted to look at that was never used, at least for the variable under inspection. To make stepping through the code and inspecting a little easier, we make sure that Redis is not compiled with optimizations. You can do this by either building Redis with the following invocation:
or by setting an environment variable:
If we look at the
repl loop of redis-cli, we see that it uses the
linenoise library to split the arguments of the command. It dispatches on the first argument. The loop checks for client commands that aren’t processed as a command by the Redis server, like
clear (to clear the screen), and
connect (which is a way to connect to a specified Redis server on host port, instead of the default host of 127.0.0.1 and port of 6379.
Any other argument is considered the name of a command to send to the server. The remaining arguments are the arugments for that command.
We’re trying to get the on-the-wire representation of the
get users:1234 command. redis-cli uses a
redisContext struct to encapsulate the state of the connection to the server, as well as the output buffer where the command in the form of the Redis protocol is written for sending to the server. We know from reading the source of hiredis (the Redis C client library that the redis-cli program is built on) that the
obuf field is where the raw command is stored:
If we set a breakpoint on
cliReadReply, we can inspect the output buffer and see exactly how the command looks as a bytestring bound for the server.
client $ gdb src/redis-cli
We see that the Redis command
get users:1234 as entered in our client is translated to
*2\r\n$3\r\nget\r\n$10\r\nusers:1234\r\n. Any Redis client is going to convert our command expressed in its respective syntax to the same on-the-wire format. So in Python:
will send the same
*2\r\n$3\r\nget\r\n$10\r\nusers:1234\r\n bytestring to the server.
Let’s print that bytestring to screen and render those `\r\n’s as line feeds so we can see an expanded view and get a better look at the protocol.
The first bit is
*2, which tells us that the arity of the command, including the command name, is 2. That is, the server should expect two arguments to follow.
The next bit is
$3, which means that the length of the first argument in bytes is 3. The argument itself follows, our command name,
The next bit after that is
$10, so the length in bytes of the second argument is 10. Our one and only argument to the command is next,
users:1234, the key we are trying to
If you remember from the last article,
readQueryFromClient is a good place to set a breakpoint in your debugger on the server side for inspecting an inbound client command.
server $ gdb src/redis-server
Now back in the terminal with the client running in the debugger, continue to send the command to the server, which will stop at the breakpoint we set.
Let’s step to the following line:
If we step to here in our debugger, we see that the server read 30 bytes. If you count the number of bytes in our Redis protocol-encoded command,
*2\r\n$3\r\nget\r\n$10\r\nusers:1234\r\n, you’ll see it’s 30. Just for good measure, let’s look at the 30 bytes beginning at the memory location pointed to by
(gdb) p nread
And we can see our whole command bytestring is there, byte by byte.
The server has now read in the entirety of our command in one step. (Because we had a relatively short command, one that fits inside a kernel buffer, and we are the only client on a loopback network device, this is the case, but it need not be. Since Redis is event-driven, this function,
readQueryFromClient, is called whenever there are bytes from buffers to be read. If our command was particularly long, or there was a lot of network contention, the command may take more than one I/O event before it is fully read. For this reason, Redis builds up a buffer per client and appends bytes to it on each call to this function. It only proceeds with processing the command when it has been fully read. But we don’t need to consider this in our simple example, so we will proceed.)
We’re going to elide the processing of the input buffer. This is the point where the server takes the Redis protocol-encoded bytestring of our request and unpacks it into arguments on the client struct object. If you are interested in the details of that parsing, examine the function
networking.c. All we are interested in at this point is that the
argc member of the client object is the number of command arguments (counting the command name itself) and
argv is a pointer to the list of arguments.
The bit of code we care at this point is
processCommand. The first thing the server does is look up the command in its command table (see “Setting up command table” in the previous article, but note that this lookup is now O(1), see the HN thread linked above). Assuming the command is found (which our
get will be), the server will double-check that the arity of the command as defined in the command table matches the number of arguments received from the client (
Skip down to the end of
processCommand. Because our humble
get is not a “multi” command like
mset, etc., it doesn’t require queue-like processing of the underlying multiple commands, so we go right to
call, which is where our command is dispatched.
Let’s focus on line 952,
cmd->proc(c);. This is Redis’s dynamic dispatching of command function calling. Redis makes this clean and simple by encapsulating commands and giving all the actual underlying command functions the same function signature, taking our client object, which carries the payload of our command’s arguments. So we’re interested in looking into the details of the Redis command object and the actual function that will handle our
If we pop up to the top of
redis.c, we see the definition of the Redis command table, and our
get is the first entry.
getCommand is the function that does the actual work for our command. It’s a thin wrapper for
The arguments to
lookupKeyReadOrReply are the client object, the key
users:1234 we’re trying to look up, and an object,
shared.nullbulk that will be the default reply to the client if the key is not found.
lookupKeyRead is a thin wrapper for
lookupKey that handles removing keys that have been set to expire.
Now we get to the heart of the
get command – looking up the key in the database.
Redis uses its own hash table implementation to store keys and their values in memory. Inside the
db object, the field
dict is a pointer to the hash value for the current Redis database (remember there can be up to 16 separate databases in a single Redis server instance, indexed by number).
First, Redis calls
dictFind with the database’s hash table and a pointer to the key’s bytestring.
dictFind looks up the hash of the key in the table, using a standard algorithm that should be familiar to anyone who’s implemented a hash table (check out
dict.c starting at line 391 if you’re interested, the table is an array with linked lists for colliding hashes).
If the key is found in the table,
dictFind returns a pointer to the entry in the table. Otherwise, it returns
NULL. Back in
lookupKey, if the entry is not null, Redis retrieves the value (i.e., the Redis object our key references) from the hash table via
dictGetEntryVal and takes care of a bit of bookkeeping for expiry and VM, if the key was found, and stats in either case (hits and misses). If the entry was
lookupKey also returns
NULL; we’ll see how this is handled by Redis for a reply to the client when the key is not found, which is the case for us at this stage.
With the value of
lookupKey, we’ll go back up the stack to our callers. Back to
lookupKeyReadOrReply, we look at line 60:
Since we got
lookupKey this time, we call
addReply. The value of
reply here comes from the call in
getGenericCommand, and it is
shared.nullbulk. This field in the global struct object
shared is initialize thusly:
We can see that it is a Redis string object who’s on-the-wire value is
$-1\r\n, meaning a length of -1, Redis’s way of indicating null to a client, according to the protocol.
addReply builds the reply to the client. It does this by first setting up a write event on the main event loop listener with
_installWriteEvent. This makes sure that the reply is written out to the client connection when there are bytes present in the buffer. Next, Redis adds the reply to the client’s buffer. If the reply object were an non-string value like an integer, or a list, or a set, Redis would first decode it to a bytestring that can be serialized to, for example, on a socket. Redis string objects are encoded “raw,” or as-is. The
nullbulk object is technically a string object, so no decoding is necessary in our case. In any case, the reply bytestring is copied to the client’s reply buffer with
_addReplyToBuffer, which for all intents and purposes completes the execution of our
get command on the server.
The client will read the on-the-wire reply of
$-1\r\n and know that it is a string reply of length -1, and therefore is the null (or “nil,” in the context of
redis-cli) object, and to convert that into the appropriate object for the language of the client. Back to our
redis-cli client patiently waiting for a reply from our breakpointed server, which we continue from, that looks like:
set command proceeds much the same way as the
get, up to the point of command dispatching on the server.
This time, our protocol-encoded bytestring is 47 bytes long, owing to the extra argument “Paul Smith” and the length tag it requires. Also notice the leading
*3 indicates there are three arguments:
Let’s skip ahead now to the point in
redis.c, after the command has been looked-up in the command table and the server is about ready to call the underlying
proc function with the client object argument. The
set command, in the form of a
redisCommand struct, looks like this:
Notice that the arity of the command is 3, which includes the leading command name, plus key and value, and matches what we expect from the client. The
set command has a flag that
get did not: the constant
REDIS_CMD_DENYOOM means that, in out-of-memory situations where Redis can’t allocate any more memory, the execution of the command should be denied. (The absence of this flag means that Redis can continue to serve client “read” requests like
get even when the server can no longer write any new data.)
I set a breakpoint on
setCommand and let the server continue running until that point:
Incidentally, you can inspect the values of the client’s command arguments at any time, with a simple
gdb invocation. The arguments are of type
robj, which has a field
ptr that is a pointer to the actual value in memory. Since in our
set case these are strings, we can inspect them by typecasting to
char * like so:
(gdb) p (char *)c->argv->ptr
The first thing the server does in
setCommand is encode the value being set with
tryObjectEncoding. It will try to create an efficient encoding if the bytestring can be interpreted as an integer, for example. This can save space especially in the case where many numbers are being stored. Additionally, Redis will try to reuse shared integers as values instead of allocating resources for new ones – see the previous article for more on the creation and use of shared integers.
Once the value being set has been encoded,
setGenericCommand is called (
setGenericCommand with the
setex commands). From here,
dbAdd is called, with the client, key, and value as arguments.
dbAdd will only add the value to the database’s hash table if the key does not already exist. In our case, since the key
users:1234 does not exist, the value of
dictFind is null, and the function proceeds to add the value with
dictAdd takes the dictionary hash table, key, and value as arguments. It uses
_dictKeyIndex to find the index of a free slot in the hash table for our new entry. See the implementation in
dict.h for the details of key hashing, and the structure of the dictionary and its component hash tables (each Redis dictionary contains two hash tables in order to provide incremental rehashing as the dictionary grows).
dictAdd allocates memory for the new entry and stores it in the new index.
The server returns back up the stack from
setGenericCommand, where it increments the reference count on our new value. Redis uses reference counting in order to free memory used by values that have been deleted or have expired. It then “touches” the key so that if any clients are
watching the key, the next
exec command will fail. It also increments the server’s
dirty flag, which it uses to determine when to write out the dump file to disk. Finally, it writes out the reply to the client, which is a shared object,
shared.ok. This is special Redis string object in the protocol that consists of the bytestring “+OK\r\n”. Clients will typically convert this into the equivalent “true” value for their language.
Our key is now set, so we can try the
get users:1234 command again and see how it differs for a found key.
The point where a
get on an existing key and a
get on a non-existent key differ is line 11, where the
de entry in the database’s dictionary is found.
dictGetEntryVal is a simple macro for accessing the field in the
de struct that carries the value associated with the key. Redis updates its statistics to indicate a key hit and returns the value object.
Again, as with the key miss from above (remember the null value is a Redis object, too), the value is decoded into the Redis bytestring protocol. This is the response to the client, and we have concluded our
- The All-Blue-Collar League
- A partial list of metaphors, visual and otherwise, anti-skeuomorphism zealots have to tackle to reach utopia
Paul Smith is a software engineer. More …
ⓒ Paul Smith