Compare commits

...

27 Commits

Author SHA1 Message Date
antirez
910b6d34f2 Redis 2.9.50 (Redis 3.0.0 beta-1) 2014-02-11 10:47:54 +01:00
antirez
0725988a07 Cluster: clusterDelNode(): remove node from master's slaves. 2014-02-11 10:34:14 +01:00
antirez
4513d8fcd4 Cluster: UPDATE messages are the norm and verbose.
Logging them at WARNING level was of little use and certainly disruptive.
2014-02-11 10:22:05 +01:00
antirez
9721255178 Cluster: redis-trib fix: handling of another trivial case. 2014-02-11 10:22:02 +01:00
antirez
6d550f2de4 Cluster: configEpoch assignment in SETNODE improved.
Avoid trashing a configEpoch for every slot migrated if this node
already has the max configEpoch across the cluster.

There is still work to do in this area, but this avoids both ending up with
a very high configEpoch for no reason and flooding the system with fsyncs.
2014-02-11 10:21:58 +01:00
antirez
585e9fb886 Cluster: clusterSetStartupEpoch() made more generally useful.
The actual goal of the function was to get the max configEpoch found in
the cluster, so make it general by removing the assignment of the max
epoch to currentEpoch that is useful only at startup.
2014-02-11 10:21:55 +01:00
antirez
8b5196addf Cluster: always increment the configEpoch in SETNODE after import.
Removed a stale conditional preventing the configEpoch from incrementing
after the import in certain conditions. Since the master got a new slot
it should always claim a new configuration.
2014-02-11 10:21:52 +01:00
antirez
2e3f6b0fb3 Cluster: on resharding upgrade version of receiving node.
The node receiving the hash slot needs to have a version that wins over
the other versions in order to force the ownership of the slot.

However the current code is far from perfect since a failover can happen
during the manual resharding. The fix is a work in progress, but the
bottom line is that the new version must either be voted as usual,
set by redis-trib manually after it makes sure it can't be used by other
nodes, or taken from reserved configEpochs used only for manual operations
(for example, odd versions could never be used by slaves and always be
used by CLUSTER SETSLOT NODE).
2014-02-11 00:39:24 +01:00
antirez
a221ae5ce2 Cluster: fsync at every SETSLOT command puts too much pressure on disks.
During slot migration redis-trib can send a number of SETSLOT commands.
Fsyncing every time is a bit too much in production, as verified
empirically.

To make sure configs are fsynced on all nodes after a resharding,
redis-trib may send something like CLUSTER CONFSYNC.

In this case fsyncs were not providing much value anyway, since
processes can crash in the middle of the resharding of a hash slot, and
redis-trib should be able to recover from this condition.
2014-02-11 00:39:20 +01:00
antirez
77c6fa65f1 Cluster: conditions to clear "migrating" on slot for SETSLOT ... NODE changed.
If the slot is manually assigned to another node, clear the migrating
status regardless of whether it was previously assigned to us or not,
as long as we no longer have keys for this slot.

This avoids a race during slot migration that may leave the slot in
migrating status in the source node, since it received an update message
from the destination node that is already claiming the slot.

This way we are sure that redis-trib at the end of the slot migration is
always able to close the slot correctly.
2014-02-11 00:39:14 +01:00
antirez
f406a09495 Cluster: remove debugging xputs from redis-trib. 2014-02-11 00:39:11 +01:00
antirez
01de246843 Cluster: redis-trib fix: cover new case of open slot.
The case is the trivial one: a single node claiming the slot as
migrating, with no node claiming it as importing.
2014-02-11 00:39:08 +01:00
antirez
8f25428772 redis-trib: log event after we have reference to 'master'. 2014-02-11 00:39:05 +01:00
antirez
cc97305ec3 Cluster: don't update slave's master if we don't know it.
There is no way we can update the slave's node->slaveof pointer if we
don't know the master (no node with such an ID in our tables).
2014-02-11 00:39:02 +01:00
antirez
fa6f4f21c3 Cluster: ignore slot config changes if we are importing it. 2014-02-11 00:38:59 +01:00
antirez
30214fff3e Cluster: update configEpoch after manually messing with slots. 2014-02-11 00:38:56 +01:00
antirez
cfcb09bf76 Cluster: redis-trib, more info about open slots error. 2014-02-11 00:38:53 +01:00
antirez
8e12fae05e Cluster: fixed inverted arguments in logging function call. 2014-02-10 17:21:17 +01:00
antirez
6a01545744 Cluster: clear the FAIL status for masters without slots.
Masters without slots don't participate in the cluster but just do
redirections, so there is no need to keep them in FAIL state once they
are reachable again.
2014-02-10 17:19:16 +01:00
antirez
969a4f1db3 Cluster: replica migration should only work for masters serving slots. 2014-02-10 17:08:47 +01:00
antirez
cb92a1ef08 Cluster: redis-trib del-node variable typo fixed. 2014-02-10 16:59:17 +01:00
antirez
6987a95952 Cluster: clusterReadHandler() fixed to work with new message header. 2014-02-10 16:28:44 +01:00
antirez
ad85f520f8 Cluster: don't propagate PUBLISH two times.
When cluster was enabled, PUBLISH propagated messages both via the Cluster
bus and via replication, resulting in duplicated messages in the slave.
2014-02-10 16:05:26 +01:00
antirez
b82b66b51d Cluster: signature changed to "RCmb" (Redis Cluster message bus).
Sounds better after all.
2014-02-10 16:05:22 +01:00
antirez
b6e04f5584 Cluster: discard bus messages with version != 0. 2014-02-10 16:05:18 +01:00
antirez
0ee1a78c86 Cluster: added signature + version in bus packets. 2014-02-10 16:05:15 +01:00
antirez
ff6a75a0c1 3.0 release notes added. 2014-02-10 15:36:02 +01:00
6 changed files with 119 additions and 114 deletions


@@ -1,83 +1,45 @@
-Redis 2.6 release notes
+Redis 3.0 release notes
=======================
-Migrating from 2.4 to 2.6
+WARNING: Redis 3.0 is currently a BETA not suitable for production environments.
+--------------------------------------------------------------------------------
+Upgrade urgency levels:
+LOW: No need to upgrade unless there are new features you want to use.
+MODERATE: Schedule an upgrade of the server, but it's not urgent.
+HIGH: There is a critical bug that may affect a subset of users. Upgrade!
+CRITICAL: There is a critical bug affecting MOST USERS. Upgrade ASAP.
+--------------------------------------------------------------------------------
+--[ Redis 3.0.0 Beta 1 (version 2.9.50) ] Release date: 11 Feb 2014
+This is the first beta of Redis 3.0.0 (official version is 2.9.50).
+The following is a list of improvements in Redis 3.0, compared to Redis 2.8.
+* [NEW] Redis Cluster: a distributed implementation of a subset of Redis.
+* [NEW] New "embedded string" object encoding resulting in fewer cache
+misses. Big speed gain under certain workloads.
+* [NEW] WAIT command to block waiting for a write to be transmitted to
+the specified number of slaves.
+* [NEW] MIGRATE connection caching. Much faster key migrations.
+* [NEW] MIGRATE new options COPY and REPLACE.
+* [NEW] CLIENT PAUSE command: stop processing client requests for a
+specified amount of time.
+Migrating from 2.8 to 3.0
=========================
-Redis 2.4 is mostly a strict subset of 2.6. However there are a few things
-that you should be aware of:
-* You can't use .rdb and AOF files generated with 2.6 into a 2.4 instance.
-* 2.6 slaves can be attached to 2.4 masters, but not the contrary, and only
-for the time needed to perform the version upgrade.
-There are also a few API differences, that are unlikely to cause problems,
-but it is better to keep them in mind:
-* SORT now will refuse to sort in numerical mode elements that can't be parsed
-as numbers.
-* EXPIREs now all have millisecond resolution (but this is very unlikely to
-break code that was not conceived exploiting the previous resolution error
-in some way.)
-* INFO output is a bit different now, and contains empty lines and comments
-starting with '#'. All the major clients should be already fixed to work
-with the new INFO format.
-Also the following redis.conf and CONFIG GET / SET parameters changed name:
-* hash-max-zipmap-entries, now replaced by hash-max-ziplist-entries
-* hash-max-zipmap-value, now replaced by hash-max-ziplist-value
-* glueoutputbuf was now completely removed as it does not make sense
----------
-CHANGELOG
----------
-What's new in Redis 2.6.0
-=========================
-UPGRADE URGENCY: We suggest new users to start with 2.6.0, and old users to
-upgrade after some testing of the application with the new
-Redis version.
-* Server side Lua scripting, see http://redis.io/commands/eval
-* Virtual Memory removed (was deprecated in 2.4)
-* Hardcoded limits about max number of clients removed.
-* AOF low level semantics is generally more sane, and especially when used
-in slaves.
-* Milliseconds resolution expires, also added new commands with milliseconds
-precision (PEXPIRE, PTTL, ...).
-* Clients max output buffer soft and hard limits. You can specify different
-limits for different classes of clients (normal,pubsub,slave).
-* AOF is now able to rewrite aggregate data types using variadic commands,
-often producing an AOF that is faster to save, load, and is smaller in size.
-* Every redis.conf directive is now accepted as a command line option for the
-redis-server binary, with the same name and number of arguments.
-* Hash table seed randomization for protection against collisions attacks.
-* Performances improved when writing large objects to Redis.
-* Significant parts of the core refactored or rewritten. New internal APIs
-and core changes allowed to develop Redis Cluster on top of the new code,
-however for 2.6 all the cluster code was removed, and will be released with
-Redis 3.0 when it is more complete and stable.
-* Redis ASCII art logo added at startup.
-* Crash report on memory violation or failed asserts improved significantly
-to make debugging of hard to catch bugs simpler.
-* redis-benchmark improvements: ability to run selected tests,
-CSV output, faster, better help.
-* redis-cli improvements: --eval for comfortable development of Lua scripts.
-* SHUTDOWN now supports two optional arguments: "SAVE" and "NOSAVE".
-* INFO output split into sections, the command is now able to just show
-specific sections.
-* New statistics about how many times a command was called, and how much
-execution time it used (INFO commandstats).
-* More predictable SORT behavior in edge cases.
-* INCRBYFLOAT and HINCRBYFLOAT commands.
+Redis 3.0 is mostly a strict superset of 2.8, so you should not have any problem
+upgrading your application from 2.8 to 3.0.
--------------------------------------------------------------------------------
-Credits: Where not specified the implementation and design are done by
-Salvatore Sanfilippo and Pieter Noordhuis. Thanks to VMware for making all
-this possible. Also many thanks to all the other contributors and the amazing
-community we have.
+Credits: Where not specified the implementation and design is done by
+Salvatore Sanfilippo. Thanks to Pivotal for making all this possible.
+Also many thanks to all the other contributors and the amazing community
+we have.
See commit messages for more credits.


@@ -73,21 +73,19 @@ void resetManualFailover(void);
* Initialization
* -------------------------------------------------------------------------- */
-/* This function is called at startup in order to set the currentEpoch
-* (which is not saved on permanent storage) to the greatest configEpoch found
-* in the loaded nodes (configEpoch is stored on permanent storage as soon as
-* it changes for some node). */
-void clusterSetStartupEpoch() {
+/* Return the greatest configEpoch found in the cluster. */
+uint64_t clusterGetMaxEpoch(void) {
+uint64_t max = 0;
dictIterator *di;
dictEntry *de;
di = dictGetSafeIterator(server.cluster->nodes);
while((de = dictNext(di)) != NULL) {
clusterNode *node = dictGetVal(de);
-if (node->configEpoch > server.cluster->currentEpoch)
-server.cluster->currentEpoch = node->configEpoch;
+if (node->configEpoch > max) max = node->configEpoch;
}
dictReleaseIterator(di);
+return max;
}
int clusterLoadConfig(char *filename) {
@@ -227,7 +225,10 @@ int clusterLoadConfig(char *filename) {
/* Config sanity check */
redisAssert(server.cluster->myself != NULL);
redisLog(REDIS_NOTICE,"Node configuration loaded, I'm %.40s", myself->name);
-clusterSetStartupEpoch();
+/* Set the currentEpoch to the max epoch found in the master.
+* FIXME: this should actually be part of the persistent state, as
+* documented in the Github issue #1479. */
+server.cluster->currentEpoch = clusterGetMaxEpoch();
return REDIS_OK;
fmterr:
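After this change clusterGetMaxEpoch() only computes a maximum, and the caller decides what to do with it. The scan can be sketched as follows, assuming a plain array of a hypothetical `node_t` record instead of the `server.cluster->nodes` dictionary; it illustrates the logic and is not the Redis code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node record: only the field this sketch needs. */
typedef struct {
    uint64_t config_epoch;
} node_t;

/* Return the greatest config epoch among `count` nodes, or 0 if there
 * are none (same idea as clusterGetMaxEpoch(), minus the dict walk). */
uint64_t get_max_epoch(const node_t *nodes, size_t count) {
    assert(nodes != NULL || count == 0);
    uint64_t max = 0;
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].config_epoch > max) max = nodes[i].config_epoch;
    }
    return max;
}
```

At startup the caller assigns the result to its current epoch, which is what the new clusterLoadConfig() hunk does with `server.cluster->currentEpoch`; keeping the function side-effect free is what lets SETSLOT ... NODE reuse it.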
@@ -663,7 +664,11 @@ void clusterDelNode(clusterNode *delnode) {
}
dictReleaseIterator(di);
-/* 3) Free the node, unlinking it from the cluster. */
+/* 3) Remove this node from its master's slaves if needed. */
+if (nodeIsSlave(delnode) && delnode->slaveof)
+clusterNodeRemoveSlave(delnode->slaveof,delnode);
+/* 4) Free the node, unlinking it from the cluster. */
freeClusterNode(delnode);
}
@@ -830,10 +835,11 @@ void clearNodeFailureIfNeeded(clusterNode *node) {
/* For slaves we always clear the FAIL flag if we can contact the
* node again. */
-if (nodeIsSlave(node)) {
+if (nodeIsSlave(node) || node->numslots == 0) {
redisLog(REDIS_NOTICE,
-"Clear FAIL state for node %.40s: slave is reachable again.",
-node->name);
+"Clear FAIL state for node %.40s: %s is reachable again.",
+node->name,
+nodeIsSlave(node) ? "slave" : "master without slots");
node->flags &= ~REDIS_NODE_FAIL;
clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|CLUSTER_TODO_SAVE_CONFIG);
}
@@ -1111,11 +1117,12 @@ void clusterUpdateSlotsConfigWith(clusterNode *sender, uint64_t senderConfigEpoc
if (bitmapTestBit(slots,j)) {
/* We rebind the slot to the new node claiming it if:
* 1) The slot was unassigned.
-* 2) The new node claims it with a greater configEpoch. */
-if (server.cluster->slots[j] == sender) continue;
+* 2) The new node claims it with a greater configEpoch.
+* 3) We are not currently importing the slot. */
+if (server.cluster->slots[j] == sender ||
+server.cluster->importing_slots_from[j]) continue;
if (server.cluster->slots[j] == NULL ||
-server.cluster->slots[j]->configEpoch <
-senderConfigEpoch)
+server.cluster->slots[j]->configEpoch < senderConfigEpoch)
{
if (server.cluster->slots[j] == curmaster)
newmaster = sender;
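The rebinding rule in clusterUpdateSlotsConfigWith() now has three parts. A minimal sketch of the per-slot decision as a pure function, assuming a hypothetical `node_t` that carries only a config epoch, and a boolean standing in for `importing_slots_from[j]` being set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node record for this sketch. */
typedef struct {
    uint64_t config_epoch;
} node_t;

/* Decide whether a slot currently bound to `owner` (NULL if unassigned)
 * should be rebound to `sender`, which claims it with `sender_epoch`. */
bool should_rebind_slot(const node_t *owner, const node_t *sender,
                        uint64_t sender_epoch, bool importing) {
    assert(sender != NULL);
    /* Nothing to do if the sender already owns the slot, and leave a slot
     * we are importing alone: the migration in progress owns that state. */
    if (owner == sender || importing) return false;
    /* Rebind if the slot is unassigned or the claim is newer. */
    return owner == NULL || owner->config_epoch < sender_epoch;
}
```

Checking the importing flag before comparing epochs mirrors the order of the `continue` in the hunk: an importing slot is skipped even when the sender's epoch is greater.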
@@ -1166,7 +1173,8 @@ int clusterProcessPacket(clusterLink *link) {
type, (unsigned long) totlen);
/* Perform sanity checks */
-if (totlen < 8) return 1;
+if (totlen < 16) return 1; /* At least signature, version, totlen, count. */
+if (ntohs(hdr->ver) != 0) return 1; /* Can't handle versions other than 0. */
if (totlen > sdslen(link->rcvbuf)) return 1;
if (type == CLUSTERMSG_TYPE_PING || type == CLUSTERMSG_TYPE_PONG ||
type == CLUSTERMSG_TYPE_MEET)
@@ -1360,7 +1368,7 @@ int clusterProcessPacket(clusterLink *link) {
}
/* Master node changed for this slave? */
-if (sender->slaveof != master) {
+if (master && sender->slaveof != master) {
if (sender->slaveof)
clusterNodeRemoveSlave(sender->slaveof,sender);
clusterNodeAddSlave(master,sender);
@@ -1426,7 +1434,7 @@ int clusterProcessPacket(clusterLink *link) {
if (server.cluster->slots[j]->configEpoch >
senderConfigEpoch)
{
-redisLog(REDIS_WARNING,
+redisLog(REDIS_VERBOSE,
"Node %.40s has old slots configuration, sending "
"an UPDATE message about %.40s",
sender->name, server.cluster->slots[j]->name);
@@ -1579,18 +1587,22 @@ void clusterReadHandler(aeEventLoop *el, int fd, void *privdata, int mask) {
while(1) { /* Read as long as there is data to read. */
rcvbuflen = sdslen(link->rcvbuf);
-if (rcvbuflen < 4) {
-/* First, obtain the first four bytes to get the full message
+if (rcvbuflen < 8) {
+/* First, obtain the first 8 bytes to get the full message
* length. */
-readlen = 4 - rcvbuflen;
+readlen = 8 - rcvbuflen;
} else {
/* Finally read the full message. */
hdr = (clusterMsg*) link->rcvbuf;
-if (rcvbuflen == 4) {
-/* Perform some sanity check on the message length. */
-if (ntohl(hdr->totlen) < CLUSTERMSG_MIN_LEN) {
+if (rcvbuflen == 8) {
+/* Perform some sanity check on the message signature
+* and length. */
+if (memcmp(hdr->sig,"RCmb",4) != 0 ||
+ntohl(hdr->totlen) < CLUSTERMSG_MIN_LEN)
+{
redisLog(REDIS_WARNING,
-"Bad message length received from Cluster bus.");
+"Bad message length or signature received "
+"from Cluster bus.");
handleLinkIOError(link);
return;
}
@@ -1616,7 +1628,7 @@ void clusterReadHandler(aeEventLoop *el, int fd, void *privdata, int mask) {
}
/* Total length obtained? Process this packet. */
-if (rcvbuflen >= 4 && rcvbuflen == ntohl(hdr->totlen)) {
+if (rcvbuflen >= 8 && rcvbuflen == ntohl(hdr->totlen)) {
if (clusterProcessPacket(link)) {
sdsfree(link->rcvbuf);
link->rcvbuf = sdsempty();
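With the 8-byte prefix (4-byte signature plus 4-byte `totlen`), clusterReadHandler() reads in two phases: first 8 bytes, then the rest of the message once the signature and length look sane. The sketch below expresses the same checks as a stateless function over the bytes received so far. `frame_check()`, `enum frame_state` and the `min_len` parameter (standing in for `CLUSTERMSG_MIN_LEN`, whose value this diff does not show) are hypothetical; unlike the handler above, it revalidates the prefix on every call rather than only when exactly 8 bytes are buffered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum frame_state { FRAME_NEED_MORE, FRAME_COMPLETE, FRAME_BAD };

/* Decode a 32-bit big-endian (network order) integer, like ntohl(). */
static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Inspect the `len` bytes received so far on a bus link. On
 * FRAME_NEED_MORE, *want is set to how many more bytes to read. */
enum frame_state frame_check(const unsigned char *buf, size_t len,
                             uint32_t min_len, size_t *want) {
    assert(want != NULL);
    if (len < 8) {                 /* Need signature + totlen first. */
        *want = 8 - len;
        return FRAME_NEED_MORE;
    }
    uint32_t totlen = read_be32(buf + 4);
    if (memcmp(buf, "RCmb", 4) != 0 || totlen < min_len) return FRAME_BAD;
    if (len < totlen) {            /* Header is sane: read the rest. */
        *want = totlen - len;
        return FRAME_NEED_MORE;
    }
    /* The handler never reads past totlen, so anything longer is bogus. */
    return len == totlen ? FRAME_COMPLETE : FRAME_BAD;
}
```

Validating the signature before trusting `totlen` means a stray connection speaking another protocol is dropped after 8 bytes instead of making the node buffer an arbitrary claimed length.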
@@ -1677,6 +1689,10 @@ void clusterBuildMessageHdr(clusterMsg *hdr, int type) {
myself->slaveof : myself;
memset(hdr,0,sizeof(*hdr));
+hdr->sig[0] = 'R';
+hdr->sig[1] = 'C';
+hdr->sig[2] = 'm';
+hdr->sig[3] = 'b';
hdr->type = htons(type);
memcpy(hdr->sender,myself->name,REDIS_CLUSTER_NAMELEN);
@@ -2255,7 +2271,7 @@ void clusterHandleSlaveMigration(int max_slaves) {
if (nodeIsSlave(node) || nodeFailed(node)) continue;
okslaves = clusterCountNonFailingSlaves(node);
-if (okslaves == 0 && target == NULL) target = node;
+if (okslaves == 0 && target == NULL && node->numslots > 0) target = node;
if (okslaves == max_slaves) {
for (j = 0; j < node->numslaves; j++) {
if (memcmp(node->slaves[j]->name,
@@ -2478,7 +2494,7 @@ void clusterCron(void) {
if (nodeIsSlave(myself) && nodeIsMaster(node) && !nodeFailed(node)) {
int okslaves = clusterCountNonFailingSlaves(node);
-if (okslaves == 0) orphaned_masters++;
+if (okslaves == 0 && node->numslots > 0) orphaned_masters++;
if (okslaves > max_slaves) max_slaves = okslaves;
if (nodeIsSlave(myself) && myself->slaveof == node)
this_slaves = okslaves;
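Both replica-migration hunks now ignore masters that serve no slots: such a master only redirects clients, so it is neither counted as orphaned nor picked as a migration target. A sketch of that selection over a hypothetical `master_info_t` summary (the real code walks `server.cluster->nodes` and calls clusterCountNonFailingSlaves()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a master as seen by a replica. */
typedef struct {
    bool failed;
    int ok_slaves;  /* non-failing replicas */
    int numslots;   /* hash slots served */
} master_info_t;

/* Orphaned means: reachable, serving slots, and without a working replica. */
static bool is_orphaned(const master_info_t *m) {
    return !m->failed && m->ok_slaves == 0 && m->numslots > 0;
}

/* Count orphaned masters into *orphaned and return the index of the
 * first one as a replica-migration target, or -1 if there is none. */
int find_migration_target(const master_info_t *m, size_t n, int *orphaned) {
    assert(orphaned != NULL);
    int target = -1;
    *orphaned = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_orphaned(&m[i])) continue;
        (*orphaned)++;
        if (target == -1) target = (int)i;
    }
    return target;
}
```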
@@ -3143,10 +3159,10 @@ void clusterCommand(redisClient *c) {
return;
}
}
-/* If this node was the slot owner and the slot was marked as
-* migrating, assigning the slot to another node will clear
+/* If this slot is in migrating status but we have no keys
+* for it, assigning the slot to another node will clear
* the migrating status. */
-if (server.cluster->slots[slot] == myself &&
+if (countKeysInSlot(slot) == 0 &&
server.cluster->migrating_slots_to[slot])
server.cluster->migrating_slots_to[slot] = NULL;
@@ -3154,14 +3170,30 @@ void clusterCommand(redisClient *c) {
* itself also clears the importing status. */
if (n == myself &&
server.cluster->importing_slots_from[slot])
+{
+/* This slot was manually migrated: set this node's configEpoch
+* to a new epoch so that the new version can be propagated
+* by the cluster.
+*
+* FIXME: the new version should be agreed upon, otherwise a race
+* is possible if the master is failed over by a slave while a
+* manual resharding is in progress. */
+uint64_t maxEpoch = clusterGetMaxEpoch();
+if (myself->configEpoch != maxEpoch) {
+server.cluster->currentEpoch++;
+myself->configEpoch = server.cluster->currentEpoch;
+clusterDoBeforeSleep(CLUSTER_TODO_FSYNC_CONFIG);
+}
server.cluster->importing_slots_from[slot] = NULL;
+}
clusterDelSlot(slot);
clusterAddSlot(n,slot);
} else {
addReplyError(c,"Invalid CLUSTER SETSLOT action or number of arguments");
return;
}
-clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|CLUSTER_TODO_SAVE_CONFIG);
+clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_UPDATE_STATE);
addReply(c,shared.ok);
} else if (!strcasecmp(c->argv[1]->ptr,"info") && c->argc == 2) {
/* CLUSTER INFO */


@@ -195,7 +195,10 @@ union clusterMsgData {
typedef struct {
+char sig[4]; /* Signature "RCmb" (Redis Cluster message bus). */
uint32_t totlen; /* Total length of this message */
+uint16_t ver; /* Protocol version, currently set to 0. */
+uint16_t notused0; /* 2 bytes not used. */
uint16_t type; /* Message type */
uint16_t count; /* Only used for some kind of messages. */
uint64_t currentEpoch; /* The epoch according to the sending node. */
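The new header starts with a 16-byte prefix: `sig` (4), `totlen` (4), `ver` (2), `notused0` (2), `type` (2) and `count` (2), which matches clusterProcessPacket() now rejecting any `totlen` below 16. A sketch of encoding and checking that prefix in network byte order, assuming the byte offsets follow the field order above with no padding (each field is naturally aligned, so a typical compiler adds none); `encode_prefix()`, `prefix_is_acceptable()` and `HDR_PREFIX_LEN` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* sig(4) totlen(4) ver(2) notused0(2) type(2) count(2) */
#define HDR_PREFIX_LEN 16

static void put_be16(unsigned char *p, uint16_t v) {
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)v;
}

static void put_be32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Write the fixed prefix of a bus message in network byte order. */
void encode_prefix(unsigned char out[HDR_PREFIX_LEN], uint32_t totlen,
                   uint16_t type, uint16_t count) {
    memset(out, 0, HDR_PREFIX_LEN); /* ver = 0 and notused0 = 0 */
    memcpy(out, "RCmb", 4);
    put_be32(out + 4, totlen);
    put_be16(out + 12, type);
    put_be16(out + 14, count);
}

/* Accept a prefix only if the signature matches and the version is 0. */
int prefix_is_acceptable(const unsigned char *in, size_t len) {
    assert(in != NULL);
    if (len < HDR_PREFIX_LEN) return 0;
    uint16_t ver = (uint16_t)((in[8] << 8) | in[9]);
    return memcmp(in, "RCmb", 4) == 0 && ver == 0;
}
```

Reserving `notused0` keeps `type`, `count` and the following 64-bit `currentEpoch` at the same alignment they would have without the new fields packed in front.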


@@ -306,8 +306,10 @@ void punsubscribeCommand(redisClient *c) {
void publishCommand(redisClient *c) {
int receivers = pubsubPublishMessage(c->argv[1],c->argv[2]);
-if (server.cluster_enabled) clusterPropagatePublish(c->argv[1],c->argv[2]);
-forceCommandPropagation(c,REDIS_PROPAGATE_REPL);
+if (server.cluster_enabled)
+clusterPropagatePublish(c->argv[1],c->argv[2]);
+else
+forceCommandPropagation(c,REDIS_PROPAGATE_REPL);
addReplyLongLong(c,receivers);
}


@@ -359,11 +359,11 @@ class RedisTrib
@nodes.each{|n|
if n.info[:migrating].size > 0
cluster_error \
-"[WARNING] Node #{n} has slots in migrating state."
+"[WARNING] Node #{n} has slots in migrating state (#{n.info[:migrating].keys.join(",")})."
open_slots += n.info[:migrating].keys
elsif n.info[:importing].size > 0
cluster_error \
-"[WARNING] Node #{n} has slots in importing state."
+"[WARNING] Node #{n} has slots in importing state (#{n.info[:importing].keys.join(",")})."
open_slots += n.info[:importing].keys
end
}
@@ -469,6 +469,12 @@ class RedisTrib
# importing state in 1 slot. That's trivial to address.
if migrating.length == 1 && importing.length == 1
move_slot(migrating[0],importing[0],slot,:verbose=>true)
+elsif migrating.length == 1 && importing.length == 0
+xputs ">>> Setting #{slot} as STABLE"
+migrating[0].r.cluster("setslot",slot,"stable")
+elsif migrating.length == 0 && importing.length == 1
+xputs ">>> Setting #{slot} as STABLE"
+importing[0].r.cluster("setslot",slot,"stable")
else
xputs "[ERR] Sorry, Redis-trib can't fix this slot yet (work in progress)"
end
@@ -898,10 +904,10 @@ class RedisTrib
xputs ">>> Sending CLUSTER FORGET messages to the cluster..."
@nodes.each{|n|
next if n == node
-if n.info[:replicate] && n.info[:replicate].downcase == node_id
+if n.info[:replicate] && n.info[:replicate].downcase == id
# Reconfigure the slave to replicate with some other node
-xputs ">>> #{n} as replica of #{master}"
master = get_master_with_least_replicas
+xputs ">>> #{n} as replica of #{master}"
n.r.cluster("replicate",master.info[:name])
end
n.r.cluster("forget",argv[1])
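redis-trib's fix for open slots now distinguishes three repairable cases by how many nodes report the slot as migrating and how many as importing. The same decision table, written as a C sketch with hypothetical enum names (redis-trib itself is Ruby and directly calls `move_slot` or `CLUSTER SETSLOT <slot> STABLE`):

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    FIX_MOVE_SLOT,        /* move keys from the migrating to the importing node */
    FIX_STABLE_MIGRATING, /* SETSLOT <slot> STABLE on the migrating node */
    FIX_STABLE_IMPORTING, /* SETSLOT <slot> STABLE on the importing node */
    FIX_UNSUPPORTED       /* redis-trib gives up on this case for now */
} fix_action_t;

/* Choose a repair for an open slot from the number of nodes that
 * report it as migrating and as importing. */
fix_action_t choose_open_slot_fix(size_t migrating, size_t importing) {
    /* A slot is "open" only if at least one node reports it. */
    assert(migrating + importing > 0);
    if (migrating == 1 && importing == 1) return FIX_MOVE_SLOT;
    if (migrating == 1 && importing == 0) return FIX_STABLE_MIGRATING;
    if (migrating == 0 && importing == 1) return FIX_STABLE_IMPORTING;
    return FIX_UNSUPPORTED;
}
```

The two new STABLE cases are safe only because a lone migrating or importing flag has no counterpart to finish the move with; any combination involving several nodes still falls through to the "can't fix this slot yet" branch.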


@@ -1 +1 @@
-#define REDIS_VERSION "2.9.11"
+#define REDIS_VERSION "2.9.50"