rbd: __rbd_init_snaps_header() bug

Message ID 50119685.4010106@inktank.com (mailing list archive)
State New, archived

Commit Message

Alex Elder July 26, 2012, 7:12 p.m. UTC
There is a bug in the way __rbd_init_snaps_header() handles newly-
discovered snapshots, in the unusual case where the new snapshot id
is less than an existing rbd snapshot id.


When a new snapshot for an rbd image is created it is assigned a new
snapshot id.  The client requests the id from a monitor, and later
updates the osds with information about the snapshot.  The sequence
of snapshot ids is monotonically increasing, which guarantees each
is unique.  And typically all new snapshots will have higher ids
than any others a client has encountered.

Because of the delay between getting assigned a snapshot id and
recording the snapshot, there is a possible race between two clients
requesting new snapshots on the same rbd image.  It's possible in
this case for the client granted a higher id to be the first to
update the set of snapshots for the rbd image.  This leaves a sort
of gap in the set of snapshots until the other client's update gets
recorded.
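
To make the gap concrete, here is a hypothetical sequence of
events (the ids are invented for illustration):

	/*
	 * Client A is granted snap id 10; client B is granted
	 * snap id 11.  B records its snapshot first, so the
	 * image's snapshot context becomes:
	 *
	 *	snaps[] = { 11 }		seq = 11
	 *
	 * A's update lands afterward, producing:
	 *
	 *	snaps[] = { 10, 11 }		seq = 10
	 *
	 * A client that refreshed between the two updates already
	 * has an rbd_snap for id 11; on its next refresh it finds
	 * a new id (10) that is less than an existing one (11).
	 */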

This is an unlikely scenario, but it appears to be possible.  I'm
not sure whether we even support multiple concurrent clients for the
same rbd image.  However there is code in place to handle this case
and it's wrong, so it needs to be fixed.


The job of __rbd_init_snaps_header() is to compare an rbd's existing
list of snapshots to a new set, and remove any existing snapshots
that are not present in the new set and create any new ones not
present in the existing set.
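
In outline this is a merge of two id sequences sorted in the
same order.  A minimal userspace sketch of the idea (invented
names, not the driver code; the driver walks from the lowest id
upward, while this sketch goes from the highest down):

	#include <stdio.h>

	/* Both arrays are sorted in descending order. */
	static void sync_snaps(const unsigned long long *old_ids, int n_old,
			       const unsigned long long *new_ids, int n_new)
	{
		int o = 0, n = 0;

		while (o < n_old || n < n_new) {
			if (n >= n_new ||
			    (o < n_old && old_ids[o] > new_ids[n]))
				/* id vanished from the new context */
				printf("remove snap %llu\n", old_ids[o++]);
			else if (o >= n_old || new_ids[n] > old_ids[o])
				/* id not in the existing list */
				printf("create snap %llu\n", new_ids[n++]);
			else {
				o++;	/* ids equal: keep the snapshot */
				n++;
			}
		}
	}

	int main(void)
	{
		unsigned long long old_ids[] = { 11 };		/* existing */
		unsigned long long new_ids[] = { 11, 10 };	/* refreshed */

		sync_snaps(old_ids, 1, new_ids, 2);	/* -> create snap 10 */
		return 0;
	}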

If any new snapshots exist after the entire list of existing ones
has been examined, all the new ones that remain need to be created.
The logic for handling these last snapshots is correct.

However, if a new snapshot is found whose id is less than an
existing one, the logic for handling this is wrong.  The basic
problem is that the name and current id used here are taken from
a place one position beyond where they should be.  In the event
this block of code is executed the first pass through the outer
loop, for example, these values will refer to invalid memory.
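
For example, consider a hypothetical first pass of the outer
loop with total_snaps == 2, so i == 2 and name points one entry
past the end of the snapshot name buffer.  The old inner loop
then did:

	cur_id = rbd_dev->header.snapc->snaps[i];
			/* reads snaps[2], past the end of snaps[] */
	...
	snap = __rbd_add_snap_dev(rbd_dev, i - 1, name);
			/* name is one entry too far */

The fixed loop decrements i and steps name back before using
either value.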

The fix is to make this block of code more closely match the
block that handles adding new snapshots at the end.  There are
three differences:
    - we use a do..while loop because we know we initially have an
      entry to process.
    - we insert the new entry at a position within the list instead
      of at the beginning
    - because we use it in the loop condition, we need to update
      our notion of "current id" each pass through.

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |   55 +++++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 71e3f3b..da09c0d 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2090,13 +2090,14 @@ static int __rbd_init_snaps_header(struct rbd_device *rbd_dev)
 {
 	const char *name, *first_name;
 	int i = rbd_dev->header.total_snaps;
-	struct rbd_snap *snap, *old_snap = NULL;
+	struct rbd_snap *snap;
 	struct list_head *p, *n;

 	first_name = rbd_dev->header.snap_names;
 	name = first_name + rbd_dev->header.snap_names_len;

 	list_for_each_prev_safe(p, n, &rbd_dev->snaps) {
+		struct rbd_snap *old_snap;
 		u64 cur_id;

 		old_snap = list_entry(p, struct rbd_snap, node);
@@ -2104,12 +2105,16 @@ static int __rbd_init_snaps_header(struct rbd_device *rbd_dev)
 		if (i)
 			cur_id = rbd_dev->header.snapc->snaps[i - 1];

+		/*
+		 * If old id is not present in the new context, or
+		 * if there are no more snapshots in the new
+		 * context, this snapshot should be removed.
+		 */
 		if (!i || old_snap->id < cur_id) {
 			/*
-			 * old_snap->id was skipped, thus was
-			 * removed.  If this rbd_dev is mapped to
-			 * the removed snapshot, record that it no
-			 * longer exists, to prevent further I/O.
+			 * If this rbd_dev is mapped to the removed
+			 * snapshot, record that it no longer exists,
+			 * to prevent further I/O.
 			 */
 			if (rbd_dev->snap_id == old_snap->id)
 				rbd_dev->snap_exists = false;
@@ -2122,34 +2127,36 @@ static int __rbd_init_snaps_header(struct rbd_device *rbd_dev)
 			name = rbd_prev_snap_name(name, first_name);
 			continue;
 		}
-		for (; i > 0;
-		     i--, name = rbd_prev_snap_name(name, first_name)) {
-			if (!name) {
-				WARN_ON(1);
+
+		/*
+		 * A new snapshot (or possibly more than one)
+		 * appeared in the middle of our set of snapshots.
+		 * This could happen of two hosts raced while
+		 * creating snapshots, and the one that was granted
+		 * a higher snapshot id managed to get its snapshot
+		 * saved before the other.
+		 */
+		do {
+			i--;
+			name = rbd_prev_snap_name(name, first_name);
+			if (WARN_ON(!name))
 				return -EINVAL;
-			}
-			cur_id = rbd_dev->header.snapc->snaps[i];
-			/* snapshot removal? handle it above */
-			if (cur_id >= old_snap->id)
-				break;
-			/* a new snapshot */
-			snap = __rbd_add_snap_dev(rbd_dev, i - 1, name);
+			snap = __rbd_add_snap_dev(rbd_dev, i, name);
 			if (IS_ERR(snap))
 				return PTR_ERR(snap);
-
 			/* note that we add it backward so using n and not p */
 			list_add(&snap->node, n);
-			p = &snap->node;
-		}
+
+			cur_id = rbd_dev->header.snapc->snaps[i];
+		} while (i && cur_id < old_snap->id);
 	}
 	/* we're done going over the old snap list, just add what's left */
-	for (; i > 0; i--) {
+	while (i) {
+		i--;
 		name = rbd_prev_snap_name(name, first_name);
-		if (!name) {
-			WARN_ON(1);
+		if (WARN_ON(!name))
 			return -EINVAL;
-		}
-		snap = __rbd_add_snap_dev(rbd_dev, i - 1, name);
+		snap = __rbd_add_snap_dev(rbd_dev, i, name);
 		if (IS_ERR(snap))
 			return PTR_ERR(snap);
 		list_add(&snap->node, &rbd_dev->snaps);

Comments

Josh Durgin July 30, 2012, 11:05 p.m. UTC | #1
On 07/26/2012 12:12 PM, Alex Elder wrote:
> There is a bug in the way __rbd_init_snaps_header() handles newly-
> discovered snapshots, in the unusual case where the new snapshot id
> is less than an existing rbd snapshot id.
>
>
> When a new snapshot for an rbd image is created it is assigned a new
> snapshot id.  The client requests the id from a monitor, and later
> updates the osds with information about the snapshot.  The sequence
> of snapshot ids is monotonically increasing, which guarantees each
> is unique.  And typically all new snapshots will have higher ids
> than any others a client has encountered.
>
> Because of the delay between getting assigned a snapshot id and
> recording the snapshot, there is a possible race between two clients
> requesting new snapshots on the same rbd image.  It's possible in
> this case for the client granted a higher id to be the first to
> update the set of snapshots for the rbd image.  This leaves a sort
> of gap in the set of snapshots until the other client's update gets
> recorded.
>
> This is an unlikely scenario, but it appears to be possible.  I'm
> not sure whether we even support multiple concurrent clients for the
> same rbd image.  However there is code in place to handle this case
> and it's wrong, so it needs to be fixed.

To fully handle this race, we'd need to sort the snapshot context and
set its seq to the maximum id, instead of the lower id set by the race.

If we don't do this, the race makes the image unwritable since the
non-sorted snapshot context would be rejected by the osds.
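
For reference, a minimal userspace sketch of that workaround
(the real context lives in struct ceph_snap_context; the helper
here is invented, nothing like it exists in the client today):

	#include <stdlib.h>

	typedef unsigned long long u64;

	/* qsort comparator: descending, the order the osds expect */
	static int snap_id_cmp(const void *a, const void *b)
	{
		u64 x = *(const u64 *)a;
		u64 y = *(const u64 *)b;

		return x > y ? -1 : (x < y ? 1 : 0);
	}

	/*
	 * Hypothetical repair of a racy snapshot context: re-sort
	 * the ids descending and force seq up to the maximum id.
	 */
	static void fixup_snapc(u64 *snaps, int num_snaps, u64 *seq)
	{
		qsort(snaps, num_snaps, sizeof(*snaps), snap_id_cmp);
		if (num_snaps && *seq < snaps[0])
			*seq = snaps[0];
	}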

This would complicate things a bunch, so I'm not sure if we should
try to handle this bug on the client side. I've got a fix to prevent
it from happening in the first place, and as you said, it's rare,
and requires you to be taking snapshots of the same image from multiple
clients at once, which is generally not a good idea (that is, writing
to the image from multiple clients).

> The job of __rbd_init_snaps_header() is to compare an rbd's existing
> list of snapshots to a new set, and remove any existing snapshots
> that are not present in the new set and create any new ones not
> present in the existing set.
>
> If any new snapshots exist after the entire list of existing ones
> has been examined, all the new ones that remain need to be created.
> The logic for handling these last snapshots is correct.
>
> However, if a new snapshot is found whose id is less than an
> existing one, the logic for handling this is wrong.  The basic
> problem is that the name and current id used here are taken from
> a place one position beyond where they should be.  In the event
> this block of code is executed the first pass through the outer
> loop, for example, these values will refer to invalid memory.
>
> The fix is to make this block of code more closely match the
> block that handles adding new snapshots at the end.  There are
> three differences:
>      - we use a do..while loop because we know we initially have an
>        entry to process.
>      - we insert the new entry at a position within the list instead
>        of at the beginning
>      - because we use it in the loop condition, we need to update
>        our notion of "current id" each pass through.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>   drivers/block/rbd.c |   55
> +++++++++++++++++++++++++++++----------------------
>   1 file changed, 31 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 71e3f3b..da09c0d 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -2090,13 +2090,14 @@ static int __rbd_init_snaps_header(struct
> rbd_device *rbd_dev)
>   {
>   	const char *name, *first_name;
>   	int i = rbd_dev->header.total_snaps;
> -	struct rbd_snap *snap, *old_snap = NULL;
> +	struct rbd_snap *snap;
>   	struct list_head *p, *n;
>
>   	first_name = rbd_dev->header.snap_names;
>   	name = first_name + rbd_dev->header.snap_names_len;
>
>   	list_for_each_prev_safe(p, n, &rbd_dev->snaps) {
> +		struct rbd_snap *old_snap;
>   		u64 cur_id;
>
>   		old_snap = list_entry(p, struct rbd_snap, node);
> @@ -2104,12 +2105,16 @@ static int __rbd_init_snaps_header(struct
> rbd_device *rbd_dev)
>   		if (i)
>   			cur_id = rbd_dev->header.snapc->snaps[i - 1];
>
> +		/*
> +		 * If old id is not present in the new context, or
> +		 * if there are no more snapshots in the new
> +		 * context, this snapshot should be removed.
> +		 */
>   		if (!i || old_snap->id < cur_id) {
>   			/*
> -			 * old_snap->id was skipped, thus was
> -			 * removed.  If this rbd_dev is mapped to
> -			 * the removed snapshot, record that it no
> -			 * longer exists, to prevent further I/O.
> +			 * If this rbd_dev is mapped to the removed
> +			 * snapshot, record that it no longer exists,
> +			 * to prevent further I/O.
>   			 */
>   			if (rbd_dev->snap_id == old_snap->id)
>   				rbd_dev->snap_exists = false;
> @@ -2122,34 +2127,36 @@ static int __rbd_init_snaps_header(struct
> rbd_device *rbd_dev)
>   			name = rbd_prev_snap_name(name, first_name);
>   			continue;
>   		}
> -		for (; i > 0;
> -		     i--, name = rbd_prev_snap_name(name, first_name)) {
> -			if (!name) {
> -				WARN_ON(1);
> +
> +		/*
> +		 * A new snapshot (or possibly more than one)
> +		 * appeared in the middle of our set of snapshots.
> +		 * This could happen of two hosts raced while

of -> if, and hosts -> clients (could be the same host, i.e. via the
rbd snap create).

> +		 * creating snapshots, and the one that was granted
> +		 * a higher snapshot id managed to get its snapshot
> +		 * saved before the other.
> +		 */
> +		do {
> +			i--;
> +			name = rbd_prev_snap_name(name, first_name);
> +			if (WARN_ON(!name))
>   				return -EINVAL;
> -			}
> -			cur_id = rbd_dev->header.snapc->snaps[i];
> -			/* snapshot removal? handle it above */
> -			if (cur_id >= old_snap->id)
> -				break;
> -			/* a new snapshot */
> -			snap = __rbd_add_snap_dev(rbd_dev, i - 1, name);
> +			snap = __rbd_add_snap_dev(rbd_dev, i, name);
>   			if (IS_ERR(snap))
>   				return PTR_ERR(snap);
> -
>   			/* note that we add it backward so using n and not p */
>   			list_add(&snap->node, n);
> -			p = &snap->node;
> -		}
> +
> +			cur_id = rbd_dev->header.snapc->snaps[i];
> +		} while (i && cur_id < old_snap->id);
>   	}
>   	/* we're done going over the old snap list, just add what's left */
> -	for (; i > 0; i--) {
> +	while (i) {
> +		i--;
>   		name = rbd_prev_snap_name(name, first_name);
> -		if (!name) {
> -			WARN_ON(1);
> +		if (WARN_ON(!name))
>   			return -EINVAL;
> -		}
> -		snap = __rbd_add_snap_dev(rbd_dev, i - 1, name);
> +		snap = __rbd_add_snap_dev(rbd_dev, i, name);
>   		if (IS_ERR(snap))
>   			return PTR_ERR(snap);
>   		list_add(&snap->node, &rbd_dev->snaps);
>

Alex Elder July 30, 2012, 11:16 p.m. UTC | #2
On 07/30/2012 06:05 PM, Josh Durgin wrote:
> On 07/26/2012 12:12 PM, Alex Elder wrote:
>> There is a bug in the way __rbd_init_snaps_header() handles newly-
>> discovered snapshots, in the unusual case where the new snapshot id
>> is less than an existing rbd snapshot id.
>>
>>
>> When a new snapshot for an rbd image is created it is assigned a new
>> snapshot id.  The client requests the id from a monitor, and later
>> updates the osds with information about the snapshot.  The sequence
>> of snapshot ids is monotonically increasing, which guarantees each
>> is unique.  And typically all new snapshots will have higher ids
>> than any others a client has encountered.
>>
>> Because of the delay between getting assigned a snapshot id and
>> recording the snapshot, there is a possible race between two clients
>> requesting new snapshots on the same rbd image.  It's possible in
>> this case for the client granted a higher id to be the first to
>> update the set of snapshots for the rbd image.  This leaves a sort
>> of gap in the set of snapshots until the other client's update gets
>> recorded.
>>
>> This is an unlikely scenario, but it appears to be possible.  I'm
>> not sure whether we even support multiple concurrent clients for the
>> same rbd image.  However there is code in place to handle this case
>> and it's wrong, so it needs to be fixed.
> 
> To fully handle this race, we'd need to sort the snapshot context and
> set its seq to the maximum id, instead of the lower id set by the race.

I had assumed both of these were true.  In fact, I was under the
impression that the value of "seq" was precisely that--the maximum
snapshot id issued for this rbd image by the server.

> If we don't do this, the race makes the image unwritable since the
> non-sorted snapshot context would be rejected by the osds.

Looking at the kernel client only (which was how I found this bug,
via inspection), this code is still buggy and could cause a kernel
fault if it were to be hit.

The risk is low right now, though.  So I'll just set this patch
aside now until we can get a chance to discuss the best course
of action.

					-Alex

> This would complicate things a bunch, so I'm not sure if we should
> try to handle this bug on the client side. I've got a fix to prevent
> it from happening in the first place, and as you said, it's rare,
> and requires you to be taking snapshots of the same image from multiple
> clients at once, which is generally not a good idea (that is, writing
> to the image from multiple clients).
> 
>> The job of __rbd_init_snaps_header() is to compare an rbd's existing
>> list of snapshots to a new set, and remove any existing snapshots
>> that are not present in the new set and create any new ones not
>> present in the existing set.
>>
>> If any new snapshots exist after the entire list of existing ones
>> has been examined, all the new ones that remain need to be created.
>> The logic for handling these last snapshots is correct.
>>
>> However, if a new snapshot is found whose id is less than an
>> existing one, the logic for handling this is wrong.  The basic
>> problem is that the name and current id used here are taken from
>> a place one position beyond where they should be.  In the event
>> this block of code is executed the first pass through the outer
>> loop, for example, these values will refer to invalid memory.
>>
>> The fix is to make this block of code more closely match the
>> block that handles adding new snapshots at the end.  There are
>> three differences:
>>      - we use a do..while loop because we know we initially have an
>>        entry to process.
>>      - we insert the new entry at a position within the list instead
>>        of at the beginning
>>      - because we use it in the loop condition, we need to update
>>        our notion of "current id" each pass through.
>>
>> Signed-off-by: Alex Elder <elder@inktank.com>
>> ---
>>   drivers/block/rbd.c |   55
>> +++++++++++++++++++++++++++++----------------------
>>   1 file changed, 31 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index 71e3f3b..da09c0d 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -2090,13 +2090,14 @@ static int __rbd_init_snaps_header(struct
>> rbd_device *rbd_dev)
>>   {
>>       const char *name, *first_name;
>>       int i = rbd_dev->header.total_snaps;
>> -    struct rbd_snap *snap, *old_snap = NULL;
>> +    struct rbd_snap *snap;
>>       struct list_head *p, *n;
>>
>>       first_name = rbd_dev->header.snap_names;
>>       name = first_name + rbd_dev->header.snap_names_len;
>>
>>       list_for_each_prev_safe(p, n, &rbd_dev->snaps) {
>> +        struct rbd_snap *old_snap;
>>           u64 cur_id;
>>
>>           old_snap = list_entry(p, struct rbd_snap, node);
>> @@ -2104,12 +2105,16 @@ static int __rbd_init_snaps_header(struct
>> rbd_device *rbd_dev)
>>           if (i)
>>               cur_id = rbd_dev->header.snapc->snaps[i - 1];
>>
>> +        /*
>> +         * If old id is not present in the new context, or
>> +         * if there are no more snapshots in the new
>> +         * context, this snapshot should be removed.
>> +         */
>>           if (!i || old_snap->id < cur_id) {
>>               /*
>> -             * old_snap->id was skipped, thus was
>> -             * removed.  If this rbd_dev is mapped to
>> -             * the removed snapshot, record that it no
>> -             * longer exists, to prevent further I/O.
>> +             * If this rbd_dev is mapped to the removed
>> +             * snapshot, record that it no longer exists,
>> +             * to prevent further I/O.
>>                */
>>               if (rbd_dev->snap_id == old_snap->id)
>>                   rbd_dev->snap_exists = false;
>> @@ -2122,34 +2127,36 @@ static int __rbd_init_snaps_header(struct
>> rbd_device *rbd_dev)
>>               name = rbd_prev_snap_name(name, first_name);
>>               continue;
>>           }
>> -        for (; i > 0;
>> -             i--, name = rbd_prev_snap_name(name, first_name)) {
>> -            if (!name) {
>> -                WARN_ON(1);
>> +
>> +        /*
>> +         * A new snapshot (or possibly more than one)
>> +         * appeared in the middle of our set of snapshots.
>> +         * This could happen of two hosts raced while
> 
> of -> if, and hosts -> clients (could be the same host, i.e. via the
> rbd snap create).
> 
>> +         * creating snapshots, and the one that was granted
>> +         * a higher snapshot id managed to get its snapshot
>> +         * saved before the other.
>> +         */
>> +        do {
>> +            i--;
>> +            name = rbd_prev_snap_name(name, first_name);
>> +            if (WARN_ON(!name))
>>                   return -EINVAL;
>> -            }
>> -            cur_id = rbd_dev->header.snapc->snaps[i];
>> -            /* snapshot removal? handle it above */
>> -            if (cur_id >= old_snap->id)
>> -                break;
>> -            /* a new snapshot */
>> -            snap = __rbd_add_snap_dev(rbd_dev, i - 1, name);
>> +            snap = __rbd_add_snap_dev(rbd_dev, i, name);
>>               if (IS_ERR(snap))
>>                   return PTR_ERR(snap);
>> -
>>               /* note that we add it backward so using n and not p */
>>               list_add(&snap->node, n);
>> -            p = &snap->node;
>> -        }
>> +
>> +            cur_id = rbd_dev->header.snapc->snaps[i];
>> +        } while (i && cur_id < old_snap->id);
>>       }
>>       /* we're done going over the old snap list, just add what's left */
>> -    for (; i > 0; i--) {
>> +    while (i) {
>> +        i--;
>>           name = rbd_prev_snap_name(name, first_name);
>> -        if (!name) {
>> -            WARN_ON(1);
>> +        if (WARN_ON(!name))
>>               return -EINVAL;
>> -        }
>> -        snap = __rbd_add_snap_dev(rbd_dev, i - 1, name);
>> +        snap = __rbd_add_snap_dev(rbd_dev, i, name);
>>           if (IS_ERR(snap))
>>               return PTR_ERR(snap);
>>           list_add(&snap->node, &rbd_dev->snaps);
>>
> 
> 
> 

Josh Durgin July 30, 2012, 11:49 p.m. UTC | #3
On 07/30/2012 04:16 PM, Alex Elder wrote:
> On 07/30/2012 06:05 PM, Josh Durgin wrote:
>> On 07/26/2012 12:12 PM, Alex Elder wrote:
>>> There is a bug in the way __rbd_init_snaps_header() handles newly-
>>> discovered snapshots, in the unusual case where the new snapshot id
>>> is less than an existing rbd snapshot id.
>>>
>>>
>>> When a new snapshot for an rbd image is created it is assigned a new
>>> snapshot id.  The client requests the id from a monitor, and later
>>> updates the osds with information about the snapshot.  The sequence
>>> of snapshot ids is monotonically increasing, which guarantees each
>>> is unique.  And typically all new snapshots will have higher ids
>>> than any others a client has encountered.
>>>
>>> Because of the delay between getting assigned a snapshot id and
>>> recording the snapshot, there is a possible race between two clients
>>> requesting new snapshots on the same rbd image.  It's possible in
>>> this case for the client granted a higher id to be the first to
>>> update the set of snapshots for the rbd image.  This leaves a sort
>>> of gap in the set of snapshots until the other client's update gets
>>> recorded.
>>>
>>> This is an unlikely scenario, but it appears to be possible.  I'm
>>> not sure whether we even support multiple concurrent clients for the
>>> same rbd image.  However there is code in place to handle this case
>>> and it's wrong, so it needs to be fixed.
>>
>> To fully handle this race, we'd need to sort the snapshot context and
>> set its seq to the maximum id, instead of the lower id set by the race.
>
> I had assumed both of these were true.  In fact, I was under the
> impression that the value of "seq" was precisely that--the maximum
> snapshot id issued for this rbd image by the server.

Yes, these are both true except when bugs like this race condition
occur. Sorting and finding the max id would be a workaround for
this bug, but is never necessary otherwise.

Josh

>> If we don't do this, the race makes the image unwritable since the
>> non-sorted snapshot context would be rejected by the osds.
>
> Looking at the kernel client only (which was how I found this bug,
> via inspection), this code is still buggy and could cause a kernel
> fault if it were to be hit.
>
> The risk is low right now, though.  So I'll just set this patch
> aside now until we can get a chance to discuss the best course
> of action.
>
> 					-Alex
>
>> This would complicate things a bunch, so I'm not sure if we should
>> try to handle this bug on the client side. I've got a fix to prevent
>> it from happening in the first place, and as you said, it's rare,
>> and requires you to be taking snapshots of the same image from multiple
>> clients at once, which is generally not a good idea (that is, writing
>> to the image from multiple clients).
>>
>>> The job of __rbd_init_snaps_header() is to compare an rbd's existing
>>> list of snapshots to a new set, and remove any existing snapshots
>>> that are not present in the new set and create any new ones not
>>> present in the existing set.
>>>
>>> If any new snapshots exist after the entire list of existing ones
>>> has been examined, all the new ones that remain need to be created.
>>> The logic for handling these last snapshots is correct.
>>>
>>> However, if a new snapshot is found whose id is less than an
>>> existing one, the logic for handling this is wrong.  The basic
>>> problem is that the name and current id used here are taken from
>>> a place one position beyond where they should be.  In the event
>>> this block of code is executed the first pass through the outer
>>> loop, for example, these values will refer to invalid memory.
>>>
>>> The fix is to make this block of code more closely match the
>>> block that handles adding new snapshots at the end.  There are
>>> three differences:
>>>       - we use a do..while loop because we know we initially have an
>>>         entry to process.
>>>       - we insert the new entry at a position within the list instead
>>>         of at the beginning
>>>       - because we use it in the loop condition, we need to update
>>>         our notion of "current id" each pass through.
>>>
>>> Signed-off-by: Alex Elder <elder@inktank.com>
>>> ---
>>>    drivers/block/rbd.c |   55
>>> +++++++++++++++++++++++++++++----------------------
>>>    1 file changed, 31 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>> index 71e3f3b..da09c0d 100644
>>> --- a/drivers/block/rbd.c
>>> +++ b/drivers/block/rbd.c
>>> @@ -2090,13 +2090,14 @@ static int __rbd_init_snaps_header(struct
>>> rbd_device *rbd_dev)
>>>    {
>>>        const char *name, *first_name;
>>>        int i = rbd_dev->header.total_snaps;
>>> -    struct rbd_snap *snap, *old_snap = NULL;
>>> +    struct rbd_snap *snap;
>>>        struct list_head *p, *n;
>>>
>>>        first_name = rbd_dev->header.snap_names;
>>>        name = first_name + rbd_dev->header.snap_names_len;
>>>
>>>        list_for_each_prev_safe(p, n, &rbd_dev->snaps) {
>>> +        struct rbd_snap *old_snap;
>>>            u64 cur_id;
>>>
>>>            old_snap = list_entry(p, struct rbd_snap, node);
>>> @@ -2104,12 +2105,16 @@ static int __rbd_init_snaps_header(struct
>>> rbd_device *rbd_dev)
>>>            if (i)
>>>                cur_id = rbd_dev->header.snapc->snaps[i - 1];
>>>
>>> +        /*
>>> +         * If old id is not present in the new context, or
>>> +         * if there are no more snapshots in the new
>>> +         * context, this snapshot should be removed.
>>> +         */
>>>            if (!i || old_snap->id < cur_id) {
>>>                /*
>>> -             * old_snap->id was skipped, thus was
>>> -             * removed.  If this rbd_dev is mapped to
>>> -             * the removed snapshot, record that it no
>>> -             * longer exists, to prevent further I/O.
>>> +             * If this rbd_dev is mapped to the removed
>>> +             * snapshot, record that it no longer exists,
>>> +             * to prevent further I/O.
>>>                 */
>>>                if (rbd_dev->snap_id == old_snap->id)
>>>                    rbd_dev->snap_exists = false;
>>> @@ -2122,34 +2127,36 @@ static int __rbd_init_snaps_header(struct
>>> rbd_device *rbd_dev)
>>>                name = rbd_prev_snap_name(name, first_name);
>>>                continue;
>>>            }
>>> -        for (; i > 0;
>>> -             i--, name = rbd_prev_snap_name(name, first_name)) {
>>> -            if (!name) {
>>> -                WARN_ON(1);
>>> +
>>> +        /*
>>> +         * A new snapshot (or possibly more than one)
>>> +         * appeared in the middle of our set of snapshots.
>>> +         * This could happen of two hosts raced while
>>
>> of -> if, and hosts -> clients (could be the same host, i.e. via the
>> rbd snap create).
>>
>>> +         * creating snapshots, and the one that was granted
>>> +         * a higher snapshot id managed to get its snapshot
>>> +         * saved before the other.
>>> +         */
>>> +        do {
>>> +            i--;
>>> +            name = rbd_prev_snap_name(name, first_name);
>>> +            if (WARN_ON(!name))
>>>                    return -EINVAL;
>>> -            }
>>> -            cur_id = rbd_dev->header.snapc->snaps[i];
>>> -            /* snapshot removal? handle it above */
>>> -            if (cur_id >= old_snap->id)
>>> -                break;
>>> -            /* a new snapshot */
>>> -            snap = __rbd_add_snap_dev(rbd_dev, i - 1, name);
>>> +            snap = __rbd_add_snap_dev(rbd_dev, i, name);
>>>                if (IS_ERR(snap))
>>>                    return PTR_ERR(snap);
>>> -
>>>                /* note that we add it backward so using n and not p */
>>>                list_add(&snap->node, n);
>>> -            p = &snap->node;
>>> -        }
>>> +
>>> +            cur_id = rbd_dev->header.snapc->snaps[i];
>>> +        } while (i && cur_id < old_snap->id);
>>>        }
>>>        /* we're done going over the old snap list, just add what's left */
>>> -    for (; i > 0; i--) {
>>> +    while (i) {
>>> +        i--;
>>>            name = rbd_prev_snap_name(name, first_name);
>>> -        if (!name) {
>>> -            WARN_ON(1);
>>> +        if (WARN_ON(!name))
>>>                return -EINVAL;
>>> -        }
>>> -        snap = __rbd_add_snap_dev(rbd_dev, i - 1, name);
>>> +        snap = __rbd_add_snap_dev(rbd_dev, i, name);
>>>            if (IS_ERR(snap))
>>>                return PTR_ERR(snap);
>>>            list_add(&snap->node, &rbd_dev->snaps);
>>>
>>
>>
>>
>

Patch

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 71e3f3b..da09c0d 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2090,13 +2090,14 @@ static int __rbd_init_snaps_header(struct rbd_device *rbd_dev)
 {
 	const char *name, *first_name;
 	int i = rbd_dev->header.total_snaps;
-	struct rbd_snap *snap, *old_snap = NULL;
+	struct rbd_snap *snap;
 	struct list_head *p, *n;

 	first_name = rbd_dev->header.snap_names;
 	name = first_name + rbd_dev->header.snap_names_len;

 	list_for_each_prev_safe(p, n, &rbd_dev->snaps) {
+		struct rbd_snap *old_snap;
 		u64 cur_id;