diff mbox

[2/5] commit: Support multiple roots above top node

Message ID 20170925122808.14561-3-kwolf@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Kevin Wolf Sept. 25, 2017, 12:28 p.m. UTC
This changes the commit block job to support operation in a graph where
there is more than a single active layer that references the top node.

This involves inserting the commit filter node not only on the path
between the given active node and the top node, but between the top node
and all of its parents.

On completion, bdrv_drop_intermediate() must consider all parents for
updating the backing file link. These parents may be backing files
themselves and such read-only; reopen them temporarily if necessary.
Previously this was achieved by the bdrv_reopen() calls in the commit
block job that made overlay_bs read-write for the whole duration of the
block job, even though write access is only needed on completion.

Now that we consider all parents, overlay_bs is meaningless. It is left
in place in this commit, but we'll remove it soon.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c        | 68 ++++++++++++++++++++++++++++++++++------------------------
 block/commit.c |  2 +-
 2 files changed, 41 insertions(+), 29 deletions(-)

Comments

Eric Blake Sept. 25, 2017, 7:38 p.m. UTC | #1
On 09/25/2017 07:28 AM, Kevin Wolf wrote:
> This changes the commit block job to support operation in a graph where
> there is more than a single active layer that references the top node.
> 
> This involves inserting the commit filter node not only on the path
> between the given active node and the top node, but between the top node
> and all of its parents.
> 
> On completion, bdrv_drop_intermediate() must consider all parents for
> updating the backing file link. These parents may be backing files
> themselves and such read-only; reopen them temporarily if necessary.

s/such/as such/

> Previously this was achieved by the bdrv_reopen() calls in the commit
> block job that made overlay_bs read-write for the whole duration of the
> block job, even though write access is only needed on completion.

Do we know, at the start of the operation, enough to reject attempts
that will eventually fail because we cannot obtain write access at
completion, without having to wait for all the intermediate work before
the completion phase is reached?  If not, that might be a bit of a
regression (where a command used to fail up front, we now waste a lot of
effort, and possibly modify the backing chain irreversably, before
finding out we still fail because we lack write access).

> 
> Now that we consider all parents, overlay_bs is meaningless. It is left
> in place in this commit, but we'll remove it soon.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c        | 68 ++++++++++++++++++++++++++++++++++------------------------
>  block/commit.c |  2 +-
>  2 files changed, 41 insertions(+), 29 deletions(-)
>  
>      /* success - we can delete the intermediate states, and link top->base */
> -    if (new_top_bs->backing->role->update_filename) {
> -        backing_file_str = backing_file_str ? backing_file_str : base->filename;
> -        ret = new_top_bs->backing->role->update_filename(new_top_bs->backing,
> -                                                         base, backing_file_str,
> -                                                         &local_err);
> -        if (ret < 0) {
> -            bdrv_set_backing_hd(new_top_bs, top, &error_abort);
> +    backing_file_str = backing_file_str ? backing_file_str : base->filename;
> +
> +    QLIST_FOREACH_SAFE(c, &top->parents, next_parent, next) {
> +        /* Check whether we are allowed to switch c from top to base */
> +        GSList *ignore_children = g_slist_prepend(NULL, c);
> +        bdrv_check_update_perm(base, NULL, c->perm, c->shared_perm,
> +                               ignore_children, &local_err);
> +        if (local_err) {
> +            ret = -EPERM;
> +            error_report_err(local_err);
>              goto exit;
>          }
> -    }
> +        g_slist_free(ignore_children);
>  

And it looks like you DO check up front that permissions are sufficient
for later on.
Kevin Wolf Sept. 26, 2017, 5:35 p.m. UTC | #2
Am 25.09.2017 um 21:38 hat Eric Blake geschrieben:
> On 09/25/2017 07:28 AM, Kevin Wolf wrote:
> > This changes the commit block job to support operation in a graph where
> > there is more than a single active layer that references the top node.
> > 
> > This involves inserting the commit filter node not only on the path
> > between the given active node and the top node, but between the top node
> > and all of its parents.
> > 
> > On completion, bdrv_drop_intermediate() must consider all parents for
> > updating the backing file link. These parents may be backing files
> > themselves and such read-only; reopen them temporarily if necessary.
> 
> s/such/as such/

Yes, thanks.

> > Previously this was achieved by the bdrv_reopen() calls in the commit
> > block job that made overlay_bs read-write for the whole duration of the
> > block job, even though write access is only needed on completion.
> 
> Do we know, at the start of the operation, enough to reject attempts
> that will eventually fail because we cannot obtain write access at
> completion, without having to wait for all the intermediate work before
> the completion phase is reached?  If not, that might be a bit of a
> regression (where a command used to fail up front, we now waste a lot of
> effort, and possibly modify the backing chain irreversably, before
> finding out we still fail because we lack write access).

It is kind of a change in behaviour, but I wouldn't call it a
regression. The reason is that today I'm looking at it with a completely
different perspective than when the command was added.

Originally, we assumed a more or less static block node graph, which in
fact was only a mostly degenerated tree - a backing file chain. Block
jobs were kind of an exceptional thing and we allowed only one per
chain. In this restricted setting, checking at the start of the
operation made sense because we wouldn't allow things to change while
the job was running anyway.

A big part of the recent blockdev work, however, is about getting rid of
arbitrary restrictions like this. We want to allow running two commit
operations in the same backing chain as long as they don't conflict. We
don't want to prevent adding new users to an existing node unless there
is a good reason.

If you consider this, the set of nodes that will get their backing file
rewritten at the end of the block job isn't known yet when the job
starts; and even if we knew it, keeping the nodes writable all the time
while the job runs feels like it could easily conflict with other
callers of bdrv_reopen().

> > Now that we consider all parents, overlay_bs is meaningless. It is left
> > in place in this commit, but we'll remove it soon.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block.c        | 68 ++++++++++++++++++++++++++++++++++------------------------
> >  block/commit.c |  2 +-
> >  2 files changed, 41 insertions(+), 29 deletions(-)
> >  
> >      /* success - we can delete the intermediate states, and link top->base */
> > -    if (new_top_bs->backing->role->update_filename) {
> > -        backing_file_str = backing_file_str ? backing_file_str : base->filename;
> > -        ret = new_top_bs->backing->role->update_filename(new_top_bs->backing,
> > -                                                         base, backing_file_str,
> > -                                                         &local_err);
> > -        if (ret < 0) {
> > -            bdrv_set_backing_hd(new_top_bs, top, &error_abort);
> > +    backing_file_str = backing_file_str ? backing_file_str : base->filename;
> > +
> > +    QLIST_FOREACH_SAFE(c, &top->parents, next_parent, next) {
> > +        /* Check whether we are allowed to switch c from top to base */
> > +        GSList *ignore_children = g_slist_prepend(NULL, c);
> > +        bdrv_check_update_perm(base, NULL, c->perm, c->shared_perm,
> > +                               ignore_children, &local_err);
> > +        if (local_err) {
> > +            ret = -EPERM;
> > +            error_report_err(local_err);
> >              goto exit;
> >          }
> > -    }
> > +        g_slist_free(ignore_children);
> >  
> 
> And it looks like you DO check up front that permissions are sufficient
> for later on.

This is already during completion. The completion code uses the
transactional nature of permission updates to make sure that the
in-memory graph stays consistent with the on-disk backing file links
even in case of errors, but it's not really upfront in the sense of
checking when the block job is started. (You also need to hold the BQL
between starting and completing a permission change transaction, so just
moving this code wouldn't work.)

Kevin
diff mbox

Patch

diff --git a/block.c b/block.c
index 7ac8cd521b..e57fab501d 100644
--- a/block.c
+++ b/block.c
@@ -985,14 +985,26 @@  static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
                                         const char *filename, Error **errp)
 {
     BlockDriverState *parent = c->opaque;
+    int orig_flags = bdrv_get_flags(parent);
     int ret;
 
+    if (!(orig_flags & BDRV_O_RDWR)) {
+        ret = bdrv_reopen(parent, orig_flags | BDRV_O_RDWR, errp);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     ret = bdrv_change_backing_file(parent, filename,
                                    base->drv ? base->drv->format_name : "");
     if (ret < 0) {
         error_setg_errno(errp, ret, "Could not update backing file link");
     }
 
+    if (!(orig_flags & BDRV_O_RDWR)) {
+        bdrv_reopen(parent, orig_flags, NULL);
+    }
+
     return ret;
 }
 
@@ -3482,7 +3494,7 @@  BlockDriverState *bdrv_find_base(BlockDriverState *bs)
 int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
                            BlockDriverState *base, const char *backing_file_str)
 {
-    BlockDriverState *new_top_bs = NULL;
+    BdrvChild *c, *next;
     Error *local_err = NULL;
     int ret = -EIO;
 
@@ -3492,42 +3504,42 @@  int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
         goto exit;
     }
 
-    new_top_bs = bdrv_find_overlay(active, top);
-
-    if (new_top_bs == NULL) {
-        /* we could not find the image above 'top', this is an error */
-        goto exit;
-    }
-
-    /* special case of new_top_bs->backing->bs already pointing to base - nothing
-     * to do, no intermediate images */
-    if (backing_bs(new_top_bs) == base) {
-        ret = 0;
-        goto exit;
-    }
-
     /* Make sure that base is in the backing chain of top */
     if (!bdrv_chain_contains(top, base)) {
         goto exit;
     }
 
     /* success - we can delete the intermediate states, and link top->base */
-    if (new_top_bs->backing->role->update_filename) {
-        backing_file_str = backing_file_str ? backing_file_str : base->filename;
-        ret = new_top_bs->backing->role->update_filename(new_top_bs->backing,
-                                                         base, backing_file_str,
-                                                         &local_err);
-        if (ret < 0) {
-            bdrv_set_backing_hd(new_top_bs, top, &error_abort);
+    backing_file_str = backing_file_str ? backing_file_str : base->filename;
+
+    QLIST_FOREACH_SAFE(c, &top->parents, next_parent, next) {
+        /* Check whether we are allowed to switch c from top to base */
+        GSList *ignore_children = g_slist_prepend(NULL, c);
+        bdrv_check_update_perm(base, NULL, c->perm, c->shared_perm,
+                               ignore_children, &local_err);
+        if (local_err) {
+            ret = -EPERM;
+            error_report_err(local_err);
             goto exit;
         }
-    }
+        g_slist_free(ignore_children);
 
-    bdrv_set_backing_hd(new_top_bs, base, &local_err);
-    if (local_err) {
-        ret = -EPERM;
-        error_report_err(local_err);
-        goto exit;
+        /* If so, update the backing file path in the image file */
+        if (c->role->update_filename) {
+            ret = c->role->update_filename(c, base, backing_file_str,
+                                           &local_err);
+            if (ret < 0) {
+                bdrv_abort_perm_update(base);
+                error_report_err(local_err);
+                goto exit;
+            }
+        }
+
+        /* Do the actual switch in the in-memory graph.
+         * Completes bdrv_check_update_perm() transaction internally. */
+        bdrv_ref(base);
+        bdrv_replace_child(c, base);
+        bdrv_unref(top);
     }
 
     ret = 0;
diff --git a/block/commit.c b/block/commit.c
index 8f0e83578a..610e1cd8f5 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -350,7 +350,7 @@  void commit_start(const char *job_id, BlockDriverState *bs,
         error_propagate(errp, local_err);
         goto fail;
     }
-    bdrv_set_backing_hd(overlay_bs, commit_top_bs, &local_err);
+    bdrv_replace_node(top, commit_top_bs, &local_err);
     if (local_err) {
         bdrv_unref(commit_top_bs);
         commit_top_bs = NULL;