[v6,04/22] block: Introduce image file locking

Message ID	1464943756-14143-5-git-send-email-famz@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> From: Fam Zheng <famz@redhat.com> To: qemu-devel@nongnu.org Date: Fri, 3 Jun 2016 16:48:58 +0800 Message-Id: <1464943756-14143-5-git-send-email-famz@redhat.com> In-Reply-To: <1464943756-14143-1-git-send-email-famz@redhat.com> References: <1464943756-14143-1-git-send-email-famz@redhat.com> Subject: [Qemu-devel] [PATCH v6 04/22] block: Introduce image file locking Precedence: list Cc: Kevin Wolf <kwolf@redhat.com>, qemu-block@nongnu.org, rjones@redhat.com, John Snow <jsnow@redhat.com>, Jeff Cody <jcody@redhat.com>, Markus Armbruster <armbru@redhat.com>, Max Reitz <mreitz@redhat.com>, stefanha@redhat.com, den@openvz.org, pbonzini@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Fam Zheng June 3, 2016, 8:48 a.m. UTC

Block drivers can implement this new operation .bdrv_lockf to actually lock the
image in the protocol specific way.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c                   | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h     | 11 ++++++++-
 include/block/block_int.h |  5 +++++
 3 files changed, 72 insertions(+), 1 deletion(-)

Jason Dillaman June 8, 2016, 1:51 a.m. UTC | #1

On Fri, Jun 3, 2016 at 4:48 AM, Fam Zheng <famz@redhat.com> wrote:
> +typedef enum {
> +    /* The values are ordered so that lower number implies higher restriction.
> +     * Starting from 1 to make 0 an invalid value.
> +     * */
> +    BDRV_LOCKF_EXCLUSIVE = 1,
> +    BDRV_LOCKF_SHARED,
> +    BDRV_LOCKF_UNLOCK,
> +} BdrvLockfCmd;
> +

We started to talk about new APIs in librbd to support this feature
where we don't need to worry about admin action should QEMU crash
while holding the lock.

Any chance for separating the UNLOCK enum into the exclusive vs shared
case? We could do some magic in the rbd block driver to guess how it
was locked but it seems like it would be cleaner (at least for us) to
explicitly call out what type of unlock you are requesting since it
will involve different API methods.

Fam Zheng June 8, 2016, 2:45 a.m. UTC | #2

On Tue, 06/07 21:51, Jason Dillaman wrote:
> On Fri, Jun 3, 2016 at 4:48 AM, Fam Zheng <famz@redhat.com> wrote:
> > +typedef enum {
> > +    /* The values are ordered so that lower number implies higher restriction.
> > +     * Starting from 1 to make 0 an invalid value.
> > +     * */
> > +    BDRV_LOCKF_EXCLUSIVE = 1,
> > +    BDRV_LOCKF_SHARED,
> > +    BDRV_LOCKF_UNLOCK,
> > +} BdrvLockfCmd;
> > +
> 
> We started to talk about new APIs in librbd to support this feature
> where we don't need to worry about admin action should QEMU crash
> while holding the lock.
> 
> Any chance for separating the UNLOCK enum into the exclusive vs shared
> case? We could do some magic in the rbd block driver to guess how it
> was locked but it seems like it would be cleaner (at least for us) to
> explicitly call out what type of unlock you are requesting since it
> will involve different API methods.

This should be possible but I'm not sure I fully understand the rationale
behind it. The server side who implements the lock and keeps track of states
should have the lock type information already, why is it necessary for the
client to be explicit? It doesn't sound necessary to me at all from an
interface point of view. Can you elaborate more on the API methods that need
this?

Fam

Kevin Wolf June 17, 2016, 11:34 a.m. UTC | #3

Am 03.06.2016 um 10:48 hat Fam Zheng geschrieben:
> Block drivers can implement this new operation .bdrv_lockf to actually lock the
> image in the protocol specific way.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c                   | 57 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/block/block.h     | 11 ++++++++-
>  include/block/block_int.h |  5 +++++
>  3 files changed, 72 insertions(+), 1 deletion(-)
> 
> diff --git a/block.c b/block.c
> index 736432f..4c2a3ff 100644
> --- a/block.c
> +++ b/block.c
> @@ -854,6 +854,50 @@ out:
>      g_free(gen_node_name);
>  }
>  
> +BdrvLockfCmd bdrv_get_locking_cmd(int flags)
> +{
> +    if (flags & BDRV_O_NO_LOCK) {
> +        return BDRV_LOCKF_UNLOCK;
> +    } else if (flags & BDRV_O_SHARED_LOCK) {
> +        return BDRV_LOCKF_SHARED;
> +    } else if (flags & BDRV_O_RDWR) {
> +        return BDRV_LOCKF_EXCLUSIVE;
> +    } else {
> +        return BDRV_LOCKF_SHARED;
> +    }
> +}

It feels a bit counterintuitive to use the very same operation for
"start of critical section, but don't actually lock" and "end of
critical section".

Is there a specific reason why you chose this instead of separate
.bdrv_lock/bdrv_unlock callbacks?

> +static int bdrv_lock_unlock_image_do(BlockDriverState *bs, BdrvLockfCmd cmd)
> +{
> +    int ret;
> +
> +    if (bs->cur_lock == cmd) {
> +        return 0;
> +    } else if (!bs->drv) {
> +        return -ENOMEDIUM;
> +    } else if (!bs->drv->bdrv_lockf) {
> +        return 0;
> +    }
> +    ret = bs->drv->bdrv_lockf(bs, cmd);
> +    if (ret == -ENOTSUP) {
> +        /* Handle it the same way as !bs->drv->bdrv_lockf */
> +        ret = 0;
> +    } else if (ret == 0) {
> +        bs->cur_lock = cmd;
> +    }
> +    return ret;
> +}

Okay, so the callback is supposed to support going from exclusive to
shared and vice versa? Noted for the next patches.

> +static int bdrv_lock_image(BlockDriverState *bs)
> +{
> +    return bdrv_lock_unlock_image_do(bs, bdrv_get_locking_cmd(bs->open_flags));
> +}
> +
> +static int bdrv_unlock_image(BlockDriverState *bs)
> +{
> +    return bdrv_lock_unlock_image_do(bs, BDRV_LOCKF_UNLOCK);
> +}
> +
>  static QemuOptsList bdrv_runtime_opts = {
>      .name = "bdrv_common",
>      .head = QTAILQ_HEAD_INITIALIZER(bdrv_runtime_opts.head),
> @@ -1003,6 +1047,14 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>          goto free_and_fail;
>      }
>  
> +    if (!(open_flags & (BDRV_O_NO_LOCK | BDRV_O_INACTIVE))) {
> +        ret = bdrv_lock_image(bs);
> +        if (ret) {
> +            error_setg(errp, "Failed to lock image");
> +            goto free_and_fail;
> +        }
> +    }
> +
>      ret = refresh_total_sectors(bs, bs->total_sectors);
>      if (ret < 0) {
>          error_setg_errno(errp, -ret, "Could not refresh total sector count");
> @@ -2164,6 +2216,7 @@ static void bdrv_close(BlockDriverState *bs)
>      if (bs->drv) {
>          BdrvChild *child, *next;
>  
> +        bdrv_unlock_image(bs);
>          bs->drv->bdrv_close(bs);
>          bs->drv = NULL;
>  
> @@ -3187,6 +3240,9 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
>          error_setg_errno(errp, -ret, "Could not refresh total sector count");
>          return;
>      }
> +    if (!(bs->open_flags & BDRV_O_NO_LOCK)) {
> +        bdrv_lock_image(bs);
> +    }
>  }

I think the if is unnecessary, bdrv_lock_image() already looks at
BDRV_O_NO_LOCK.

>  void bdrv_invalidate_cache_all(Error **errp)
> @@ -3229,6 +3285,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
>      }
>  
>      if (setting_flag) {
> +        ret = bdrv_unlock_image(bs);
>          bs->open_flags |= BDRV_O_INACTIVE;
>      }
>      return 0;

Kevin

Fam Zheng June 22, 2016, 7:23 a.m. UTC | #4

On Fri, 06/17 13:34, Kevin Wolf wrote:
> Am 03.06.2016 um 10:48 hat Fam Zheng geschrieben:
> > Block drivers can implement this new operation .bdrv_lockf to actually lock the
> > image in the protocol specific way.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  block.c                   | 57 +++++++++++++++++++++++++++++++++++++++++++++++
> >  include/block/block.h     | 11 ++++++++-
> >  include/block/block_int.h |  5 +++++
> >  3 files changed, 72 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block.c b/block.c
> > index 736432f..4c2a3ff 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -854,6 +854,50 @@ out:
> >      g_free(gen_node_name);
> >  }
> >  
> > +BdrvLockfCmd bdrv_get_locking_cmd(int flags)
> > +{
> > +    if (flags & BDRV_O_NO_LOCK) {
> > +        return BDRV_LOCKF_UNLOCK;
> > +    } else if (flags & BDRV_O_SHARED_LOCK) {
> > +        return BDRV_LOCKF_SHARED;
> > +    } else if (flags & BDRV_O_RDWR) {
> > +        return BDRV_LOCKF_EXCLUSIVE;
> > +    } else {
> > +        return BDRV_LOCKF_SHARED;
> > +    }
> > +}
> 
> It feels a bit counterintuitive to use the very same operation for
> "start of critical section, but don't actually lock" and "end of
> critical section".
> 
> Is there a specific reason why you chose this instead of separate
> .bdrv_lock/bdrv_unlock callbacks?

Because unlike open(2)/close(2), locking and unlocking are typically
implemented with one parameterized operation (fcntl(2)) underneath, I followed
that naturally.

> 
> > +static int bdrv_lock_unlock_image_do(BlockDriverState *bs, BdrvLockfCmd cmd)
> > +{
> > +    int ret;
> > +
> > +    if (bs->cur_lock == cmd) {
> > +        return 0;
> > +    } else if (!bs->drv) {
> > +        return -ENOMEDIUM;
> > +    } else if (!bs->drv->bdrv_lockf) {
> > +        return 0;
> > +    }
> > +    ret = bs->drv->bdrv_lockf(bs, cmd);
> > +    if (ret == -ENOTSUP) {
> > +        /* Handle it the same way as !bs->drv->bdrv_lockf */
> > +        ret = 0;
> > +    } else if (ret == 0) {
> > +        bs->cur_lock = cmd;
> > +    }
> > +    return ret;
> > +}
> 
> Okay, so the callback is supposed to support going from exclusive to
> shared and vice versa? Noted for the next patches.

Yes.

> 
> > +static int bdrv_lock_image(BlockDriverState *bs)
> > +{
> > +    return bdrv_lock_unlock_image_do(bs, bdrv_get_locking_cmd(bs->open_flags));
> > +}
> > +
> > +static int bdrv_unlock_image(BlockDriverState *bs)
> > +{
> > +    return bdrv_lock_unlock_image_do(bs, BDRV_LOCKF_UNLOCK);
> > +}
> > +
> >  static QemuOptsList bdrv_runtime_opts = {
> >      .name = "bdrv_common",
> >      .head = QTAILQ_HEAD_INITIALIZER(bdrv_runtime_opts.head),
> > @@ -1003,6 +1047,14 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
> >          goto free_and_fail;
> >      }
> >  
> > +    if (!(open_flags & (BDRV_O_NO_LOCK | BDRV_O_INACTIVE))) {
> > +        ret = bdrv_lock_image(bs);
> > +        if (ret) {
> > +            error_setg(errp, "Failed to lock image");
> > +            goto free_and_fail;
> > +        }
> > +    }
> > +
> >      ret = refresh_total_sectors(bs, bs->total_sectors);
> >      if (ret < 0) {
> >          error_setg_errno(errp, -ret, "Could not refresh total sector count");
> > @@ -2164,6 +2216,7 @@ static void bdrv_close(BlockDriverState *bs)
> >      if (bs->drv) {
> >          BdrvChild *child, *next;
> >  
> > +        bdrv_unlock_image(bs);
> >          bs->drv->bdrv_close(bs);
> >          bs->drv = NULL;
> >  
> > @@ -3187,6 +3240,9 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
> >          error_setg_errno(errp, -ret, "Could not refresh total sector count");
> >          return;
> >      }
> > +    if (!(bs->open_flags & BDRV_O_NO_LOCK)) {
> > +        bdrv_lock_image(bs);
> > +    }
> >  }
> 
> I think the if is unnecessary, bdrv_lock_image() already looks at
> BDRV_O_NO_LOCK.

I intentionally made enum BDRV_O_LOCK_* start from non-zero, so bdrv_lock_image
will call drv->bdrv_lockf even for BDRV_O_NO_LOCK. In the case of
lock-mode=off, I think we should skip that.

> 
> >  void bdrv_invalidate_cache_all(Error **errp)
> > @@ -3229,6 +3285,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
> >      }
> >  
> >      if (setting_flag) {
> > +        ret = bdrv_unlock_image(bs);
> >          bs->open_flags |= BDRV_O_INACTIVE;
> >      }
> >      return 0;
> 
> Kevin

Fam

Kevin Wolf June 22, 2016, 8:30 a.m. UTC | #5

Am 22.06.2016 um 09:23 hat Fam Zheng geschrieben:
> On Fri, 06/17 13:34, Kevin Wolf wrote:
> > Am 03.06.2016 um 10:48 hat Fam Zheng geschrieben:
> > > Block drivers can implement this new operation .bdrv_lockf to actually lock the
> > > image in the protocol specific way.
> > > 
> > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > ---
> > >  block.c                   | 57 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/block/block.h     | 11 ++++++++-
> > >  include/block/block_int.h |  5 +++++
> > >  3 files changed, 72 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block.c b/block.c
> > > index 736432f..4c2a3ff 100644
> > > --- a/block.c
> > > +++ b/block.c
> > > @@ -854,6 +854,50 @@ out:
> > >      g_free(gen_node_name);
> > >  }
> > >  
> > > +BdrvLockfCmd bdrv_get_locking_cmd(int flags)
> > > +{
> > > +    if (flags & BDRV_O_NO_LOCK) {
> > > +        return BDRV_LOCKF_UNLOCK;
> > > +    } else if (flags & BDRV_O_SHARED_LOCK) {
> > > +        return BDRV_LOCKF_SHARED;
> > > +    } else if (flags & BDRV_O_RDWR) {
> > > +        return BDRV_LOCKF_EXCLUSIVE;
> > > +    } else {
> > > +        return BDRV_LOCKF_SHARED;
> > > +    }
> > > +}
> > 
> > It feels a bit counterintuitive to use the very same operation for
> > "start of critical section, but don't actually lock" and "end of
> > critical section".
> > 
> > Is there a specific reason why you chose this instead of separate
> > .bdrv_lock/bdrv_unlock callbacks?
> 
> Because unlike open(2)/close(2), locking and unlocking are typically
> implemented with one parameterized operation (fcntl(2)) underneath, I followed
> that naturally.

Is it really "typically", or is this an idiosyncrasy of POSIX and we end
up modeling our interface for one particular block driver even though
for others a different interface might be preferable? Did you look at
more interfaces?

Apart from that, we generally make our internal interfaces so that they
are easy to use and code is easy to read rather than directly mapping
POSIX everywhere. For example, our I/O function never do short
reads/writes even though POSIX does that.

> > > @@ -3187,6 +3240,9 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
> > >          error_setg_errno(errp, -ret, "Could not refresh total sector count");
> > >          return;
> > >      }
> > > +    if (!(bs->open_flags & BDRV_O_NO_LOCK)) {
> > > +        bdrv_lock_image(bs);
> > > +    }
> > >  }
> > 
> > I think the if is unnecessary, bdrv_lock_image() already looks at
> > BDRV_O_NO_LOCK.
> 
> I intentionally made enum BDRV_O_LOCK_* start from non-zero, so bdrv_lock_image
> will call drv->bdrv_lockf even for BDRV_O_NO_LOCK. In the case of
> lock-mode=off, I think we should skip that.

Fair enough.

Kevin

Fam Zheng June 22, 2016, 9:04 a.m. UTC | #6

On Wed, 06/22 10:30, Kevin Wolf wrote:
> Am 22.06.2016 um 09:23 hat Fam Zheng geschrieben:
> > On Fri, 06/17 13:34, Kevin Wolf wrote:
> > > Am 03.06.2016 um 10:48 hat Fam Zheng geschrieben:
> > > > Block drivers can implement this new operation .bdrv_lockf to actually lock the
> > > > image in the protocol specific way.
> > > > 
> > > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > > ---
> > > >  block.c                   | 57 +++++++++++++++++++++++++++++++++++++++++++++++
> > > >  include/block/block.h     | 11 ++++++++-
> > > >  include/block/block_int.h |  5 +++++
> > > >  3 files changed, 72 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/block.c b/block.c
> > > > index 736432f..4c2a3ff 100644
> > > > --- a/block.c
> > > > +++ b/block.c
> > > > @@ -854,6 +854,50 @@ out:
> > > >      g_free(gen_node_name);
> > > >  }
> > > >  
> > > > +BdrvLockfCmd bdrv_get_locking_cmd(int flags)
> > > > +{
> > > > +    if (flags & BDRV_O_NO_LOCK) {
> > > > +        return BDRV_LOCKF_UNLOCK;
> > > > +    } else if (flags & BDRV_O_SHARED_LOCK) {
> > > > +        return BDRV_LOCKF_SHARED;
> > > > +    } else if (flags & BDRV_O_RDWR) {
> > > > +        return BDRV_LOCKF_EXCLUSIVE;
> > > > +    } else {
> > > > +        return BDRV_LOCKF_SHARED;
> > > > +    }
> > > > +}
> > > 
> > > It feels a bit counterintuitive to use the very same operation for
> > > "start of critical section, but don't actually lock" and "end of
> > > critical section".
> > > 
> > > Is there a specific reason why you chose this instead of separate
> > > .bdrv_lock/bdrv_unlock callbacks?
> > 
> > Because unlike open(2)/close(2), locking and unlocking are typically
> > implemented with one parameterized operation (fcntl(2)) underneath, I followed
> > that naturally.
> 
> Is it really "typically", or is this an idiosyncrasy of POSIX and we end
> up modeling our interface for one particular block driver even though
> for others a different interface might be preferable? Did you look at
> more interfaces?

Yes I did.

RBD has rbd_lock_exclusive, rbd_lock_shared and rbd_unlock; Windows have
LockFile and UnlockFile.

But I was also thinking about flock(2), lockf(3) and glfs_posix_lock(). None of
them decided to use a different interface from posix, and I really have had no
personal preference on this.

Either way, we'd have to adapt one style to another by the day when we support
locking in more drivers. At this point the one first implementation, raw-posix,
fits fine.

Back to your intuition concern, I think there is no "start critical section,
but don't actually lock" semantic in this series, the block layer won't call
the driver function unless locking is needed.

> 
> Apart from that, we generally make our internal interfaces so that they
> are easy to use and code is easy to read rather than directly mapping
> POSIX everywhere. For example, our I/O function never do short
> reads/writes even though POSIX does that.

This is kinda of personal preference but as I am not insisting on either side,
we can go the other way if you don't like it.

Fam

[v6,04/22] block: Introduce image file locking

Commit Message

Comments

Patch