Message ID | 1464943756-14143-5-git-send-email-famz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Jun 3, 2016 at 4:48 AM, Fam Zheng <famz@redhat.com> wrote: > +typedef enum { > + /* The values are ordered so that lower number implies higher restriction. > + * Starting from 1 to make 0 an invalid value. > + * */ > + BDRV_LOCKF_EXCLUSIVE = 1, > + BDRV_LOCKF_SHARED, > + BDRV_LOCKF_UNLOCK, > +} BdrvLockfCmd; > + We started to talk about new APIs in librbd to support this feature where we don't need to worry about admin action should QEMU crash while holding the lock. Any chance for separating the UNLOCK enum into the exclusive vs shared case? We could do some magic in the rbd block driver to guess how it was locked but it seems like it would be cleaner (at least for us) to explicitly call out what type of unlock you are requesting since it will involve different API methods.
On Tue, 06/07 21:51, Jason Dillaman wrote: > On Fri, Jun 3, 2016 at 4:48 AM, Fam Zheng <famz@redhat.com> wrote: > > +typedef enum { > > + /* The values are ordered so that lower number implies higher restriction. > > + * Starting from 1 to make 0 an invalid value. > > + * */ > > + BDRV_LOCKF_EXCLUSIVE = 1, > > + BDRV_LOCKF_SHARED, > > + BDRV_LOCKF_UNLOCK, > > +} BdrvLockfCmd; > > + > > We started to talk about new APIs in librbd to support this feature > where we don't need to worry about admin action should QEMU crash > while holding the lock. > > Any chance for separating the UNLOCK enum into the exclusive vs shared > case? We could do some magic in the rbd block driver to guess how it > was locked but it seems like it would be cleaner (at least for us) to > explicitly call out what type of unlock you are requesting since it > will involve different API methods. This should be possible but I'm not sure I fully understand the rationale behind it. The server side who implements the lock and keeps track of states should have the lock type information already, why is it necessary for the client to be explicit? It doesn't sound necessary to me at all from an interface point of view. Can you elaborate more on the API methods that need this? Fam
Am 03.06.2016 um 10:48 hat Fam Zheng geschrieben: > Block drivers can implement this new operation .bdrv_lockf to actually lock the > image in the protocol specific way. > > Signed-off-by: Fam Zheng <famz@redhat.com> > --- > block.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++ > include/block/block.h | 11 ++++++++- > include/block/block_int.h | 5 +++++ > 3 files changed, 72 insertions(+), 1 deletion(-) > > diff --git a/block.c b/block.c > index 736432f..4c2a3ff 100644 > --- a/block.c > +++ b/block.c > @@ -854,6 +854,50 @@ out: > g_free(gen_node_name); > } > > +BdrvLockfCmd bdrv_get_locking_cmd(int flags) > +{ > + if (flags & BDRV_O_NO_LOCK) { > + return BDRV_LOCKF_UNLOCK; > + } else if (flags & BDRV_O_SHARED_LOCK) { > + return BDRV_LOCKF_SHARED; > + } else if (flags & BDRV_O_RDWR) { > + return BDRV_LOCKF_EXCLUSIVE; > + } else { > + return BDRV_LOCKF_SHARED; > + } > +} It feels a bit counterintuitive to use the very same operation for "start of critical section, but don't actually lock" and "end of critical section". Is there a specific reason why you chose this instead of separate .bdrv_lock/bdrv_unlock callbacks? > +static int bdrv_lock_unlock_image_do(BlockDriverState *bs, BdrvLockfCmd cmd) > +{ > + int ret; > + > + if (bs->cur_lock == cmd) { > + return 0; > + } else if (!bs->drv) { > + return -ENOMEDIUM; > + } else if (!bs->drv->bdrv_lockf) { > + return 0; > + } > + ret = bs->drv->bdrv_lockf(bs, cmd); > + if (ret == -ENOTSUP) { > + /* Handle it the same way as !bs->drv->bdrv_lockf */ > + ret = 0; > + } else if (ret == 0) { > + bs->cur_lock = cmd; > + } > + return ret; > +} Okay, so the callback is supposed to support going from exclusive to shared and vice versa? Noted for the next patches. > +static int bdrv_lock_image(BlockDriverState *bs) > +{ > + return bdrv_lock_unlock_image_do(bs, bdrv_get_locking_cmd(bs->open_flags)); > +} > + > +static int bdrv_unlock_image(BlockDriverState *bs) > +{ > + return bdrv_lock_unlock_image_do(bs, BDRV_LOCKF_UNLOCK); > +} > + > static QemuOptsList bdrv_runtime_opts = { > .name = "bdrv_common", > .head = QTAILQ_HEAD_INITIALIZER(bdrv_runtime_opts.head), > @@ -1003,6 +1047,14 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file, > goto free_and_fail; > } > > + if (!(open_flags & (BDRV_O_NO_LOCK | BDRV_O_INACTIVE))) { > + ret = bdrv_lock_image(bs); > + if (ret) { > + error_setg(errp, "Failed to lock image"); > + goto free_and_fail; > + } > + } > + > ret = refresh_total_sectors(bs, bs->total_sectors); > if (ret < 0) { > error_setg_errno(errp, -ret, "Could not refresh total sector count"); > @@ -2164,6 +2216,7 @@ static void bdrv_close(BlockDriverState *bs) > if (bs->drv) { > BdrvChild *child, *next; > > + bdrv_unlock_image(bs); > bs->drv->bdrv_close(bs); > bs->drv = NULL; > > @@ -3187,6 +3240,9 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp) > error_setg_errno(errp, -ret, "Could not refresh total sector count"); > return; > } > + if (!(bs->open_flags & BDRV_O_NO_LOCK)) { > + bdrv_lock_image(bs); > + } > } I think the if is unnecessary, bdrv_lock_image() already looks at BDRV_O_NO_LOCK. > void bdrv_invalidate_cache_all(Error **errp) > @@ -3229,6 +3285,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs, > } > > if (setting_flag) { > + ret = bdrv_unlock_image(bs); > bs->open_flags |= BDRV_O_INACTIVE; > } > return 0; Kevin
On Fri, 06/17 13:34, Kevin Wolf wrote: > Am 03.06.2016 um 10:48 hat Fam Zheng geschrieben: > > Block drivers can implement this new operation .bdrv_lockf to actually lock the > > image in the protocol specific way. > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > --- > > block.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++ > > include/block/block.h | 11 ++++++++- > > include/block/block_int.h | 5 +++++ > > 3 files changed, 72 insertions(+), 1 deletion(-) > > > > diff --git a/block.c b/block.c > > index 736432f..4c2a3ff 100644 > > --- a/block.c > > +++ b/block.c > > @@ -854,6 +854,50 @@ out: > > g_free(gen_node_name); > > } > > > > +BdrvLockfCmd bdrv_get_locking_cmd(int flags) > > +{ > > + if (flags & BDRV_O_NO_LOCK) { > > + return BDRV_LOCKF_UNLOCK; > > + } else if (flags & BDRV_O_SHARED_LOCK) { > > + return BDRV_LOCKF_SHARED; > > + } else if (flags & BDRV_O_RDWR) { > > + return BDRV_LOCKF_EXCLUSIVE; > > + } else { > > + return BDRV_LOCKF_SHARED; > > + } > > +} > > It feels a bit counterintuitive to use the very same operation for > "start of critical section, but don't actually lock" and "end of > critical section". > > Is there a specific reason why you chose this instead of separate > .bdrv_lock/bdrv_unlock callbacks? Because unlike open(2)/close(2), locking and unlocking are typically implemented with one parameterized operation (fcntl(2)) underneath, I followed that naturally. > > > +static int bdrv_lock_unlock_image_do(BlockDriverState *bs, BdrvLockfCmd cmd) > > +{ > > + int ret; > > + > > + if (bs->cur_lock == cmd) { > > + return 0; > > + } else if (!bs->drv) { > > + return -ENOMEDIUM; > > + } else if (!bs->drv->bdrv_lockf) { > > + return 0; > > + } > > + ret = bs->drv->bdrv_lockf(bs, cmd); > > + if (ret == -ENOTSUP) { > > + /* Handle it the same way as !bs->drv->bdrv_lockf */ > > + ret = 0; > > + } else if (ret == 0) { > > + bs->cur_lock = cmd; > > + } > > + return ret; > > +} > > Okay, so the callback is supposed to support going from exclusive to > shared and vice versa? Noted for the next patches. Yes. > > > +static int bdrv_lock_image(BlockDriverState *bs) > > +{ > > + return bdrv_lock_unlock_image_do(bs, bdrv_get_locking_cmd(bs->open_flags)); > > +} > > + > > +static int bdrv_unlock_image(BlockDriverState *bs) > > +{ > > + return bdrv_lock_unlock_image_do(bs, BDRV_LOCKF_UNLOCK); > > +} > > + > > static QemuOptsList bdrv_runtime_opts = { > > .name = "bdrv_common", > > .head = QTAILQ_HEAD_INITIALIZER(bdrv_runtime_opts.head), > > @@ -1003,6 +1047,14 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file, > > goto free_and_fail; > > } > > > > + if (!(open_flags & (BDRV_O_NO_LOCK | BDRV_O_INACTIVE))) { > > + ret = bdrv_lock_image(bs); > > + if (ret) { > > + error_setg(errp, "Failed to lock image"); > > + goto free_and_fail; > > + } > > + } > > + > > ret = refresh_total_sectors(bs, bs->total_sectors); > > if (ret < 0) { > > error_setg_errno(errp, -ret, "Could not refresh total sector count"); > > @@ -2164,6 +2216,7 @@ static void bdrv_close(BlockDriverState *bs) > > if (bs->drv) { > > BdrvChild *child, *next; > > > > + bdrv_unlock_image(bs); > > bs->drv->bdrv_close(bs); > > bs->drv = NULL; > > > > @@ -3187,6 +3240,9 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp) > > error_setg_errno(errp, -ret, "Could not refresh total sector count"); > > return; > > } > > + if (!(bs->open_flags & BDRV_O_NO_LOCK)) { > > + bdrv_lock_image(bs); > > + } > > } > > I think the if is unnecessary, bdrv_lock_image() already looks at > BDRV_O_NO_LOCK. I intentionally made enum BDRV_O_LOCK_* start from non-zero, so bdrv_lock_image will call drv->bdrv_lockf even for BDRV_O_NO_LOCK. In the case of lock-mode=off, I think we should skip that. > > > void bdrv_invalidate_cache_all(Error **errp) > > @@ -3229,6 +3285,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs, > > } > > > > if (setting_flag) { > > + ret = bdrv_unlock_image(bs); > > bs->open_flags |= BDRV_O_INACTIVE; > > } > > return 0; > > Kevin Fam
Am 22.06.2016 um 09:23 hat Fam Zheng geschrieben: > On Fri, 06/17 13:34, Kevin Wolf wrote: > > Am 03.06.2016 um 10:48 hat Fam Zheng geschrieben: > > > Block drivers can implement this new operation .bdrv_lockf to actually lock the > > > image in the protocol specific way. > > > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > > --- > > > block.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++ > > > include/block/block.h | 11 ++++++++- > > > include/block/block_int.h | 5 +++++ > > > 3 files changed, 72 insertions(+), 1 deletion(-) > > > > > > diff --git a/block.c b/block.c > > > index 736432f..4c2a3ff 100644 > > > --- a/block.c > > > +++ b/block.c > > > @@ -854,6 +854,50 @@ out: > > > g_free(gen_node_name); > > > } > > > > > > +BdrvLockfCmd bdrv_get_locking_cmd(int flags) > > > +{ > > > + if (flags & BDRV_O_NO_LOCK) { > > > + return BDRV_LOCKF_UNLOCK; > > > + } else if (flags & BDRV_O_SHARED_LOCK) { > > > + return BDRV_LOCKF_SHARED; > > > + } else if (flags & BDRV_O_RDWR) { > > > + return BDRV_LOCKF_EXCLUSIVE; > > > + } else { > > > + return BDRV_LOCKF_SHARED; > > > + } > > > +} > > > > It feels a bit counterintuitive to use the very same operation for > > "start of critical section, but don't actually lock" and "end of > > critical section". > > > > Is there a specific reason why you chose this instead of separate > > .bdrv_lock/bdrv_unlock callbacks? > > Because unlike open(2)/close(2), locking and unlocking are typically > implemented with one parameterized operation (fcntl(2)) underneath, I followed > that naturally. Is it really "typically", or is this an idiosyncrasy of POSIX and we end up modeling our interface for one particular block driver even though for others a different interface might be preferable? Did you look at more interfaces? Apart from that, we generally make our internal interfaces so that they are easy to use and code is easy to read rather than directly mapping POSIX everywhere. For example, our I/O function never do short reads/writes even though POSIX does that. > > > @@ -3187,6 +3240,9 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp) > > > error_setg_errno(errp, -ret, "Could not refresh total sector count"); > > > return; > > > } > > > + if (!(bs->open_flags & BDRV_O_NO_LOCK)) { > > > + bdrv_lock_image(bs); > > > + } > > > } > > > > I think the if is unnecessary, bdrv_lock_image() already looks at > > BDRV_O_NO_LOCK. > > I intentionally made enum BDRV_O_LOCK_* start from non-zero, so bdrv_lock_image > will call drv->bdrv_lockf even for BDRV_O_NO_LOCK. In the case of > lock-mode=off, I think we should skip that. Fair enough. Kevin
On Wed, 06/22 10:30, Kevin Wolf wrote: > Am 22.06.2016 um 09:23 hat Fam Zheng geschrieben: > > On Fri, 06/17 13:34, Kevin Wolf wrote: > > > Am 03.06.2016 um 10:48 hat Fam Zheng geschrieben: > > > > Block drivers can implement this new operation .bdrv_lockf to actually lock the > > > > image in the protocol specific way. > > > > > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > > > --- > > > > block.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++ > > > > include/block/block.h | 11 ++++++++- > > > > include/block/block_int.h | 5 +++++ > > > > 3 files changed, 72 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/block.c b/block.c > > > > index 736432f..4c2a3ff 100644 > > > > --- a/block.c > > > > +++ b/block.c > > > > @@ -854,6 +854,50 @@ out: > > > > g_free(gen_node_name); > > > > } > > > > > > > > +BdrvLockfCmd bdrv_get_locking_cmd(int flags) > > > > +{ > > > > + if (flags & BDRV_O_NO_LOCK) { > > > > + return BDRV_LOCKF_UNLOCK; > > > > + } else if (flags & BDRV_O_SHARED_LOCK) { > > > > + return BDRV_LOCKF_SHARED; > > > > + } else if (flags & BDRV_O_RDWR) { > > > > + return BDRV_LOCKF_EXCLUSIVE; > > > > + } else { > > > > + return BDRV_LOCKF_SHARED; > > > > + } > > > > +} > > > > > > It feels a bit counterintuitive to use the very same operation for > > > "start of critical section, but don't actually lock" and "end of > > > critical section". > > > > > > Is there a specific reason why you chose this instead of separate > > > .bdrv_lock/bdrv_unlock callbacks? > > > > Because unlike open(2)/close(2), locking and unlocking are typically > > implemented with one parameterized operation (fcntl(2)) underneath, I followed > > that naturally. > > Is it really "typically", or is this an idiosyncrasy of POSIX and we end > up modeling our interface for one particular block driver even though > for others a different interface might be preferable? Did you look at > more interfaces? Yes I did. RBD has rbd_lock_exclusive, rbd_lock_shared and rbd_unlock; Windows have LockFile and UnlockFile. But I was also thinking about flock(2), lockf(3) and glfs_posix_lock(). None of them decided to use a different interface from posix, and I really have had no personal preference on this. Either way, we'd have to adapt one style to another by the day when we support locking in more drivers. At this point the one first implementation, raw-posix, fits fine. Back to your intuition concern, I think there is no "start critical section, but don't actually lock" semantic in this series, the block layer won't call the driver function unless locking is needed. > > Apart from that, we generally make our internal interfaces so that they > are easy to use and code is easy to read rather than directly mapping > POSIX everywhere. For example, our I/O function never do short > reads/writes even though POSIX does that. This is kinda of personal preference but as I am not insisting on either side, we can go the other way if you don't like it. Fam
diff --git a/block.c b/block.c index 736432f..4c2a3ff 100644 --- a/block.c +++ b/block.c @@ -854,6 +854,50 @@ out: g_free(gen_node_name); } +BdrvLockfCmd bdrv_get_locking_cmd(int flags) +{ + if (flags & BDRV_O_NO_LOCK) { + return BDRV_LOCKF_UNLOCK; + } else if (flags & BDRV_O_SHARED_LOCK) { + return BDRV_LOCKF_SHARED; + } else if (flags & BDRV_O_RDWR) { + return BDRV_LOCKF_EXCLUSIVE; + } else { + return BDRV_LOCKF_SHARED; + } +} + +static int bdrv_lock_unlock_image_do(BlockDriverState *bs, BdrvLockfCmd cmd) +{ + int ret; + + if (bs->cur_lock == cmd) { + return 0; + } else if (!bs->drv) { + return -ENOMEDIUM; + } else if (!bs->drv->bdrv_lockf) { + return 0; + } + ret = bs->drv->bdrv_lockf(bs, cmd); + if (ret == -ENOTSUP) { + /* Handle it the same way as !bs->drv->bdrv_lockf */ + ret = 0; + } else if (ret == 0) { + bs->cur_lock = cmd; + } + return ret; +} + +static int bdrv_lock_image(BlockDriverState *bs) +{ + return bdrv_lock_unlock_image_do(bs, bdrv_get_locking_cmd(bs->open_flags)); +} + +static int bdrv_unlock_image(BlockDriverState *bs) +{ + return bdrv_lock_unlock_image_do(bs, BDRV_LOCKF_UNLOCK); +} + static QemuOptsList bdrv_runtime_opts = { .name = "bdrv_common", .head = QTAILQ_HEAD_INITIALIZER(bdrv_runtime_opts.head), @@ -1003,6 +1047,14 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file, goto free_and_fail; } + if (!(open_flags & (BDRV_O_NO_LOCK | BDRV_O_INACTIVE))) { + ret = bdrv_lock_image(bs); + if (ret) { + error_setg(errp, "Failed to lock image"); + goto free_and_fail; + } + } + ret = refresh_total_sectors(bs, bs->total_sectors); if (ret < 0) { error_setg_errno(errp, -ret, "Could not refresh total sector count"); @@ -2164,6 +2216,7 @@ static void bdrv_close(BlockDriverState *bs) if (bs->drv) { BdrvChild *child, *next; + bdrv_unlock_image(bs); bs->drv->bdrv_close(bs); bs->drv = NULL; @@ -3187,6 +3240,9 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp) error_setg_errno(errp, -ret, "Could not refresh total sector count"); return; } + if (!(bs->open_flags & BDRV_O_NO_LOCK)) { + bdrv_lock_image(bs); + } } void bdrv_invalidate_cache_all(Error **errp) @@ -3229,6 +3285,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs, } if (setting_flag) { + ret = bdrv_unlock_image(bs); bs->open_flags |= BDRV_O_INACTIVE; } return 0; diff --git a/include/block/block.h b/include/block/block.h index 02b598b..782ef2f 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -183,6 +183,15 @@ typedef enum BlockOpType { BLOCK_OP_TYPE_MAX, } BlockOpType; +typedef enum { + /* The values are ordered so that lower number implies higher restriction. + * Starting from 1 to make 0 an invalid value. + * */ + BDRV_LOCKF_EXCLUSIVE = 1, + BDRV_LOCKF_SHARED, + BDRV_LOCKF_UNLOCK, +} BdrvLockfCmd; + void bdrv_info_print(Monitor *mon, const QObject *data); void bdrv_info(Monitor *mon, QObject **ret_data); void bdrv_stats_print(Monitor *mon, const QObject *data); @@ -274,7 +283,7 @@ int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top, BlockDriverState *bdrv_find_overlay(BlockDriverState *active, BlockDriverState *bs); BlockDriverState *bdrv_find_base(BlockDriverState *bs); - +BdrvLockfCmd bdrv_get_locking_cmd(int flags); typedef struct BdrvCheckResult { int corruptions; diff --git a/include/block/block_int.h b/include/block/block_int.h index 30a9717..29cc632 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -317,6 +317,10 @@ struct BlockDriver { Error **errp); void (*bdrv_del_child)(BlockDriverState *parent, BdrvChild *child, Error **errp); + /** + * Lock/unlock the image. + */ + int (*bdrv_lockf)(BlockDriverState *bs, BdrvLockfCmd cmd); QLIST_ENTRY(BlockDriver) list; }; @@ -500,6 +504,7 @@ struct BlockDriverState { unsigned io_plug_disabled; int quiesce_counter; + BdrvLockfCmd cur_lock; }; struct BlockBackendRootState {
Block drivers can implement this new operation .bdrv_lockf to actually lock the image in the protocol specific way. Signed-off-by: Fam Zheng <famz@redhat.com> --- block.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++ include/block/block.h | 11 ++++++++- include/block/block_int.h | 5 +++++ 3 files changed, 72 insertions(+), 1 deletion(-)