| Message ID | 1460690887-32751-6-git-send-email-famz@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
On 04/15/2016 06:27 AM, Fam Zheng wrote: > virtlockd in libvirt locks the first byte, we lock byte 1 to avoid > the intervene. > > Suggested-by: "Daniel P. Berrange" <berrange@redhat.com> > Signed-off-by: Fam Zheng <famz@redhat.com> > --- > block/raw-posix.c | 35 +++++++++++++++++++++++++++++++++++ > 1 file changed, 35 insertions(+) > > diff --git a/block/raw-posix.c b/block/raw-posix.c > index 906d5c9..3a2c17f 100644 > --- a/block/raw-posix.c > +++ b/block/raw-posix.c > @@ -35,6 +35,7 @@ > #include "raw-aio.h" > #include "qapi/util.h" > #include "qapi/qmp/qstring.h" > +#include "glib.h" > > #if defined(__APPLE__) && (__MACH__) > #include <paths.h> > @@ -397,6 +398,38 @@ static void raw_attach_aio_context(BlockDriverState *bs, > #endif > } > > +static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd) > +{ > + > + BDRVRawState *s = bs->opaque; > + int ret; > + struct flock fl = (struct flock) { > + .l_whence = SEEK_SET, > + /* Locking byte 1 avoids interfereing with virtlockd. */ > + .l_start = 1, > + .l_len = 1, > + }; > + > + switch (cmd) { > + case BDRV_LOCKF_RWLOCK: > + fl.l_type = F_WRLCK; > + break; > + case BDRV_LOCKF_ROLOCK: > + fl.l_type = F_RDLCK; > + break; > + case BDRV_LOCKF_UNLOCK: > + fl.l_type = F_UNLCK; > + break; > + default: > + abort(); > + } > + ret = fcntl(s->fd, F_SETLK, &fl); > + if (ret) { > + ret = -errno; > + } > + return ret; > +} > + > #ifdef CONFIG_LINUX_AIO > static int raw_set_aio(void **aio_ctx, int *use_aio, int bdrv_flags) > { > @@ -1960,6 +1993,8 @@ BlockDriver bdrv_file = { > .bdrv_detach_aio_context = raw_detach_aio_context, > .bdrv_attach_aio_context = raw_attach_aio_context, > > + .bdrv_lockf = raw_lockf, > + > .create_opts = &raw_create_opts, > }; > would it be better to use int flock(int fd, int operation); DESCRIPTION Apply or remove an advisory lock on the open file specified by fd. The argument operation is one of the following: LOCK_SH Place a shared lock. More than one process may hold a shared lock for a given file at a given time. LOCK_EX Place an exclusive lock. Only one process may hold an exclusive lock for a given file at a given time. LOCK_UN Remove an existing lock held by this process. for this purpose? Sorry that missed this point in the initial review... We will not intersect with libvirt for the case. Den
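[Editor's note: for comparison, below is a minimal sketch of what the flock()-based variant Denis suggests could look like. It is hypothetical and not part of the posted patch; BlockDriverState, BDRVRawState and BdrvLockfCmd are the types used elsewhere in the series, and raw_flock is an invented name.]

```c
/* Hypothetical flock()-based variant of raw_lockf(), shown only to
 * compare with the fcntl() version in the patch. */
#include <sys/file.h>
#include <errno.h>

static int raw_flock(BlockDriverState *bs, BdrvLockfCmd cmd)
{
    BDRVRawState *s = bs->opaque;
    int op;

    switch (cmd) {
    case BDRV_LOCKF_RWLOCK:
        op = LOCK_EX;   /* exclusive: a single writer */
        break;
    case BDRV_LOCKF_ROLOCK:
        op = LOCK_SH;   /* shared: multiple readers */
        break;
    case BDRV_LOCKF_UNLOCK:
        op = LOCK_UN;
        break;
    default:
        abort();
    }
    /* LOCK_NB keeps the call non-blocking, mirroring F_SETLK. */
    if (flock(s->fd, op | LOCK_NB) < 0) {
        return -errno;
    }
    return 0;
}
```

Unlike the fcntl() version, flock() always covers the whole file, so it cannot be restricted to byte 1 to stay out of virtlockd's way.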
On Fri, Apr 15, 2016 at 11:27:55AM +0800, Fam Zheng wrote: > virtlockd in libvirt locks the first byte, we lock byte 1 to avoid > the intervene. > +static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd) > +{ > + > + BDRVRawState *s = bs->opaque; > + int ret; > + struct flock fl = (struct flock) { > + .l_whence = SEEK_SET, > + /* Locking byte 1 avoids interfereing with virtlockd. */ > + .l_start = 1, > + .l_len = 1, > + }; > + > + switch (cmd) { > + case BDRV_LOCKF_RWLOCK: > + fl.l_type = F_WRLCK; > + break; > + case BDRV_LOCKF_ROLOCK: > + fl.l_type = F_RDLCK; > + break; > + case BDRV_LOCKF_UNLOCK: > + fl.l_type = F_UNLCK; > + break; My understanding is this prevents libguestfs from working on live disk images -- we want to be able to read live disk images (using a writable overlay and the real disk image as a read-only backing file). So please don't do this; or suggest how we can ignore the lock for backing files. Rich.
On Sun, 04/17 20:27, Richard W.M. Jones wrote: > On Fri, Apr 15, 2016 at 11:27:55AM +0800, Fam Zheng wrote: > > virtlockd in libvirt locks the first byte, we lock byte 1 to avoid > > the intervene. > > +static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd) > > +{ > > + > > + BDRVRawState *s = bs->opaque; > > + int ret; > > + struct flock fl = (struct flock) { > > + .l_whence = SEEK_SET, > > + /* Locking byte 1 avoids interfereing with virtlockd. */ > > + .l_start = 1, > > + .l_len = 1, > > + }; > > + > > + switch (cmd) { > > + case BDRV_LOCKF_RWLOCK: > > + fl.l_type = F_WRLCK; > > + break; > > + case BDRV_LOCKF_ROLOCK: > > + fl.l_type = F_RDLCK; > > + break; > > + case BDRV_LOCKF_UNLOCK: > > + fl.l_type = F_UNLCK; > > + break; > > My understanding is this prevents libguestfs from working on live disk > images -- we want to be able to read live disk images (using a > writable overlay and the real disk image as a read-only backing file). Do you lock the live image or the backing file? If not, you can just read/write as before, as what the -L option does in this series. Otherwise, you should use an RO lock on the backing image, which still works. Fam
On Sat, 04/16 16:29, Denis V. Lunev wrote: > On 04/15/2016 06:27 AM, Fam Zheng wrote: > >virtlockd in libvirt locks the first byte, we lock byte 1 to avoid > >the intervene. > > > >Suggested-by: "Daniel P. Berrange" <berrange@redhat.com> > >Signed-off-by: Fam Zheng <famz@redhat.com> > >--- > > block/raw-posix.c | 35 +++++++++++++++++++++++++++++++++++ > > 1 file changed, 35 insertions(+) > > > >diff --git a/block/raw-posix.c b/block/raw-posix.c > >index 906d5c9..3a2c17f 100644 > >--- a/block/raw-posix.c > >+++ b/block/raw-posix.c > >@@ -35,6 +35,7 @@ > > #include "raw-aio.h" > > #include "qapi/util.h" > > #include "qapi/qmp/qstring.h" > >+#include "glib.h" > > #if defined(__APPLE__) && (__MACH__) > > #include <paths.h> > >@@ -397,6 +398,38 @@ static void raw_attach_aio_context(BlockDriverState *bs, > > #endif > > } > >+static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd) > >+{ > >+ > >+ BDRVRawState *s = bs->opaque; > >+ int ret; > >+ struct flock fl = (struct flock) { > >+ .l_whence = SEEK_SET, > >+ /* Locking byte 1 avoids interfereing with virtlockd. */ > >+ .l_start = 1, > >+ .l_len = 1, > >+ }; > >+ > >+ switch (cmd) { > >+ case BDRV_LOCKF_RWLOCK: > >+ fl.l_type = F_WRLCK; > >+ break; > >+ case BDRV_LOCKF_ROLOCK: > >+ fl.l_type = F_RDLCK; > >+ break; > >+ case BDRV_LOCKF_UNLOCK: > >+ fl.l_type = F_UNLCK; > >+ break; > >+ default: > >+ abort(); > >+ } > >+ ret = fcntl(s->fd, F_SETLK, &fl); > >+ if (ret) { > >+ ret = -errno; > >+ } > >+ return ret; > >+} > >+ > > #ifdef CONFIG_LINUX_AIO > > static int raw_set_aio(void **aio_ctx, int *use_aio, int bdrv_flags) > > { > >@@ -1960,6 +1993,8 @@ BlockDriver bdrv_file = { > > .bdrv_detach_aio_context = raw_detach_aio_context, > > .bdrv_attach_aio_context = raw_attach_aio_context, > >+ .bdrv_lockf = raw_lockf, > >+ > > .create_opts = &raw_create_opts, > > }; > would it be better to use > > int flock(int fd, int operation); > > DESCRIPTION > Apply or remove an advisory lock on the open file specified by fd. > The > argument operation is one of the following: > > LOCK_SH Place a shared lock. More than one process may hold > a > shared lock for a given file at a given time. > > LOCK_EX Place an exclusive lock. Only one process may hold > an > exclusive lock for a given file at a given time. > > LOCK_UN Remove an existing lock held by this process. > > for this purpose? > > Sorry that missed this point in the initial review... > We will not intersect with libvirt for the case. > As noted in the cover letter, flock() is nop on NFS mount points on Linux, so fcntl is safer. Fam
On 04/18/2016 04:12 AM, Fam Zheng wrote: > On Sat, 04/16 16:29, Denis V. Lunev wrote: >> On 04/15/2016 06:27 AM, Fam Zheng wrote: >>> virtlockd in libvirt locks the first byte, we lock byte 1 to avoid >>> the intervene. >>> >>> Suggested-by: "Daniel P. Berrange" <berrange@redhat.com> >>> Signed-off-by: Fam Zheng <famz@redhat.com> >>> --- >>> block/raw-posix.c | 35 +++++++++++++++++++++++++++++++++++ >>> 1 file changed, 35 insertions(+) >>> >>> diff --git a/block/raw-posix.c b/block/raw-posix.c >>> index 906d5c9..3a2c17f 100644 >>> --- a/block/raw-posix.c >>> +++ b/block/raw-posix.c >>> @@ -35,6 +35,7 @@ >>> #include "raw-aio.h" >>> #include "qapi/util.h" >>> #include "qapi/qmp/qstring.h" >>> +#include "glib.h" >>> #if defined(__APPLE__) && (__MACH__) >>> #include <paths.h> >>> @@ -397,6 +398,38 @@ static void raw_attach_aio_context(BlockDriverState *bs, >>> #endif >>> } >>> +static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd) >>> +{ >>> + >>> + BDRVRawState *s = bs->opaque; >>> + int ret; >>> + struct flock fl = (struct flock) { >>> + .l_whence = SEEK_SET, >>> + /* Locking byte 1 avoids interfereing with virtlockd. */ >>> + .l_start = 1, >>> + .l_len = 1, >>> + }; >>> + >>> + switch (cmd) { >>> + case BDRV_LOCKF_RWLOCK: >>> + fl.l_type = F_WRLCK; >>> + break; >>> + case BDRV_LOCKF_ROLOCK: >>> + fl.l_type = F_RDLCK; >>> + break; >>> + case BDRV_LOCKF_UNLOCK: >>> + fl.l_type = F_UNLCK; >>> + break; >>> + default: >>> + abort(); >>> + } >>> + ret = fcntl(s->fd, F_SETLK, &fl); >>> + if (ret) { >>> + ret = -errno; >>> + } >>> + return ret; >>> +} >>> + >>> #ifdef CONFIG_LINUX_AIO >>> static int raw_set_aio(void **aio_ctx, int *use_aio, int bdrv_flags) >>> { >>> @@ -1960,6 +1993,8 @@ BlockDriver bdrv_file = { >>> .bdrv_detach_aio_context = raw_detach_aio_context, >>> .bdrv_attach_aio_context = raw_attach_aio_context, >>> + .bdrv_lockf = raw_lockf, >>> + >>> .create_opts = &raw_create_opts, >>> }; >> would it be better to use >> >> int flock(int fd, int operation); >> >> DESCRIPTION >> Apply or remove an advisory lock on the open file specified by fd. >> The >> argument operation is one of the following: >> >> LOCK_SH Place a shared lock. More than one process may hold >> a >> shared lock for a given file at a given time. >> >> LOCK_EX Place an exclusive lock. Only one process may hold >> an >> exclusive lock for a given file at a given time. >> >> LOCK_UN Remove an existing lock held by this process. >> >> for this purpose? >> >> Sorry that missed this point in the initial review... >> We will not intersect with libvirt for the case. >> > As noted in the cover letter, flock() is nop on NFS mount points on Linux, so > fcntl is safer. > > Fam > that seems STRANGE to me. For the time being flock() was working for Linux NFS server. Here is the implementation. int nfs_flock(struct file *filp, int cmd, struct file_lock *fl) { struct inode *inode = filp->f_mapping->host; int is_local = 0; dprintk("NFS: flock(%pD2, t=%x, fl=%x)\n", filp, fl->fl_type, fl->fl_flags); if (!(fl->fl_flags & FL_FLOCK)) return -ENOLCK; /* * The NFSv4 protocol doesn't support LOCK_MAND, which is not part of * any standard. In principle we might be able to support LOCK_MAND * on NFSv2/3 since NLMv3/4 support DOS share modes, but for now the * NFS code is not set up for it. 
     */
    if (fl->fl_type & LOCK_MAND)
        return -EINVAL;

    if (NFS_SERVER(inode)->flags & NFS_MOUNT_LOCAL_FLOCK)
        is_local = 1;

    /* We're simulating flock() locks using posix locks on the server */
    if (fl->fl_type == F_UNLCK)
        return do_unlk(filp, cmd, fl, is_local);
    return do_setlk(filp, cmd, fl, is_local);
}

Den
On Mon, Apr 18, 2016 at 09:10:36AM +0800, Fam Zheng wrote: > On Sun, 04/17 20:27, Richard W.M. Jones wrote: > > On Fri, Apr 15, 2016 at 11:27:55AM +0800, Fam Zheng wrote: > > > virtlockd in libvirt locks the first byte, we lock byte 1 to avoid > > > the intervene. > > > +static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd) > > > +{ > > > + > > > + BDRVRawState *s = bs->opaque; > > > + int ret; > > > + struct flock fl = (struct flock) { > > > + .l_whence = SEEK_SET, > > > + /* Locking byte 1 avoids interfereing with virtlockd. */ > > > + .l_start = 1, > > > + .l_len = 1, > > > + }; > > > + > > > + switch (cmd) { > > > + case BDRV_LOCKF_RWLOCK: > > > + fl.l_type = F_WRLCK; > > > + break; > > > + case BDRV_LOCKF_ROLOCK: > > > + fl.l_type = F_RDLCK; > > > + break; > > > + case BDRV_LOCKF_UNLOCK: > > > + fl.l_type = F_UNLCK; > > > + break; > > > > My understanding is this prevents libguestfs from working on live disk > > images -- we want to be able to read live disk images (using a > > writable overlay and the real disk image as a read-only backing file). > > Do you lock the live image or the backing file? At the moment we don't need to do anything, but we do have the problem that if someone uses libguestfs in write mode on a live image, then obviously they end up with a corrupted guest. However in this email I'm talking about using libguestfs in "read-only" mode on a live guest, which is a completely different thing, and should not be prevented. My assumption [with this patch applied] is the live image (being live) is locked by some other qemu. Now libguestfs creates a temporary overlay with the live image as backing, using a command similar to: qemu-img create -f qcow2 -b live.img tmp-overlay.img and opens 'tmp-overlay.img'. But since the live image has an exclusive lock, the open will *fail*, and that is a problem. > If not, you can just read/write as before, as what the -L option > does in this series. Otherwise, you should use an RO lock on the > backing image, which still works. fcntl-locks don't allow a shared lock to share with an exclusive lock. Rich.
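[Editor's note: a standalone illustration of the conflict Rich describes — with POSIX record locks, a read (shared) lock request fails while another process holds a write (exclusive) lock on the same byte. This is a self-contained demo using a scratch file, not QEMU code.]

```c
/* Demo: an F_RDLCK request on byte 1 fails while another process holds
 * F_WRLCK on that byte. Illustration only; not QEMU code. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static int lock_byte1(int fd, short type)
{
    struct flock fl = {
        .l_type   = type,
        .l_whence = SEEK_SET,
        .l_start  = 1,
        .l_len    = 1,
    };
    return fcntl(fd, F_SETLK, &fl) == -1 ? -errno : 0;
}

int main(void)
{
    int fd = open("scratch.img", O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (fork() == 0) {
        /* Child plays the running VM: take the exclusive lock and hold it. */
        lock_byte1(fd, F_WRLCK);
        sleep(2);
        _exit(0);
    }

    sleep(1);   /* crude, but enough for a demo: let the child lock first */

    /* Parent plays the overlay user asking for a shared lock.  POSIX locks
     * are owned per process, so the child's write lock conflicts even
     * though both processes refer to the same file. */
    int ret = lock_byte1(fd, F_RDLCK);
    printf("shared lock: %s\n", ret == 0 ? "acquired" : strerror(-ret));

    wait(NULL);
    return 0;
}
```

This is exactly what the overlay case would hit against a live image: the shared lock request is refused, so the read-only open fails.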
On Mon, Apr 18, 2016 at 09:12:44AM +0800, Fam Zheng wrote: > On Sat, 04/16 16:29, Denis V. Lunev wrote: > > On 04/15/2016 06:27 AM, Fam Zheng wrote: > > >virtlockd in libvirt locks the first byte, we lock byte 1 to avoid > > >the intervene. > > > > > >Suggested-by: "Daniel P. Berrange" <berrange@redhat.com> > > >Signed-off-by: Fam Zheng <famz@redhat.com> > > >--- > > > block/raw-posix.c | 35 +++++++++++++++++++++++++++++++++++ > > > 1 file changed, 35 insertions(+) > > >@@ -1960,6 +1993,8 @@ BlockDriver bdrv_file = { > > > .bdrv_detach_aio_context = raw_detach_aio_context, > > > .bdrv_attach_aio_context = raw_attach_aio_context, > > >+ .bdrv_lockf = raw_lockf, > > >+ > > > .create_opts = &raw_create_opts, > > > }; > > would it be better to use > > > > int flock(int fd, int operation); > > > > DESCRIPTION > > Apply or remove an advisory lock on the open file specified by fd. > > The > > argument operation is one of the following: > > > > LOCK_SH Place a shared lock. More than one process may hold > > a > > shared lock for a given file at a given time. > > > > LOCK_EX Place an exclusive lock. Only one process may hold > > an > > exclusive lock for a given file at a given time. > > > > LOCK_UN Remove an existing lock held by this process. > > > > for this purpose? > > > > Sorry that missed this point in the initial review... > > We will not intersect with libvirt for the case. > > As noted in the cover letter, flock() is nop on NFS mount points on Linux, so > fcntl is safer. Actually on Modern linux flock is implemented in terms of fcntl on Linux. Flock does not do byte-range locking so by using flock on NFS, IIUC you'll get a fcntl lock for the entire file range on the server, which will clash with the fcntl lock that virtlogd has acquired. On older linux flock is a no-op on NFS. Other UNIX have varying degrees of usable for flock on NFS. So in general fcntl is a better bet for NFS compatibility and will not clash with libvirt. The only minor issue is that some versions of OCFS do not support fcntl locks at all, only flock. Regards, Daniel
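[Editor's note: to make the byte ranges concrete, here is how the three locks in question would look as fcntl() requests. The virtlockd and NFS-emulation details are as described in this thread — first byte, whole-file POSIX lock on the server — not taken from their source code, and the lock types are only illustrative.]

```c
#include <fcntl.h>

/* virtlockd: locks the first byte of the image (per this thread). */
struct flock virtlockd_range = {
    .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 1,
};

/* This patch: byte 1, deliberately disjoint from virtlockd's byte 0. */
struct flock qemu_range = {
    .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 1, .l_len = 1,
};

/* flock() emulated over NFS: effectively a POSIX lock on the whole file
 * (l_len == 0 means "to the end of the file"), overlapping both ranges
 * above and therefore clashing with virtlockd. */
struct flock emulated_flock_range = {
    .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0,
};
```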
On 04/18/2016 12:34 PM, Daniel P. Berrange wrote: > On Mon, Apr 18, 2016 at 09:12:44AM +0800, Fam Zheng wrote: >> On Sat, 04/16 16:29, Denis V. Lunev wrote: >>> On 04/15/2016 06:27 AM, Fam Zheng wrote: >>>> virtlockd in libvirt locks the first byte, we lock byte 1 to avoid >>>> the intervene. >>>> >>>> Suggested-by: "Daniel P. Berrange" <berrange@redhat.com> >>>> Signed-off-by: Fam Zheng <famz@redhat.com> >>>> --- >>>> block/raw-posix.c | 35 +++++++++++++++++++++++++++++++++++ >>>> 1 file changed, 35 insertions(+) > >>>> @@ -1960,6 +1993,8 @@ BlockDriver bdrv_file = { >>>> .bdrv_detach_aio_context = raw_detach_aio_context, >>>> .bdrv_attach_aio_context = raw_attach_aio_context, >>>> + .bdrv_lockf = raw_lockf, >>>> + >>>> .create_opts = &raw_create_opts, >>>> }; >>> would it be better to use >>> >>> int flock(int fd, int operation); >>> >>> DESCRIPTION >>> Apply or remove an advisory lock on the open file specified by fd. >>> The >>> argument operation is one of the following: >>> >>> LOCK_SH Place a shared lock. More than one process may hold >>> a >>> shared lock for a given file at a given time. >>> >>> LOCK_EX Place an exclusive lock. Only one process may hold >>> an >>> exclusive lock for a given file at a given time. >>> >>> LOCK_UN Remove an existing lock held by this process. >>> >>> for this purpose? >>> >>> Sorry that missed this point in the initial review... >>> We will not intersect with libvirt for the case. >> As noted in the cover letter, flock() is nop on NFS mount points on Linux, so >> fcntl is safer. > Actually on Modern linux flock is implemented in terms of fcntl > on Linux. Flock does not do byte-range locking so by using > flock on NFS, IIUC you'll get a fcntl lock for the entire file > range on the server, which will clash with the fcntl lock that > virtlogd has acquired. On older linux flock is a no-op on NFS. > Other UNIX have varying degrees of usable for flock on NFS. So > in general fcntl is a better bet for NFS compatibility and will > not clash with libvirt. > > The only minor issue is that some versions of OCFS do not support > fcntl locks at all, only flock. > > Regards, > Daniel thank you for clarification this is not a bid deal at all :) This approach would work fine. Den
On Mon, 04/18 09:04, Richard W.M. Jones wrote: > On Mon, Apr 18, 2016 at 09:10:36AM +0800, Fam Zheng wrote: > > On Sun, 04/17 20:27, Richard W.M. Jones wrote: > > > On Fri, Apr 15, 2016 at 11:27:55AM +0800, Fam Zheng wrote: > > > > virtlockd in libvirt locks the first byte, we lock byte 1 to avoid > > > > the intervene. > > > > +static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd) > > > > +{ > > > > + > > > > + BDRVRawState *s = bs->opaque; > > > > + int ret; > > > > + struct flock fl = (struct flock) { > > > > + .l_whence = SEEK_SET, > > > > + /* Locking byte 1 avoids interfereing with virtlockd. */ > > > > + .l_start = 1, > > > > + .l_len = 1, > > > > + }; > > > > + > > > > + switch (cmd) { > > > > + case BDRV_LOCKF_RWLOCK: > > > > + fl.l_type = F_WRLCK; > > > > + break; > > > > + case BDRV_LOCKF_ROLOCK: > > > > + fl.l_type = F_RDLCK; > > > > + break; > > > > + case BDRV_LOCKF_UNLOCK: > > > > + fl.l_type = F_UNLCK; > > > > + break; > > > > > > My understanding is this prevents libguestfs from working on live disk > > > images -- we want to be able to read live disk images (using a > > > writable overlay and the real disk image as a read-only backing file). > > > > Do you lock the live image or the backing file? > > At the moment we don't need to do anything, but we do have the problem > that if someone uses libguestfs in write mode on a live image, then > obviously they end up with a corrupted guest. > > However in this email I'm talking about using libguestfs in > "read-only" mode on a live guest, which is a completely different > thing, and should not be prevented. > > My assumption [with this patch applied] is the live image (being live) > is locked by some other qemu. > > Now libguestfs creates a temporary overlay with the live image as > backing, using a command similar to: > > qemu-img create -f qcow2 -b live.img tmp-overlay.img > > and opens 'tmp-overlay.img'. But since the live image has an > exclusive lock, the open will *fail*, and that is a problem. I'm not sure that is what we want to support. The image is being read-write open by the VM, any other accessing is a bad idea. Fam > > > If not, you can just read/write as before, as what the -L option > > does in this series. Otherwise, you should use an RO lock on the > > backing image, which still works. > > fcntl-locks don't allow a shared lock to share with an exclusive lock. > > Rich. > > -- > Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones > Read my programming and virtualization blog: http://rwmj.wordpress.com > virt-df lists disk usage of guests without needing to install any > software inside the virtual machine. Supports Linux and Windows. > http://people.redhat.com/~rjones/virt-df/
On Tue, Apr 19, 2016 at 08:37:14PM +0800, Fam Zheng wrote: > On Mon, 04/18 09:04, Richard W.M. Jones wrote: > > On Mon, Apr 18, 2016 at 09:10:36AM +0800, Fam Zheng wrote: > > > On Sun, 04/17 20:27, Richard W.M. Jones wrote: > > > > On Fri, Apr 15, 2016 at 11:27:55AM +0800, Fam Zheng wrote: > > > > > virtlockd in libvirt locks the first byte, we lock byte 1 to avoid > > > > > the intervene. > > > > > +static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd) > > > > > +{ > > > > > + > > > > > + BDRVRawState *s = bs->opaque; > > > > > + int ret; > > > > > + struct flock fl = (struct flock) { > > > > > + .l_whence = SEEK_SET, > > > > > + /* Locking byte 1 avoids interfereing with virtlockd. */ > > > > > + .l_start = 1, > > > > > + .l_len = 1, > > > > > + }; > > > > > + > > > > > + switch (cmd) { > > > > > + case BDRV_LOCKF_RWLOCK: > > > > > + fl.l_type = F_WRLCK; > > > > > + break; > > > > > + case BDRV_LOCKF_ROLOCK: > > > > > + fl.l_type = F_RDLCK; > > > > > + break; > > > > > + case BDRV_LOCKF_UNLOCK: > > > > > + fl.l_type = F_UNLCK; > > > > > + break; > > > > > > > > My understanding is this prevents libguestfs from working on live disk > > > > images -- we want to be able to read live disk images (using a > > > > writable overlay and the real disk image as a read-only backing file). > > > > > > Do you lock the live image or the backing file? > > > > At the moment we don't need to do anything, but we do have the problem > > that if someone uses libguestfs in write mode on a live image, then > > obviously they end up with a corrupted guest. > > > > However in this email I'm talking about using libguestfs in > > "read-only" mode on a live guest, which is a completely different > > thing, and should not be prevented. > > > > My assumption [with this patch applied] is the live image (being live) > > is locked by some other qemu. > > > > Now libguestfs creates a temporary overlay with the live image as > > backing, using a command similar to: > > > > qemu-img create -f qcow2 -b live.img tmp-overlay.img > > > > and opens 'tmp-overlay.img'. But since the live image has an > > exclusive lock, the open will *fail*, and that is a problem. > > I'm not sure that is what we want to support. The image is being read-write > open by the VM, any other accessing is a bad idea. We've done this successfully for years, for people monitoring their VMs using virt-df, pulling out files using guestfish and so on. We allow you to do it while the guest is live and running, with the proviso that a consistent view cannot always be guaranteed (although it works so reliably that it's never really a problem), or users can briefly pause VMs if they need a guaranteed consistent view. I'm afraid the onus is on you to explain why this existing practice is a bad idea. Rich.
On Tue, 04/19 14:07, Richard W.M. Jones wrote: > On Tue, Apr 19, 2016 at 08:37:14PM +0800, Fam Zheng wrote: > > On Mon, 04/18 09:04, Richard W.M. Jones wrote: > > > On Mon, Apr 18, 2016 at 09:10:36AM +0800, Fam Zheng wrote: > > > > On Sun, 04/17 20:27, Richard W.M. Jones wrote: > > > > > On Fri, Apr 15, 2016 at 11:27:55AM +0800, Fam Zheng wrote: > > > > > > virtlockd in libvirt locks the first byte, we lock byte 1 to avoid > > > > > > the intervene. > > > > > > +static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd) > > > > > > +{ > > > > > > + > > > > > > + BDRVRawState *s = bs->opaque; > > > > > > + int ret; > > > > > > + struct flock fl = (struct flock) { > > > > > > + .l_whence = SEEK_SET, > > > > > > + /* Locking byte 1 avoids interfereing with virtlockd. */ > > > > > > + .l_start = 1, > > > > > > + .l_len = 1, > > > > > > + }; > > > > > > + > > > > > > + switch (cmd) { > > > > > > + case BDRV_LOCKF_RWLOCK: > > > > > > + fl.l_type = F_WRLCK; > > > > > > + break; > > > > > > + case BDRV_LOCKF_ROLOCK: > > > > > > + fl.l_type = F_RDLCK; > > > > > > + break; > > > > > > + case BDRV_LOCKF_UNLOCK: > > > > > > + fl.l_type = F_UNLCK; > > > > > > + break; > > > > > > > > > > My understanding is this prevents libguestfs from working on live disk > > > > > images -- we want to be able to read live disk images (using a > > > > > writable overlay and the real disk image as a read-only backing file). > > > > > > > > Do you lock the live image or the backing file? > > > > > > At the moment we don't need to do anything, but we do have the problem > > > that if someone uses libguestfs in write mode on a live image, then > > > obviously they end up with a corrupted guest. > > > > > > However in this email I'm talking about using libguestfs in > > > "read-only" mode on a live guest, which is a completely different > > > thing, and should not be prevented. > > > > > > My assumption [with this patch applied] is the live image (being live) > > > is locked by some other qemu. > > > > > > Now libguestfs creates a temporary overlay with the live image as > > > backing, using a command similar to: > > > > > > qemu-img create -f qcow2 -b live.img tmp-overlay.img > > > > > > and opens 'tmp-overlay.img'. But since the live image has an > > > exclusive lock, the open will *fail*, and that is a problem. > > > > I'm not sure that is what we want to support. The image is being read-write > > open by the VM, any other accessing is a bad idea. > > We've done this successfully for years, for people monitoring their > VMs using virt-df, pulling out files using guestfish and so on. We > allow you to do it while the guest is live and running, with the > proviso that a consistent view cannot always be guaranteed (although > it works so reliably that it's never really a problem), or users can > briefly pause VMs if they need a guaranteed consistent view. > > I'm afraid the onus is on you to explain why this existing practice is > a bad idea. It is bad idea because it can produce erroneous data. Perhaps it's not critical and is rare enough to be practically useful. As a tradeoff, I guess, we can skip the shared lock in this series. Does that work for you? Fam
On Mon, Apr 18, 2016 at 09:04:19AM +0100, Richard W.M. Jones wrote: > On Mon, Apr 18, 2016 at 09:10:36AM +0800, Fam Zheng wrote: > > On Sun, 04/17 20:27, Richard W.M. Jones wrote: > > > On Fri, Apr 15, 2016 at 11:27:55AM +0800, Fam Zheng wrote: > > > > virtlockd in libvirt locks the first byte, we lock byte 1 to avoid > > > > the intervene. > > > > +static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd) > > > > +{ > > > > + > > > > + BDRVRawState *s = bs->opaque; > > > > + int ret; > > > > + struct flock fl = (struct flock) { > > > > + .l_whence = SEEK_SET, > > > > + /* Locking byte 1 avoids interfereing with virtlockd. */ > > > > + .l_start = 1, > > > > + .l_len = 1, > > > > + }; > > > > + > > > > + switch (cmd) { > > > > + case BDRV_LOCKF_RWLOCK: > > > > + fl.l_type = F_WRLCK; > > > > + break; > > > > + case BDRV_LOCKF_ROLOCK: > > > > + fl.l_type = F_RDLCK; > > > > + break; > > > > + case BDRV_LOCKF_UNLOCK: > > > > + fl.l_type = F_UNLCK; > > > > + break; > > > > > > My understanding is this prevents libguestfs from working on live disk > > > images -- we want to be able to read live disk images (using a > > > writable overlay and the real disk image as a read-only backing file). > > > > Do you lock the live image or the backing file? > > At the moment we don't need to do anything, but we do have the problem > that if someone uses libguestfs in write mode on a live image, then > obviously they end up with a corrupted guest. > > However in this email I'm talking about using libguestfs in > "read-only" mode on a live guest, which is a completely different > thing, and should not be prevented. > > My assumption [with this patch applied] is the live image (being live) > is locked by some other qemu. > > Now libguestfs creates a temporary overlay with the live image as > backing, using a command similar to: > > qemu-img create -f qcow2 -b live.img tmp-overlay.img > > and opens 'tmp-overlay.img'. But since the live image has an > exclusive lock, the open will *fail*, and that is a problem. Have you ever considered integration with the QEMU NBD server. We don't have APIs for enabling it explicitly in libvirt, but it strikes me that it could be ideally suited for your needs. eg a hypothetical libvirt command to export a disk via NBD: virsh dom-export-disk myguest --readonly vda1 localhost 9000 qemu-img create -f qcow2 -b nbd:localhost:9000 tmp-overlay.img ...do stuff... virsh dom-unexport-disk myguest vda1 Or to make cleanup easier, perhaps there's a way to tell QEMU to close its NBD server after it has had 1 client connection. With this approach, you wouldn't need to take any lock on the underlying real image. Regards, Daniel
On Tue, Apr 19, 2016 at 09:19:02PM +0800, Fam Zheng wrote: > On Tue, 04/19 14:07, Richard W.M. Jones wrote: > > We've done this successfully for years, for people monitoring their > > VMs using virt-df, pulling out files using guestfish and so on. We > > allow you to do it while the guest is live and running, with the > > proviso that a consistent view cannot always be guaranteed (although > > it works so reliably that it's never really a problem), or users can > > briefly pause VMs if they need a guaranteed consistent view. > > > > I'm afraid the onus is on you to explain why this existing practice is > > a bad idea. > > It is bad idea because it can produce erroneous data. Perhaps it's not > critical and is rare enough to be practically useful. As explained above, we deal with the inconsistent case (by detecting and ignoring it, or retrying), and then there's the case where we pause the guest to get consistent data. > As a tradeoff, I guess, we can skip the shared lock in this series. Does that > work for you? I'd prefer some kind of no lock / ignore lock. There is a legitimate case where you want to have the shared lock behaviour, but also a legitimate one for turning it off. I'm not opposed to the idea -- there are very real cases where your patch saves people from themselves. Off topic: How does this patch deal with genuinely shared (writable) disks, as used in cluster filesystems? Rich.
On Tue, Apr 19, 2016 at 02:34:30PM +0100, Daniel P. Berrange wrote: > Have you ever considered integration with the QEMU NBD server. We > don't have APIs for enabling it explicitly in libvirt, but it strikes > me that it could be ideally suited for your needs. > > eg a hypothetical libvirt command to export a disk via NBD: > > virsh dom-export-disk myguest --readonly vda1 localhost 9000 > qemu-img create -f qcow2 -b nbd:localhost:9000 tmp-overlay.img > ...do stuff... > virsh dom-unexport-disk myguest vda1 > > Or to make cleanup easier, perhaps there's a way to tell QEMU > to close its NBD server after it has had 1 client connection. > > With this approach, you wouldn't need to take any lock on the > underlying real image. It's been on my todo list for a long time, but this does require libvirt to be involved, and libvirt is not the default for upstream libguestfs. Even with the libvirt backend, there are still uses cases like: virt-df -a /mnt/vms/disk.img -h virt-inspector -a /mnt/vms/disk.img | grep some_insecure_software for f in /mnt/vms/*; do virt-alignment-scan -a $f; done and so on. Rich.
On Tue, Apr 19, 2016 at 02:36:05PM +0100, Richard W.M. Jones wrote: > > I'd prefer some kind of no lock / ignore lock. There is a legitimate > case where you want to have the shared lock behaviour, but also a > legitimate one for turning it off. I'm not opposed to the idea -- > there are very real cases where your patch saves people from > themselves. [snip] > Off topic: How does this patch deal with genuinely shared (writable) > disks, as used in cluster filesystems? This series does add support for a lock_image=no parameter for disks, which turns off all locking. IIUC, this was intended for use when the disk image must be shared between guests. IOW, this series is doing the following:

- read-only, exclusive -> F_RDLCK
- writable, shared -> no locking
- writable, exclusive -> F_WRLCK

This leaves open the possibility of mistakes if one VM is given a disk in write+exclusive mode while another VM gets the same disk in write+shared mode. In virtlockd we took a different approach to this:

- read-only, exclusive -> no locking
- writable, shared -> F_RDLCK
- writable, exclusive -> F_WRLCK

because this lets the kernel prevent the case where the disk image is presented in write+shared mode to one VM and write+exclusive mode to another VM. Not doing any locking at all in the read-only case also allows the scenario needed by libguestfs, where an overlay is created for a disk that is in use. Regards, Daniel
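[Editor's note: expressed as code, the virtlockd-style mapping Daniel describes might look like the following hypothetical helper. It is not part of the posted series, which instead keys its locking off BdrvLockfCmd.]

```c
/* Hypothetical mapping from image access mode to an fcntl lock type,
 * following the virtlockd-style scheme described above.  Returning -1
 * means "take no lock at all". */
#include <fcntl.h>
#include <stdbool.h>

static int lock_type_for(bool writable, bool exclusive)
{
    if (!writable) {
        return -1;              /* read-only users never contend for the lock */
    }
    return exclusive ? F_WRLCK  /* sole writer */
                     : F_RDLCK; /* shared writers, e.g. cluster filesystems */
}
```

With that mapping the kernel refuses the dangerous mix of one exclusive writer and one shared writer, while read-only users such as a libguestfs overlay are never blocked.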
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 906d5c9..3a2c17f 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -35,6 +35,7 @@
 #include "raw-aio.h"
 #include "qapi/util.h"
 #include "qapi/qmp/qstring.h"
+#include "glib.h"

 #if defined(__APPLE__) && (__MACH__)
 #include <paths.h>
@@ -397,6 +398,38 @@ static void raw_attach_aio_context(BlockDriverState *bs,
 #endif
 }

+static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd)
+{
+
+    BDRVRawState *s = bs->opaque;
+    int ret;
+    struct flock fl = (struct flock) {
+        .l_whence = SEEK_SET,
+        /* Locking byte 1 avoids interfering with virtlockd. */
+        .l_start = 1,
+        .l_len = 1,
+    };
+
+    switch (cmd) {
+    case BDRV_LOCKF_RWLOCK:
+        fl.l_type = F_WRLCK;
+        break;
+    case BDRV_LOCKF_ROLOCK:
+        fl.l_type = F_RDLCK;
+        break;
+    case BDRV_LOCKF_UNLOCK:
+        fl.l_type = F_UNLCK;
+        break;
+    default:
+        abort();
+    }
+    ret = fcntl(s->fd, F_SETLK, &fl);
+    if (ret) {
+        ret = -errno;
+    }
+    return ret;
+}
+
 #ifdef CONFIG_LINUX_AIO
 static int raw_set_aio(void **aio_ctx, int *use_aio, int bdrv_flags)
 {
@@ -1960,6 +1993,8 @@ BlockDriver bdrv_file = {
     .bdrv_detach_aio_context = raw_detach_aio_context,
     .bdrv_attach_aio_context = raw_attach_aio_context,

+    .bdrv_lockf = raw_lockf,
+
     .create_opts = &raw_create_opts,
 };
virtlockd in libvirt locks the first byte, so we lock byte 1 to avoid interfering with it.

Suggested-by: "Daniel P. Berrange" <berrange@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/raw-posix.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)