Message ID | 1462848659-28659-9-git-send-email-famz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On Tue, May 10, 2016 at 10:50:40AM +0800, Fam Zheng wrote:
> +int qemu_lock_fd(int fd, int64_t start, int64_t len, bool readonly)
I find this new API to be very unintuitive. When I was reading the
other code in block/raw-posix.c I had to refer back to this file to
find out what all the integers meant.
It is also misleading. There aren't really such things as
"readonly locks", "read locks" or "write locks". I know that
(some) POSIX APIs use these terms, but the terms are wrong.
There are shared locks, and there are exclusive locks. The locks have
nothing to do with reading or writing. In fact the locks only apply
when writing, and are to do with whether you want to allow multiple
writers at the same time (shared lock), or only a single writer
(exclusive lock).
Anyway, I think the boolean "readonly" should be replaced by
some flag like:
#define QEMU_LOCK_SHARED 1
#define QEMU_LOCK_EXCLUSIVE 2
or similar.
Not sure about the start/len. Do we need them in the API at all given
that we've decided to lock a particular byte of the file?
Rich.
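
[For illustration, a minimal sketch of the flag-based API Richard suggests.
The QEMU_LOCK_* constants come from his mail; the helper name, the choice
of lock byte, and the mapping onto fcntl lock types are assumptions, not
part of the posted patch.]

    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>

    #define QEMU_LOCK_SHARED    1
    #define QEMU_LOCK_EXCLUSIVE 2

    /* Hypothetical helper: take a shared or exclusive fcntl lock on a
     * single agreed-upon byte of the file, dropping start/len from the
     * public API as Richard proposes. */
    static int qemu_lock_fd_flag(int fd, int lock_flag)
    {
        struct flock fl = {
            .l_whence = SEEK_SET,
            .l_start  = 1,   /* byte 1, per the patch's virtlockd note */
            .l_len    = 1,
            .l_type   = (lock_flag == QEMU_LOCK_SHARED) ? F_RDLCK
                                                        : F_WRLCK,
        };

        return fcntl(fd, F_SETLK, &fl) == -1 ? -errno : 0;
    }

With a fixed lock byte the start/len parameters Richard questions would
indeed disappear from the caller-facing signature.
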
On Tue, May 10, 2016 at 10:50:40AM +0800, Fam Zheng wrote:
> They are wrappers of POSIX fcntl file locking, with the additional
> interception of open/close (through qemu_open and qemu_close) to offer
> better semantics that preserve the locks across multiple life cycles of
> different fds on the same file. The reason to make this semantic
> change over the fcntl locks is to make the API cleaner for QEMU
> subsystems.
>
> More precisely, we remove this "feature" of fcntl lock:
>
>     If a process closes any file descriptor referring to a file, then
>     all of the process's locks on that file are released, regardless of
>     the file descriptor(s) on which the locks were obtained.
>
> as long as the fd is always opened/closed through qemu_open and
> qemu_close.

You're not actually removing that problem - this is just hacking around
it in a manner which is racy & susceptible to silent failure.

> +static int qemu_fd_close(int fd)
> +{
> +    LockEntry *ent, *tmp;
> +    LockRecord *rec;
> +    char *path;
> +    int ret;
> +
> +    assert(fd_to_path);
> +    path = g_hash_table_lookup(fd_to_path, GINT_TO_POINTER(fd));
> +    assert(path);
> +    g_hash_table_remove(fd_to_path, GINT_TO_POINTER(fd));
> +    rec = g_hash_table_lookup(lock_map, path);
> +    ret = close(fd);
> +
> +    if (rec) {
> +        QLIST_FOREACH_SAFE(ent, &rec->records, next, tmp) {
> +            int r;
> +            if (ent->fd == fd) {
> +                QLIST_REMOVE(ent, next);
> +                g_free(ent);
> +                continue;
> +            }
> +            r = qemu_lock_do(ent->fd, ent->start, ent->len,
> +                             ent->readonly ? F_RDLCK : F_WRLCK);
> +            if (r == -1) {
> +                fprintf(stderr, "Failed to acquire lock on fd %d: %s\n",
> +                        ent->fd, strerror(errno));
> +            }

If another app has acquired the lock between the close and this attempt
to re-acquire the lock, then QEMU is silently carrying on without any
lock being held. For something that's intending to provide protection
against concurrent use I think this is not an acceptable failure
scenario.

> +int qemu_open(const char *name, int flags, ...)
> +{
> +    int mode = 0;
> +    int ret;
> +
> +    if (flags & O_CREAT) {
> +        va_list ap;
> +
> +        va_start(ap, flags);
> +        mode = va_arg(ap, int);
> +        va_end(ap);
> +    }
> +    ret = qemu_fd_open(name, flags, mode);
> +    if (ret >= 0) {
> +        qemu_fd_add_record(ret, name);
> +    }
> +    return ret;
> +}

I think the approach you have here is fundamentally not usable with
fcntl locks, because it is still using the pattern

    fd = open(path)
    lock(fd)
    if failed lock
        close(fd)
    ...do stuff.

As mentioned in the previous posting, I believe the block layer must be
checking whether it already has a lock held against "path" *before*
even attempting to open the file. Once QEMU has the file descriptor
open it is too late, because closing the FD will release pre-existing
locks QEMU has.

Regards,
Daniel
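
[To make the window Daniel describes concrete, a schematic interleaving -
an illustration added here, not code from the patch:]

    /*
     *   QEMU                              other process
     *   ----                              -------------
     *   close(fd);
     *     -> kernel drops EVERY fcntl lock
     *        this process holds on the file
     *                                     fcntl(their_fd, F_SETLK, ...);
     *                                       -> succeeds; lock now theirs
     *   fcntl(other_fd, F_SETLK, ...);
     *     -> fails with EAGAIN; qemu_fd_close() only
     *        prints to stderr and carries on unlocked
     */
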
On Tue, May 10, 2016 at 09:57:48AM +0100, Daniel P. Berrange wrote:
> On Tue, May 10, 2016 at 10:50:40AM +0800, Fam Zheng wrote:
> > They are wrappers of POSIX fcntl file locking, with the additional
> > interception of open/close (through qemu_open and qemu_close) to offer
> > better semantics that preserve the locks across multiple life cycles of
> > different fds on the same file. The reason to make this semantic
> > change over the fcntl locks is to make the API cleaner for QEMU
> > subsystems.
> >
> > More precisely, we remove this "feature" of fcntl lock:
> >
> >     If a process closes any file descriptor referring to a file, then
> >     all of the process's locks on that file are released, regardless of
> >     the file descriptor(s) on which the locks were obtained.
> >
> > as long as the fd is always opened/closed through qemu_open and
> > qemu_close.
>
> You're not actually removing that problem - this is just hacking around
> it in a manner which is racy & susceptible to silent failure.

Whatever happened to file-private locks (https://lwn.net/Articles/586904/)?
My very recent glibc doesn't appear to include them, unless the
final standard used something different from `F_SETLKPW'.

Rich.
On Tue, May 10, 2016 at 10:06:35AM +0100, Richard W.M. Jones wrote:
> On Tue, May 10, 2016 at 09:57:48AM +0100, Daniel P. Berrange wrote:
> > On Tue, May 10, 2016 at 10:50:40AM +0800, Fam Zheng wrote:
> > > They are wrappers of POSIX fcntl file locking, with the additional
> > > interception of open/close (through qemu_open and qemu_close) to offer
> > > better semantics that preserve the locks across multiple life cycles of
> > > different fds on the same file. The reason to make this semantic
> > > change over the fcntl locks is to make the API cleaner for QEMU
> > > subsystems.
> > >
> > > More precisely, we remove this "feature" of fcntl lock:
> > >
> > >     If a process closes any file descriptor referring to a file, then
> > >     all of the process's locks on that file are released, regardless of
> > >     the file descriptor(s) on which the locks were obtained.
> > >
> > > as long as the fd is always opened/closed through qemu_open and
> > > qemu_close.
> >
> > You're not actually removing that problem - this is just hacking around
> > it in a manner which is racy & susceptible to silent failure.
>
> Whatever happened to file-private locks (https://lwn.net/Articles/586904/)?
> My very recent glibc doesn't appear to include them, unless the
> final standard used something different from `F_SETLKPW'.

They merged in 3.15, but the constant names got renamed just after
merge :-) Look for F_OFD_SETLKW instead:

    commit 0d3f7a2dd2f5cf9642982515e020c1aee2cf7af6
    Author: Jeff Layton <jlayton@redhat.com>
    Date:   Tue Apr 22 08:23:58 2014 -0400

        locks: rename file-private locks to "open file description locks"

        File-private locks have been merged into Linux for v3.15, and *now*
        people are commenting that the name and macro definitions for the
        new file-private locks suck.

        ...and I can't even disagree. The names and command macros do suck.

        We're going to have to live with these for a long time, so it's
        important that we be happy with the names before we're stuck with
        them. The consensus on the lists so far is that they should be
        rechristened as "open file description locks".

        The name isn't a big deal for the kernel, but the command macros
        are not visually distinct enough from the traditional POSIX lock
        macros. The glibc and documentation folks are recommending that we
        change them to look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens
        the chance that a programmer will typo one of the commands wrong,
        and also makes it easier to spot this difference when reading code.

        This patch makes the following changes that I think are necessary
        before v3.15 ships:

        1) rename the command macros to their new names. These end up in
           the uapi headers and so are part of the external-facing API. It
           turns out that glibc doesn't actually use the fcntl.h uapi
           header, but it's hard to be sure that something else won't.
           Changing it now is safest.

        2) make the /proc/locks output display these as type "OFDLCK"

Regards,
Daniel
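
[A minimal sketch of the OFD API Daniel points to, assuming Linux >= 3.15
and glibc with _GNU_SOURCE; the helper name is made up for illustration:]

    #define _GNU_SOURCE   /* for F_OFD_SETLK on glibc */
    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>

    /* Take an exclusive open-file-description lock on the whole file.
     * OFD locks belong to the open file description rather than the
     * process, so closing some *other* fd on the same file does not
     * release them - exactly the fcntl behaviour this series is trying
     * to work around. */
    static int ofd_lock_exclusive(int fd)
    {
        struct flock fl = {
            .l_type   = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 0,    /* 0 means "to end of file" */
            /* l_pid is zero-initialized, as F_OFD_SETLK requires */
        };

    #ifdef F_OFD_SETLK
        return fcntl(fd, F_OFD_SETLK, &fl) == -1 ? -errno : 0;
    #else
        (void)fl;
        return -ENOTSUP;
    #endif
    }
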
On Tue, 05/10 09:57, Daniel P. Berrange wrote:
> On Tue, May 10, 2016 at 10:50:40AM +0800, Fam Zheng wrote:
> > They are wrappers of POSIX fcntl file locking, with the additional
> > interception of open/close (through qemu_open and qemu_close) to offer
> > better semantics that preserve the locks across multiple life cycles of
> > different fds on the same file. The reason to make this semantic
> > change over the fcntl locks is to make the API cleaner for QEMU
> > subsystems.
> >
> > More precisely, we remove this "feature" of fcntl lock:
> >
> >     If a process closes any file descriptor referring to a file, then
> >     all of the process's locks on that file are released, regardless of
> >     the file descriptor(s) on which the locks were obtained.
> >
> > as long as the fd is always opened/closed through qemu_open and
> > qemu_close.
>
> You're not actually removing that problem - this is just hacking around
> it in a manner which is racy & susceptible to silent failure.
>
> > +static int qemu_fd_close(int fd)
> > +{
> > +    LockEntry *ent, *tmp;
> > +    LockRecord *rec;
> > +    char *path;
> > +    int ret;
> > +
> > +    assert(fd_to_path);
> > +    path = g_hash_table_lookup(fd_to_path, GINT_TO_POINTER(fd));
> > +    assert(path);
> > +    g_hash_table_remove(fd_to_path, GINT_TO_POINTER(fd));
> > +    rec = g_hash_table_lookup(lock_map, path);
> > +    ret = close(fd);
> > +
> > +    if (rec) {
> > +        QLIST_FOREACH_SAFE(ent, &rec->records, next, tmp) {
> > +            int r;
> > +            if (ent->fd == fd) {
> > +                QLIST_REMOVE(ent, next);
> > +                g_free(ent);
> > +                continue;
> > +            }
> > +            r = qemu_lock_do(ent->fd, ent->start, ent->len,
> > +                             ent->readonly ? F_RDLCK : F_WRLCK);
> > +            if (r == -1) {
> > +                fprintf(stderr, "Failed to acquire lock on fd %d: %s\n",
> > +                        ent->fd, strerror(errno));
> > +            }
>
> If another app has acquired the lock between the close and this attempt
> to re-acquire the lock, then QEMU is silently carrying on without any
> lock being held. For something that's intending to provide protection
> against concurrent use I think this is not an acceptable failure
> scenario.
>
> > +int qemu_open(const char *name, int flags, ...)
> > +{
> > +    int mode = 0;
> > +    int ret;
> > +
> > +    if (flags & O_CREAT) {
> > +        va_list ap;
> > +
> > +        va_start(ap, flags);
> > +        mode = va_arg(ap, int);
> > +        va_end(ap);
> > +    }
> > +    ret = qemu_fd_open(name, flags, mode);
> > +    if (ret >= 0) {
> > +        qemu_fd_add_record(ret, name);
> > +    }
> > +    return ret;
> > +}
>
> I think the approach you have here is fundamentally not usable with
> fcntl locks, because it is still using the pattern
>
>     fd = open(path)
>     lock(fd)
>     if failed lock
>         close(fd)
>     ...do stuff.
>
> As mentioned in the previous posting, I believe the block layer must be
> checking whether it already has a lock held against "path" *before*
> even attempting to open the file. Once QEMU has the file descriptor
> open it is too late, because closing the FD will release pre-existing
> locks QEMU has.

There will still be valid close() calls that will release necessary locks
temporarily. You are right the "close + re-acquire" in qemu_fd_close() is a
racy problem. Any suggestion how this could be fixed? Something like
introducing a "close2" syscall that doesn't drop locks on other fds?

Fam
On Wed, 05/11 08:48, Fam Zheng wrote:
> racy problem. Any suggestion how this could be fixed?
Reading into the subthread I see the answer: the file-private locks look
promising. Will take a look at that! Thanks.
Fam
On Wed, May 11, 2016 at 08:48:18AM +0800, Fam Zheng wrote:
> On Tue, 05/10 09:57, Daniel P. Berrange wrote:
> > On Tue, May 10, 2016 at 10:50:40AM +0800, Fam Zheng wrote:
> > > They are wrappers of POSIX fcntl file locking, with the additional
> > > interception of open/close (through qemu_open and qemu_close) to offer
> > > better semantics that preserve the locks across multiple life cycles of
> > > different fds on the same file. The reason to make this semantic
> > > change over the fcntl locks is to make the API cleaner for QEMU
> > > subsystems.
> > >
> > > More precisely, we remove this "feature" of fcntl lock:
> > >
> > >     If a process closes any file descriptor referring to a file, then
> > >     all of the process's locks on that file are released, regardless of
> > >     the file descriptor(s) on which the locks were obtained.
> > >
> > > as long as the fd is always opened/closed through qemu_open and
> > > qemu_close.
> >
> > You're not actually removing that problem - this is just hacking around
> > it in a manner which is racy & susceptible to silent failure.
> >
> > > +static int qemu_fd_close(int fd)
> > > +{
> > > +    LockEntry *ent, *tmp;
> > > +    LockRecord *rec;
> > > +    char *path;
> > > +    int ret;
> > > +
> > > +    assert(fd_to_path);
> > > +    path = g_hash_table_lookup(fd_to_path, GINT_TO_POINTER(fd));
> > > +    assert(path);
> > > +    g_hash_table_remove(fd_to_path, GINT_TO_POINTER(fd));
> > > +    rec = g_hash_table_lookup(lock_map, path);
> > > +    ret = close(fd);
> > > +
> > > +    if (rec) {
> > > +        QLIST_FOREACH_SAFE(ent, &rec->records, next, tmp) {
> > > +            int r;
> > > +            if (ent->fd == fd) {
> > > +                QLIST_REMOVE(ent, next);
> > > +                g_free(ent);
> > > +                continue;
> > > +            }
> > > +            r = qemu_lock_do(ent->fd, ent->start, ent->len,
> > > +                             ent->readonly ? F_RDLCK : F_WRLCK);
> > > +            if (r == -1) {
> > > +                fprintf(stderr, "Failed to acquire lock on fd %d: %s\n",
> > > +                        ent->fd, strerror(errno));
> > > +            }
> >
> > If another app has acquired the lock between the close and this attempt
> > to re-acquire the lock, then QEMU is silently carrying on without any
> > lock being held. For something that's intending to provide protection
> > against concurrent use I think this is not an acceptable failure
> > scenario.
> >
> > > +int qemu_open(const char *name, int flags, ...)
> > > +{
> > > +    int mode = 0;
> > > +    int ret;
> > > +
> > > +    if (flags & O_CREAT) {
> > > +        va_list ap;
> > > +
> > > +        va_start(ap, flags);
> > > +        mode = va_arg(ap, int);
> > > +        va_end(ap);
> > > +    }
> > > +    ret = qemu_fd_open(name, flags, mode);
> > > +    if (ret >= 0) {
> > > +        qemu_fd_add_record(ret, name);
> > > +    }
> > > +    return ret;
> > > +}
> >
> > I think the approach you have here is fundamentally not usable with
> > fcntl locks, because it is still using the pattern
> >
> >     fd = open(path)
> >     lock(fd)
> >     if failed lock
> >         close(fd)
> >     ...do stuff.
> >
> > As mentioned in the previous posting, I believe the block layer must be
> > checking whether it already has a lock held against "path" *before*
> > even attempting to open the file. Once QEMU has the file descriptor
> > open it is too late, because closing the FD will release pre-existing
> > locks QEMU has.
>
> There will still be valid close() calls that will release necessary locks
> temporarily. You are right the "close + re-acquire" in qemu_fd_close() is a
> racy problem. Any suggestion how this could be fixed? Something like
> introducing a "close2" syscall that doesn't drop locks on other fds?

I guess the problem scenario is with the "nolock" option - if you have a
block driver which has an image open and locked, and another block driver
opens that same image with 'nolock', then you hit the potential close()
problem.

One idea might be to simply have a delayed close(), i.e. don't close the
file descriptor until all locks have been released by the other block
driver.

Regards,
Daniel
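
[A rough sketch of the delayed-close idea, with hypothetical helpers
(path_has_locked_fds, defer_close_until_unlocked) standing in for
whatever bookkeeping QEMU would keep - nothing here is from the patch:]

    #include <stdbool.h>
    #include <unistd.h>

    /* Hypothetical bookkeeping helpers, assumed for illustration: */
    bool path_has_locked_fds(const char *path);
    void defer_close_until_unlocked(int fd, const char *path);

    /* Never actually close(2) an fd while any other fd in this process
     * still holds an fcntl lock on the same file, so the kernel never
     * gets a chance to drop those locks behind our back. */
    static int qemu_fd_close_delayed(int fd, const char *path)
    {
        if (path_has_locked_fds(path)) {
            /* Keep the fd open; close it for real once the last lock
             * on this path has been released. */
            defer_close_until_unlocked(fd, path);
            return 0;
        }
        return close(fd);
    }

This avoids the close-then-reacquire race entirely, at the cost of
holding file descriptors open longer than the caller expects.
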
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 408783f..089c13f 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -251,6 +251,8 @@ int qemu_madvise(void *addr, size_t len, int advice);
 
 int qemu_open(const char *name, int flags, ...);
 int qemu_close(int fd);
+int qemu_lock_fd(int fd, int64_t start, int64_t len, bool readonly);
+int qemu_unlock_fd(int fd, int64_t start, int64_t len);
 
 #if defined(__HAIKU__) && defined(__i386__)
 #define FMT_pid "%ld"
diff --git a/util/osdep.c b/util/osdep.c
index d56d071..1510cbf 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -41,6 +41,7 @@ extern int madvise(caddr_t, size_t, int);
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
 #include "monitor/monitor.h"
+#include <glib.h>
 
 static bool fips_enabled = false;
 
@@ -146,13 +147,9 @@ static int qemu_parse_fdset(const char *param)
 }
 #endif
 
-/*
- * Opens a file with FD_CLOEXEC set
- */
-int qemu_open(const char *name, int flags, ...)
+static int qemu_fd_open(const char *name, int flags, int mode)
 {
     int ret;
-    int mode = 0;
 
 #ifndef _WIN32
     const char *fdset_id_str;
@@ -189,13 +186,6 @@ int qemu_open(const char *name, int flags, ...)
     }
 #endif
 
-    if (flags & O_CREAT) {
-        va_list ap;
-
-        va_start(ap, flags);
-        mode = va_arg(ap, int);
-        va_end(ap);
-    }
 
 #ifdef O_CLOEXEC
     ret = open(name, flags | O_CLOEXEC, mode);
@@ -216,6 +206,188 @@ int qemu_open(const char *name, int flags, ...)
     return ret;
 }
 
+typedef struct LockEntry LockEntry;
+struct LockEntry {
+    int fd;
+    bool readonly;
+    int64_t start;
+    int64_t len;
+    QLIST_ENTRY(LockEntry) next;
+};
+
+typedef struct {
+    QLIST_HEAD(, LockEntry) records;
+} LockRecord;
+
+static GHashTable *fd_to_path;
+static GHashTable *lock_map;
+
+static void fd_map_init(void)
+{
+    if (!fd_to_path) {
+        fd_to_path = g_hash_table_new(g_direct_hash, g_direct_equal);
+        lock_map = g_hash_table_new(g_str_hash, g_str_equal);
+    }
+}
+
+static int qemu_lock_do(int fd, int64_t start, int64_t len, int fl_type)
+{
+    struct flock fl = (struct flock) {
+        .l_whence = SEEK_SET,
+        /* Locking byte 1 avoids interfereing with virtlockd. */
+        .l_start = start,
+        .l_len = len,
+        .l_type = fl_type,
+    };
+
+    return fcntl(fd, F_SETLK, &fl);
+}
+
+int qemu_lock_fd(int fd, int64_t start, int64_t len, bool readonly)
+{
+    LockEntry *ent;
+    LockRecord *rec;
+    char *path;
+    int ret;
+
+    ret = qemu_lock_do(fd, start, len, readonly ? F_RDLCK : F_WRLCK);
+    if (ret == -1) {
+        return -errno;
+    }
+    path = g_hash_table_lookup(fd_to_path, GINT_TO_POINTER(fd));
+    assert(path);
+    rec = g_hash_table_lookup(lock_map, path);
+    if (!rec) {
+        gboolean inserted;
+        rec = g_new(LockRecord, 1);
+        QLIST_INIT(&rec->records);
+        inserted = g_hash_table_insert(lock_map, path, rec);
+        assert(inserted);
+    } else {
+        /* Check the range is not locked by any fd in the record yet. */
+        int64_t end = start + len;
+        QLIST_FOREACH(ent, &rec->records, next) {
+            int64_t ent_end = ent->start + ent->len;
+            if ((ent->start >= start && ent_end <= end) ||
+                (ent_end >= start && ent_end <= end)) {
+                /* There is an overlap between ent and us, only shared
+                 * locking is permitted. */
+                if (!(ent->readonly && readonly)) {
+                    return -EAGAIN;
+                }
+            }
+        }
+    }
+    ent = g_new(LockEntry, 1);
+    *ent = (LockEntry) {
+        .fd = fd,
+        .start = start,
+        .len = len,
+        .readonly = readonly,
+    };
+    QLIST_INSERT_HEAD(&rec->records, ent, next);
+    return 0;
+}
+
+int qemu_unlock_fd(int fd, int64_t start, int64_t len)
+{
+    LockEntry *ent;
+    LockRecord *rec;
+    int ret;
+    char *path;
+
+    ret = qemu_lock_do(fd, start, len, F_UNLCK);
+    if (ret == -1) {
+        return ret;
+    }
+    path = g_hash_table_lookup(fd_to_path, GINT_TO_POINTER(fd));
+    assert(path);
+    rec = g_hash_table_lookup(lock_map, path);
+    assert(ret);
+    QLIST_FOREACH(ent, &rec->records, next) {
+        if (ent->fd == fd && ent->start == start && ent->len == len) {
+            QLIST_REMOVE(ent, next);
+            break;
+        }
+    }
+    assert(ent);
+    return ret;
+}
+
+static void qemu_fd_add_record(int fd, const char *path)
+{
+    gboolean ret;
+    char *rpath;
+
+    fd_map_init();
+    rpath = realpath(path, NULL);
+    if (!rpath) {
+        fprintf(stderr, "Failed to get real path for file %s: %s\n", path,
+                strerror(errno));
+        rpath = strdup(path);
+    }
+    ret = g_hash_table_insert(fd_to_path, GINT_TO_POINTER(fd), g_strdup(rpath));
+    /* It's a caller bug if the record already exists */
+    assert(ret);
+    free(rpath);
+}
+
+static int qemu_fd_close(int fd)
+{
+    LockEntry *ent, *tmp;
+    LockRecord *rec;
+    char *path;
+    int ret;
+
+    assert(fd_to_path);
+    path = g_hash_table_lookup(fd_to_path, GINT_TO_POINTER(fd));
+    assert(path);
+    g_hash_table_remove(fd_to_path, GINT_TO_POINTER(fd));
+    rec = g_hash_table_lookup(lock_map, path);
+    ret = close(fd);
+
+    if (rec) {
+        QLIST_FOREACH_SAFE(ent, &rec->records, next, tmp) {
+            int r;
+            if (ent->fd == fd) {
+                QLIST_REMOVE(ent, next);
+                g_free(ent);
+                continue;
+            }
+            r = qemu_lock_do(ent->fd, ent->start, ent->len,
+                             ent->readonly ? F_RDLCK : F_WRLCK);
+            if (r == -1) {
+                fprintf(stderr, "Failed to acquire lock on fd %d: %s\n",
+                        ent->fd, strerror(errno));
+            }
+        }
+    }
+
+    return ret;
+}
+
+/*
+ * Opens a file with FD_CLOEXEC set
+ */
+int qemu_open(const char *name, int flags, ...)
+{
+    int mode = 0;
+    int ret;
+
+    if (flags & O_CREAT) {
+        va_list ap;
+
+        va_start(ap, flags);
+        mode = va_arg(ap, int);
+        va_end(ap);
+    }
+    ret = qemu_fd_open(name, flags, mode);
+    if (ret >= 0) {
+        qemu_fd_add_record(ret, name);
+    }
+    return ret;
+}
+
 int qemu_close(int fd)
 {
     int64_t fdset_id;
@@ -225,7 +397,7 @@ int qemu_close(int fd)
     if (fdset_id != -1) {
         int ret;
 
-        ret = close(fd);
+        ret = qemu_fd_close(fd);
         if (ret == 0) {
             monitor_fdset_dup_fd_remove(fd);
         }
@@ -233,7 +405,7 @@ int qemu_close(int fd)
         return ret;
     }
 
-    return close(fd);
+    return qemu_fd_close(fd);
 }
 
 /*
They are wrappers of POSIX fcntl file locking, with the additional
interception of open/close (through qemu_open and qemu_close) to offer
better semantics that preserve the locks across multiple life cycles of
different fds on the same file. The reason to make this semantic change
over the fcntl locks is to make the API cleaner for QEMU subsystems.

More precisely, we remove this "feature" of fcntl lock:

    If a process closes any file descriptor referring to a file, then
    all of the process's locks on that file are released, regardless of
    the file descriptor(s) on which the locks were obtained.

as long as the fd is always opened/closed through qemu_open and
qemu_close.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 include/qemu/osdep.h |   2 +
 util/osdep.c         | 200 +++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 188 insertions(+), 14 deletions(-)