Message ID | 20241225-event-v1-1-a58c8d63eb70@daynix.com (mailing list archive) |
---|---|
State | New, archived |
Series | Improve futex usage |
This is somewhat orthogonal to the issue being addressed here, but:
While reading the man page to make sense of this patch, I noticed the
following:

> If the futex value does not match val, then the call fails
> immediately with the error EAGAIN.

And qemu_futex_wait does not seem to handle that case. In fact it seems
like it would take the default: abort(); code path?

If I've got this right, I'm surprised there aren't spurious abort()s
happening, but I suppose QemuEvent and qemu_futex_* are used fairly
sparingly and in low-contention areas.

On Wed, 25 Dec 2024 at 06:44, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:

> futex(2) - Linux manual page
> https://man7.org/linux/man-pages/man2/futex.2.html
> > Note that a wake-up can also be caused by common futex usage patterns
> > in unrelated code that happened to have previously used the futex
> > word's memory location (e.g., typical futex-based implementations of
> > Pthreads mutexes can cause this under some conditions). Therefore,
> > callers should always conservatively assume that a return value of 0
> > can mean a spurious wake-up, and use the futex word's value (i.e.,
> > the user-space synchronization scheme) to decide whether to continue
> > to block or not.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>

In any case, for the specific issue addressed here:

Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>

> ---
>  include/qemu/futex.h              |  9 +++++++++
>  tests/unit/test-aio-multithread.c |  4 +++-
>  util/qemu-thread-posix.c          | 28 ++++++++++++++++------------
>  3 files changed, 28 insertions(+), 13 deletions(-)
>
> diff --git a/include/qemu/futex.h b/include/qemu/futex.h
> index 91ae88966e12..f57774005330 100644
> --- a/include/qemu/futex.h
> +++ b/include/qemu/futex.h
> @@ -24,6 +24,15 @@ static inline void qemu_futex_wake(void *f, int n)
>      qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
>  }
>
> +/*
> + * Note that a wake-up can also be caused by common futex usage patterns in
> + * unrelated code that happened to have previously used the futex word's
> + * memory location (e.g., typical futex-based implementations of Pthreads
> + * mutexes can cause this under some conditions). Therefore, callers should
> + * always conservatively assume that it is a spurious wake-up, and use the futex
> + * word's value (i.e., the user-space synchronization scheme) to decide whether
> + * to continue to block or not.
> + */
>  static inline void qemu_futex_wait(void *f, unsigned val)
>  {
>      while (qemu_futex(f, FUTEX_WAIT, (int) val, NULL, NULL, 0)) {
> diff --git a/tests/unit/test-aio-multithread.c b/tests/unit/test-aio-multithread.c
> index 08d4570ccb14..8c2e41545a29 100644
> --- a/tests/unit/test-aio-multithread.c
> +++ b/tests/unit/test-aio-multithread.c
> @@ -305,7 +305,9 @@ static void mcs_mutex_lock(void)
>      prev = qatomic_xchg(&mutex_head, id);
>      if (prev != -1) {
>          qatomic_set(&nodes[prev].next, id);
> -        qemu_futex_wait(&nodes[id].locked, 1);
> +        while (qatomic_read(&nodes[id].locked) == 1) {
> +            qemu_futex_wait(&nodes[id].locked, 1);
> +        }
>      }
>  }
>
> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
> index b2e26e21205b..eade5311d175 100644
> --- a/util/qemu-thread-posix.c
> +++ b/util/qemu-thread-posix.c
> @@ -428,17 +428,21 @@ void qemu_event_wait(QemuEvent *ev)
>
>      assert(ev->initialized);
>
> -    /*
> -     * qemu_event_wait must synchronize with qemu_event_set even if it does
> -     * not go down the slow path, so this load-acquire is needed that
> -     * synchronizes with the first memory barrier in qemu_event_set().
> -     *
> -     * If we do go down the slow path, there is no requirement at all: we
> -     * might miss a qemu_event_set() here but ultimately the memory barrier in
> -     * qemu_futex_wait() will ensure the check is done correctly.
> -     */
> -    value = qatomic_load_acquire(&ev->value);
> -    if (value != EV_SET) {
> +    while (true) {
> +        /*
> +         * qemu_event_wait must synchronize with qemu_event_set even if it does
> +         * not go down the slow path, so this load-acquire is needed that
> +         * synchronizes with the first memory barrier in qemu_event_set().
> +         *
> +         * If we do go down the slow path, there is no requirement at all: we
> +         * might miss a qemu_event_set() here but ultimately the memory barrier
> +         * in qemu_futex_wait() will ensure the check is done correctly.
> +         */
> +        value = qatomic_load_acquire(&ev->value);
> +        if (value == EV_SET) {
> +            break;
> +        }
> +
>          if (value == EV_FREE) {
>              /*
>               * Leave the event reset and tell qemu_event_set that there are
> @@ -452,7 +456,7 @@ void qemu_event_wait(QemuEvent *ev)
>               * like the load above.
>               */
>              if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
> -                return;
> +                break;
>              }
>          }
>
> --
> 2.47.1
On 2024/12/28 20:11, Phil Dennis-Jordan wrote:
> This is somewhat orthogonal to the issue being addressed here, but:
> While reading the man page to make sense of this patch, I noticed the
> following:
>
> > If the futex value does not match val, then the call fails
> > immediately with the error EAGAIN.
>
> And qemu_futex_wait does not seem to handle that case. In fact it seems
> like it would take the default: abort(); code path?

It's handled as EWOULDBLOCK. The man page says:

> Note: on Linux, the symbolic names EAGAIN and EWOULDBLOCK (both of
> which appear in different parts of the kernel futex code) have the
> same value.

> If I've got this right, I'm surprised there aren't spurious abort()s
> happening, but I suppose QemuEvent and qemu_futex_* are used fairly
> sparingly and in low-contention areas.

QemuLockCnt, which relies on qemu_futex_*, is used in more contended
areas so it will cause trouble if qemu_futex_* is broken.
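The EAGAIN question in this exchange can be illustrated with a minimal sketch, assuming Linux and a raw `syscall(SYS_futex, ...)`. The names `sketch_futex` and `sketch_futex_wait` are hypothetical and are not QEMU's helpers; the errno handling only mirrors what the thread describes, where a value mismatch is reported as EAGAIN and a `case EWOULDBLOCK:` label matches it because both names share one value on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* On Linux both names share one value, so a single case label covers both. */
static_assert(EAGAIN == EWOULDBLOCK, "EAGAIN and EWOULDBLOCK differ here");

/* Hypothetical thin wrapper around the raw futex system call. */
static long sketch_futex(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/*
 * Hypothetical wait helper: returns on a wake-up (possibly spurious) or
 * when *uaddr no longer equals val; retries only when interrupted.
 */
static void sketch_futex_wait(uint32_t *uaddr, uint32_t val)
{
    while (sketch_futex(uaddr, FUTEX_WAIT, val) != 0) {
        switch (errno) {
        case EWOULDBLOCK:
            /* *uaddr != val (reported as EAGAIN): nothing to wait for. */
            return;
        case EINTR:
            /* Interrupted by a signal: leave the switch and retry. */
            break;
        default:
            /* Any other error is unexpected in this sketch. */
            abort();
        }
    }
}
```

With this shape, a call such as `sketch_futex_wait(&word, 1)` while `word` is 0 returns immediately instead of reaching the `default:` branch.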
diff --git a/include/qemu/futex.h b/include/qemu/futex.h
index 91ae88966e12..f57774005330 100644
--- a/include/qemu/futex.h
+++ b/include/qemu/futex.h
@@ -24,6 +24,15 @@ static inline void qemu_futex_wake(void *f, int n)
     qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
 }
 
+/*
+ * Note that a wake-up can also be caused by common futex usage patterns in
+ * unrelated code that happened to have previously used the futex word's
+ * memory location (e.g., typical futex-based implementations of Pthreads
+ * mutexes can cause this under some conditions). Therefore, callers should
+ * always conservatively assume that it is a spurious wake-up, and use the futex
+ * word's value (i.e., the user-space synchronization scheme) to decide whether
+ * to continue to block or not.
+ */
 static inline void qemu_futex_wait(void *f, unsigned val)
 {
     while (qemu_futex(f, FUTEX_WAIT, (int) val, NULL, NULL, 0)) {
diff --git a/tests/unit/test-aio-multithread.c b/tests/unit/test-aio-multithread.c
index 08d4570ccb14..8c2e41545a29 100644
--- a/tests/unit/test-aio-multithread.c
+++ b/tests/unit/test-aio-multithread.c
@@ -305,7 +305,9 @@ static void mcs_mutex_lock(void)
     prev = qatomic_xchg(&mutex_head, id);
     if (prev != -1) {
         qatomic_set(&nodes[prev].next, id);
-        qemu_futex_wait(&nodes[id].locked, 1);
+        while (qatomic_read(&nodes[id].locked) == 1) {
+            qemu_futex_wait(&nodes[id].locked, 1);
+        }
     }
 }
 
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index b2e26e21205b..eade5311d175 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -428,17 +428,21 @@ void qemu_event_wait(QemuEvent *ev)
 
     assert(ev->initialized);
 
-    /*
-     * qemu_event_wait must synchronize with qemu_event_set even if it does
-     * not go down the slow path, so this load-acquire is needed that
-     * synchronizes with the first memory barrier in qemu_event_set().
-     *
-     * If we do go down the slow path, there is no requirement at all: we
-     * might miss a qemu_event_set() here but ultimately the memory barrier in
-     * qemu_futex_wait() will ensure the check is done correctly.
-     */
-    value = qatomic_load_acquire(&ev->value);
-    if (value != EV_SET) {
+    while (true) {
+        /*
+         * qemu_event_wait must synchronize with qemu_event_set even if it does
+         * not go down the slow path, so this load-acquire is needed that
+         * synchronizes with the first memory barrier in qemu_event_set().
+         *
+         * If we do go down the slow path, there is no requirement at all: we
+         * might miss a qemu_event_set() here but ultimately the memory barrier
+         * in qemu_futex_wait() will ensure the check is done correctly.
+         */
+        value = qatomic_load_acquire(&ev->value);
+        if (value == EV_SET) {
+            break;
+        }
+
         if (value == EV_FREE) {
             /*
              * Leave the event reset and tell qemu_event_set that there are
@@ -452,7 +456,7 @@ void qemu_event_wait(QemuEvent *ev)
              * like the load above.
              */
             if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
-                return;
+                break;
             }
         }
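The caller-side pattern this patch applies (re-check the futex word after every return from the wait) can be sketched on its own. This is a minimal illustration assuming Linux and C11 atomics, not QEMU code: `wait_while_equal` and `set_and_wake` are hypothetical names, and the sketch deliberately ignores the system call's return value because the loop condition alone decides whether to keep blocking.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption of this sketch: the atomic word has the plain 32-bit layout
 * that the futex system call expects. */
static_assert(sizeof(_Atomic uint32_t) == sizeof(uint32_t),
              "atomic word must be a plain 32-bit futex word");

/*
 * Hypothetical helper: block while *word == val. Every return from
 * FUTEX_WAIT (wake-up, spurious wake-up, EAGAIN, EINTR) is treated the
 * same way: go back and re-read the word.
 */
static void wait_while_equal(_Atomic uint32_t *word, uint32_t val)
{
    while (atomic_load_explicit(word, memory_order_acquire) == val) {
        syscall(SYS_futex, (uint32_t *)word, FUTEX_WAIT, val, NULL, NULL, 0);
    }
}

/* Hypothetical counterpart: publish the new value, then wake one waiter. */
static void set_and_wake(_Atomic uint32_t *word, uint32_t new_val)
{
    atomic_store_explicit(word, new_val, memory_order_release);
    syscall(SYS_futex, (uint32_t *)word, FUTEX_WAKE, 1, NULL, NULL, 0);
}
```

Because the predicate is checked before each wait, a wake-up that arrives for an unrelated reason only costs one extra loop iteration; the waiter never proceeds unless the word has actually changed.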