linux-user: un-register threads from RCU before exit

Message ID 20200211173510.16347-1-alex.bennee@linaro.org (mailing list archive)
State New, archived

Commit Message

Alex Bennée Feb. 11, 2020, 5:35 p.m. UTC
Through a mechanism I don't quite understand yet, we can find ourselves
with a left-over RCU thread when we exit group. This is a racy failure
that occurs for example with:

  alpha-linux-user running testthread
    with libhowvec.so plugin
    but only when run from make

This may not be the correct fix but it seems to alleviate the
symptoms.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
 linux-user/exit.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Peter Maydell Feb. 11, 2020, 5:43 p.m. UTC | #1
On Tue, 11 Feb 2020 at 17:36, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Through a mechanism I don't quite yet understand we can find ourselves
> with a left over RCU thread when we exit group. This is a racy failure
> that occurs for example with:
>
>   alpha-linux-user running testthread
>     with libhowvec.so plugin
>     but only when run from make
>
> This may not be the correct fix but it seems to alleviate the
> symptoms.

This is weird. The only time we call preexit_cleanup()
is when the next thing we do is to terminate the entire
process all at once. (For some reason in one place
we do that by calling _exit() and in another place
by calling exit_group() -- I don't see why we need that
inconsistency).

I'm pretty sure the system emulation threads don't
call rcu_unregister_thread() for the "whole process
is going away" case, so something odd is happening here...
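
For reference, the expectation behind the point above (a thread that is
still blocked does not stop the process from going away) can be checked
outside QEMU with a small standalone sketch. This is not QEMU code: the
names blocked_worker and exit_status_with_blocked_thread are made up for
illustration, and the semaphore only stands in for a thread parked in a
wait, as call_rcu_thread is in qemu_event_wait(). It assumes Linux with
glibc, where _exit() ends up in the exit_group syscall.

```c
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for an event that is never signalled. */
static sem_t never_posted;

/* Worker that parks forever, loosely like a thread waiting for work. */
static void *blocked_worker(void *arg)
{
    (void)arg;
    sem_wait(&never_posted);    /* never returns: nobody posts */
    return NULL;
}

/*
 * Fork a child that starts the blocked worker and then calls _exit(code)
 * without joining or cancelling it. Returns the child's exit status as
 * seen by the parent, or -1 if the child did not exit normally.
 */
static int exit_status_with_blocked_thread(int code)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        pthread_t tid;
        sem_init(&never_posted, 0, 0);
        if (pthread_create(&tid, NULL, blocked_worker, NULL) != 0) {
            _exit(127);
        }
        /* Whole-process exit: the worker is torn down with everything else. */
        _exit(code);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If waitpid() in the parent returns promptly with the requested code, the
blocked worker did not keep the child alive. That matches the expectation
stated above, but it says nothing about what rr or the plugin do in the
failing case.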

thanks
-- PMM
Paolo Bonzini Feb. 11, 2020, 6:04 p.m. UTC | #2
On 11/02/20 18:35, Alex Bennée wrote:
> Through a mechanism I don't quite yet understand we can find ourselves
> with a left over RCU thread when we exit group. This is a racy failure
> that occurs for example with:
> 
>   alpha-linux-user running testthread
>     with libhowvec.so plugin
>     but only when run from make
> 
> This may not be the correct fix but it seems to alleviate the
> symptoms.

Can you explain what the effect of this left-over thread is?  All
threads should be terminated when the process exits and I'm not sure why
the user-mode emulation is special.

Paolo
Alex Bennée Feb. 11, 2020, 9:58 p.m. UTC | #3
Peter Maydell <peter.maydell@linaro.org> writes:

> On Tue, 11 Feb 2020 at 17:36, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Through a mechanism I don't quite yet understand we can find ourselves
>> with a left over RCU thread when we exit group. This is a racy failure
>> that occurs for example with:
>>
>>   alpha-linux-user running testthread
>>     with libhowvec.so plugin
>>     but only when run from make
>>
>> This may not be the correct fix but it seems to alleviate the
>> symptoms.
>
> This is weird. The only time we call preexit_cleanup()
> is when the next thing we do is to terminate the entire
> process all at once. (For some reason in one place
> we do that by calling _exit() and in another place
> by calling exit_group() -- I don't see why we need that
> inconsistency).
>
> I'm pretty sure the system emulation threads don't
> call rcu_unregister_thread() for the "whole process
> is going away" case, so something odd is happening here...

So what I see is (although possibly confused further by rr's capture):

  End of pthread test.
  [New Thread 7966.7967]

  Thread 3 received signal SIGKILL, Killed.
  [Switching to Thread 7966.7967]
  0x0000000070000002 in ?? ()
  (rr) bt
  #0  0x0000000070000002 in ?? ()
  #1  0x00007f36981a490e in _raw_syscall () at /build/rr-79viaC/rr-5.2.0/src/preload/raw_syscall.S:120
  #2  0x00007f36981a13fe in traced_raw_syscall (call=call@entry=0x7f369656ffa0) at ./src/preload/syscallbuf.c:222
  #3  0x00007f36981a271a in sys_xstat64 (call=<optimized out>) at ./src/preload/syscallbuf.c:2439
  #4  syscall_hook_internal (call=0x7f369656ffa0) at ./src/preload/syscallbuf.c:2651
  #5  syscall_hook (call=0x7f369656ffa0) at ./src/preload/syscallbuf.c:2687
  #6  0x00007f36981a12da in _syscall_hook_trampoline () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:282
  #7  0x00007f36981a130a in __morestack () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:417
  #8  0x00007f36981a1310 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:423
  #9  0x00007f369758bf5f in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  #10 0x0000556b768b764b in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:455
  #11 qemu_event_wait (ev=ev@entry=0x556b7897a608 <rcu_call_ready_event>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:459
  #12 0x0000556b768be29a in call_rcu_thread (opaque=opaque@entry=0x0) at /home/alex/lsrc/qemu.git/util/rcu.c:260
  #13 0x0000556b768b689a in qemu_thread_start (args=<optimized out>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:519
  #14 0x00007f3697660fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
  #15 0x00007f36975914cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  (rr) info threads
    Id   Target Id                                     Frame
  * 3    Thread 7966.7967 (mmap_hardlink_3_qemu-alpha) 0x0000000070000002 in ?? ()
  (rr)

Although possibly it hasn't moved on from where it was during exit:

  (rr) b preexit_cleanup
  Breakpoint 1 at 0x556b768646d0: file /home/alex/lsrc/qemu.git/linux-user/exit.c, line 31.
  (rr) rc
  Continuing.
  [New Thread 7966.7966]
  [Switching to Thread 7966.7966]

  Thread 4 hit Breakpoint 1, preexit_cleanup (env=0x556b797aac40, code=code@entry=0) at /home/alex/lsrc/qemu.git/linux-user/exit.c:31
  31          rcu_unregister_thread();
  (rr) bt
  #0  preexit_cleanup (env=0x556b797aac40, code=code@entry=0) at /home/alex/lsrc/qemu.git/linux-user/exit.c:31
  #1  0x0000556b76850a63 in do_syscall1 (cpu_env=cpu_env@entry=0x556b797aac40, num=num@entry=405, arg1=arg1@entry=0, arg2=arg2@entry=0, arg3=arg3@entry=4832687680, arg4=arg4@entry=0, arg5=4095, arg6=4832686256, arg8=0, arg7=0) at /home/alex/lsrc/qemu.git/linux-user/syscall.c:9373
  #2  0x0000556b76859b88 in do_syscall (cpu_env=cpu_env@entry=0x556b797aac40, num=405, arg1=0, arg2=0, arg3=<optimized out>, arg4=<optimized out>, arg5=4095, arg6=4832686256, arg7=0, arg8=0) at /home/alex/lsrc/qemu.git/linux-user/syscall.c:12110
  #3  0x0000556b768645c6 in cpu_loop (env=0x556b797aac40) at /home/alex/lsrc/qemu.git/linux-user/alpha/cpu_loop.c:109
  #4  0x0000556b767e13de in main (argc=<optimized out>, argv=0x7ffe9d8f5ca8, envp=<optimized out>) at /home/alex/lsrc/qemu.git/linux-user/main.c:865
  (rr) info threads
    Id   Target Id                          Frame
    3    Thread 7966.7967 (mmap_hardlink_3) 0x0000000070000002 in ?? ()
  * 4    Thread 7966.7966 (mmap_hardlink_3) preexit_cleanup (env=0x556b797aac40, code=code@entry=0) at /home/alex/lsrc/qemu.git/linux-user/exit.c:31
  (rr) thread 3
  [Switching to thread 3 (Thread 7966.7967)]
  #0  0x0000000070000002 in ?? ()
  (rr) bt
  #0  0x0000000070000002 in ?? ()
  #1  0x00007f36981a490e in _raw_syscall () at /build/rr-79viaC/rr-5.2.0/src/preload/raw_syscall.S:120
  #2  0x00007f36981a13fe in traced_raw_syscall (call=call@entry=0x7f369656ffa0) at ./src/preload/syscallbuf.c:222
  #3  0x00007f36981a271a in sys_xstat64 (call=<optimized out>) at ./src/preload/syscallbuf.c:2439
  #4  syscall_hook_internal (call=0x7f369656ffa0) at ./src/preload/syscallbuf.c:2651
  #5  syscall_hook (call=0x7f369656ffa0) at ./src/preload/syscallbuf.c:2687
  #6  0x00007f36981a12da in _syscall_hook_trampoline () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:282
  #7  0x00007f36981a130a in __morestack () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:417
  #8  0x00007f36981a1310 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:423
  #9  0x00007f369758bf5f in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  #10 0x0000556b768b764b in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:455
  #11 qemu_event_wait (ev=ev@entry=0x556b7897a608 <rcu_call_ready_event>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:459
  #12 0x0000556b768be29a in call_rcu_thread (opaque=opaque@entry=0x0) at /home/alex/lsrc/qemu.git/util/rcu.c:260
  #13 0x0000556b768b689a in qemu_thread_start (args=<optimized out>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:519
  #14 0x00007f3697660fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
  #15 0x00007f36975914cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  (rr)

Of course it is occurring on my patched tree so I guess the unregister
approach doesn't actually help.

>
> thanks
> -- PMM

Patch

diff --git a/linux-user/exit.c b/linux-user/exit.c
index a362ef67d2c..1c7ce347324 100644
--- a/linux-user/exit.c
+++ b/linux-user/exit.c
@@ -28,12 +28,13 @@  extern void __gcov_dump(void);
 
 void preexit_cleanup(CPUArchState *env, int code)
 {
+    rcu_unregister_thread();
 #ifdef TARGET_GPROF
-        _mcleanup();
+    _mcleanup();
 #endif
 #ifdef CONFIG_GCOV
-        __gcov_dump();
+    __gcov_dump();
 #endif
-        gdb_exit(env, code);
-        qemu_plugin_atexit_cb();
+    gdb_exit(env, code);
+    qemu_plugin_atexit_cb();
 }