Thread safety of coroutine-sigaltstack

Message ID	7b8155ad-0942-dc1c-f43c-bb5eb518a278@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=MVl1=GX=nongnu.org=qemu-devel-bounces+qemu-devel=archiver.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 33BDF23122 To: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org> From: Max Reitz <mreitz@redhat.com> Subject: Thread safety of coroutine-sigaltstack Message-ID: <7b8155ad-0942-dc1c-f43c-bb5eb518a278@redhat.com> Date: Wed, 20 Jan 2021 17:26:46 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.0 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------65D8FF1061BDB08AE4E62593" Content-Language: en-US Received-SPF: pass client-ip=216.205.24.124; envelope-from=mreitz@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -29 X-Spam_score: -3.0 X-Spam_bar: --- X-Spam_report: (-3.0 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.167, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action Precedence: list Cc: Kevin Wolf <kwolf@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com> Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>
Series	Thread safety of coroutine-sigaltstack \| expand Thread safety of coroutine-sigaltstack

Max Reitz Jan. 20, 2021, 4:26 p.m. UTC

Hi,

I’ve run into trouble with Vladimir’s async backup series on MacOS, 
namely that iotest 256 fails with qemu exiting because of a SIGUSR2.

Turns out this is because MacOS (-xcode) uses coroutine-sigaltstack, 
when I use this on Linux, I get the same error.

(You can find the series applied on my block branch e.g. here:

https://github.com/XanClic/qemu.git block
)

Some debugging later I found that the problem seems to be two threads 
simultaneously creating a coroutine.  It makes sense that this case 
would appear with Vladimir’s series and iotest 256, because 256 runs two 
backup jobs in two different threads in a transaction, i.e. they’re 
launched simultaneously.  The async backup series makes backup use many 
concurrent coroutines and so by default launches 64+x coroutines when 
the backup is started.  Thus, the case of two coroutines created 
concurrently in two threads is very likely to occur.

I think the problem is in coroutine-sigaltstack’s qemu_coroutine_new(). 
It sets up a SIGUSR2 handler, then changes the signal handling stack, 
then raises SIGUSR2, then reverts the signal handling stack and the 
SIGUSR2 handler.  As far as I’m aware, setting up signal handlers and 
changing the signal handling stack are both process-global operations, 
and so if two threads do so concurrently, they will interfere with each 
other.  What usually happens is that one thread sets up everything, 
while the other is already in the process of reverting its changes: So 
the second thread reverts the SIGUSR2 handler to the default, and then 
the first thread raises SIGUSR2, thus making qemu exit.

(Could be worse though.  Both threads could set up the sigaltstack, then 
both raise SIGUSR2, and then we get one coroutine_trampoline() 
invocation in each thread, but both would use the same stack.  But I 
don’t think I’ve ever seen that happen, presumably because the race time 
window is much shorter.)

Now, this all seems obvious to me, but I’m wondering...  If 
coroutine-sigaltstack really couldn’t create coroutines concurrently, 
why wouldn’t we have noticed before?  I mean, this new backup case is 
kind of a stress test, yes, but surely we would have seen the problem 
already, right?  That’s why I’m not sure whether my analysis is correct.

Anyway, I’ve attached a patch that wraps the whole SIGUSR2 handling 
section in a mutex, and that makes 256 pass reliably with Vladimir’s 
async backup series.  Besides being unsure whether the problem is really 
in coroutine-sigaltstack, I also don’t know whether getting out the big 
guns and wrapping everything in the mutex is the best solution.  So, 
it’s an RFC, I guess.

Max

Paolo Bonzini Jan. 20, 2021, 4:50 p.m. UTC | #1

On 20/01/21 17:26, Max Reitz wrote:
> Now, this all seems obvious to me, but I’m wondering...  If 
> coroutine-sigaltstack really couldn’t create coroutines concurrently,
>  why wouldn’t we have noticed before?  I mean, this new backup case
> is kind of a stress test, yes, but surely we would have seen the
> problem already, right?  That’s why I’m not sure whether my analysis
> is correct.

Probably the BQL was hiding the issue?

> Anyway, I’ve attached a patch that wraps the whole
> SIGUSR2 handling section in a mutex, and that makes 256 pass reliably
> with Vladimir’s async backup series.  Besides being unsure whether
> the problem is really in coroutine-sigaltstack, I also don’t know
> whether getting out the big guns and wrapping everything in the mutex
> is the best solution.  So, it’s an RFC, I guess.

I think that's fine.  The coroutine pool will hide scalability issues.

Paolo

Eric Blake Jan. 20, 2021, 4:58 p.m. UTC | #2

On 1/20/21 10:26 AM, Max Reitz wrote:
> Hi,
> 
> I’ve run into trouble with Vladimir’s async backup series on MacOS,
> namely that iotest 256 fails with qemu exiting because of a SIGUSR2.
> 
> Turns out this is because MacOS (-xcode) uses coroutine-sigaltstack,
> when I use this on Linux, I get the same error.
> 
> (You can find the series applied on my block branch e.g. here:
> 
> https://github.com/XanClic/qemu.git block
> )
> 
> Some debugging later I found that the problem seems to be two threads
> simultaneously creating a coroutine.  It makes sense that this case
> would appear with Vladimir’s series and iotest 256, because 256 runs two
> backup jobs in two different threads in a transaction, i.e. they’re
> launched simultaneously.  The async backup series makes backup use many
> concurrent coroutines and so by default launches 64+x coroutines when
> the backup is started.  Thus, the case of two coroutines created
> concurrently in two threads is very likely to occur.
> 
> I think the problem is in coroutine-sigaltstack’s qemu_coroutine_new().
> It sets up a SIGUSR2 handler, then changes the signal handling stack,
> then raises SIGUSR2, then reverts the signal handling stack and the
> SIGUSR2 handler.  As far as I’m aware, setting up signal handlers and
> changing the signal handling stack are both process-global operations,
> and so if two threads do so concurrently, they will interfere with each
> other.

Yes, that is absolutely correct - messing with the signal handlers is
process-wide.  I guess we've been lucky that we haven't been trying to
create coroutines in separate threads in the past.

>  What usually happens is that one thread sets up everything,
> while the other is already in the process of reverting its changes: So
> the second thread reverts the SIGUSR2 handler to the default, and then
> the first thread raises SIGUSR2, thus making qemu exit.
> 
> (Could be worse though.  Both threads could set up the sigaltstack, then
> both raise SIGUSR2, and then we get one coroutine_trampoline()
> invocation in each thread, but both would use the same stack.  But I
> don’t think I’ve ever seen that happen, presumably because the race time
> window is much shorter.)
> 
> Now, this all seems obvious to me, but I’m wondering...  If
> coroutine-sigaltstack really couldn’t create coroutines concurrently,
> why wouldn’t we have noticed before?  I mean, this new backup case is
> kind of a stress test, yes, but surely we would have seen the problem
> already, right?  That’s why I’m not sure whether my analysis is correct.

I'm not sure if there is anything else going wrong, but you have
definitely uncovered a latent problem, and I agree that a mutex is the
right way to fix it.

> 
> Anyway, I’ve attached a patch that wraps the whole SIGUSR2 handling
> section in a mutex, and that makes 256 pass reliably with Vladimir’s
> async backup series.  Besides being unsure whether the problem is really
> in coroutine-sigaltstack, I also don’t know whether getting out the big
> guns and wrapping everything in the mutex is the best solution.  So,
> it’s an RFC, I guess.
> 
> Max


>>From 08d4bb6a98fa731025683f20afe1381291d26031 Mon Sep 17 00:00:00 2001
> From: Max Reitz <mreitz@redhat.com>
> Date: Wed, 20 Jan 2021 16:59:40 +0100
> Subject: [RFC] coroutine-sigaltstack: Add SIGUSR2 mutex
> 
> Modifying signal handlers or the signal handling stack is a
> process-global operation.  When two threads run coroutine-sigaltstack's
> qemu_coroutine_new() concurrently, thay may interfere with each other,

they

> e.g.:
> 
> - One of the threads may revert the SIGUSR2 handler back to the default
>   between the other thread setting up coroutine_trampoline() as the
>   handler and raising SIGUSR2.  That SIGUSR2 will then lead to the
>   process exiting.
> 
> - Both threads may set up their coroutine stack with sigaltstack()
>   simultaneously, so that only one of them sticks.  Both then raise
>   SIGUSR2, which goes to each of the threads separately, but both signal
>   handler invocations will then use the same stack, which cannot work.
> 
> We have to ensure that only one thread at a time can modify the
> process-global SIGUSR2 handler and the signal handling stack.  To do so,
> wrap the whole section where that is done in a mutex.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  util/coroutine-sigaltstack.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

Laszlo Ersek Jan. 20, 2021, 5:25 p.m. UTC | #3

On 01/20/21 17:26, Max Reitz wrote:
> Hi,
> 
> I’ve run into trouble with Vladimir’s async backup series on MacOS,
> namely that iotest 256 fails with qemu exiting because of a SIGUSR2.
> 
> Turns out this is because MacOS (-xcode) uses coroutine-sigaltstack,
> when I use this on Linux, I get the same error.
> 
> (You can find the series applied on my block branch e.g. here:
> 
> https://github.com/XanClic/qemu.git block
> )
> 
> Some debugging later I found that the problem seems to be two threads
> simultaneously creating a coroutine.  It makes sense that this case
> would appear with Vladimir’s series and iotest 256, because 256 runs two
> backup jobs in two different threads in a transaction, i.e. they’re
> launched simultaneously.  The async backup series makes backup use many
> concurrent coroutines and so by default launches 64+x coroutines when
> the backup is started.  Thus, the case of two coroutines created
> concurrently in two threads is very likely to occur.
> 
> I think the problem is in coroutine-sigaltstack’s qemu_coroutine_new().
> It sets up a SIGUSR2 handler, then changes the signal handling stack,
> then raises SIGUSR2, then reverts the signal handling stack and the
> SIGUSR2 handler.  As far as I’m aware, setting up signal handlers and
> changing the signal handling stack are both process-global operations,
> and so if two threads do so concurrently, they will interfere with each
> other.

Signal action (disposition) is process-wide.

Signal mask and signal stack are thread-specific.

A signal may be pending for the whole process, or for a specific thread.
In the former case, the signal is delivered to one of the threads that
are not blocking the signal.

> What usually happens is that one thread sets up everything,
> while the other is already in the process of reverting its changes: So
> the second thread reverts the SIGUSR2 handler to the default, and then
> the first thread raises SIGUSR2, thus making qemu exit.

I agree. The way SIGUSR2 is blocked (for the thread), made pending (for
the thread), and then allowed to be delivered (consequently, to the
thread), looks OK. But by the time it is delivered, the action has been
changed.

> 
> (Could be worse though.  Both threads could set up the sigaltstack, then
> both raise SIGUSR2, and then we get one coroutine_trampoline()
> invocation in each thread, but both would use the same stack.  But I
> don’t think I’ve ever seen that happen, presumably because the race time
> window is much shorter.)

No, the "alternate stack for signal handlers" that sigaltstack()
configures is thread-specific. (I mean one could theoretically mess it
up if the stack were located in the same place between different
threads, but we call qemu_alloc_stack(), so that doesn't happen.)

https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaltstack.html

> 
> Now, this all seems obvious to me, but I’m wondering...  If
> coroutine-sigaltstack really couldn’t create coroutines concurrently,
> why wouldn’t we have noticed before?  I mean, this new backup case is
> kind of a stress test, yes, but surely we would have seen the problem
> already, right?  That’s why I’m not sure whether my analysis is correct.
> 
> Anyway, I’ve attached a patch that wraps the whole SIGUSR2 handling
> section in a mutex, and that makes 256 pass reliably with Vladimir’s
> async backup series.  Besides being unsure whether the problem is really
> in coroutine-sigaltstack, I also don’t know whether getting out the big
> guns and wrapping everything in the mutex is the best solution.  So,
> it’s an RFC, I guess.

A simple grep for SIGUSR2 seems to indicate that SIGUSR2 is not used by
system emulation for anything else, in practice. Is it possible to
dedicate SIGUSR2 explicitly to coroutine-sigaltstack, and set up the
action beforehand, from some init function that executes on a "central"
thread, before qemu_coroutine_new() is ever called?

... I've tried to see if POSIX says anything on signals being delivered
with mutexen held. I can't find anything specific (the spec seems to
talk about delivery of a signal while the thread waits in
pthread_mutex_lock(), but that's not what we care about, here). I'm just
somewhat uncomfortable with bracketing this whole hackery into a mutex
even... Keeping sigaction() out of the picture could be a small
performance benefit, too.

The logic in the patch doesn't look broken, but the comments should be
updated minimally -- the signal stack is thread-specific (similarly to
how a thread has its own stack anyway, regardless of signals).

Only my two cents of course, feel free to ignore.

Thanks
Laszlo

Max Reitz Jan. 21, 2021, 9:27 a.m. UTC | #4

On 20.01.21 18:25, Laszlo Ersek wrote:
> On 01/20/21 17:26, Max Reitz wrote:
>> Hi,
>>
>> I’ve run into trouble with Vladimir’s async backup series on MacOS,
>> namely that iotest 256 fails with qemu exiting because of a SIGUSR2.
>>
>> Turns out this is because MacOS (-xcode) uses coroutine-sigaltstack,
>> when I use this on Linux, I get the same error.
>>
>> (You can find the series applied on my block branch e.g. here:
>>
>> https://github.com/XanClic/qemu.git block
>> )
>>
>> Some debugging later I found that the problem seems to be two threads
>> simultaneously creating a coroutine.  It makes sense that this case
>> would appear with Vladimir’s series and iotest 256, because 256 runs two
>> backup jobs in two different threads in a transaction, i.e. they’re
>> launched simultaneously.  The async backup series makes backup use many
>> concurrent coroutines and so by default launches 64+x coroutines when
>> the backup is started.  Thus, the case of two coroutines created
>> concurrently in two threads is very likely to occur.
>>
>> I think the problem is in coroutine-sigaltstack’s qemu_coroutine_new().
>> It sets up a SIGUSR2 handler, then changes the signal handling stack,
>> then raises SIGUSR2, then reverts the signal handling stack and the
>> SIGUSR2 handler.  As far as I’m aware, setting up signal handlers and
>> changing the signal handling stack are both process-global operations,
>> and so if two threads do so concurrently, they will interfere with each
>> other.
> 
> Signal action (disposition) is process-wide.
> 
> Signal mask and signal stack are thread-specific.

Ah, OK.  Thanks for the insight!

> A signal may be pending for the whole process, or for a specific thread.
> In the former case, the signal is delivered to one of the threads that
> are not blocking the signal.
> 
>> What usually happens is that one thread sets up everything,
>> while the other is already in the process of reverting its changes: So
>> the second thread reverts the SIGUSR2 handler to the default, and then
>> the first thread raises SIGUSR2, thus making qemu exit.
> 
> I agree. The way SIGUSR2 is blocked (for the thread), made pending (for
> the thread), and then allowed to be delivered (consequently, to the
> thread), looks OK. But by the time it is delivered, the action has been
> changed.
> 
>>
>> (Could be worse though.  Both threads could set up the sigaltstack, then
>> both raise SIGUSR2, and then we get one coroutine_trampoline()
>> invocation in each thread, but both would use the same stack.  But I
>> don’t think I’ve ever seen that happen, presumably because the race time
>> window is much shorter.)
> 
> No, the "alternate stack for signal handlers" that sigaltstack()
> configures is thread-specific. (I mean one could theoretically mess it
> up if the stack were located in the same place between different
> threads, but we call qemu_alloc_stack(), so that doesn't happen.)
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaltstack.html

Explains why I haven’t seen it. :)

>> Now, this all seems obvious to me, but I’m wondering...  If
>> coroutine-sigaltstack really couldn’t create coroutines concurrently,
>> why wouldn’t we have noticed before?  I mean, this new backup case is
>> kind of a stress test, yes, but surely we would have seen the problem
>> already, right?  That’s why I’m not sure whether my analysis is correct.
>>
>> Anyway, I’ve attached a patch that wraps the whole SIGUSR2 handling
>> section in a mutex, and that makes 256 pass reliably with Vladimir’s
>> async backup series.  Besides being unsure whether the problem is really
>> in coroutine-sigaltstack, I also don’t know whether getting out the big
>> guns and wrapping everything in the mutex is the best solution.  So,
>> it’s an RFC, I guess.
> 
> A simple grep for SIGUSR2 seems to indicate that SIGUSR2 is not used by
> system emulation for anything else, in practice. Is it possible to
> dedicate SIGUSR2 explicitly to coroutine-sigaltstack, and set up the
> action beforehand, from some init function that executes on a "central"
> thread, before qemu_coroutine_new() is ever called?

Doesn’t sound unreasonable, but wouldn’t the signal handler then have to 
check whether the SIGUSR2 comes from coroutine-sigaltstack or from the 
outside?  Or should we then keep SIGUSR2 blocked all the time?

> ... I've tried to see if POSIX says anything on signals being delivered
> with mutexen held. I can't find anything specific (the spec seems to
> talk about delivery of a signal while the thread waits in
> pthread_mutex_lock(), but that's not what we care about, here). I'm just
> somewhat uncomfortable with bracketing this whole hackery into a mutex
> even... Keeping sigaction() out of the picture could be a small
> performance benefit, too.

Speaking of signal being delivered in the mutexed section...  What would 
happen if we get an external signal after SIGUSR2 was delivered and 
coroutine_trampoline() set up the sigsetjmp(), but before the stack is 
switched back?  Wouldn’t this new signal then trash the stack?  Should 
we block all signals while using the alternate stack?

(Looking at the x64 objdump, the stack actually seems to be used to 
store @self across sigsetjmp().)

> The logic in the patch doesn't look broken, but the comments should be
> updated minimally -- the signal stack is thread-specific (similarly to
> how a thread has its own stack anyway, regardless of signals).

Sure, I can do that.

I agree that there probably are better solutions than to wrap everything 
in a lock.  OTOH, it looks to me like this lock is the most simple 
solution.  If Daniel is right[1] and we should drop 
coroutine-sigaltstack altogether (at some point...), perhaps it is best 
to go for the most simple solution now.

[1]
https://lists.nongnu.org/archive/html/qemu-block/2021-01/msg00808.html

Max

Laszlo Ersek Jan. 21, 2021, 1:34 p.m. UTC | #5

On 01/21/21 10:27, Max Reitz wrote:
> On 20.01.21 18:25, Laszlo Ersek wrote:
>> On 01/20/21 17:26, Max Reitz wrote:
>>> Hi,
>>>
>>> I’ve run into trouble with Vladimir’s async backup series on MacOS,
>>> namely that iotest 256 fails with qemu exiting because of a SIGUSR2.
>>>
>>> Turns out this is because MacOS (-xcode) uses coroutine-sigaltstack,
>>> when I use this on Linux, I get the same error.
>>>
>>> (You can find the series applied on my block branch e.g. here:
>>>
>>> https://github.com/XanClic/qemu.git block
>>> )
>>>
>>> Some debugging later I found that the problem seems to be two threads
>>> simultaneously creating a coroutine.  It makes sense that this case
>>> would appear with Vladimir’s series and iotest 256, because 256 runs two
>>> backup jobs in two different threads in a transaction, i.e. they’re
>>> launched simultaneously.  The async backup series makes backup use many
>>> concurrent coroutines and so by default launches 64+x coroutines when
>>> the backup is started.  Thus, the case of two coroutines created
>>> concurrently in two threads is very likely to occur.
>>>
>>> I think the problem is in coroutine-sigaltstack’s qemu_coroutine_new().
>>> It sets up a SIGUSR2 handler, then changes the signal handling stack,
>>> then raises SIGUSR2, then reverts the signal handling stack and the
>>> SIGUSR2 handler.  As far as I’m aware, setting up signal handlers and
>>> changing the signal handling stack are both process-global operations,
>>> and so if two threads do so concurrently, they will interfere with each
>>> other.
>>
>> Signal action (disposition) is process-wide.
>>
>> Signal mask and signal stack are thread-specific.
> 
> Ah, OK.  Thanks for the insight!
> 
>> A signal may be pending for the whole process, or for a specific thread.
>> In the former case, the signal is delivered to one of the threads that
>> are not blocking the signal.
>>
>>> What usually happens is that one thread sets up everything,
>>> while the other is already in the process of reverting its changes: So
>>> the second thread reverts the SIGUSR2 handler to the default, and then
>>> the first thread raises SIGUSR2, thus making qemu exit.
>>
>> I agree. The way SIGUSR2 is blocked (for the thread), made pending (for
>> the thread), and then allowed to be delivered (consequently, to the
>> thread), looks OK. But by the time it is delivered, the action has been
>> changed.
>>
>>>
>>> (Could be worse though.  Both threads could set up the sigaltstack, then
>>> both raise SIGUSR2, and then we get one coroutine_trampoline()
>>> invocation in each thread, but both would use the same stack.  But I
>>> don’t think I’ve ever seen that happen, presumably because the race time
>>> window is much shorter.)
>>
>> No, the "alternate stack for signal handlers" that sigaltstack()
>> configures is thread-specific. (I mean one could theoretically mess it
>> up if the stack were located in the same place between different
>> threads, but we call qemu_alloc_stack(), so that doesn't happen.)
>>
>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaltstack.html
>>
> 
> Explains why I haven’t seen it. :)
> 
>>> Now, this all seems obvious to me, but I’m wondering...  If
>>> coroutine-sigaltstack really couldn’t create coroutines concurrently,
>>> why wouldn’t we have noticed before?  I mean, this new backup case is
>>> kind of a stress test, yes, but surely we would have seen the problem
>>> already, right?  That’s why I’m not sure whether my analysis is correct.
>>>
>>> Anyway, I’ve attached a patch that wraps the whole SIGUSR2 handling
>>> section in a mutex, and that makes 256 pass reliably with Vladimir’s
>>> async backup series.  Besides being unsure whether the problem is really
>>> in coroutine-sigaltstack, I also don’t know whether getting out the big
>>> guns and wrapping everything in the mutex is the best solution.  So,
>>> it’s an RFC, I guess.
>>
>> A simple grep for SIGUSR2 seems to indicate that SIGUSR2 is not used by
>> system emulation for anything else, in practice. Is it possible to
>> dedicate SIGUSR2 explicitly to coroutine-sigaltstack, and set up the
>> action beforehand, from some init function that executes on a "central"
>> thread, before qemu_coroutine_new() is ever called?
> 
> Doesn’t sound unreasonable, but wouldn’t the signal handler then have to
> check whether the SIGUSR2 comes from coroutine-sigaltstack or from the
> outside?  Or should we then keep SIGUSR2 blocked all the time?

Blocking SIGUSR2 in all threads at all times, except when temporarily
unblocking it with sigsuspend(), is one approach, but I don't think it
would necessarily be 100% safe against other processes sending SIGUSR2
asynchronously. And IMO that's not even a goal -- sending a signal
requires permission:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/kill.html

    For a process to have permission to send a signal to a process
    designated by pid, unless the sending process has appropriate
    privileges, the real or effective user ID of the sending process
    shall match the real or saved set-user-ID of the receiving process.

(I assume (hope) that SELinux / sVirt further restricts this, so one
qemu process couldn't simply signal another, due to their different labels.)

Thus, when the host kernel permits a different process to generate
SIGUSR2 for QEMU, it's OK to let things just crash and burn. Every other
process with such a permission should *know better* than to send an
unsolicited SIGUSR2 to QEMU.

I mean, what happens if you send an external SIGUSR2 to QEMU right now?
The default action for SIGUSR2 is to terminate the process:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html

(It's possible that I missed some already existent treatment of SIGUSR2
in QEMU that's different from coroutine-sigaltstack, for example via
sigfillset(). But that seems unlikely, due to the comment on
qemu_init_main_loop() in "include/qemu/main-loop.h" which indicates
SIGUSR2 is currently available to threads for custom use.)


>> ... I've tried to see if POSIX says anything on signals being delivered
>> with mutexen held. I can't find anything specific (the spec seems to
>> talk about delivery of a signal while the thread waits in
>> pthread_mutex_lock(), but that's not what we care about, here). I'm just
>> somewhat uncomfortable with bracketing this whole hackery into a mutex
>> even... Keeping sigaction() out of the picture could be a small
>> performance benefit, too.
> 
> Speaking of signal being delivered in the mutexed section...  What would
> happen if we get an external signal after SIGUSR2 was delivered and
> coroutine_trampoline() set up the sigsetjmp(), but before the stack is
> switched back?  Wouldn’t this new signal then trash the stack?  Should
> we block all signals while using the alternate stack?
> 
> (Looking at the x64 objdump, the stack actually seems to be used to
> store @self across sigsetjmp().)

I wouldn't worry about it. Signals are a crude interface for programs.
If a program documents that a particular signal can be sent to it for a
particular purpose (which implies the asynchronous generation of that
signal of course), then processes that have proper permission to send
that signal are *welcome* to send that signal at *any* time. If the
program mishandles the signal, that's a bug in the signalee.

Conversely, if a signal is not documented like that by the program, but
another process (having the needed permission) still sends the signal,
breakage is expected, and the signaler process is at fault. In my book,
it's no different from sending a signal that is simply neither caught
nor ignored nor blocked by the signalee process, and whose default
disposition is to terminate the process (marked "T" or "A" in the table
linked above). E.g., if you send a SIGILL to a process out of the blue,
the process is totally expected to blow up, or at least to misbehave.


>> The logic in the patch doesn't look broken, but the comments should be
>> updated minimally -- the signal stack is thread-specific (similarly to
>> how a thread has its own stack anyway, regardless of signals).
> 
> Sure, I can do that.
> 
> I agree that there probably are better solutions than to wrap everything
> in a lock.  OTOH, it looks to me like this lock is the most simple
> solution.  If Daniel is right[1] and we should drop
> coroutine-sigaltstack altogether (at some point...), perhaps it is best
> to go for the most simple solution now.
> 
> [1]
> https://lists.nongnu.org/archive/html/qemu-block/2021-01/msg00808.html

SUSv3 marked ucontext functions obsolescent:

https://pubs.opengroup.org/onlinepubs/000095399/functions/makecontext.html#tag_03_356_08

and they are entirely missing from SUSv4 (aka the latest POSIX):

https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap03.html#tag_22_03_01_07

So you can use ucontext if you #define _XOPEN_SOURCE as 500 or 600, but
(in theory) not if you #define it as 700. How this works out in practice
on OSX -- i.e. how long they intend to support _XOPEN_SOURCE 600 --, I
can't tell.

I don't disagree with Daniel though; you can always bring back
coroutine-sigaltstack from the git history, if Apple decides to drop
ucontext.

If you went for the mutex for the time being, I wouldn't try to nack it. :)

Thanks
Laszlo

Paolo Bonzini Jan. 21, 2021, 3:14 p.m. UTC | #6

On 21/01/21 10:27, Max Reitz wrote:
> 
> Sure, I can do that.
> 
> I agree that there probably are better solutions than to wrap everything 
> in a lock.  OTOH, it looks to me like this lock is the most simple 
> solution.  If Daniel is right[1] and we should drop 
> coroutine-sigaltstack altogether (at some point...), perhaps it is best 
> to go for the most simple solution now.
> 
> [1]
> https://lists.nongnu.org/archive/html/qemu-block/2021-01/msg00808.html

Yes, between coroutine-ucontext and the upcoming coroutine-asm[1] (which 
I have shelved because it was mostly a requirement for x86 CET; but it 
will come back some day), sooner or later there will be no reason to 
keep coroutine-sigaltstack.  Porting coroutine-asm to a new architecture 
is easy, I even managed to do it for s390. ;)

Paolo

[1] https://patchew.org/QEMU/20190504120528.6389-1-pbonzini@redhat.com/

Max Reitz Jan. 21, 2021, 3:42 p.m. UTC | #7

On 21.01.21 14:34, Laszlo Ersek wrote:
> On 01/21/21 10:27, Max Reitz wrote:
>> On 20.01.21 18:25, Laszlo Ersek wrote:
>>> On 01/20/21 17:26, Max Reitz wrote:
>>>> Hi,
>>>>
>>>> I’ve run into trouble with Vladimir’s async backup series on MacOS,
>>>> namely that iotest 256 fails with qemu exiting because of a SIGUSR2.
>>>>
>>>> Turns out this is because MacOS (-xcode) uses coroutine-sigaltstack,
>>>> when I use this on Linux, I get the same error.
>>>>
>>>> (You can find the series applied on my block branch e.g. here:
>>>>
>>>> https://github.com/XanClic/qemu.git block
>>>> )
>>>>
>>>> Some debugging later I found that the problem seems to be two threads
>>>> simultaneously creating a coroutine.  It makes sense that this case
>>>> would appear with Vladimir’s series and iotest 256, because 256 runs two
>>>> backup jobs in two different threads in a transaction, i.e. they’re
>>>> launched simultaneously.  The async backup series makes backup use many
>>>> concurrent coroutines and so by default launches 64+x coroutines when
>>>> the backup is started.  Thus, the case of two coroutines created
>>>> concurrently in two threads is very likely to occur.
>>>>
>>>> I think the problem is in coroutine-sigaltstack’s qemu_coroutine_new().
>>>> It sets up a SIGUSR2 handler, then changes the signal handling stack,
>>>> then raises SIGUSR2, then reverts the signal handling stack and the
>>>> SIGUSR2 handler.  As far as I’m aware, setting up signal handlers and
>>>> changing the signal handling stack are both process-global operations,
>>>> and so if two threads do so concurrently, they will interfere with each
>>>> other.
>>>
>>> Signal action (disposition) is process-wide.
>>>
>>> Signal mask and signal stack are thread-specific.
>>
>> Ah, OK.  Thanks for the insight!
>>
>>> A signal may be pending for the whole process, or for a specific thread.
>>> In the former case, the signal is delivered to one of the threads that
>>> are not blocking the signal.
>>>
>>>> What usually happens is that one thread sets up everything,
>>>> while the other is already in the process of reverting its changes: So
>>>> the second thread reverts the SIGUSR2 handler to the default, and then
>>>> the first thread raises SIGUSR2, thus making qemu exit.
>>>
>>> I agree. The way SIGUSR2 is blocked (for the thread), made pending (for
>>> the thread), and then allowed to be delivered (consequently, to the
>>> thread), looks OK. But by the time it is delivered, the action has been
>>> changed.
>>>
>>>>
>>>> (Could be worse though.  Both threads could set up the sigaltstack, then
>>>> both raise SIGUSR2, and then we get one coroutine_trampoline()
>>>> invocation in each thread, but both would use the same stack.  But I
>>>> don’t think I’ve ever seen that happen, presumably because the race time
>>>> window is much shorter.)
>>>
>>> No, the "alternate stack for signal handlers" that sigaltstack()
>>> configures is thread-specific. (I mean one could theoretically mess it
>>> up if the stack were located in the same place between different
>>> threads, but we call qemu_alloc_stack(), so that doesn't happen.)
>>>
>>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaltstack.html
>>>
>>
>> Explains why I haven’t seen it. :)
>>
>>>> Now, this all seems obvious to me, but I’m wondering...  If
>>>> coroutine-sigaltstack really couldn’t create coroutines concurrently,
>>>> why wouldn’t we have noticed before?  I mean, this new backup case is
>>>> kind of a stress test, yes, but surely we would have seen the problem
>>>> already, right?  That’s why I’m not sure whether my analysis is correct.
>>>>
>>>> Anyway, I’ve attached a patch that wraps the whole SIGUSR2 handling
>>>> section in a mutex, and that makes 256 pass reliably with Vladimir’s
>>>> async backup series.  Besides being unsure whether the problem is really
>>>> in coroutine-sigaltstack, I also don’t know whether getting out the big
>>>> guns and wrapping everything in the mutex is the best solution.  So,
>>>> it’s an RFC, I guess.
>>>
>>> A simple grep for SIGUSR2 seems to indicate that SIGUSR2 is not used by
>>> system emulation for anything else, in practice. Is it possible to
>>> dedicate SIGUSR2 explicitly to coroutine-sigaltstack, and set up the
>>> action beforehand, from some init function that executes on a "central"
>>> thread, before qemu_coroutine_new() is ever called?
>>
>> Doesn’t sound unreasonable, but wouldn’t the signal handler then have to
>> check whether the SIGUSR2 comes from coroutine-sigaltstack or from the
>> outside?  Or should we then keep SIGUSR2 blocked all the time?
> 
> Blocking SIGUSR2 in all threads at all times, except when temporarily
> unblocking it with sigsuspend(), is one approach, but I don't think it
> would necessarily be 100% safe against other processes sending SIGUSR2
> asynchronously. And IMO that's not even a goal -- sending a signal
> requires permission:
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/kill.html
> 
>      For a process to have permission to send a signal to a process
>      designated by pid, unless the sending process has appropriate
>      privileges, the real or effective user ID of the sending process
>      shall match the real or saved set-user-ID of the receiving process.
> 
> (I assume (hope) that SELinux / sVirt further restricts this, so one
> qemu process couldn't simply signal another, due to their different labels.)
> 
> Thus, when the host kernel permits a different process to generate
> SIGUSR2 for QEMU, it's OK to let things just crash and burn. Every other
> process with such a permission should *know better* than to send an
> unsolicited SIGUSR2 to QEMU.
> 
> I mean, what happens if you send an external SIGUSR2 to QEMU right now?
> The default action for SIGUSR2 is to terminate the process:
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html

I had the same thought (if you can send SIGUSR2, you can send SIGKILL), 
but terminating the process is one thing; redirecting control flow to a 
signal handler that has not been audited for what happens when it is 
invoked from an actual signal from the outside is another.

>>> ... I've tried to see if POSIX says anything on signals being delivered
>>> with mutexen held. I can't find anything specific (the spec seems to
>>> talk about delivery of a signal while the thread waits in
>>> pthread_mutex_lock(), but that's not what we care about, here). I'm just
>>> somewhat uncomfortable with bracketing this whole hackery into a mutex
>>> even... Keeping sigaction() out of the picture could be a small
>>> performance benefit, too.
>>
>> Speaking of signal being delivered in the mutexed section...  What would
>> happen if we get an external signal after SIGUSR2 was delivered and
>> coroutine_trampoline() set up the sigsetjmp(), but before the stack is
>> switched back?  Wouldn’t this new signal then trash the stack?  Should
>> we block all signals while using the alternate stack?
>>
>> (Looking at the x64 objdump, the stack actually seems to be used to
>> store @self across sigsetjmp().)
> 
> I wouldn't worry about it. Signals are a crude interface for programs.
> If a program documents that a particular signal can be sent to it for a
> particular purpose (which implies the asynchronous generation of that
> signal of course), then processes that have proper permission to send
> that signal are *welcome* to send that signal at *any* time. If the
> program mishandles the signal, that's a bug in the signalee.
> 
> Conversely, if a signal is not documented like that by the program, but
> another process (having the needed permission) still sends the signal,
> breakage is expected, and the signaler process is at fault. In my book,
> it's no different from sending a signal that is simply neither caught
> nor ignored nor blocked by the signalee process, and whose default
> disposition is to terminate the process (marked "T" or "A" in the table
> linked above). E.g., if you send a SIGILL to a process out of the blue,
> the process is totally expected to blow up, or at least to misbehave.

I don’t really understand.  If you send any handled signal (like SIGINT) 
to a thread that has the alternate stack set up, the coroutine 
trampoline stack is thrashed (I think), and while I haven’t investigated 
it, I would expect undefined behavior on siglongjmp().  I find that much 
worse than terminating.

Giving a process A the permission to send signals to a process B usually 
does not automatically allow A to induce undefined behavior in B.
(And the breakage you get when someone violates a protocol should never 
be undefined behavior.)

Perhaps we have the policy of “If another process can send signals, then 
we consider it to have full control over qemu, like a debugger.”  Then 
that’s OK.  Otherwise, I don’t find it OK.


In any case, this question of what other signals do while the alternate 
stack is up is a separate problem from the original one, so we can look 
at one after the other.

>>> The logic in the patch doesn't look broken, but the comments should be
>>> updated minimally -- the signal stack is thread-specific (similarly to
>>> how a thread has its own stack anyway, regardless of signals).
>>
>> Sure, I can do that.
>>
>> I agree that there probably are better solutions than to wrap everything
>> in a lock.  OTOH, it looks to me like this lock is the most simple
>> solution.  If Daniel is right[1] and we should drop
>> coroutine-sigaltstack altogether (at some point...), perhaps it is best
>> to go for the most simple solution now.
>>
>> [1]
>> https://lists.nongnu.org/archive/html/qemu-block/2021-01/msg00808.html
> 
> SUSv3 marked ucontext functions obsolescent:
> 
> https://pubs.opengroup.org/onlinepubs/000095399/functions/makecontext.html#tag_03_356_08
> 
> and they are entirely missing from SUSv4 (aka the latest POSIX):
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap03.html#tag_22_03_01_07
> 
> So you can use ucontext if you #define _XOPEN_SOURCE as 500 or 600, but
> (in theory) not if you #define it as 700. How this works out in practice
> on OSX -- i.e. how long they intend to support _XOPEN_SOURCE 600 --, I
> can't tell.

Daniel made it sound like there was a libucontext that might be the way 
to go forward.

> I don't disagree with Daniel though; you can always bring back
> coroutine-sigaltstack from the git history, if Apple decides to drop
> ucontext.

It may be a bit more hassle (the configure option has to be removed, 
then maybe readded), but, well, yes.

> If you went for the mutex for the time being, I wouldn't try to nack it. :)

Hm.  OK.  Doesn’t sound too bad. ;)

Max

Daniel P. Berrangé Jan. 21, 2021, 4:04 p.m. UTC | #8

On Thu, Jan 21, 2021 at 04:42:09PM +0100, Max Reitz wrote:
> On 21.01.21 14:34, Laszlo Ersek wrote:
> > On 01/21/21 10:27, Max Reitz wrote:
> > SUSv3 marked ucontext functions obsolescent:
> > 
> > https://pubs.opengroup.org/onlinepubs/000095399/functions/makecontext.html#tag_03_356_08
> > 
> > and they are entirely missing from SUSv4 (aka the latest POSIX):
> > 
> > https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap03.html#tag_22_03_01_07
> > 
> > So you can use ucontext if you #define _XOPEN_SOURCE as 500 or 600, but
> > (in theory) not if you #define it as 700. How this works out in practice
> > on OSX -- i.e. how long they intend to support _XOPEN_SOURCE 600 --, I
> > can't tell.
> 
> Daniel made it sound like there was a libucontext that might be the way to
> go forward.
> 
> > I don't disagree with Daniel though; you can always bring back
> > coroutine-sigaltstack from the git history, if Apple decides to drop
> > ucontext.
> 
> It may be a bit more hassle (the configure option has to be removed, then
> maybe readded), but, well, yes.

ucontext on macOS may be slightly harder than i imagined. While making
it compile is easy, the qtests then hang so there's something not quite
right on macOS with ucontext and it basically impossible to debug in
Cirrus CI


I still think it is worth exploring, but it will require someone with
direct access to a macOS env to debug this I think.

Regards,
Daniel

Laszlo Ersek Jan. 21, 2021, 4:05 p.m. UTC | #9

On 01/21/21 16:42, Max Reitz wrote:

> Perhaps we have the policy of “If another process can send signals, then
> we consider it to have full control over qemu, like a debugger.”

That's what I had more or less in mind, yes; see e.g.

https://man7.org/linux/man-pages/man2/ptrace.2.html

       EPERM  The specified process cannot be traced.  This could be
              because the tracer has insufficient privileges (the
              required capability is CAP_SYS_PTRACE); unprivileged
              processes cannot trace processes that they cannot send
              signals to or those running set-user-ID/set-group-ID
              programs, for obvious reasons.  Alternatively, the process
              may already be being traced, or (on kernels before 2.6.26)
              be init(1) (PID 1).

Which seems to imply that, if you can send a signal, you can ptrace()
the signalee as well.

(I realize that what I'm saying does not follow from *pure logic*, as
the manual is stating !sendsig -> !trace, hence trace -> sendsig.
Whereas we're discussing the opposite direction: sendsig -> trace.
*But*, IMO, that direction is *suggested* by the manual.)

Anyway, this is kind of moot; I'm OK with the mutex too. :)

Thanks
Laszlo

Daniel P. Berrangé Jan. 21, 2021, 4:07 p.m. UTC | #10

On Thu, Jan 21, 2021 at 04:14:21PM +0100, Paolo Bonzini wrote:
> On 21/01/21 10:27, Max Reitz wrote:
> > 
> > Sure, I can do that.
> > 
> > I agree that there probably are better solutions than to wrap everything
> > in a lock.  OTOH, it looks to me like this lock is the most simple
> > solution.  If Daniel is right[1] and we should drop
> > coroutine-sigaltstack altogether (at some point...), perhaps it is best
> > to go for the most simple solution now.
> > 
> > [1]
> > https://lists.nongnu.org/archive/html/qemu-block/2021-01/msg00808.html
> 
> Yes, between coroutine-ucontext and the upcoming coroutine-asm[1] (which I
> have shelved because it was mostly a requirement for x86 CET; but it will
> come back some day), sooner or later there will be no reason to keep
> coroutine-sigaltstack.  Porting coroutine-asm to a new architecture is easy,
> I even managed to do it for s390. ;)

FWIW The libucontext impl is all ASM based and has coverage for all the
arches we care about:

   https://github.com/kaniini/libucontext

so doesn't seem like there's a need for  coroutine-asm if we can rely
on libucontext for portability where neede.

Regards,
Daniel

Peter Maydell Jan. 21, 2021, 4:44 p.m. UTC | #11

On Thu, 21 Jan 2021 at 16:10, Daniel P. Berrangé <berrange@redhat.com> wrote:
> FWIW The libucontext impl is all ASM based and has coverage for all the
> arches we care about:
>
>    https://github.com/kaniini/libucontext
>
> so doesn't seem like there's a need for  coroutine-asm if we can rely
> on libucontext for portability where neede.

The README for that notes a couple of pretty major omissions:
 * it doesn't handle floating point registers
 * it doesn't do anything about the signal mask
I'm pretty sure that not handling the fp regs could cause breakage
for Arm at least (depending on what the compiler chooses to do
in the functions that work with the ucontext functions). The
signal mask stuff might be OK for us because of the carefully
limited use we make of the ucontext functions, but it would be
a bit of a pain to have to analyse that code for both sets of semantics.

thanks
-- PMM

Paolo Bonzini Jan. 21, 2021, 5:24 p.m. UTC | #12

On 21/01/21 17:44, Peter Maydell wrote:
> On Thu, 21 Jan 2021 at 16:10, Daniel P. Berrangé <berrange@redhat.com> wrote:
>> FWIW The libucontext impl is all ASM based and has coverage for all the
>> arches we care about:
>>
>>     https://github.com/kaniini/libucontext
>>
>> so doesn't seem like there's a need for  coroutine-asm if we can rely
>> on libucontext for portability where neede.
> 
> The README for that notes a couple of pretty major omissions:
>   * it doesn't handle floating point registers
>   * it doesn't do anything about the signal mask
> I'm pretty sure that not handling the fp regs could cause breakage
> for Arm at least (depending on what the compiler chooses to do
> in the functions that work with the ucontext functions). The
> signal mask stuff might be OK for us because of the carefully
> limited use we make of the ucontext functions, but it would be
> a bit of a pain to have to analyse that code for both sets of semantics.

The lack of signal mask handling is an improvement for us. :)  We want 
the signal mask to be per thread, not per coroutine.

Floating point however is an issue if they don't save-restore v8-v15 
(for aarch64, I don't remember what the AAPCS32 says).

Paolo

Markus Armbruster Jan. 22, 2021, 7:55 a.m. UTC | #13

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 21/01/21 10:27, Max Reitz wrote:
>> Sure, I can do that.
>> I agree that there probably are better solutions than to wrap
>> everything in a lock.  OTOH, it looks to me like this lock is the
>> most simple solution.  If Daniel is right[1] and we should drop 
>> coroutine-sigaltstack altogether (at some point...), perhaps it is
>> best to go for the most simple solution now.
>> [1]
>> https://lists.nongnu.org/archive/html/qemu-block/2021-01/msg00808.html
>
> Yes, between coroutine-ucontext and the upcoming coroutine-asm[1]
> (which I have shelved because it was mostly a requirement for x86 CET;
> but it will come back some day), sooner or later there will be no
> reason to keep coroutine-sigaltstack.  Porting coroutine-asm to a new
> architecture is easy, I even managed to do it for s390. ;)

POSIX dropping ucontext without a replacement was shortsighted.  Yes,
ucontext is warty, but that's no excuse for letting your users fend for
themselves.  Standard building blocks for coroutines are clearly needed.
Without them, everybody gets to build from scratch or with ill-suited
blocks like sigaltstack.

> Paolo
>
> [1] https://patchew.org/QEMU/20190504120528.6389-1-pbonzini@redhat.com/

Max Reitz Jan. 22, 2021, 8:48 a.m. UTC | #14

On 20.01.21 18:25, Laszlo Ersek wrote:

[...]

> A simple grep for SIGUSR2 seems to indicate that SIGUSR2 is not used by
> system emulation for anything else, in practice. Is it possible to
> dedicate SIGUSR2 explicitly to coroutine-sigaltstack, and set up the
> action beforehand, from some init function that executes on a "central"
> thread, before qemu_coroutine_new() is ever called?

I wrote a patch to that effect, but just before sending I wondered 
whether SIGUSR2 cannot be registered by the “guest” in user-mode 
emulation, and whether that would then break coroutines from there on.

(I have no experience dealing with user-mode emulation, but it does look 
like the guest can just register handlers for any signal but SIGSEGV and 
SIGBUS.)

Max

Peter Maydell Jan. 22, 2021, 10:14 a.m. UTC | #15

On Fri, 22 Jan 2021 at 08:50, Max Reitz <mreitz@redhat.com> wrote:
>
> On 20.01.21 18:25, Laszlo Ersek wrote:
>
> [...]
>
> > A simple grep for SIGUSR2 seems to indicate that SIGUSR2 is not used by
> > system emulation for anything else, in practice. Is it possible to
> > dedicate SIGUSR2 explicitly to coroutine-sigaltstack, and set up the
> > action beforehand, from some init function that executes on a "central"
> > thread, before qemu_coroutine_new() is ever called?
>
> I wrote a patch to that effect, but just before sending I wondered
> whether SIGUSR2 cannot be registered by the “guest” in user-mode
> emulation, and whether that would then break coroutines from there on.
>
> (I have no experience dealing with user-mode emulation, but it does look
> like the guest can just register handlers for any signal but SIGSEGV and
> SIGBUS.)

Yes, SIGUSR2 is for the guest in user-emulation mode. OTOH do we
even use the coroutine code in user-emulation mode? Looking at
the meson.build files, we only add the coroutine_*.c to util_ss
if 'have_block', and we set have_block = have_system or have_tools.
I think (but have not checked) that that means we will build and
link the object file into the user-mode binaries if you happen
to build them in the same run as system-mode binaries, but won't
build them in if you built the user-mode binaries as a separate
build. Which is odd and probably worth fixing, but does mean we
know that we aren't actually using coroutines in user-mode.
(Also user-mode really means Linux or BSD and I think both of
those have working ucontext.)

thanks
-- PMM

Max Reitz Jan. 22, 2021, 10:16 a.m. UTC | #16

On 22.01.21 11:14, Peter Maydell wrote:
> On Fri, 22 Jan 2021 at 08:50, Max Reitz <mreitz@redhat.com> wrote:
>>
>> On 20.01.21 18:25, Laszlo Ersek wrote:
>>
>> [...]
>>
>>> A simple grep for SIGUSR2 seems to indicate that SIGUSR2 is not used by
>>> system emulation for anything else, in practice. Is it possible to
>>> dedicate SIGUSR2 explicitly to coroutine-sigaltstack, and set up the
>>> action beforehand, from some init function that executes on a "central"
>>> thread, before qemu_coroutine_new() is ever called?
>>
>> I wrote a patch to that effect, but just before sending I wondered
>> whether SIGUSR2 cannot be registered by the “guest” in user-mode
>> emulation, and whether that would then break coroutines from there on.
>>
>> (I have no experience dealing with user-mode emulation, but it does look
>> like the guest can just register handlers for any signal but SIGSEGV and
>> SIGBUS.)
> 
> Yes, SIGUSR2 is for the guest in user-emulation mode. OTOH do we
> even use the coroutine code in user-emulation mode? Looking at
> the meson.build files, we only add the coroutine_*.c to util_ss
> if 'have_block', and we set have_block = have_system or have_tools.
> I think (but have not checked) that that means we will build and
> link the object file into the user-mode binaries if you happen
> to build them in the same run as system-mode binaries, but won't
> build them in if you built the user-mode binaries as a separate
> build. Which is odd and probably worth fixing, but does mean we
> know that we aren't actually using coroutines in user-mode.
> (Also user-mode really means Linux or BSD and I think both of
> those have working ucontext.)

OK, great.  Thanks for looking that up.

Max

Laszlo Ersek Jan. 22, 2021, 12:24 p.m. UTC | #17

On 01/22/21 11:14, Peter Maydell wrote:
> On Fri, 22 Jan 2021 at 08:50, Max Reitz <mreitz@redhat.com> wrote:
>>
>> On 20.01.21 18:25, Laszlo Ersek wrote:
>>
>> [...]
>>
>>> A simple grep for SIGUSR2 seems to indicate that SIGUSR2 is not used
>>> by system emulation for anything else, in practice. Is it possible
>>> to dedicate SIGUSR2 explicitly to coroutine-sigaltstack, and set up
>>> the action beforehand, from some init function that executes on a
>>> "central" thread, before qemu_coroutine_new() is ever called?
>>
>> I wrote a patch to that effect, but just before sending I wondered
>> whether SIGUSR2 cannot be registered by the "guest" in user-mode
>> emulation, and whether that would then break coroutines from there
>> on.
>>
>> (I have no experience dealing with user-mode emulation, but it does
>> look like the guest can just register handlers for any signal but
>> SIGSEGV and SIGBUS.)
>
> Yes, SIGUSR2 is for the guest in user-emulation mode.

Yes, my grep found those occurrences of SIGUSR2 of course.

> OTOH do we even use the coroutine code in user-emulation mode?

I assumed not. I assumed coroutines were only used by the block system,
and user mode emulation is about running userspace Linux programs --
virtual disks are not emulated for those.

> Looking at the meson.build files, we only add the coroutine_*.c to
> util_ss if 'have_block', and we set have_block = have_system or
> have_tools. I think (but have not checked) that that means we will
> build and link the object file into the user-mode binaries if you
> happen to build them in the same run as system-mode binaries, but
> won't build them in if you built the user-mode binaries as a separate
> build.

Huh.

> Which is odd and probably worth fixing, but does mean we
> know that we aren't actually using coroutines in user-mode.

Thanks for checking!
Laszlo

> (Also user-mode really means Linux or BSD and I think both of
> those have working ucontext.)
>
> thanks
> -- PMM
>

Laszlo Ersek Jan. 22, 2021, 8:38 p.m. UTC | #18

On 01/21/21 18:24, Paolo Bonzini wrote:
> On 21/01/21 17:44, Peter Maydell wrote:
>> On Thu, 21 Jan 2021 at 16:10, Daniel P. Berrangé <berrange@redhat.com>
>> wrote:
>>> FWIW The libucontext impl is all ASM based and has coverage for all the
>>> arches we care about:
>>>
>>>     https://github.com/kaniini/libucontext
>>>
>>> so doesn't seem like there's a need for  coroutine-asm if we can rely
>>> on libucontext for portability where neede.
>>
>> The README for that notes a couple of pretty major omissions:
>>   * it doesn't handle floating point registers
>>   * it doesn't do anything about the signal mask
>> I'm pretty sure that not handling the fp regs could cause breakage
>> for Arm at least (depending on what the compiler chooses to do
>> in the functions that work with the ucontext functions). The
>> signal mask stuff might be OK for us because of the carefully
>> limited use we make of the ucontext functions, but it would be
>> a bit of a pain to have to analyse that code for both sets of semantics.
> 
> The lack of signal mask handling is an improvement for us. :)  We want
> the signal mask to be per thread, not per coroutine.

I didn't quite get this when I first read it, but now that I'm digging
through the code, I have a follow-up comment.

According to POSIX, passing savemask=0 to sigsetjmp() may or may not
save the current signal mask, into "env". A nonzero savemask is required
to save the signal mask, but a zero savemask is not forbidden to -- it
is only not required to:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigsetjmp.html#tag_16_554_07

    Note that since this function is defined in terms of setjmp(), if
    savemask is zero, it is unspecified whether the signal mask is
    saved.

And I feel that's a bit of a problem, because when we first exit the
trampoline -- executed as a signal handler -- via sigsetjmp(), *all
signals* are masked, and sigsetjmp might actually stash that mask in
"tr_reenter", because savemask=0 does not suffice for forbidding that.

When we reenter the trampoline via siglongjmp(tr_reenter), and
subsequently call coroutine_bootstrap(), it's possible (per POSIX, see
above) that all signals are masked again. And then that could further be
remembered in "self->env", in coroutine_bootstrap(). Which would be
wrong IMO; co-routines in general should receive synchronous signals if
they mess up somewhere (terminating the process).

IOW, just before the call to coroutine_bootstrap(),
coroutine_trampoline() should explicitly restore the signal mask that
was in effect when qemu_coroutine_new() was entered.

Has this been a problem in practice, or should we ignore it?

IOW, should we assume "savemask=0" for *never* saving the signal mask?

The behavior of "savemask=0" is a platform trait that platforms are not
required to document (the behavior is unspecified, not
implementation-defined), so it really boils down to where this code
actually runs...

NB Linux is more specific:

https://man7.org/linux/man-pages/man3/setjmp.3.html

   sigsetjmp() and siglongjmp()
       sigsetjmp() and siglongjmp() also perform nonlocal gotos, but
       provide predictable handling of the process signal mask.

       If, and only if, the savesigs argument provided to sigsetjmp() is
       nonzero, the process's current signal mask is saved in env and
       will be restored if a siglongjmp() is later performed with this
       env.

Cue "and only if".

Thanks
Laszlo

> 
> Floating point however is an issue if they don't save-restore v8-v15
> (for aarch64, I don't remember what the AAPCS32 says).
> 
> Paolo
> 
>

Laszlo Ersek Jan. 22, 2021, 9:34 p.m. UTC | #19

On 01/22/21 21:38, Laszlo Ersek wrote:
> On 01/21/21 18:24, Paolo Bonzini wrote:
>> On 21/01/21 17:44, Peter Maydell wrote:
>>> On Thu, 21 Jan 2021 at 16:10, Daniel P. Berrangé <berrange@redhat.com>
>>> wrote:
>>>> FWIW The libucontext impl is all ASM based and has coverage for all the
>>>> arches we care about:
>>>>
>>>>     https://github.com/kaniini/libucontext
>>>>
>>>> so doesn't seem like there's a need for  coroutine-asm if we can rely
>>>> on libucontext for portability where neede.
>>>
>>> The README for that notes a couple of pretty major omissions:
>>>   * it doesn't handle floating point registers
>>>   * it doesn't do anything about the signal mask
>>> I'm pretty sure that not handling the fp regs could cause breakage
>>> for Arm at least (depending on what the compiler chooses to do
>>> in the functions that work with the ucontext functions). The
>>> signal mask stuff might be OK for us because of the carefully
>>> limited use we make of the ucontext functions, but it would be
>>> a bit of a pain to have to analyse that code for both sets of semantics.
>>
>> The lack of signal mask handling is an improvement for us. :)  We want
>> the signal mask to be per thread, not per coroutine.
> 
> I didn't quite get this when I first read it, but now that I'm digging
> through the code, I have a follow-up comment.
> 
> According to POSIX, passing savemask=0 to sigsetjmp() may or may not
> save the current signal mask, into "env". A nonzero savemask is required
> to save the signal mask, but a zero savemask is not forbidden to -- it
> is only not required to:
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigsetjmp.html#tag_16_554_07
> 
>     Note that since this function is defined in terms of setjmp(), if
>     savemask is zero, it is unspecified whether the signal mask is
>     saved.
> 
> And I feel that's a bit of a problem, because when we first exit the
> trampoline -- executed as a signal handler -- via sigsetjmp(), *all
> signals* are masked, and sigsetjmp might actually stash that mask in
> "tr_reenter", because savemask=0 does not suffice for forbidding that.
> 
> When we reenter the trampoline via siglongjmp(tr_reenter), and
> subsequently call coroutine_bootstrap(), it's possible (per POSIX, see
> above) that all signals are masked again. And then that could further be
> remembered in "self->env", in coroutine_bootstrap(). Which would be
> wrong IMO; co-routines in general should receive synchronous signals if
> they mess up somewhere (terminating the process).
> 
> IOW, just before the call to coroutine_bootstrap(),
> coroutine_trampoline() should explicitly restore the signal mask that
> was in effect when qemu_coroutine_new() was entered.
> 
> Has this been a problem in practice, or should we ignore it?
> 
> IOW, should we assume "savemask=0" for *never* saving the signal mask?
> 
> The behavior of "savemask=0" is a platform trait that platforms are not
> required to document (the behavior is unspecified, not
> implementation-defined), so it really boils down to where this code
> actually runs...
> 
> NB Linux is more specific:
> 
> https://man7.org/linux/man-pages/man3/setjmp.3.html
> 
>    sigsetjmp() and siglongjmp()
>        sigsetjmp() and siglongjmp() also perform nonlocal gotos, but
>        provide predictable handling of the process signal mask.
> 
>        If, and only if, the savesigs argument provided to sigsetjmp() is
>        nonzero, the process's current signal mask is saved in env and
>        will be restored if a siglongjmp() is later performed with this
>        env.
> 
> Cue "and only if".

... I notice commit 6ab7e5465a4d ("Replace all setjmp()/longjmp() with
sigsetjmp()/siglongjmp()", 2013-02-23) chose the Linux definition, not
the POSIX one.

Thanks
Laszlo

Laszlo Ersek Jan. 22, 2021, 9:41 p.m. UTC | #20

On 01/22/21 22:34, Laszlo Ersek wrote:
> On 01/22/21 21:38, Laszlo Ersek wrote:

>> The behavior of "savemask=0" is a platform trait that platforms are not
>> required to document (the behavior is unspecified, not
>> implementation-defined), so it really boils down to where this code
>> actually runs...
>>
>> NB Linux is more specific:
>>
>> https://man7.org/linux/man-pages/man3/setjmp.3.html
>>
>>    sigsetjmp() and siglongjmp()
>>        sigsetjmp() and siglongjmp() also perform nonlocal gotos, but
>>        provide predictable handling of the process signal mask.
>>
>>        If, and only if, the savesigs argument provided to sigsetjmp() is
>>        nonzero, the process's current signal mask is saved in env and
>>        will be restored if a siglongjmp() is later performed with this
>>        env.
>>
>> Cue "and only if".
> 
> ... I notice commit 6ab7e5465a4d ("Replace all setjmp()/longjmp() with
> sigsetjmp()/siglongjmp()", 2013-02-23) chose the Linux definition, not
> the POSIX one.

My bad: the commit message is correct. While the effect of savemask=0 is
indeed unspecified for sigsetjmp(), it is completely defined for
siglongjmp().

https://pubs.opengroup.org/onlinepubs/9699919799/functions/siglongjmp.html

Commit 6ab7e5465a4d even carries my R-b :/

Sorry about the noise,
Laszlo

Laszlo Ersek Jan. 23, 2021, 12:06 a.m. UTC | #21

On 01/22/21 11:14, Peter Maydell wrote:
> On Fri, 22 Jan 2021 at 08:50, Max Reitz <mreitz@redhat.com> wrote:
>>
>> On 20.01.21 18:25, Laszlo Ersek wrote:
>>
>> [...]
>>
>>> A simple grep for SIGUSR2 seems to indicate that SIGUSR2 is not used by
>>> system emulation for anything else, in practice. Is it possible to
>>> dedicate SIGUSR2 explicitly to coroutine-sigaltstack, and set up the
>>> action beforehand, from some init function that executes on a "central"
>>> thread, before qemu_coroutine_new() is ever called?
>>
>> I wrote a patch to that effect, but just before sending I wondered
>> whether SIGUSR2 cannot be registered by the “guest” in user-mode
>> emulation, and whether that would then break coroutines from there on.
>>
>> (I have no experience dealing with user-mode emulation, but it does look
>> like the guest can just register handlers for any signal but SIGSEGV and
>> SIGBUS.)
> 
> Yes, SIGUSR2 is for the guest in user-emulation mode. OTOH do we
> even use the coroutine code in user-emulation mode? Looking at
> the meson.build files, we only add the coroutine_*.c to util_ss
> if 'have_block', and we set have_block = have_system or have_tools.
> I think (but have not checked) that that means we will build and
> link the object file into the user-mode binaries if you happen
> to build them in the same run as system-mode binaries,

I did that, first running

 ./configure \
    --enable-debug \
    --target-list==x86_64-softmmu,x86_64-linux-user \
    --with-coroutine=sigaltstack

Then I checked the "qemu-system-x86_64" and "qemu-x86_64" binaries with
"nm". Only the former contains "coroutine_init":

00000000009725e4 t coroutine_init

So I believe the coroutine object file(s) are not even linked into the
user-mode emulators. (coroutine_init() is a constructor function, so I
think it would be preserved otherwise, even if it had no explicit caller.)

I tried a different approach too: an #error in
"coroutine-sigaltstack.c", if CONFIG_LINUX_USER were #defined. But that
aborted the build, due to CONFIG_LINUX_USER being poisoned in the first
place. Maybe that result was already enough to answer the question, but
I wasn't sure, hence the check with "nm".

Thanks,
Laszlo

> but won't
> build them in if you built the user-mode binaries as a separate
> build. Which is odd and probably worth fixing, but does mean we
> know that we aren't actually using coroutines in user-mode.
> (Also user-mode really means Linux or BSD and I think both of
> those have working ucontext.)
> 
> thanks
> -- PMM
>

Peter Maydell Jan. 23, 2021, 1:35 p.m. UTC | #22

On Sat, 23 Jan 2021 at 00:06, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/22/21 11:14, Peter Maydell wrote:
> > I think (but have not checked) that that means we will build and
> > link the object file into the user-mode binaries if you happen
> > to build them in the same run as system-mode binaries,
>
> I did that, first running
>
>  ./configure \
>     --enable-debug \
>     --target-list==x86_64-softmmu,x86_64-linux-user \
>     --with-coroutine=sigaltstack
>
> Then I checked the "qemu-system-x86_64" and "qemu-x86_64" binaries with
> "nm". Only the former contains "coroutine_init":
>
> 00000000009725e4 t coroutine_init
>
> So I believe the coroutine object file(s) are not even linked into the
> user-mode emulators. (coroutine_init() is a constructor function, so I
> think it would be preserved otherwise, even if it had no explicit caller.)

I think the linker will only pull in a .o file from a static
library (for us, libqemuutil.a) if there's an explicit reference
to a real function that it needs; it won't pull it in merely because
it has a constructor function in it. I can't offhand find an official
docs reference for this, but here's a stack overflow question:
https://stackoverflow.com/questions/6589772/gcc-functions-with-constructor-attribute-are-not-being-linked

You can also touch the coroutine source file and watch the
build system rebuild the usermode binary. If you do a complete build,
then touch util/coroutine-ucontext.c, then do 'make -C builddir qemu-arm'
(or whatever usermode emulator you've configured) then:
 * we build coroutine-ucontext.c into
libqemuutil.a.p/util_coroutine-ucontext.c.o
 * we blow away libqemuutil.a and then run 'ar' to put a lot of .o
   files into it, including libqemuutil.a.p/util_coroutine-ucontext.c.o
 * we do some things with scripts/undefsym.py that aren't important here
 * we re-link qemu-arm with all its directly used objects and also with
   libqemuutil.a

So right now we aren't actually using the coroutine functions, but
if you did put in a call to one in code used by the usermode
emulator then that code would only fail to compile if you
were building the usermode emulators without either the tools
or the system emulators.

thanks
-- PMM

Laszlo Ersek Jan. 25, 2021, 10:15 p.m. UTC | #23

On 01/23/21 14:35, Peter Maydell wrote:
> On Sat, 23 Jan 2021 at 00:06, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 01/22/21 11:14, Peter Maydell wrote:
>>> I think (but have not checked) that that means we will build and
>>> link the object file into the user-mode binaries if you happen
>>> to build them in the same run as system-mode binaries,
>>
>> I did that, first running
>>
>>  ./configure \
>>     --enable-debug \
>>     --target-list==x86_64-softmmu,x86_64-linux-user \
>>     --with-coroutine=sigaltstack
>>
>> Then I checked the "qemu-system-x86_64" and "qemu-x86_64" binaries with
>> "nm". Only the former contains "coroutine_init":
>>
>> 00000000009725e4 t coroutine_init
>>
>> So I believe the coroutine object file(s) are not even linked into the
>> user-mode emulators. (coroutine_init() is a constructor function, so I
>> think it would be preserved otherwise, even if it had no explicit caller.)
> 
> I think the linker will only pull in a .o file from a static
> library (for us, libqemuutil.a) if there's an explicit reference
> to a real function that it needs; it won't pull it in merely because
> it has a constructor function in it. I can't offhand find an official
> docs reference for this, but here's a stack overflow question:
> https://stackoverflow.com/questions/6589772/gcc-functions-with-constructor-attribute-are-not-being-linked
> 
> You can also touch the coroutine source file and watch the
> build system rebuild the usermode binary. If you do a complete build,
> then touch util/coroutine-ucontext.c, then do 'make -C builddir qemu-arm'
> (or whatever usermode emulator you've configured) then:
>  * we build coroutine-ucontext.c into
> libqemuutil.a.p/util_coroutine-ucontext.c.o
>  * we blow away libqemuutil.a and then run 'ar' to put a lot of .o
>    files into it, including libqemuutil.a.p/util_coroutine-ucontext.c.o
>  * we do some things with scripts/undefsym.py that aren't important here
>  * we re-link qemu-arm with all its directly used objects and also with
>    libqemuutil.a
> 
> So right now we aren't actually using the coroutine functions, but
> if you did put in a call to one in code used by the usermode
> emulator then that code would only fail to compile if you
> were building the usermode emulators without either the tools
> or the system emulators.

Two thoughts (apart from the original topic):

- Should we file an LP ticket about this, perhaps? It looks moderately
risky.

- How does this case differ from the registration of types? type_init()
is a constructor function, and it generally ends up adding some
structures with function pointers (I reckon) via type_table_add(). The
main executable doesn't call into the device code directly -- only the
constructor function calls another function from the same module. The
main executable only goes through the registered type interfaces. How
does the linker know in that case to preserve the whole object?

Thanks!
Laszlo

Paolo Bonzini Jan. 25, 2021, 10:45 p.m. UTC | #24

On 25/01/21 23:15, Laszlo Ersek wrote:
> 
> - How does this case differ from the registration of types? type_init()
> is a constructor function, and it generally ends up adding some
> structures with function pointers (I reckon) via type_table_add(). The
> main executable doesn't call into the device code directly -- only the
> constructor function calls another function from the same module. The
> main executable only goes through the registered type interfaces. How
> does the linker know in that case to preserve the whole object?

We use either link_whole or extract_objects() for libraries that have 
type-registering constructors.

Paolo

Laszlo Ersek Jan. 26, 2021, 8:57 a.m. UTC | #25

On 01/25/21 23:45, Paolo Bonzini wrote:
> On 25/01/21 23:15, Laszlo Ersek wrote:
>>
>> - How does this case differ from the registration of types? type_init()
>> is a constructor function, and it generally ends up adding some
>> structures with function pointers (I reckon) via type_table_add(). The
>> main executable doesn't call into the device code directly -- only the
>> constructor function calls another function from the same module. The
>> main executable only goes through the registered type interfaces. How
>> does the linker know in that case to preserve the whole object?
> 
> We use either link_whole or extract_objects() for libraries that have
> type-registering constructors.

Thanks! The meson reference manual explains these.

Laszlo

Thread safety of coroutine-sigaltstack

Commit Message

Comments

Patch