diff mbox

[RESEND,1/1] qmp.c: system_wakeup: adding RUN_STATE_SUSPENDED check before proceeding

Message ID 20180102170645.5877-1-danielhb@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Daniel Henrique Barboza Jan. 2, 2018, 5:06 p.m. UTC
The qmp/hmp command 'system_wakeup' is simply a direct call to
'qemu_system_wakeup_request' from vl.c. This function verifies if
runstate is SUSPENDED and if the wake up reason is valid before
proceeding. However, no error or warning is thrown if any of those
pre-requirements isn't met.

This leads to situations such as the one described in
https://github.com/open-power-host-os/qemu/issues/31, where one
can induce the OS to be suspended by using pm-suspend (via
dompmsuspend, for example) but for some reason the machine failed to
go to the SUSPENDED runstate, staying at runstate RUNNING. The user
then tries to wake up the guest using system_wakeup (or dompmwakeup),
no error is thrown but nothing happened either because the wake up
wasn't fired at all. In the end, the user is left with a guest that
is dormant and believing that system_wakeup isn't working.

This patch changes qmp_system_wakeup to make the runstate verification
before proceeding to call qemu_system_wakeup_request, firing up
an error message if the user tries to wake up a machine that
isn't in SUSPENDED state. The change isn't made inside
qemu_system_wakeup_request because it is used in migration,
ACPI and others where this usage might be valid. This patch
by no means fixes the situation described above, but it can direct
the user/management closer to the real problem.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hmp.c | 4 +++-
 qmp.c | 5 +++++
 2 files changed, 8 insertions(+), 1 deletion(-)

Comments

Eric Blake Jan. 2, 2018, 8:03 p.m. UTC | #1
On 01/02/2018 11:06 AM, Daniel Henrique Barboza wrote:
> The qmp/hmp command 'system_wakeup' is simply a direct call to
> 'qemu_system_wakeup_request' from vl.c. This function verifies if
> runstate is SUSPENDED and if the wake up reason is valid before
> proceeding. However, no error or warning is thrown if any of those
> pre-requirements isn't met.
> 
> This leads to situations such as the one described in
> https://github.com/open-power-host-os/qemu/issues/31, where one
> can induce the OS to be suspended by using pm-suspend (via
> dompmsuspend, for example) but for some reason the machine failed to
> go to the SUSPENDED runstate, staying at runstate RUNNING. The user
> then tries to wake up the guest using system_wakeup (or dompmwakeup),
> no error is thrown but nothing happened either because the wake up
> wasn't fired at all. In the end, the user is left with a guest that
> is dormant and believing that system_wakeup isn't working.
> 
> This patch changes qmp_system_wakeup to make the runstate verification
> before proceeding to call qemu_system_wakeup_request, firing up
> an error message if the user tries to wake up a machine that
> isn't in SUSPENDED state. The change isn't made inside
> qemu_system_wakeup_request because it is used in migration,
> ACPI and others where this usage might be valid. This patch
> by no means fixes the situation described above, but it can direct
> the user/management closer to the real problem.

Converting something that silently did nothing successfully into now
returning an error is somewhat backwards-incompatible.  Is this
something that needs to go through a deprecation cycle, to give people a
chance to fix QMP scripts that were relying on the previous no-op
behavior to now work with the new semantics?  Or should we add an
optional bool parameter to the command, where omitting the parameter
gives the old semantics, but new enough clients can pass the bool to
choose which behavior (no-op or error) they prefer?

> +++ b/qmp.c
> @@ -201,6 +201,11 @@ void qmp_cont(Error **errp)
>  
>  void qmp_system_wakeup(Error **errp)
>  {
> +    if (!runstate_check(RUN_STATE_SUSPENDED)) {
> +        error_setg(errp,
> +                   "Unable to wake up: guest is not in suspended state");
> +        return;
> +    }
>      qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>  }
>  
>
Daniel Henrique Barboza Jan. 2, 2018, 8:39 p.m. UTC | #2
On 01/02/2018 06:03 PM, Eric Blake wrote:
> On 01/02/2018 11:06 AM, Daniel Henrique Barboza wrote:
>> The qmp/hmp command 'system_wakeup' is simply a direct call to
>> 'qemu_system_wakeup_request' from vl.c. This function verifies if
>> runstate is SUSPENDED and if the wake up reason is valid before
>> proceeding. However, no error or warning is thrown if any of those
>> pre-requirements isn't met.
>>
>> This leads to situations such as the one described in
>> https://github.com/open-power-host-os/qemu/issues/31, where one
>> can induce the OS to be suspended by using pm-suspend (via
>> dompmsuspend, for example) but for some reason the machine failed to
>> go to the SUSPENDED runstate, staying at runstate RUNNING. The user
>> then tries to wake up the guest using system_wakeup (or dompmwakeup),
>> no error is thrown but nothing happened either because the wake up
>> wasn't fired at all. In the end, the user is left with a guest that
>> is dormant and believing that system_wakeup isn't working.
>>
>> This patch changes qmp_system_wakeup to make the runstate verification
>> before proceeding to call qemu_system_wakeup_request, firing up
>> an error message if the user tries to wake up a machine that
>> isn't in SUSPENDED state. The change isn't made inside
>> qemu_system_wakeup_request because it is used in migration,
>> ACPI and others where this usage might be valid. This patch
>> by no means fixes the situation described above, but it can direct
>> the user/management closer to the real problem.
> Converting something that silently did nothing successfully into now
> returning an error is somewhat backwards-incompatible.  Is this
> something that needs to go through a deprecation cycle, to give people a
> chance to fix QMP scripts that were relying on the previous no-op
> behavior to now work with the new semantics?  Or should we add an
> optional bool parameter to the command, where omitting the parameter
> gives the old semantics, but new enough clients can pass the bool to
> choose which behavior (no-op or error) they prefer?

Good point. Back when I first coded this patch I made sure that no QEMU test
were broken, but I didn't consider that existing user scripts that uses
system_wakeup might now yield a different result.

The extra optional parameter seems a good approach to take here IMO. The
current system_wakeup implementation works fine if the user is careful
enough to not call it if the guest isn't on SUSPENDED state, so I am unsure
if deprecating this behavior in favor of this new one is a bit too extreme.


Thanks,

Daniel

>
>> +++ b/qmp.c
>> @@ -201,6 +201,11 @@ void qmp_cont(Error **errp)
>>   
>>   void qmp_system_wakeup(Error **errp)
>>   {
>> +    if (!runstate_check(RUN_STATE_SUSPENDED)) {
>> +        error_setg(errp,
>> +                   "Unable to wake up: guest is not in suspended state");
>> +        return;
>> +    }
>>       qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>>   }
>>   
>>
Markus Armbruster Jan. 3, 2018, 1:41 p.m. UTC | #3
Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> writes:

> On 01/02/2018 06:03 PM, Eric Blake wrote:
>> On 01/02/2018 11:06 AM, Daniel Henrique Barboza wrote:
>>> The qmp/hmp command 'system_wakeup' is simply a direct call to
>>> 'qemu_system_wakeup_request' from vl.c. This function verifies if
>>> runstate is SUSPENDED and if the wake up reason is valid before
>>> proceeding. However, no error or warning is thrown if any of those
>>> pre-requirements isn't met.
>>>
>>> This leads to situations such as the one described in
>>> https://github.com/open-power-host-os/qemu/issues/31, where one
>>> can induce the OS to be suspended by using pm-suspend (via
>>> dompmsuspend, for example) but for some reason the machine failed to
>>> go to the SUSPENDED runstate, staying at runstate RUNNING. The user
>>> then tries to wake up the guest using system_wakeup (or dompmwakeup),
>>> no error is thrown but nothing happened either because the wake up
>>> wasn't fired at all. In the end, the user is left with a guest that
>>> is dormant and believing that system_wakeup isn't working.
>>>
>>> This patch changes qmp_system_wakeup to make the runstate verification
>>> before proceeding to call qemu_system_wakeup_request, firing up
>>> an error message if the user tries to wake up a machine that
>>> isn't in SUSPENDED state. The change isn't made inside
>>> qemu_system_wakeup_request because it is used in migration,
>>> ACPI and others where this usage might be valid. This patch
>>> by no means fixes the situation described above, but it can direct
>>> the user/management closer to the real problem.
>> Converting something that silently did nothing successfully into now
>> returning an error is somewhat backwards-incompatible.  Is this
>> something that needs to go through a deprecation cycle, to give people a
>> chance to fix QMP scripts that were relying on the previous no-op
>> behavior to now work with the new semantics?

I doubt it.

We're fixing a silent failure to be "loud".  How exactly could that
upset applications?

Applications that don't bother to check the result remain exactly as
broken as before.

Applications that do check become aware of the fact that the command did
not work as advertized.  I guess this risks breaking them differently.

qmp-spec.txt section 5. gives us license to add errors: "Clients must
not assume any particular [...]  errors generated by a command, that is,
new errors can be added to any existing command in newer versions of the
Server.

>> behavior to now work with the new semantics?  Or should we add an
>> optional bool parameter to the command, where omitting the parameter
>> gives the old semantics, but new enough clients can pass the bool to
>> choose which behavior (no-op or error) they prefer?
>
> Good point. Back when I first coded this patch I made sure that no QEMU test
> were broken, but I didn't consider that existing user scripts that uses
> system_wakeup might now yield a different result.
>
> The extra optional parameter seems a good approach to take here IMO. The
> current system_wakeup implementation works fine if the user is careful
> enough to not call it if the guest isn't on SUSPENDED state, so I am unsure
> if deprecating this behavior in favor of this new one is a bit too extreme.

I'd say we call the silent failure to wake up the guest a bug, and your
patch a bug fix.

I figure we could find precedence for turning broken successful behavior
into a clean error if we looked for it.
Daniel Henrique Barboza Jan. 3, 2018, 3:52 p.m. UTC | #4
On 01/03/2018 11:41 AM, Markus Armbruster wrote:
> Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> writes:
>
>> On 01/02/2018 06:03 PM, Eric Blake wrote:
>>> On 01/02/2018 11:06 AM, Daniel Henrique Barboza wrote:
>>>> The qmp/hmp command 'system_wakeup' is simply a direct call to
>>>> 'qemu_system_wakeup_request' from vl.c. This function verifies if
>>>> runstate is SUSPENDED and if the wake up reason is valid before
>>>> proceeding. However, no error or warning is thrown if any of those
>>>> pre-requirements isn't met.
>>>>
>>>> This leads to situations such as the one described in
>>>> https://github.com/open-power-host-os/qemu/issues/31, where one
>>>> can induce the OS to be suspended by using pm-suspend (via
>>>> dompmsuspend, for example) but for some reason the machine failed to
>>>> go to the SUSPENDED runstate, staying at runstate RUNNING. The user
>>>> then tries to wake up the guest using system_wakeup (or dompmwakeup),
>>>> no error is thrown but nothing happened either because the wake up
>>>> wasn't fired at all. In the end, the user is left with a guest that
>>>> is dormant and believing that system_wakeup isn't working.
>>>>
>>>> This patch changes qmp_system_wakeup to make the runstate verification
>>>> before proceeding to call qemu_system_wakeup_request, firing up
>>>> an error message if the user tries to wake up a machine that
>>>> isn't in SUSPENDED state. The change isn't made inside
>>>> qemu_system_wakeup_request because it is used in migration,
>>>> ACPI and others where this usage might be valid. This patch
>>>> by no means fixes the situation described above, but it can direct
>>>> the user/management closer to the real problem.
>>> Converting something that silently did nothing successfully into now
>>> returning an error is somewhat backwards-incompatible.  Is this
>>> something that needs to go through a deprecation cycle, to give people a
>>> chance to fix QMP scripts that were relying on the previous no-op
>>> behavior to now work with the new semantics?
> I doubt it.
>
> We're fixing a silent failure to be "loud".  How exactly could that
> upset applications?
>
> Applications that don't bother to check the result remain exactly as
> broken as before.

That's true.

>
> Applications that do check become aware of the fact that the command did
> not work as advertized.  I guess this risks breaking them differently.
>
> qmp-spec.txt section 5. gives us license to add errors: "Clients must
> not assume any particular [...]  errors generated by a command, that is,
> new errors can be added to any existing command in newer versions of the
> Server.
>
>>> behavior to now work with the new semantics?  Or should we add an
>>> optional bool parameter to the command, where omitting the parameter
>>> gives the old semantics, but new enough clients can pass the bool to
>>> choose which behavior (no-op or error) they prefer?
>> Good point. Back when I first coded this patch I made sure that no QEMU test
>> were broken, but I didn't consider that existing user scripts that uses
>> system_wakeup might now yield a different result.
>>
>> The extra optional parameter seems a good approach to take here IMO. The
>> current system_wakeup implementation works fine if the user is careful
>> enough to not call it if the guest isn't on SUSPENDED state, so I am unsure
>> if deprecating this behavior in favor of this new one is a bit too extreme.
> I'd say we call the silent failure to wake up the guest a bug, and your
> patch a bug fix.

I am fine with this approach, specially after your argument about an
application that doesn't check the result of system_wakeup is already
broken. This patch would simply make the error explicit.


Daniel

>
> I figure we could find precedence for turning broken successful behavior
> into a clean error if we looked for it.
>
Daniel Henrique Barboza Jan. 22, 2018, 9:29 a.m. UTC | #5
Ping

On 01/02/2018 03:06 PM, Daniel Henrique Barboza wrote:
> The qmp/hmp command 'system_wakeup' is simply a direct call to
> 'qemu_system_wakeup_request' from vl.c. This function verifies if
> runstate is SUSPENDED and if the wake up reason is valid before
> proceeding. However, no error or warning is thrown if any of those
> pre-requirements isn't met.
>
> This leads to situations such as the one described in
> https://github.com/open-power-host-os/qemu/issues/31, where one
> can induce the OS to be suspended by using pm-suspend (via
> dompmsuspend, for example) but for some reason the machine failed to
> go to the SUSPENDED runstate, staying at runstate RUNNING. The user
> then tries to wake up the guest using system_wakeup (or dompmwakeup),
> no error is thrown but nothing happened either because the wake up
> wasn't fired at all. In the end, the user is left with a guest that
> is dormant and believing that system_wakeup isn't working.
>
> This patch changes qmp_system_wakeup to make the runstate verification
> before proceeding to call qemu_system_wakeup_request, firing up
> an error message if the user tries to wake up a machine that
> isn't in SUSPENDED state. The change isn't made inside
> qemu_system_wakeup_request because it is used in migration,
> ACPI and others where this usage might be valid. This patch
> by no means fixes the situation described above, but it can direct
> the user/management closer to the real problem.
>
> Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
> CC: Markus Armbruster <armbru@redhat.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   hmp.c | 4 +++-
>   qmp.c | 5 +++++
>   2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/hmp.c b/hmp.c
> index 35a7041824..2c2326a00e 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -1158,7 +1158,9 @@ void hmp_cont(Monitor *mon, const QDict *qdict)
>
>   void hmp_system_wakeup(Monitor *mon, const QDict *qdict)
>   {
> -    qmp_system_wakeup(NULL);
> +    Error *err = NULL;
> +    qmp_system_wakeup(&err);
> +    hmp_handle_error(mon, &err);
>   }
>
>   void hmp_nmi(Monitor *mon, const QDict *qdict)
> diff --git a/qmp.c b/qmp.c
> index 52cfd2d81c..d0be43fa1a 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -201,6 +201,11 @@ void qmp_cont(Error **errp)
>
>   void qmp_system_wakeup(Error **errp)
>   {
> +    if (!runstate_check(RUN_STATE_SUSPENDED)) {
> +        error_setg(errp,
> +                   "Unable to wake up: guest is not in suspended state");
> +        return;
> +    }
>       qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>   }
>
diff mbox

Patch

diff --git a/hmp.c b/hmp.c
index 35a7041824..2c2326a00e 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1158,7 +1158,9 @@  void hmp_cont(Monitor *mon, const QDict *qdict)
 
 void hmp_system_wakeup(Monitor *mon, const QDict *qdict)
 {
-    qmp_system_wakeup(NULL);
+    Error *err = NULL;
+    qmp_system_wakeup(&err);
+    hmp_handle_error(mon, &err);
 }
 
 void hmp_nmi(Monitor *mon, const QDict *qdict)
diff --git a/qmp.c b/qmp.c
index 52cfd2d81c..d0be43fa1a 100644
--- a/qmp.c
+++ b/qmp.c
@@ -201,6 +201,11 @@  void qmp_cont(Error **errp)
 
 void qmp_system_wakeup(Error **errp)
 {
+    if (!runstate_check(RUN_STATE_SUSPENDED)) {
+        error_setg(errp,
+                   "Unable to wake up: guest is not in suspended state");
+        return;
+    }
     qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
 }