diff mbox series

[v2,1/2] qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event

Message ID 20210604200353.1206897-2-danielhb413@gmail.com (mailing list archive)
State New, archived
Headers show
Series DEVICE_UNPLUG_ERROR QAPI event | expand

Commit Message

Daniel Henrique Barboza June 4, 2021, 8:03 p.m. UTC
At this moment we only provide one event to report a hotunplug error,
MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
machine is now able to report unplug errors for other device types, such
as CPUs.

Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
create a generic DEVICE_UNPLUG_ERROR event that can be used by all
unplug errors in the future.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 qapi/machine.json | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

David Gibson June 7, 2021, 2:23 a.m. UTC | #1
On Fri, Jun 04, 2021 at 05:03:52PM -0300, Daniel Henrique Barboza wrote:
> At this moment we only provide one event to report a hotunplug error,
> MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
> machine is now able to report unplug errors for other device types, such
> as CPUs.
> 
> Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
> create a generic DEVICE_UNPLUG_ERROR event that can be used by all
> unplug errors in the future.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  qapi/machine.json | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 58a9c86b36..f0c7e56be0 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1274,3 +1274,26 @@
>  ##
>  { 'event': 'MEM_UNPLUG_ERROR',
>    'data': { 'device': 'str', 'msg': 'str' } }
> +
> +##
> +# @DEVICE_UNPLUG_ERROR:
> +#
> +# Emitted when a device hot unplug error occurs.
> +#
> +# @device: device name
> +#
> +# @msg: Informative message
> +#
> +# Since: 6.1
> +#
> +# Example:
> +#
> +# <- { "event": "DEVICE_UNPLUG_ERROR"
> +#      "data": { "device": "dimm1",
> +#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
> +#      },
> +#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
> +#
> +##
> +{ 'event': 'DEVICE_UNPLUG_ERROR',
> +  'data': { 'device': 'str', 'msg': 'str' } }
David Gibson June 7, 2021, 2:23 a.m. UTC | #2
On Mon, Jun 07, 2021 at 12:23:08PM +1000, David Gibson wrote:
> On Fri, Jun 04, 2021 at 05:03:52PM -0300, Daniel Henrique Barboza wrote:
> > At this moment we only provide one event to report a hotunplug error,
> > MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
> > machine is now able to report unplug errors for other device types, such
> > as CPUs.
> > 
> > Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
> > create a generic DEVICE_UNPLUG_ERROR event that can be used by all
> > unplug errors in the future.
> > 
> > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Markus, I'm happy to take this through my tree if that's convenient
for you, but I'd like to get an ack.

> 
> > ---
> >  qapi/machine.json | 23 +++++++++++++++++++++++
> >  1 file changed, 23 insertions(+)
> > 
> > diff --git a/qapi/machine.json b/qapi/machine.json
> > index 58a9c86b36..f0c7e56be0 100644
> > --- a/qapi/machine.json
> > +++ b/qapi/machine.json
> > @@ -1274,3 +1274,26 @@
> >  ##
> >  { 'event': 'MEM_UNPLUG_ERROR',
> >    'data': { 'device': 'str', 'msg': 'str' } }
> > +
> > +##
> > +# @DEVICE_UNPLUG_ERROR:
> > +#
> > +# Emitted when a device hot unplug error occurs.
> > +#
> > +# @device: device name
> > +#
> > +# @msg: Informative message
> > +#
> > +# Since: 6.1
> > +#
> > +# Example:
> > +#
> > +# <- { "event": "DEVICE_UNPLUG_ERROR"
> > +#      "data": { "device": "dimm1",
> > +#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
> > +#      },
> > +#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
> > +#
> > +##
> > +{ 'event': 'DEVICE_UNPLUG_ERROR',
> > +  'data': { 'device': 'str', 'msg': 'str' } }
>
Eric Blake June 7, 2021, 6:41 p.m. UTC | #3
On Mon, Jun 07, 2021 at 12:23:53PM +1000, David Gibson wrote:
> On Mon, Jun 07, 2021 at 12:23:08PM +1000, David Gibson wrote:
> > On Fri, Jun 04, 2021 at 05:03:52PM -0300, Daniel Henrique Barboza wrote:
> > > At this moment we only provide one event to report a hotunplug error,
> > > MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
> > > machine is now able to report unplug errors for other device types, such
> > > as CPUs.
> > > 
> > > Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
> > > create a generic DEVICE_UNPLUG_ERROR event that can be used by all
> > > unplug errors in the future.
> > > 
> > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> > 
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Markus, I'm happy to take this through my tree if that's convenient
> for you, but I'd like to get an ack.

I'm not Markus, but if you're happy with my interface review,

Reviewed-by: Eric Blake <eblake@redhat.com>
Markus Armbruster June 11, 2021, 12:12 p.m. UTC | #4
Daniel Henrique Barboza <danielhb413@gmail.com> writes:

> At this moment we only provide one event to report a hotunplug error,
> MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
> machine is now able to report unplug errors for other device types, such
> as CPUs.
>
> Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
> create a generic DEVICE_UNPLUG_ERROR event that can be used by all
> unplug errors in the future.
>
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  qapi/machine.json | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 58a9c86b36..f0c7e56be0 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1274,3 +1274,26 @@
>  ##
>  { 'event': 'MEM_UNPLUG_ERROR',
>    'data': { 'device': 'str', 'msg': 'str' } }
> +
> +##
> +# @DEVICE_UNPLUG_ERROR:
> +#
> +# Emitted when a device hot unplug error occurs.
> +#
> +# @device: device name
> +#
> +# @msg: Informative message
> +#
> +# Since: 6.1
> +#
> +# Example:
> +#
> +# <- { "event": "DEVICE_UNPLUG_ERROR"
> +#      "data": { "device": "dimm1",
> +#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
> +#      },
> +#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
> +#
> +##
> +{ 'event': 'DEVICE_UNPLUG_ERROR',
> +  'data': { 'device': 'str', 'msg': 'str' } }

Missing: update of device_add's doc comment in qdev.json:

    # Notes: When this command completes, the device may not be removed from the
    #        guest.  Hot removal is an operation that requires guest cooperation.
    #        This command merely requests that the guest begin the hot removal
    #        process.  Completion of the device removal process is signaled with a
    #        DEVICE_DELETED event. Guest reset will automatically complete removal
    #        for all devices.

This sure could use some polish.

If I understand things correctly, we're aiming for the following device
unplug protocol:

   Unplug the device with device_del (or possibly equivalent)

   If we know we can't unplug the device, fail immediately.  Also emit
   DEVICE_UNPLUG_ERROR.

   If possible, unplug the device synchronously and succeed.  Also emit
   DEVICE_DELETED.

   Else, initiate unplug and succeed.

   When unplug finishes, emit either DEVICE_DELETED or
   DEVICE_UNPLUG_ERROR.

   For some machines and devices, unplug may never finish.

Correct?

Any particular reason for not putting event DEVICE_UNPLUG_ERROR next to
DEVICE_DELETED in qdev.json?
Daniel Henrique Barboza June 16, 2021, 4:41 p.m. UTC | #5
On 6/11/21 9:12 AM, Markus Armbruster wrote:
> Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> 
>> At this moment we only provide one event to report a hotunplug error,
>> MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
>> machine is now able to report unplug errors for other device types, such
>> as CPUs.
>>
>> Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
>> create a generic DEVICE_UNPLUG_ERROR event that can be used by all
>> unplug errors in the future.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   qapi/machine.json | 23 +++++++++++++++++++++++
>>   1 file changed, 23 insertions(+)
>>
>> diff --git a/qapi/machine.json b/qapi/machine.json
>> index 58a9c86b36..f0c7e56be0 100644
>> --- a/qapi/machine.json
>> +++ b/qapi/machine.json
>> @@ -1274,3 +1274,26 @@
>>   ##
>>   { 'event': 'MEM_UNPLUG_ERROR',
>>     'data': { 'device': 'str', 'msg': 'str' } }
>> +
>> +##
>> +# @DEVICE_UNPLUG_ERROR:
>> +#
>> +# Emitted when a device hot unplug error occurs.
>> +#
>> +# @device: device name
>> +#
>> +# @msg: Informative message
>> +#
>> +# Since: 6.1
>> +#
>> +# Example:
>> +#
>> +# <- { "event": "DEVICE_UNPLUG_ERROR"
>> +#      "data": { "device": "dimm1",
>> +#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
>> +#      },
>> +#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
>> +#
>> +##
>> +{ 'event': 'DEVICE_UNPLUG_ERROR',
>> +  'data': { 'device': 'str', 'msg': 'str' } }
> 
> Missing: update of device_add's doc comment in qdev.json:
> 
>      # Notes: When this command completes, the device may not be removed from the
>      #        guest.  Hot removal is an operation that requires guest cooperation.
>      #        This command merely requests that the guest begin the hot removal
>      #        process.  Completion of the device removal process is signaled with a
>      #        DEVICE_DELETED event. Guest reset will automatically complete removal
>      #        for all devices.

Ok

> 
> This sure could use some polish.
> 
> If I understand things correctly, we're aiming for the following device
> unplug protocol:

One thing to note is that DEVICE_UNPLUG_ERROR isn't guaranteed to be send for
every hotunplug error. The event depends on machine/architecture support to
detect a guest side error.

> 
>     Unplug the device with device_del (or possibly equivalent)
> 
>     If we know we can't unplug the device, fail immediately.  Also emit
>     DEVICE_UNPLUG_ERROR.


I haven't predicted to use this event in those cases as well, although it
seems reasonable to do so now that you mentioned it.

> 
>     If possible, unplug the device synchronously and succeed.  Also emit
>     DEVICE_DELETED.
> 
>     Else, initiate unplug and succeed.
> 
>     When unplug finishes, emit either DEVICE_DELETED or
>     DEVICE_UNPLUG_ERROR.

Since there's no 100% guarantee that DEVICE_UNPLUG_ERROR will be emitted for
guest side errors, the wording here would be

"When unplug finishes, emit DEVICE_DELETED. A DEVICE_UNPLUG_ERROR can be
emitted if a guest side error was detected"


> 
>     For some machines and devices, unplug may never finish.
> 
> Correct?
> 
> Any particular reason for not putting event DEVICE_UNPLUG_ERROR next to
> DEVICE_DELETED in qdev.json?


Not really. I looked where MEM_UNPLUG_ERROR was declared and put it right
after it. I can change it to qdev.json near DEVICE_DELETED.



Thanks,


Daniel

> 
>
Markus Armbruster June 17, 2021, 5:59 a.m. UTC | #6
Daniel Henrique Barboza <danielhb413@gmail.com> writes:

> On 6/11/21 9:12 AM, Markus Armbruster wrote:
>> Daniel Henrique Barboza <danielhb413@gmail.com> writes:
>> 
>>> At this moment we only provide one event to report a hotunplug error,
>>> MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
>>> machine is now able to report unplug errors for other device types, such
>>> as CPUs.
>>>
>>> Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
>>> create a generic DEVICE_UNPLUG_ERROR event that can be used by all
>>> unplug errors in the future.
>>>
>>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>>> ---
>>>   qapi/machine.json | 23 +++++++++++++++++++++++
>>>   1 file changed, 23 insertions(+)
>>>
>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>> index 58a9c86b36..f0c7e56be0 100644
>>> --- a/qapi/machine.json
>>> +++ b/qapi/machine.json
>>> @@ -1274,3 +1274,26 @@
>>>   ##
>>>   { 'event': 'MEM_UNPLUG_ERROR',
>>>     'data': { 'device': 'str', 'msg': 'str' } }
>>> +
>>> +##
>>> +# @DEVICE_UNPLUG_ERROR:
>>> +#
>>> +# Emitted when a device hot unplug error occurs.
>>> +#
>>> +# @device: device name
>>> +#
>>> +# @msg: Informative message
>>> +#
>>> +# Since: 6.1
>>> +#
>>> +# Example:
>>> +#
>>> +# <- { "event": "DEVICE_UNPLUG_ERROR"
>>> +#      "data": { "device": "dimm1",
>>> +#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
>>> +#      },
>>> +#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
>>> +#
>>> +##
>>> +{ 'event': 'DEVICE_UNPLUG_ERROR',
>>> +  'data': { 'device': 'str', 'msg': 'str' } }
>> 
>> Missing: update of device_add's doc comment in qdev.json:
>> 
>>      # Notes: When this command completes, the device may not be removed from the
>>      #        guest.  Hot removal is an operation that requires guest cooperation.
>>      #        This command merely requests that the guest begin the hot removal
>>      #        process.  Completion of the device removal process is signaled with a
>>      #        DEVICE_DELETED event. Guest reset will automatically complete removal
>>      #        for all devices.
>
> Ok
>
>> 
>> This sure could use some polish.
>> 
>> If I understand things correctly, we're aiming for the following device
>> unplug protocol:
>
> One thing to note is that DEVICE_UNPLUG_ERROR isn't guaranteed to be send for
> every hotunplug error. The event depends on machine/architecture support to
> detect a guest side error.

Yes.  I tried to provide for that in my description of the protocol.

>> 
>>     Unplug the device with device_del (or possibly equivalent)
>> 
>>     If we know we can't unplug the device, fail immediately.  Also emit
>>     DEVICE_UNPLUG_ERROR.
>
>
> I haven't predicted to use this event in those cases as well, although it
> seems reasonable to do so now that you mentioned it.

I think this is a matter of taste.  For what it's worth, we do emit
DEVICE_DELETED on immediate success (see next paragraph).

Emitting DEVICE_DELETED always lets QMP clients track device deletion
even when it's triggered by something else.  Feature.

No such tracking is needed for unplug failure.

>>     If possible, unplug the device synchronously and succeed.  Also emit
>>     DEVICE_DELETED.
>> 
>>     Else, initiate unplug and succeed.
>> 
>>     When unplug finishes, emit either DEVICE_DELETED or
>>     DEVICE_UNPLUG_ERROR.
>
> Since there's no 100% guarantee that DEVICE_UNPLUG_ERROR will be emitted for
> guest side errors, the wording here would be
>
> "When unplug finishes, emit DEVICE_DELETED. A DEVICE_UNPLUG_ERROR can be
> emitted if a guest side error was detected"

My assumption is that QEMU can detect asynchronous success reliably, but
not asynchronous failure.

From QEMU's point of view, "asynchronous unplug has not finished" is
indistinguishable from "asynchronous unplug finished unsuccessfully, but
we can't detect it".  So I lumped these two cases together in the next
paragraph:

>>     For some machines and devices, unplug may never finish.

My goal is a clear description of the state machine as it can be
observed in QMP.

>> Correct?
>> 
>> Any particular reason for not putting event DEVICE_UNPLUG_ERROR next to
>> DEVICE_DELETED in qdev.json?
>
>
> Not really. I looked where MEM_UNPLUG_ERROR was declared and put it right
> after it. I can change it to qdev.json near DEVICE_DELETED.

Yes, please.
diff mbox series

Patch

diff --git a/qapi/machine.json b/qapi/machine.json
index 58a9c86b36..f0c7e56be0 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1274,3 +1274,26 @@ 
 ##
 { 'event': 'MEM_UNPLUG_ERROR',
   'data': { 'device': 'str', 'msg': 'str' } }
+
+##
+# @DEVICE_UNPLUG_ERROR:
+#
+# Emitted when a device hot unplug error occurs.
+#
+# @device: device name
+#
+# @msg: Informative message
+#
+# Since: 6.1
+#
+# Example:
+#
+# <- { "event": "DEVICE_UNPLUG_ERROR"
+#      "data": { "device": "dimm1",
+#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
+#      },
+#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
+#
+##
+{ 'event': 'DEVICE_UNPLUG_ERROR',
+  'data': { 'device': 'str', 'msg': 'str' } }