
[RFC,1/3] hw/intc/s390_flic: Migrate pending state

Message ID 20240525131241.378473-2-npiggin@gmail.com (mailing list archive)
State New, archived
Series Fix s390x flic migration and add some more qtests

Commit Message

Nicholas Piggin May 25, 2024, 1:12 p.m. UTC
The flic pending state is not migrated, so if the machine is migrated
while an interrupt is pending, the interrupt can be lost. This shows up
in the qtest migration test: an extint is pending (due to console
writes?), and the CPU waits via s390_cpu_set_psw and expects the
interrupt to wake it. However, when the flic pending state is lost,
s390_cpu_has_int returns false, so s390_cpu_exec_interrupt falls through
to halting again.

Fix this by migrating pending. This prevents the qtest from hanging.
Does service_param need to be migrated? Or the IO lists?

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/intc/s390_flic.c | 1 +
 1 file changed, 1 insertion(+)
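
The failure described in the commit message can be modelled with a small
standalone sketch. This is a toy model, not QEMU code: ToyFlic, ToyCpu and
the toy_* functions are hypothetical stand-ins, and the real
s390_cpu_has_int checks more than a single pending word. It only shows
why a halted CPU stays halted when the destination never receives the
pending bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; field names follow the patch, everything else is assumed. */
typedef struct {
    uint8_t simm;
    uint8_t nimm;
    uint32_t pending;
} ToyFlic;

typedef struct {
    bool halted;
} ToyCpu;

/* Simplified stand-in for s390_cpu_has_int: any pending bit counts. */
static bool toy_has_int(const ToyFlic *flic)
{
    return flic->pending != 0;
}

/* A halted CPU resumes only if an interrupt is pending; otherwise it
 * falls through and stays halted. */
static void toy_exec_interrupt(ToyCpu *cpu, const ToyFlic *flic)
{
    assert(cpu != NULL && flic != NULL);
    if (toy_has_int(flic)) {
        cpu->halted = false;
    }
}

/* Copy the state to the "destination". Without with_pending, only simm
 * and nimm are transferred, as in the unpatched field list. */
static ToyFlic toy_migrate(const ToyFlic *src, bool with_pending)
{
    ToyFlic dst = { .simm = src->simm, .nimm = src->nimm, .pending = 0 };

    if (with_pending) {
        dst.pending = src->pending;
    }
    return dst;
}
```

With with_pending set to false, the destination sees pending == 0 and the
CPU is never woken, which matches the hang seen in the qtest.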

Comments

David Hildenbrand May 26, 2024, 3:53 p.m. UTC | #1
On 25.05.24 at 15:12, Nicholas Piggin wrote:
> The flic pending state is not migrated, so if the machine is migrated
> while an interrupt is pending, it can be lost. This shows up in
> qtest migration test, an extint is pending (due to console writes?)
> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
> wake it. However when the flic pending state is lost, s390_cpu_has_int
> returns false, so s390_cpu_exec_interrupt falls through to halting
> again.
> 
> Fix this by migrating pending. This prevents the qtest from hanging.
> Does service_param need to be migrated? Or the IO lists?
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   hw/intc/s390_flic.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
> index 6771645699..b70cf2295a 100644
> --- a/hw/intc/s390_flic.c
> +++ b/hw/intc/s390_flic.c
> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>       .fields = (const VMStateField[]) {
>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>           VMSTATE_END_OF_LIST()
>       }
>   };

Likely you have to handle this using QEMU compat machines.
Richard Henderson May 26, 2024, 7:44 p.m. UTC | #2
On 5/26/24 08:53, David Hildenbrand wrote:
> On 25.05.24 at 15:12, Nicholas Piggin wrote:
>> The flic pending state is not migrated, so if the machine is migrated
>> while an interrupt is pending, it can be lost. This shows up in
>> qtest migration test, an extint is pending (due to console writes?)
>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>> returns false, so s390_cpu_exec_interrupt falls through to halting
>> again.
>>
>> Fix this by migrating pending. This prevents the qtest from hanging.
>> Does service_param need to be migrated? Or the IO lists?
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   hw/intc/s390_flic.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>> index 6771645699..b70cf2295a 100644
>> --- a/hw/intc/s390_flic.c
>> +++ b/hw/intc/s390_flic.c
>> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>>       .fields = (const VMStateField[]) {
>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>>           VMSTATE_END_OF_LIST()
>>       }
>>   };
> 
> Likely you have to handle this using QEMU compat machines.

Well, since existing migration is broken, I don't think you have to preserve 
compatibility.  But you do have to bump the version number.


r~
David Hildenbrand May 26, 2024, 8:33 p.m. UTC | #3
On 26.05.24 at 21:44, Richard Henderson wrote:
> On 5/26/24 08:53, David Hildenbrand wrote:
>> On 25.05.24 at 15:12, Nicholas Piggin wrote:
>>> The flic pending state is not migrated, so if the machine is migrated
>>> while an interrupt is pending, it can be lost. This shows up in
>>> qtest migration test, an extint is pending (due to console writes?)
>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>>> returns false, so s390_cpu_exec_interrupt falls through to halting
>>> again.
>>>
>>> Fix this by migrating pending. This prevents the qtest from hanging.
>>> Does service_param need to be migrated? Or the IO lists?
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>>   hw/intc/s390_flic.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>>> index 6771645699..b70cf2295a 100644
>>> --- a/hw/intc/s390_flic.c
>>> +++ b/hw/intc/s390_flic.c
>>> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>>>       .fields = (const VMStateField[]) {
>>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
>>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>>>           VMSTATE_END_OF_LIST()
>>>       }
>>>   };
>>
>> Likely you have to handle this using QEMU compat machines.
> 
> Well, since existing migration is broken, I don't think you have to preserve 

Migration is broken only in some cases: "while an interrupt is pending, it
can be lost".

> compatibility.  But you do have to bump the version number.

Looking at it, this is TCG only, so likely we don't care that much about 
migration compatibility. But I have no idea what level of compatibility we want 
to support there.
Thomas Huth May 27, 2024, 5:51 a.m. UTC | #4
On 26/05/2024 22.33, David Hildenbrand wrote:
> On 26.05.24 at 21:44, Richard Henderson wrote:
>> On 5/26/24 08:53, David Hildenbrand wrote:
>>> On 25.05.24 at 15:12, Nicholas Piggin wrote:
>>>> The flic pending state is not migrated, so if the machine is migrated
>>>> while an interrupt is pending, it can be lost. This shows up in
>>>> qtest migration test, an extint is pending (due to console writes?)
>>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>>>> returns false, so s390_cpu_exec_interrupt falls through to halting
>>>> again.
>>>>
>>>> Fix this by migrating pending. This prevents the qtest from hanging.
>>>> Does service_param need to be migrated? Or the IO lists?
>>>>
>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>> ---
>>>>   hw/intc/s390_flic.c | 1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>>>> index 6771645699..b70cf2295a 100644
>>>> --- a/hw/intc/s390_flic.c
>>>> +++ b/hw/intc/s390_flic.c
>>>> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>>>>       .fields = (const VMStateField[]) {
>>>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>>>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
>>>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>>>>           VMSTATE_END_OF_LIST()
>>>>       }
>>>>   };
>>>
>>> Likely you have to handle this using QEMU compat machines.
>>
>> Well, since existing migration is broken, I don't think you have to preserve 
> 
> Migration is broken only in some case "while an interrupt is pending, it can 
> be lost".
> 
>> compatibility.  But you do have to bump the version number.
> 
> Looking at it, this is TCG only, so likely we don't care that much about 
> migration compatibility. But I have no idea what level of compatibility we 
> want to support there.

Yes, this seems to only affect the TCG-only flic device, where migration has 
never been working very reliably. So I think we don't really need the whole 
compat-machine dance here. But I think we should at least bump the 
version_id to 2 now and then use

    VMSTATE_UINT32_V(pending, QEMUS390FLICState, 2);

for the new field. That way we would at least support forward migrations 
without too much hassle.

  Thomas
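
The versioned-field idea suggested above can be illustrated with a
standalone toy model. This is not the QEMU vmstate implementation:
ToyFlicState, ToyField and the toy_* functions are hypothetical, and the
stream layout (host-endian, fields back to back) is an assumption made
only for the sketch. It shows how tagging the new field with version 2
lets a newer loader accept an older, version 1 stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the device state; field names follow the patch. */
typedef struct {
    uint8_t simm;
    uint8_t nimm;
    uint32_t pending;
} ToyFlicState;

/* Hypothetical field descriptor: where the field lives, how big it is,
 * and the first stream version that carries it. */
typedef struct {
    size_t offset;
    size_t size;
    int min_version;
} ToyField;

static const ToyField toy_fields[] = {
    { offsetof(ToyFlicState, simm),    sizeof(uint8_t),  1 },
    { offsetof(ToyFlicState, nimm),    sizeof(uint8_t),  1 },
    { offsetof(ToyFlicState, pending), sizeof(uint32_t), 2 }, /* new in v2 */
};
#define TOY_NFIELDS (sizeof(toy_fields) / sizeof(toy_fields[0]))
#define TOY_VERSION 2

/* Write the fields that exist in stream version `version`.
 * Returns the number of bytes written. */
static size_t toy_save(const ToyFlicState *s, int version, uint8_t *buf)
{
    size_t pos = 0;

    assert(s != NULL && buf != NULL);
    for (size_t i = 0; i < TOY_NFIELDS; i++) {
        if (toy_fields[i].min_version > version) {
            continue; /* field did not exist in that version */
        }
        memcpy(buf + pos, (const uint8_t *)s + toy_fields[i].offset,
               toy_fields[i].size);
        pos += toy_fields[i].size;
    }
    return pos;
}

/* Load a stream of the given version. Streams newer than TOY_VERSION are
 * rejected; fields absent from an older stream keep their current value. */
static int toy_load(ToyFlicState *s, int version, const uint8_t *buf)
{
    size_t pos = 0;

    assert(s != NULL && buf != NULL);
    if (version < 1 || version > TOY_VERSION) {
        return -1;
    }
    for (size_t i = 0; i < TOY_NFIELDS; i++) {
        if (toy_fields[i].min_version > version) {
            continue;
        }
        memcpy((uint8_t *)s + toy_fields[i].offset, buf + pos,
               toy_fields[i].size);
        pos += toy_fields[i].size;
    }
    return 0;
}
```

In this model a version 1 stream still loads on the newer side (pending
keeps its reset value), while a stream with an unknown, newer version is
refused. That is the "forward migration" property the suggestion is
after.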
Nicholas Piggin May 27, 2024, 7:11 a.m. UTC | #5
On Mon May 27, 2024 at 3:51 PM AEST, Thomas Huth wrote:
> On 26/05/2024 22.33, David Hildenbrand wrote:
> > On 26.05.24 at 21:44, Richard Henderson wrote:
> >> On 5/26/24 08:53, David Hildenbrand wrote:
> >>> On 25.05.24 at 15:12, Nicholas Piggin wrote:
> >>>> The flic pending state is not migrated, so if the machine is migrated
> >>>> while an interrupt is pending, it can be lost. This shows up in
> >>>> qtest migration test, an extint is pending (due to console writes?)
> >>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
> >>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
> >>>> returns false, so s390_cpu_exec_interrupt falls through to halting
> >>>> again.
> >>>>
> >>>> Fix this by migrating pending. This prevents the qtest from hanging.
> >>>> Does service_param need to be migrated? Or the IO lists?
> >>>>
> >>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>>> ---
> >>>>   hw/intc/s390_flic.c | 1 +
> >>>>   1 file changed, 1 insertion(+)
> >>>>
> >>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
> >>>> index 6771645699..b70cf2295a 100644
> >>>> --- a/hw/intc/s390_flic.c
> >>>> +++ b/hw/intc/s390_flic.c
> >>>> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
> >>>>       .fields = (const VMStateField[]) {
> >>>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
> >>>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
> >>>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
> >>>>           VMSTATE_END_OF_LIST()
> >>>>       }
> >>>>   };
> >>>
> >>> Likely you have to handle this using QEMU compat machines.
> >>
> >> Well, since existing migration is broken, I don't think you have to preserve 
> > 
> > Migration is broken only in some case "while an interrupt is pending, it can 
> > be lost".
> > 
> >> compatibility.  But you do have to bump the version number.
> > 
> > Looking at it, this is TCG only, so likely we don't care that much about 
> > migration compatibility. But I have no idea what level of compatibility we 
> > want to support there.
>
> Yes, this seems to only affect the TCG-only flic device, where migration has 
> never been working very reliably. So I think we don't really need the whole 
> compat-machine dance here. But I think we should at least bump the 
> version_id to 2 now and then use
>
>     VMSTATE_UINT32_V(pending, QEMUS390FLICState, 2);
>
> for the new field. That way we would at least support forward migrations 
> without too much hassle.

Well, if you could rebuild this state by checking sources or something,
it might be possible to avoid migrating it. Or you could always mark it
pending, if software can tolerate superfluous interrupts. But that also
seems like a lot of headache for something that was always flaky.

Thanks,
Nick

Patch

diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
index 6771645699..b70cf2295a 100644
--- a/hw/intc/s390_flic.c
+++ b/hw/intc/s390_flic.c
@@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
     .fields = (const VMStateField[]) {
         VMSTATE_UINT8(simm, QEMUS390FLICState),
         VMSTATE_UINT8(nimm, QEMUS390FLICState),
+        VMSTATE_UINT32(pending, QEMUS390FLICState),
         VMSTATE_END_OF_LIST()
     }
 };