diff mbox

[for-2.7,1/1] ide: fix halted IO segfault at reset

Message ID 1469570853-19770-2-git-send-email-jsnow@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

John Snow July 26, 2016, 10:07 p.m. UTC
If one attempts to perform a system_reset after a failed IO request
that causes the VM to enter a paused state, QEMU will segfault trying
to free up the pending IO requests.

These requests have already been completed and freed, though, so all
we need to do is free them before we enter the paused state.

Existing AHCI tests verify that halted requests are still resumed
successfully after a STOP event.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 hw/ide/core.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Laszlo Ersek July 27, 2016, 1:04 p.m. UTC | #1
On 07/27/16 00:07, John Snow wrote:
> If one attempts to perform a system_reset after a failed IO request
> that causes the VM to enter a paused state, QEMU will segfault trying
> to free up the pending IO requests.
> 
> These requests have already been completed and freed, though, so all
> we need to do is free them before we enter the paused state.
> 
> Existing AHCI tests verify that halted requests are still resumed
> successfully after a STOP event.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  hw/ide/core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 081c9eb..d117b7c 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -823,6 +823,7 @@ static void ide_dma_cb(void *opaque, int ret)
>      }
>      if (ret < 0) {
>          if (ide_handle_rw_error(s, -ret, ide_dma_cmd_to_retry(s->dma_cmd))) {
> +            s->bus->dma->aiocb = NULL;
>              return;
>          }
>      }
> 

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

Should this be a candidate for 2.6 stable?

Thanks
Laszlo
John Snow July 27, 2016, 2:30 p.m. UTC | #2
On 07/27/2016 09:04 AM, Laszlo Ersek wrote:
> On 07/27/16 00:07, John Snow wrote:
>> If one attempts to perform a system_reset after a failed IO request
>> that causes the VM to enter a paused state, QEMU will segfault trying
>> to free up the pending IO requests.
>>
>> These requests have already been completed and freed, though, so all
>> we need to do is free them before we enter the paused state.
>>

s|free them|null them| ... will fix on commit.

>> Existing AHCI tests verify that halted requests are still resumed
>> successfully after a STOP event.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  hw/ide/core.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 081c9eb..d117b7c 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -823,6 +823,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>      }
>>      if (ret < 0) {
>>          if (ide_handle_rw_error(s, -ret, ide_dma_cmd_to_retry(s->dma_cmd))) {
>> +            s->bus->dma->aiocb = NULL;
>>              return;
>>          }
>>      }
>>
> 
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> 
> Should this be a candidate for 2.6 stable?
> 
> Thanks
> Laszlo
> 

You're right. I'll do a [RESEND] to -stable, thanks.

And since I neglected to mention it in the commit message, thanks to
Laszlo Ersek here for an excellent diagnostic on the cause of the segfault.

--js
Paolo Bonzini Aug. 1, 2016, 8:52 a.m. UTC | #3
On 27/07/2016 00:07, John Snow wrote:
> If one attempts to perform a system_reset after a failed IO request
> that causes the VM to enter a paused state, QEMU will segfault trying
> to free up the pending IO requests.
> 
> These requests have already been completed and freed, though, so all
> we need to do is free them before we enter the paused state.
> 
> Existing AHCI tests verify that halted requests are still resumed
> successfully after a STOP event.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  hw/ide/core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 081c9eb..d117b7c 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -823,6 +823,7 @@ static void ide_dma_cb(void *opaque, int ret)
>      }
>      if (ret < 0) {
>          if (ide_handle_rw_error(s, -ret, ide_dma_cmd_to_retry(s->dma_cmd))) {
> +            s->bus->dma->aiocb = NULL;
>              return;
>          }
>      }
> 

The patch is (was, since it's committed :)) okay, but I think there is
another bug in the REPORT case, where ide_rw_error and
ide_atapi_io_error are not calling ide_set_inactive and thus are leaving
s->bus->dma->aiocb non-NULL.

Paolo
John Snow Aug. 2, 2016, 3:31 a.m. UTC | #4
On 08/01/2016 04:52 AM, Paolo Bonzini wrote:
>
>
> On 27/07/2016 00:07, John Snow wrote:
>> If one attempts to perform a system_reset after a failed IO request
>> that causes the VM to enter a paused state, QEMU will segfault trying
>> to free up the pending IO requests.
>>
>> These requests have already been completed and freed, though, so all
>> we need to do is free them before we enter the paused state.
>>
>> Existing AHCI tests verify that halted requests are still resumed
>> successfully after a STOP event.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  hw/ide/core.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 081c9eb..d117b7c 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -823,6 +823,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>      }
>>      if (ret < 0) {
>>          if (ide_handle_rw_error(s, -ret, ide_dma_cmd_to_retry(s->dma_cmd))) {
>> +            s->bus->dma->aiocb = NULL;
>>              return;
>>          }
>>      }
>>
>
> The patch is (was, since it's committed :)) okay, but I think there is
> another bug in the REPORT case, where ide_rw_error and
> ide_atapi_io_error are not calling ide_set_inactive and thus are leaving
> s->bus->dma->aiocb non-NULL.
>
> Paolo
>

...Aha. Good eye.

I can probably just shift the aiocb nulling up a bit, but leave it in 
ide_dma_cb.
John Snow Aug. 2, 2016, 3:37 a.m. UTC | #5
On 08/01/2016 04:52 AM, Paolo Bonzini wrote:
>
>
> On 27/07/2016 00:07, John Snow wrote:
>> If one attempts to perform a system_reset after a failed IO request
>> that causes the VM to enter a paused state, QEMU will segfault trying
>> to free up the pending IO requests.
>>
>> These requests have already been completed and freed, though, so all
>> we need to do is free them before we enter the paused state.
>>
>> Existing AHCI tests verify that halted requests are still resumed
>> successfully after a STOP event.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  hw/ide/core.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 081c9eb..d117b7c 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -823,6 +823,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>      }
>>      if (ret < 0) {
>>          if (ide_handle_rw_error(s, -ret, ide_dma_cmd_to_retry(s->dma_cmd))) {
>> +            s->bus->dma->aiocb = NULL;
>>              return;
>>          }
>>      }
>>
>
> The patch is (was, since it's committed :)) okay, but I think there is
> another bug in the REPORT case, where ide_rw_error and
> ide_atapi_io_error are not calling ide_set_inactive and thus are leaving
> s->bus->dma->aiocb non-NULL.
>
> Paolo
>

Actually, won't we hit ide_dma_error on REPORT which calls 
ide_set_inactive? I think this might be OK, but I have to audit a little 
more carefully -- I will do so tomorrow.

I think the ide_rw_error case is likely OK, but I always manage to 
forget exactly how the ATAPI DMA looks.
Paolo Bonzini Aug. 2, 2016, 5:06 p.m. UTC | #6
> > The patch is (was, since it's committed :)) okay, but I think there is
> > another bug in the REPORT case, where ide_rw_error and
> > ide_atapi_io_error are not calling ide_set_inactive and thus are leaving
> > s->bus->dma->aiocb non-NULL.
> >
> > Paolo
> >
> 
> Actually, won't we hit ide_dma_error on REPORT which calls
> ide_set_inactive? I think this might be OK, but I have to audit a little
> more carefully -- I will do so tomorrow.
> 
> I think the ide_rw_error case is likely OK, but I always manage to
> forget exactly how the ATAPI DMA looks.

Indeed ide_rw_error is okay because ide_sector_read and ide_sector_write
do reset pio_aiocb early enough; ATAPI is wrong because IDE_RETRY_ATAPI
does not pass IS_IDE_RETRY_DMA.

Paolo
Paolo Bonzini Aug. 2, 2016, 5:08 p.m. UTC | #7
> >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >> index 081c9eb..d117b7c 100644
> >> --- a/hw/ide/core.c
> >> +++ b/hw/ide/core.c
> >> @@ -823,6 +823,7 @@ static void ide_dma_cb(void *opaque, int ret)
> >>      }
> >>      if (ret < 0) {
> >>          if (ide_handle_rw_error(s, -ret,
> >>          ide_dma_cmd_to_retry(s->dma_cmd))) {
> >> +            s->bus->dma->aiocb = NULL;
> >>              return;
> >>          }
> >>      }
> >>
> >
> > The patch is (was, since it's committed :)) okay, but I think there is
> > another bug in the REPORT case, where ide_rw_error and
> > ide_atapi_io_error are not calling ide_set_inactive and thus are leaving
> > s->bus->dma->aiocb non-NULL.
> 
> I can probably just shift the aiocb nulling up a bit, but leave it in
> ide_dma_cb.

ATAPI is ide_atapi_cmd_read_dma_cb, you can do the same fix there that you
did in this patch.

Paolo
diff mbox

Patch

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 081c9eb..d117b7c 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -823,6 +823,7 @@  static void ide_dma_cb(void *opaque, int ret)
     }
     if (ret < 0) {
         if (ide_handle_rw_error(s, -ret, ide_dma_cmd_to_retry(s->dma_cmd))) {
+            s->bus->dma->aiocb = NULL;
             return;
         }
     }