Message ID | 20221220125349.45091-1-yangxingui@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | scsi: libsas: Grab the host lock in sas_ata_device_link_abort() | expand |
On 20/12/2022 12:53, Xingui Yang wrote: > Grab the host lock in sas_ata_device_link_abort() before calling This is should be the ata port lock, right? I know that the ata comments say differently. > ata_link_abort(), as the comment in ata_link_abort() mentions. > Can you please add a fixes tag? > Signed-off-by: Xingui Yang <yangxingui@huawei.com> Reviewed-by: John Garry <john.g.garry@oracle.com> > --- > drivers/scsi/libsas/sas_ata.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c > index f7439bf9cdc6..4f2017b21e6d 100644 > --- a/drivers/scsi/libsas/sas_ata.c > +++ b/drivers/scsi/libsas/sas_ata.c > @@ -889,7 +889,9 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) > { > struct ata_port *ap = device->sata_dev.ap; > struct ata_link *link = &ap->link; > + unsigned long flags; > > + spin_lock_irqsave(ap->lock, flags); > device->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */ > device->sata_dev.fis[3] = ATA_ABORTED; /* tf error */ > > @@ -897,6 +899,7 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) > if (force_reset) > link->eh_info.action |= ATA_EH_RESET; > ata_link_abort(link); > + spin_unlock_irqrestore(ap->lock, flags); > } > EXPORT_SYMBOL_GPL(sas_ata_device_link_abort); >
On 2022/12/20 23:59, John Garry wrote: > On 20/12/2022 12:53, Xingui Yang wrote: >> Grab the host lock in sas_ata_device_link_abort() before calling > > This is should be the ata port lock, right? I know that the ata comments > say differently. > >> ata_link_abort(), as the comment in ata_link_abort() mentions. >> > > Can you please add a fixes tag? > >> Signed-off-by: Xingui Yang <yangxingui@huawei.com> > > Reviewed-by: John Garry <john.g.garry@oracle.com> > >> --- >> drivers/scsi/libsas/sas_ata.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c >> index f7439bf9cdc6..4f2017b21e6d 100644 >> --- a/drivers/scsi/libsas/sas_ata.c >> +++ b/drivers/scsi/libsas/sas_ata.c >> @@ -889,7 +889,9 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) >> { >> struct ata_port *ap = device->sata_dev.ap; >> struct ata_link *link = &ap->link; >> + unsigned long flags; >> >> + spin_lock_irqsave(ap->lock, flags); >> device->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */ >> device->sata_dev.fis[3] = ATA_ABORTED; /* tf error */ >> >> @@ -897,6 +899,7 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) >> if (force_reset) >> link->eh_info.action |= ATA_EH_RESET; >> ata_link_abort(link); Really need to add lockdep annotations in libata to avoid/catch such bugs... Will work on that. >> + spin_unlock_irqrestore(ap->lock, flags); >> } >> EXPORT_SYMBOL_GPL(sas_ata_device_link_abort); >> >
On 2022/12/20 22:59, John Garry wrote: > On 20/12/2022 12:53, Xingui Yang wrote: >> Grab the host lock in sas_ata_device_link_abort() before calling > > This is should be the ata port lock, right? I know that the ata comments > say differently. ok, I will update the commit message and use ata port lock instead. > >> ata_link_abort(), as the comment in ata_link_abort() mentions. >> > > Can you please add a fixes tag? ok, I will update and add a fix tag. Thanks, Xingui > >> Signed-off-by: Xingui Yang <yangxingui@huawei.com> > > Reviewed-by: John Garry <john.g.garry@oracle.com> > >> --- >> drivers/scsi/libsas/sas_ata.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/scsi/libsas/sas_ata.c >> b/drivers/scsi/libsas/sas_ata.c >> index f7439bf9cdc6..4f2017b21e6d 100644 >> --- a/drivers/scsi/libsas/sas_ata.c >> +++ b/drivers/scsi/libsas/sas_ata.c >> @@ -889,7 +889,9 @@ void sas_ata_device_link_abort(struct >> domain_device *device, bool force_reset) >> { >> struct ata_port *ap = device->sata_dev.ap; >> struct ata_link *link = &ap->link; >> + unsigned long flags; >> + spin_lock_irqsave(ap->lock, flags); >> device->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */ >> device->sata_dev.fis[3] = ATA_ABORTED; /* tf error */ >> @@ -897,6 +899,7 @@ void sas_ata_device_link_abort(struct >> domain_device *device, bool force_reset) >> if (force_reset) >> link->eh_info.action |= ATA_EH_RESET; >> ata_link_abort(link); >> + spin_unlock_irqrestore(ap->lock, flags); >> } >> EXPORT_SYMBOL_GPL(sas_ata_device_link_abort); > > .
On 2022/12/21 8:36, Damien Le Moal wrote: > On 2022/12/20 23:59, John Garry wrote: >> On 20/12/2022 12:53, Xingui Yang wrote: >>> Grab the host lock in sas_ata_device_link_abort() before calling >> >> This is should be the ata port lock, right? I know that the ata comments >> say differently. >> >>> ata_link_abort(), as the comment in ata_link_abort() mentions. >>> >> >> Can you please add a fixes tag? >> >>> Signed-off-by: Xingui Yang <yangxingui@huawei.com> >> >> Reviewed-by: John Garry <john.g.garry@oracle.com> >> >>> --- >>> drivers/scsi/libsas/sas_ata.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c >>> index f7439bf9cdc6..4f2017b21e6d 100644 >>> --- a/drivers/scsi/libsas/sas_ata.c >>> +++ b/drivers/scsi/libsas/sas_ata.c >>> @@ -889,7 +889,9 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) >>> { >>> struct ata_port *ap = device->sata_dev.ap; >>> struct ata_link *link = &ap->link; >>> + unsigned long flags; >>> >>> + spin_lock_irqsave(ap->lock, flags); >>> device->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */ >>> device->sata_dev.fis[3] = ATA_ABORTED; /* tf error */ >>> >>> @@ -897,6 +899,7 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) >>> if (force_reset) >>> link->eh_info.action |= ATA_EH_RESET; >>> ata_link_abort(link); > > Really need to add lockdep annotations in libata to avoid/catch such bugs... > Will work on that. Actually in libata there are many places calling ata_link_abort() not holding port lock. And some places are holding the real host lock(ata_host->lock) while calling ata_link_abort(). So if you add the lockdep annotations, there may be too many warnings. If these are real issues, we should fix them first. Thanks, Jason
On 2022/12/21 11:42, Jason Yan wrote: > On 2022/12/21 8:36, Damien Le Moal wrote: >> On 2022/12/20 23:59, John Garry wrote: >>> On 20/12/2022 12:53, Xingui Yang wrote: >>>> Grab the host lock in sas_ata_device_link_abort() before calling >>> >>> This is should be the ata port lock, right? I know that the ata comments >>> say differently. >>> >>>> ata_link_abort(), as the comment in ata_link_abort() mentions. >>>> >>> >>> Can you please add a fixes tag? >>> >>>> Signed-off-by: Xingui Yang <yangxingui@huawei.com> >>> >>> Reviewed-by: John Garry <john.g.garry@oracle.com> >>> >>>> --- >>>> drivers/scsi/libsas/sas_ata.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c >>>> index f7439bf9cdc6..4f2017b21e6d 100644 >>>> --- a/drivers/scsi/libsas/sas_ata.c >>>> +++ b/drivers/scsi/libsas/sas_ata.c >>>> @@ -889,7 +889,9 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) >>>> { >>>> struct ata_port *ap = device->sata_dev.ap; >>>> struct ata_link *link = &ap->link; >>>> + unsigned long flags; >>>> >>>> + spin_lock_irqsave(ap->lock, flags); >>>> device->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */ >>>> device->sata_dev.fis[3] = ATA_ABORTED; /* tf error */ >>>> >>>> @@ -897,6 +899,7 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) >>>> if (force_reset) >>>> link->eh_info.action |= ATA_EH_RESET; >>>> ata_link_abort(link); >> >> Really need to add lockdep annotations in libata to avoid/catch such bugs... >> Will work on that. > > Actually in libata there are many places calling ata_link_abort() not > holding port lock. And some places are holding the real host > lock(ata_host->lock) while calling ata_link_abort(). So if you add the > lockdep annotations, there may be too many warnings. If these are real > issues, we should fix them first. libata-EH does most of its work without the port lock held because by the time we get EH started, we are guaranteed to be idle with no commands in flight. That is why the calls you mention look like "bugs" but are not. lockdep annotation will be a little tricky, but not too hard to do either. > > Thanks, > Jason
On 2022/12/21 11:59, Damien Le Moal wrote: > On 2022/12/21 11:42, Jason Yan wrote: >> On 2022/12/21 8:36, Damien Le Moal wrote: >>> On 2022/12/20 23:59, John Garry wrote: >>>> On 20/12/2022 12:53, Xingui Yang wrote: >>>>> Grab the host lock in sas_ata_device_link_abort() before calling >>>> >>>> This is should be the ata port lock, right? I know that the ata comments >>>> say differently. >>>> >>>>> ata_link_abort(), as the comment in ata_link_abort() mentions. >>>>> >>>> >>>> Can you please add a fixes tag? >>>> >>>>> Signed-off-by: Xingui Yang <yangxingui@huawei.com> >>>> >>>> Reviewed-by: John Garry <john.g.garry@oracle.com> >>>> >>>>> --- >>>>> drivers/scsi/libsas/sas_ata.c | 3 +++ >>>>> 1 file changed, 3 insertions(+) >>>>> >>>>> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c >>>>> index f7439bf9cdc6..4f2017b21e6d 100644 >>>>> --- a/drivers/scsi/libsas/sas_ata.c >>>>> +++ b/drivers/scsi/libsas/sas_ata.c >>>>> @@ -889,7 +889,9 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) >>>>> { >>>>> struct ata_port *ap = device->sata_dev.ap; >>>>> struct ata_link *link = &ap->link; >>>>> + unsigned long flags; >>>>> >>>>> + spin_lock_irqsave(ap->lock, flags); >>>>> device->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */ >>>>> device->sata_dev.fis[3] = ATA_ABORTED; /* tf error */ >>>>> >>>>> @@ -897,6 +899,7 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) >>>>> if (force_reset) >>>>> link->eh_info.action |= ATA_EH_RESET; >>>>> ata_link_abort(link); >>> >>> Really need to add lockdep annotations in libata to avoid/catch such bugs... >>> Will work on that. >> >> Actually in libata there are many places calling ata_link_abort() not >> holding port lock. And some places are holding the real host >> lock(ata_host->lock) while calling ata_link_abort(). So if you add the >> lockdep annotations, there may be too many warnings. If these are real >> issues, we should fix them first. > > libata-EH does most of its work without the port lock held because by the time > we get EH started, we are guaranteed to be idle with no commands in flight. That > is why the calls you mention look like "bugs" but are not. What about the interrupt handler such as ahci_error_intr()? I didn't see the callers hold the port lock too. Do they need the port lock?
On 12/21/22 15:35, Jason Yan wrote: > On 2022/12/21 11:59, Damien Le Moal wrote: >> On 2022/12/21 11:42, Jason Yan wrote: >>> On 2022/12/21 8:36, Damien Le Moal wrote: >>>> On 2022/12/20 23:59, John Garry wrote: >>>>> On 20/12/2022 12:53, Xingui Yang wrote: >>>>>> Grab the host lock in sas_ata_device_link_abort() before calling >>>>> >>>>> This is should be the ata port lock, right? I know that the ata >>>>> comments >>>>> say differently. >>>>> >>>>>> ata_link_abort(), as the comment in ata_link_abort() mentions. >>>>>> >>>>> >>>>> Can you please add a fixes tag? >>>>> >>>>>> Signed-off-by: Xingui Yang <yangxingui@huawei.com> >>>>> >>>>> Reviewed-by: John Garry <john.g.garry@oracle.com> >>>>> >>>>>> --- >>>>>> drivers/scsi/libsas/sas_ata.c | 3 +++ >>>>>> 1 file changed, 3 insertions(+) >>>>>> >>>>>> diff --git a/drivers/scsi/libsas/sas_ata.c >>>>>> b/drivers/scsi/libsas/sas_ata.c >>>>>> index f7439bf9cdc6..4f2017b21e6d 100644 >>>>>> --- a/drivers/scsi/libsas/sas_ata.c >>>>>> +++ b/drivers/scsi/libsas/sas_ata.c >>>>>> @@ -889,7 +889,9 @@ void sas_ata_device_link_abort(struct >>>>>> domain_device *device, bool force_reset) >>>>>> { >>>>>> struct ata_port *ap = device->sata_dev.ap; >>>>>> struct ata_link *link = &ap->link; >>>>>> + unsigned long flags; >>>>>> + spin_lock_irqsave(ap->lock, flags); >>>>>> device->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */ >>>>>> device->sata_dev.fis[3] = ATA_ABORTED; /* tf error */ >>>>>> @@ -897,6 +899,7 @@ void sas_ata_device_link_abort(struct >>>>>> domain_device *device, bool force_reset) >>>>>> if (force_reset) >>>>>> link->eh_info.action |= ATA_EH_RESET; >>>>>> ata_link_abort(link); >>>> >>>> Really need to add lockdep annotations in libata to avoid/catch such >>>> bugs... >>>> Will work on that. >>> >>> Actually in libata there are many places calling ata_link_abort() not >>> holding port lock. And some places are holding the real host >>> lock(ata_host->lock) while calling ata_link_abort(). So if you add the >>> lockdep annotations, there may be too many warnings. If these are real >>> issues, we should fix them first. >> >> libata-EH does most of its work without the port lock held because by >> the time >> we get EH started, we are guaranteed to be idle with no commands in >> flight. That >> is why the calls you mention look like "bugs" but are not. > > What about the interrupt handler such as ahci_error_intr()? I didn't see > the callers hold the port lock too. Do they need the port lock? It looks like it is missing for ahci_thunderx_irq_handler() but that one takes the host lock. Same for xgene_ahci_irq_intr(), again no port lock but host lock taken. And again for ahci_single_level_irq_intr() for the non MSI case. For modern MSI adapters, the port lock is taken in For other cases, ahci_multi_irqs_intr_hard) takes the port lock. So it looks like ahci_port_intr() needs to take the lock and some cleanups overall (the host lock should not be necessary in the command path. But nobody seems to have issues with the "bad" cases... Probably because they are not mainstream adapters. Definitely some work needed here.
On Wed, Dec 21, 2022 at 05:31:59PM +0900, Damien Le Moal wrote: > > > > What about the interrupt handler such as ahci_error_intr()? I didn't see > > the callers hold the port lock too. Do they need the port lock? > > It looks like it is missing for ahci_thunderx_irq_handler() but that one > takes the host lock. Same for xgene_ahci_irq_intr(), again no port lock > but host lock taken. And again for ahci_single_level_irq_intr() for the > non MSI case. For modern MSI adapters, the port lock is taken in > > For other cases, ahci_multi_irqs_intr_hard) takes the port lock. > > So it looks like ahci_port_intr() needs to take the lock and some > cleanups overall (the host lock should not be necessary in the command > path. But nobody seems to have issues with the "bad" cases... Probably > because they are not mainstream adapters. > > Definitely some work needed here. ahci_multi_irqs_intr_hard() takes the ap->lock before calling ahci_handle_port_interrupt(), which calls ahci_port_intr(), so I don't think there is any work needed for multi IRQ AHCI. For ahci_single_level_irq_intr() the host lock is taken before calling ahci_handle_port_intr(), so I don't see why we need any extra work for single IRQ AHCI. Remember, while the default is that: ap->lock = &host->lock; see: https://github.com/torvalds/linux/blob/v6.1/drivers/ata/libata-core.c#L5305 In case of MULTI MSI, the ap->lock is using its own lock: https://github.com/torvalds/linux/blob/v6.1/drivers/ata/libahci.c#L2460 So what is it that needs to be fixed for AHCI? I haven't looked at ahci_thunderx_irq_handler() and xgene_ahci_irq_intr() so I can't speak for these. Kind regards, Niklas
On 2022/12/21 18:44, Niklas Cassel wrote: > On Wed, Dec 21, 2022 at 05:31:59PM +0900, Damien Le Moal wrote: >>> >>> What about the interrupt handler such as ahci_error_intr()? I didn't see >>> the callers hold the port lock too. Do they need the port lock? >> >> It looks like it is missing for ahci_thunderx_irq_handler() but that one >> takes the host lock. Same for xgene_ahci_irq_intr(), again no port lock >> but host lock taken. And again for ahci_single_level_irq_intr() for the >> non MSI case. For modern MSI adapters, the port lock is taken in >> >> For other cases, ahci_multi_irqs_intr_hard) takes the port lock. >> >> So it looks like ahci_port_intr() needs to take the lock and some >> cleanups overall (the host lock should not be necessary in the command >> path. But nobody seems to have issues with the "bad" cases... Probably >> because they are not mainstream adapters. >> >> Definitely some work needed here. > > ahci_multi_irqs_intr_hard() takes the ap->lock before calling > ahci_handle_port_interrupt(), which calls ahci_port_intr(), > so I don't think there is any work needed for multi IRQ AHCI. Yes. I tried to say that above. > > For ahci_single_level_irq_intr() the host lock is taken before > calling ahci_handle_port_intr(), so I don't see why we need any > extra work for single IRQ AHCI. > > > Remember, while the default is that: > ap->lock = &host->lock; Ah ! Yes ! good point ! So there are no issues anywhere. This is only confusing because of the comments I think. > see: > https://github.com/torvalds/linux/blob/v6.1/drivers/ata/libata-core.c#L5305 > > In case of MULTI MSI, the ap->lock is using its own lock: > https://github.com/torvalds/linux/blob/v6.1/drivers/ata/libahci.c#L2460 > > > So what is it that needs to be fixed for AHCI? > > I haven't looked at ahci_thunderx_irq_handler() and xgene_ahci_irq_intr() > so I can't speak for these. These are not multi-irq and use the &host->lock directly, which is the same as taking the ap->lock. We could clean that up for consistency and always use ap->lock. But not a big deal. No bugs, just a little confusing with a simple reading of the code. > > > Kind regards, > Niklas
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c index f7439bf9cdc6..4f2017b21e6d 100644 --- a/drivers/scsi/libsas/sas_ata.c +++ b/drivers/scsi/libsas/sas_ata.c @@ -889,7 +889,9 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) { struct ata_port *ap = device->sata_dev.ap; struct ata_link *link = &ap->link; + unsigned long flags; + spin_lock_irqsave(ap->lock, flags); device->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */ device->sata_dev.fis[3] = ATA_ABORTED; /* tf error */ @@ -897,6 +899,7 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset) if (force_reset) link->eh_info.action |= ATA_EH_RESET; ata_link_abort(link); + spin_unlock_irqrestore(ap->lock, flags); } EXPORT_SYMBOL_GPL(sas_ata_device_link_abort);
Grab the host lock in sas_ata_device_link_abort() before calling ata_link_abort(), as the comment in ata_link_abort() mentions. Signed-off-by: Xingui Yang <yangxingui@huawei.com> --- drivers/scsi/libsas/sas_ata.c | 3 +++ 1 file changed, 3 insertions(+)