iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully

Message ID	39d54e49c8476efc4653e352150d44b185d6d50f.1744380554.git.robin.murphy@arm.com (mailing list archive)
State	New
Headers	show Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org> From: Robin Murphy <robin.murphy@arm.com> To: joro@8bytes.org, will@kernel.org Cc: iommu@lists.linux.dev, linux-arm-kernel@lists.infradead.org, jgg@nvidia.com, nicolinc@nvidia.com Subject: [PATCH] iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully Date: Fri, 11 Apr 2025 15:09:14 +0100 Message-Id: <39d54e49c8476efc4653e352150d44b185d6d50f.1744380554.git.robin.murphy@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org
Series	iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully \| expand iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully

Robin Murphy April 11, 2025, 2:09 p.m. UTC

We've never supported StreamID aliasing between devices, and as such
they will never have had functioning DMA, but this is not fatal to the
SMMU itself. Although aliasing between hard-wired platform device
StreamIDs would tend to raise questions about the whole system, in
practice it's far more likely to occur relatively innocently due to
legacy PCI bridges, where the underlying StreamID mappings are still
perfectly reasonable.

As such, return a more benign -ENODEV when failing probe for such an
unsupported device (and log a more obvious error message), so that it
doesn't break the entire SMMU probe now that bus_iommu_probe() runs in
the right order and can propagate that error back. The end result is
still that the device doesn't get an IOMMU group and probably won't
work, same as before.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Jason Gunthorpe April 11, 2025, 3:21 p.m. UTC | #1

On Fri, Apr 11, 2025 at 03:09:14PM +0100, Robin Murphy wrote:
> We've never supported StreamID aliasing between devices, and as such
> they will never have had functioning DMA, but this is not fatal to the
> SMMU itself. Although aliasing between hard-wired platform device
> StreamIDs would tend to raise questions about the whole system, in
> practice it's far more likely to occur relatively innocently due to
> legacy PCI bridges, where the underlying StreamID mappings are still
> perfectly reasonable.
> 
> As such, return a more benign -ENODEV when failing probe for such an
> unsupported device (and log a more obvious error message), so that it
> doesn't break the entire SMMU probe now that bus_iommu_probe() runs in
> the right order and can propagate that error back. The end result is
> still that the device doesn't get an IOMMU group and probably won't
> work, same as before.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index b4c21aaed126..c06459f7077b 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3400,9 +3400,9 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>  		/* Insert into SID tree */
>  		if (rb_find_add(&new_stream->node, &smmu->streams,
>  				arm_smmu_streams_cmp_node)) {
> -			dev_warn(master->dev, "stream %u already in tree\n",
> +			dev_warn(master->dev, "Aliasing StreamID 0x%x unsupported, expect DMA to be broken\n",
>  				 sid);
> -			ret = -EINVAL;
> +			ret = -ENODEV;
>  			break;

I don't think we should dillute the meaning of ENODEV here. It is
supposed to mean that the driver does not recognize the device and
this IOMMU instance doesn't translate for it.

That is different from the iommu instance owning the device but being
unable to probe it for some reason. The failure recovery in the core
code should be different.

IMHO it makes more sense to change this and related:

static int bus_iommu_probe(const struct bus_type *bus)
{
	struct iommu_group *group, *next;
	LIST_HEAD(group_list);
	int ret;

	ret = bus_for_each_dev(bus, NULL, &group_list, probe_iommu_group);
	if (ret)
		return ret;

So that recognized, but failed, devices are contained to only effect
that device and do not fail the entire iommu or bus registration,
which is more like how things used to work when the probe path would
ignore the per-driver failure codes.

I can't think of a reason why we'd want any per-device failure to also
abort the whole iommu registration??

It would be nice if the dev->iommu would record that the struct device
is inoperable and then we can fail
iommu_device_use_default_domain()/etc in a contained way.

Jason

Pranjal Shrivastava April 11, 2025, 3:50 p.m. UTC | #2

On Fri, Apr 11, 2025 at 12:21:22PM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 11, 2025 at 03:09:14PM +0100, Robin Murphy wrote:
> > We've never supported StreamID aliasing between devices, and as such
> > they will never have had functioning DMA, but this is not fatal to the
> > SMMU itself. Although aliasing between hard-wired platform device
> > StreamIDs would tend to raise questions about the whole system, in
> > practice it's far more likely to occur relatively innocently due to
> > legacy PCI bridges, where the underlying StreamID mappings are still
> > perfectly reasonable.
> > 
> > As such, return a more benign -ENODEV when failing probe for such an
> > unsupported device (and log a more obvious error message), so that it
> > doesn't break the entire SMMU probe now that bus_iommu_probe() runs in
> > the right order and can propagate that error back. The end result is
> > still that the device doesn't get an IOMMU group and probably won't
> > work, same as before.
> > 
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index b4c21aaed126..c06459f7077b 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -3400,9 +3400,9 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> >  		/* Insert into SID tree */
> >  		if (rb_find_add(&new_stream->node, &smmu->streams,
> >  				arm_smmu_streams_cmp_node)) {
> > -			dev_warn(master->dev, "stream %u already in tree\n",
> > +			dev_warn(master->dev, "Aliasing StreamID 0x%x unsupported, expect DMA to be broken\n",
> >  				 sid);
> > -			ret = -EINVAL;
> > +			ret = -ENODEV;
> >  			break;
> 
> I don't think we should dillute the meaning of ENODEV here. It is
> supposed to mean that the driver does not recognize the device and
> this IOMMU instance doesn't translate for it.
> 
> That is different from the iommu instance owning the device but being
> unable to probe it for some reason. The failure recovery in the core
> code should be different.
> 
> IMHO it makes more sense to change this and related:
> 
> static int bus_iommu_probe(const struct bus_type *bus)
> {
> 	struct iommu_group *group, *next;
> 	LIST_HEAD(group_list);
> 	int ret;
> 
> 	ret = bus_for_each_dev(bus, NULL, &group_list, probe_iommu_group);
> 	if (ret)
> 		return ret;
> 
> So that recognized, but failed, devices are contained to only effect
> that device and do not fail the entire iommu or bus registration,
> which is more like how things used to work when the probe path would
> ignore the per-driver failure codes.
> 
> I can't think of a reason why we'd want any per-device failure to also
> abort the whole iommu registration??

I don't think this change would abort the iommu registration?
The probe_iommu_group would simply ignore the -ENODEV. Or are you
talking about a case where we don't return -ENODEV?

>
> It would be nice if the dev->iommu would record that the struct device
> is inoperable and then we can fail
> iommu_device_use_default_domain()/etc in a contained way.

Ohh okay.. you mean the dev->iommu_group == NULL may give an illusion to
<bus>_dma_configure that DMA is allowed? Even in that case the iommu 
would block the DMA right? And upon inspection of dmesg it would be
clear that something went wrong with the probe?


> 
> Jason
> 

Thanks,
Praan

Jason Gunthorpe April 11, 2025, 3:58 p.m. UTC | #3

On Fri, Apr 11, 2025 at 03:50:10PM +0000, Pranjal Shrivastava wrote:

> > I can't think of a reason why we'd want any per-device failure to also
> > abort the whole iommu registration??
> 
> I don't think this change would abort the iommu registration?

Right, this patch causes iommu registration to succeed because the
only the ENODEV is ignored.

> The probe_iommu_group would simply ignore the -ENODEV. Or are you
> talking about a case where we don't return -ENODEV?

Yes, I mean the current situation the probe returns -EINVAL because
the SMMUv3 driver knows it controls the translation but cannot startup
the device due to the SID conflict.

> > It would be nice if the dev->iommu would record that the struct device
> > is inoperable and then we can fail
> > iommu_device_use_default_domain()/etc in a contained way.
> 
> Ohh okay.. you mean the dev->iommu_group == NULL may give an illusion to
> <bus>_dma_configure that DMA is allowed? 

Right, IMHO it is the different between ENODEV and Exxx. ENODEV
(ideally) means the iommu instance doesn't translate that device so it
has no information to add. EINVAL means the iommu instance does
translate and then we can be pretty sure DMA won't work since the
default STE is abort.

Thus, IMHO, the core code should ignore ENODEV and assume if no iommu
claims the device that it is identity. This is what we do today.

However EINVAL should disable future device drivers probing on the
broken struct device because most likely the DMA API won't work at
all.

> Even in that case the iommu would block the DMA right? And upon
> inspection of dmesg it would be clear that something went wrong with
> the probe?

Yes and yes. A better chance of the system booting if the iommu is
left registered and running with only some devices broken/disabled,
which is what was going on in v6.14

Jason

Robin Murphy April 11, 2025, 6:16 p.m. UTC | #4

On 11/04/2025 4:21 pm, Jason Gunthorpe wrote:
> On Fri, Apr 11, 2025 at 03:09:14PM +0100, Robin Murphy wrote:
>> We've never supported StreamID aliasing between devices, and as such
>> they will never have had functioning DMA, but this is not fatal to the
>> SMMU itself. Although aliasing between hard-wired platform device
>> StreamIDs would tend to raise questions about the whole system, in
>> practice it's far more likely to occur relatively innocently due to
>> legacy PCI bridges, where the underlying StreamID mappings are still
>> perfectly reasonable.
>>
>> As such, return a more benign -ENODEV when failing probe for such an
>> unsupported device (and log a more obvious error message), so that it
>> doesn't break the entire SMMU probe now that bus_iommu_probe() runs in
>> the right order and can propagate that error back. The end result is
>> still that the device doesn't get an IOMMU group and probably won't
>> work, same as before.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index b4c21aaed126..c06459f7077b 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -3400,9 +3400,9 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>>   		/* Insert into SID tree */
>>   		if (rb_find_add(&new_stream->node, &smmu->streams,
>>   				arm_smmu_streams_cmp_node)) {
>> -			dev_warn(master->dev, "stream %u already in tree\n",
>> +			dev_warn(master->dev, "Aliasing StreamID 0x%x unsupported, expect DMA to be broken\n",
>>   				 sid);
>> -			ret = -EINVAL;
>> +			ret = -ENODEV;
>>   			break;
> 
> I don't think we should dillute the meaning of ENODEV here. It is
> supposed to mean that the driver does not recognize the device and
> this IOMMU instance doesn't translate for it.

There are already numerous other examples, nothing's being diluted more 
than it already is. TBH I don't see it being particularly valuable 
anyway to try to infer anything more from ENODEV than "this IOMMU 
instance is not providing IOMMU API functionality for this device" - 
there are just as many cases where a driver doesn't even know whether 
it's supposed to be responsible for a given device at a given time, so 
in general it's never going to be possible to be exhaustively accurate. 
That then only leaves a pretty narrow range of conditions where some 
drivers might be able to tell when something's a bit wrong, but not 
terminally so.

> That is different from the iommu instance owning the device but being
> unable to probe it for some reason. The failure recovery in the core
> code should be different.

Perhaps, but right now this is a trivial, obvious fix using the 
mechanisms that we already have and that other drivers are already using 
similarly, that can effectively restore the exact same overall behaviour 
as 6.14 with zero risk.

Nothing in this two-line patch precludes coming back and doing more 
later. In fact in hindsight I wish I'd thought about this a bit more, 
realised it was so simple, and already sent this very patch back with 
the other SMMU-probe-failure fixes back in December, because guess how I 
was triggering those failures? :) Annoyingly I chose to kick the can 
down the road because I know I'm deliberately using a quirky unsupported 
hardware setup...

> IMHO it makes more sense to change this and related:
> 
> static int bus_iommu_probe(const struct bus_type *bus)
> {
> 	struct iommu_group *group, *next;
> 	LIST_HEAD(group_list);
> 	int ret;
> 
> 	ret = bus_for_each_dev(bus, NULL, &group_list, probe_iommu_group);
> 	if (ret)
> 		return ret;
> 
> So that recognized, but failed, devices are contained to only effect
> that device and do not fail the entire iommu or bus registration,
> which is more like how things used to work when the probe path would
> ignore the per-driver failure codes.
> 
> I can't think of a reason why we'd want any per-device failure to also
> abort the whole iommu registration??

Errors which occur during .probe_device clearly don't have to be 
specific to the device - e.g. OOM - but even device-adjacent conditions 
may be sufficiently catastrophic to indicate that it's probably not 
worth continuing - for instance if the arm_smmu_sid_in_range() check 
were to fail then it's clear firmware is giving us bad data, at which 
point we can't necessarily trust *anything* it's said about the SMMU at all.

> It would be nice if the dev->iommu would record that the struct device
> is inoperable and then we can fail
> iommu_device_use_default_domain()/etc in a contained way.

Maybe, but what would the nature of that containment be? I mean, after 2 
years even the blocking domain stuff we already have still isn't any 
closer to doing anything "nice". And if an IOMMU driver has already 
reported some unexpected condition with a device that it can't handle, 
what can the core code actually expect to do? It's not like it can ask 
the same IOMMU driver to then block the device that it couldn't probe. I 
don't see that there's really much the core code can do to "recover" 
from unexpected driver errors, other than leave the user to pick up the 
pieces, and often the best chance for that *is* going to be to give up 
entirely, fail and disable the IOMMU, because otherwise blocking I/O 
might prevent the user seeing anything at all.

And for any remaining niche cases of "expected" errors like this one 
where the driver could in principle support the thing but doesn't due to 
its own limitation, honestly don't spend effort on gold-plating that 
limitation in core code, just spend it on making the driver support the 
thing instead.

Thanks,
Robin.

Jason Gunthorpe April 11, 2025, 6:46 p.m. UTC | #5

On Fri, Apr 11, 2025 at 07:16:57PM +0100, Robin Murphy wrote:
> > That is different from the iommu instance owning the device but being
> > unable to probe it for some reason. The failure recovery in the core
> > code should be different.
> 
> Perhaps, but right now this is a trivial, obvious fix using the mechanisms
> that we already have and that other drivers are already using similarly,
> that can effectively restore the exact same overall behaviour as 6.14 with
> zero risk.

Yes, but so would ignoring the error within bus_iommu_probe, which is
close to what used to be happening. I'm a little worried this will
turn into a wack a mole..

If we keep it this way it further muddies what the driver is supposed
to be doing with it's error codes. How should a driver author decide
which one to use?

> Nothing in this two-line patch precludes coming back and doing more later.

Yes

> triggering those failures? :) Annoyingly I chose to kick the can down the
> road because I know I'm deliberately using a quirky unsupported hardware
> setup...

Oh interesting, I'm actually quite surprised that very new chips are
still pretending to be PCI bridges. :\

> Errors which occur during .probe_device clearly don't have to be specific to
> the device - e.g. OOM - but even device-adjacent conditions may be
> sufficiently catastrophic to indicate that it's probably not worth
> continuing - for instance if the arm_smmu_sid_in_range() check were to fail
> then it's clear firmware is giving us bad data, at which point we can't
> necessarily trust *anything* it's said about the SMMU at all.

Maybe, but also, maybe not. The more optimistic view is that the FW is
generally correct and tested by someone and mistakes are narrow to a
single device. That is my general experiance with server FW at least..

Plug in some weird thing, the FW freaks out and does something wrong,
for that thing, but everything else remains fine.

> > It would be nice if the dev->iommu would record that the struct device
> > is inoperable and then we can fail
> > iommu_device_use_default_domain()/etc in a contained way.
> 
> Maybe, but what would the nature of that containment be? 

Just block the end driver from binding so we don't compound the
problems. Not much you can do on the iommu side. For PCI devices
without a driver bind they won't enable bus master so they should be
contained at that point.

> other than leave the user to pick up the pieces, and often the best chance
> for that *is* going to be to give up entirely, fail and disable the IOMMU,
> because otherwise blocking I/O might prevent the user seeing anything at
> all.

Maybe, or the untested unregister iommu driver path will crash :)

Or the fault is localized to the device and we will get a better
outcome by just not loading that driver.

In the end we are guessing, I'm just looking at this from the POV that
for a long time we ignored errors per-device and now we are failing
the whole iommu startup. That is a big change in v6.15, and it doesn't
seem really necessary.

> And for any remaining niche cases of "expected" errors like this one where
> the driver could in principle support the thing but doesn't due to its own
> limitation, honestly don't spend effort on gold-plating that limitation in
> core code, just spend it on making the driver support the thing instead.

Yes, Nicolin will try a v2

So, go ahead with this if you think it is best, but I do prefer
ignoring all errors. If we get a 2nd similar issue lets agree to
change bus_iommu_probe()/etc to ignore probe error codes rather than
wack-a-mole ENODEV across many drivers.

Jason

iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully

Commit Message

Comments

Patch