iommu/arm-smmu: Report USF more clearly

Message ID 2762ffd4c196dc91d62e10eb8b753f256ea9b629.1568375317.git.robin.murphy@arm.com
State New

Commit Message

Robin Murphy Sept. 13, 2019, 11:48 a.m. UTC
Although CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT is a welcome tool
for smoking out inadequate firmware, the failure mode is non-obvious
and can be confusing for end users. Add some special-case reporting of
Unidentified Stream Faults to help clarify this particular symptom.

CC: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu.c | 5 +++++
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 7 insertions(+)

Comments

Robin Murphy Sept. 13, 2019, 2:34 p.m. UTC | #1
On 13/09/2019 12:48, Robin Murphy wrote:
> Although CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT is a welcome tool
> for smoking out inadequate firmware, the failure mode is non-obvious
> and can be confusing for end users. Add some special-case reporting of
> Unidentified Stream Faults to help clarify this particular symptom.
> 
> CC: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>   drivers/iommu/arm-smmu.c | 5 +++++
>   drivers/iommu/arm-smmu.h | 2 ++
>   2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index b7cf24402a94..76ac8c180695 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -499,6 +499,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>   	dev_err_ratelimited(smmu->dev,
>   		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
>   		gfsr, gfsynr0, gfsynr1, gfsynr2);
> +	if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> +	    (gfsr & sGFSR_USF))
> +		dev_err_ratelimited(smmu->dev,
> +			"Stream ID %hu may not be described by firmware, try booting with \"arm-smmu.disable_bypass=0\"\n",
> +			(u16)gfsynr1);
>   
>   	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr);
>   	return IRQ_HANDLED;
> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> index c9c13b5785f2..46f7e161e83e 100644
> --- a/drivers/iommu/arm-smmu.h
> +++ b/drivers/iommu/arm-smmu.h
> @@ -79,6 +79,8 @@
>   #define ID7_MINOR			GENMASK(3, 0)
>   
>   #define ARM_SMMU_GR0_sGFSR		0x48
> +#define sGFSR_USF			BIT(2)

Sigh... and of course what I actually meant here was that this is the 
2nd bit, which is bit 1, which is also 2. I blame Friday :(

Robin.

> +
>   #define ARM_SMMU_GR0_sGFSYNR0		0x50
>   #define ARM_SMMU_GR0_sGFSYNR1		0x54
>   #define ARM_SMMU_GR0_sGFSYNR2		0x58
>
Qian Cai Sept. 13, 2019, 2:35 p.m. UTC | #2
On Fri, 2019-09-13 at 12:48 +0100, Robin Murphy wrote:
> Although CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT is a welcome tool
> for smoking out inadequate firmware, the failure mode is non-obvious
> and can be confusing for end users. Add some special-case reporting of
> Unidentified Stream Faults to help clarify this particular symptom.
> 
> CC: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 5 +++++
>  drivers/iommu/arm-smmu.h | 2 ++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index b7cf24402a94..76ac8c180695 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -499,6 +499,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>  	dev_err_ratelimited(smmu->dev,
>  		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
>  		gfsr, gfsynr0, gfsynr1, gfsynr2);
> +	if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> +	    (gfsr & sGFSR_USF))
> +		dev_err_ratelimited(smmu->dev,
> +			"Stream ID %hu may not be described by firmware, try booting with \"arm-smmu.disable_bypass=0\"\n",
> +			(u16)gfsynr1);

dev_err_once(), i.e., don't need to remind people to set
"arm-smmu.disable_bypass=0" multiple times.

>  
>  	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr);
>  	return IRQ_HANDLED;
> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> index c9c13b5785f2..46f7e161e83e 100644
> --- a/drivers/iommu/arm-smmu.h
> +++ b/drivers/iommu/arm-smmu.h
> @@ -79,6 +79,8 @@
>  #define ID7_MINOR			GENMASK(3, 0)
>  
>  #define ARM_SMMU_GR0_sGFSR		0x48
> +#define sGFSR_USF			BIT(2)
> +
>  #define ARM_SMMU_GR0_sGFSYNR0		0x50
>  #define ARM_SMMU_GR0_sGFSYNR1		0x54
>  #define ARM_SMMU_GR0_sGFSYNR2		0x58
Robin Murphy Sept. 13, 2019, 2:43 p.m. UTC | #3
On 13/09/2019 15:35, Qian Cai wrote:
> On Fri, 2019-09-13 at 12:48 +0100, Robin Murphy wrote:
>> Although CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT is a welcome tool
>> for smoking out inadequate firmware, the failure mode is non-obvious
>> and can be confusing for end users. Add some special-case reporting of
>> Unidentified Stream Faults to help clarify this particular symptom.
>>
>> CC: Douglas Anderson <dianders@chromium.org>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/iommu/arm-smmu.c | 5 +++++
>>   drivers/iommu/arm-smmu.h | 2 ++
>>   2 files changed, 7 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index b7cf24402a94..76ac8c180695 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -499,6 +499,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>>   	dev_err_ratelimited(smmu->dev,
>>   		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
>>   		gfsr, gfsynr0, gfsynr1, gfsynr2);
>> +	if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
>> +	    (gfsr & sGFSR_USF))
>> +		dev_err_ratelimited(smmu->dev,
>> +			"Stream ID %hu may not be described by firmware, try booting with \"arm-smmu.disable_bypass=0\"\n",
>> +			(u16)gfsynr1);
> 
> dev_err_once(), i.e., don't need to remind people to set
> "arm-smmu.disable_bypass=0" multiple times.

Indeed, but in many cases it then quickly gets buried by an unending 
storm of repeated faults (not every console has capture and scrollback...)

Given that it's a "this is why your machine is on fire" kind of message, 
I figured that it's probably best to err on the side of visibility.

Robin.

>>   
>>   	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr);
>>   	return IRQ_HANDLED;
>> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
>> index c9c13b5785f2..46f7e161e83e 100644
>> --- a/drivers/iommu/arm-smmu.h
>> +++ b/drivers/iommu/arm-smmu.h
>> @@ -79,6 +79,8 @@
>>   #define ID7_MINOR			GENMASK(3, 0)
>>   
>>   #define ARM_SMMU_GR0_sGFSR		0x48
>> +#define sGFSR_USF			BIT(2)
>> +
>>   #define ARM_SMMU_GR0_sGFSYNR0		0x50
>>   #define ARM_SMMU_GR0_sGFSYNR1		0x54
>>   #define ARM_SMMU_GR0_sGFSYNR2		0x58
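
The dev_err_once() versus dev_err_ratelimited() trade-off discussed in this
exchange can be illustrated with a toy user-space model. This is only a
sketch: the structs, function names, tick units, and the interval/burst
numbers are all hypothetical, and it does not reproduce the kernel's actual
ratelimit implementation. It only shows why a print-once hint can get buried
in a fault storm while a rate-limited hint keeps resurfacing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy "print once" policy: only the first call reports true. */
struct once_model {
	bool printed;
};

static bool once_should_print(struct once_model *s)
{
	if (s->printed)
		return false;
	s->printed = true;
	return true;
}

/* Toy interval/burst limiter: at most `burst` messages per `interval` ticks. */
struct ratelimit_model {
	uint64_t interval;
	unsigned int burst;
	uint64_t window_start;
	unsigned int printed;
};

static bool ratelimit_should_print(struct ratelimit_model *rl, uint64_t now)
{
	if (now - rl->window_start >= rl->interval) {
		/* A new window begins, so the burst budget is refilled. */
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

/* Number of messages shown for `faults` faults under the print-once policy. */
static unsigned int count_once(unsigned int faults)
{
	struct once_model s = { .printed = false };
	unsigned int i, shown = 0;

	for (i = 0; i < faults; i++)
		shown += once_should_print(&s);
	return shown;
}

/* Same, for faults spaced `spacing` ticks apart under the toy rate limiter. */
static unsigned int count_ratelimited(uint64_t interval, unsigned int burst,
				      unsigned int faults, uint64_t spacing)
{
	struct ratelimit_model rl = { .interval = interval, .burst = burst };
	unsigned int i, shown = 0;

	assert(interval > 0);
	for (i = 0; i < faults; i++)
		shown += ratelimit_should_print(&rl, (uint64_t)i * spacing);
	return shown;
}
```

In this model, 1000 faults spaced 100 ticks apart with an interval of 5000
ticks and a burst of 10 give 200 visible messages spread across the whole
storm, against a single message at the very start for the print-once policy.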
Doug Anderson Sept. 13, 2019, 10:44 p.m. UTC | #4
Hi,

On Fri, Sep 13, 2019 at 4:48 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> Although CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT is a welcome tool
> for smoking out inadequate firmware, the failure mode is non-obvious
> and can be confusing for end users. Add some special-case reporting of
> Unidentified Stream Faults to help clarify this particular symptom.
>
> CC: Douglas Anderson <dianders@chromium.org>

nit that I believe that "Cc" (lowercase 2nd c) is correct.

> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 5 +++++
>  drivers/iommu/arm-smmu.h | 2 ++
>  2 files changed, 7 insertions(+)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index b7cf24402a94..76ac8c180695 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -499,6 +499,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>         dev_err_ratelimited(smmu->dev,
>                 "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
>                 gfsr, gfsynr0, gfsynr1, gfsynr2);
> +       if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> +           (gfsr & sGFSR_USF))
> +               dev_err_ratelimited(smmu->dev,
> +                       "Stream ID %hu may not be described by firmware, try booting with \"arm-smmu.disable_bypass=0\"\n",
> +                       (u16)gfsynr1);

In general it seems like a sane idea to surface an error like this.  I
guess a few nits:

1. "By firmware" might be a bit misleading.  In most cases I'm aware
of, the problem is in the device tree that was bundled together with
the kernel.  If there are actually cases where firmware has baked in a
device tree and it got this wrong then we might want to spend time
figuring out what to do about it.

2. Presumably booting with "arm-smmu.disable_bypass=0" is in most
cases the least desirable option available.  I always consider kernel
command line parameters as something of a last resort for
configuration and would only be something that an end user might do
if they were given a kernel compiled by someone else (like if someone
were taking a prebuilt Linux distro and trying to install it onto a
generic PC).  Are you seeing cases where this is happening?  If people
are compiling their own kernel I'd argue that telling them to set
"CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT" to "no" is much better
than trying to jam a command line option on.  Command line options
don't scale well.

3. Any chance you could make it more obvious that this change is
undesirable and a last resort?  AKA:

"Stream ID x blocked for security reasons; allow anyway by booting
with arm-smmu.disable_bypass=0"

-Doug
Russell King - ARM Linux admin Sept. 13, 2019, 10:59 p.m. UTC | #5
On Fri, Sep 13, 2019 at 12:48:37PM +0100, Robin Murphy wrote:
> Although CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT is a welcome tool
> for smoking out inadequate firmware, the failure mode is non-obvious
> and can be confusing for end users. Add some special-case reporting of
> Unidentified Stream Faults to help clarify this particular symptom.

Having encountered this on a board that turned up this week, it may
be better to use the hex representation of the stream ID, especially
as it seems normal for the stream ID to be made up of implementation
defined bitfields.

If we want to stick with decimal, maybe masking the stream ID with
the number of allowable bits would be a good idea, so that the
decimal value remains meaningful should other bits be non-zero?
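
The two options above (print in hex, or mask before printing in decimal) can
be sketched in user-space C as follows. This is an illustration under
assumptions, not driver code: both helper names are hypothetical, `sid_bits`
stands in for however many stream ID bits a given implementation provides,
and the 16-bit truncation simply mirrors the (u16) cast in the posted patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: keep only the low `sid_bits` bits of GFSYNR1 so
 * that a decimal rendering is not polluted by unrelated upper bits.
 */
static inline uint32_t stream_id_from_gfsynr1(uint32_t gfsynr1,
					      unsigned int sid_bits)
{
	if (sid_bits >= 32)
		return gfsynr1;
	return gfsynr1 & ((UINT32_C(1) << sid_bits) - 1);
}

/*
 * Hypothetical helper: render the low 16 bits in hex, which keeps any
 * implementation-defined bitfields inside the stream ID readable.
 * Returns the number of characters snprintf() would have written.
 */
static inline int format_stream_id(char *buf, size_t len, uint32_t gfsynr1)
{
	assert(buf != NULL);
	return snprintf(buf, len, "0x%04x",
			(unsigned int)stream_id_from_gfsynr1(gfsynr1, 16));
}
```

For example, a GFSYNR1 value of 0x00120834 masked to 16 bits gives 0x0834,
which in decimal would read as 2100 and hide the 0x08/0x34 field boundary.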

> CC: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 5 +++++
>  drivers/iommu/arm-smmu.h | 2 ++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index b7cf24402a94..76ac8c180695 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -499,6 +499,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>  	dev_err_ratelimited(smmu->dev,
>  		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
>  		gfsr, gfsynr0, gfsynr1, gfsynr2);
> +	if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> +	    (gfsr & sGFSR_USF))
> +		dev_err_ratelimited(smmu->dev,
> +			"Stream ID %hu may not be described by firmware, try booting with \"arm-smmu.disable_bypass=0\"\n",
> +			(u16)gfsynr1);
>  
>  	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr);
>  	return IRQ_HANDLED;
> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> index c9c13b5785f2..46f7e161e83e 100644
> --- a/drivers/iommu/arm-smmu.h
> +++ b/drivers/iommu/arm-smmu.h
> @@ -79,6 +79,8 @@
>  #define ID7_MINOR			GENMASK(3, 0)
>  
>  #define ARM_SMMU_GR0_sGFSR		0x48
> +#define sGFSR_USF			BIT(2)

I do wonder if this is another instance where writing "(1 << 1)"
would have resulted in less chance of a mistake being made...
wrapping stuff up into macros is not always better!

9.6.15    SMMU_sGFSR, Global Fault Status Register

The SMMU_sGFSR bit assignments are:

USF, bit[1]       Unidentified stream fault. The possible values of this
                  bit are:
                  0          No Unidentified stream fault.
                  1          Unidentified stream fault.

So this wants to be:

#define sGFSR_USF			BIT(1)
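
To make the off-by-one concrete, here is a small user-space sketch. The
BIT() definition below is a local stand-in for the kernel macro, and the
_POSTED/_FIXED names and gfsr_has_usf() are hypothetical, introduced only
for this comparison. It shows that BIT(1) is the mask 0x2 matching USF at
bit[1], while the posted BIT(2) is the mask 0x4 and would miss it.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro so this builds in user space. */
#define BIT(n)			(UINT32_C(1) << (n))

#define sGFSR_USF_POSTED	BIT(2)	/* as posted: mask 0x4, i.e. bit[2] */
#define sGFSR_USF_FIXED		BIT(1)	/* per the spec excerpt: bit[1], mask 0x2 */

/* "(1 << 1)" and BIT(1) are the same value; only the spelling differs. */
static_assert(BIT(1) == (1u << 1), "BIT(1) is mask 0x2");
static_assert(BIT(2) == (1u << 2), "BIT(2) is mask 0x4");

/* Returns nonzero if `gfsr` has any bit of `usf_mask` set. */
static inline int gfsr_has_usf(uint32_t gfsr, uint32_t usf_mask)
{
	return (gfsr & usf_mask) != 0;
}
```

With a GFSR value of 0x2 (only USF set), the check succeeds with the fixed
mask and silently fails with the posted one, so the new hint would never be
printed for a genuine Unidentified Stream Fault.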
Will Deacon Sept. 16, 2019, 6 p.m. UTC | #6
On Fri, Sep 13, 2019 at 03:44:12PM -0700, Doug Anderson wrote:
> On Fri, Sep 13, 2019 at 4:48 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > Although CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT is a welcome tool
> > for smoking out inadequate firmware, the failure mode is non-obvious
> > and can be confusing for end users. Add some special-case reporting of
> > Unidentified Stream Faults to help clarify this particular symptom.
> >
> > CC: Douglas Anderson <dianders@chromium.org>
> 
> nit that I believe that "Cc" (lowercase 2nd c) is correct.
> 
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> >  drivers/iommu/arm-smmu.c | 5 +++++
> >  drivers/iommu/arm-smmu.h | 2 ++
> >  2 files changed, 7 insertions(+)
> >
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index b7cf24402a94..76ac8c180695 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -499,6 +499,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
> >         dev_err_ratelimited(smmu->dev,
> >                 "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
> >                 gfsr, gfsynr0, gfsynr1, gfsynr2);
> > +       if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> > +           (gfsr & sGFSR_USF))
> > +               dev_err_ratelimited(smmu->dev,
> > +                       "Stream ID %hu may not be described by firmware, try booting with \"arm-smmu.disable_bypass=0\"\n",
> > +                       (u16)gfsynr1);
> 
> In general it seems like a sane idea to surface an error like this.  I
> guess a few nits:
> 
> 1. "By firmware" might be a bit misleading.  In most cases I'm aware
> of, the problem is in the device tree that was bundled together with
> the kernel.  If there are actually cases where firmware has baked in a
> device tree and it got this wrong then we might want to spend time
> figuring out what to do about it.

I thought that was usually the way UEFI systems worked, where the kernel
is updated independently of the device-tree? Either way, that should be
what we're aiming for, even if many platforms require the two to be tied
together.

> 2. Presumably booting with "arm-smmu.disable_bypass=0" is in most
> cases the least desirable option available.  I always consider kernel
> command line parameters as something of a last resort for
> configuration and would only be something that an end user might do
> if they were given a kernel compiled by someone else (like if someone
> were taking a prebuilt Linux distro and trying to install it onto a
> generic PC).  Are you seeing cases where this is happening?  If people
> are compiling their own kernel I'd argue that telling them to set
> "CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT" to "no" is much better
> than trying to jam a command line option on.  Command line options
> don't scale well.

Hmm. Recompiling seems like even more of a last resort to me!

> 3. Any chance you could make it more obvious that this change is
> undesirable and a last resort?  AKA:
> 
> "Stream ID x blocked for security reasons; allow anyway by booting
> with arm-smmu.disable_bypass=0"

How about:

  "Blocked transaction from unknown Stream ID x; boot with
   \"arm-smmu.disable_bypass=0\" to allow these transactions, although this
   may have security implications."

Will
Doug Anderson Sept. 16, 2019, 6:19 p.m. UTC | #7
Hi,

On Mon, Sep 16, 2019 at 11:00 AM Will Deacon <will@kernel.org> wrote:
>
> On Fri, Sep 13, 2019 at 03:44:12PM -0700, Doug Anderson wrote:
> > On Fri, Sep 13, 2019 at 4:48 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > >
> > > Although CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT is a welcome tool
> > > for smoking out inadequate firmware, the failure mode is non-obvious
> > > and can be confusing for end users. Add some special-case reporting of
> > > Unidentified Stream Faults to help clarify this particular symptom.
> > >
> > > CC: Douglas Anderson <dianders@chromium.org>
> >
> > nit that I believe that "Cc" (lowercase 2nd c) is correct.
> >
> > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > ---
> > >  drivers/iommu/arm-smmu.c | 5 +++++
> > >  drivers/iommu/arm-smmu.h | 2 ++
> > >  2 files changed, 7 insertions(+)
> > >
> > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > index b7cf24402a94..76ac8c180695 100644
> > > --- a/drivers/iommu/arm-smmu.c
> > > +++ b/drivers/iommu/arm-smmu.c
> > > @@ -499,6 +499,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
> > >         dev_err_ratelimited(smmu->dev,
> > >                 "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
> > >                 gfsr, gfsynr0, gfsynr1, gfsynr2);
> > > +       if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> > > +           (gfsr & sGFSR_USF))
> > > +               dev_err_ratelimited(smmu->dev,
> > > +                       "Stream ID %hu may not be described by firmware, try booting with \"arm-smmu.disable_bypass=0\"\n",
> > > +                       (u16)gfsynr1);
> >
> > In general it seems like a sane idea to surface an error like this.  I
> > guess a few nits:
> >
> > 1. "By firmware" might be a bit misleading.  In most cases I'm aware
> > of, the problem is in the device tree that was bundled together with
> > the kernel.  If there are actually cases where firmware has baked in a
> > device tree and it got this wrong then we might want to spend time
> > figuring out what to do about it.
>
> I thought that was usually the way UEFI systems worked, where the kernel
> is updated independently of the device-tree? Either way, that should be
> what we're aiming for, even if many platforms require the two to be tied
> together.

It's my opinion that until there is a place in the kernel to "fixup"
broken device trees that were baked in firmware that it's a bad idea
to ship device trees separate from the kernel except if the device
trees are exceedingly simple.  We'll run into too many problems
otherwise, either because the kernel the device tree was written for
had downstream patches or someone just made a mistake in them and
nobody noticed.  I know device trees are supposed to be ABI, but
people make mistakes and we need a way to fix them up.

...but that's getting pretty far afield from Robin's patch.


> > 2. Presumably booting with "arm-smmu.disable_bypass=0" is in most
> > cases the least desirable option available.  I always consider kernel
> > command line parameters as something of a last resort for
> > configuration and would only be something that an end user might do
> > if they were given a kernel compiled by someone else (like if someone
> > were taking a prebuilt Linux distro and trying to install it onto a
> > generic PC).  Are you seeing cases where this is happening?  If people
> > are compiling their own kernel I'd argue that telling them to set
> > "CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT" to "no" is much better
> > than trying to jam a command line option on.  Command line options
> > don't scale well.
>
> Hmm. Recompiling seems like even more of a last resort to me!

Depends on what you're doing.  If you're not in the habit of compiling
a kernel and you're just trying to make one work then the command line
is great.  If you're trying to manage configuration for a whole bunch
of different hardware products then the command line is a terrible
place to store config.

...but I guess the summary is that we wouldn't want someone to
actually ship a kernel with this option on anyway.  ;-)


> > 3. Any chance you could make it more obvious that this change is
> > undesirable and a last resort?  AKA:
> >
> > "Stream ID x blocked for security reasons; allow anyway by booting
> > with arm-smmu.disable_bypass=0"
>
> How about:
>
>   "Blocked transaction from unknown Stream ID x; boot with
>    \"arm-smmu.disable_bypass=0\" to allow these transactions, although this
>    may have security implications."

Fine with me if it's not too long for an error message.

-Doug
Robin Murphy Sept. 16, 2019, 9:42 p.m. UTC | #8
On 2019-09-16 7:19 pm, Doug Anderson wrote:
[...]
>>> 1. "By firmware" might be a bit misleading.  In most cases I'm aware
>>> of, the problem is in the device tree that was bundled together with
>>> the kernel.  If there are actually cases where firmware has baked in a
>>> device tree and it got this wrong then we might want to spend time
>>> figuring out what to do about it.
>>
>> I thought that was usually the way UEFI systems worked, where the kernel
>> is updated independently of the device-tree? Either way, that should be
>> what we're aiming for, even if many platforms require the two to be tied
>> together.
> 
> It's my opinion that until there is a place in the kernel to "fixup"
> broken device trees that were baked in firmware that it's a bad idea
> to ship device trees separate from the kernel except if the device
> trees are exceedingly simple.  We'll run into too many problems
> otherwise, either because the kernel the device tree was written for
> had downstream patches or someone just made a mistake in them and
> nobody noticed.  I know device trees are supposed to be ABI, but
> people make mistakes and we need a way to fix them up.
> 
> ...but that's getting pretty far afield from Robin's patch.

Let's not get too hung up on devicetree - you can go out and buy certain 
ACPI-only platforms today that also fall foul of this, for which AFAIK 
the necessary firmware update is in the SoC vendor's hands.

>>> 2. Presumably booting with "arm-smmu.disable_bypass=0" is in most
>>> cases the least desirable option available.  I always consider kernel
>>> command line parameters as something of a last resort for
>>> configuration and would only be something that an end user might do
>>> if they were given a kernel compiled by someone else (like if someone
>>> were taking a prebuilt Linux distro and trying to install it onto a
>>> generic PC).  Are you seeing cases where this is happening?  If people
>>> are compiling their own kernel I'd argue that telling them to set
>>> "CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT" to "no" is much better
>>> than trying to jam a command line option on.  Command line options
>>> don't scale well.
>>
>> Hmm. Recompiling seems like even more of a last resort to me!
> 
> Depends on what you're doing.  If you're not in the habit of compiling
> a kernel and you're just trying to make one work then the command line
> is great.  If you're trying to manage configuration for a whole bunch
> of different hardware products then the command line is a terrible
> place to store config.
> 
> ...but I guess the summary is that we wouldn't want someone to
> actually ship a kernel with this option on anyway.  ;-)

FWIW the meta here is really "oops, you've just installed a new kernel 
and now your machine is unusable - you need to take it up with whoever 
supports your platform, but in the meantime this is the minimal thing 
you can do to get things back working as before."

Personally I'm less concerned about folks maintaining "hardware 
products", as I'd like to assume they would hit this in QA and have a 
relatively short loop back to kernel people who know what's up (or at 
least know enough to join the dots to punt it to my inbox). My main 
concern is end users of SBSA-ish platforms who are free to pick and 
choose distros - and/or kernel packages within their distro - and thus 
may bugger up their machine inadvertently if the distro package happens 
to have picked this option up from defconfig (from a quick look at least 
my preferred one has).

>>> 3. Any chance you could make it more obvious that this change is
>>> undesirable and a last resort?  AKA:
>>>
>>> "Stream ID x blocked for security reasons; allow anyway by booting
>>> with arm-smmu.disable_bypass=0"
>>
>> How about:
>>
>>    "Blocked transaction from unknown Stream ID x; boot with
>>     \"arm-smmu.disable_bypass=0\" to allow these transactions, although this
>>     may have security implications."
> 
> Fine with me if it's not too long for an error message.

Sounds good, I'll respin with a slight abbreviation of that (and minus 
the embarrassingly stupid thinko) tomorrow.

Cheers,
Robin.

Patch

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index b7cf24402a94..76ac8c180695 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -499,6 +499,11 @@  static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
 	dev_err_ratelimited(smmu->dev,
 		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
 		gfsr, gfsynr0, gfsynr1, gfsynr2);
+	if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
+	    (gfsr & sGFSR_USF))
+		dev_err_ratelimited(smmu->dev,
+			"Stream ID %hu may not be described by firmware, try booting with \"arm-smmu.disable_bypass=0\"\n",
+			(u16)gfsynr1);
 
 	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr);
 	return IRQ_HANDLED;
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index c9c13b5785f2..46f7e161e83e 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -79,6 +79,8 @@ 
 #define ID7_MINOR			GENMASK(3, 0)
 
 #define ARM_SMMU_GR0_sGFSR		0x48
+#define sGFSR_USF			BIT(2)
+
 #define ARM_SMMU_GR0_sGFSYNR0		0x50
 #define ARM_SMMU_GR0_sGFSYNR1		0x54
 #define ARM_SMMU_GR0_sGFSYNR2		0x58