diff mbox series

iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne

Message ID 20241001180346.1485194-1-yang@os.amperecomputing.com (mailing list archive)
State New, archived
Headers show
Series iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne | expand

Commit Message

Yang Shi Oct. 1, 2024, 6:03 p.m. UTC
The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
is 32 bit value.
However AmpereOne has 32-bit stream id size.  This resulted in
ouf-of-bound shift.  The disassembly of shift is:

    ldr     w2, [x19, 828]  //, smmu_7(D)->sid_bits
    mov     w20, 1
    lsl     w20, w20, w2

According to ARM spec, if the registers are 32 bit, the instruction actually
does:
    dest = src << (shift % 32)

So it actually shifted by zero bit.

This caused v6.12-rc1 failed to boot on AmpereOne and UBSAN also
reported:

UBSAN: shift-out-of-bounds in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3628:29
shift exponent 32 is too large for 32-bit type 'int'
CPU: 70 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc1 #4
Hardware name: ZOLLNER SUNMOONLAKE/SunMoon Lake, BIOS 00.00. 2024-08-28 18:42:45 08/28/2024
Call trace:
 dump_backtrace+0xdc/0x140
 show_stack+0x20/0x40
 dump_stack_lvl+0x60/0x80
 dump_stack+0x18/0x28
 ubsan_epilogue+0x10/0x48
 __ubsan_handle_shift_out_of_bounds+0xd8/0x1a0
 arm_smmu_init_structures+0x374/0x3c8
 arm_smmu_device_probe+0x208/0x600
 platform_probe+0x70/0xe8
 really_probe+0xc8/0x3a0
 __driver_probe_device+0x84/0x160
 driver_probe_device+0x44/0x130
 __driver_attach+0xcc/0x208
 bus_for_each_dev+0x84/0x100
 driver_attach+0x2c/0x40
 bus_add_driver+0x158/0x290
 driver_register+0x70/0x138
 __platform_driver_register+0x2c/0x40
 arm_smmu_driver_init+0x28/0x40
 do_one_initcall+0x60/0x318
 do_initcalls+0x198/0x1e0
 kernel_init_freeable+0x18c/0x1e8
 kernel_init+0x28/0x160
 ret_from_fork+0x10/0x20

Using 64 bit immediate when doing shift can solve the problem.  The
disssembly after the fix looks like:
    ldr     w20, [x19, 828] //, smmu_7(D)->sid_bits
    mov     x0, 1
    lsl     x0, x0, x20

Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Nicolin Chen Oct. 1, 2024, 6:27 p.m. UTC | #1
On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
> Using 64 bit immediate when doing shift can solve the problem.  The
> disssembly after the fix looks like:

[...]

>         unsigned int last_sid_idx =
> -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);

Could a 32-bit build be a corner case where UL is no longer a
"64 bit" stated in the commit message?

Also, there are other places doing "1 << smmu->sid_bits", e.g.
arm_smmu_init_strtab_linear().

Then, can ssid_bits/s1cdmax be a concern similarly?

Thanks
Nicolin
Yang Shi Oct. 1, 2024, 7:09 p.m. UTC | #2
On 10/1/24 11:27 AM, Nicolin Chen wrote:
> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>> Using 64 bit immediate when doing shift can solve the problem.  The
>> disssembly after the fix looks like:
> [...]
>
>>          unsigned int last_sid_idx =
>> -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>> +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
> Could a 32-bit build be a corner case where UL is no longer a
> "64 bit" stated in the commit message?

It shouldn't. Because smmu v3 depends on ARM64.

config ARM_SMMU_V3
         tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
         depends on ARM64

>
> Also, there are other places doing "1 << smmu->sid_bits", e.g.
> arm_smmu_init_strtab_linear().

The disassembly shows it uses "sbfiz   x21, x20, 6, 32" instead of lsl. 
1UL should be used if we want to take extra caution and don't prefer 
rely on compiler.

>
> Then, can ssid_bits/s1cdmax be a concern similarly?

IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So 
it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).

>
> Thanks
> Nicolin
Jason Gunthorpe Oct. 1, 2024, 7:18 p.m. UTC | #3
On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
> > Also, there are other places doing "1 << smmu->sid_bits", e.g.
> > arm_smmu_init_strtab_linear().
> 
> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
> should be used if we want to take extra caution and don't prefer rely on
> compiler.

Still, we should be fixing them all if sid_bits == 32, all those
shifts should be throwing a UBSAN error. It would be crazy to have a
32 bit linear table but I guess it is allowed?

Jason
Nicolin Chen Oct. 1, 2024, 7:29 p.m. UTC | #4
On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
> On 10/1/24 11:27 AM, Nicolin Chen wrote:
> > On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
> > > Using 64 bit immediate when doing shift can solve the problem.  The
> > > disssembly after the fix looks like:
> > [...]
> > 
> > >          unsigned int last_sid_idx =
> > > -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> > > +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
> > Could a 32-bit build be a corner case where UL is no longer a
> > "64 bit" stated in the commit message?
> 
> It shouldn't. Because smmu v3 depends on ARM64.
> 
> config ARM_SMMU_V3
>         tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>         depends on ARM64

ARM64 can have aarch32 support. I am not sure if ARM64 running a
32-bit OS can be a case though, (and not confined to AmpereOne).

> > Then, can ssid_bits/s1cdmax be a concern similarly?
> 
> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So
> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).

Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
this moment, so we should be safe.

Thanks
Nicolin
Yang Shi Oct. 1, 2024, 7:38 p.m. UTC | #5
On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>> arm_smmu_init_strtab_linear().
>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>> should be used if we want to take extra caution and don't prefer rely on
>> compiler.
> Still, we should be fixing them all if sid_bits == 32, all those
> shifts should be throwing a UBSAN error. It would be crazy to have a

OK, will cover this is v2.

> 32 bit linear table but I guess it is allowed?

I'm not smmu expert, but if sid_bits is 32, it looks like the linear 
table will consume 256GB IIUC? That is crazy.

>
> Jason
Jason Gunthorpe Oct. 1, 2024, 7:46 p.m. UTC | #6
On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
> 
> 
> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
> > On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
> > > > Also, there are other places doing "1 << smmu->sid_bits", e.g.
> > > > arm_smmu_init_strtab_linear().
> > > The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
> > > should be used if we want to take extra caution and don't prefer rely on
> > > compiler.
> > Still, we should be fixing them all if sid_bits == 32, all those
> > shifts should be throwing a UBSAN error. It would be crazy to have a
> 
> OK, will cover this is v2.

Maybe just make a little inline function to do this math and remove
the repated open coding? Then the types can be right, etc.

Jason
Yang Shi Oct. 1, 2024, 7:48 p.m. UTC | #7
On 10/1/24 12:29 PM, Nicolin Chen wrote:
> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>> On 10/1/24 11:27 AM, Nicolin Chen wrote:
>>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>>>> Using 64 bit immediate when doing shift can solve the problem.  The
>>>> disssembly after the fix looks like:
>>> [...]
>>>
>>>>           unsigned int last_sid_idx =
>>>> -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>>>> +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>> Could a 32-bit build be a corner case where UL is no longer a
>>> "64 bit" stated in the commit message?
>> It shouldn't. Because smmu v3 depends on ARM64.
>>
>> config ARM_SMMU_V3
>>          tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>          depends on ARM64
> ARM64 can have aarch32 support. I am not sure if ARM64 running a
> 32-bit OS can be a case though, (and not confined to AmpereOne).

I don't think ARM64 runs 32-bit kernel, at least for newer kernel.

>
>>> Then, can ssid_bits/s1cdmax be a concern similarly?
>> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So
>> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).
> Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
> this moment, so we should be safe.
>
> Thanks
> Nicolin
Yang Shi Oct. 1, 2024, 8:31 p.m. UTC | #8
On 10/1/24 12:46 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
>>
>> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>>>> arm_smmu_init_strtab_linear().
>>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>>>> should be used if we want to take extra caution and don't prefer rely on
>>>> compiler.
>>> Still, we should be fixing them all if sid_bits == 32, all those
>>> shifts should be throwing a UBSAN error. It would be crazy to have a
>> OK, will cover this is v2.
> Maybe just make a little inline function to do this math and remove
> the repated open coding? Then the types can be right, etc.

Fine to me.

>
> Jason
Yang Shi Oct. 1, 2024, 8:47 p.m. UTC | #9
On 10/1/24 12:46 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
>>
>> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>>>> arm_smmu_init_strtab_linear().
>>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>>>> should be used if we want to take extra caution and don't prefer rely on
>>>> compiler.
>>> Still, we should be fixing them all if sid_bits == 32, all those
>>> shifts should be throwing a UBSAN error. It would be crazy to have a
>> OK, will cover this is v2.
> Maybe just make a little inline function to do this math and remove
> the repated open coding? Then the types can be right, etc.

Something like this?

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 01a2faee04bc..0f3aa7962117 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct 
arm_smmu_device *smmu)
  {
         u32 l1size;
         struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+       unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);
         unsigned int last_sid_idx =
-               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
+               arm_smmu_strtab_l1_idx(max_sid - 1);

         /* Calculate the L1 size, capped to the SIDSIZE. */
         cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
@@ -3657,8 +3658,9 @@ static int arm_smmu_init_strtab_linear(struct 
arm_smmu_device *smmu)
  {
         u32 size;
         struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+       unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);

-       size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
+       size = max_sid * sizeof(struct arm_smmu_ste);
         cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
&cfg->linear.ste_dma,
                                                 GFP_KERNEL);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1e9952ca989f..de9f14293485 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
         return sid % STRTAB_NUM_L2_STES;
  }

+static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
+{
+       return (1UL << sid_bits);
+}
+

>
> Jason
Jason Gunthorpe Oct. 1, 2024, 9:02 p.m. UTC | #10
On Tue, Oct 01, 2024 at 01:47:24PM -0700, Yang Shi wrote:
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1e9952ca989f..de9f14293485 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
>         return sid % STRTAB_NUM_L2_STES;
>  }
> 
> +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
> +{

Can we take the smmu struct here instead of the int?

But yes, this looks OK

Jason
Yang Shi Oct. 1, 2024, 9:32 p.m. UTC | #11
On 10/1/24 2:02 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 01:47:24PM -0700, Yang Shi wrote:
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 1e9952ca989f..de9f14293485 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
>>          return sid % STRTAB_NUM_L2_STES;
>>   }
>>
>> +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
>> +{
> Can we take the smmu struct here instead of the int?

No problem.

>
> But yes, this looks OK

Thank you.

>
> Jason
Robin Murphy Oct. 2, 2024, 9:59 a.m. UTC | #12
On 2024-10-01 8:48 pm, Yang Shi wrote:
> 
> 
> On 10/1/24 12:29 PM, Nicolin Chen wrote:
>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>> On 10/1/24 11:27 AM, Nicolin Chen wrote:
>>>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>>>>> Using 64 bit immediate when doing shift can solve the problem.  The
>>>>> disssembly after the fix looks like:
>>>> [...]
>>>>
>>>>>           unsigned int last_sid_idx =
>>>>> -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>>>>> +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>>> Could a 32-bit build be a corner case where UL is no longer a
>>>> "64 bit" stated in the commit message?
>>> It shouldn't. Because smmu v3 depends on ARM64.
>>>
>>> config ARM_SMMU_V3
>>>          tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>>          depends on ARM64
>> ARM64 can have aarch32 support. I am not sure if ARM64 running a
>> 32-bit OS can be a case though, (and not confined to AmpereOne).
> 
> I don't think ARM64 runs 32-bit kernel, at least for newer kernel.

Just use ULL - if the point is that it must be a 64-bit shift for 
correctness, then being clear about that intent is far more valuable 
than saving one character of source code.

Thanks,
Robin.

> 
>>
>>>> Then, can ssid_bits/s1cdmax be a concern similarly?
>>> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So
>>> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).
>> Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
>> this moment, so we should be safe.
>>
>> Thanks
>> Nicolin
>
Yang Shi Oct. 2, 2024, 3:38 p.m. UTC | #13
On 10/2/24 2:59 AM, Robin Murphy wrote:
> On 2024-10-01 8:48 pm, Yang Shi wrote:
>>
>>
>> On 10/1/24 12:29 PM, Nicolin Chen wrote:
>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>> On 10/1/24 11:27 AM, Nicolin Chen wrote:
>>>>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>>>>>> Using 64 bit immediate when doing shift can solve the problem.  The
>>>>>> disssembly after the fix looks like:
>>>>> [...]
>>>>>
>>>>>>           unsigned int last_sid_idx =
>>>>>> -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>>>>>> +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>>>> Could a 32-bit build be a corner case where UL is no longer a
>>>>> "64 bit" stated in the commit message?
>>>> It shouldn't. Because smmu v3 depends on ARM64.
>>>>
>>>> config ARM_SMMU_V3
>>>>          tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>>>          depends on ARM64
>>> ARM64 can have aarch32 support. I am not sure if ARM64 running a
>>> 32-bit OS can be a case though, (and not confined to AmpereOne).
>>
>> I don't think ARM64 runs 32-bit kernel, at least for newer kernel.
>
> Just use ULL - if the point is that it must be a 64-bit shift for 
> correctness, then being clear about that intent is far more valuable 
> than saving one character of source code.

Yeah, it must be 64 bit. Will fix in v2.

>
> Thanks,
> Robin.
>
>>
>>>
>>>>> Then, can ssid_bits/s1cdmax be a concern similarly?
>>>> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 
>>>> 6). So
>>>> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).
>>> Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
>>> this moment, so we should be safe.
>>>
>>> Thanks
>>> Nicolin
>>
James Morse Oct. 2, 2024, 4:58 p.m. UTC | #14
Hello!

On 01/10/2024 19:03, Yang Shi wrote:
> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
> calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
> is 32 bit value.
> However AmpereOne has 32-bit stream id size.  This resulted in
> ouf-of-bound shift.  The disassembly of shift is:
> 
>     ldr     w2, [x19, 828]  //, smmu_7(D)->sid_bits
>     mov     w20, 1
>     lsl     w20, w20, w2
> 
> According to ARM spec, if the registers are 32 bit, the instruction actually
> does:
>     dest = src << (shift % 32)
> 
> So it actually shifted by zero bit.
> 
> This caused v6.12-rc1 failed to boot on AmpereOne

Same here - one of arm's reference designs prints 1 giga-tonne of:
| arm-smmu-v3 arm-smmu-v3.5.auto: event 0x02 received:
| arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000c90000000002
| arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000
| arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000
| arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000

during boot - then fails to find the nvme.
Bisect points to ce410410f1a7, and the below diff fixes it.

> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 737c5b882355..01a2faee04bc 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3625,7 +3625,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>  	u32 l1size;
>  	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>  	unsigned int last_sid_idx =
> -		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> +		arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>  
>  	/* Calculate the L1 size, capped to the SIDSIZE. */
>  	cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);


Tested-by: James Morse <james.morse@arm.com>


Thanks,

James
James Morse Oct. 2, 2024, 4:58 p.m. UTC | #15
Hello,

On 01/10/2024 21:47, Yang Shi wrote:
> On 10/1/24 12:46 PM, Jason Gunthorpe wrote:
>> On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
>>>
>>> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
>>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>>>>> arm_smmu_init_strtab_linear().
>>>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>>>>> should be used if we want to take extra caution and don't prefer rely on
>>>>> compiler.
>>>> Still, we should be fixing them all if sid_bits == 32, all those
>>>> shifts should be throwing a UBSAN error. It would be crazy to have a
>>> OK, will cover this is v2.
>> Maybe just make a little inline function to do this math and remove
>> the repated open coding? Then the types can be right, etc.
> 
> Something like this?
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 01a2faee04bc..0f3aa7962117 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>  {
>         u32 l1size;
>         struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +       unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);
>         unsigned int last_sid_idx =
> -               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
> +               arm_smmu_strtab_l1_idx(max_sid - 1);
> 
>         /* Calculate the L1 size, capped to the SIDSIZE. */
>         cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
> @@ -3657,8 +3658,9 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>  {
>         u32 size;
>         struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +       unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);
> 
> -       size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
> +       size = max_sid * sizeof(struct arm_smmu_ste);
>         cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
> &cfg->linear.ste_dma,
>                                                 GFP_KERNEL);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1e9952ca989f..de9f14293485 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
>         return sid % STRTAB_NUM_L2_STES;
>  }
> 
> +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
> +{
> +       return (1UL << sid_bits);
> +}
> +

On the same platform - this also fixes the issue:
Tested-by: James Morse <james.morse@arm.com>


Thanks,

James
Yang Shi Oct. 2, 2024, 5:39 p.m. UTC | #16
On 10/2/24 9:58 AM, James Morse wrote:
> Hello!
>
> On 01/10/2024 19:03, Yang Shi wrote:
>> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
>> calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
>> is 32 bit value.
>> However AmpereOne has 32-bit stream id size.  This resulted in
>> ouf-of-bound shift.  The disassembly of shift is:
>>
>>      ldr     w2, [x19, 828]  //, smmu_7(D)->sid_bits
>>      mov     w20, 1
>>      lsl     w20, w20, w2
>>
>> According to ARM spec, if the registers are 32 bit, the instruction actually
>> does:
>>      dest = src << (shift % 32)
>>
>> So it actually shifted by zero bit.
>>
>> This caused v6.12-rc1 failed to boot on AmpereOne
> Same here - one of arm's reference designs prints 1 giga-tonne of:
> | arm-smmu-v3 arm-smmu-v3.5.auto: event 0x02 received:
> | arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000c90000000002
> | arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000
> | arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000
> | arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000
>
> during boot - then fails to find the nvme.
> Bisect points to ce410410f1a7, and the below diff fixes it.
>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 737c5b882355..01a2faee04bc 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -3625,7 +3625,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>>   	u32 l1size;
>>   	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>>   	unsigned int last_sid_idx =
>> -		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>> +		arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>   
>>   	/* Calculate the L1 size, capped to the SIDSIZE. */
>>   	cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
>
> Tested-by: James Morse <james.morse@arm.com>

Thank you for testing. I will change the patch subject a little bit to 
make it more general.

>
>
> Thanks,
>
> James
diff mbox series

Patch

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 737c5b882355..01a2faee04bc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3625,7 +3625,7 @@  static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
 	u32 l1size;
 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
 	unsigned int last_sid_idx =
-		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
+		arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
 
 	/* Calculate the L1 size, capped to the SIDSIZE. */
 	cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);