Message ID | 20241001180346.1485194-1-yang@os.amperecomputing.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne | expand |
On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote: > Using 64 bit immediate when doing shift can solve the problem. The > disssembly after the fix looks like: [...] > unsigned int last_sid_idx = > - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1); > + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1); Could a 32-bit build be a corner case where UL is no longer a "64 bit" stated in the commit message? Also, there are other places doing "1 << smmu->sid_bits", e.g. arm_smmu_init_strtab_linear(). Then, can ssid_bits/s1cdmax be a concern similarly? Thanks Nicolin
On 10/1/24 11:27 AM, Nicolin Chen wrote: > On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote: >> Using 64 bit immediate when doing shift can solve the problem. The >> disssembly after the fix looks like: > [...] > >> unsigned int last_sid_idx = >> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1); >> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1); > Could a 32-bit build be a corner case where UL is no longer a > "64 bit" stated in the commit message? It shouldn't. Because smmu v3 depends on ARM64. config ARM_SMMU_V3 tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support" depends on ARM64 > > Also, there are other places doing "1 << smmu->sid_bits", e.g. > arm_smmu_init_strtab_linear(). The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL should be used if we want to take extra caution and don't prefer rely on compiler. > > Then, can ssid_bits/s1cdmax be a concern similarly? IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0). > > Thanks > Nicolin
On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote: > > Also, there are other places doing "1 << smmu->sid_bits", e.g. > > arm_smmu_init_strtab_linear(). > > The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL > should be used if we want to take extra caution and don't prefer rely on > compiler. Still, we should be fixing them all if sid_bits == 32, all those shifts should be throwing a UBSAN error. It would be crazy to have a 32 bit linear table but I guess it is allowed? Jason
On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote: > On 10/1/24 11:27 AM, Nicolin Chen wrote: > > On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote: > > > Using 64 bit immediate when doing shift can solve the problem. The > > > disssembly after the fix looks like: > > [...] > > > > > unsigned int last_sid_idx = > > > - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1); > > > + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1); > > Could a 32-bit build be a corner case where UL is no longer a > > "64 bit" stated in the commit message? > > It shouldn't. Because smmu v3 depends on ARM64. > > config ARM_SMMU_V3 > tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support" > depends on ARM64 ARM64 can have aarch32 support. I am not sure if ARM64 running a 32-bit OS can be a case though, (and not confined to AmpereOne). > > Then, can ssid_bits/s1cdmax be a concern similarly? > > IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So > it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0). Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at this moment, so we should be safe. Thanks Nicolin
On 10/1/24 12:18 PM, Jason Gunthorpe wrote: > On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote: >>> Also, there are other places doing "1 << smmu->sid_bits", e.g. >>> arm_smmu_init_strtab_linear(). >> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL >> should be used if we want to take extra caution and don't prefer rely on >> compiler. > Still, we should be fixing them all if sid_bits == 32, all those > shifts should be throwing a UBSAN error. It would be crazy to have a OK, will cover this is v2. > 32 bit linear table but I guess it is allowed? I'm not smmu expert, but if sid_bits is 32, it looks like the linear table will consume 256GB IIUC? That is crazy. > > Jason
On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote: > > > On 10/1/24 12:18 PM, Jason Gunthorpe wrote: > > On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote: > > > > Also, there are other places doing "1 << smmu->sid_bits", e.g. > > > > arm_smmu_init_strtab_linear(). > > > The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL > > > should be used if we want to take extra caution and don't prefer rely on > > > compiler. > > Still, we should be fixing them all if sid_bits == 32, all those > > shifts should be throwing a UBSAN error. It would be crazy to have a > > OK, will cover this is v2. Maybe just make a little inline function to do this math and remove the repated open coding? Then the types can be right, etc. Jason
On 10/1/24 12:29 PM, Nicolin Chen wrote: > On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote: >> On 10/1/24 11:27 AM, Nicolin Chen wrote: >>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote: >>>> Using 64 bit immediate when doing shift can solve the problem. The >>>> disssembly after the fix looks like: >>> [...] >>> >>>> unsigned int last_sid_idx = >>>> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1); >>>> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1); >>> Could a 32-bit build be a corner case where UL is no longer a >>> "64 bit" stated in the commit message? >> It shouldn't. Because smmu v3 depends on ARM64. >> >> config ARM_SMMU_V3 >> tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support" >> depends on ARM64 > ARM64 can have aarch32 support. I am not sure if ARM64 running a > 32-bit OS can be a case though, (and not confined to AmpereOne). I don't think ARM64 runs 32-bit kernel, at least for newer kernel. > >>> Then, can ssid_bits/s1cdmax be a concern similarly? >> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So >> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0). > Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at > this moment, so we should be safe. > > Thanks > Nicolin
On 10/1/24 12:46 PM, Jason Gunthorpe wrote: > On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote: >> >> On 10/1/24 12:18 PM, Jason Gunthorpe wrote: >>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote: >>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g. >>>>> arm_smmu_init_strtab_linear(). >>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL >>>> should be used if we want to take extra caution and don't prefer rely on >>>> compiler. >>> Still, we should be fixing them all if sid_bits == 32, all those >>> shifts should be throwing a UBSAN error. It would be crazy to have a >> OK, will cover this is v2. > Maybe just make a little inline function to do this math and remove > the repated open coding? Then the types can be right, etc. Fine to me. > > Jason
On 10/1/24 12:46 PM, Jason Gunthorpe wrote: > On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote: >> >> On 10/1/24 12:18 PM, Jason Gunthorpe wrote: >>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote: >>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g. >>>>> arm_smmu_init_strtab_linear(). >>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL >>>> should be used if we want to take extra caution and don't prefer rely on >>>> compiler. >>> Still, we should be fixing them all if sid_bits == 32, all those >>> shifts should be throwing a UBSAN error. It would be crazy to have a >> OK, will cover this is v2. > Maybe just make a little inline function to do this math and remove > the repated open coding? Then the types can be right, etc. Something like this? diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 01a2faee04bc..0f3aa7962117 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) { u32 l1size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; + unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits); unsigned int last_sid_idx = - arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1); + arm_smmu_strtab_l1_idx(max_sid - 1); /* Calculate the L1 size, capped to the SIDSIZE. */ cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES); @@ -3657,8 +3658,9 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) { u32 size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; + unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits); - size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste); + size = max_sid * sizeof(struct arm_smmu_ste); cfg->linear.table = dmam_alloc_coherent(smmu->dev, size, &cfg->linear.ste_dma, GFP_KERNEL); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 1e9952ca989f..de9f14293485 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid) return sid % STRTAB_NUM_L2_STES; } +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits) +{ + return (1UL << sid_bits); +} + > > Jason
On Tue, Oct 01, 2024 at 01:47:24PM -0700, Yang Shi wrote: > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > index 1e9952ca989f..de9f14293485 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid) > return sid % STRTAB_NUM_L2_STES; > } > > +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits) > +{ Can we take the smmu struct here instead of the int? But yes, this looks OK Jason
On 10/1/24 2:02 PM, Jason Gunthorpe wrote: > On Tue, Oct 01, 2024 at 01:47:24PM -0700, Yang Shi wrote: >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >> index 1e9952ca989f..de9f14293485 100644 >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >> @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid) >> return sid % STRTAB_NUM_L2_STES; >> } >> >> +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits) >> +{ > Can we take the smmu struct here instead of the int? No problem. > > But yes, this looks OK Thank you. > > Jason
On 2024-10-01 8:48 pm, Yang Shi wrote: > > > On 10/1/24 12:29 PM, Nicolin Chen wrote: >> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote: >>> On 10/1/24 11:27 AM, Nicolin Chen wrote: >>>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote: >>>>> Using 64 bit immediate when doing shift can solve the problem. The >>>>> disssembly after the fix looks like: >>>> [...] >>>> >>>>> unsigned int last_sid_idx = >>>>> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1); >>>>> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1); >>>> Could a 32-bit build be a corner case where UL is no longer a >>>> "64 bit" stated in the commit message? >>> It shouldn't. Because smmu v3 depends on ARM64. >>> >>> config ARM_SMMU_V3 >>> tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support" >>> depends on ARM64 >> ARM64 can have aarch32 support. I am not sure if ARM64 running a >> 32-bit OS can be a case though, (and not confined to AmpereOne). > > I don't think ARM64 runs 32-bit kernel, at least for newer kernel. Just use ULL - if the point is that it must be a 64-bit shift for correctness, then being clear about that intent is far more valuable than saving one character of source code. Thanks, Robin. > >> >>>> Then, can ssid_bits/s1cdmax be a concern similarly? >>> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So >>> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0). >> Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at >> this moment, so we should be safe. >> >> Thanks >> Nicolin >
On 10/2/24 2:59 AM, Robin Murphy wrote: > On 2024-10-01 8:48 pm, Yang Shi wrote: >> >> >> On 10/1/24 12:29 PM, Nicolin Chen wrote: >>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote: >>>> On 10/1/24 11:27 AM, Nicolin Chen wrote: >>>>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote: >>>>>> Using 64 bit immediate when doing shift can solve the problem. The >>>>>> disssembly after the fix looks like: >>>>> [...] >>>>> >>>>>> unsigned int last_sid_idx = >>>>>> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1); >>>>>> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1); >>>>> Could a 32-bit build be a corner case where UL is no longer a >>>>> "64 bit" stated in the commit message? >>>> It shouldn't. Because smmu v3 depends on ARM64. >>>> >>>> config ARM_SMMU_V3 >>>> tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support" >>>> depends on ARM64 >>> ARM64 can have aarch32 support. I am not sure if ARM64 running a >>> 32-bit OS can be a case though, (and not confined to AmpereOne). >> >> I don't think ARM64 runs 32-bit kernel, at least for newer kernel. > > Just use ULL - if the point is that it must be a 64-bit shift for > correctness, then being clear about that intent is far more valuable > than saving one character of source code. Yeah, it must be 64 bit. Will fix in v2. > > Thanks, > Robin. > >> >>> >>>>> Then, can ssid_bits/s1cdmax be a concern similarly? >>>> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, >>>> 6). So >>>> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0). >>> Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at >>> this moment, so we should be safe. >>> >>> Thanks >>> Nicolin >>
Hello! On 01/10/2024 19:03, Yang Shi wrote: > The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()") > calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1 > is 32 bit value. > However AmpereOne has 32-bit stream id size. This resulted in > ouf-of-bound shift. The disassembly of shift is: > > ldr w2, [x19, 828] //, smmu_7(D)->sid_bits > mov w20, 1 > lsl w20, w20, w2 > > According to ARM spec, if the registers are 32 bit, the instruction actually > does: > dest = src << (shift % 32) > > So it actually shifted by zero bit. > > This caused v6.12-rc1 failed to boot on AmpereOne Same here - one of arm's reference designs prints 1 giga-tonne of: | arm-smmu-v3 arm-smmu-v3.5.auto: event 0x02 received: | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000c90000000002 | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000 | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000 | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000 during boot - then fails to find the nvme. Bisect points to ce410410f1a7, and the below diff fixes it. > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > index 737c5b882355..01a2faee04bc 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > @@ -3625,7 +3625,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) > u32 l1size; > struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; > unsigned int last_sid_idx = > - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1); > + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1); > > /* Calculate the L1 size, capped to the SIDSIZE. */ > cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES); Tested-by: James Morse <james.morse@arm.com> Thanks, James
Hello, On 01/10/2024 21:47, Yang Shi wrote: > On 10/1/24 12:46 PM, Jason Gunthorpe wrote: >> On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote: >>> >>> On 10/1/24 12:18 PM, Jason Gunthorpe wrote: >>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote: >>>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g. >>>>>> arm_smmu_init_strtab_linear(). >>>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL >>>>> should be used if we want to take extra caution and don't prefer rely on >>>>> compiler. >>>> Still, we should be fixing them all if sid_bits == 32, all those >>>> shifts should be throwing a UBSAN error. It would be crazy to have a >>> OK, will cover this is v2. >> Maybe just make a little inline function to do this math and remove >> the repated open coding? Then the types can be right, etc. > > Something like this? > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > index 01a2faee04bc..0f3aa7962117 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > @@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) > { > u32 l1size; > struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; > + unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits); > unsigned int last_sid_idx = > - arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1); > + arm_smmu_strtab_l1_idx(max_sid - 1); > > /* Calculate the L1 size, capped to the SIDSIZE. */ > cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES); > @@ -3657,8 +3658,9 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) > { > u32 size; > struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; > + unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits); > > - size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste); > + size = max_sid * sizeof(struct arm_smmu_ste); > cfg->linear.table = dmam_alloc_coherent(smmu->dev, size, > &cfg->linear.ste_dma, > GFP_KERNEL); > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > index 1e9952ca989f..de9f14293485 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid) > return sid % STRTAB_NUM_L2_STES; > } > > +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits) > +{ > + return (1UL << sid_bits); > +} > + On the same platform - this also fixes the issue: Tested-by: James Morse <james.morse@arm.com> Thanks, James
On 10/2/24 9:58 AM, James Morse wrote: > Hello! > > On 01/10/2024 19:03, Yang Shi wrote: >> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()") >> calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1 >> is 32 bit value. >> However AmpereOne has 32-bit stream id size. This resulted in >> ouf-of-bound shift. The disassembly of shift is: >> >> ldr w2, [x19, 828] //, smmu_7(D)->sid_bits >> mov w20, 1 >> lsl w20, w20, w2 >> >> According to ARM spec, if the registers are 32 bit, the instruction actually >> does: >> dest = src << (shift % 32) >> >> So it actually shifted by zero bit. >> >> This caused v6.12-rc1 failed to boot on AmpereOne > Same here - one of arm's reference designs prints 1 giga-tonne of: > | arm-smmu-v3 arm-smmu-v3.5.auto: event 0x02 received: > | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000c90000000002 > | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000 > | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000 > | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000 > > during boot - then fails to find the nvme. > Bisect points to ce410410f1a7, and the below diff fixes it. > >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> index 737c5b882355..01a2faee04bc 100644 >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> @@ -3625,7 +3625,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) >> u32 l1size; >> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; >> unsigned int last_sid_idx = >> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1); >> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1); >> >> /* Calculate the L1 size, capped to the SIDSIZE. */ >> cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES); > > Tested-by: James Morse <james.morse@arm.com> Thank you for testing. I will change the patch subject a little bit to make it more general. > > > Thanks, > > James
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 737c5b882355..01a2faee04bc 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3625,7 +3625,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) u32 l1size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; unsigned int last_sid_idx = - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1); + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1); /* Calculate the L1 size, capped to the SIDSIZE. */ cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()") calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1 is 32 bit value. However AmpereOne has 32-bit stream id size. This resulted in ouf-of-bound shift. The disassembly of shift is: ldr w2, [x19, 828] //, smmu_7(D)->sid_bits mov w20, 1 lsl w20, w20, w2 According to ARM spec, if the registers are 32 bit, the instruction actually does: dest = src << (shift % 32) So it actually shifted by zero bit. This caused v6.12-rc1 failed to boot on AmpereOne and UBSAN also reported: UBSAN: shift-out-of-bounds in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3628:29 shift exponent 32 is too large for 32-bit type 'int' CPU: 70 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc1 #4 Hardware name: ZOLLNER SUNMOONLAKE/SunMoon Lake, BIOS 00.00. 2024-08-28 18:42:45 08/28/2024 Call trace: dump_backtrace+0xdc/0x140 show_stack+0x20/0x40 dump_stack_lvl+0x60/0x80 dump_stack+0x18/0x28 ubsan_epilogue+0x10/0x48 __ubsan_handle_shift_out_of_bounds+0xd8/0x1a0 arm_smmu_init_structures+0x374/0x3c8 arm_smmu_device_probe+0x208/0x600 platform_probe+0x70/0xe8 really_probe+0xc8/0x3a0 __driver_probe_device+0x84/0x160 driver_probe_device+0x44/0x130 __driver_attach+0xcc/0x208 bus_for_each_dev+0x84/0x100 driver_attach+0x2c/0x40 bus_add_driver+0x158/0x290 driver_register+0x70/0x138 __platform_driver_register+0x2c/0x40 arm_smmu_driver_init+0x28/0x40 do_one_initcall+0x60/0x318 do_initcalls+0x198/0x1e0 kernel_init_freeable+0x18c/0x1e8 kernel_init+0x28/0x160 ret_from_fork+0x10/0x20 Using 64 bit immediate when doing shift can solve the problem. The disssembly after the fix looks like: ldr w20, [x19, 828] //, smmu_7(D)->sid_bits mov x0, 1 lsl x0, x0, x20 Signed-off-by: Yang Shi <yang@os.amperecomputing.com> --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)