diff mbox series

[v3] iommu/arm-smmu-v3: Fix L1 stream table index calculation for 32-bit sid size

Message ID 20241004180405.555194-1-yang@os.amperecomputing.com (mailing list archive)
State New
Headers show
Series [v3] iommu/arm-smmu-v3: Fix L1 stream table index calculation for 32-bit sid size | expand

Commit Message

Yang Shi Oct. 4, 2024, 6:04 p.m. UTC
The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
is 32 bit value.
However some platforms, for example, AmpereOne and the platforms with
ARM MMU-700, have 32-bit stream id size.  This resulted in ouf-of-bound shift.
The disassembly of shift is:

    ldr     w2, [x19, 828]  //, smmu_7(D)->sid_bits
    mov     w20, 1
    lsl     w20, w20, w2

According to ARM spec, if the registers are 32 bit, the instruction actually
does:
    dest = src << (shift % 32)

So it actually shifted by zero bit.

The out-of-bound shift is also undefined behavior according to C
language standard.

This caused v6.12-rc1 failed to boot on such platforms.

UBSAN also reported:

UBSAN: shift-out-of-bounds in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3628:29
shift exponent 32 is too large for 32-bit type 'int'

Using 64 bit immediate when doing shift can solve the problem.  The
disassembly after the fix looks like:
    ldr     w20, [x19, 828] //, smmu_7(D)->sid_bits
    mov     x0, 1
    lsl     x0, x0, x20

There are a couple of problematic places, extracted the shift into a helper.

Fixes: ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
Tested-by: James Morse <james.morse@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 +++++++++++-----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  5 +++++
 2 files changed, 16 insertions(+), 5 deletions(-)

v3: * Some trivial modification to the commit log per Robin Murphy.
    * Used "num_sids" instead of "max_sids" per Robin Murphy.
    * Returned u64 type for arm_smmu_strtab_num_sids() per Nicolin Chen.
    * Checked size in arm_smmu_init_strtab_linear() in order to avoid
      overflow per Jason Gunthorpe.
    * Collected r-b tag from Jason Gunthorpe.
v2: * Extracted the shift into a helper per Jason Gunthorpe.
    * Covered more places per Nicolin Chen and Jason Gunthorpe.
    * Used 1ULL instead of 1UL to guarantee 64 bit per Robin Murphy.
    * Made the subject more general since this is not AmpereOne specific
      problem per the report from James Morse.
    * Collected t-b tag from James Morse.
    * Added Fixes tag in commit log.

Comments

Daniel Mentz Oct. 4, 2024, 9:14 p.m. UTC | #1
On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote:
>  static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>  {
> -       u32 size;
> +       u64 size;
>         struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +       u64 num_sids = arm_smmu_strtab_num_sids(smmu);
> +
> +       size = num_sids * sizeof(struct arm_smmu_ste);
> +       /* The max size for dmam_alloc_coherent() is 32-bit */

I'd remove this comment. I assume the intent here was to say that the
maximum size is 4GB (not 32 bit). I also can't find any reference to
this limitation. Where does dmam_alloc_coherent() limit the size of an
allocation to 4GB? Also, this comment might not be applicable to 64
bit platforms.

> +       if (size > SIZE_MAX)
> +               return -EINVAL;

I'm assuming this is for platforms where the range of a u64 is larger
than that of a size_t type? If we're printing an error message if an
allocation fails (i.e. "failed to allocate linear stream table (%llu
bytes)\n"), then we might also want to print an error message here.

> -       cfg->linear.num_ents = 1 << smmu->sid_bits;
> +       cfg->linear.num_ents = num_sids;

If you're worried about 32 bit platforms, then I'm wondering if this
also needs some attention. cfg->linear.num_ents is defined as an
unsigned int and num_sids could potentially be outside the range of an
unsigned int on 32 bit platforms.

> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1e9952ca989f..c8ceddc5e8ef 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain {
>         ioasid_t ssid;
>  };
>
> +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu)
> +{
> +       return (1ULL << smmu->sid_bits);
> +}
> +

I'm wondering if it makes sense to move this up and put it right
before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_*
functions are in one place.

On a related note, in arm_smmu_init_strtab_2lvl() we're capping the
number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream
tables. I'm thinking it would make sense to limit the size of linear
stream tables for the same reasons.
Yang Shi Oct. 4, 2024, 9:47 p.m. UTC | #2
On 10/4/24 2:14 PM, Daniel Mentz wrote:
> On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote:
>>   static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>>   {
>> -       u32 size;
>> +       u64 size;
>>          struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>> +       u64 num_sids = arm_smmu_strtab_num_sids(smmu);
>> +
>> +       size = num_sids * sizeof(struct arm_smmu_ste);
>> +       /* The max size for dmam_alloc_coherent() is 32-bit */
> I'd remove this comment. I assume the intent here was to say that the
> maximum size is 4GB (not 32 bit). I also can't find any reference to
> this limitation. Where does dmam_alloc_coherent() limit the size of an
> allocation to 4GB? Also, this comment might not be applicable to 64
> bit platforms.

The "size" parameter passed to dmam_alloc_coherent() is size_t type 
which is unsigned int.

>
>> +       if (size > SIZE_MAX)
>> +               return -EINVAL;
> I'm assuming this is for platforms where the range of a u64 is larger
> than that of a size_t type? If we're printing an error message if an
> allocation fails (i.e. "failed to allocate linear stream table (%llu
> bytes)\n"), then we might also want to print an error message here.

Thanks for the suggestion. Yes, we can if it really helps.

>
>> -       cfg->linear.num_ents = 1 << smmu->sid_bits;
>> +       cfg->linear.num_ents = num_sids;
> If you're worried about 32 bit platforms, then I'm wondering if this
> also needs some attention. cfg->linear.num_ents is defined as an
> unsigned int and num_sids could potentially be outside the range of an
> unsigned int on 32 bit platforms.

The (size > SIZE_MAX) check can guarantee excessively large num_sids 
won't reach here.

>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 1e9952ca989f..c8ceddc5e8ef 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain {
>>          ioasid_t ssid;
>>   };
>>
>> +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu)
>> +{
>> +       return (1ULL << smmu->sid_bits);
>> +}
>> +
> I'm wondering if it makes sense to move this up and put it right
> before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_*
> functions are in one place.

I did it. But the function uses struct arm_smmu_device which is defined 
after those arm_smmu_strtab_* helpers. I have to put the helper after 
struct arm_smmu_device definition to avoid compile error. We may 
consider re-organize the header file to group them better, but I don't 
think it is urgent enough and it seems out of the scope of the bug fix 
patch. I really want to have the bug fix landed in upstream ASAP.

>
> On a related note, in arm_smmu_init_strtab_2lvl() we're capping the
> number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream
> tables. I'm thinking it would make sense to limit the size of linear
> stream tables for the same reasons.

Yes, this also works. But I don't know what value should be used. Jason 
actually suggested (size > SIZE_512M) in v2 review, but I thought the 
value is a magic number. Why 512M? Just because it is too large for 
allocation. So I picked up SIZE_MAX, just because it is the largest size 
supported by size_t type.
Daniel Mentz Oct. 5, 2024, 1:03 a.m. UTC | #3
On Fri, Oct 4, 2024 at 2:47 PM Yang Shi <yang@os.amperecomputing.com> wrote:
> On 10/4/24 2:14 PM, Daniel Mentz wrote:
> > On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote:
> >>   static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
> >>   {
> >> -       u32 size;
> >> +       u64 size;
> >>          struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> >> +       u64 num_sids = arm_smmu_strtab_num_sids(smmu);
> >> +
> >> +       size = num_sids * sizeof(struct arm_smmu_ste);
> >> +       /* The max size for dmam_alloc_coherent() is 32-bit */
> > I'd remove this comment. I assume the intent here was to say that the
> > maximum size is 4GB (not 32 bit). I also can't find any reference to
> > this limitation. Where does dmam_alloc_coherent() limit the size of an
> > allocation to 4GB? Also, this comment might not be applicable to 64
> > bit platforms.
>
> The "size" parameter passed to dmam_alloc_coherent() is size_t type
> which is unsigned int.

I believe that this is true only for 32 bit platforms. On arm64,
unsigned int is 32 bit, whereas size_t is 64 bit. I'm still in favor
of removing that comment, because it's not applicable to arm64.

> >
> >> -       cfg->linear.num_ents = 1 << smmu->sid_bits;
> >> +       cfg->linear.num_ents = num_sids;
> > If you're worried about 32 bit platforms, then I'm wondering if this
> > also needs some attention. cfg->linear.num_ents is defined as an
> > unsigned int and num_sids could potentially be outside the range of an
> > unsigned int on 32 bit platforms.
>
> The (size > SIZE_MAX) check can guarantee excessively large num_sids
> won't reach here.

Now that I think about it, unsigned int is 32 bit even on arm64. So,
I'm afraid this could (theoretically) overflow. On arm64, I don't
think that the (size > SIZE_MAX) check will prevent this.

> >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> >> index 1e9952ca989f..c8ceddc5e8ef 100644
> >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> >> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain {
> >>          ioasid_t ssid;
> >>   };
> >>
> >> +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu)
> >> +{
> >> +       return (1ULL << smmu->sid_bits);
> >> +}
> >> +
> > I'm wondering if it makes sense to move this up and put it right
> > before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_*
> > functions are in one place.
>
> I did it. But the function uses struct arm_smmu_device which is defined
> after those arm_smmu_strtab_* helpers. I have to put the helper after
> struct arm_smmu_device definition to avoid compile error. We may
> consider re-organize the header file to group them better, but I don't
> think it is urgent enough and it seems out of the scope of the bug fix
> patch. I really want to have the bug fix landed in upstream ASAP.

Understood. Thanks. We could move the changes in
arm_smmu_init_strtab_linear() into a separate patch to accelerate the
process. I'm fine either way, though. I don't want to get in the way
of this landing upstream.

>
> >
> > On a related note, in arm_smmu_init_strtab_2lvl() we're capping the
> > number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream
> > tables. I'm thinking it would make sense to limit the size of linear
> > stream tables for the same reasons.
>
> Yes, this also works. But I don't know what value should be used. Jason
> actually suggested (size > SIZE_512M) in v2 review, but I thought the
> value is a magic number. Why 512M? Just because it is too large for
> allocation. So I picked up SIZE_MAX, just because it is the largest size
> supported by size_t type.

I think it should be capped to STRTAB_MAX_L1_ENTRIES
Yang Shi Oct. 5, 2024, 1:53 a.m. UTC | #4
On 10/4/24 6:03 PM, Daniel Mentz wrote:
> On Fri, Oct 4, 2024 at 2:47 PM Yang Shi <yang@os.amperecomputing.com> wrote:
>> On 10/4/24 2:14 PM, Daniel Mentz wrote:
>>> On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote:
>>>>    static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>>>>    {
>>>> -       u32 size;
>>>> +       u64 size;
>>>>           struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>>>> +       u64 num_sids = arm_smmu_strtab_num_sids(smmu);
>>>> +
>>>> +       size = num_sids * sizeof(struct arm_smmu_ste);
>>>> +       /* The max size for dmam_alloc_coherent() is 32-bit */
>>> I'd remove this comment. I assume the intent here was to say that the
>>> maximum size is 4GB (not 32 bit). I also can't find any reference to
>>> this limitation. Where does dmam_alloc_coherent() limit the size of an
>>> allocation to 4GB? Also, this comment might not be applicable to 64
>>> bit platforms.
>> The "size" parameter passed to dmam_alloc_coherent() is size_t type
>> which is unsigned int.
> I believe that this is true only for 32 bit platforms. On arm64,
> unsigned int is 32 bit, whereas size_t is 64 bit. I'm still in favor
> of removing that comment, because it's not applicable to arm64.

Thanks for figuring this out. Yes, you are right. I missed this point.

>
>>>> -       cfg->linear.num_ents = 1 << smmu->sid_bits;
>>>> +       cfg->linear.num_ents = num_sids;
>>> If you're worried about 32 bit platforms, then I'm wondering if this
>>> also needs some attention. cfg->linear.num_ents is defined as an
>>> unsigned int and num_sids could potentially be outside the range of an
>>> unsigned int on 32 bit platforms.
>> The (size > SIZE_MAX) check can guarantee excessively large num_sids
>> won't reach here.
> Now that I think about it, unsigned int is 32 bit even on arm64. So,
> I'm afraid this could (theoretically) overflow. On arm64, I don't
> think that the (size > SIZE_MAX) check will prevent this.

Yes, SIZE_MAX is ~(size_t)0, but size_t is unsigned long on ARM64. So 
the check actually doesn't do what I expect it should do. U32_MAX should 
be used.

>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> index 1e9952ca989f..c8ceddc5e8ef 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain {
>>>>           ioasid_t ssid;
>>>>    };
>>>>
>>>> +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu)
>>>> +{
>>>> +       return (1ULL << smmu->sid_bits);
>>>> +}
>>>> +
>>> I'm wondering if it makes sense to move this up and put it right
>>> before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_*
>>> functions are in one place.
>> I did it. But the function uses struct arm_smmu_device which is defined
>> after those arm_smmu_strtab_* helpers. I have to put the helper after
>> struct arm_smmu_device definition to avoid compile error. We may
>> consider re-organize the header file to group them better, but I don't
>> think it is urgent enough and it seems out of the scope of the bug fix
>> patch. I really want to have the bug fix landed in upstream ASAP.
> Understood. Thanks. We could move the changes in
> arm_smmu_init_strtab_linear() into a separate patch to accelerate the
> process. I'm fine either way, though. I don't want to get in the way
> of this landing upstream.

Thank you for your understanding.

>
>>> On a related note, in arm_smmu_init_strtab_2lvl() we're capping the
>>> number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream
>>> tables. I'm thinking it would make sense to limit the size of linear
>>> stream tables for the same reasons.
>> Yes, this also works. But I don't know what value should be used. Jason
>> actually suggested (size > SIZE_512M) in v2 review, but I thought the
>> value is a magic number. Why 512M? Just because it is too large for
>> allocation. So I picked up SIZE_MAX, just because it is the largest size
>> supported by size_t type.
> I think it should be capped to STRTAB_MAX_L1_ENTRIES

I'm not expert on SMMU. Does the linear stream table have the same cap 
as 2-level stream table? Is this defined by the hardware spec? If it is 
not, why should we pick this value?
diff mbox series

Patch

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 737c5b882355..9d4fc91d9258 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3624,8 +3624,9 @@  static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
 {
 	u32 l1size;
 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
 	unsigned int last_sid_idx =
-		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
+		arm_smmu_strtab_l1_idx(num_sids - 1);
 
 	/* Calculate the L1 size, capped to the SIDSIZE. */
 	cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
@@ -3655,20 +3656,25 @@  static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
 
 static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
 {
-	u32 size;
+	u64 size;
 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
+
+	size = num_sids * sizeof(struct arm_smmu_ste);
+	/* The max size for dmam_alloc_coherent() is 32-bit */
+	if (size > SIZE_MAX)
+		return -EINVAL;
 
-	size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
 	cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
 						&cfg->linear.ste_dma,
 						GFP_KERNEL);
 	if (!cfg->linear.table) {
 		dev_err(smmu->dev,
-			"failed to allocate linear stream table (%u bytes)\n",
+			"failed to allocate linear stream table (%llu bytes)\n",
 			size);
 		return -ENOMEM;
 	}
-	cfg->linear.num_ents = 1 << smmu->sid_bits;
+	cfg->linear.num_ents = num_sids;
 
 	arm_smmu_init_initial_stes(cfg->linear.table, cfg->linear.num_ents);
 	return 0;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1e9952ca989f..c8ceddc5e8ef 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -853,6 +853,11 @@  struct arm_smmu_master_domain {
 	ioasid_t ssid;
 };
 
+static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu)
+{
+	return (1ULL << smmu->sid_bits);
+}
+
 static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 {
 	return container_of(dom, struct arm_smmu_domain, domain);