Message ID | 20241004180405.555194-1-yang@os.amperecomputing.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v3] iommu/arm-smmu-v3: Fix L1 stream table index calculation for 32-bit sid size | expand |
On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote: > static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) > { > - u32 size; > + u64 size; > struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; > + u64 num_sids = arm_smmu_strtab_num_sids(smmu); > + > + size = num_sids * sizeof(struct arm_smmu_ste); > + /* The max size for dmam_alloc_coherent() is 32-bit */ I'd remove this comment. I assume the intent here was to say that the maximum size is 4GB (not 32 bit). I also can't find any reference to this limitation. Where does dmam_alloc_coherent() limit the size of an allocation to 4GB? Also, this comment might not be applicable to 64 bit platforms. > + if (size > SIZE_MAX) > + return -EINVAL; I'm assuming this is for platforms where the range of a u64 is larger than that of a size_t type? If we're printing an error message if an allocation fails (i.e. "failed to allocate linear stream table (%llu bytes)\n"), then we might also want to print an error message here. > - cfg->linear.num_ents = 1 << smmu->sid_bits; > + cfg->linear.num_ents = num_sids; If you're worried about 32 bit platforms, then I'm wondering if this also needs some attention. cfg->linear.num_ents is defined as an unsigned int and num_sids could potentially be outside the range of an unsigned int on 32 bit platforms. > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > index 1e9952ca989f..c8ceddc5e8ef 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > @@ -853,6 +853,11 @@ struct arm_smmu_master_domain { > ioasid_t ssid; > }; > > +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu) > +{ > + return (1ULL << smmu->sid_bits); > +} > + I'm wondering if it makes sense to move this up and put it right before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_* functions are in one place. On a related note, in arm_smmu_init_strtab_2lvl() we're capping the number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream tables. I'm thinking it would make sense to limit the size of linear stream tables for the same reasons.
On 10/4/24 2:14 PM, Daniel Mentz wrote: > On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote: >> static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) >> { >> - u32 size; >> + u64 size; >> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; >> + u64 num_sids = arm_smmu_strtab_num_sids(smmu); >> + >> + size = num_sids * sizeof(struct arm_smmu_ste); >> + /* The max size for dmam_alloc_coherent() is 32-bit */ > I'd remove this comment. I assume the intent here was to say that the > maximum size is 4GB (not 32 bit). I also can't find any reference to > this limitation. Where does dmam_alloc_coherent() limit the size of an > allocation to 4GB? Also, this comment might not be applicable to 64 > bit platforms. The "size" parameter passed to dmam_alloc_coherent() is size_t type which is unsigned int. > >> + if (size > SIZE_MAX) >> + return -EINVAL; > I'm assuming this is for platforms where the range of a u64 is larger > than that of a size_t type? If we're printing an error message if an > allocation fails (i.e. "failed to allocate linear stream table (%llu > bytes)\n"), then we might also want to print an error message here. Thanks for the suggestion. Yes, we can if it really helps. > >> - cfg->linear.num_ents = 1 << smmu->sid_bits; >> + cfg->linear.num_ents = num_sids; > If you're worried about 32 bit platforms, then I'm wondering if this > also needs some attention. cfg->linear.num_ents is defined as an > unsigned int and num_sids could potentially be outside the range of an > unsigned int on 32 bit platforms. The (size > SIZE_MAX) check can guarantee excessively large num_sids won't reach here. > >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >> index 1e9952ca989f..c8ceddc5e8ef 100644 >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain { >> ioasid_t ssid; >> }; >> >> +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu) >> +{ >> + return (1ULL << smmu->sid_bits); >> +} >> + > I'm wondering if it makes sense to move this up and put it right > before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_* > functions are in one place. I did it. But the function uses struct arm_smmu_device which is defined after those arm_smmu_strtab_* helpers. I have to put the helper after struct arm_smmu_device definition to avoid compile error. We may consider re-organize the header file to group them better, but I don't think it is urgent enough and it seems out of the scope of the bug fix patch. I really want to have the bug fix landed in upstream ASAP. > > On a related note, in arm_smmu_init_strtab_2lvl() we're capping the > number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream > tables. I'm thinking it would make sense to limit the size of linear > stream tables for the same reasons. Yes, this also works. But I don't know what value should be used. Jason actually suggested (size > SIZE_512M) in v2 review, but I thought the value is a magic number. Why 512M? Just because it is too large for allocation. So I picked up SIZE_MAX, just because it is the largest size supported by size_t type.
On Fri, Oct 4, 2024 at 2:47 PM Yang Shi <yang@os.amperecomputing.com> wrote: > On 10/4/24 2:14 PM, Daniel Mentz wrote: > > On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote: > >> static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) > >> { > >> - u32 size; > >> + u64 size; > >> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; > >> + u64 num_sids = arm_smmu_strtab_num_sids(smmu); > >> + > >> + size = num_sids * sizeof(struct arm_smmu_ste); > >> + /* The max size for dmam_alloc_coherent() is 32-bit */ > > I'd remove this comment. I assume the intent here was to say that the > > maximum size is 4GB (not 32 bit). I also can't find any reference to > > this limitation. Where does dmam_alloc_coherent() limit the size of an > > allocation to 4GB? Also, this comment might not be applicable to 64 > > bit platforms. > > The "size" parameter passed to dmam_alloc_coherent() is size_t type > which is unsigned int. I believe that this is true only for 32 bit platforms. On arm64, unsigned int is 32 bit, whereas size_t is 64 bit. I'm still in favor of removing that comment, because it's not applicable to arm64. > > > >> - cfg->linear.num_ents = 1 << smmu->sid_bits; > >> + cfg->linear.num_ents = num_sids; > > If you're worried about 32 bit platforms, then I'm wondering if this > > also needs some attention. cfg->linear.num_ents is defined as an > > unsigned int and num_sids could potentially be outside the range of an > > unsigned int on 32 bit platforms. > > The (size > SIZE_MAX) check can guarantee excessively large num_sids > won't reach here. Now that I think about it, unsigned int is 32 bit even on arm64. So, I'm afraid this could (theoretically) overflow. On arm64, I don't think that the (size > SIZE_MAX) check will prevent this. > >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > >> index 1e9952ca989f..c8ceddc5e8ef 100644 > >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > >> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain { > >> ioasid_t ssid; > >> }; > >> > >> +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu) > >> +{ > >> + return (1ULL << smmu->sid_bits); > >> +} > >> + > > I'm wondering if it makes sense to move this up and put it right > > before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_* > > functions are in one place. > > I did it. But the function uses struct arm_smmu_device which is defined > after those arm_smmu_strtab_* helpers. I have to put the helper after > struct arm_smmu_device definition to avoid compile error. We may > consider re-organize the header file to group them better, but I don't > think it is urgent enough and it seems out of the scope of the bug fix > patch. I really want to have the bug fix landed in upstream ASAP. Understood. Thanks. We could move the changes in arm_smmu_init_strtab_linear() into a separate patch to accelerate the process. I'm fine either way, though. I don't want to get in the way of this landing upstream. > > > > > On a related note, in arm_smmu_init_strtab_2lvl() we're capping the > > number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream > > tables. I'm thinking it would make sense to limit the size of linear > > stream tables for the same reasons. > > Yes, this also works. But I don't know what value should be used. Jason > actually suggested (size > SIZE_512M) in v2 review, but I thought the > value is a magic number. Why 512M? Just because it is too large for > allocation. So I picked up SIZE_MAX, just because it is the largest size > supported by size_t type. I think it should be capped to STRTAB_MAX_L1_ENTRIES
On 10/4/24 6:03 PM, Daniel Mentz wrote: > On Fri, Oct 4, 2024 at 2:47 PM Yang Shi <yang@os.amperecomputing.com> wrote: >> On 10/4/24 2:14 PM, Daniel Mentz wrote: >>> On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote: >>>> static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) >>>> { >>>> - u32 size; >>>> + u64 size; >>>> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; >>>> + u64 num_sids = arm_smmu_strtab_num_sids(smmu); >>>> + >>>> + size = num_sids * sizeof(struct arm_smmu_ste); >>>> + /* The max size for dmam_alloc_coherent() is 32-bit */ >>> I'd remove this comment. I assume the intent here was to say that the >>> maximum size is 4GB (not 32 bit). I also can't find any reference to >>> this limitation. Where does dmam_alloc_coherent() limit the size of an >>> allocation to 4GB? Also, this comment might not be applicable to 64 >>> bit platforms. >> The "size" parameter passed to dmam_alloc_coherent() is size_t type >> which is unsigned int. > I believe that this is true only for 32 bit platforms. On arm64, > unsigned int is 32 bit, whereas size_t is 64 bit. I'm still in favor > of removing that comment, because it's not applicable to arm64. Thanks for figuring this out. Yes, you are right. I missed this point. > >>>> - cfg->linear.num_ents = 1 << smmu->sid_bits; >>>> + cfg->linear.num_ents = num_sids; >>> If you're worried about 32 bit platforms, then I'm wondering if this >>> also needs some attention. cfg->linear.num_ents is defined as an >>> unsigned int and num_sids could potentially be outside the range of an >>> unsigned int on 32 bit platforms. >> The (size > SIZE_MAX) check can guarantee excessively large num_sids >> won't reach here. > Now that I think about it, unsigned int is 32 bit even on arm64. So, > I'm afraid this could (theoretically) overflow. On arm64, I don't > think that the (size > SIZE_MAX) check will prevent this. Yes, SIZE_MAX is ~(size_t)0, but size_t is unsigned long on ARM64. So the check actually doesn't do what I expect it should do. U32_MAX should be used. > >>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >>>> index 1e9952ca989f..c8ceddc5e8ef 100644 >>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >>>> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain { >>>> ioasid_t ssid; >>>> }; >>>> >>>> +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu) >>>> +{ >>>> + return (1ULL << smmu->sid_bits); >>>> +} >>>> + >>> I'm wondering if it makes sense to move this up and put it right >>> before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_* >>> functions are in one place. >> I did it. But the function uses struct arm_smmu_device which is defined >> after those arm_smmu_strtab_* helpers. I have to put the helper after >> struct arm_smmu_device definition to avoid compile error. We may >> consider re-organize the header file to group them better, but I don't >> think it is urgent enough and it seems out of the scope of the bug fix >> patch. I really want to have the bug fix landed in upstream ASAP. > Understood. Thanks. We could move the changes in > arm_smmu_init_strtab_linear() into a separate patch to accelerate the > process. I'm fine either way, though. I don't want to get in the way > of this landing upstream. Thank you for your understanding. > >>> On a related note, in arm_smmu_init_strtab_2lvl() we're capping the >>> number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream >>> tables. I'm thinking it would make sense to limit the size of linear >>> stream tables for the same reasons. >> Yes, this also works. But I don't know what value should be used. Jason >> actually suggested (size > SIZE_512M) in v2 review, but I thought the >> value is a magic number. Why 512M? Just because it is too large for >> allocation. So I picked up SIZE_MAX, just because it is the largest size >> supported by size_t type. > I think it should be capped to STRTAB_MAX_L1_ENTRIES I'm not expert on SMMU. Does the linear stream table have the same cap as 2-level stream table? Is this defined by the hardware spec? If it is not, why should we pick this value?
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 737c5b882355..9d4fc91d9258 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) { u32 l1size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; + u64 num_sids = arm_smmu_strtab_num_sids(smmu); unsigned int last_sid_idx = - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1); + arm_smmu_strtab_l1_idx(num_sids - 1); /* Calculate the L1 size, capped to the SIDSIZE. */ cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES); @@ -3655,20 +3656,25 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) { - u32 size; + u64 size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; + u64 num_sids = arm_smmu_strtab_num_sids(smmu); + + size = num_sids * sizeof(struct arm_smmu_ste); + /* The max size for dmam_alloc_coherent() is 32-bit */ + if (size > SIZE_MAX) + return -EINVAL; - size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste); cfg->linear.table = dmam_alloc_coherent(smmu->dev, size, &cfg->linear.ste_dma, GFP_KERNEL); if (!cfg->linear.table) { dev_err(smmu->dev, - "failed to allocate linear stream table (%u bytes)\n", + "failed to allocate linear stream table (%llu bytes)\n", size); return -ENOMEM; } - cfg->linear.num_ents = 1 << smmu->sid_bits; + cfg->linear.num_ents = num_sids; arm_smmu_init_initial_stes(cfg->linear.table, cfg->linear.num_ents); return 0; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 1e9952ca989f..c8ceddc5e8ef 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -853,6 +853,11 @@ struct arm_smmu_master_domain { ioasid_t ssid; }; +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu) +{ + return (1ULL << smmu->sid_bits); +} + static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) { return container_of(dom, struct arm_smmu_domain, domain);