Message ID | 20240701110241.2005222-10-smostafa@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | SMMUv3 nested translation support | expand |
On Mon, Jul 01, 2024 at 11:02:31AM +0000, Mostafa Saleh wrote: > In the next patch, combine_tlb() will be added which combines 2 TLB > entries into one for nested translations, which chooses the granule > and level from the smallest entry. > > This means that with nested translation, an entry can be cached with > the granule of stage-2 and not stage-1. > > However, currently, the lookup for an IOVA is done with input stage > granule, which is stage-1 for nested configuration, which will not > work with the above logic. > This patch reworks lookup in that case, so it falls back to stage-2 > granule if no entry is found using stage-1 granule. Why not initialize tt_combined to the minimum granule of stages 1 and 2? It looks like you introduced it for this. I'm wondering if we lookup the wrong IOVA if changing the granule size after the address is masked in smmu_translate() Thanks, Jean > > Signed-off-by: Mostafa Saleh <smostafa@google.com> > --- > hw/arm/smmu-common.c | 36 ++++++++++++++++++++++++++++++++++-- > 1 file changed, 34 insertions(+), 2 deletions(-) > > diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c > index 21982621c0..0840b5cffd 100644 > --- a/hw/arm/smmu-common.c > +++ b/hw/arm/smmu-common.c > @@ -66,8 +66,10 @@ SMMUIOTLBKey smmu_get_iotlb_key(int asid, int vmid, uint64_t iova, > return key; > } > > -SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > - SMMUTransTableInfo *tt, hwaddr iova) > +static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs, > + SMMUTransCfg *cfg, > + SMMUTransTableInfo *tt, > + hwaddr iova) > { > uint8_t tg = (tt->granule_sz - 10) / 2; > uint8_t inputsize = 64 - tt->tsz; > @@ -88,6 +90,36 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > } > level++; > } > + return entry; > +} > + > +/** > + * smmu_iotlb_lookup - Look up for a TLB entry. > + * @bs: SMMU state which includes the TLB instance > + * @cfg: Configuration of the translation > + * @tt: Translation table info (granule and tsz) > + * @iova: IOVA address to lookup > + * > + * returns a valid entry on success, otherwise NULL. > + * In case of nested translation, tt can be updated to include > + * the granule of the found entry as it might different from > + * the IOVA granule. > + */ > +SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > + SMMUTransTableInfo *tt, hwaddr iova) > +{ > + SMMUTLBEntry *entry = NULL; > + > + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova); > + /* > + * For nested translation also try the s2 granule, as the TLB will insert > + * it if the size of s2 tlb entry was smaller. > + */ > + if (!entry && (cfg->stage == SMMU_NESTED) && > + (cfg->s2cfg.granule_sz != tt->granule_sz)) { > + tt->granule_sz = cfg->s2cfg.granule_sz; > + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova); > + } > > if (entry) { > cfg->iotlb_hits++; > -- > 2.45.2.803.g4e1b14247a-goog >
Hi Jean, On Thu, Jul 04, 2024 at 07:12:35PM +0100, Jean-Philippe Brucker wrote: > On Mon, Jul 01, 2024 at 11:02:31AM +0000, Mostafa Saleh wrote: > > In the next patch, combine_tlb() will be added which combines 2 TLB > > entries into one for nested translations, which chooses the granule > > and level from the smallest entry. > > > > This means that with nested translation, an entry can be cached with > > the granule of stage-2 and not stage-1. > > > > However, currently, the lookup for an IOVA is done with input stage > > granule, which is stage-1 for nested configuration, which will not > > work with the above logic. > > This patch reworks lookup in that case, so it falls back to stage-2 > > granule if no entry is found using stage-1 granule. > > Why not initialize tt_combined to the minimum granule of stages 1 and 2? > It looks like you introduced it for this. I'm wondering if we lookup the > wrong IOVA if changing the granule size after the address is masked in > smmu_translate() I am not sure I fully understand, but I don’t think that would work as it is not guaranteed that the minimum granule is the one that would be cached, as we might hit block mappings. The IOVA at first is masked with the first stage mask for the expected page address, and the lookup logic would mask the address for each level look up, so It should match the alignment of the cached page of that granule and level, and as the combine logic is done with the aligned_addr it is guaranteed by construction that it has to be aligned with stage-1. Thanks, Mostafa > > Thanks, > Jean > > > > > Signed-off-by: Mostafa Saleh <smostafa@google.com> > > --- > > hw/arm/smmu-common.c | 36 ++++++++++++++++++++++++++++++++++-- > > 1 file changed, 34 insertions(+), 2 deletions(-) > > > > diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c > > index 21982621c0..0840b5cffd 100644 > > --- a/hw/arm/smmu-common.c > > +++ b/hw/arm/smmu-common.c > > @@ -66,8 +66,10 @@ SMMUIOTLBKey smmu_get_iotlb_key(int asid, int vmid, uint64_t iova, > > return key; > > } > > > > -SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > > - SMMUTransTableInfo *tt, hwaddr iova) > > +static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs, > > + SMMUTransCfg *cfg, > > + SMMUTransTableInfo *tt, > > + hwaddr iova) > > { > > uint8_t tg = (tt->granule_sz - 10) / 2; > > uint8_t inputsize = 64 - tt->tsz; > > @@ -88,6 +90,36 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > > } > > level++; > > } > > + return entry; > > +} > > + > > +/** > > + * smmu_iotlb_lookup - Look up for a TLB entry. > > + * @bs: SMMU state which includes the TLB instance > > + * @cfg: Configuration of the translation > > + * @tt: Translation table info (granule and tsz) > > + * @iova: IOVA address to lookup > > + * > > + * returns a valid entry on success, otherwise NULL. > > + * In case of nested translation, tt can be updated to include > > + * the granule of the found entry as it might different from > > + * the IOVA granule. > > + */ > > +SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > > + SMMUTransTableInfo *tt, hwaddr iova) > > +{ > > + SMMUTLBEntry *entry = NULL; > > + > > + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova); > > + /* > > + * For nested translation also try the s2 granule, as the TLB will insert > > + * it if the size of s2 tlb entry was smaller. > > + */ > > + if (!entry && (cfg->stage == SMMU_NESTED) && > > + (cfg->s2cfg.granule_sz != tt->granule_sz)) { > > + tt->granule_sz = cfg->s2cfg.granule_sz; > > + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova); > > + } > > > > if (entry) { > > cfg->iotlb_hits++; > > -- > > 2.45.2.803.g4e1b14247a-goog > >
On Tue, Jul 09, 2024 at 07:14:19AM +0000, Mostafa Saleh wrote: > Hi Jean, > > On Thu, Jul 04, 2024 at 07:12:35PM +0100, Jean-Philippe Brucker wrote: > > On Mon, Jul 01, 2024 at 11:02:31AM +0000, Mostafa Saleh wrote: > > > In the next patch, combine_tlb() will be added which combines 2 TLB > > > entries into one for nested translations, which chooses the granule > > > and level from the smallest entry. > > > > > > This means that with nested translation, an entry can be cached with > > > the granule of stage-2 and not stage-1. > > > > > > However, currently, the lookup for an IOVA is done with input stage > > > granule, which is stage-1 for nested configuration, which will not > > > work with the above logic. > > > This patch reworks lookup in that case, so it falls back to stage-2 > > > granule if no entry is found using stage-1 granule. > > > > Why not initialize tt_combined to the minimum granule of stages 1 and 2? > > It looks like you introduced it for this. I'm wondering if we lookup the > > wrong IOVA if changing the granule size after the address is masked in > > smmu_translate() > > I am not sure I fully understand, but I don’t think that would work as it is > not guaranteed that the minimum granule is the one that would be cached, > as we might hit block mappings. > > The IOVA at first is masked with the first stage mask for the expected page > address, and the lookup logic would mask the address for each level look up, > so It should match the alignment of the cached page of that granule and level, > and as the combine logic is done with the aligned_addr it is guaranteed by > construction that it has to be aligned with stage-1. I missed something, this is what I had in mind initially: * s1 granule is 64k, s2 granule is 4k * the tlb already contains a translations for IOVA 0x30000, tg=4k * now we lookup IOVA 0x31000. Masked with the s1 granule, aligned_addr is 0x30000. Not found at first because lookup is with tg=64k, but then we call smmu_iotlb_lookup_all_levels() again with the s2 granule and the same IOVA, which returns the wrong translation But it's not actually possible, because if cfg->stage == SMMU_NESTED, then in smmu_translate() we end up with } else { /* Stage2. */ tt_combined.granule_sz = cfg->s2cfg.granule_sz; So I think the condition (cfg->stage == SMMU_NESTED) && (cfg->s2cfg.granule_sz != tt->granule_sz) in this patch is never true? Then the following scenario: * s1 granule is 4k, s2 granule is 64k * we lookup IOVA A, miss. The translation gets cached with granule 4k * we lookup IOVA A again, but with tt->granule_sz = 64k so we'll never find the entry? I guess we want to start the lookup with the smallest granule, and then if the s1 and s2 granules differ, retry with the other one. Or with SMMU_NESTED, start with the s1 granule and keep this patch to fallback to s2 granule, but without masking the IOVA in smmu_translate() (it will be masked correctly by smmu_iotlb_lookup_all_levels()). Thanks, Jean > > Thanks, > Mostafa > > > > > Thanks, > > Jean > > > > > > > > Signed-off-by: Mostafa Saleh <smostafa@google.com> > > > --- > > > hw/arm/smmu-common.c | 36 ++++++++++++++++++++++++++++++++++-- > > > 1 file changed, 34 insertions(+), 2 deletions(-) > > > > > > diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c > > > index 21982621c0..0840b5cffd 100644 > > > --- a/hw/arm/smmu-common.c > > > +++ b/hw/arm/smmu-common.c > > > @@ -66,8 +66,10 @@ SMMUIOTLBKey smmu_get_iotlb_key(int asid, int vmid, uint64_t iova, > > > return key; > > > } > > > > > > -SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > > > - SMMUTransTableInfo *tt, hwaddr iova) > > > +static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs, > > > + SMMUTransCfg *cfg, > > > + SMMUTransTableInfo *tt, > > > + hwaddr iova) > > > { > > > uint8_t tg = (tt->granule_sz - 10) / 2; > > > uint8_t inputsize = 64 - tt->tsz; > > > @@ -88,6 +90,36 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > > > } > > > level++; > > > } > > > + return entry; > > > +} > > > + > > > +/** > > > + * smmu_iotlb_lookup - Look up for a TLB entry. > > > + * @bs: SMMU state which includes the TLB instance > > > + * @cfg: Configuration of the translation > > > + * @tt: Translation table info (granule and tsz) > > > + * @iova: IOVA address to lookup > > > + * > > > + * returns a valid entry on success, otherwise NULL. > > > + * In case of nested translation, tt can be updated to include > > > + * the granule of the found entry as it might different from > > > + * the IOVA granule. > > > + */ > > > +SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > > > + SMMUTransTableInfo *tt, hwaddr iova) > > > +{ > > > + SMMUTLBEntry *entry = NULL; > > > + > > > + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova); > > > + /* > > > + * For nested translation also try the s2 granule, as the TLB will insert > > > + * it if the size of s2 tlb entry was smaller. > > > + */ > > > + if (!entry && (cfg->stage == SMMU_NESTED) && > > > + (cfg->s2cfg.granule_sz != tt->granule_sz)) { > > > + tt->granule_sz = cfg->s2cfg.granule_sz; > > > + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova); > > > + } > > > > > > if (entry) { > > > cfg->iotlb_hits++; > > > -- > > > 2.45.2.803.g4e1b14247a-goog > > >
Hi Jean, On Tue, Jul 09, 2024 at 06:13:45PM +0100, Jean-Philippe Brucker wrote: > On Tue, Jul 09, 2024 at 07:14:19AM +0000, Mostafa Saleh wrote: > > Hi Jean, > > > > On Thu, Jul 04, 2024 at 07:12:35PM +0100, Jean-Philippe Brucker wrote: > > > On Mon, Jul 01, 2024 at 11:02:31AM +0000, Mostafa Saleh wrote: > > > > In the next patch, combine_tlb() will be added which combines 2 TLB > > > > entries into one for nested translations, which chooses the granule > > > > and level from the smallest entry. > > > > > > > > This means that with nested translation, an entry can be cached with > > > > the granule of stage-2 and not stage-1. > > > > > > > > However, currently, the lookup for an IOVA is done with input stage > > > > granule, which is stage-1 for nested configuration, which will not > > > > work with the above logic. > > > > This patch reworks lookup in that case, so it falls back to stage-2 > > > > granule if no entry is found using stage-1 granule. > > > > > > Why not initialize tt_combined to the minimum granule of stages 1 and 2? > > > It looks like you introduced it for this. I'm wondering if we lookup the > > > wrong IOVA if changing the granule size after the address is masked in > > > smmu_translate() > > > > I am not sure I fully understand, but I don’t think that would work as it is > > not guaranteed that the minimum granule is the one that would be cached, > > as we might hit block mappings. > > > > The IOVA at first is masked with the first stage mask for the expected page > > address, and the lookup logic would mask the address for each level look up, > > so It should match the alignment of the cached page of that granule and level, > > and as the combine logic is done with the aligned_addr it is guaranteed by > > construction that it has to be aligned with stage-1. > > I missed something, this is what I had in mind initially: > > * s1 granule is 64k, s2 granule is 4k > * the tlb already contains a translations for IOVA 0x30000, tg=4k > * now we lookup IOVA 0x31000. Masked with the s1 granule, aligned_addr is > 0x30000. Not found at first because lookup is with tg=64k, but then we > call smmu_iotlb_lookup_all_levels() again with the s2 granule and the > same IOVA, which returns the wrong translation If the granules are s1=64k, s2=4k, the only way we get a cached entry as (IOVA 0x30000, tg=4k) would be for s2 and level-3 as for level-2 it has to be aligned with 0x200000 So when we look up for 0x31000, there is no entry for it anyway. But I can see some problems here: In case also s1 granule is 64k, s2 granule is 4k - Translation A: 0x31000 - TLB is empty => PTW, entry s1 = 64k 0x30000, s2 = 4k, 0x30000 and the cached entry would be 0x30000,tg=4k as the combine logic also uses the aligned address - Translation B: 0x31000 => also misses as the only cached entry is 0x30000, 4k I think this is actually a bug and not just a TLB inefficiency, I need to think more about it, but my initial thought is not to align the iova until it’s used by a stage so it can use its granule. > > But it's not actually possible, because if cfg->stage == SMMU_NESTED, then > in smmu_translate() we end up with > > } else { > /* Stage2. */ > tt_combined.granule_sz = cfg->s2cfg.granule_sz; > > So I think the condition > > (cfg->stage == SMMU_NESTED) && (cfg->s2cfg.granule_sz != tt->granule_sz) > > in this patch is never true? > Ah, that’s a bug, I will fix it, NESTED should use stage-1 granule. > > Then the following scenario: > > * s1 granule is 4k, s2 granule is 64k > * we lookup IOVA A, miss. The translation gets cached with granule 4k > * we lookup IOVA A again, but with tt->granule_sz = 64k so we'll > never find the entry? > > > I guess we want to start the lookup with the smallest granule, and then if > the s1 and s2 granules differ, retry with the other one. Or with > SMMU_NESTED, start with the s1 granule and keep this patch to fallback to > s2 granule, but without masking the IOVA in smmu_translate() (it will be > masked correctly by smmu_iotlb_lookup_all_levels()). Thanks for pointing that out, I will think more about it but I sense that we would need to modify where we align the iova, for translation and lookup. Thanks, Mostafa > > Thanks, > Jean > > > > > Thanks, > > Mostafa > > > > > > > > Thanks, > > > Jean > > > > > > > > > > > Signed-off-by: Mostafa Saleh <smostafa@google.com> > > > > --- > > > > hw/arm/smmu-common.c | 36 ++++++++++++++++++++++++++++++++++-- > > > > 1 file changed, 34 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c > > > > index 21982621c0..0840b5cffd 100644 > > > > --- a/hw/arm/smmu-common.c > > > > +++ b/hw/arm/smmu-common.c > > > > @@ -66,8 +66,10 @@ SMMUIOTLBKey smmu_get_iotlb_key(int asid, int vmid, uint64_t iova, > > > > return key; > > > > } > > > > > > > > -SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > > > > - SMMUTransTableInfo *tt, hwaddr iova) > > > > +static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs, > > > > + SMMUTransCfg *cfg, > > > > + SMMUTransTableInfo *tt, > > > > + hwaddr iova) > > > > { > > > > uint8_t tg = (tt->granule_sz - 10) / 2; > > > > uint8_t inputsize = 64 - tt->tsz; > > > > @@ -88,6 +90,36 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > > > > } > > > > level++; > > > > } > > > > + return entry; > > > > +} > > > > + > > > > +/** > > > > + * smmu_iotlb_lookup - Look up for a TLB entry. > > > > + * @bs: SMMU state which includes the TLB instance > > > > + * @cfg: Configuration of the translation > > > > + * @tt: Translation table info (granule and tsz) > > > > + * @iova: IOVA address to lookup > > > > + * > > > > + * returns a valid entry on success, otherwise NULL. > > > > + * In case of nested translation, tt can be updated to include > > > > + * the granule of the found entry as it might different from > > > > + * the IOVA granule. > > > > + */ > > > > +SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, > > > > + SMMUTransTableInfo *tt, hwaddr iova) > > > > +{ > > > > + SMMUTLBEntry *entry = NULL; > > > > + > > > > + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova); > > > > + /* > > > > + * For nested translation also try the s2 granule, as the TLB will insert > > > > + * it if the size of s2 tlb entry was smaller. > > > > + */ > > > > + if (!entry && (cfg->stage == SMMU_NESTED) && > > > > + (cfg->s2cfg.granule_sz != tt->granule_sz)) { > > > > + tt->granule_sz = cfg->s2cfg.granule_sz; > > > > + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova); > > > > + } > > > > > > > > if (entry) { > > > > cfg->iotlb_hits++; > > > > -- > > > > 2.45.2.803.g4e1b14247a-goog > > > >
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c index 21982621c0..0840b5cffd 100644 --- a/hw/arm/smmu-common.c +++ b/hw/arm/smmu-common.c @@ -66,8 +66,10 @@ SMMUIOTLBKey smmu_get_iotlb_key(int asid, int vmid, uint64_t iova, return key; } -SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, - SMMUTransTableInfo *tt, hwaddr iova) +static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs, + SMMUTransCfg *cfg, + SMMUTransTableInfo *tt, + hwaddr iova) { uint8_t tg = (tt->granule_sz - 10) / 2; uint8_t inputsize = 64 - tt->tsz; @@ -88,6 +90,36 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, } level++; } + return entry; +} + +/** + * smmu_iotlb_lookup - Look up for a TLB entry. + * @bs: SMMU state which includes the TLB instance + * @cfg: Configuration of the translation + * @tt: Translation table info (granule and tsz) + * @iova: IOVA address to lookup + * + * returns a valid entry on success, otherwise NULL. + * In case of nested translation, tt can be updated to include + * the granule of the found entry as it might different from + * the IOVA granule. + */ +SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg, + SMMUTransTableInfo *tt, hwaddr iova) +{ + SMMUTLBEntry *entry = NULL; + + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova); + /* + * For nested translation also try the s2 granule, as the TLB will insert + * it if the size of s2 tlb entry was smaller. + */ + if (!entry && (cfg->stage == SMMU_NESTED) && + (cfg->s2cfg.granule_sz != tt->granule_sz)) { + tt->granule_sz = cfg->s2cfg.granule_sz; + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova); + } if (entry) { cfg->iotlb_hits++;
In the next patch, combine_tlb() will be added which combines 2 TLB entries into one for nested translations, which chooses the granule and level from the smallest entry. This means that with nested translation, an entry can be cached with the granule of stage-2 and not stage-1. However, currently, the lookup for an IOVA is done with input stage granule, which is stage-1 for nested configuration, which will not work with the above logic. This patch reworks lookup in that case, so it falls back to stage-2 granule if no entry is found using stage-1 granule. Signed-off-by: Mostafa Saleh <smostafa@google.com> --- hw/arm/smmu-common.c | 36 ++++++++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-)