Message ID | 20230409063000.3559991-5-ricarkol@google.com
---|---
State | New, archived
Series | Implement Eager Page Splitting for ARM
On 4/9/23 2:29 PM, Ricardo Koller wrote:
> Add a stage2 helper, kvm_pgtable_stage2_create_unlinked(), for
> creating unlinked tables (which is the opposite of
> kvm_pgtable_stage2_free_unlinked()). Creating an unlinked table is
> useful for splitting level 1 and 2 entries into subtrees of PAGE_SIZE
> PTEs. For example, a level 1 entry can be split into PAGE_SIZE PTEs
> by first creating a fully populated tree, and then use it to replace
> the level 1 entry in a single step. This will be used in a subsequent
> commit for eager huge-page splitting (a dirty-logging optimization).
>
> Signed-off-by: Ricardo Koller <ricarkol@google.com>
> Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 26 +++++++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 49 ++++++++++++++++++++++++++++
>  2 files changed, 75 insertions(+)
>

With the following nits addressed:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 3f2d43ba2b628..c8e0e7d9303b2 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -458,6 +458,32 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
>   */
>  void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level);
>
> +/**
> + * kvm_pgtable_stage2_create_unlinked() - Create an unlinked stage-2 paging structure.
> + * @pgt:        Page-table structure initialised by kvm_pgtable_stage2_init*().
> + * @phys:       Physical address of the memory to map.
> + * @level:      Starting level of the stage-2 paging structure to be created.
> + * @prot:       Permissions and attributes for the mapping.
> + * @mc:         Cache of pre-allocated and zeroed memory from which to allocate
                  ^^^^^^^^
                  Alignment.
> + *              page-table pages.
> + * @force_pte:  Force mappings to PAGE_SIZE granularity.
> + *
> + * Returns an unlinked page-table tree. This new page-table tree is
> + * not reachable (i.e., it is unlinked) from the root pgd and it's
> + * therefore unreachable by the hardware page-table walker. No TLB
> + * invalidation or CMOs are performed.
> + *
> + * If device attributes are not explicitly requested in @prot, then the
> + * mapping will be normal, cacheable.
> + *
> + * Return: The fully populated (unlinked) stage-2 paging structure, or
> + * an ERR_PTR(error) on failure.
> + */
> +kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
> +                                              u64 phys, u32 level,
> +                                              enum kvm_pgtable_prot prot,
> +                                              void *mc, bool force_pte);
> +
>  /**
>   * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
>   * @pgt:        Page-table structure initialised by kvm_pgtable_stage2_init*().
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 633679ee3c49a..477d2be67d401 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -1222,6 +1222,55 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  	return kvm_pgtable_walk(pgt, addr, size, &walker);
>  }
>
> +kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
> +                                              u64 phys, u32 level,
> +                                              enum kvm_pgtable_prot prot,
> +                                              void *mc, bool force_pte)
> +{
> +	struct stage2_map_data map_data = {
> +		.phys		= phys,
> +		.mmu		= pgt->mmu,
> +		.memcache	= mc,
> +		.force_pte	= force_pte,
> +	};
> +	struct kvm_pgtable_walker walker = {
> +		.cb		= stage2_map_walker,
> +		.flags		= KVM_PGTABLE_WALK_LEAF |
> +				  KVM_PGTABLE_WALK_SKIP_BBM_TLBI |
> +				  KVM_PGTABLE_WALK_SKIP_CMO,
> +		.arg		= &map_data,
> +	};
> +	/* .addr (the IPA) is irrelevant for an unlinked table */
> +	struct kvm_pgtable_walk_data data = {
> +		.walker	= &walker,
> +		.addr	= 0,
> +		.end	= kvm_granule_size(level),
> +	};

The comment about '.addr' seems incorrect. The IPA address is still
used to locate the page table entry, so I think it would be something
like below:

    /* The IPA address (.addr) is relative to zero */

> +	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> +	kvm_pte_t *pgtable;
> +	int ret;
> +
> +	if (!IS_ALIGNED(phys, kvm_granule_size(level)))
> +		return ERR_PTR(-EINVAL);
> +
> +	ret = stage2_set_prot_attr(pgt, prot, &map_data.attr);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	pgtable = mm_ops->zalloc_page(mc);
> +	if (!pgtable)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ret = __kvm_pgtable_walk(&data, mm_ops, (kvm_pteref_t)pgtable,
> +				 level + 1);
> +	if (ret) {
> +		kvm_pgtable_stage2_free_unlinked(mm_ops, pgtable, level);
> +		mm_ops->put_page(pgtable);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return pgtable;
> +}
>
>  int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>  			      struct kvm_pgtable_mm_ops *mm_ops,
>

Thanks,
Gavin
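A rough way to reason about the @mc argument mentioned in the nits above: apart from the subtree root, every page-table page of the unlinked tree is drawn from the pre-filled memcache, so the caller has to top the cache up for the worst case before calling the helper. The standalone sketch below only illustrates that arithmetic; it assumes 4KiB granules (512 descriptors per table) and force_pte = true, and the helper name and constant are invented for the example, not part of the patch.

/*
 * Illustration only: rough page-count estimate for one call to
 * kvm_pgtable_stage2_create_unlinked() with force_pte = true,
 * assuming 4KiB granules. The exact accounting (e.g. whether the
 * subtree root is also served from the memcache) depends on the
 * mm_ops in use.
 */
#include <stdio.h>

#define PTES_PER_TABLE 512UL	/* 4KiB granule: 512 descriptors per page */

static unsigned long unlinked_tree_pages(int level)
{
	unsigned long pages = 1;	/* the subtree root at (level + 1) */

	/*
	 * Splitting a level-1 (1GiB) block needs one level-3 table for
	 * each of the 512 level-2 entries; a level-2 (2MiB) block is
	 * covered by the root table's 512 PTEs alone.
	 */
	if (level == 1)
		pages += PTES_PER_TABLE;

	return pages;
}

int main(void)
{
	printf("level 2 split: %lu page(s)\n", unlinked_tree_pages(2));	/* 1 */
	printf("level 1 split: %lu page(s)\n", unlinked_tree_pages(1));	/* 513 */
	return 0;
}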
On Mon, Apr 17, 2023 at 02:18:26PM +0800, Gavin Shan wrote:
> On 4/9/23 2:29 PM, Ricardo Koller wrote:
> > Add a stage2 helper, kvm_pgtable_stage2_create_unlinked(), for
> > creating unlinked tables (which is the opposite of
> > kvm_pgtable_stage2_free_unlinked()). Creating an unlinked table is
> > useful for splitting level 1 and 2 entries into subtrees of PAGE_SIZE
> > PTEs. For example, a level 1 entry can be split into PAGE_SIZE PTEs
> > by first creating a fully populated tree, and then use it to replace
> > the level 1 entry in a single step. This will be used in a subsequent
> > commit for eager huge-page splitting (a dirty-logging optimization).
> >
> > Signed-off-by: Ricardo Koller <ricarkol@google.com>
> > Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h | 26 +++++++++++++++
> >  arch/arm64/kvm/hyp/pgtable.c         | 49 ++++++++++++++++++++++++++++
> >  2 files changed, 75 insertions(+)
> >
>
> With the following nits addressed:
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
>
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index 3f2d43ba2b628..c8e0e7d9303b2 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -458,6 +458,32 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
> >   */
> >  void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level);
> >
> > +/**
> > + * kvm_pgtable_stage2_create_unlinked() - Create an unlinked stage-2 paging structure.
> > + * @pgt:        Page-table structure initialised by kvm_pgtable_stage2_init*().
> > + * @phys:       Physical address of the memory to map.
> > + * @level:      Starting level of the stage-2 paging structure to be created.
> > + * @prot:       Permissions and attributes for the mapping.
> > + * @mc:         Cache of pre-allocated and zeroed memory from which to allocate
>                    ^^^^^^^^
> Alignment.

This seems to be due to the "+ ". It looks like this without it:

 * kvm_pgtable_stage2_create_unlinked() - Create an unlinked stage-2 paging structure.
 * @pgt:        Page-table structure initialised by kvm_pgtable_stage2_init*().
 * @phys:       Physical address of the memory to map.
 * @level:      Starting level of the stage-2 paging structure to be created.
 * @prot:       Permissions and attributes for the mapping.
 * @mc:         Cache of pre-allocated and zeroed memory from which to allocate
 *              page-table pages.

>
> > + *              page-table pages.
> > + * @force_pte:  Force mappings to PAGE_SIZE granularity.
> > + *
> > + * Returns an unlinked page-table tree. This new page-table tree is
> > + * not reachable (i.e., it is unlinked) from the root pgd and it's
> > + * therefore unreachable by the hardware page-table walker. No TLB
> > + * invalidation or CMOs are performed.
> > + *
> > + * If device attributes are not explicitly requested in @prot, then the
> > + * mapping will be normal, cacheable.
> > + *
> > + * Return: The fully populated (unlinked) stage-2 paging structure, or
> > + * an ERR_PTR(error) on failure.
> > + */
> > +kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
> > +                                              u64 phys, u32 level,
> > +                                              enum kvm_pgtable_prot prot,
> > +                                              void *mc, bool force_pte);
> > +
> >  /**
> >   * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
> >   * @pgt:        Page-table structure initialised by kvm_pgtable_stage2_init*().
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 633679ee3c49a..477d2be67d401 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -1222,6 +1222,55 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
> >  	return kvm_pgtable_walk(pgt, addr, size, &walker);
> >  }
> >
> > +kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
> > +                                              u64 phys, u32 level,
> > +                                              enum kvm_pgtable_prot prot,
> > +                                              void *mc, bool force_pte)
> > +{
> > +	struct stage2_map_data map_data = {
> > +		.phys		= phys,
> > +		.mmu		= pgt->mmu,
> > +		.memcache	= mc,
> > +		.force_pte	= force_pte,
> > +	};
> > +	struct kvm_pgtable_walker walker = {
> > +		.cb		= stage2_map_walker,
> > +		.flags		= KVM_PGTABLE_WALK_LEAF |
> > +				  KVM_PGTABLE_WALK_SKIP_BBM_TLBI |
> > +				  KVM_PGTABLE_WALK_SKIP_CMO,
> > +		.arg		= &map_data,
> > +	};
> > +	/* .addr (the IPA) is irrelevant for an unlinked table */
> > +	struct kvm_pgtable_walk_data data = {
> > +		.walker	= &walker,
> > +		.addr	= 0,
> > +		.end	= kvm_granule_size(level),
> > +	};
>
> The comment about '.addr' seems incorrect. The IPA address is still
> used to locate the page table entry, so I think it would be something
> like below:
>
>     /* The IPA address (.addr) is relative to zero */
>

Extended it to say this:

 * The IPA address (.addr) is relative to zero. The goal is to
 * map "kvm_granule_size(level) - 0" worth of pages.

> > +	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> > +	kvm_pte_t *pgtable;
> > +	int ret;
> > +
> > +	if (!IS_ALIGNED(phys, kvm_granule_size(level)))
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	ret = stage2_set_prot_attr(pgt, prot, &map_data.attr);
> > +	if (ret)
> > +		return ERR_PTR(ret);
> > +
> > +	pgtable = mm_ops->zalloc_page(mc);
> > +	if (!pgtable)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	ret = __kvm_pgtable_walk(&data, mm_ops, (kvm_pteref_t)pgtable,
> > +				 level + 1);
> > +	if (ret) {
> > +		kvm_pgtable_stage2_free_unlinked(mm_ops, pgtable, level);
> > +		mm_ops->put_page(pgtable);
> > +		return ERR_PTR(ret);
> > +	}
> > +
> > +	return pgtable;
> > +}
> >
> >  int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
> >  			      struct kvm_pgtable_mm_ops *mm_ops,
> >
>
> Thanks,
> Gavin
>
On Sat, Apr 22, 2023 at 01:09:26PM -0700, Ricardo Koller wrote:
> On Mon, Apr 17, 2023 at 02:18:26PM +0800, Gavin Shan wrote:
> > > +	/* .addr (the IPA) is irrelevant for an unlinked table */
> > > +	struct kvm_pgtable_walk_data data = {
> > > +		.walker	= &walker,
> > > +		.addr	= 0,
> > > +		.end	= kvm_granule_size(level),
> > > +	};
> >
> > The comment about '.addr' seems incorrect. The IPA address is still
> > used to locate the page table entry, so I think it would be something
> > like below:
> >
> >     /* The IPA address (.addr) is relative to zero */
> >
>
> Extended it to say this:
>
>  * The IPA address (.addr) is relative to zero. The goal is to
>  * map "kvm_granule_size(level) - 0" worth of pages.

I actually prefer the original wording, as Gavin's suggestion makes this
comment read as though the IPA of the walk bears some degree of
validity, which it does not.

The intent of the code is to create some *ambiguous* input address
range, so maybe:

	/*
	 * The input address (.addr) is irrelevant for walking an
	 * unlinked table. Construct an ambiguous IA range to map
	 * kvm_granule_size(level) worth of memory.
	 */
On Sat, Apr 22, 2023 at 08:32:02PM +0000, Oliver Upton wrote:
> On Sat, Apr 22, 2023 at 01:09:26PM -0700, Ricardo Koller wrote:
> > On Mon, Apr 17, 2023 at 02:18:26PM +0800, Gavin Shan wrote:
> > > > +	/* .addr (the IPA) is irrelevant for an unlinked table */
> > > > +	struct kvm_pgtable_walk_data data = {
> > > > +		.walker	= &walker,
> > > > +		.addr	= 0,
> > > > +		.end	= kvm_granule_size(level),
> > > > +	};
> > >
> > > The comment about '.addr' seems incorrect. The IPA address is still
> > > used to locate the page table entry, so I think it would be something
> > > like below:
> > >
> > >     /* The IPA address (.addr) is relative to zero */
> > >
> >
> > Extended it to say this:
> >
> >  * The IPA address (.addr) is relative to zero. The goal is to
> >  * map "kvm_granule_size(level) - 0" worth of pages.
>
> I actually prefer the original wording, as Gavin's suggestion makes this
> comment read as though the IPA of the walk bears some degree of
> validity, which it does not.
>
> The intent of the code is to create some *ambiguous* input address
> range, so maybe:
>
> 	/*
> 	 * The input address (.addr) is irrelevant for walking an
> 	 * unlinked table. Construct an ambiguous IA range to map
> 	 * kvm_granule_size(level) worth of memory.
> 	 */
>

OK, this is the winner. Will go with this one in v8. Gavin, let me know
if you are not OK with this.

Thank you both,
Ricardo

> --
> Thanks,
> Oliver
On 4/23/23 4:37 AM, Ricardo Koller wrote:
> On Sat, Apr 22, 2023 at 08:32:02PM +0000, Oliver Upton wrote:
>> On Sat, Apr 22, 2023 at 01:09:26PM -0700, Ricardo Koller wrote:
>>> On Mon, Apr 17, 2023 at 02:18:26PM +0800, Gavin Shan wrote:
>>>>> +	/* .addr (the IPA) is irrelevant for an unlinked table */
>>>>> +	struct kvm_pgtable_walk_data data = {
>>>>> +		.walker	= &walker,
>>>>> +		.addr	= 0,
>>>>> +		.end	= kvm_granule_size(level),
>>>>> +	};
>>>>
>>>> The comment about '.addr' seems incorrect. The IPA address is still
>>>> used to locate the page table entry, so I think it would be something
>>>> like below:
>>>>
>>>>     /* The IPA address (.addr) is relative to zero */
>>>>
>>>
>>> Extended it to say this:
>>>
>>>  * The IPA address (.addr) is relative to zero. The goal is to
>>>  * map "kvm_granule_size(level) - 0" worth of pages.
>>
>> I actually prefer the original wording, as Gavin's suggestion makes this
>> comment read as though the IPA of the walk bears some degree of
>> validity, which it does not.
>>
>> The intent of the code is to create some *ambiguous* input address
>> range, so maybe:
>>
>> 	/*
>> 	 * The input address (.addr) is irrelevant for walking an
>> 	 * unlinked table. Construct an ambiguous IA range to map
>> 	 * kvm_granule_size(level) worth of memory.
>> 	 */
>>
>
> OK, this is the winner. Will go with this one in v8. Gavin, let me know
> if you are not OK with this.
>

Looks good to me either.

Thanks,
Gavin
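To make the agreed wording concrete: the walk that populates the unlinked tree always runs over the dummy input-address range [0, kvm_granule_size(level)), i.e. exactly one block's worth of IPA space, while the physical range being mapped starts at the caller-supplied @phys. The standalone snippet below just prints those range sizes; it assumes 4KiB granules, and the shift arithmetic is an approximation written for the example, not a quote of the kernel's kvm_granule_size() macro.

/*
 * Illustration only: the input-address range covered by the walk in
 * kvm_pgtable_stage2_create_unlinked() is [0, kvm_granule_size(level)).
 * Numbers assume 4KiB granules; other granule sizes give different
 * block sizes.
 */
#include <stdio.h>

static unsigned long long granule_size(int level)
{
	/* 4KiB granule: level 3 -> 4KiB, level 2 -> 2MiB, level 1 -> 1GiB */
	return 1ULL << ((3 - level) * 9 + 12);
}

int main(void)
{
	/* Splitting a level-2 block walks IA [0, 0x200000): 512 PTEs. */
	printf("level 2: [0, 0x%llx)\n", granule_size(2));
	/* Splitting a level-1 block walks IA [0, 0x40000000): 512 * 512 PTEs. */
	printf("level 1: [0, 0x%llx)\n", granule_size(1));
	return 0;
}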
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3f2d43ba2b628..c8e0e7d9303b2 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -458,6 +458,32 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
  */
 void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level);
 
+/**
+ * kvm_pgtable_stage2_create_unlinked() - Create an unlinked stage-2 paging structure.
+ * @pgt:        Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @phys:       Physical address of the memory to map.
+ * @level:      Starting level of the stage-2 paging structure to be created.
+ * @prot:       Permissions and attributes for the mapping.
+ * @mc:         Cache of pre-allocated and zeroed memory from which to allocate
+ *              page-table pages.
+ * @force_pte:  Force mappings to PAGE_SIZE granularity.
+ *
+ * Returns an unlinked page-table tree. This new page-table tree is
+ * not reachable (i.e., it is unlinked) from the root pgd and it's
+ * therefore unreachable by the hardware page-table walker. No TLB
+ * invalidation or CMOs are performed.
+ *
+ * If device attributes are not explicitly requested in @prot, then the
+ * mapping will be normal, cacheable.
+ *
+ * Return: The fully populated (unlinked) stage-2 paging structure, or
+ * an ERR_PTR(error) on failure.
+ */
+kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
+                                              u64 phys, u32 level,
+                                              enum kvm_pgtable_prot prot,
+                                              void *mc, bool force_pte);
+
 /**
  * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
  * @pgt:        Page-table structure initialised by kvm_pgtable_stage2_init*().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 633679ee3c49a..477d2be67d401 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1222,6 +1222,55 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 	return kvm_pgtable_walk(pgt, addr, size, &walker);
 }
 
+kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
+                                              u64 phys, u32 level,
+                                              enum kvm_pgtable_prot prot,
+                                              void *mc, bool force_pte)
+{
+	struct stage2_map_data map_data = {
+		.phys		= phys,
+		.mmu		= pgt->mmu,
+		.memcache	= mc,
+		.force_pte	= force_pte,
+	};
+	struct kvm_pgtable_walker walker = {
+		.cb		= stage2_map_walker,
+		.flags		= KVM_PGTABLE_WALK_LEAF |
+				  KVM_PGTABLE_WALK_SKIP_BBM_TLBI |
+				  KVM_PGTABLE_WALK_SKIP_CMO,
+		.arg		= &map_data,
+	};
+	/* .addr (the IPA) is irrelevant for an unlinked table */
+	struct kvm_pgtable_walk_data data = {
+		.walker	= &walker,
+		.addr	= 0,
+		.end	= kvm_granule_size(level),
+	};
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
+	kvm_pte_t *pgtable;
+	int ret;
+
+	if (!IS_ALIGNED(phys, kvm_granule_size(level)))
+		return ERR_PTR(-EINVAL);
+
+	ret = stage2_set_prot_attr(pgt, prot, &map_data.attr);
+	if (ret)
+		return ERR_PTR(ret);
+
+	pgtable = mm_ops->zalloc_page(mc);
+	if (!pgtable)
+		return ERR_PTR(-ENOMEM);
+
+	ret = __kvm_pgtable_walk(&data, mm_ops, (kvm_pteref_t)pgtable,
+				 level + 1);
+	if (ret) {
+		kvm_pgtable_stage2_free_unlinked(mm_ops, pgtable, level);
+		mm_ops->put_page(pgtable);
+		return ERR_PTR(ret);
+	}
+
+	return pgtable;
+}
 
 int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 			      struct kvm_pgtable_mm_ops *mm_ops,
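The commit message says this helper will be consumed by a later commit that implements eager huge-page splitting. The sketch below shows one way such a caller could be structured: build the fully populated subtree first, then swap it in for the block entry in a single step. It is an illustration only; split_block_entry() and stage2_install_table_entry() are hypothetical names, and the real series performs the replacement from a stage-2 walker callback, with break-before-make handled there.

/*
 * Hypothetical caller sketch (not from this series): split one huge
 * block mapping into an unlinked tree of PAGE_SIZE PTEs, then install
 * that tree in place of the block entry.
 */
static int split_block_entry(struct kvm_pgtable *pgt, kvm_pte_t *ptep,
			     u64 phys, u32 level, enum kvm_pgtable_prot prot,
			     void *mc)
{
	kvm_pte_t *childp;

	/*
	 * Build a fully populated subtree covering the same physical
	 * range with PAGE_SIZE PTEs. The subtree is not reachable from
	 * the root pgd yet, so no TLB invalidation or CMOs are needed
	 * while it is being built.
	 */
	childp = kvm_pgtable_stage2_create_unlinked(pgt, phys, level, prot,
						    mc, /* force_pte */ true);
	if (IS_ERR(childp))
		return PTR_ERR(childp);

	/*
	 * Replace the block entry with a table entry pointing at the new
	 * subtree in a single step. stage2_install_table_entry() is a
	 * made-up name standing in for the real walker callback.
	 */
	stage2_install_table_entry(pgt, ptep, childp);

	return 0;
}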