Message ID | 20240329212444.395559-6-michael.roth@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | gmem fix-ups and interfaces for populating gmem pages |
On 3/29/24 10:24 PM, Michael Roth wrote:
> truncate_inode_pages_range() may attempt to zero pages before truncating
> them, and this will occur before arch-specific invalidations can be
> triggered via .invalidate_folio/.free_folio hooks via kvm_gmem_aops. For
> AMD SEV-SNP this would result in an RMP #PF being generated by the
> hardware, which is currently treated as fatal (and even if specifically
> allowed for, would not result in anything other than garbage being
> written to guest pages due to encryption). On Intel TDX this would also
> result in undesirable behavior.
>
> Set the AS_INACCESSIBLE flag to prevent the MM from attempting
> unexpected accesses of this sort during operations like truncation.
>
> This may also in some cases yield a decent performance improvement for
> guest_memfd userspace implementations that hole-punch ranges immediately
> after private->shared conversions via KVM_SET_MEMORY_ATTRIBUTES, since
> the current implementation of truncate_inode_pages_range() always ends
> up zero'ing an entire 4K range if it is backed by a 2M folio.
>
> Link: https://lore.kernel.org/lkml/ZR9LYhpxTaTk6PJX@google.com/
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  virt/kvm/guest_memfd.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 4ce0056d1149..3668a5f1d82b 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -428,6 +428,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>  	inode->i_private = (void *)(unsigned long)flags;
>  	inode->i_op = &kvm_gmem_iops;
>  	inode->i_mapping->a_ops = &kvm_gmem_aops;
> +	inode->i_mapping->flags |= AS_INACCESSIBLE;
>  	inode->i_mode |= S_IFREG;
>  	inode->i_size = size;
>  	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
truncate_inode_pages_range() may attempt to zero pages before truncating
them, and this will occur before arch-specific invalidations can be
triggered via .invalidate_folio/.free_folio hooks via kvm_gmem_aops. For
AMD SEV-SNP this would result in an RMP #PF being generated by the
hardware, which is currently treated as fatal (and even if specifically
allowed for, would not result in anything other than garbage being
written to guest pages due to encryption). On Intel TDX this would also
result in undesirable behavior.

Set the AS_INACCESSIBLE flag to prevent the MM from attempting
unexpected accesses of this sort during operations like truncation.

This may also in some cases yield a decent performance improvement for
guest_memfd userspace implementations that hole-punch ranges immediately
after private->shared conversions via KVM_SET_MEMORY_ATTRIBUTES, since
the current implementation of truncate_inode_pages_range() always ends
up zero'ing an entire 4K range if it is backed by a 2M folio.

Link: https://lore.kernel.org/lkml/ZR9LYhpxTaTk6PJX@google.com/
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 virt/kvm/guest_memfd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 4ce0056d1149..3668a5f1d82b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -428,6 +428,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	inode->i_private = (void *)(unsigned long)flags;
 	inode->i_op = &kvm_gmem_iops;
 	inode->i_mapping->a_ops = &kvm_gmem_aops;
+	inode->i_mapping->flags |= AS_INACCESSIBLE;
 	inode->i_mode |= S_IFREG;
 	inode->i_size = size;
 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
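
For readers unfamiliar with the effect described in the commit message, the following is a minimal userspace sketch (not kernel code) of the behavior the flag is meant to produce: a partial truncation of a large folio normally zeroes the punched range, and is skipped when the mapping is marked inaccessible. All `toy_`/`TOY_` names are hypothetical stand-ins, and the placement of the check is an assumption about how the MM side might honor the flag; the patch above only sets the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel concepts; not the kernel's definitions. */
#define TOY_AS_INACCESSIBLE (1UL << 0)
#define TOY_FOLIO_SIZE      (2UL * 1024 * 1024)   /* models a 2M folio */

struct toy_mapping {
	unsigned long flags;
};

struct toy_folio {
	struct toy_mapping *mapping;
	unsigned char *data;          /* TOY_FOLIO_SIZE bytes of backing memory */
};

static bool toy_mapping_inaccessible(const struct toy_mapping *m)
{
	return (m->flags & TOY_AS_INACCESSIBLE) != 0;
}

/*
 * Models punching a hole that covers only part of a large folio.
 * Returns the number of bytes written by the (toy) MM layer.
 */
static size_t toy_truncate_partial(struct toy_folio *folio,
				   size_t offset, size_t len)
{
	/* The range must lie entirely inside the folio. */
	assert(offset <= TOY_FOLIO_SIZE && len <= TOY_FOLIO_SIZE - offset);

	/* Assumed check: never touch memory the host must not access. */
	if (toy_mapping_inaccessible(folio->mapping))
		return 0;

	/* Default path: zero the punched range in place. */
	memset(folio->data + offset, 0, len);
	return len;
}
```

With `flags` left at 0, punching a 4K range at offset 4096 writes 4096 zero bytes; with `TOY_AS_INACCESSIBLE` set, the same call writes nothing, which in this model corresponds to both the avoided RMP #PF and the saved zeroing work.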