diff mbox series

KVM: arm64: Drop mte_allowed check during memslot creation

Message ID 20250224093938.3934386-1-aneesh.kumar@kernel.org (mailing list archive)
State New
Headers show
Series KVM: arm64: Drop mte_allowed check during memslot creation | expand

Commit Message

Aneesh Kumar K.V Feb. 24, 2025, 9:39 a.m. UTC
Before commit d89585fbb308 ("KVM: arm64: unify the tests for VMAs in
memslots when MTE is enabled"), kvm_arch_prepare_memory_region() only
rejected a memory slot if VM_SHARED was set. This commit unified the
checking with user_mem_abort(), with slots being rejected if either
VM_MTE_ALLOWED is not set or VM_SHARED set. A subsequent commit
c911f0d46879 ("KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE
enabled") dropped the VM_SHARED check, so we ended up with memory slots
being rejected if VM_MTE_ALLOWED is not set. This wasn't the case before
the commit d89585fbb308. The rejection of the memory slot with VM_SHARED
set was done to avoid a race condition with the test/set of the
PG_mte_tagged flag. Before Commit d77e59a8fccd ("arm64: mte: Lock a page
for MTE tag initialization") the kernel avoided allowing MTE with shared
pages, thereby preventing two tasks sharing a page from setting up the
PG_mte_tagged flag racily.

Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag
initialization") further updated the locking so that the kernel
allows VM_SHARED mapping with MTE. With this commit, we can enable
memslot creation with VM_SHARED VMA mapping.

This patch results in a minor tweak to the ABI. We now allow creating
memslots that don't have the VM_MTE_ALLOWED flag set. If the guest uses
such a memslot with Allocation Tags, the kernel will generate -EFAULT.
ie, instead of failing early, we now fail later during KVM_RUN.

This change is needed because, without it, users are not able to use MTE
with VFIO passthrough (currently the mapping is either Device or
NonCacheable for which tag access check is not applied.), as shown
below (kvmtool VMM).

[  617.921030] vfio-pci 0000:01:00.0: resetting
[  618.024719] vfio-pci 0000:01:00.0: reset done
  Error: 0000:01:00.0: failed to register region with KVM
  Warning: [0abc:aced] Error activating emulation for BAR 0
  Error: 0000:01:00.0: failed to configure regions
  Warning: Failed init: vfio__init

  Fatal: Initialisation failed

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 arch/arm64/kvm/mmu.c | 5 -----
 1 file changed, 5 deletions(-)

Comments

Suzuki K Poulose Feb. 24, 2025, 10:32 a.m. UTC | #1
On 24/02/2025 09:39, Aneesh Kumar K.V (Arm) wrote:
> Before commit d89585fbb308 ("KVM: arm64: unify the tests for VMAs in
> memslots when MTE is enabled"), kvm_arch_prepare_memory_region() only
> rejected a memory slot if VM_SHARED was set. This commit unified the
> checking with user_mem_abort(), with slots being rejected if either
> VM_MTE_ALLOWED is not set or VM_SHARED set. A subsequent commit
> c911f0d46879 ("KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE
> enabled") dropped the VM_SHARED check, so we ended up with memory slots
> being rejected if VM_MTE_ALLOWED is not set. This wasn't the case before
> the commit d89585fbb308. The rejection of the memory slot with VM_SHARED
> set was done to avoid a race condition with the test/set of the
> PG_mte_tagged flag. Before Commit d77e59a8fccd ("arm64: mte: Lock a page
> for MTE tag initialization") the kernel avoided allowing MTE with shared
> pages, thereby preventing two tasks sharing a page from setting up the
> PG_mte_tagged flag racily.
> 
> Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag
> initialization") further updated the locking so that the kernel
> allows VM_SHARED mapping with MTE. With this commit, we can enable
> memslot creation with VM_SHARED VMA mapping.
> 
> This patch results in a minor tweak to the ABI. We now allow creating
> memslots that don't have the VM_MTE_ALLOWED flag set. If the guest uses
> such a memslot with Allocation Tags, the kernel will generate -EFAULT.
> ie, instead of failing early, we now fail later during KVM_RUN.
> 
> This change is needed because, without it, users are not able to use MTE
> with VFIO passthrough (currently the mapping is either Device or
> NonCacheable for which tag access check is not applied.), as shown
> below (kvmtool VMM).
> 
> [  617.921030] vfio-pci 0000:01:00.0: resetting
> [  618.024719] vfio-pci 0000:01:00.0: reset done
>    Error: 0000:01:00.0: failed to register region with KVM
>    Warning: [0abc:aced] Error activating emulation for BAR 0
>    Error: 0000:01:00.0: failed to configure regions
>    Warning: Failed init: vfio__init
> 
>    Fatal: Initialisation failed
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>

Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Catalin Marinas Feb. 24, 2025, 11:05 a.m. UTC | #2
On Mon, Feb 24, 2025 at 03:09:38PM +0530, Aneesh Kumar K.V (Arm) wrote:
> Before commit d89585fbb308 ("KVM: arm64: unify the tests for VMAs in
> memslots when MTE is enabled"), kvm_arch_prepare_memory_region() only
> rejected a memory slot if VM_SHARED was set. This commit unified the
> checking with user_mem_abort(), with slots being rejected if either
> VM_MTE_ALLOWED is not set or VM_SHARED set. A subsequent commit
> c911f0d46879 ("KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE
> enabled") dropped the VM_SHARED check, so we ended up with memory slots
> being rejected if VM_MTE_ALLOWED is not set. This wasn't the case before
> the commit d89585fbb308. The rejection of the memory slot with VM_SHARED
> set was done to avoid a race condition with the test/set of the
> PG_mte_tagged flag. Before Commit d77e59a8fccd ("arm64: mte: Lock a page
> for MTE tag initialization") the kernel avoided allowing MTE with shared
> pages, thereby preventing two tasks sharing a page from setting up the
> PG_mte_tagged flag racily.
> 
> Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag
> initialization") further updated the locking so that the kernel
> allows VM_SHARED mapping with MTE. With this commit, we can enable
> memslot creation with VM_SHARED VMA mapping.
> 
> This patch results in a minor tweak to the ABI. We now allow creating
> memslots that don't have the VM_MTE_ALLOWED flag set.

As I commented here:

https://lore.kernel.org/r/Z4e04P1bQlFBDHo7@arm.com

I'm fine with the change, we basically go back to the original ABI prior
to relaxing this for VM_SHARED.

> If the guest uses
> such a memslot with Allocation Tags, the kernel will generate -EFAULT.
> ie, instead of failing early, we now fail later during KVM_RUN.

Nit: more like the kernel "will return -EFAULT" to the VMM rather than
"generate".

> This change is needed because, without it, users are not able to use MTE
> with VFIO passthrough (currently the mapping is either Device or
> NonCacheable for which tag access check is not applied.), as shown
> below (kvmtool VMM).

Another nit: "users are not able to user VFIO passthrough when MTE is
enabled". At a first read, the above sounded to me like one wants to
enable MTE for VFIO passthrough mappings.

> [  617.921030] vfio-pci 0000:01:00.0: resetting
> [  618.024719] vfio-pci 0000:01:00.0: reset done
>   Error: 0000:01:00.0: failed to register region with KVM
>   Warning: [0abc:aced] Error activating emulation for BAR 0
>   Error: 0000:01:00.0: failed to configure regions
>   Warning: Failed init: vfio__init
> 
>   Fatal: Initialisation failed
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
>  arch/arm64/kvm/mmu.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index ef53af6df6de..1f1b5aa43d2d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -2178,11 +2178,6 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  		if (!vma)
>  			break;
>  
> -		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
> -			ret = -EINVAL;
> -			break;
> -		}
> -
>  		if (vma->vm_flags & VM_PFNMAP) {
>  			/* IO region dirty page logging not allowed */
>  			if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Marc Zyngier Feb. 24, 2025, 12:24 p.m. UTC | #3
On Mon, 24 Feb 2025 11:05:33 +0000,
Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> On Mon, Feb 24, 2025 at 03:09:38PM +0530, Aneesh Kumar K.V (Arm) wrote:
> > Before commit d89585fbb308 ("KVM: arm64: unify the tests for VMAs in
> > memslots when MTE is enabled"), kvm_arch_prepare_memory_region() only
> > rejected a memory slot if VM_SHARED was set. This commit unified the
> > checking with user_mem_abort(), with slots being rejected if either
> > VM_MTE_ALLOWED is not set or VM_SHARED set. A subsequent commit
> > c911f0d46879 ("KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE
> > enabled") dropped the VM_SHARED check, so we ended up with memory slots
> > being rejected if VM_MTE_ALLOWED is not set. This wasn't the case before
> > the commit d89585fbb308. The rejection of the memory slot with VM_SHARED
> > set was done to avoid a race condition with the test/set of the
> > PG_mte_tagged flag. Before Commit d77e59a8fccd ("arm64: mte: Lock a page
> > for MTE tag initialization") the kernel avoided allowing MTE with shared
> > pages, thereby preventing two tasks sharing a page from setting up the
> > PG_mte_tagged flag racily.
> > 
> > Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag
> > initialization") further updated the locking so that the kernel
> > allows VM_SHARED mapping with MTE. With this commit, we can enable
> > memslot creation with VM_SHARED VMA mapping.
> > 
> > This patch results in a minor tweak to the ABI. We now allow creating
> > memslots that don't have the VM_MTE_ALLOWED flag set.
> 
> As I commented here:
> 
> https://lore.kernel.org/r/Z4e04P1bQlFBDHo7@arm.com
> 
> I'm fine with the change, we basically go back to the original ABI prior
> to relaxing this for VM_SHARED.
> 
> > If the guest uses
> > such a memslot with Allocation Tags, the kernel will generate -EFAULT.
> > ie, instead of failing early, we now fail later during KVM_RUN.
> 
> Nit: more like the kernel "will return -EFAULT" to the VMM rather than
> "generate".
> 
> > This change is needed because, without it, users are not able to use MTE
> > with VFIO passthrough (currently the mapping is either Device or
> > NonCacheable for which tag access check is not applied.), as shown
> > below (kvmtool VMM).
> 
> Another nit: "users are not able to user VFIO passthrough when MTE is
> enabled". At a first read, the above sounded to me like one wants to
> enable MTE for VFIO passthrough mappings.

What the commit message doesn't spell out is how MTE and VFIO are
interacting here. I also don't understand the reference to Device or
NC memory here.

Isn't the issue that DMA doesn't check/update tags, and therefore it
makes little sense to prevent non-tagged memory being associated with
a memslot?

My other concern is that this gives pretty poor consistency to the
guest, which cannot know what can be tagged and what cannot, and
breaks a guarantee that the guest should be able to rely on.

Thanks,

	M.
diff mbox series

Patch

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ef53af6df6de..1f1b5aa43d2d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2178,11 +2178,6 @@  int kvm_arch_prepare_memory_region(struct kvm *kvm,
 		if (!vma)
 			break;
 
-		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
-			ret = -EINVAL;
-			break;
-		}
-
 		if (vma->vm_flags & VM_PFNMAP) {
 			/* IO region dirty page logging not allowed */
 			if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {