From patchwork Thu Nov 9 21:03:12 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451695
Date: Thu, 9 Nov 2023 21:03:12 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-2-amoorthy@google.com>
Subject: [PATCH v6 01/14] KVM: Documentation: Clarify meaning of
 hva_to_pfn()'s 'atomic' parameter
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org,
 robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com,
 dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com,
 nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

The current docstring can be read as "atomic -> allowed to sleep," when
in fact the intended statement is "atomic -> NOT allowed to sleep." Make
that clearer in the docstring.
Signed-off-by: Anish Moorthy
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9170a61ea99f..687374138cfd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2983,7 +2983,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 /*
  * Pin guest page in memory and return its pfn.
  * @addr: host virtual address which maps memory to the guest
- * @atomic: whether this function can sleep
+ * @atomic: whether this function is forbidden from sleeping
  * @interruptible: whether the process can be interrupted by non-fatal signals
  * @async: whether this function need to wait IO complete if the
  *         host page is not in the memory

From patchwork Thu Nov 9 21:03:13 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451697
Date: Thu, 9 Nov 2023 21:03:13 +0000
Message-ID: <20231109210325.3806151-3-amoorthy@google.com>
Subject: [PATCH v6 02/14] KVM: Documentation: Add docstrings for
 __kvm_read/write_guest_page()
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev

The (gfn, data, offset, len) order of parameters is a little strange,
since "offset" applies to "gfn" rather than to "data". Add docstrings to
make things perfectly clear.

Signed-off-by: Anish Moorthy
---
 virt/kvm/kvm_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 687374138cfd..f521b6fd808f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3328,6 +3328,7 @@ static int next_segment(unsigned long len, int offset)
 	return len;
 }
 
+/* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
 static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
 				 void *data, int offset, int len)
 {
@@ -3429,6 +3430,7 @@ int kvm_vcpu_read_guest_atomic(struct kvm_vcpu *vcpu, gpa_t gpa,
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
 
+/* Copy @len bytes from @data into guest memory at '(@gfn * PAGE_SIZE) + @offset' */
 static int __kvm_write_guest_page(struct kvm *kvm,
 				  struct kvm_memory_slot *memslot, gfn_t gfn,
 				  const void *data, int offset, int len)

From patchwork Thu Nov 9 21:03:14 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451698
Date: Thu, 9 Nov 2023 21:03:14 +0000
Message-ID: <20231109210325.3806151-4-amoorthy@google.com>
Subject: [PATCH v6 03/14] KVM: Simplify error handling in __gfn_to_pfn_memslot()
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev

KVM_HVA_ERR_RO_BAD satisfies kvm_is_error_hva(), so there's no need to
duplicate the "if (writable)" block. Fix this by bringing all
kvm_is_error_hva() cases under one conditional.
Signed-off-by: Anish Moorthy
---
 virt/kvm/kvm_main.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f521b6fd808f..88946d5d102b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3055,15 +3055,13 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 	if (hva)
 		*hva = addr;
 
-	if (addr == KVM_HVA_ERR_RO_BAD) {
-		if (writable)
-			*writable = false;
-		return KVM_PFN_ERR_RO_FAULT;
-	}
-
 	if (kvm_is_error_hva(addr)) {
 		if (writable)
 			*writable = false;
+
+		if (addr == KVM_HVA_ERR_RO_BAD)
+			return KVM_PFN_ERR_RO_FAULT;
+
 		return KVM_PFN_NOSLOT;
 	}

From patchwork Thu Nov 9 21:03:15 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451699
Date: Thu, 9 Nov 2023 21:03:15 +0000
Message-ID: <20231109210325.3806151-5-amoorthy@google.com>
Subject: [PATCH v6 04/14] KVM: Define and communicate KVM_EXIT_MEMORY_FAULT
 RWX flags to userspace
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev

Suggested-by: Sean Christopherson
Signed-off-by: Anish Moorthy
---
 Documentation/virt/kvm/api.rst | 5 +++++
 include/linux/kvm_host.h       | 9 ++++++++-
 include/uapi/linux/kvm.h       | 3 +++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c13ede498369..a07964f601de 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6979,6 +6979,9 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
 		/* KVM_EXIT_MEMORY_FAULT */
 		struct {
+  #define KVM_MEMORY_EXIT_FLAG_READ	(1ULL << 0)
+  #define KVM_MEMORY_EXIT_FLAG_WRITE	(1ULL << 1)
+  #define KVM_MEMORY_EXIT_FLAG_EXEC	(1ULL << 2)
   #define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
 			__u64 flags;
 			__u64 gpa;
@@ -6990,6 +6993,8 @@ could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
 guest physical address range [gpa, gpa + size) of the fault.  The 'flags' field
 describes properties of the faulting access that are likely pertinent:
 
+ - KVM_MEMORY_EXIT_FLAG_READ/WRITE/EXEC - When set, indicates that the memory
+   fault occurred on a read/write/exec access respectively.
  - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
    on a private memory access.  When clear, indicates the fault occurred on a
    shared access.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4d5d139b0bde..5201400358da 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2372,8 +2372,15 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 	vcpu->run->memory_fault.gpa = gpa;
 	vcpu->run->memory_fault.size = size;
 
-	/* RWX flags are not (yet) defined or communicated to userspace. */
 	vcpu->run->memory_fault.flags = 0;
+
+	if (is_write)
+		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_WRITE;
+	else if (is_exec)
+		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_EXEC;
+	else
+		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_READ;
+
 	if (is_private)
 		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b4ba4b53b834..bda5622a9c68 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -535,6 +535,9 @@ struct kvm_run {
 		} notify;
 		/* KVM_EXIT_MEMORY_FAULT */
 		struct {
+#define KVM_MEMORY_EXIT_FLAG_READ	(1ULL << 0)
+#define KVM_MEMORY_EXIT_FLAG_WRITE	(1ULL << 1)
+#define KVM_MEMORY_EXIT_FLAG_EXEC	(1ULL << 2)
 #define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
 			__u64 flags;
 			__u64 gpa;

From patchwork Thu Nov 9 21:03:16 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451700
Date: Thu, 9 Nov 2023 21:03:16 +0000
Message-ID: <20231109210325.3806151-6-amoorthy@google.com>
Subject: [PATCH v6 05/14] KVM: Try using fast GUP to resolve read faults
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev

hva_to_pfn_fast() currently just fails for faults where establishing
writable mappings is forbidden, which is unnecessary. Instead, try
getting the page without passing FOLL_WRITE. This allows the
aforementioned faults to (potentially) be resolved without falling back
to slow GUP.

Suggested-by: James Houghton
Signed-off-by: Anish Moorthy
---
 virt/kvm/kvm_main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 88946d5d102b..725191333c4e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2811,7 +2811,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
 }
 
 /*
- * The fast path to get the writable pfn which will be stored in @pfn,
+ * The fast path to get the pfn which will be stored in @pfn,
  * true indicates success, otherwise false is returned. It's also the
  * only part that runs if we can in atomic context.
  */
@@ -2825,10 +2825,9 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
 	 * or the caller allows to map a writable pfn for a read fault
 	 * request.
 	 */
-	if (!(write_fault || writable))
-		return false;
+	unsigned int gup_flags = (write_fault || writable) ? FOLL_WRITE : 0;
 
-	if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
+	if (get_user_page_fast_only(addr, gup_flags, page)) {
 		*pfn = page_to_pfn(page[0]);
 
 		if (writable)

From patchwork Thu Nov 9 21:03:17 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451704
Date: Thu, 9 Nov 2023 21:03:17 +0000
Message-ID: <20231109210325.3806151-7-amoorthy@google.com>
Subject: [PATCH v6 06/14] KVM: Add memslot flag to let userspace force an exit
 on missing hva mappings
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev

Allowing KVM to fault in pages during vcpu-context guest memory accesses
can be undesirable: during userfaultfd-based postcopy, it can cause
significant performance issues due to vCPUs contending for
userfaultfd-internal locks.

Add a new memslot flag (KVM_MEM_EXIT_ON_MISSING) through which userspace
can indicate that KVM_RUN should exit instead of faulting in pages
during vcpu-context guest memory accesses. The unfaulted pages are
reported by the accompanying KVM_EXIT_MEMORY_FAULT_INFO, allowing
userspace to determine and take appropriate action.

The basic implementation strategy is to check the memslot flag from
within __gfn_to_pfn_memslot() and override the caller-provided arguments
accordingly. Some callers (such as kvm_vcpu_map()) must be able to opt
out of this behavior, and do so by passing can_exit_on_missing=false.

No functional change intended: nothing sets KVM_MEM_EXIT_ON_MISSING or
passes can_exit_on_missing=true to __gfn_to_pfn_memslot().

Suggested-by: James Houghton
Suggested-by: Sean Christopherson
Signed-off-by: Anish Moorthy
Reviewed-by: James Houghton
---
 Documentation/virt/kvm/api.rst         | 28 +++++++++++++++++++++++---
 arch/arm64/kvm/mmu.c                   |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
 arch/x86/kvm/mmu/mmu.c                 |  4 ++--
 include/linux/kvm_host.h               | 12 ++++++++++-
 include/uapi/linux/kvm.h               |  2 ++
 virt/kvm/Kconfig                       |  3 +++
 virt/kvm/kvm_main.c                    | 25 ++++++++++++++++++-----
 9 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a07964f601de..1457865f6e98 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1365,6 +1365,8 @@ yet and must be cleared on entry.
   /* for kvm_userspace_memory_region::flags */
   #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
   #define KVM_MEM_READONLY	(1UL << 1)
+  #define KVM_MEM_GUEST_MEMFD	(1UL << 2)
+  #define KVM_MEM_EXIT_ON_MISSING	(1UL << 3)
 
 This ioctl allows the user to create, modify or delete a guest physical
 memory slot.  Bits 0-15 of "slot" specify the slot id and this value
@@ -1395,12 +1397,16 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
 be identical.  This allows large pages in the guest to be backed by large
 pages in the host.
 
-The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
-KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
+The flags field supports four flags
+
+1. KVM_MEM_LOG_DIRTY_PAGES: can be set to instruct KVM to keep track of
 writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
-use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
+use it.
+2. KVM_MEM_READONLY: can be set, if KVM_CAP_READONLY_MEM capability allows it,
 to make a new slot read-only.  In this case, writes to this memory will be
 posted to userspace as KVM_EXIT_MMIO exits.
+3. KVM_MEM_GUEST_MEMFD
+4. KVM_MEM_EXIT_ON_MISSING: see KVM_CAP_EXIT_ON_MISSING for details.
 
 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 the memory region are automatically reflected into the guest.  For example, an
@@ -8059,6 +8065,22 @@ error/annotated fault.
 
 See KVM_EXIT_MEMORY_FAULT for more information.
 
+7.35 KVM_CAP_EXIT_ON_MISSING
+----------------------------
+
+:Architectures: None
+:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
+
+The presence of this capability indicates that userspace may set the
+KVM_MEM_EXIT_ON_MISSING flag on memslots. Said flag will cause KVM_RUN to fail
+(-EFAULT) in response to guest-context memory accesses which would require KVM
+to page fault on the userspace mapping.
+
+The range of guest physical memory causing the fault is advertised to userspace
+through KVM_CAP_MEMORY_FAULT_INFO. Userspace should take appropriate action.
+This could mean, for instance, checking that the fault is resolvable, faulting
+in the relevant userspace mapping, then retrying KVM_RUN.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4e41ceed5468..13066a6fdfff 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1486,7 +1486,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	mmap_read_unlock(current->mm);
 
 	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-				   write_fault, &writable, NULL);
+				   write_fault, &writable, false, NULL);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index efd0ebf70a5e..2ce0e1d3f597 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -613,7 +613,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 	} else {
 		/* Call KVM generic code to do the slow-path check */
 		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-					   writing, &write_ok, NULL);
+					   writing, &write_ok, false, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 572707858d65..9d40ca02747f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -847,7 +847,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 		/* Call KVM generic code to do the slow-path check */
 		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-					   writing, upgrade_p, NULL);
+					   writing, upgrade_p, false, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4de7670d5976..b1e5e42bdeb4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4375,7 +4375,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	async = false;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
 					  fault->write, &fault->map_writable,
-					  &fault->hva);
+
false, &fault->hva); if (!async) return RET_PF_CONTINUE; /* *pfn has correct page already */ @@ -4397,7 +4397,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault */ fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true, NULL, fault->write, &fault->map_writable, - &fault->hva); + false, &fault->hva); return RET_PF_CONTINUE; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 5201400358da..e8e30088289e 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1219,7 +1219,8 @@ kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, bool atomic, bool interruptible, bool *async, - bool write_fault, bool *writable, hva_t *hva); + bool write_fault, bool *writable, + bool can_exit_on_missing, hva_t *hva); void kvm_release_pfn_clean(kvm_pfn_t pfn); void kvm_release_pfn_dirty(kvm_pfn_t pfn); @@ -2423,4 +2424,13 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm, } #endif /* CONFIG_KVM_PRIVATE_MEM */ +/* + * Whether vCPUs should exit upon trying to access memory for which the + * userspace mappings are missing. 
+ */ +static inline bool kvm_is_slot_exit_on_missing(const struct kvm_memory_slot *slot) +{ + return slot && slot->flags & KVM_MEM_EXIT_ON_MISSING; +} + #endif diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index bda5622a9c68..18546cbada61 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -116,6 +116,7 @@ struct kvm_userspace_memory_region2 { #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) #define KVM_MEM_READONLY (1UL << 1) #define KVM_MEM_GUEST_MEMFD (1UL << 2) +#define KVM_MEM_EXIT_ON_MISSING (1UL << 3) /* for KVM_IRQ_LINE */ struct kvm_irq_level { @@ -1231,6 +1232,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_MEMORY_ATTRIBUTES 233 #define KVM_CAP_GUEST_MEMFD 234 #define KVM_CAP_VM_TYPES 235 +#define KVM_CAP_EXIT_ON_MISSING 236 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index 2c964586aa14..241f524a4e9d 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -109,3 +109,6 @@ config KVM_GENERIC_PRIVATE_MEM select KVM_GENERIC_MEMORY_ATTRIBUTES select KVM_PRIVATE_MEM bool + +config HAVE_KVM_EXIT_ON_MISSING + bool diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 725191333c4e..faaccdba179c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1614,7 +1614,7 @@ static void kvm_replace_memslot(struct kvm *kvm, * only allows these. 
*/ #define KVM_SET_USER_MEMORY_REGION_V1_FLAGS \ - (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY) + (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY | KVM_MEM_EXIT_ON_MISSING) static int check_memory_region_flags(struct kvm *kvm, const struct kvm_userspace_memory_region2 *mem) @@ -1632,6 +1632,9 @@ static int check_memory_region_flags(struct kvm *kvm, valid_flags |= KVM_MEM_READONLY; #endif + if (IS_ENABLED(CONFIG_HAVE_KVM_EXIT_ON_MISSING)) + valid_flags |= KVM_MEM_EXIT_ON_MISSING; + if (mem->flags & ~valid_flags) return -EINVAL; @@ -3047,7 +3050,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible, kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, bool atomic, bool interruptible, bool *async, - bool write_fault, bool *writable, hva_t *hva) + bool write_fault, bool *writable, + bool can_exit_on_missing, hva_t *hva) { unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault); @@ -3070,6 +3074,15 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, writable = NULL; } + if (!atomic && can_exit_on_missing + && kvm_is_slot_exit_on_missing(slot)) { + atomic = true; + if (async) { + *async = false; + async = NULL; + } + } + return hva_to_pfn(addr, atomic, interruptible, async, write_fault, writable); } @@ -3079,21 +3092,21 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, bool *writable) { return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false, - NULL, write_fault, writable, NULL); + NULL, write_fault, writable, false, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_prot); kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn) { return __gfn_to_pfn_memslot(slot, gfn, false, false, NULL, true, - NULL, NULL); + NULL, false, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn) { return __gfn_to_pfn_memslot(slot, gfn, true, false, NULL, true, - NULL, NULL); + 
NULL, false, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic); @@ -4898,6 +4911,8 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) case KVM_CAP_GUEST_MEMFD: return !kvm || kvm_arch_has_private_mem(kvm); #endif + case KVM_CAP_EXIT_ON_MISSING: + return IS_ENABLED(CONFIG_HAVE_KVM_EXIT_ON_MISSING); default: break; }

From patchwork Thu Nov 9 21:03:18 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451701
Date: Thu, 9 Nov 2023 21:03:18 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-8-amoorthy@google.com>
Subject: [PATCH v6 07/14] KVM: x86: Enable KVM_CAP_EXIT_ON_MISSING and annotate EFAULTs from stage-2 fault handler
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

Prevent the stage-2 fault
handler from faulting in pages when KVM_MEM_EXIT_ON_MISSING is set by allowing its __gfn_to_pfn_memslot() calls to check the memslot flag. To actually make that behavior useful, prepare a KVM_EXIT_MEMORY_FAULT when the stage-2 handler returns EFAULT, e.g. when it cannot resolve the pfn. With KVM_MEM_EXIT_ON_MISSING enabled this effects the delivery of stage-2 faults as vCPU exits, which userspace can attempt to resolve without terminating the guest. Delivering stage-2 faults to userspace in this way sidesteps the significant scalability issues associated with using userfaultfd for the same purpose. Signed-off-by: Anish Moorthy --- Documentation/virt/kvm/api.rst | 2 +- arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/mmu/mmu.c | 8 ++++++-- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 1457865f6e98..fd87bbfbfdf2 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8068,7 +8068,7 @@ See KVM_EXIT_MEMORY_FAULT for more information. 7.35 KVM_CAP_EXIT_ON_MISSING ---------------------------- -:Architectures: None +:Architectures: x86 :Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP. The presence of this capability indicates that userspace may set the diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index c1716e83d176..97b16be349a2 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -49,6 +49,7 @@ config KVM select INTERVAL_TREE select HAVE_KVM_PM_NOTIFIER if PM select KVM_GENERIC_HARDWARE_ENABLING + select HAVE_KVM_EXIT_ON_MISSING help Support hosting fully virtualized guest machines using hardware virtualization extensions.
You will need a fairly recent diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index b1e5e42bdeb4..bc978260d2be 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3309,6 +3309,10 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa return RET_PF_RETRY; } + WARN_ON_ONCE(fault->goal_level != PG_LEVEL_4K); + + kvm_prepare_memory_fault_exit(vcpu, gfn_to_gpa(fault->gfn), PAGE_SIZE, + fault->write, fault->exec, fault->is_private); return -EFAULT; } @@ -4375,7 +4379,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault async = false; fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async, fault->write, &fault->map_writable, - false, &fault->hva); + true, &fault->hva); if (!async) return RET_PF_CONTINUE; /* *pfn has correct page already */ @@ -4397,7 +4401,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault */ fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true, NULL, fault->write, &fault->map_writable, - false, &fault->hva); + true, &fault->hva); return RET_PF_CONTINUE; }

From patchwork Thu Nov 9 21:03:19 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451702
Date: Thu, 9 Nov 2023 21:03:19 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-9-amoorthy@google.com>
Subject: [PATCH v6 08/14] KVM: arm64: Enable KVM_CAP_MEMORY_FAULT_INFO
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

TODO: Changelog -- and possibly just merge into the "god" arm commit? Signed-off-by: Anish Moorthy --- arch/arm64/kvm/arm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 317964bad1e1..b5c1d1fb77d0 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -241,6 +241,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ARM_SYSTEM_SUSPEND: case KVM_CAP_IRQFD_RESAMPLE: case KVM_CAP_COUNTER_OFFSET: + case KVM_CAP_MEMORY_FAULT_INFO: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2:

From patchwork Thu Nov 9 21:03:20 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451703
Date: Thu, 9 Nov 2023 21:03:20 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-10-amoorthy@google.com>
Subject: [PATCH v6 09/14] KVM: arm64: Enable KVM_CAP_EXIT_ON_MISSING and annotate an EFAULT from stage-2 fault-handler
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

Prevent the stage-2 fault handler from faulting in pages when KVM_MEM_EXIT_ON_MISSING is set by allowing its __gfn_to_pfn_memslot() calls to check the memslot flag. To actually make that behavior useful, prepare a KVM_EXIT_MEMORY_FAULT when the stage-2 handler cannot resolve the pfn for a fault. With KVM_MEM_EXIT_ON_MISSING enabled this effects the delivery of stage-2 faults as vCPU exits, which userspace can attempt to resolve without terminating the guest. Delivering stage-2 faults to userspace in this way sidesteps the significant scalability issues associated with using userfaultfd for the same purpose.
Signed-off-by: Anish Moorthy --- Documentation/virt/kvm/api.rst | 2 +- arch/arm64/kvm/Kconfig | 1 + arch/arm64/kvm/mmu.c | 7 +++++-- 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index fd87bbfbfdf2..67fcb9dbe855 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8068,7 +8068,7 @@ See KVM_EXIT_MEMORY_FAULT for more information. 7.35 KVM_CAP_EXIT_ON_MISSING ---------------------------- -:Architectures: x86 +:Architectures: x86, arm64 :Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP. The presence of this capability indicates that userspace may set the diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index 1a777715199f..d6fae31f7e1a 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -43,6 +43,7 @@ menuconfig KVM select GUEST_PERF_EVENTS if PERF_EVENTS select INTERVAL_TREE select XARRAY_MULTI + select HAVE_KVM_EXIT_ON_MISSING help Support hosting virtualized guest machines. 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 13066a6fdfff..3b9fb80672ac 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1486,13 +1486,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, mmap_read_unlock(current->mm); pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL, - write_fault, &writable, false, NULL); + write_fault, &writable, true, NULL); if (pfn == KVM_PFN_ERR_HWPOISON) { kvm_send_hwpoison_signal(hva, vma_shift); return 0; } - if (is_error_noslot_pfn(pfn)) + if (is_error_noslot_pfn(pfn)) { + kvm_prepare_memory_fault_exit(vcpu, gfn * PAGE_SIZE, PAGE_SIZE, + write_fault, exec_fault, false); return -EFAULT; + } if (kvm_is_device_pfn(pfn)) { /*

From patchwork Thu Nov 9 21:03:21 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451705
Date: Thu, 9 Nov 2023 21:03:21 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-11-amoorthy@google.com>
Subject: [PATCH v6 10/14] KVM: selftests: Report per-vcpu demand paging rate from demand paging test
From: Anish Moorthy
To:
seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com Using the overall demand paging rate to measure performance can be slightly misleading when vCPU accesses are not overlapped. Adding more vCPUs will (usually) increase the overall demand paging rate even if performance remains constant or even degrades on a per-vcpu basis. As such, it makes sense to report both the total and per-vcpu paging rates. Signed-off-by: Anish Moorthy --- tools/testing/selftests/kvm/demand_paging_test.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 09c116a82a84..6dc823fa933a 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -135,6 +135,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) struct timespec ts_diff; struct kvm_vm *vm; int i; + double vcpu_paging_rate; vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, p->src_type, p->partition_vcpu_memory_access); @@ -191,11 +192,17 @@ static void run_test(enum vm_guest_mode mode, void *arg) uffd_stop_demand_paging(uffd_descs[i]); } - pr_info("Total guest execution time: %ld.%.9lds\n", + pr_info("Total guest execution time:\t%ld.%.9lds\n", ts_diff.tv_sec, ts_diff.tv_nsec); - pr_info("Overall demand paging rate: %f pgs/sec\n", - memstress_args.vcpu_args[0].pages * nr_vcpus / - ((double)ts_diff.tv_sec + (double)ts_diff.tv_nsec / NSEC_PER_SEC)); + + vcpu_paging_rate = + memstress_args.vcpu_args[0].pages + / ((double)ts_diff.tv_sec + + (double)ts_diff.tv_nsec / NSEC_PER_SEC); + pr_info("Per-vcpu demand paging rate:\t%f 
pgs/sec/vcpu\n", + vcpu_paging_rate); + pr_info("Overall demand paging rate:\t%f pgs/sec\n", + vcpu_paging_rate * nr_vcpus); memstress_destroy_vm(vm);

From patchwork Thu Nov 9 21:03:22 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451706
Date: Thu, 9 Nov 2023 21:03:22 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-12-amoorthy@google.com>
Subject: [PATCH v6 11/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand paging test
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

At the moment, demand_paging_test does not support profiling/testing multiple vCPU threads concurrently faulting on a single uffd because (a) "-u" (run test in userfaultfd mode) creates
a uffd for each vCPU's region, so that each uffd services a single vCPU thread. (b) "-u -o" (userfaultfd mode + overlapped vCPU memory accesses) simply doesn't work: the test tries to register the same memory to multiple uffds, causing an error. Add support for many vcpus per uffd by (1) Keeping "-u" behavior unchanged. (2) Making "-u -a" create a single uffd for all of guest memory. (3) Making "-u -o" implicitly pass "-a", solving the problem in (b). In cases (2) and (3) all vCPU threads fault on a single uffd. With potentially multiple vCPUs per UFFD, it makes sense to allow configuring the number of reader threads per UFFD as well: add the "-r" flag to do so. Signed-off-by: Anish Moorthy Acked-by: James Houghton --- .../selftests/kvm/aarch64/page_fault_test.c | 4 +- .../selftests/kvm/demand_paging_test.c | 76 +++++++++++++--- .../selftests/kvm/include/userfaultfd_util.h | 17 +++- .../selftests/kvm/lib/userfaultfd_util.c | 87 +++++++++++++------ 4 files changed, 137 insertions(+), 47 deletions(-) diff --git a/tools/testing/selftests/kvm/aarch64/page_fault_test.c b/tools/testing/selftests/kvm/aarch64/page_fault_test.c index 08a5ca5bed56..dad1fb338f36 100644 --- a/tools/testing/selftests/kvm/aarch64/page_fault_test.c +++ b/tools/testing/selftests/kvm/aarch64/page_fault_test.c @@ -375,14 +375,14 @@ static void setup_uffd(struct kvm_vm *vm, struct test_params *p, *pt_uffd = uffd_setup_demand_paging(uffd_mode, 0, pt_args.hva, pt_args.paging_size, - test->uffd_pt_handler); + 1, test->uffd_pt_handler); *data_uffd = NULL; if (test->uffd_data_handler) *data_uffd = uffd_setup_demand_paging(uffd_mode, 0, data_args.hva, data_args.paging_size, - test->uffd_data_handler); + 1, test->uffd_data_handler); } static void free_uffd(struct test_desc *test, struct uffd_desc *pt_uffd, diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 6dc823fa933a..f7897a951f90 100644 --- 
a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -77,8 +77,20 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, copy.mode = 0; r = ioctl(uffd, UFFDIO_COPY, &copy); - if (r == -1) { - pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n", + /* + * When multiple vCPU threads fault on a single page and there are + * multiple readers for the UFFD, at least one of the UFFDIO_COPYs + * will fail with EEXIST: handle that case without signaling an + * error. + * + * Note that this also suppresses any EEXISTs occurring from, + * e.g., the first UFFDIO_COPY/CONTINUEs on a page. That never + * happens here, but a realistic VMM might potentially maintain + * some external state to correctly surface EEXISTs to userspace + * (or prevent duplicate COPY/CONTINUEs in the first place). + */ + if (r == -1 && errno != EEXIST) { + pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d, errno = %d\n", addr, tid, errno); return r; } @@ -89,8 +101,20 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, cont.range.len = demand_paging_size; r = ioctl(uffd, UFFDIO_CONTINUE, &cont); - if (r == -1) { - pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n", + /* + * When multiple vCPU threads fault on a single page and there are + * multiple readers for the UFFD, at least one of the UFFDIO_CONTINUEs + * will fail with EEXIST: handle that case without signaling an + * error. + * + * Note that this also suppresses any EEXISTs occurring from, + * e.g., the first UFFDIO_COPY/CONTINUEs on a page. That never + * happens here, but a realistic VMM might potentially maintain + * some external state to correctly surface EEXISTs to userspace + * (or prevent duplicate COPY/CONTINUEs in the first place).
+ */ + if (r == -1 && errno != EEXIST) { + pr_info("Failed UFFDIO_CONTINUE in 0x%lx, thread %d, errno = %d\n", addr, tid, errno); return r; } @@ -110,7 +134,9 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, struct test_params { int uffd_mode; + bool single_uffd; useconds_t uffd_delay; + int readers_per_uffd; enum vm_mem_backing_src_type src_type; bool partition_vcpu_memory_access; }; @@ -134,8 +160,9 @@ static void run_test(enum vm_guest_mode mode, void *arg) struct timespec start; struct timespec ts_diff; struct kvm_vm *vm; - int i; + int i, num_uffds = 0; double vcpu_paging_rate; + uint64_t uffd_region_size; vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, p->src_type, p->partition_vcpu_memory_access); @@ -148,7 +175,8 @@ static void run_test(enum vm_guest_mode mode, void *arg) memset(guest_data_prototype, 0xAB, demand_paging_size); if (p->uffd_mode == UFFDIO_REGISTER_MODE_MINOR) { - for (i = 0; i < nr_vcpus; i++) { + num_uffds = p->single_uffd ? 1 : nr_vcpus; + for (i = 0; i < num_uffds; i++) { vcpu_args = &memstress_args.vcpu_args[i]; prefault_mem(addr_gpa2alias(vm, vcpu_args->gpa), vcpu_args->pages * memstress_args.guest_page_size); @@ -156,9 +184,13 @@ static void run_test(enum vm_guest_mode mode, void *arg) } if (p->uffd_mode) { - uffd_descs = malloc(nr_vcpus * sizeof(struct uffd_desc *)); + num_uffds = p->single_uffd ? 
1 : nr_vcpus; + uffd_region_size = nr_vcpus * guest_percpu_mem_size / num_uffds; + + uffd_descs = malloc(num_uffds * sizeof(struct uffd_desc *)); TEST_ASSERT(uffd_descs, "Memory allocation failed"); - for (i = 0; i < nr_vcpus; i++) { + for (i = 0; i < num_uffds; i++) { + struct memstress_vcpu_args *vcpu_args; void *vcpu_hva; vcpu_args = &memstress_args.vcpu_args[i]; @@ -171,7 +203,8 @@ static void run_test(enum vm_guest_mode mode, void *arg) */ uffd_descs[i] = uffd_setup_demand_paging( p->uffd_mode, p->uffd_delay, vcpu_hva, - vcpu_args->pages * memstress_args.guest_page_size, + uffd_region_size, + p->readers_per_uffd, &handle_uffd_page_request); } } @@ -188,7 +221,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) if (p->uffd_mode) { /* Tell the user fault fd handler threads to quit */ - for (i = 0; i < nr_vcpus; i++) + for (i = 0; i < num_uffds; i++) uffd_stop_demand_paging(uffd_descs[i]); } @@ -214,15 +247,20 @@ static void run_test(enum vm_guest_mode mode, void *arg) static void help(char *name) { puts(""); - printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-d uffd_delay_usec]\n" - " [-b memory] [-s type] [-v vcpus] [-c cpu_list] [-o]\n", name); + printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-a]\n" + " [-d uffd_delay_usec] [-r readers_per_uffd] [-b memory]\n" + " [-s type] [-v vcpus] [-c cpu_list] [-o]\n", name); guest_modes_help(); printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n" " UFFD registration mode: 'MISSING' or 'MINOR'.\n"); kvm_print_vcpu_pinning_help(); + printf(" -a: Use a single userfaultfd for all of guest memory, instead of\n" + " creating one for each region paged by a unique vCPU\n" + " Set implicitly with -o, and no effect without -u.\n"); printf(" -d: add a delay in usec to the User Fault\n" " FD handler to simulate demand paging\n" " overheads. 
Ignored without -u.\n"); + printf(" -r: Set the number of reader threads per uffd.\n"); printf(" -b: specify the size of the memory region which should be\n" " demand paged by each vCPU. e.g. 10M or 3G.\n" " Default: 1G\n"); @@ -241,12 +279,14 @@ int main(int argc, char *argv[]) struct test_params p = { .src_type = DEFAULT_VM_MEM_SRC, .partition_vcpu_memory_access = true, + .readers_per_uffd = 1, + .single_uffd = false, }; int opt; guest_modes_append_default(); - while ((opt = getopt(argc, argv, "hm:u:d:b:s:v:c:o")) != -1) { + while ((opt = getopt(argc, argv, "ahom:u:d:b:s:v:c:r:")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg); @@ -258,6 +298,9 @@ int main(int argc, char *argv[]) p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR; TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'."); break; + case 'a': + p.single_uffd = true; + break; case 'd': p.uffd_delay = strtoul(optarg, NULL, 0); TEST_ASSERT(p.uffd_delay >= 0, "A negative UFFD delay is not supported."); @@ -278,6 +321,13 @@ int main(int argc, char *argv[]) break; case 'o': p.partition_vcpu_memory_access = false; + p.single_uffd = true; + break; + case 'r': + p.readers_per_uffd = atoi(optarg); + TEST_ASSERT(p.readers_per_uffd >= 1, + "Invalid number of readers per uffd %d: must be >=1", + p.readers_per_uffd); break; case 'h': default: diff --git a/tools/testing/selftests/kvm/include/userfaultfd_util.h b/tools/testing/selftests/kvm/include/userfaultfd_util.h index 877449c34592..af83a437e74a 100644 --- a/tools/testing/selftests/kvm/include/userfaultfd_util.h +++ b/tools/testing/selftests/kvm/include/userfaultfd_util.h @@ -17,18 +17,27 @@ typedef int (*uffd_handler_t)(int uffd_mode, int uffd, struct uffd_msg *msg); -struct uffd_desc { +struct uffd_reader_args { int uffd_mode; int uffd; - int pipefds[2]; useconds_t delay; uffd_handler_t handler; - pthread_t thread; + /* Holds the read end of the pipe for killing the reader. 
*/ + int pipe; +}; + +struct uffd_desc { + int uffd; + uint64_t num_readers; + /* Holds the write ends of the pipes for killing the readers. */ + int *pipefds; + pthread_t *readers; + struct uffd_reader_args *reader_args; }; struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, void *hva, uint64_t len, - uffd_handler_t handler); + uint64_t num_readers, uffd_handler_t handler); void uffd_stop_demand_paging(struct uffd_desc *uffd); diff --git a/tools/testing/selftests/kvm/lib/userfaultfd_util.c b/tools/testing/selftests/kvm/lib/userfaultfd_util.c index 271f63891581..6f220aa4fb08 100644 --- a/tools/testing/selftests/kvm/lib/userfaultfd_util.c +++ b/tools/testing/selftests/kvm/lib/userfaultfd_util.c @@ -27,10 +27,8 @@ static void *uffd_handler_thread_fn(void *arg) { - struct uffd_desc *uffd_desc = (struct uffd_desc *)arg; - int uffd = uffd_desc->uffd; - int pipefd = uffd_desc->pipefds[0]; - useconds_t delay = uffd_desc->delay; + struct uffd_reader_args *reader_args = (struct uffd_reader_args *)arg; + int uffd = reader_args->uffd; int64_t pages = 0; struct timespec start; struct timespec ts_diff; @@ -44,7 +42,7 @@ static void *uffd_handler_thread_fn(void *arg) pollfd[0].fd = uffd; pollfd[0].events = POLLIN; - pollfd[1].fd = pipefd; + pollfd[1].fd = reader_args->pipe; pollfd[1].events = POLLIN; r = poll(pollfd, 2, -1); @@ -92,9 +90,9 @@ static void *uffd_handler_thread_fn(void *arg) if (!(msg.event & UFFD_EVENT_PAGEFAULT)) continue; - if (delay) - usleep(delay); - r = uffd_desc->handler(uffd_desc->uffd_mode, uffd, &msg); + if (reader_args->delay) + usleep(reader_args->delay); + r = reader_args->handler(reader_args->uffd_mode, uffd, &msg); if (r < 0) return NULL; pages++; @@ -110,7 +108,7 @@ static void *uffd_handler_thread_fn(void *arg) struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, void *hva, uint64_t len, - uffd_handler_t handler) + uint64_t num_readers, uffd_handler_t handler) { struct uffd_desc *uffd_desc; bool 
is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR); @@ -118,14 +116,26 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, struct uffdio_api uffdio_api; struct uffdio_register uffdio_register; uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY; - int ret; + int ret, i; PER_PAGE_DEBUG("Userfaultfd %s mode, faults resolved with %s\n", is_minor ? "MINOR" : "MISSING", is_minor ? "UFFDIO_CONINUE" : "UFFDIO_COPY"); uffd_desc = malloc(sizeof(struct uffd_desc)); - TEST_ASSERT(uffd_desc, "malloc failed"); + TEST_ASSERT(uffd_desc, "Failed to malloc uffd descriptor"); + + uffd_desc->pipefds = malloc(sizeof(int) * num_readers); + TEST_ASSERT(uffd_desc->pipefds, "Failed to malloc pipes"); + + uffd_desc->readers = malloc(sizeof(pthread_t) * num_readers); + TEST_ASSERT(uffd_desc->readers, "Failed to malloc reader threads"); + + uffd_desc->reader_args = malloc( + sizeof(struct uffd_reader_args) * num_readers); + TEST_ASSERT(uffd_desc->reader_args, "Failed to malloc reader_args"); + + uffd_desc->num_readers = num_readers; /* In order to get minor faults, prefault via the alias. 
*/ if (is_minor) @@ -148,18 +158,28 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, TEST_ASSERT((uffdio_register.ioctls & expected_ioctls) == expected_ioctls, "missing userfaultfd ioctls"); - ret = pipe2(uffd_desc->pipefds, O_CLOEXEC | O_NONBLOCK); - TEST_ASSERT(!ret, "Failed to set up pipefd"); - - uffd_desc->uffd_mode = uffd_mode; uffd_desc->uffd = uffd; - uffd_desc->delay = delay; - uffd_desc->handler = handler; - pthread_create(&uffd_desc->thread, NULL, uffd_handler_thread_fn, - uffd_desc); + for (i = 0; i < uffd_desc->num_readers; ++i) { + int pipes[2]; + + ret = pipe2((int *) &pipes, O_CLOEXEC | O_NONBLOCK); + TEST_ASSERT(!ret, "Failed to set up pipefd %i for uffd_desc %p", + i, uffd_desc); + + uffd_desc->pipefds[i] = pipes[1]; - PER_VCPU_DEBUG("Created uffd thread for HVA range [%p, %p)\n", - hva, hva + len); + uffd_desc->reader_args[i].uffd_mode = uffd_mode; + uffd_desc->reader_args[i].uffd = uffd; + uffd_desc->reader_args[i].delay = delay; + uffd_desc->reader_args[i].handler = handler; + uffd_desc->reader_args[i].pipe = pipes[0]; + + pthread_create(&uffd_desc->readers[i], NULL, uffd_handler_thread_fn, + &uffd_desc->reader_args[i]); + + PER_VCPU_DEBUG("Created uffd thread %i for HVA range [%p, %p)\n", + i, hva, hva + len); + } return uffd_desc; } @@ -167,19 +187,30 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, void uffd_stop_demand_paging(struct uffd_desc *uffd) { char c = 0; - int ret; + int i, ret; - ret = write(uffd->pipefds[1], &c, 1); - TEST_ASSERT(ret == 1, "Unable to write to pipefd"); + for (i = 0; i < uffd->num_readers; ++i) { + ret = write(uffd->pipefds[i], &c, 1); + TEST_ASSERT( + ret == 1, "Unable to write to pipefd %i for uffd_desc %p", i, uffd); + } - ret = pthread_join(uffd->thread, NULL); - TEST_ASSERT(ret == 0, "Pthread_join failed."); + for (i = 0; i < uffd->num_readers; ++i) { + ret = pthread_join(uffd->readers[i], NULL); + TEST_ASSERT( + ret == 0, "Pthread_join failed on 
reader %i for uffd_desc %p", i, uffd); + } close(uffd->uffd); - close(uffd->pipefds[1]); - close(uffd->pipefds[0]); + for (i = 0; i < uffd->num_readers; ++i) { + close(uffd->pipefds[i]); + close(uffd->reader_args[i].pipe); + } + free(uffd->pipefds); + free(uffd->readers); + free(uffd->reader_args); free(uffd); }

From patchwork Thu Nov 9 21:03:23 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451707
Date: Thu, 9 Nov 2023 21:03:23 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
X-Mailing-List: kvm@vger.kernel.org
References: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-13-amoorthy@google.com>
Subject: [PATCH v6 12/14] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

With multiple reader threads POLLing a single UFFD,
the test suffers from the thundering herd problem: performance degrades as the number of reader threads is increased. Solve this issue [1] by switching the polling mechanism to EPOLL + EPOLLEXCLUSIVE. Also, change the error-handling convention of uffd_handler_thread_fn. Instead of just printing errors and returning early from the polling loop, check for them via TEST_ASSERT. "return NULL" is reserved for a successful exit from uffd_handler_thread_fn, i.e., one triggered by a write to the exit pipe. Performance samples generated by the command in [2] are given below.

Num Reader Threads, Paging Rate (POLL), Paging Rate (EPOLL)
 1    249k    185k
 2    201k    235k
 4    186k    155k
16    150k    217k
32     89k    198k

[1] Single-vCPU performance does suffer somewhat. [2] ./demand_paging_test -u MINOR -s shmem -v 4 -o -r Signed-off-by: Anish Moorthy Acked-by: James Houghton --- .../selftests/kvm/demand_paging_test.c | 1 - .../selftests/kvm/lib/userfaultfd_util.c | 74 +++++++++---------- 2 files changed, 35 insertions(+), 40 deletions(-) diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index f7897a951f90..0455347f932a 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -13,7 +13,6 @@ #include #include #include -#include #include #include #include diff --git a/tools/testing/selftests/kvm/lib/userfaultfd_util.c b/tools/testing/selftests/kvm/lib/userfaultfd_util.c index 6f220aa4fb08..2a179133645a 100644 --- a/tools/testing/selftests/kvm/lib/userfaultfd_util.c +++ b/tools/testing/selftests/kvm/lib/userfaultfd_util.c @@ -16,6 +16,7 @@ #include #include #include +#include <sys/epoll.h> #include #include "kvm_util.h" @@ -32,60 +33,55 @@ static void *uffd_handler_thread_fn(void *arg) int64_t pages = 0; struct timespec start; struct timespec ts_diff; + int epollfd; + struct epoll_event evt; + + epollfd = epoll_create(1); + TEST_ASSERT(epollfd >= 0, "Failed to create epollfd."); + + evt.events
= EPOLLIN | EPOLLEXCLUSIVE; + evt.data.u32 = 0; + TEST_ASSERT(epoll_ctl(epollfd, EPOLL_CTL_ADD, uffd, &evt) == 0, + "Failed to add uffd to epollfd"); + + evt.events = EPOLLIN; + evt.data.u32 = 1; + TEST_ASSERT(epoll_ctl(epollfd, EPOLL_CTL_ADD, reader_args->pipe, &evt) == 0, + "Failed to add pipe to epollfd"); clock_gettime(CLOCK_MONOTONIC, &start); while (1) { struct uffd_msg msg; - struct pollfd pollfd[2]; - char tmp_chr; int r; - pollfd[0].fd = uffd; - pollfd[0].events = POLLIN; - pollfd[1].fd = reader_args->pipe; - pollfd[1].events = POLLIN; - - r = poll(pollfd, 2, -1); - switch (r) { - case -1: - pr_info("poll err"); - continue; - case 0: - continue; - case 1: - break; - default: - pr_info("Polling uffd returned %d", r); - return NULL; - } + r = epoll_wait(epollfd, &evt, 1, -1); + TEST_ASSERT(r == 1, + "Unexpected number of events (%d) from epoll, errno = %d", + r, errno); - if (pollfd[0].revents & POLLERR) { - pr_info("uffd revents has POLLERR"); - return NULL; - } + if (evt.data.u32 == 1) { + char tmp_chr; - if (pollfd[1].revents & POLLIN) { - r = read(pollfd[1].fd, &tmp_chr, 1); + TEST_ASSERT(!(evt.events & (EPOLLERR | EPOLLHUP)), + "Reader thread received EPOLLERR or EPOLLHUP on pipe."); + r = read(reader_args->pipe, &tmp_chr, 1); TEST_ASSERT(r == 1, - "Error reading pipefd in UFFD thread\n"); + "Error reading pipefd in uffd reader thread"); break; } - if (!(pollfd[0].revents & POLLIN)) - continue; + TEST_ASSERT(!(evt.events & (EPOLLERR | EPOLLHUP)), + "Reader thread received EPOLLERR or EPOLLHUP on uffd."); r = read(uffd, &msg, sizeof(msg)); if (r == -1) { - if (errno == EAGAIN) - continue; - pr_info("Read of uffd got errno %d\n", errno); - return NULL; + TEST_ASSERT(errno == EAGAIN, + "Error reading from UFFD: errno = %d", errno); + continue; } - if (r != sizeof(msg)) { - pr_info("Read on uffd returned unexpected size: %d bytes", r); - return NULL; - } + TEST_ASSERT(r == sizeof(msg), + "Read on uffd returned unexpected number of bytes (%d)", r); if 
(!(msg.event & UFFD_EVENT_PAGEFAULT)) continue; @@ -93,8 +89,8 @@ static void *uffd_handler_thread_fn(void *arg) if (reader_args->delay) usleep(reader_args->delay); r = reader_args->handler(reader_args->uffd_mode, uffd, &msg); - if (r < 0) - return NULL; + TEST_ASSERT(r >= 0, + "Reader thread handler fn returned negative value %d", r); pages++; }

From patchwork Thu Nov 9 21:03:24 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451708
Date: Thu, 9 Nov 2023 21:03:24 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
X-Mailing-List: kvm@vger.kernel.org
References: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-14-amoorthy@google.com>
Subject: [PATCH v6 13/14] KVM: selftests: Add memslot_flags parameter to memstress_create_vm()
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

Memslot flags aren't currently exposed to the
tests, and are just always set to 0. Add a parameter to allow tests to manually set those flags. Signed-off-by: Anish Moorthy --- tools/testing/selftests/kvm/access_tracking_perf_test.c | 2 +- tools/testing/selftests/kvm/demand_paging_test.c | 2 +- tools/testing/selftests/kvm/dirty_log_perf_test.c | 2 +- tools/testing/selftests/kvm/include/memstress.h | 2 +- tools/testing/selftests/kvm/lib/memstress.c | 4 ++-- .../testing/selftests/kvm/memslot_modification_stress_test.c | 2 +- .../selftests/kvm/x86_64/dirty_log_page_splitting_test.c | 2 +- 7 files changed, 8 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c index 3c7defd34f56..b51656b408b8 100644 --- a/tools/testing/selftests/kvm/access_tracking_perf_test.c +++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c @@ -306,7 +306,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) struct kvm_vm *vm; int nr_vcpus = params->nr_vcpus; - vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1, + vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1, 0, params->backing_src, !overlap_memory_access); memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main); diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 0455347f932a..61bb2e23bef0 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -163,7 +163,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) double vcpu_paging_rate; uint64_t uffd_region_size; - vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, + vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, 0, p->src_type, p->partition_vcpu_memory_access); demand_paging_size = get_backing_src_pagesz(p->src_type); diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c 
b/tools/testing/selftests/kvm/dirty_log_perf_test.c index d374dbcf9a53..8b1a84a4db3b 100644 --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c @@ -153,7 +153,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) int i; vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, - p->slots, p->backing_src, + p->slots, 0, p->backing_src, p->partition_vcpu_memory_access); pr_info("Random seed: %u\n", p->random_seed); diff --git a/tools/testing/selftests/kvm/include/memstress.h b/tools/testing/selftests/kvm/include/memstress.h index ce4e603050ea..8be9609d3ca0 100644 --- a/tools/testing/selftests/kvm/include/memstress.h +++ b/tools/testing/selftests/kvm/include/memstress.h @@ -56,7 +56,7 @@ struct memstress_args { extern struct memstress_args memstress_args; struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus, - uint64_t vcpu_memory_bytes, int slots, + uint64_t vcpu_memory_bytes, int slots, uint32_t slot_flags, enum vm_mem_backing_src_type backing_src, bool partition_vcpu_memory_access); void memstress_destroy_vm(struct kvm_vm *vm); diff --git a/tools/testing/selftests/kvm/lib/memstress.c b/tools/testing/selftests/kvm/lib/memstress.c index d05487e5a371..e74b09f39769 100644 --- a/tools/testing/selftests/kvm/lib/memstress.c +++ b/tools/testing/selftests/kvm/lib/memstress.c @@ -123,7 +123,7 @@ void memstress_setup_vcpus(struct kvm_vm *vm, int nr_vcpus, } struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus, - uint64_t vcpu_memory_bytes, int slots, + uint64_t vcpu_memory_bytes, int slots, uint32_t slot_flags, enum vm_mem_backing_src_type backing_src, bool partition_vcpu_memory_access) { @@ -212,7 +212,7 @@ struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus, vm_userspace_mem_region_add(vm, backing_src, region_start, MEMSTRESS_MEM_SLOT_INDEX + i, - region_pages, 0); + region_pages, slot_flags); } /* Do mapping for the demand paging memory 
slot */ diff --git a/tools/testing/selftests/kvm/memslot_modification_stress_test.c b/tools/testing/selftests/kvm/memslot_modification_stress_test.c index 9855c41ca811..0b19ec3ecc9c 100644 --- a/tools/testing/selftests/kvm/memslot_modification_stress_test.c +++ b/tools/testing/selftests/kvm/memslot_modification_stress_test.c @@ -95,7 +95,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) struct test_params *p = arg; struct kvm_vm *vm; - vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, + vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, 0, VM_MEM_SRC_ANONYMOUS, p->partition_vcpu_memory_access); diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c index 634c6bfcd572..a770d7fa469a 100644 --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c @@ -100,8 +100,8 @@ static void run_test(enum vm_guest_mode mode, void *unused) struct kvm_page_stats stats_dirty_logging_disabled; struct kvm_page_stats stats_repopulated; - vm = memstress_create_vm(mode, VCPUS, guest_percpu_mem_size, - SLOTS, backing_src, false); + vm = memstress_create_vm(mode, VCPUS, guest_percpu_mem_size, + SLOTS, 0, backing_src, false); guest_num_pages = (VCPUS * guest_percpu_mem_size) >> vm->page_shift; From patchwork Thu Nov 9 21:03:25 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451709
header.b="31U00ztZ" Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 740F14C1A for ; Thu, 9 Nov 2023 13:03:59 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-da04776a869so1652551276.0 for ; Thu, 09 Nov 2023 13:03:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699563838; x=1700168638; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ebcFSdFa++R257OKZj60sR0YHtuRHuIkQtR+xwo+tiE=; b=31U00ztZqislD1eeaUpWE1aG0VNFToXdjVs/ZNAziUeissdeJdzbjDJF/D8X9RG+4x MavPmLXAId+phTMEhbBsYjIn69vJ10UROAS4R/+FgIW2+tXYAp+30oh/+5+UXD5vOZAk 707pBJMXRTXKOomsXgmV4T3rsCPxGTak2mIZIH97H+8aFzryyRXTLdxDgPJLVMeS/XTt gLy3Psoj6Pj20eY7FSmI3QblVLB1N08Eq1oFZnlXCWFX/NfQjQrA+gwZ2LMSU0OLBm+8 AuhB5ubMYPjj8O1qvus0cL1sK/u7h3bO6ErlwFxmJG+xi469VpgtXSDyn+oR6L6AjsM7 3ixA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699563838; x=1700168638; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ebcFSdFa++R257OKZj60sR0YHtuRHuIkQtR+xwo+tiE=; b=SE82r1o63PsL/Ip4h+teGp0IvZYrdt2qhOdW/rc6WwVkLqGRLRJ4HHCw5f7JKMoMjl lipLzd1XyLxrneZA98RG3BriiEy+VRIABPzGpTCAJcnjzO0kOLJSKfNZotavdkouAQkn litABa7JOLEbj6iAesZB0ipPGRYQoRgytJ02oTxEgTXqfMUbbEbbyjYj33m4TlMqtlZI tHavuD7SG6g6jiV8MNKrI4zgoo2uV+R4t0Tpf9C/E1LO5E8zMDSys2nBhAST0Dm4dLmy K3bg/H+vBUrlhVb3lHXB5+tOjDohytyVj0+vB8Rnb2pEvyjFFLAjHtspk8mpTlb8Roh4 USLg== X-Gm-Message-State: AOJu0YztcJRPgWeiWxlSyxgSDN0QNHEXCLEuszrUCia+t9repnoTztuu xGD2JPiJp4av3kfD/FuZu58OyDG9sYfCZw== X-Google-Smtp-Source: AGHT+IEpMK+w1yaYJDTMb00d7SXUilcTamTn9YrAyFRRyCMycM7ow5u4aT2y4zOC/qHpA+LckxXipp6JbevdfA== X-Received: from laogai.c.googlers.com ([fda3:e722:ac3:cc00:2b:7d90:c0a8:2c9]) (user=amoorthy job=sendgmr) by 
Date: Thu, 9 Nov 2023 21:03:25 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-15-amoorthy@google.com>
Subject: [PATCH v6 14/14] KVM: selftests: Handle memory fault exits in demand_paging_test
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

Demonstrate a (very basic) scheme for supporting memory fault exits.

From the vCPU threads:

1. Simply issue UFFDIO_COPY/CONTINUEs in response to memory fault exits, with the purpose of establishing the absent mappings. Do so with wake_waiters=false to avoid serializing on the userfaultfd wait queue locks.

2. When the UFFDIO_COPY/CONTINUE in (1) fails with EEXIST, assume that the mapping was already established but is currently absent [A] and attempt to populate it using MADV_POPULATE_WRITE.

Issue UFFDIO_COPY/CONTINUEs from the reader threads as well, but with wake_waiters=true to ensure that any threads sleeping on the uffd are eventually woken up.

A real VMM would track whether it had already COPY/CONTINUEd pages (e.g., via a bitmap) to avoid calls destined to fail with EEXIST. However, even this naive approach is enough to demonstrate the performance advantages of KVM_EXIT_MEMORY_FAULT.

[A] In reality it is much likelier that the vCPU thread simply lost a race to establish the mapping for the page.
Signed-off-by: Anish Moorthy Acked-by: James Houghton --- .../selftests/kvm/demand_paging_test.c | 245 +++++++++++++----- 1 file changed, 173 insertions(+), 72 deletions(-) diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 61bb2e23bef0..44bdcc7aad87 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include "kvm_util.h" @@ -31,36 +32,102 @@ static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE; static size_t demand_paging_size; static char *guest_data_prototype; +static int num_uffds; +static size_t uffd_region_size; +static struct uffd_desc **uffd_descs; +/* + * Delay when demand paging is performed through userfaultfd or directly by + * vcpu_worker in the case of an annotated memory fault. + */ +static useconds_t uffd_delay; +static int uffd_mode; + + +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t hva, + bool is_vcpu); + +static void madv_write_or_err(uint64_t gpa) +{ + int r; + void *hva = addr_gpa2hva(memstress_args.vm, gpa); + + r = madvise(hva, demand_paging_size, MADV_POPULATE_WRITE); + TEST_ASSERT(r == 0, + "MADV_POPULATE_WRITE on hva 0x%lx (gpa 0x%lx) fail, errno %i\n", + (uintptr_t) hva, gpa, errno); +} + +static void ready_page(uint64_t gpa) +{ + int r, uffd; + + /* + * This test only registers memslot 1 w/ userfaultfd. Any accesses outside + * the registered ranges should fault in the physical pages through + * MADV_POPULATE_WRITE. 
+ */ + if ((gpa < memstress_args.gpa) + || (gpa >= memstress_args.gpa + memstress_args.size)) { + madv_write_or_err(gpa); + } else { + if (uffd_delay) + usleep(uffd_delay); + + uffd = uffd_descs[(gpa - memstress_args.gpa) / uffd_region_size]->uffd; + + r = handle_uffd_page_request(uffd_mode, uffd, + (uint64_t) addr_gpa2hva(memstress_args.vm, gpa), true); + + if (r == EEXIST) + madv_write_or_err(gpa); + } +} + static void vcpu_worker(struct memstress_vcpu_args *vcpu_args) { struct kvm_vcpu *vcpu = vcpu_args->vcpu; int vcpu_idx = vcpu_args->vcpu_idx; struct kvm_run *run = vcpu->run; - struct timespec start; - struct timespec ts_diff; + struct timespec last_start; + struct timespec total_runtime = {}; int ret; - - clock_gettime(CLOCK_MONOTONIC, &start); - - /* Let the guest access its memory */ - ret = _vcpu_run(vcpu); - TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret); - if (get_ucall(vcpu, NULL) != UCALL_SYNC) { - TEST_ASSERT(false, - "Invalid guest sync status: exit_reason=%s\n", - exit_reason_str(run->exit_reason)); + u64 num_memory_fault_exits = 0; + bool annotated_memory_fault = false; + + while (true) { + clock_gettime(CLOCK_MONOTONIC, &last_start); + /* Let the guest access its memory */ + ret = _vcpu_run(vcpu); + annotated_memory_fault = errno == EFAULT + && run->exit_reason == KVM_EXIT_MEMORY_FAULT; + TEST_ASSERT(ret == 0 || annotated_memory_fault, + "vcpu_run failed: %d\n", ret); + + total_runtime = timespec_add(total_runtime, + timespec_elapsed(last_start)); + if (ret != 0 && get_ucall(vcpu, NULL) != UCALL_SYNC) { + + if (annotated_memory_fault) { + ++num_memory_fault_exits; + ready_page(run->memory_fault.gpa); + continue; + } + + TEST_ASSERT(false, + "Invalid guest sync status: exit_reason=%s\n", + exit_reason_str(run->exit_reason)); + } + break; } - - ts_diff = timespec_elapsed(start); - PER_VCPU_DEBUG("vCPU %d execution time: %ld.%.9lds\n", vcpu_idx, - ts_diff.tv_sec, ts_diff.tv_nsec); + PER_VCPU_DEBUG("vCPU %d execution time: %ld.%.9lds, %d memory 
fault exits\n", + vcpu_idx, total_runtime.tv_sec, total_runtime.tv_nsec, + num_memory_fault_exits); } -static int handle_uffd_page_request(int uffd_mode, int uffd, - struct uffd_msg *msg) +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t hva, + bool is_vcpu) { pid_t tid = syscall(__NR_gettid); - uint64_t addr = msg->arg.pagefault.address; struct timespec start; struct timespec ts_diff; int r; @@ -71,16 +138,15 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, struct uffdio_copy copy; copy.src = (uint64_t)guest_data_prototype; - copy.dst = addr; + copy.dst = hva; copy.len = demand_paging_size; - copy.mode = 0; + copy.mode = is_vcpu ? UFFDIO_COPY_MODE_DONTWAKE : 0; - r = ioctl(uffd, UFFDIO_COPY, ©); /* - * With multiple vCPU threads fault on a single page and there are - * multiple readers for the UFFD, at least one of the UFFDIO_COPYs - * will fail with EEXIST: handle that case without signaling an - * error. + * With multiple vCPU threads and at least one of multiple reader threads + * or vCPU memory faults, multiple vCPUs accessing an absent page will + * almost certainly cause some thread doing the UFFDIO_COPY here to get + * EEXIST: make sure to allow that case. * * Note that this also suppress any EEXISTs occurring from, * e.g., the first UFFDIO_COPY/CONTINUEs on a page. That never @@ -88,23 +154,24 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, * some external state to correctly surface EEXISTs to userspace * (or prevent duplicate COPY/CONTINUEs in the first place). */ - if (r == -1 && errno != EEXIST) { - pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d, errno = %d\n", - addr, tid, errno); - return r; - } + r = ioctl(uffd, UFFDIO_COPY, ©); + TEST_ASSERT(r == 0 || errno == EEXIST, + "Thread 0x%x failed UFFDIO_COPY on hva 0x%lx, errno = %d", + tid, hva, errno); } else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) { + /* The comments in the UFFDIO_COPY branch also apply here. 
*/ struct uffdio_continue cont = {0}; - cont.range.start = addr; + cont.range.start = hva; cont.range.len = demand_paging_size; + cont.mode = is_vcpu ? UFFDIO_CONTINUE_MODE_DONTWAKE : 0; r = ioctl(uffd, UFFDIO_CONTINUE, &cont); /* - * With multiple vCPU threads fault on a single page and there are - * multiple readers for the UFFD, at least one of the UFFDIO_COPYs - * will fail with EEXIST: handle that case without signaling an - * error. + * With multiple vCPU threads and at least one of multiple reader threads + * or vCPU memory faults, multiple vCPUs accessing an absent page will + * almost certainly cause some thread doing the UFFDIO_COPY here to get + * EEXIST: make sure to allow that case. * * Note that this also suppress any EEXISTs occurring from, * e.g., the first UFFDIO_COPY/CONTINUEs on a page. That never @@ -112,32 +179,54 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, * some external state to correctly surface EEXISTs to userspace * (or prevent duplicate COPY/CONTINUEs in the first place). */ - if (r == -1 && errno != EEXIST) { - pr_info("Failed UFFDIO_CONTINUE in 0x%lx, thread %d, errno = %d\n", - addr, tid, errno); - return r; - } + TEST_ASSERT(r == 0 || errno == EEXIST, + "Thread 0x%x failed UFFDIO_CONTINUE on hva 0x%lx, errno = %d", + tid, hva, errno); } else { TEST_FAIL("Invalid uffd mode %d", uffd_mode); } + /* + * If the above UFFDIO_COPY/CONTINUE failed with EEXIST, waiting threads + * will not have been woken: wake them here. 
+ */ + if (!is_vcpu && r != 0) { + struct uffdio_range range = { + .start = hva, + .len = demand_paging_size + }; + r = ioctl(uffd, UFFDIO_WAKE, &range); + TEST_ASSERT(r == 0, + "Thread 0x%x failed UFFDIO_WAKE on hva 0x%lx, errno = %d", + tid, hva, errno); + } + ts_diff = timespec_elapsed(start); PER_PAGE_DEBUG("UFFD page-in %d \t%ld ns\n", tid, timespec_to_ns(ts_diff)); PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n", - demand_paging_size, addr, tid); + demand_paging_size, hva, tid); return 0; } +static int handle_uffd_page_request_from_uffd(int uffd_mode, int uffd, + struct uffd_msg *msg) +{ + TEST_ASSERT(msg->event == UFFD_EVENT_PAGEFAULT, + "Received uffd message with event %d != UFFD_EVENT_PAGEFAULT", + msg->event); + return handle_uffd_page_request(uffd_mode, uffd, + msg->arg.pagefault.address, false); +} + struct test_params { - int uffd_mode; bool single_uffd; - useconds_t uffd_delay; int readers_per_uffd; enum vm_mem_backing_src_type src_type; bool partition_vcpu_memory_access; + bool memfault_exits; }; static void prefault_mem(void *alias, uint64_t len) @@ -155,16 +244,22 @@ static void run_test(enum vm_guest_mode mode, void *arg) { struct memstress_vcpu_args *vcpu_args; struct test_params *p = arg; - struct uffd_desc **uffd_descs = NULL; struct timespec start; struct timespec ts_diff; struct kvm_vm *vm; - int i, num_uffds = 0; + int i; double vcpu_paging_rate; - uint64_t uffd_region_size; + uint32_t slot_flags = 0; + bool uffd_memfault_exits = uffd_mode && p->memfault_exits; - vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, 0, - p->src_type, p->partition_vcpu_memory_access); + if (uffd_memfault_exits) { + TEST_ASSERT(kvm_has_cap(KVM_CAP_EXIT_ON_MISSING) > 0, + "KVM does not have KVM_CAP_EXIT_ON_MISSING"); + slot_flags = KVM_MEM_EXIT_ON_MISSING; + } + + vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, + 1, slot_flags, p->src_type, p->partition_vcpu_memory_access); demand_paging_size = 
get_backing_src_pagesz(p->src_type); @@ -173,21 +268,21 @@ static void run_test(enum vm_guest_mode mode, void *arg) "Failed to allocate buffer for guest data pattern"); memset(guest_data_prototype, 0xAB, demand_paging_size); - if (p->uffd_mode == UFFDIO_REGISTER_MODE_MINOR) { - num_uffds = p->single_uffd ? 1 : nr_vcpus; - for (i = 0; i < num_uffds; i++) { - vcpu_args = &memstress_args.vcpu_args[i]; - prefault_mem(addr_gpa2alias(vm, vcpu_args->gpa), - vcpu_args->pages * memstress_args.guest_page_size); - } - } - - if (p->uffd_mode) { + if (uffd_mode) { num_uffds = p->single_uffd ? 1 : nr_vcpus; uffd_region_size = nr_vcpus * guest_percpu_mem_size / num_uffds; + if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) { + for (i = 0; i < num_uffds; i++) { + vcpu_args = &memstress_args.vcpu_args[i]; + prefault_mem(addr_gpa2alias(vm, vcpu_args->gpa), + uffd_region_size); + } + } + uffd_descs = malloc(num_uffds * sizeof(struct uffd_desc *)); - TEST_ASSERT(uffd_descs, "Memory allocation failed"); + TEST_ASSERT(uffd_descs, "Failed to allocate uffd descriptors"); + for (i = 0; i < num_uffds; i++) { struct memstress_vcpu_args *vcpu_args; void *vcpu_hva; @@ -201,10 +296,10 @@ static void run_test(enum vm_guest_mode mode, void *arg) * requests. 
*/ uffd_descs[i] = uffd_setup_demand_paging( - p->uffd_mode, p->uffd_delay, vcpu_hva, + uffd_mode, uffd_delay, vcpu_hva, uffd_region_size, p->readers_per_uffd, - &handle_uffd_page_request); + &handle_uffd_page_request_from_uffd); } } @@ -218,7 +313,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) ts_diff = timespec_elapsed(start); pr_info("All vCPU threads joined\n"); - if (p->uffd_mode) { + if (uffd_mode) { /* Tell the user fault fd handler threads to quit */ for (i = 0; i < num_uffds; i++) uffd_stop_demand_paging(uffd_descs[i]); @@ -239,7 +334,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) memstress_destroy_vm(vm); free(guest_data_prototype); - if (p->uffd_mode) + if (uffd_mode) free(uffd_descs); } @@ -248,7 +343,8 @@ static void help(char *name) puts(""); printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-a]\n" " [-d uffd_delay_usec] [-r readers_per_uffd] [-b memory]\n" - " [-s type] [-v vcpus] [-c cpu_list] [-o]\n", name); + " [-s type] [-v vcpus] [-c cpu_list] [-o] [-w] \n", + name); guest_modes_help(); printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n" " UFFD registration mode: 'MISSING' or 'MINOR'.\n"); @@ -260,6 +356,7 @@ static void help(char *name) " FD handler to simulate demand paging\n" " overheads. Ignored without -u.\n"); printf(" -r: Set the number of reader threads per uffd.\n"); + printf(" -w: Enable kvm cap for memory fault exits.\n"); printf(" -b: specify the size of the memory region which should be\n" " demand paged by each vCPU. e.g. 
10M or 3G.\n" " Default: 1G\n"); @@ -280,29 +377,30 @@ int main(int argc, char *argv[]) .partition_vcpu_memory_access = true, .readers_per_uffd = 1, .single_uffd = false, + .memfault_exits = false, }; int opt; guest_modes_append_default(); - while ((opt = getopt(argc, argv, "ahom:u:d:b:s:v:c:r:")) != -1) { + while ((opt = getopt(argc, argv, "ahowm:u:d:b:s:v:c:r:")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg); break; case 'u': if (!strcmp("MISSING", optarg)) - p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING; + uffd_mode = UFFDIO_REGISTER_MODE_MISSING; else if (!strcmp("MINOR", optarg)) - p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR; - TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'."); + uffd_mode = UFFDIO_REGISTER_MODE_MINOR; + TEST_ASSERT(uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'."); break; case 'a': p.single_uffd = true; break; case 'd': - p.uffd_delay = strtoul(optarg, NULL, 0); - TEST_ASSERT(p.uffd_delay >= 0, "A negative UFFD delay is not supported."); + uffd_delay = strtoul(optarg, NULL, 0); + TEST_ASSERT(uffd_delay >= 0, "A negative UFFD delay is not supported."); break; case 'b': guest_percpu_mem_size = parse_size(optarg); @@ -328,6 +426,9 @@ int main(int argc, char *argv[]) "Invalid number of readers per uffd %d: must be >=1", p.readers_per_uffd); break; + case 'w': + p.memfault_exits = true; + break; case 'h': default: help(argv[0]); @@ -335,7 +436,7 @@ int main(int argc, char *argv[]) } } - if (p.uffd_mode == UFFDIO_REGISTER_MODE_MINOR && + if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR && !backing_src_is_shared(p.src_type)) { TEST_FAIL("userfaultfd MINOR mode requires shared memory; pick a different -s"); }