From patchwork Thu Nov 9 21:03:12 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451695
Date: Thu, 9 Nov 2023 21:03:12 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-2-amoorthy@google.com>
Subject: [PATCH v6 01/14] KVM: Documentation: Clarify meaning of
 hva_to_pfn()'s 'atomic' parameter
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org,
 robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com,
 dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com,
 nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

The current docstring can be read as "atomic -> allowed to sleep," when
in fact the intended statement is "atomic -> NOT allowed to sleep." Make
that clearer in the docstring.
Signed-off-by: Anish Moorthy
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9170a61ea99f..687374138cfd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2983,7 +2983,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 /*
  * Pin guest page in memory and return its pfn.
  * @addr: host virtual address which maps memory to the guest
- * @atomic: whether this function can sleep
+ * @atomic: whether this function is forbidden from sleeping
  * @interruptible: whether the process can be interrupted by non-fatal signals
  * @async: whether this function need to wait IO complete if the
  *         host page is not in the memory

From patchwork Thu Nov 9 21:03:13 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451697
Date: Thu, 9 Nov 2023 21:03:13 +0000
Message-ID: <20231109210325.3806151-3-amoorthy@google.com>
Subject: [PATCH v6 02/14] KVM: Documentation: Add docstrings for
 __kvm_read/write_guest_page()
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev

The (gfn, data, offset, len) order of parameters is a little strange,
since "offset" applies to "gfn" rather than to "data". Add docstrings to
make things perfectly clear.

Signed-off-by: Anish Moorthy
---
 virt/kvm/kvm_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 687374138cfd..f521b6fd808f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3328,6 +3328,7 @@ static int next_segment(unsigned long len, int offset)
 	return len;
 }
 
+/* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
 static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
 				 void *data, int offset, int len)
 {
@@ -3429,6 +3430,7 @@ int kvm_vcpu_read_guest_atomic(struct kvm_vcpu *vcpu, gpa_t gpa,
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
 
+/* Copy @len bytes from @data into guest memory at '(@gfn * PAGE_SIZE) + @offset' */
 static int __kvm_write_guest_page(struct kvm *kvm,
 				  struct kvm_memory_slot *memslot, gfn_t gfn,
 				  const void *data, int offset, int len)

From patchwork Thu Nov 9 21:03:14 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451698
Date: Thu, 9 Nov 2023 21:03:14 +0000
Message-ID: <20231109210325.3806151-4-amoorthy@google.com>
Subject: [PATCH v6 03/14] KVM: Simplify error handling in __gfn_to_pfn_memslot()
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev

KVM_HVA_ERR_RO_BAD satisfies kvm_is_error_hva(), so there's no need to
duplicate the "if (writable)" block. Fix this by bringing all
kvm_is_error_hva() cases under one conditional.
Signed-off-by: Anish Moorthy
---
 virt/kvm/kvm_main.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f521b6fd808f..88946d5d102b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3055,15 +3055,13 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 	if (hva)
 		*hva = addr;
 
-	if (addr == KVM_HVA_ERR_RO_BAD) {
-		if (writable)
-			*writable = false;
-		return KVM_PFN_ERR_RO_FAULT;
-	}
-
 	if (kvm_is_error_hva(addr)) {
 		if (writable)
 			*writable = false;
+
+		if (addr == KVM_HVA_ERR_RO_BAD)
+			return KVM_PFN_ERR_RO_FAULT;
+
 		return KVM_PFN_NOSLOT;
 	}

From patchwork Thu Nov 9 21:03:15 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451699
Date: Thu, 9 Nov 2023 21:03:15 +0000
Message-ID: <20231109210325.3806151-5-amoorthy@google.com>
Subject: [PATCH v6 04/14] KVM: Define and communicate KVM_EXIT_MEMORY_FAULT
 RWX flags to userspace
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev

Suggested-by: Sean Christopherson
Signed-off-by: Anish Moorthy
---
 Documentation/virt/kvm/api.rst | 5 +++++
 include/linux/kvm_host.h       | 9 ++++++++-
 include/uapi/linux/kvm.h       | 3 +++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c13ede498369..a07964f601de 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6979,6 +6979,9 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
 		/* KVM_EXIT_MEMORY_FAULT */
 		struct {
+  #define KVM_MEMORY_EXIT_FLAG_READ	(1ULL << 0)
+  #define KVM_MEMORY_EXIT_FLAG_WRITE	(1ULL << 1)
+  #define KVM_MEMORY_EXIT_FLAG_EXEC	(1ULL << 2)
   #define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
 			__u64 flags;
 			__u64 gpa;
@@ -6990,6 +6993,8 @@ could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
 guest physical address range [gpa, gpa + size) of the fault.  The 'flags' field
 describes properties of the faulting access that are likely pertinent:
 
+ - KVM_MEMORY_EXIT_FLAG_READ/WRITE/EXEC - When set, indicates that the memory
+   fault occurred on a read/write/exec access respectively.
  - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
    on a private memory access.  When clear, indicates the fault occurred on a
    shared access.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4d5d139b0bde..5201400358da 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2372,8 +2372,15 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 	vcpu->run->memory_fault.gpa = gpa;
 	vcpu->run->memory_fault.size = size;
 
-	/* RWX flags are not (yet) defined or communicated to userspace. */
 	vcpu->run->memory_fault.flags = 0;
+
+	if (is_write)
+		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_WRITE;
+	else if (is_exec)
+		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_EXEC;
+	else
+		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_READ;
+
 	if (is_private)
 		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b4ba4b53b834..bda5622a9c68 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -535,6 +535,9 @@ struct kvm_run {
 		} notify;
 		/* KVM_EXIT_MEMORY_FAULT */
 		struct {
+#define KVM_MEMORY_EXIT_FLAG_READ	(1ULL << 0)
+#define KVM_MEMORY_EXIT_FLAG_WRITE	(1ULL << 1)
+#define KVM_MEMORY_EXIT_FLAG_EXEC	(1ULL << 2)
 #define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
 			__u64 flags;
 			__u64 gpa;

From patchwork Thu Nov 9 21:03:16 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451700
Date: Thu, 9 Nov 2023 21:03:16 +0000
Message-ID: <20231109210325.3806151-6-amoorthy@google.com>
Subject: [PATCH v6 05/14] KVM: Try using fast GUP to resolve read faults
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev

hva_to_pfn_fast() currently just fails for faults where establishing
writable mappings is forbidden, which is unnecessary. Instead, try
getting the page without passing FOLL_WRITE. This allows the
aforementioned faults to (potentially) be resolved without falling back
to slow GUP.

Suggested-by: James Houghton
Signed-off-by: Anish Moorthy
---
 virt/kvm/kvm_main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 88946d5d102b..725191333c4e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2811,7 +2811,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
 }
 
 /*
- * The fast path to get the writable pfn which will be stored in @pfn,
+ * The fast path to get the pfn which will be stored in @pfn,
  * true indicates success, otherwise false is returned. It's also the
  * only part that runs if we can in atomic context.
  */
@@ -2825,10 +2825,9 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
 	 * or the caller allows to map a writable pfn for a read fault
 	 * request.
 	 */
-	if (!(write_fault || writable))
-		return false;
+	unsigned int gup_flags = (write_fault || writable) ? FOLL_WRITE : 0;
 
-	if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
+	if (get_user_page_fast_only(addr, gup_flags, page)) {
 		*pfn = page_to_pfn(page[0]);
 
 		if (writable)

From patchwork Thu Nov 9 21:03:17 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451704
Date: Thu, 9 Nov 2023 21:03:17 +0000
Message-ID: <20231109210325.3806151-7-amoorthy@google.com>
Subject: [PATCH v6 06/14] KVM: Add memslot flag to let userspace force an exit
 on missing hva mappings
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev

Allowing KVM to fault in pages during vcpu-context guest memory accesses
can be undesirable: during userfaultfd-based postcopy, it can cause
significant performance issues due to vCPUs contending for
userfaultfd-internal locks.

Add a new memslot flag (KVM_MEM_EXIT_ON_MISSING) through which userspace
can indicate that KVM_RUN should exit instead of faulting in pages
during vcpu-context guest memory accesses. The unfaulted pages are
reported by the accompanying KVM_EXIT_MEMORY_FAULT_INFO, allowing
userspace to determine and take appropriate action.

The basic implementation strategy is to check the memslot flag from
within __gfn_to_pfn_memslot() and override the caller-provided arguments
accordingly. Some callers (such as kvm_vcpu_map()) must be able to opt
out of this behavior, and do so by passing can_exit_on_missing=false.

No functional change intended: nothing sets KVM_MEM_EXIT_ON_MISSING or
passes can_exit_on_missing=true to __gfn_to_pfn_memslot().

Suggested-by: James Houghton
Suggested-by: Sean Christopherson
Signed-off-by: Anish Moorthy
Reviewed-by: James Houghton
---
 Documentation/virt/kvm/api.rst         | 28 +++++++++++++++++++++++---
 arch/arm64/kvm/mmu.c                   |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
 arch/x86/kvm/mmu/mmu.c                 |  4 ++--
 include/linux/kvm_host.h               | 12 ++++++++++-
 include/uapi/linux/kvm.h               |  2 ++
 virt/kvm/Kconfig                       |  3 +++
 virt/kvm/kvm_main.c                    | 25 ++++++++++++++++++-----
 9 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a07964f601de..1457865f6e98 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1365,6 +1365,8 @@ yet and must be cleared on entry.
   /* for kvm_userspace_memory_region::flags */
   #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
   #define KVM_MEM_READONLY	(1UL << 1)
+  #define KVM_MEM_GUEST_MEMFD	(1UL << 2)
+  #define KVM_MEM_EXIT_ON_MISSING	(1UL << 3)
 
 This ioctl allows the user to create, modify or delete a guest physical
 memory slot.  Bits 0-15 of "slot" specify the slot id and this value
@@ -1395,12 +1397,16 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
 be identical.  This allows large pages in the guest to be backed by large
 pages in the host.
 
-The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
-KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
+The flags field supports four flags
+
+1. KVM_MEM_LOG_DIRTY_PAGES: can be set to instruct KVM to keep track of
 writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
-use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
+use it.
+2. KVM_MEM_READONLY: can be set, if KVM_CAP_READONLY_MEM capability allows it,
 to make a new slot read-only.  In this case, writes to this memory will be
 posted to userspace as KVM_EXIT_MMIO exits.
+3. KVM_MEM_GUEST_MEMFD
+4. KVM_MEM_EXIT_ON_MISSING: see KVM_CAP_EXIT_ON_MISSING for details.
 
 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 the memory region are automatically reflected into the guest.  For example, an
@@ -8059,6 +8065,22 @@ error/annotated fault.
 
 See KVM_EXIT_MEMORY_FAULT for more information.
 
+7.35 KVM_CAP_EXIT_ON_MISSING
+----------------------------
+
+:Architectures: None
+:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
+
+The presence of this capability indicates that userspace may set the
+KVM_MEM_EXIT_ON_MISSING flag on memslots. Said flag will cause KVM_RUN to fail
+(-EFAULT) in response to guest-context memory accesses which would require KVM
+to page fault on the userspace mapping.
+
+The range of guest physical memory causing the fault is advertised to userspace
+through KVM_CAP_MEMORY_FAULT_INFO. Userspace should take appropriate action.
+This could mean, for instance, checking that the fault is resolvable, faulting
+in the relevant userspace mapping, then retrying KVM_RUN.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4e41ceed5468..13066a6fdfff 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1486,7 +1486,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	mmap_read_unlock(current->mm);
 
 	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-				   write_fault, &writable, NULL);
+				   write_fault, &writable, false, NULL);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index efd0ebf70a5e..2ce0e1d3f597 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -613,7 +613,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 	} else {
 		/* Call KVM generic code to do the slow-path check */
 		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-					   writing, &write_ok, NULL);
+					   writing, &write_ok, false, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 572707858d65..9d40ca02747f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -847,7 +847,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 		/* Call KVM generic code to do the slow-path check */
 		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-					   writing, upgrade_p, NULL);
+					   writing, upgrade_p, false, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4de7670d5976..b1e5e42bdeb4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4375,7 +4375,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	async = false;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
 					  fault->write, &fault->map_writable,
-					  &fault->hva);
+
false, &fault->hva); if (!async) return RET_PF_CONTINUE; /* *pfn has correct page already */ @@ -4397,7 +4397,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault */ fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true, NULL, fault->write, &fault->map_writable, - &fault->hva); + false, &fault->hva); return RET_PF_CONTINUE; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 5201400358da..e8e30088289e 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1219,7 +1219,8 @@ kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, bool atomic, bool interruptible, bool *async, - bool write_fault, bool *writable, hva_t *hva); + bool write_fault, bool *writable, + bool can_exit_on_missing, hva_t *hva); void kvm_release_pfn_clean(kvm_pfn_t pfn); void kvm_release_pfn_dirty(kvm_pfn_t pfn); @@ -2423,4 +2424,13 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm, } #endif /* CONFIG_KVM_PRIVATE_MEM */ +/* + * Whether vCPUs should exit upon trying to access memory for which the + * userspace mappings are missing. 
+ */ +static inline bool kvm_is_slot_exit_on_missing(const struct kvm_memory_slot *slot) +{ + return slot && slot->flags & KVM_MEM_EXIT_ON_MISSING; +} + #endif diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index bda5622a9c68..18546cbada61 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -116,6 +116,7 @@ struct kvm_userspace_memory_region2 { #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) #define KVM_MEM_READONLY (1UL << 1) #define KVM_MEM_GUEST_MEMFD (1UL << 2) +#define KVM_MEM_EXIT_ON_MISSING (1UL << 3) /* for KVM_IRQ_LINE */ struct kvm_irq_level { @@ -1231,6 +1232,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_MEMORY_ATTRIBUTES 233 #define KVM_CAP_GUEST_MEMFD 234 #define KVM_CAP_VM_TYPES 235 +#define KVM_CAP_EXIT_ON_MISSING 236 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index 2c964586aa14..241f524a4e9d 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -109,3 +109,6 @@ config KVM_GENERIC_PRIVATE_MEM select KVM_GENERIC_MEMORY_ATTRIBUTES select KVM_PRIVATE_MEM bool + +config HAVE_KVM_EXIT_ON_MISSING + bool diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 725191333c4e..faaccdba179c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1614,7 +1614,7 @@ static void kvm_replace_memslot(struct kvm *kvm, * only allows these. 
*/ #define KVM_SET_USER_MEMORY_REGION_V1_FLAGS \ - (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY) + (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY | KVM_MEM_EXIT_ON_MISSING) static int check_memory_region_flags(struct kvm *kvm, const struct kvm_userspace_memory_region2 *mem) @@ -1632,6 +1632,9 @@ static int check_memory_region_flags(struct kvm *kvm, valid_flags |= KVM_MEM_READONLY; #endif + if (IS_ENABLED(CONFIG_HAVE_KVM_EXIT_ON_MISSING)) + valid_flags |= KVM_MEM_EXIT_ON_MISSING; + if (mem->flags & ~valid_flags) return -EINVAL; @@ -3047,7 +3050,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible, kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, bool atomic, bool interruptible, bool *async, - bool write_fault, bool *writable, hva_t *hva) + bool write_fault, bool *writable, + bool can_exit_on_missing, hva_t *hva) { unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault); @@ -3070,6 +3074,15 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, writable = NULL; } + if (!atomic && can_exit_on_missing + && kvm_is_slot_exit_on_missing(slot)) { + atomic = true; + if (async) { + *async = false; + async = NULL; + } + } + return hva_to_pfn(addr, atomic, interruptible, async, write_fault, writable); } @@ -3079,21 +3092,21 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, bool *writable) { return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false, - NULL, write_fault, writable, NULL); + NULL, write_fault, writable, false, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_prot); kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn) { return __gfn_to_pfn_memslot(slot, gfn, false, false, NULL, true, - NULL, NULL); + NULL, false, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn) { return __gfn_to_pfn_memslot(slot, gfn, true, false, NULL, true, - NULL, NULL); + 
NULL, false, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic); @@ -4898,6 +4911,8 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) case KVM_CAP_GUEST_MEMFD: return !kvm || kvm_arch_has_private_mem(kvm); #endif + case KVM_CAP_EXIT_ON_MISSING: + return IS_ENABLED(CONFIG_HAVE_KVM_EXIT_ON_MISSING); default: break; }

From patchwork Thu Nov 9 21:03:18 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451701
Date: Thu, 9 Nov 2023 21:03:18 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-8-amoorthy@google.com>
Subject: [PATCH v6 07/14] KVM: x86: Enable KVM_CAP_EXIT_ON_MISSING and annotate EFAULTs from stage-2 fault handler
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

Prevent the stage-2 fault
handler from faulting in pages when KVM_MEM_EXIT_ON_MISSING is set by allowing its __gfn_to_pfn_memslot() calls to check the memslot flag. To actually make that behavior useful, prepare a KVM_EXIT_MEMORY_FAULT when the stage-2 handler returns EFAULT, e.g. when it cannot resolve the pfn. With KVM_MEM_EXIT_ON_MISSING enabled this effects the delivery of stage-2 faults as vCPU exits, which userspace can attempt to resolve without terminating the guest. Delivering stage-2 faults to userspace in this way sidesteps the significant scalability issues associated with using userfaultfd for the same purpose. Signed-off-by: Anish Moorthy --- Documentation/virt/kvm/api.rst | 2 +- arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/mmu/mmu.c | 8 ++++++-- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 1457865f6e98..fd87bbfbfdf2 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8068,7 +8068,7 @@ See KVM_EXIT_MEMORY_FAULT for more information. 7.35 KVM_CAP_EXIT_ON_MISSING ---------------------------- -:Architectures: None +:Architectures: x86 :Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP. The presence of this capability indicates that userspace may set the diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index c1716e83d176..97b16be349a2 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -49,6 +49,7 @@ config KVM select INTERVAL_TREE select HAVE_KVM_PM_NOTIFIER if PM select KVM_GENERIC_HARDWARE_ENABLING + select HAVE_KVM_EXIT_ON_MISSING help Support hosting fully virtualized guest machines using hardware virtualization extensions.
You will need a fairly recent diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index b1e5e42bdeb4..bc978260d2be 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3309,6 +3309,10 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa return RET_PF_RETRY; } + WARN_ON_ONCE(fault->goal_level != PG_LEVEL_4K); + + kvm_prepare_memory_fault_exit(vcpu, gfn_to_gpa(fault->gfn), PAGE_SIZE, + fault->write, fault->exec, fault->is_private); return -EFAULT; } @@ -4375,7 +4379,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault async = false; fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async, fault->write, &fault->map_writable, - false, &fault->hva); + true, &fault->hva); if (!async) return RET_PF_CONTINUE; /* *pfn has correct page already */ @@ -4397,7 +4401,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault */ fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true, NULL, fault->write, &fault->map_writable, - false, &fault->hva); + true, &fault->hva); return RET_PF_CONTINUE; }

From patchwork Thu Nov 9 21:03:19 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451702
Date: Thu, 9 Nov 2023 21:03:19 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-9-amoorthy@google.com>
Subject: [PATCH v6 08/14] KVM: arm64: Enable KVM_CAP_MEMORY_FAULT_INFO
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

TODO: Changelog -- and possibly just merge into the "god" arm commit? Signed-off-by: Anish Moorthy --- arch/arm64/kvm/arm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 317964bad1e1..b5c1d1fb77d0 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -241,6 +241,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ARM_SYSTEM_SUSPEND: case KVM_CAP_IRQFD_RESAMPLE: case KVM_CAP_COUNTER_OFFSET: + case KVM_CAP_MEMORY_FAULT_INFO: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2:

From patchwork Thu Nov 9 21:03:20 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451703
Date: Thu, 9 Nov 2023 21:03:20 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-10-amoorthy@google.com>
Subject: [PATCH v6 09/14] KVM: arm64: Enable KVM_CAP_EXIT_ON_MISSING and annotate an EFAULT from stage-2 fault-handler
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

Prevent the stage-2 fault handler from faulting in pages when KVM_MEM_EXIT_ON_MISSING is set by allowing its __gfn_to_pfn_memslot() calls to check the memslot flag. To actually make that behavior useful, prepare a KVM_EXIT_MEMORY_FAULT when the stage-2 handler cannot resolve the pfn for a fault. With KVM_MEM_EXIT_ON_MISSING enabled this effects the delivery of stage-2 faults as vCPU exits, which userspace can attempt to resolve without terminating the guest. Delivering stage-2 faults to userspace in this way sidesteps the significant scalability issues associated with using userfaultfd for the same purpose.
Signed-off-by: Anish Moorthy --- Documentation/virt/kvm/api.rst | 2 +- arch/arm64/kvm/Kconfig | 1 + arch/arm64/kvm/mmu.c | 7 +++++-- 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index fd87bbfbfdf2..67fcb9dbe855 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8068,7 +8068,7 @@ See KVM_EXIT_MEMORY_FAULT for more information. 7.35 KVM_CAP_EXIT_ON_MISSING ---------------------------- -:Architectures: x86 +:Architectures: x86, arm64 :Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP. The presence of this capability indicates that userspace may set the diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index 1a777715199f..d6fae31f7e1a 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -43,6 +43,7 @@ menuconfig KVM select GUEST_PERF_EVENTS if PERF_EVENTS select INTERVAL_TREE select XARRAY_MULTI + select HAVE_KVM_EXIT_ON_MISSING help Support hosting virtualized guest machines. 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 13066a6fdfff..3b9fb80672ac 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1486,13 +1486,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, mmap_read_unlock(current->mm); pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL, - write_fault, &writable, false, NULL); + write_fault, &writable, true, NULL); if (pfn == KVM_PFN_ERR_HWPOISON) { kvm_send_hwpoison_signal(hva, vma_shift); return 0; } - if (is_error_noslot_pfn(pfn)) + if (is_error_noslot_pfn(pfn)) { + kvm_prepare_memory_fault_exit(vcpu, gfn * PAGE_SIZE, PAGE_SIZE, + write_fault, exec_fault, false); return -EFAULT; + } if (kvm_is_device_pfn(pfn)) { /*

From patchwork Thu Nov 9 21:03:21 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451705
Date: Thu, 9 Nov 2023 21:03:21 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-11-amoorthy@google.com>
Subject: [PATCH v6 10/14] KVM: selftests: Report per-vcpu demand paging rate from demand paging test
From: Anish Moorthy
To:
seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com Using the overall demand paging rate to measure performance can be slightly misleading when vCPU accesses are not overlapped. Adding more vCPUs will (usually) increase the overall demand paging rate even if performance remains constant or even degrades on a per-vcpu basis. As such, it makes sense to report both the total and per-vcpu paging rates. Signed-off-by: Anish Moorthy --- tools/testing/selftests/kvm/demand_paging_test.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 09c116a82a84..6dc823fa933a 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -135,6 +135,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) struct timespec ts_diff; struct kvm_vm *vm; int i; + double vcpu_paging_rate; vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, p->src_type, p->partition_vcpu_memory_access); @@ -191,11 +192,17 @@ static void run_test(enum vm_guest_mode mode, void *arg) uffd_stop_demand_paging(uffd_descs[i]); } - pr_info("Total guest execution time: %ld.%.9lds\n", + pr_info("Total guest execution time:\t%ld.%.9lds\n", ts_diff.tv_sec, ts_diff.tv_nsec); - pr_info("Overall demand paging rate: %f pgs/sec\n", - memstress_args.vcpu_args[0].pages * nr_vcpus / - ((double)ts_diff.tv_sec + (double)ts_diff.tv_nsec / NSEC_PER_SEC)); + + vcpu_paging_rate = + memstress_args.vcpu_args[0].pages + / ((double)ts_diff.tv_sec + + (double)ts_diff.tv_nsec / NSEC_PER_SEC); + pr_info("Per-vcpu demand paging rate:\t%f 
pgs/sec/vcpu\n", + vcpu_paging_rate); + pr_info("Overall demand paging rate:\t%f pgs/sec\n", + vcpu_paging_rate * nr_vcpus); memstress_destroy_vm(vm);

From patchwork Thu Nov 9 21:03:22 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451706
Date: Thu, 9 Nov 2023 21:03:22 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-12-amoorthy@google.com>
Subject: [PATCH v6 11/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand paging test
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

At the moment, demand_paging_test does not support profiling/testing multiple vCPU threads concurrently faulting on a single uffd because (a) "-u" (run test in userfaultfd mode) creates
a uffd for each vCPU's region, so that each uffd services a single vCPU thread. (b) "-u -o" (userfaultfd mode + overlapped vCPU memory accesses) simply doesn't work: the test tries to register the same memory to multiple uffds, causing an error. Add support for many vcpus per uffd by (1) Keeping "-u" behavior unchanged. (2) Making "-u -a" create a single uffd for all of guest memory. (3) Making "-u -o" implicitly pass "-a", solving the problem in (b). In cases (2) and (3) all vCPU threads fault on a single uffd. With potentially multiple vCPUs per UFFD, it makes sense to allow configuring the number of reader threads per UFFD as well: add the "-r" flag to do so. Signed-off-by: Anish Moorthy Acked-by: James Houghton --- .../selftests/kvm/aarch64/page_fault_test.c | 4 +- .../selftests/kvm/demand_paging_test.c | 76 +++++++++++++--- .../selftests/kvm/include/userfaultfd_util.h | 17 +++- .../selftests/kvm/lib/userfaultfd_util.c | 87 +++++++++++++------ 4 files changed, 137 insertions(+), 47 deletions(-) diff --git a/tools/testing/selftests/kvm/aarch64/page_fault_test.c b/tools/testing/selftests/kvm/aarch64/page_fault_test.c index 08a5ca5bed56..dad1fb338f36 100644 --- a/tools/testing/selftests/kvm/aarch64/page_fault_test.c +++ b/tools/testing/selftests/kvm/aarch64/page_fault_test.c @@ -375,14 +375,14 @@ static void setup_uffd(struct kvm_vm *vm, struct test_params *p, *pt_uffd = uffd_setup_demand_paging(uffd_mode, 0, pt_args.hva, pt_args.paging_size, - test->uffd_pt_handler); + 1, test->uffd_pt_handler); *data_uffd = NULL; if (test->uffd_data_handler) *data_uffd = uffd_setup_demand_paging(uffd_mode, 0, data_args.hva, data_args.paging_size, - test->uffd_data_handler); + 1, test->uffd_data_handler); } static void free_uffd(struct test_desc *test, struct uffd_desc *pt_uffd, diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 6dc823fa933a..f7897a951f90 100644 --- 
a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -77,8 +77,20 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, copy.mode = 0; r = ioctl(uffd, UFFDIO_COPY, &copy); - if (r == -1) { - pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n", + /* + * When multiple vCPU threads fault on a single page and there are + * multiple readers for the UFFD, at least one of the UFFDIO_COPYs + * will fail with EEXIST: handle that case without signaling an + * error. + * + * Note that this also suppresses any EEXISTs occurring from, + * e.g., the first UFFDIO_COPY/CONTINUEs on a page. That never + * happens here, but a realistic VMM might potentially maintain + * some external state to correctly surface EEXISTs to userspace + * (or prevent duplicate COPY/CONTINUEs in the first place). + */ + if (r == -1 && errno != EEXIST) { + pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d, errno = %d\n", addr, tid, errno); return r; } @@ -89,8 +101,20 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, cont.range.len = demand_paging_size; r = ioctl(uffd, UFFDIO_CONTINUE, &cont); - if (r == -1) { - pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n", + /* + * When multiple vCPU threads fault on a single page and there are + * multiple readers for the UFFD, at least one of the UFFDIO_CONTINUEs + * will fail with EEXIST: handle that case without signaling an + * error. + * + * Note that this also suppresses any EEXISTs occurring from, + * e.g., the first UFFDIO_COPY/CONTINUEs on a page. That never + * happens here, but a realistic VMM might potentially maintain + * some external state to correctly surface EEXISTs to userspace + * (or prevent duplicate COPY/CONTINUEs in the first place).
+ */ + if (r == -1 && errno != EEXIST) { + pr_info("Failed UFFDIO_CONTINUE in 0x%lx, thread %d, errno = %d\n", addr, tid, errno); return r; } @@ -110,7 +134,9 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, struct test_params { int uffd_mode; + bool single_uffd; useconds_t uffd_delay; + int readers_per_uffd; enum vm_mem_backing_src_type src_type; bool partition_vcpu_memory_access; }; @@ -134,8 +160,9 @@ static void run_test(enum vm_guest_mode mode, void *arg) struct timespec start; struct timespec ts_diff; struct kvm_vm *vm; - int i; + int i, num_uffds = 0; double vcpu_paging_rate; + uint64_t uffd_region_size; vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, p->src_type, p->partition_vcpu_memory_access); @@ -148,7 +175,8 @@ static void run_test(enum vm_guest_mode mode, void *arg) memset(guest_data_prototype, 0xAB, demand_paging_size); if (p->uffd_mode == UFFDIO_REGISTER_MODE_MINOR) { - for (i = 0; i < nr_vcpus; i++) { + num_uffds = p->single_uffd ? 1 : nr_vcpus; + for (i = 0; i < num_uffds; i++) { vcpu_args = &memstress_args.vcpu_args[i]; prefault_mem(addr_gpa2alias(vm, vcpu_args->gpa), vcpu_args->pages * memstress_args.guest_page_size); @@ -156,9 +184,13 @@ static void run_test(enum vm_guest_mode mode, void *arg) } if (p->uffd_mode) { - uffd_descs = malloc(nr_vcpus * sizeof(struct uffd_desc *)); + num_uffds = p->single_uffd ? 
1 : nr_vcpus; + uffd_region_size = nr_vcpus * guest_percpu_mem_size / num_uffds; + + uffd_descs = malloc(num_uffds * sizeof(struct uffd_desc *)); TEST_ASSERT(uffd_descs, "Memory allocation failed"); - for (i = 0; i < nr_vcpus; i++) { + for (i = 0; i < num_uffds; i++) { + struct memstress_vcpu_args *vcpu_args; void *vcpu_hva; vcpu_args = &memstress_args.vcpu_args[i]; @@ -171,7 +203,8 @@ static void run_test(enum vm_guest_mode mode, void *arg) */ uffd_descs[i] = uffd_setup_demand_paging( p->uffd_mode, p->uffd_delay, vcpu_hva, - vcpu_args->pages * memstress_args.guest_page_size, + uffd_region_size, + p->readers_per_uffd, &handle_uffd_page_request); } } @@ -188,7 +221,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) if (p->uffd_mode) { /* Tell the user fault fd handler threads to quit */ - for (i = 0; i < nr_vcpus; i++) + for (i = 0; i < num_uffds; i++) uffd_stop_demand_paging(uffd_descs[i]); } @@ -214,15 +247,20 @@ static void run_test(enum vm_guest_mode mode, void *arg) static void help(char *name) { puts(""); - printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-d uffd_delay_usec]\n" - " [-b memory] [-s type] [-v vcpus] [-c cpu_list] [-o]\n", name); + printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-a]\n" + " [-d uffd_delay_usec] [-r readers_per_uffd] [-b memory]\n" + " [-s type] [-v vcpus] [-c cpu_list] [-o]\n", name); guest_modes_help(); printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n" " UFFD registration mode: 'MISSING' or 'MINOR'.\n"); kvm_print_vcpu_pinning_help(); + printf(" -a: Use a single userfaultfd for all of guest memory, instead of\n" + " creating one for each region paged by a unique vCPU\n" + " Set implicitly with -o, and no effect without -u.\n"); printf(" -d: add a delay in usec to the User Fault\n" " FD handler to simulate demand paging\n" " overheads. 
Ignored without -u.\n"); + printf(" -r: Set the number of reader threads per uffd.\n"); printf(" -b: specify the size of the memory region which should be\n" " demand paged by each vCPU. e.g. 10M or 3G.\n" " Default: 1G\n"); @@ -241,12 +279,14 @@ int main(int argc, char *argv[]) struct test_params p = { .src_type = DEFAULT_VM_MEM_SRC, .partition_vcpu_memory_access = true, + .readers_per_uffd = 1, + .single_uffd = false, }; int opt; guest_modes_append_default(); - while ((opt = getopt(argc, argv, "hm:u:d:b:s:v:c:o")) != -1) { + while ((opt = getopt(argc, argv, "ahom:u:d:b:s:v:c:r:")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg); @@ -258,6 +298,9 @@ int main(int argc, char *argv[]) p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR; TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'."); break; + case 'a': + p.single_uffd = true; + break; case 'd': p.uffd_delay = strtoul(optarg, NULL, 0); TEST_ASSERT(p.uffd_delay >= 0, "A negative UFFD delay is not supported."); @@ -278,6 +321,13 @@ int main(int argc, char *argv[]) break; case 'o': p.partition_vcpu_memory_access = false; + p.single_uffd = true; + break; + case 'r': + p.readers_per_uffd = atoi(optarg); + TEST_ASSERT(p.readers_per_uffd >= 1, + "Invalid number of readers per uffd %d: must be >=1", + p.readers_per_uffd); break; case 'h': default: diff --git a/tools/testing/selftests/kvm/include/userfaultfd_util.h b/tools/testing/selftests/kvm/include/userfaultfd_util.h index 877449c34592..af83a437e74a 100644 --- a/tools/testing/selftests/kvm/include/userfaultfd_util.h +++ b/tools/testing/selftests/kvm/include/userfaultfd_util.h @@ -17,18 +17,27 @@ typedef int (*uffd_handler_t)(int uffd_mode, int uffd, struct uffd_msg *msg); -struct uffd_desc { +struct uffd_reader_args { int uffd_mode; int uffd; - int pipefds[2]; useconds_t delay; uffd_handler_t handler; - pthread_t thread; + /* Holds the read end of the pipe for killing the reader. 
*/ + int pipe; +}; + +struct uffd_desc { + int uffd; + uint64_t num_readers; + /* Holds the write ends of the pipes for killing the readers. */ + int *pipefds; + pthread_t *readers; + struct uffd_reader_args *reader_args; }; struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, void *hva, uint64_t len, - uffd_handler_t handler); + uint64_t num_readers, uffd_handler_t handler); void uffd_stop_demand_paging(struct uffd_desc *uffd); diff --git a/tools/testing/selftests/kvm/lib/userfaultfd_util.c b/tools/testing/selftests/kvm/lib/userfaultfd_util.c index 271f63891581..6f220aa4fb08 100644 --- a/tools/testing/selftests/kvm/lib/userfaultfd_util.c +++ b/tools/testing/selftests/kvm/lib/userfaultfd_util.c @@ -27,10 +27,8 @@ static void *uffd_handler_thread_fn(void *arg) { - struct uffd_desc *uffd_desc = (struct uffd_desc *)arg; - int uffd = uffd_desc->uffd; - int pipefd = uffd_desc->pipefds[0]; - useconds_t delay = uffd_desc->delay; + struct uffd_reader_args *reader_args = (struct uffd_reader_args *)arg; + int uffd = reader_args->uffd; int64_t pages = 0; struct timespec start; struct timespec ts_diff; @@ -44,7 +42,7 @@ static void *uffd_handler_thread_fn(void *arg) pollfd[0].fd = uffd; pollfd[0].events = POLLIN; - pollfd[1].fd = pipefd; + pollfd[1].fd = reader_args->pipe; pollfd[1].events = POLLIN; r = poll(pollfd, 2, -1); @@ -92,9 +90,9 @@ static void *uffd_handler_thread_fn(void *arg) if (!(msg.event & UFFD_EVENT_PAGEFAULT)) continue; - if (delay) - usleep(delay); - r = uffd_desc->handler(uffd_desc->uffd_mode, uffd, &msg); + if (reader_args->delay) + usleep(reader_args->delay); + r = reader_args->handler(reader_args->uffd_mode, uffd, &msg); if (r < 0) return NULL; pages++; @@ -110,7 +108,7 @@ static void *uffd_handler_thread_fn(void *arg) struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, void *hva, uint64_t len, - uffd_handler_t handler) + uint64_t num_readers, uffd_handler_t handler) { struct uffd_desc *uffd_desc; bool 
is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR); @@ -118,14 +116,26 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, struct uffdio_api uffdio_api; struct uffdio_register uffdio_register; uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY; - int ret; + int ret, i; PER_PAGE_DEBUG("Userfaultfd %s mode, faults resolved with %s\n", is_minor ? "MINOR" : "MISSING", is_minor ? "UFFDIO_CONINUE" : "UFFDIO_COPY"); uffd_desc = malloc(sizeof(struct uffd_desc)); - TEST_ASSERT(uffd_desc, "malloc failed"); + TEST_ASSERT(uffd_desc, "Failed to malloc uffd descriptor"); + + uffd_desc->pipefds = malloc(sizeof(int) * num_readers); + TEST_ASSERT(uffd_desc->pipefds, "Failed to malloc pipes"); + + uffd_desc->readers = malloc(sizeof(pthread_t) * num_readers); + TEST_ASSERT(uffd_desc->readers, "Failed to malloc reader threads"); + + uffd_desc->reader_args = malloc( + sizeof(struct uffd_reader_args) * num_readers); + TEST_ASSERT(uffd_desc->reader_args, "Failed to malloc reader_args"); + + uffd_desc->num_readers = num_readers; /* In order to get minor faults, prefault via the alias. 
*/ if (is_minor) @@ -148,18 +158,28 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, TEST_ASSERT((uffdio_register.ioctls & expected_ioctls) == expected_ioctls, "missing userfaultfd ioctls"); - ret = pipe2(uffd_desc->pipefds, O_CLOEXEC | O_NONBLOCK); - TEST_ASSERT(!ret, "Failed to set up pipefd"); - - uffd_desc->uffd_mode = uffd_mode; uffd_desc->uffd = uffd; - uffd_desc->delay = delay; - uffd_desc->handler = handler; - pthread_create(&uffd_desc->thread, NULL, uffd_handler_thread_fn, - uffd_desc); + for (i = 0; i < uffd_desc->num_readers; ++i) { + int pipes[2]; + + ret = pipe2((int *) &pipes, O_CLOEXEC | O_NONBLOCK); + TEST_ASSERT(!ret, "Failed to set up pipefd %i for uffd_desc %p", + i, uffd_desc); + + uffd_desc->pipefds[i] = pipes[1]; - PER_VCPU_DEBUG("Created uffd thread for HVA range [%p, %p)\n", - hva, hva + len); + uffd_desc->reader_args[i].uffd_mode = uffd_mode; + uffd_desc->reader_args[i].uffd = uffd; + uffd_desc->reader_args[i].delay = delay; + uffd_desc->reader_args[i].handler = handler; + uffd_desc->reader_args[i].pipe = pipes[0]; + + pthread_create(&uffd_desc->readers[i], NULL, uffd_handler_thread_fn, + &uffd_desc->reader_args[i]); + + PER_VCPU_DEBUG("Created uffd thread %i for HVA range [%p, %p)\n", + i, hva, hva + len); + } return uffd_desc; } @@ -167,19 +187,30 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, void uffd_stop_demand_paging(struct uffd_desc *uffd) { char c = 0; - int ret; + int i, ret; - ret = write(uffd->pipefds[1], &c, 1); - TEST_ASSERT(ret == 1, "Unable to write to pipefd"); + for (i = 0; i < uffd->num_readers; ++i) { + ret = write(uffd->pipefds[i], &c, 1); + TEST_ASSERT( + ret == 1, "Unable to write to pipefd %i for uffd_desc %p", i, uffd); + } - ret = pthread_join(uffd->thread, NULL); - TEST_ASSERT(ret == 0, "Pthread_join failed."); + for (i = 0; i < uffd->num_readers; ++i) { + ret = pthread_join(uffd->readers[i], NULL); + TEST_ASSERT( + ret == 0, "Pthread_join failed on 
reader %i for uffd_desc %p", i, uffd); + } close(uffd->uffd); - close(uffd->pipefds[1]); - close(uffd->pipefds[0]); + for (i = 0; i < uffd->num_readers; ++i) { + close(uffd->pipefds[i]); + close(uffd->reader_args[i].pipe); + } + free(uffd->pipefds); + free(uffd->readers); + free(uffd->reader_args); free(uffd); }

From patchwork Thu Nov 9 21:03:23 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451707
Date: Thu, 9 Nov 2023 21:03:23 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
X-Mailing-List: kvm@vger.kernel.org
References: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-13-amoorthy@google.com>
Subject: [PATCH v6 12/14] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

With multiple reader threads POLLing a single UFFD,
the test suffers from the thundering herd problem: performance degrades as the number of reader threads is increased. Solve this issue [1] by switching the polling mechanism to EPOLL + EPOLLEXCLUSIVE. Also, change the error-handling convention of uffd_handler_thread_fn. Instead of just printing errors and returning early from the polling loop, check for them via TEST_ASSERT. "return NULL" is reserved for a successful exit from uffd_handler_thread_fn, i.e., one triggered by a write to the exit pipe. Performance samples generated by the command in [2] are given below.

Num Reader Threads, Paging Rate (POLL), Paging Rate (EPOLL)
 1    249k    185k
 2    201k    235k
 4    186k    155k
16    150k    217k
32     89k    198k

[1] Single-vCPU performance does suffer somewhat. [2] ./demand_paging_test -u MINOR -s shmem -v 4 -o -r Signed-off-by: Anish Moorthy Acked-by: James Houghton --- .../selftests/kvm/demand_paging_test.c | 1 - .../selftests/kvm/lib/userfaultfd_util.c | 74 +++++++++---------- 2 files changed, 35 insertions(+), 40 deletions(-) diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index f7897a951f90..0455347f932a 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -13,7 +13,6 @@ #include #include #include -#include #include #include #include diff --git a/tools/testing/selftests/kvm/lib/userfaultfd_util.c b/tools/testing/selftests/kvm/lib/userfaultfd_util.c index 6f220aa4fb08..2a179133645a 100644 --- a/tools/testing/selftests/kvm/lib/userfaultfd_util.c +++ b/tools/testing/selftests/kvm/lib/userfaultfd_util.c @@ -16,6 +16,7 @@ #include #include #include +#include <sys/epoll.h> #include #include "kvm_util.h" @@ -32,60 +33,55 @@ static void *uffd_handler_thread_fn(void *arg) int64_t pages = 0; struct timespec start; struct timespec ts_diff; + int epollfd; + struct epoll_event evt; + + epollfd = epoll_create(1); + TEST_ASSERT(epollfd >= 0, "Failed to create epollfd."); + + evt.events
= EPOLLIN | EPOLLEXCLUSIVE; + evt.data.u32 = 0; + TEST_ASSERT(epoll_ctl(epollfd, EPOLL_CTL_ADD, uffd, &evt) == 0, + "Failed to add uffd to epollfd"); + + evt.events = EPOLLIN; + evt.data.u32 = 1; + TEST_ASSERT(epoll_ctl(epollfd, EPOLL_CTL_ADD, reader_args->pipe, &evt) == 0, + "Failed to add pipe to epollfd"); clock_gettime(CLOCK_MONOTONIC, &start); while (1) { struct uffd_msg msg; - struct pollfd pollfd[2]; - char tmp_chr; int r; - pollfd[0].fd = uffd; - pollfd[0].events = POLLIN; - pollfd[1].fd = reader_args->pipe; - pollfd[1].events = POLLIN; - - r = poll(pollfd, 2, -1); - switch (r) { - case -1: - pr_info("poll err"); - continue; - case 0: - continue; - case 1: - break; - default: - pr_info("Polling uffd returned %d", r); - return NULL; - } + r = epoll_wait(epollfd, &evt, 1, -1); + TEST_ASSERT(r == 1, + "Unexpected number of events (%d) from epoll, errno = %d", + r, errno); - if (pollfd[0].revents & POLLERR) { - pr_info("uffd revents has POLLERR"); - return NULL; - } + if (evt.data.u32 == 1) { + char tmp_chr; - if (pollfd[1].revents & POLLIN) { - r = read(pollfd[1].fd, &tmp_chr, 1); + TEST_ASSERT(!(evt.events & (EPOLLERR | EPOLLHUP)), + "Reader thread received EPOLLERR or EPOLLHUP on pipe."); + r = read(reader_args->pipe, &tmp_chr, 1); TEST_ASSERT(r == 1, - "Error reading pipefd in UFFD thread\n"); + "Error reading pipefd in uffd reader thread"); break; } - if (!(pollfd[0].revents & POLLIN)) - continue; + TEST_ASSERT(!(evt.events & (EPOLLERR | EPOLLHUP)), + "Reader thread received EPOLLERR or EPOLLHUP on uffd."); r = read(uffd, &msg, sizeof(msg)); if (r == -1) { - if (errno == EAGAIN) - continue; - pr_info("Read of uffd got errno %d\n", errno); - return NULL; + TEST_ASSERT(errno == EAGAIN, + "Error reading from UFFD: errno = %d", errno); + continue; } - if (r != sizeof(msg)) { - pr_info("Read on uffd returned unexpected size: %d bytes", r); - return NULL; - } + TEST_ASSERT(r == sizeof(msg), + "Read on uffd returned unexpected number of bytes (%d)", r); if 
(!(msg.event & UFFD_EVENT_PAGEFAULT)) continue; @@ -93,8 +89,8 @@ static void *uffd_handler_thread_fn(void *arg) if (reader_args->delay) usleep(reader_args->delay); r = reader_args->handler(reader_args->uffd_mode, uffd, &msg); - if (r < 0) - return NULL; + TEST_ASSERT(r >= 0, + "Reader thread handler fn returned negative value %d", r); pages++; }

From patchwork Thu Nov 9 21:03:24 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451708
Date: Thu, 9 Nov 2023 21:03:24 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
X-Mailing-List: kvm@vger.kernel.org
References: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-14-amoorthy@google.com>
Subject: [PATCH v6 13/14] KVM: selftests: Add memslot_flags parameter to memstress_create_vm()
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

Memslot flags aren't currently exposed to the
tests, and are just always set to 0. Add a parameter to allow tests to manually set those flags. Signed-off-by: Anish Moorthy --- tools/testing/selftests/kvm/access_tracking_perf_test.c | 2 +- tools/testing/selftests/kvm/demand_paging_test.c | 2 +- tools/testing/selftests/kvm/dirty_log_perf_test.c | 2 +- tools/testing/selftests/kvm/include/memstress.h | 2 +- tools/testing/selftests/kvm/lib/memstress.c | 4 ++-- .../testing/selftests/kvm/memslot_modification_stress_test.c | 2 +- .../selftests/kvm/x86_64/dirty_log_page_splitting_test.c | 2 +- 7 files changed, 8 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c index 3c7defd34f56..b51656b408b8 100644 --- a/tools/testing/selftests/kvm/access_tracking_perf_test.c +++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c @@ -306,7 +306,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) struct kvm_vm *vm; int nr_vcpus = params->nr_vcpus; - vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1, + vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1, 0, params->backing_src, !overlap_memory_access); memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main); diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 0455347f932a..61bb2e23bef0 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -163,7 +163,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) double vcpu_paging_rate; uint64_t uffd_region_size; - vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, + vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, 0, p->src_type, p->partition_vcpu_memory_access); demand_paging_size = get_backing_src_pagesz(p->src_type); diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c 
b/tools/testing/selftests/kvm/dirty_log_perf_test.c index d374dbcf9a53..8b1a84a4db3b 100644 --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c @@ -153,7 +153,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) int i; vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, - p->slots, p->backing_src, + p->slots, 0, p->backing_src, p->partition_vcpu_memory_access); pr_info("Random seed: %u\n", p->random_seed); diff --git a/tools/testing/selftests/kvm/include/memstress.h b/tools/testing/selftests/kvm/include/memstress.h index ce4e603050ea..8be9609d3ca0 100644 --- a/tools/testing/selftests/kvm/include/memstress.h +++ b/tools/testing/selftests/kvm/include/memstress.h @@ -56,7 +56,7 @@ struct memstress_args { extern struct memstress_args memstress_args; struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus, - uint64_t vcpu_memory_bytes, int slots, + uint64_t vcpu_memory_bytes, int slots, uint32_t slot_flags, enum vm_mem_backing_src_type backing_src, bool partition_vcpu_memory_access); void memstress_destroy_vm(struct kvm_vm *vm); diff --git a/tools/testing/selftests/kvm/lib/memstress.c b/tools/testing/selftests/kvm/lib/memstress.c index d05487e5a371..e74b09f39769 100644 --- a/tools/testing/selftests/kvm/lib/memstress.c +++ b/tools/testing/selftests/kvm/lib/memstress.c @@ -123,7 +123,7 @@ void memstress_setup_vcpus(struct kvm_vm *vm, int nr_vcpus, } struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus, - uint64_t vcpu_memory_bytes, int slots, + uint64_t vcpu_memory_bytes, int slots, uint32_t slot_flags, enum vm_mem_backing_src_type backing_src, bool partition_vcpu_memory_access) { @@ -212,7 +212,7 @@ struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus, vm_userspace_mem_region_add(vm, backing_src, region_start, MEMSTRESS_MEM_SLOT_INDEX + i, - region_pages, 0); + region_pages, slot_flags); } /* Do mapping for the demand paging memory 
slot */ diff --git a/tools/testing/selftests/kvm/memslot_modification_stress_test.c b/tools/testing/selftests/kvm/memslot_modification_stress_test.c index 9855c41ca811..0b19ec3ecc9c 100644 --- a/tools/testing/selftests/kvm/memslot_modification_stress_test.c +++ b/tools/testing/selftests/kvm/memslot_modification_stress_test.c @@ -95,7 +95,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) struct test_params *p = arg; struct kvm_vm *vm; - vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, + vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, 0, VM_MEM_SRC_ANONYMOUS, p->partition_vcpu_memory_access); diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c index 634c6bfcd572..a770d7fa469a 100644 --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c @@ -100,8 +100,8 @@ static void run_test(enum vm_guest_mode mode, void *unused) struct kvm_page_stats stats_dirty_logging_disabled; struct kvm_page_stats stats_repopulated; - vm = memstress_create_vm(mode, VCPUS, guest_percpu_mem_size, - SLOTS, backing_src, false); + vm = memstress_create_vm(mode, VCPUS, guest_percpu_mem_size, + SLOTS, 0, backing_src, false); guest_num_pages = (VCPUS * guest_percpu_mem_size) >> vm->page_shift; From patchwork Thu Nov 9 21:03:25 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13451709
header.b="31U00ztZ" Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 740F14C1A for ; Thu, 9 Nov 2023 13:03:59 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-da04776a869so1652551276.0 for ; Thu, 09 Nov 2023 13:03:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699563838; x=1700168638; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ebcFSdFa++R257OKZj60sR0YHtuRHuIkQtR+xwo+tiE=; b=31U00ztZqislD1eeaUpWE1aG0VNFToXdjVs/ZNAziUeissdeJdzbjDJF/D8X9RG+4x MavPmLXAId+phTMEhbBsYjIn69vJ10UROAS4R/+FgIW2+tXYAp+30oh/+5+UXD5vOZAk 707pBJMXRTXKOomsXgmV4T3rsCPxGTak2mIZIH97H+8aFzryyRXTLdxDgPJLVMeS/XTt gLy3Psoj6Pj20eY7FSmI3QblVLB1N08Eq1oFZnlXCWFX/NfQjQrA+gwZ2LMSU0OLBm+8 AuhB5ubMYPjj8O1qvus0cL1sK/u7h3bO6ErlwFxmJG+xi469VpgtXSDyn+oR6L6AjsM7 3ixA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699563838; x=1700168638; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ebcFSdFa++R257OKZj60sR0YHtuRHuIkQtR+xwo+tiE=; b=SE82r1o63PsL/Ip4h+teGp0IvZYrdt2qhOdW/rc6WwVkLqGRLRJ4HHCw5f7JKMoMjl lipLzd1XyLxrneZA98RG3BriiEy+VRIABPzGpTCAJcnjzO0kOLJSKfNZotavdkouAQkn litABa7JOLEbj6iAesZB0ipPGRYQoRgytJ02oTxEgTXqfMUbbEbbyjYj33m4TlMqtlZI tHavuD7SG6g6jiV8MNKrI4zgoo2uV+R4t0Tpf9C/E1LO5E8zMDSys2nBhAST0Dm4dLmy K3bg/H+vBUrlhVb3lHXB5+tOjDohytyVj0+vB8Rnb2pEvyjFFLAjHtspk8mpTlb8Roh4 USLg== X-Gm-Message-State: AOJu0YztcJRPgWeiWxlSyxgSDN0QNHEXCLEuszrUCia+t9repnoTztuu xGD2JPiJp4av3kfD/FuZu58OyDG9sYfCZw== X-Google-Smtp-Source: AGHT+IEpMK+w1yaYJDTMb00d7SXUilcTamTn9YrAyFRRyCMycM7ow5u4aT2y4zOC/qHpA+LckxXipp6JbevdfA== X-Received: from laogai.c.googlers.com ([fda3:e722:ac3:cc00:2b:7d90:c0a8:2c9]) (user=amoorthy job=sendgmr) by 
Date: Thu, 9 Nov 2023 21:03:25 +0000
In-Reply-To: <20231109210325.3806151-1-amoorthy@google.com>
Message-ID: <20231109210325.3806151-15-amoorthy@google.com>
Subject: [PATCH v6 14/14] KVM: selftests: Handle memory fault exits in demand_paging_test
From: Anish Moorthy
To: seanjc@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev
Cc: oliver.upton@linux.dev, pbonzini@redhat.com, maz@kernel.org, robert.hoo.linux@gmail.com, jthoughton@google.com, amoorthy@google.com, dmatlack@google.com, axelrasmussen@google.com, peterx@redhat.com, nadav.amit@gmail.com, isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com

Demonstrate a (very basic) scheme for supporting memory fault exits.

From the vCPU threads:

1. Simply issue UFFDIO_COPY/CONTINUEs in response to memory fault exits, with the purpose of establishing the absent mappings. Do so with wake_waiters=false to avoid serializing on the userfaultfd wait queue locks.

2. When the UFFDIO_COPY/CONTINUE in (1) fails with EEXIST, assume that the mapping was already established but is currently absent [A] and attempt to populate it using MADV_POPULATE_WRITE.

Issue UFFDIO_COPY/CONTINUEs from the reader threads as well, but with wake_waiters=true to ensure that any threads sleeping on the uffd are eventually woken up.

A real VMM would track whether it had already COPY/CONTINUEd pages (e.g., via a bitmap) to avoid calls destined to fail with EEXIST. However, even this naive approach is enough to demonstrate the performance advantages of KVM_EXIT_MEMORY_FAULT.

[A] In reality it is much likelier that the vCPU thread simply lost a race to establish the mapping for the page.
Signed-off-by: Anish Moorthy Acked-by: James Houghton --- .../selftests/kvm/demand_paging_test.c | 245 +++++++++++++----- 1 file changed, 173 insertions(+), 72 deletions(-) diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 61bb2e23bef0..44bdcc7aad87 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include "kvm_util.h" @@ -31,36 +32,102 @@ static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE; static size_t demand_paging_size; static char *guest_data_prototype; +static int num_uffds; +static size_t uffd_region_size; +static struct uffd_desc **uffd_descs; +/* + * Delay when demand paging is performed through userfaultfd or directly by + * vcpu_worker in the case of an annotated memory fault. + */ +static useconds_t uffd_delay; +static int uffd_mode; + + +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t hva, + bool is_vcpu); + +static void madv_write_or_err(uint64_t gpa) +{ + int r; + void *hva = addr_gpa2hva(memstress_args.vm, gpa); + + r = madvise(hva, demand_paging_size, MADV_POPULATE_WRITE); + TEST_ASSERT(r == 0, + "MADV_POPULATE_WRITE on hva 0x%lx (gpa 0x%lx) fail, errno %i\n", + (uintptr_t) hva, gpa, errno); +} + +static void ready_page(uint64_t gpa) +{ + int r, uffd; + + /* + * This test only registers memslot 1 w/ userfaultfd. Any accesses outside + * the registered ranges should fault in the physical pages through + * MADV_POPULATE_WRITE. 
+ */ + if ((gpa < memstress_args.gpa) + || (gpa >= memstress_args.gpa + memstress_args.size)) { + madv_write_or_err(gpa); + } else { + if (uffd_delay) + usleep(uffd_delay); + + uffd = uffd_descs[(gpa - memstress_args.gpa) / uffd_region_size]->uffd; + + r = handle_uffd_page_request(uffd_mode, uffd, + (uint64_t) addr_gpa2hva(memstress_args.vm, gpa), true); + + if (r == EEXIST) + madv_write_or_err(gpa); + } +} + static void vcpu_worker(struct memstress_vcpu_args *vcpu_args) { struct kvm_vcpu *vcpu = vcpu_args->vcpu; int vcpu_idx = vcpu_args->vcpu_idx; struct kvm_run *run = vcpu->run; - struct timespec start; - struct timespec ts_diff; + struct timespec last_start; + struct timespec total_runtime = {}; int ret; - - clock_gettime(CLOCK_MONOTONIC, &start); - - /* Let the guest access its memory */ - ret = _vcpu_run(vcpu); - TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret); - if (get_ucall(vcpu, NULL) != UCALL_SYNC) { - TEST_ASSERT(false, - "Invalid guest sync status: exit_reason=%s\n", - exit_reason_str(run->exit_reason)); + u64 num_memory_fault_exits = 0; + bool annotated_memory_fault = false; + + while (true) { + clock_gettime(CLOCK_MONOTONIC, &last_start); + /* Let the guest access its memory */ + ret = _vcpu_run(vcpu); + annotated_memory_fault = errno == EFAULT + && run->exit_reason == KVM_EXIT_MEMORY_FAULT; + TEST_ASSERT(ret == 0 || annotated_memory_fault, + "vcpu_run failed: %d\n", ret); + + total_runtime = timespec_add(total_runtime, + timespec_elapsed(last_start)); + if (ret != 0 && get_ucall(vcpu, NULL) != UCALL_SYNC) { + + if (annotated_memory_fault) { + ++num_memory_fault_exits; + ready_page(run->memory_fault.gpa); + continue; + } + + TEST_ASSERT(false, + "Invalid guest sync status: exit_reason=%s\n", + exit_reason_str(run->exit_reason)); + } + break; } - - ts_diff = timespec_elapsed(start); - PER_VCPU_DEBUG("vCPU %d execution time: %ld.%.9lds\n", vcpu_idx, - ts_diff.tv_sec, ts_diff.tv_nsec); + PER_VCPU_DEBUG("vCPU %d execution time: %ld.%.9lds, %d memory 
fault exits\n", + vcpu_idx, total_runtime.tv_sec, total_runtime.tv_nsec, + num_memory_fault_exits); } -static int handle_uffd_page_request(int uffd_mode, int uffd, - struct uffd_msg *msg) +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t hva, + bool is_vcpu) { pid_t tid = syscall(__NR_gettid); - uint64_t addr = msg->arg.pagefault.address; struct timespec start; struct timespec ts_diff; int r; @@ -71,16 +138,15 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, struct uffdio_copy copy; copy.src = (uint64_t)guest_data_prototype; - copy.dst = addr; + copy.dst = hva; copy.len = demand_paging_size; - copy.mode = 0; + copy.mode = is_vcpu ? UFFDIO_COPY_MODE_DONTWAKE : 0; - r = ioctl(uffd, UFFDIO_COPY, ©); /* - * With multiple vCPU threads fault on a single page and there are - * multiple readers for the UFFD, at least one of the UFFDIO_COPYs - * will fail with EEXIST: handle that case without signaling an - * error. + * With multiple vCPU threads and at least one of multiple reader threads + * or vCPU memory faults, multiple vCPUs accessing an absent page will + * almost certainly cause some thread doing the UFFDIO_COPY here to get + * EEXIST: make sure to allow that case. * * Note that this also suppress any EEXISTs occurring from, * e.g., the first UFFDIO_COPY/CONTINUEs on a page. That never @@ -88,23 +154,24 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, * some external state to correctly surface EEXISTs to userspace * (or prevent duplicate COPY/CONTINUEs in the first place). */ - if (r == -1 && errno != EEXIST) { - pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d, errno = %d\n", - addr, tid, errno); - return r; - } + r = ioctl(uffd, UFFDIO_COPY, ©); + TEST_ASSERT(r == 0 || errno == EEXIST, + "Thread 0x%x failed UFFDIO_COPY on hva 0x%lx, errno = %d", + tid, hva, errno); } else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) { + /* The comments in the UFFDIO_COPY branch also apply here. 
*/ struct uffdio_continue cont = {0}; - cont.range.start = addr; + cont.range.start = hva; cont.range.len = demand_paging_size; + cont.mode = is_vcpu ? UFFDIO_CONTINUE_MODE_DONTWAKE : 0; r = ioctl(uffd, UFFDIO_CONTINUE, &cont); /* - * With multiple vCPU threads fault on a single page and there are - * multiple readers for the UFFD, at least one of the UFFDIO_COPYs - * will fail with EEXIST: handle that case without signaling an - * error. + * With multiple vCPU threads and at least one of multiple reader threads + * or vCPU memory faults, multiple vCPUs accessing an absent page will + * almost certainly cause some thread doing the UFFDIO_COPY here to get + * EEXIST: make sure to allow that case. * * Note that this also suppress any EEXISTs occurring from, * e.g., the first UFFDIO_COPY/CONTINUEs on a page. That never @@ -112,32 +179,54 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, * some external state to correctly surface EEXISTs to userspace * (or prevent duplicate COPY/CONTINUEs in the first place). */ - if (r == -1 && errno != EEXIST) { - pr_info("Failed UFFDIO_CONTINUE in 0x%lx, thread %d, errno = %d\n", - addr, tid, errno); - return r; - } + TEST_ASSERT(r == 0 || errno == EEXIST, + "Thread 0x%x failed UFFDIO_CONTINUE on hva 0x%lx, errno = %d", + tid, hva, errno); } else { TEST_FAIL("Invalid uffd mode %d", uffd_mode); } + /* + * If the above UFFDIO_COPY/CONTINUE failed with EEXIST, waiting threads + * will not have been woken: wake them here. 
+ */ + if (!is_vcpu && r != 0) { + struct uffdio_range range = { + .start = hva, + .len = demand_paging_size + }; + r = ioctl(uffd, UFFDIO_WAKE, &range); + TEST_ASSERT(r == 0, + "Thread 0x%x failed UFFDIO_WAKE on hva 0x%lx, errno = %d", + tid, hva, errno); + } + ts_diff = timespec_elapsed(start); PER_PAGE_DEBUG("UFFD page-in %d \t%ld ns\n", tid, timespec_to_ns(ts_diff)); PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n", - demand_paging_size, addr, tid); + demand_paging_size, hva, tid); return 0; } +static int handle_uffd_page_request_from_uffd(int uffd_mode, int uffd, + struct uffd_msg *msg) +{ + TEST_ASSERT(msg->event == UFFD_EVENT_PAGEFAULT, + "Received uffd message with event %d != UFFD_EVENT_PAGEFAULT", + msg->event); + return handle_uffd_page_request(uffd_mode, uffd, + msg->arg.pagefault.address, false); +} + struct test_params { - int uffd_mode; bool single_uffd; - useconds_t uffd_delay; int readers_per_uffd; enum vm_mem_backing_src_type src_type; bool partition_vcpu_memory_access; + bool memfault_exits; }; static void prefault_mem(void *alias, uint64_t len) @@ -155,16 +244,22 @@ static void run_test(enum vm_guest_mode mode, void *arg) { struct memstress_vcpu_args *vcpu_args; struct test_params *p = arg; - struct uffd_desc **uffd_descs = NULL; struct timespec start; struct timespec ts_diff; struct kvm_vm *vm; - int i, num_uffds = 0; + int i; double vcpu_paging_rate; - uint64_t uffd_region_size; + uint32_t slot_flags = 0; + bool uffd_memfault_exits = uffd_mode && p->memfault_exits; - vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, 0, - p->src_type, p->partition_vcpu_memory_access); + if (uffd_memfault_exits) { + TEST_ASSERT(kvm_has_cap(KVM_CAP_EXIT_ON_MISSING) > 0, + "KVM does not have KVM_CAP_EXIT_ON_MISSING"); + slot_flags = KVM_MEM_EXIT_ON_MISSING; + } + + vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, + 1, slot_flags, p->src_type, p->partition_vcpu_memory_access); demand_paging_size = 
get_backing_src_pagesz(p->src_type); @@ -173,21 +268,21 @@ static void run_test(enum vm_guest_mode mode, void *arg) "Failed to allocate buffer for guest data pattern"); memset(guest_data_prototype, 0xAB, demand_paging_size); - if (p->uffd_mode == UFFDIO_REGISTER_MODE_MINOR) { - num_uffds = p->single_uffd ? 1 : nr_vcpus; - for (i = 0; i < num_uffds; i++) { - vcpu_args = &memstress_args.vcpu_args[i]; - prefault_mem(addr_gpa2alias(vm, vcpu_args->gpa), - vcpu_args->pages * memstress_args.guest_page_size); - } - } - - if (p->uffd_mode) { + if (uffd_mode) { num_uffds = p->single_uffd ? 1 : nr_vcpus; uffd_region_size = nr_vcpus * guest_percpu_mem_size / num_uffds; + if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) { + for (i = 0; i < num_uffds; i++) { + vcpu_args = &memstress_args.vcpu_args[i]; + prefault_mem(addr_gpa2alias(vm, vcpu_args->gpa), + uffd_region_size); + } + } + uffd_descs = malloc(num_uffds * sizeof(struct uffd_desc *)); - TEST_ASSERT(uffd_descs, "Memory allocation failed"); + TEST_ASSERT(uffd_descs, "Failed to allocate uffd descriptors"); + for (i = 0; i < num_uffds; i++) { struct memstress_vcpu_args *vcpu_args; void *vcpu_hva; @@ -201,10 +296,10 @@ static void run_test(enum vm_guest_mode mode, void *arg) * requests. 
*/ uffd_descs[i] = uffd_setup_demand_paging( - p->uffd_mode, p->uffd_delay, vcpu_hva, + uffd_mode, uffd_delay, vcpu_hva, uffd_region_size, p->readers_per_uffd, - &handle_uffd_page_request); + &handle_uffd_page_request_from_uffd); } } @@ -218,7 +313,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) ts_diff = timespec_elapsed(start); pr_info("All vCPU threads joined\n"); - if (p->uffd_mode) { + if (uffd_mode) { /* Tell the user fault fd handler threads to quit */ for (i = 0; i < num_uffds; i++) uffd_stop_demand_paging(uffd_descs[i]); @@ -239,7 +334,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) memstress_destroy_vm(vm); free(guest_data_prototype); - if (p->uffd_mode) + if (uffd_mode) free(uffd_descs); } @@ -248,7 +343,8 @@ static void help(char *name) puts(""); printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-a]\n" " [-d uffd_delay_usec] [-r readers_per_uffd] [-b memory]\n" - " [-s type] [-v vcpus] [-c cpu_list] [-o]\n", name); + " [-s type] [-v vcpus] [-c cpu_list] [-o] [-w] \n", + name); guest_modes_help(); printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n" " UFFD registration mode: 'MISSING' or 'MINOR'.\n"); @@ -260,6 +356,7 @@ static void help(char *name) " FD handler to simulate demand paging\n" " overheads. Ignored without -u.\n"); printf(" -r: Set the number of reader threads per uffd.\n"); + printf(" -w: Enable kvm cap for memory fault exits.\n"); printf(" -b: specify the size of the memory region which should be\n" " demand paged by each vCPU. e.g. 
10M or 3G.\n" " Default: 1G\n"); @@ -280,29 +377,30 @@ int main(int argc, char *argv[]) .partition_vcpu_memory_access = true, .readers_per_uffd = 1, .single_uffd = false, + .memfault_exits = false, }; int opt; guest_modes_append_default(); - while ((opt = getopt(argc, argv, "ahom:u:d:b:s:v:c:r:")) != -1) { + while ((opt = getopt(argc, argv, "ahowm:u:d:b:s:v:c:r:")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg); break; case 'u': if (!strcmp("MISSING", optarg)) - p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING; + uffd_mode = UFFDIO_REGISTER_MODE_MISSING; else if (!strcmp("MINOR", optarg)) - p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR; - TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'."); + uffd_mode = UFFDIO_REGISTER_MODE_MINOR; + TEST_ASSERT(uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'."); break; case 'a': p.single_uffd = true; break; case 'd': - p.uffd_delay = strtoul(optarg, NULL, 0); - TEST_ASSERT(p.uffd_delay >= 0, "A negative UFFD delay is not supported."); + uffd_delay = strtoul(optarg, NULL, 0); + TEST_ASSERT(uffd_delay >= 0, "A negative UFFD delay is not supported."); break; case 'b': guest_percpu_mem_size = parse_size(optarg); @@ -328,6 +426,9 @@ int main(int argc, char *argv[]) "Invalid number of readers per uffd %d: must be >=1", p.readers_per_uffd); break; + case 'w': + p.memfault_exits = true; + break; case 'h': default: help(argv[0]); @@ -335,7 +436,7 @@ int main(int argc, char *argv[]) } } - if (p.uffd_mode == UFFDIO_REGISTER_MODE_MINOR && + if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR && !backing_src_is_shared(p.src_type)) { TEST_FAIL("userfaultfd MINOR mode requires shared memory; pick a different -s"); }