From patchwork Thu Jul 21 00:03:16 2022
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: David Hildenbrand, "Dr. David Alan Gilbert", John Hubbard, Sean Christopherson, Linux MM Mailing List, Andrew Morton, Paolo Bonzini, Andrea Arcangeli
Subject: [PATCH v2 1/3] mm/gup: Add FOLL_INTERRUPTIBLE
Date: Wed, 20 Jul 2022 20:03:16 -0400
Message-Id: <20220721000318.93522-2-peterx@redhat.com>
In-Reply-To: <20220721000318.93522-1-peterx@redhat.com>

We have had FAULT_FLAG_INTERRUPTIBLE for a while, but it was never applied to GUPs.  One issue is that not all GUP paths can handle signal delivery other than SIGKILL, which is not ideal for the GUP users that can actually handle those cases, such as KVM.

KVM uses GUP extensively when faulting in guest pages, and it already has the infrastructure to retry a page fault at a later time.  Allowing GUP to be interrupted by generic signals can make KVM-related threads more responsive.  For example:

  (1) SIGUSR1, which QEMU/KVM uses to deliver an inter-process IPI: e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be generated to kick the vcpus out of kernel context immediately;

  (2) SIGINT, which lets interactive hypervisor users stop a virtual machine with Ctrl-C without delays or hangs;

  (3) SIGTRAP, which keeps GDB usable even during page faults that are stuck for a long time.

Normally the hypervisor can receive these signals properly, but not while it is stuck in a GUP for a long time for whatever reason.  That happens easily when a postcopy migration gets stuck on e.g. a temporary network failure, at which point vcpu threads can hang waiting for their pages.
With the new FOLL_INTERRUPTIBLE, GUP users like KVM can selectively enable the ability to trap these signals.

Reviewed-by: John Hubbard
Reviewed-by: David Hildenbrand
Signed-off-by: Peter Xu
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 33 +++++++++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cf3d0d673f6b..c09eccd5d553 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
+#define FOLL_INTERRUPTIBLE 0x100000 /* allow interrupts from generic signals */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/mm/gup.c b/mm/gup.c
index 551264407624..f39cbe011cf1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma,
 		fault_flags |= FAULT_FLAG_WRITE;
 	if (*flags & FOLL_REMOTE)
 		fault_flags |= FAULT_FLAG_REMOTE;
-	if (locked)
+	if (locked) {
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+		/*
+		 * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set
+		 * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE.
+		 * That's because some callers may not be prepared to
+		 * handle early exits caused by non-fatal signals.
+		 */
+		if (*flags & FOLL_INTERRUPTIBLE)
+			fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
+	}
 	if (*flags & FOLL_NOWAIT)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
 	if (*flags & FOLL_TRIED) {
@@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm,
 }
 EXPORT_SYMBOL_GPL(fixup_user_fault);
 
+/*
+ * GUP always responds to fatal signals.  When FOLL_INTERRUPTIBLE is
+ * specified, it'll also respond to generic signals.  The caller of GUP
+ * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption.
+ */
+static bool gup_signal_pending(unsigned int flags)
+{
+	if (fatal_signal_pending(current))
+		return true;
+
+	if (!(flags & FOLL_INTERRUPTIBLE))
+		return false;
+
+	return signal_pending(current);
+}
+
 /*
  * Please note that this function, unlike __get_user_pages will not
  * return 0 for nr_pages > 0 without FOLL_NOWAIT
@@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
 		 * Repeat on the address that fired VM_FAULT_RETRY
 		 * with both FAULT_FLAG_ALLOW_RETRY and
 		 * FAULT_FLAG_TRIED.  Note that GUP can be interrupted
-		 * by fatal signals, so we need to check it before we
+		 * by fatal signals or even common signals, depending on
+		 * the caller's request.  So we need to check it before we
 		 * start trying again otherwise it can loop forever.
 		 */
-
-		if (fatal_signal_pending(current)) {
+		if (gup_signal_pending(flags)) {
 			if (!pages_done)
 				pages_done = -EINTR;
 			break;

From patchwork Thu Jul 21 00:03:17 2022
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: David Hildenbrand, "Dr. David Alan Gilbert", John Hubbard, Sean Christopherson, Linux MM Mailing List, Andrew Morton, Paolo Bonzini, Andrea Arcangeli
Subject: [PATCH v2 2/3] kvm: Add new pfn error KVM_PFN_ERR_SIGPENDING
Date: Wed, 20 Jul 2022 20:03:17 -0400
Message-Id: <20220721000318.93522-3-peterx@redhat.com>
In-Reply-To: <20220721000318.93522-1-peterx@redhat.com>
Add a new PFN error type to indicate that we were interrupted while fetching the PFN because a signal was pending.  This prepares KVM to respond to SIGUSR1 (for QEMU, that's the SIGIPI) even while e.g. handling a userfaultfd page fault.

Signed-off-by: Peter Xu
---
 include/linux/kvm_host.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 83cf7fd842e0..06a5b17d3679 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -96,6 +96,7 @@
 #define KVM_PFN_ERR_FAULT	(KVM_PFN_ERR_MASK)
 #define KVM_PFN_ERR_HWPOISON	(KVM_PFN_ERR_MASK + 1)
 #define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)
+#define KVM_PFN_ERR_SIGPENDING	(KVM_PFN_ERR_MASK + 3)
 
 /*
  * error pfns indicate that the gfn is in slot but faild to
@@ -106,6 +107,16 @@ static inline bool is_error_pfn(kvm_pfn_t pfn)
 	return !!(pfn & KVM_PFN_ERR_MASK);
 }
 
+/*
+ * When KVM_PFN_ERR_SIGPENDING returned, it means we're interrupted during
+ * fetching the PFN (a signal might have arrived), we may want to retry at
+ * some later point and kick the userspace to handle the signal.
+ */
+static inline bool is_sigpending_pfn(kvm_pfn_t pfn)
+{
+	return pfn == KVM_PFN_ERR_SIGPENDING;
+}
+
 /*
  * error_noslot pfns indicate that the gfn can not be
  * translated to pfn - it is not in slot or failed to

From patchwork Thu Jul 21 00:03:18 2022
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: David Hildenbrand, "Dr. David Alan Gilbert", John Hubbard, Sean Christopherson, Linux MM Mailing List, Andrew Morton, Paolo Bonzini, Andrea Arcangeli
Subject: [PATCH v2 3/3] kvm/x86: Allow to respond to generic signals during slow page faults
Date: Wed, 20 Jul 2022 20:03:18 -0400
Message-Id: <20220721000318.93522-4-peterx@redhat.com>
In-Reply-To: <20220721000318.93522-1-peterx@redhat.com>
All the facilities should be ready for this; what we need to do is add a new "interruptible" flag indicating that we're willing to be interrupted by common signals during the __gfn_to_pfn_memslot() request, and wire it up with the FOLL_INTERRUPTIBLE flag we've just introduced.

Note that only the x86 slow page fault routine sets this to true.  The new flag defaults to false on non-x86 architectures and on the other gup paths even on x86.  It could be used elsewhere too, but that's not covered yet.

When we see that PFN fetching was interrupted, exit to userspace early with a KVM_EXIT_INTR exit reason.
Signed-off-by: Peter Xu
---
 arch/arm64/kvm/mmu.c                   |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
 arch/x86/kvm/mmu/mmu.c                 | 16 ++++++++++++--
 include/linux/kvm_host.h               |  4 ++--
 virt/kvm/kvm_main.c                    | 30 ++++++++++++++++----------
 virt/kvm/kvm_mm.h                      |  4 ++--
 virt/kvm/pfncache.c                    |  2 +-
 8 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f5651a05b6a8..93f6b9bf1af1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1204,7 +1204,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
 				   write_fault, &writable, NULL);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 514fd45c1994..7aed5ef6588e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -598,7 +598,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 		write_ok = true;
 	} else {
 		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
 					   writing, &write_ok, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 42851c32ff3b..9991f9d9ee59 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -845,7 +845,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 	unsigned long pfn;
 
 	/* Call KVM generic code to do the slow-path check */
-	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
 				   writing, upgrade_p, NULL);
 	if (is_error_noslot_pfn(pfn))
 		return -EFAULT;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 17252f39bd7c..aeafe0e9cfbf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
 static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			       unsigned int access)
 {
+	/* NOTE: not all error pfn is fatal; handle sigpending pfn first */
+	if (unlikely(is_sigpending_pfn(fault->pfn))) {
+		vcpu->run->exit_reason = KVM_EXIT_INTR;
+		++vcpu->stat.signal_exits;
+		return -EINTR;
+	}
+
 	/* The pfn is invalid, report the error! */
 	if (unlikely(is_error_pfn(fault->pfn)))
 		return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn);
@@ -3999,7 +4006,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	}
 
 	async = false;
-	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
+	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
 					  fault->write, &fault->map_writable,
 					  &fault->hva);
 	if (!async)
@@ -4016,7 +4023,12 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 	}
 
-	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL,
+	/*
+	 * Allow gup to bail on pending non-fatal signals when it's also allowed
+	 * to wait for IO.  Note, gup always bails if it is unable to quickly
+	 * get a page and a fatal signal, i.e. SIGKILL, is pending.
+	 */
+	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true, NULL,
 					  fault->write, &fault->map_writable,
 					  &fault->hva);
 	return RET_PF_CONTINUE;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 06a5b17d3679..5bae753ebe48 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1158,8 +1158,8 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn);
 kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
-			       bool *writable, hva_t *hva);
+			       bool atomic, bool interruptible, bool *async,
+			       bool write_fault, bool *writable, hva_t *hva);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn);
 void kvm_release_pfn_dirty(kvm_pfn_t pfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a49df8988cd6..25deacc705b8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2445,7 +2445,7 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
  * 1 indicates success, -errno is returned if error is detected.
  */
 static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
-			   bool *writable, kvm_pfn_t *pfn)
+			   bool interruptible, bool *writable, kvm_pfn_t *pfn)
 {
 	unsigned int flags = FOLL_HWPOISON;
 	struct page *page;
@@ -2460,6 +2460,8 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 		flags |= FOLL_WRITE;
 	if (async)
 		flags |= FOLL_NOWAIT;
+	if (interruptible)
+		flags |= FOLL_INTERRUPTIBLE;
 
 	npages = get_user_pages_unlocked(addr, 1, &page, flags);
 	if (npages != 1)
@@ -2566,6 +2568,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 * Pin guest page in memory and return its pfn.
 * @addr: host virtual address which maps memory to the guest
 * @atomic: whether this function can sleep
+ * @interruptible: whether the process can be interrupted by non-fatal signals
 * @async: whether this function need to wait IO complete if the
 *         host page is not in the memory
 * @write_fault: whether we should get a writable host page
@@ -2576,8 +2579,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 * 2): @write_fault = false && @writable, @writable will tell the caller
 *     whether the mapping is writable.
 */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
-		     bool write_fault, bool *writable)
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
+		     bool *async, bool write_fault, bool *writable)
 {
 	struct vm_area_struct *vma;
 	kvm_pfn_t pfn = 0;
@@ -2592,9 +2595,12 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	if (atomic)
 		return KVM_PFN_ERR_FAULT;
 
-	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
+	npages = hva_to_pfn_slow(addr, async, write_fault, interruptible,
+				 writable, &pfn);
 	if (npages == 1)
 		return pfn;
+	if (npages == -EINTR)
+		return KVM_PFN_ERR_SIGPENDING;
 
 	mmap_read_lock(current->mm);
 	if (npages == -EHWPOISON ||
@@ -2625,8 +2631,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 }
 
 kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
-			       bool *writable, hva_t *hva)
+			       bool atomic, bool interruptible, bool *async,
+			       bool write_fault, bool *writable, hva_t *hva)
 {
 	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
 
@@ -2651,7 +2657,7 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 		writable = NULL;
 	}
 
-	return hva_to_pfn(addr, atomic, async, write_fault,
+	return hva_to_pfn(addr, atomic, interruptible, async, write_fault,
 			  writable);
 }
 EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
@@ -2659,20 +2665,22 @@ EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
 kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 			  bool *writable)
 {
-	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
-				    write_fault, writable, NULL);
+	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false,
+				    false, NULL, write_fault, writable, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
 
 kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL);
+	return __gfn_to_pfn_memslot(slot, gfn, false, false, NULL, true,
+				    NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
 
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL);
+	return __gfn_to_pfn_memslot(slot, gfn, true, false, NULL, true,
+				    NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 41da467d99c9..a1ab15006af3 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -24,8 +24,8 @@
 #define KVM_MMU_READ_UNLOCK(kvm)	spin_unlock(&(kvm)->mmu_lock)
 #endif /* KVM_HAVE_MMU_RWLOCK */
 
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
-		     bool write_fault, bool *writable);
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
+		     bool *async, bool write_fault, bool *writable);
 
 #ifdef CONFIG_HAVE_KVM_PFNCACHE
 void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index dd84676615f1..294808e77f44 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -123,7 +123,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
 	smp_rmb();
 
 	/* We always request a writeable mapping */
-	new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
+	new_pfn = hva_to_pfn(uhva, false, false, NULL, true, NULL);
 	if (is_error_noslot_pfn(new_pfn))
 		break;