From patchwork Tue Oct 11 19:58:06 2022
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13004290
From: Peter Xu <peterx@redhat.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Sean Christopherson, peterx@redhat.com, John Hubbard, Paolo Bonzini,
 David Matlack, Andrew Morton, Andrea Arcangeli, "Dr . David Alan Gilbert",
 David Hildenbrand, Linux MM Mailing List, Mike Kravetz
Subject: [PATCH v4 1/4] mm/gup: Add FOLL_INTERRUPTIBLE
Date: Tue, 11 Oct 2022 15:58:06 -0400
Message-Id: <20221011195809.557016-2-peterx@redhat.com>
In-Reply-To: <20221011195809.557016-1-peterx@redhat.com>
References: <20221011195809.557016-1-peterx@redhat.com>
We have had FAULT_FLAG_INTERRUPTIBLE for a while, but it was never applied
to GUPs.  One issue is that not all GUP paths are able to handle signal
delivery besides SIGKILL.  That's not ideal for GUP users that can actually
handle these cases, like KVM.

KVM uses GUP extensively when faulting in guest pages, and we already have
infrastructure there to retry a page fault at a later time.  Allowing GUP to
be interrupted by generic signals can make KVM-related threads more
responsive.  For example:

(1) SIGUSR1: QEMU/KVM uses it to deliver an inter-process IPI, e.g. when
    the admin issues a vm_stop QMP command; SIGUSR1 can be generated to
    kick the vcpus out of kernel context immediately.

(2) SIGINT: lets interactive hypervisor users stop a virtual machine with
    Ctrl-C without any delays/hangs.

(3) SIGTRAP: allows GDB to attach even during page faults that are stuck
    for a long time.

Normally the hypervisor will receive these signals properly, but not if
we're stuck in a GUP for a long time for whatever reason.  That happens
easily with a stuck postcopy migration when e.g. a temporary network
failure occurs; some vcpu threads can then hang forever waiting for the
pages.
With the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to
selectively enable the ability to trap these signals.

Reviewed-by: John Hubbard
Reviewed-by: David Hildenbrand
Signed-off-by: Peter Xu
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 33 +++++++++++++++++++++++++++++----
 mm/hugetlb.c       |  5 ++++-
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 21f8b27bd9fd..488a9f4cce07 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2897,6 +2897,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
+#define FOLL_INTERRUPTIBLE	0x100000 /* allow interrupts from generic signals */

 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/mm/gup.c b/mm/gup.c
index 5abdaf487460..d51e7ccaef32 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -970,8 +970,17 @@ static int faultin_page(struct vm_area_struct *vma,
 		fault_flags |= FAULT_FLAG_WRITE;
 	if (*flags & FOLL_REMOTE)
 		fault_flags |= FAULT_FLAG_REMOTE;
-	if (locked)
+	if (locked) {
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+		/*
+		 * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set
+		 * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE.
+		 * That's because some callers may not be prepared to
+		 * handle early exits caused by non-fatal signals.
+		 */
+		if (*flags & FOLL_INTERRUPTIBLE)
+			fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
+	}
 	if (*flags & FOLL_NOWAIT)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
 	if (*flags & FOLL_TRIED) {
@@ -1380,6 +1389,22 @@ int fixup_user_fault(struct mm_struct *mm,
 }
 EXPORT_SYMBOL_GPL(fixup_user_fault);

+/*
+ * GUP always responds to fatal signals.  When FOLL_INTERRUPTIBLE is
+ * specified, it'll also respond to generic signals.  The caller of GUP
+ * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption.
+ */
+static bool gup_signal_pending(unsigned int flags)
+{
+	if (fatal_signal_pending(current))
+		return true;
+
+	if (!(flags & FOLL_INTERRUPTIBLE))
+		return false;
+
+	return signal_pending(current);
+}
+
 /*
  * Please note that this function, unlike __get_user_pages will not
  * return 0 for nr_pages > 0 without FOLL_NOWAIT
@@ -1461,11 +1486,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
 			 * Repeat on the address that fired VM_FAULT_RETRY
 			 * with both FAULT_FLAG_ALLOW_RETRY and
 			 * FAULT_FLAG_TRIED.  Note that GUP can be interrupted
-			 * by fatal signals, so we need to check it before we
+			 * by fatal signals or even common signals, depending on
+			 * the caller's request.  So we need to check it before we
 			 * start trying again otherwise it can loop forever.
 			 */
-
-			if (fatal_signal_pending(current)) {
+			if (gup_signal_pending(flags)) {
 				if (!pages_done)
 					pages_done = -EINTR;
 				break;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e070b8593b37..202f3ad7f35c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6206,9 +6206,12 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			fault_flags |= FAULT_FLAG_WRITE;
 		else if (unshare)
 			fault_flags |= FAULT_FLAG_UNSHARE;
-		if (locked)
+		if (locked) {
 			fault_flags |= FAULT_FLAG_ALLOW_RETRY |
 				FAULT_FLAG_KILLABLE;
+			if (flags & FOLL_INTERRUPTIBLE)
+				fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
+		}
 		if (flags & FOLL_NOWAIT)
 			fault_flags |= FAULT_FLAG_ALLOW_RETRY |
 				FAULT_FLAG_RETRY_NOWAIT;