From patchwork Wed Mar 12 14:51:31 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 14013619
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Kefeng Wang, Mike Rapoport, Al Viro, Andrew Morton,
 Axel Rasmussen, Pavel Emelyanov, Jinjiang Tu, Dimitris Siakavaras,
 Andrea Arcangeli
Subject: [PATCH] mm/userfaultfd: Fix release hang over concurrent GUP
Date: Wed, 12 Mar 2025 10:51:31 -0400
Message-ID: <20250312145131.1143062-1-peterx@redhat.com>
X-Mailer: git-send-email 2.47.0
Sender: owner-linux-mm@kvack.org
Precedence: bulk

This patch should fix a possible userfaultfd release() hang during
concurrent GUP.

This problem was initially reported by Dimitris Siakavaras in July 2023 [1]
in a Firecracker use case.  Firecracker has a separate process handling
page faults remotely, and when that process releases the userfaultfd it can
race with a concurrent GUP from KVM trying to fault in a guest page during
the secondary MMU page fault process.

A similar problem was reported again recently by Jinjiang Tu in March 2025
[2], although this time the race happened with an mlockall() operation,
which does GUP in a similar fashion.

In 2017, commit 656710a60e36 ("userfaultfd: non-cooperative: closing the
uffd without triggering SIGBUS") tried to fix this issue.  AFAIU, that
fixes the fault paths well but may not work yet for GUP.  In GUP, the issue
is that NOPAGE is treated almost the same as "page fault resolved" in
faultin_page(), so GUP will follow the page again, see the page missing,
and keep retrying, ending up in the live lock situation as reported.

This change makes core mm return RETRY instead of NOPAGE for both the GUP
and fault paths, proactively releasing the mmap read lock.  This should
guarantee that the other release thread makes progress on taking the write
lock, and avoid the live lock even for GUP.

While at it, rearrange the comments to make sure they are up to date.

[1] https://lore.kernel.org/r/79375b71-db2e-3e66-346b-254c90d915e2@cslab.ece.ntua.gr
[2] https://lore.kernel.org/r/20250307072133.3522652-1-tujinjiang@huawei.com

Cc: Andrea Arcangeli
Cc: Mike Rapoport (IBM)
Cc: Axel Rasmussen
Cc: Jinjiang Tu
Cc: Dimitris Siakavaras
Signed-off-by: Peter Xu
---
 fs/userfaultfd.c | 51 ++++++++++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 97c4d71115d8..d80f94346199 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -395,32 +395,6 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	if (!(vmf->flags & FAULT_FLAG_USER) && (ctx->flags & UFFD_USER_MODE_ONLY))
 		goto out;
 
-	/*
-	 * If it's already released don't get it. This avoids to loop
-	 * in __get_user_pages if userfaultfd_release waits on the
-	 * caller of handle_userfault to release the mmap_lock.
-	 */
-	if (unlikely(READ_ONCE(ctx->released))) {
-		/*
-		 * Don't return VM_FAULT_SIGBUS in this case, so a non
-		 * cooperative manager can close the uffd after the
-		 * last UFFDIO_COPY, without risking to trigger an
-		 * involuntary SIGBUS if the process was starting the
-		 * userfaultfd while the userfaultfd was still armed
-		 * (but after the last UFFDIO_COPY). If the uffd
-		 * wasn't already closed when the userfault reached
-		 * this point, that would normally be solved by
-		 * userfaultfd_must_wait returning 'false'.
-		 *
-		 * If we were to return VM_FAULT_SIGBUS here, the non
-		 * cooperative manager would be instead forced to
-		 * always call UFFDIO_UNREGISTER before it can safely
-		 * close the uffd.
-		 */
-		ret = VM_FAULT_NOPAGE;
-		goto out;
-	}
-
 	/*
 	 * Check that we can return VM_FAULT_RETRY.
 	 *
@@ -457,6 +431,31 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
 		goto out;
 
+	if (unlikely(READ_ONCE(ctx->released))) {
+		/*
+		 * If a concurrent release is detected, do not return
+		 * VM_FAULT_SIGBUS or VM_FAULT_NOPAGE, but instead always
+		 * return VM_FAULT_RETRY with lock released proactively.
+		 *
+		 * If we were to return VM_FAULT_SIGBUS here, the non
+		 * cooperative manager would be instead forced to
+		 * always call UFFDIO_UNREGISTER before it can safely
+		 * close the uffd, to avoid involuntary SIGBUS triggered.
+		 *
+		 * If we were to return VM_FAULT_NOPAGE, it would work for
+		 * the fault path, in which the lock will be released
+		 * later.  However for GUP, faultin_page() does nothing
+		 * special on NOPAGE, so GUP would spin retrying without
+		 * releasing the mmap read lock, causing possible livelock.
+		 *
+		 * Here only VM_FAULT_RETRY would make sure the mmap lock
+		 * be released immediately, so that the thread concurrently
+		 * releasing the userfault would always make progress.
+		 */
+		release_fault_lock(vmf);
+		goto out;
+	}
+
 	/* take the reference before dropping the mmap_lock */
 	userfaultfd_ctx_get(ctx);
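
For readers who want to see the livelock argument in isolation, below is a
rough userspace sketch of the reasoning in the new comment.  It is a toy
model only, not kernel code: every toy_* name is made up for illustration,
the "lock" is a plain reader counter, and the releasing thread is simulated
inline rather than run concurrently.  It only tries to show why a NOPAGE
return leaves the GUP loop spinning with the read lock held, while a RETRY
return that drops the lock lets the release side make progress.

```c
#include <stdbool.h>

/* Toy stand-ins for fault return codes (hypothetical, not kernel values). */
enum toy_fault { TOY_FAULT_NOPAGE, TOY_FAULT_RETRY, TOY_FAULT_RESOLVED };

struct toy_mm {
	int readers;		/* holders of the toy mmap read lock */
	bool uffd_armed;	/* range still registered with the uffd */
};

/* Models the release side: it needs the write lock, i.e. no readers. */
static bool toy_release_try(struct toy_mm *mm)
{
	if (mm->readers > 0)
		return false;	/* blocked behind the GUP reader */
	mm->uffd_armed = false;	/* registration torn down */
	return true;
}

/* Models the fault handler once a concurrent release has been observed. */
static enum toy_fault toy_handle_fault(struct toy_mm *mm, enum toy_fault policy)
{
	if (!mm->uffd_armed)
		return TOY_FAULT_RESOLVED;	/* ordinary fault path fills the page */
	if (policy == TOY_FAULT_RETRY)
		mm->readers--;	/* drop the read lock before returning */
	return policy;
}

/*
 * Models the GUP loop under the given policy.  Returns the number of fault
 * attempts needed, or -1 if max_tries attempts were not enough (livelock).
 */
int toy_gup(enum toy_fault policy, int max_tries)
{
	struct toy_mm mm = { .readers = 1, .uffd_armed = true };

	for (int tries = 1; tries <= max_tries; tries++) {
		enum toy_fault ret = toy_handle_fault(&mm, policy);

		if (ret == TOY_FAULT_RESOLVED)
			return tries;

		/* The releasing thread runs whenever it can take the lock. */
		toy_release_try(&mm);

		if (ret == TOY_FAULT_RETRY)
			mm.readers++;	/* GUP retakes the read lock, then retries */
	}
	return -1;
}
```

In this model, toy_gup(TOY_FAULT_NOPAGE, n) never completes for any n
because the reader count never drops to zero, whereas
toy_gup(TOY_FAULT_RETRY, n) completes on the second attempt.  The real
kernel paths involve actual concurrency and more states than this sketch
captures.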