From: Josef Bacik
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: Kirill A. Shutemov, Matthew Wilcox, stable@vger.kernel.org, Rik van Riel, Chris Mason
Subject: [PATCH v2] mm: fix page leak with multiple threads mapping the same page
Date: Wed, 13 Jul 2022 10:13:37 -0400

We have an application with a lot of threads that use a shared mmap backed by tmpfs mounted with -o huge=within_size. This application started leaking loads of huge pages when we upgraded to a recent kernel.

Using the page ref tracepoints and a BPF program written by Tejun Heo, we were able to determine that these pages would have multiple refcounts from the page fault path, but when it came to unmap time we wouldn't drop the number of refs we had added from the faults.

I wrote a reproducer that mmap'ed a file backed by tmpfs with -o huge=always, and then spawned 20 threads, all looping and faulting in random offsets in this map while randomly calling madvise(MADV_DONTNEED) on huge page aligned ranges. This very quickly reproduced the problem.

The problem is in how finish_fault() handles multiple threads faulting in a range that was previously unmapped. One thread maps the PMD; the other thread loses the race, sees pmd_devmap_trans_unstable(), and returns 0. However, at this point the losing thread already holds a reference on the page, and since it is no longer putting this page into the process's address space, we leak the page.

We actually did the correct thing prior to f9ce0be71d1f; however, it looks like Kirill copied what we do in the anonymous page case. In the anonymous page case we don't yet have a page, so we don't have to drop a reference on anything. Previously we did the correct thing for file based faults by returning VM_FAULT_NOPAGE, so we correctly dropped the reference on the page we faulted in.

Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable() case. This makes us drop the ref on the page properly, and now my reproducer no longer leaks the huge pages.

Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Cc: Kirill A. Shutemov
Cc: Matthew Wilcox (Oracle)
Cc: stable@vger.kernel.org
Acked-by: Kirill A. Shutemov
Signed-off-by: Josef Bacik
Signed-off-by: Rik van Riel
Signed-off-by: Chris Mason
---
v1->v2:
- Added Kirill's Acked-by.
- Added cc:stable
- Added a comment about why we need to return NOPAGE.

 mm/memory.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7a089145cad4..207b29b09286 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4369,9 +4369,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 		return VM_FAULT_OOM;
 	}
 
-	/* See comment in handle_pte_fault() */
+	/*
+	 * See comment in handle_pte_fault() for how this scenario happens, we
+	 * need to return NOPAGE so that we drop this page.
+	 */
 	if (pmd_devmap_trans_unstable(vmf->pmd))
-		return 0;
+		return VM_FAULT_NOPAGE;
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				      vmf->address, &vmf->ptl);
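The reproducer described in the commit message is not included in the patch. The following is a minimal sketch of that kind of stress test, not the author's actual program. Assumptions not taken from the patch: 2 MiB PMD-sized huge pages, a 16 MiB mapping, read faults rather than writes, a 1-in-64 chance of MADV_DONTNEED per iteration, and the hypothetical helper names `fault_worker()` and `run_reproducer()`. The mapping is placed on a PMD boundary so that the zapped ranges really are huge page aligned.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: 2 MiB PMD-sized huge pages (e.g. x86-64); sizes are arbitrary. */
#define HPAGE_SIZE (2UL << 20)
#define NR_HPAGES  8UL
#define MAP_LEN    (NR_HPAGES * HPAGE_SIZE)

struct worker_arg {
	volatile char *map;       /* shared mapping of the tmpfs file */
	unsigned long iterations; /* faults attempted by this thread */
	unsigned int seed;        /* per-thread state for rand_r() */
};

/* Fault in random offsets and now and then zap a random huge page aligned
 * range, so that concurrent faults race to repopulate the same PMD. */
static void *fault_worker(void *p)
{
	struct worker_arg *arg = p;

	for (unsigned long i = 0; i < arg->iterations; i++) {
		size_t off = (size_t)rand_r(&arg->seed) % MAP_LEN;

		(void)arg->map[off]; /* read fault (assumption: reads suffice) */
		if (rand_r(&arg->seed) % 64 == 0) {
			size_t h = (size_t)rand_r(&arg->seed) % NR_HPAGES;

			madvise((char *)arg->map + h * HPAGE_SIZE, HPAGE_SIZE,
				MADV_DONTNEED);
		}
	}
	return NULL;
}

/* Returns 0 on success, -1 if the file, mapping or threads can't be set up. */
static int run_reproducer(const char *path, unsigned int nthreads,
			  unsigned long iterations)
{
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, MAP_LEN) < 0) {
		close(fd);
		return -1;
	}

	/* Reserve extra address space, then place the file mapping on a PMD
	 * boundary inside the reservation with MAP_FIXED. */
	char *resv = mmap(NULL, MAP_LEN + HPAGE_SIZE, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (resv == MAP_FAILED) {
		close(fd);
		return -1;
	}
	char *aligned = (char *)(((uintptr_t)resv + HPAGE_SIZE - 1) &
				 ~(uintptr_t)(HPAGE_SIZE - 1));
	char *map = mmap(aligned, MAP_LEN, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_FIXED, fd, 0);
	close(fd);
	if (map == MAP_FAILED) {
		munmap(resv, MAP_LEN + HPAGE_SIZE);
		return -1;
	}

	pthread_t *tids = calloc(nthreads, sizeof(*tids));
	struct worker_arg *args = calloc(nthreads, sizeof(*args));
	int ret = (tids && args) ? 0 : -1;
	unsigned int started = 0;

	for (unsigned int i = 0; ret == 0 && i < nthreads; i++) {
		args[i] = (struct worker_arg){ map, iterations, i + 1 };
		if (pthread_create(&tids[i], NULL, fault_worker, &args[i]) != 0)
			ret = -1;
		else
			started++;
	}
	for (unsigned int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	free(tids);
	free(args);
	/* Unmapping the whole reservation also removes the file mapping. */
	munmap(resv, MAP_LEN + HPAGE_SIZE);
	return ret;
}
```

To match the description, a caller would pass a path on a tmpfs mounted with -o huge=always and 20 threads, then watch huge page usage (for example ShmemHugePages in /proc/meminfo) after the file is removed. On any other filesystem the program still runs, but it does not exercise the huge PMD path.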
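The refcount argument in the commit message can be checked with a small userspace model. It follows the behaviour the message describes: the fault path takes a reference on the page, and the caller of finish_fault() drops it only when the return value says the page was not mapped (in kernels of this era that check appears to live in callers such as do_read_fault(), but treat that as an assumption). Everything below is hypothetical and simplified: the `model_*` names, the flag values, and the starting refcount of 1 standing in for references held elsewhere (a real THP in the page cache holds more).

```c
#include <stdbool.h>

/* Illustrative flag bits for this model only, not the kernel's vm_fault_t. */
#define MODEL_FAULT_ERROR  0x1u
#define MODEL_FAULT_NOPAGE 0x2u

struct model_page {
	int refcount; /* stands in for the page reference count */
};

/* Stand-in for finish_fault(): 'pmd_raced' models another thread having
 * installed the huge PMD first (pmd_devmap_trans_unstable() is true);
 * 'fixed' selects the patched return value. */
static unsigned int model_finish_fault(bool pmd_raced, bool fixed)
{
	if (pmd_raced)
		return fixed ? MODEL_FAULT_NOPAGE : 0;
	return 0; /* page mapped: the fault's reference now belongs to the mapping */
}

/* Stand-in for the caller: the fault hands back the page with an extra
 * reference, dropped only if finish_fault() reports the page wasn't mapped. */
static void model_do_fault(struct model_page *page, bool pmd_raced, bool fixed)
{
	page->refcount++; /* reference taken by the fault path */
	unsigned int ret = model_finish_fault(pmd_raced, fixed);
	if (ret & (MODEL_FAULT_ERROR | MODEL_FAULT_NOPAGE))
		page->refcount--; /* models put_page() */
}

/* Two threads fault the same huge page: the first maps the PMD, the second
 * loses the race. The range is then unmapped once. Returns the final count;
 * anything above the starting value of 1 is a leaked reference. */
static int model_race(bool fixed)
{
	struct model_page page = { .refcount = 1 };

	model_do_fault(&page, false, fixed); /* winner maps the PMD */
	model_do_fault(&page, true, fixed);  /* loser finds the PMD already set */
	page.refcount--;                     /* unmap drops the mapping's reference */
	return page.refcount;
}
```

With the old return value, model_race(false) ends at 2, one reference left behind per lost race; with VM_FAULT_NOPAGE, model_race(true) returns to 1.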