From patchwork Tue Nov 13 05:49:19 2018
X-Patchwork-Submitter: Sasha Levin
X-Patchwork-Id: 10679553
From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jan Kara, Ross Zwisler, Dan Williams, Dave Jiang, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.19 13/44] mm: Fix warning in insert_pfn()
Date: Tue, 13 Nov 2018 00:49:19 -0500
Message-Id: <20181113054950.77898-13-sashal@kernel.org>
In-Reply-To: <20181113054950.77898-1-sashal@kernel.org>
References: <20181113054950.77898-1-sashal@kernel.org>

From: Jan Kara

[ Upstream commit f2c57d91b0d96aa13ccff4e3b178038f17b00658 ]

In DAX mode a write pagefault can race with write(2) in the following way:

CPU0                                    CPU1
                                        write fault for mapped zero page (hole)
dax_iomap_rw()
  iomap_apply()
    xfs_file_iomap_begin()
      - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - invalidates radix tree entries
          in given range
                                        dax_iomap_pte_fault()
                                          grab_mapping_entry()
                                            - no entry found, creates empty
                                          ...
                                          xfs_file_iomap_begin()
                                            - finds already allocated block
                                          ...
                                          vmf_insert_mixed_mkwrite()
                                            - WARNs and does nothing because
                                              there is still zero page mapped
                                              in PTE
      unmap_mapping_pages()

This race results in a WARN_ON from insert_pfn() and is occasionally triggered by fstest generic/344. Note that the race is otherwise harmless: before write(2) on CPU0 finishes, we invalidate the page tables properly, so users of mmap will see the data modified by write(2) from that point on. So just restrict the warning to the case when the PFN in the PTE is not the zero page.
Link: http://lkml.kernel.org/r/20180824154542.26872-1-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Andrew Morton
Cc: Ross Zwisler
Cc: Dan Williams
Cc: Dave Jiang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 mm/memory.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c467102a5cbc..d988bae46479 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1787,10 +1787,15 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			 * in may not match the PFN we have mapped if the
 			 * mapped PFN is a writeable COW page. In the mkwrite
 			 * case we are creating a writable PTE for a shared
-			 * mapping and we expect the PFNs to match.
+			 * mapping and we expect the PFNs to match. If they
+			 * don't match, we are likely racing with block
+			 * allocation and mapping invalidation so just skip the
+			 * update.
 			 */
-			if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))
+			if (pte_pfn(*pte) != pfn_t_to_pfn(pfn)) {
+				WARN_ON_ONCE(!is_zero_pfn(pte_pfn(*pte)));
 				goto out_unlock;
+			}
 			entry = *pte;
 			goto out_mkwrite;
 		} else

From patchwork Tue Nov 13 05:49:20 2018
X-Patchwork-Submitter: Sasha Levin
X-Patchwork-Id: 10679555
From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, "Rafael J. Wysocki", Len Brown, Greg Kroah-Hartman, Boris Ostrovsky, Juergen Gross, Nathan Fontenot, John Allen, Michal Hocko, Dan Williams, Joonsoo Kim, Vlastimil Babka, Mathieu Malaterre, Pavel Tatashin, YASUAKI ISHIMATSU, Balbir Singh, Haiyang Zhang, Heiko Carstens, Jonathan Corbet, Kate Stewart, "K. Y. Srinivasan", Martin Schwidefsky, Michael Neuling, Philippe Ombredanne, Stephen Hemminger, Thomas Gleixner, Andrew Morton, Linus Torvalds, Sasha Levin, linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.19 14/44] mm/memory_hotplug: make add_memory() take the device_hotplug_lock
Date: Tue, 13 Nov 2018 00:49:20 -0500
Message-Id: <20181113054950.77898-14-sashal@kernel.org>
In-Reply-To: <20181113054950.77898-1-sashal@kernel.org>
References: <20181113054950.77898-1-sashal@kernel.org>

From: David Hildenbrand

[ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ]

add_memory() currently does not take the device_hotplug_lock, yet it is already called under that lock from

  arch/powerpc/platforms/pseries/hotplug-memory.c
  drivers/acpi/acpi_memhotplug.c

to synchronize against CPU hot-remove and similar operations.

In general, we should hold the device_hotplug_lock when adding memory in order to synchronize against online/offline requests (e.g. from user space); not doing so has already resulted in lock inversions due to device_lock() and mem_hotplug_lock, see commit 30467e0b3be ("mm, hotplug: fix concurrent memory hot-add deadlock"). add_memory()/add_memory_resource() will create memory block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device can really only be accessed (e.g. via .online/.state) from user space once the memory has been fully added to the system.

The lock is not yet held in

  drivers/xen/balloon.c
  arch/powerpc/platforms/powernv/memtrace.c
  drivers/s390/char/sclp_cmd.c
  drivers/hv/hv_balloon.c

so let's either use the locked variants or take the lock there.
Don't export add_memory_resource(): it was once exported to be used by Xen, which is never built as a module. If somebody requires it, we would also have to export a locked variant (as device_hotplug_lock is never exported).

Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
Signed-off-by: David Hildenbrand
Reviewed-by: Pavel Tatashin
Reviewed-by: Rafael J. Wysocki
Reviewed-by: Rashmica Gupta
Reviewed-by: Oscar Salvador
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: "Rafael J. Wysocki"
Cc: Len Brown
Cc: Greg Kroah-Hartman
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Nathan Fontenot
Cc: John Allen
Cc: Michal Hocko
Cc: Dan Williams
Cc: Joonsoo Kim
Cc: Vlastimil Babka
Cc: Mathieu Malaterre
Cc: Pavel Tatashin
Cc: YASUAKI ISHIMATSU
Cc: Balbir Singh
Cc: Haiyang Zhang
Cc: Heiko Carstens
Cc: Jonathan Corbet
Cc: Kate Stewart
Cc: "K. Y. Srinivasan"
Cc: Martin Schwidefsky
Cc: Michael Neuling
Cc: Philippe Ombredanne
Cc: Stephen Hemminger
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |  2 +-
 drivers/acpi/acpi_memhotplug.c                  |  2 +-
 drivers/base/memory.c                           |  9 ++++++--
 drivers/xen/balloon.c                           |  3 +++
 include/linux/memory_hotplug.h                  |  1 +
 mm/memory_hotplug.c                             | 22 ++++++++++++++++---
 6 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..79e074eac486 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
 	/* Add the memory */
-	rc = add_memory(nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz);
 	if (rc) {
 		dlpar_remove_device_tree_lmb(lmb);
 		return rc;

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..2ccfbb61ca89 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 	if (node < 0)
 		node = memory_add_physaddr_to_nid(info->start_addr);
 
-	result = add_memory(node, info->start_addr, info->length);
+	result = __add_memory(node, info->start_addr, info->length);
 
 	/*
 	 * If the memory block has been used by the kernel, add_memory()

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 817320c7c4c1..40cac122ec73 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -519,15 +519,20 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 	if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
 		return -EINVAL;
 
+	ret = lock_device_hotplug_sysfs();
+	if (ret)
+		goto out;
+
 	nid = memory_add_physaddr_to_nid(phys_addr);
-	ret = add_memory(nid, phys_addr,
-			 MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+	ret = __add_memory(nid, phys_addr,
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
 
 	if (ret)
 		goto out;
 
 	ret = count;
 out:
+	unlock_device_hotplug();
 	return ret;
 }

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index e12bb256036f..6bab019a82b1 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -395,7 +395,10 @@ static enum bp_state reserve_additional_memory(void)
 	 * callers drop the mutex before trying again.
 	 */
 	mutex_unlock(&balloon_mutex);
+	/* add_memory_resource() requires the device_hotplug lock */
+	lock_device_hotplug();
 	rc = add_memory_resource(nid, resource, memhp_auto_online);
+	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
 	if (rc) {

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 34a28227068d..16487052017d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -322,6 +322,7 @@ static inline void remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
+extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource, bool online);
 extern int arch_add_memory(int nid, u64 start, u64 size,

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 38d94b703e9d..3e42226407c7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1111,7 +1111,12 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
-/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
+/*
+ * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
+ * and online/offline operations (triggered e.g. by sysfs).
+ *
+ * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
+ */
 int __ref add_memory_resource(int nid, struct resource *res, bool online)
 {
 	u64 start, size;
@@ -1180,9 +1185,9 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	mem_hotplug_done();
 	return ret;
 }
-EXPORT_SYMBOL_GPL(add_memory_resource);
 
-int __ref add_memory(int nid, u64 start, u64 size)
+/* requires device_hotplug_lock, see add_memory_resource() */
+int __ref __add_memory(int nid, u64 start, u64 size)
 {
 	struct resource *res;
 	int ret;
@@ -1196,6 +1201,17 @@ int __ref add_memory(int nid, u64 start, u64 size)
 	release_memory_resource(res);
 	return ret;
 }
+
+int add_memory(int nid, u64 start, u64 size)
+{
+	int rc;
+
+	lock_device_hotplug();
+	rc = __add_memory(nid, start, size);
+	unlock_device_hotplug();
+
+	return rc;
+}
 EXPORT_SYMBOL_GPL(add_memory);
 
 #ifdef CONFIG_MEMORY_HOTREMOVE

From patchwork Tue Nov 13 05:49:24 2018
X-Patchwork-Submitter: Sasha Levin
X-Patchwork-Id: 10679557
From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Dan Carpenter, Stephen Rothwell, Keith Busch, "Michael S. Tsirkin", Kees Cook, YueHaibing, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.19 18/44] mm/gup_benchmark.c: prevent integer overflow in ioctl
Date: Tue, 13 Nov 2018 00:49:24 -0500
Message-Id: <20181113054950.77898-18-sashal@kernel.org>
In-Reply-To: <20181113054950.77898-1-sashal@kernel.org>
References: <20181113054950.77898-1-sashal@kernel.org>

From: Dan Carpenter

[ Upstream commit 4b408c74ee5a0b74fc9265c2fe39b0e7dec7c056 ]

The concern here is that "gup->size" is a u64 while "nr_pages" is an unsigned long. On 32-bit systems we could trick the kernel into allocating fewer pages than expected.

Link: http://lkml.kernel.org/r/20181025061546.hnhkv33diogf2uis@kili.mountain
Fixes: 64c349f4ae78 ("mm: add infrastructure for get_user_pages_fast() benchmarking")
Signed-off-by: Dan Carpenter
Acked-by: Kirill A. Shutemov
Reviewed-by: Andrew Morton
Cc: Stephen Rothwell
Cc: Keith Busch
Cc: "Michael S. Tsirkin"
Cc: Kees Cook
Cc: YueHaibing
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 mm/gup_benchmark.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
index 7405c9d89d65..7e6f2d2dafb5 100644
--- a/mm/gup_benchmark.c
+++ b/mm/gup_benchmark.c
@@ -23,6 +23,9 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
 	int nr;
 	struct page **pages;
 
+	if (gup->size > ULONG_MAX)
+		return -EINVAL;
+
 	nr_pages = gup->size / PAGE_SIZE;
 	pages = kvcalloc(nr_pages, sizeof(void *), GFP_KERNEL);
 	if (!pages)
From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Andrea Arcangeli, Jerome Glisse, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.19 37/44] mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition
Date: Tue, 13 Nov 2018 00:49:43 -0500
Message-Id: <20181113054950.77898-37-sashal@kernel.org>
In-Reply-To: <20181113054950.77898-1-sashal@kernel.org>
References: <20181113054950.77898-1-sashal@kernel.org>

From: 
Andrea Arcangeli

[ Upstream commit d7c3393413fe7e7dc54498ea200ea94742d61e18 ]

Patch series "migrate_misplaced_transhuge_page race conditions".

Aaron found a new instance of the THP MADV_DONTNEED race against
pmdp_clear_flush* variants that was apparently left unfixed.

While looking into the race found by Aaron, I may have found two more
issues in migrate_misplaced_transhuge_page. These race conditions would
not cause kernel instability, but they'd corrupt userland data or leave
data non-zero after MADV_DONTNEED.

I did only minor testing, and I don't expect to be able to reproduce
this (the lack of ->invalidate_range before migrate_page_copy in
particular requires the latest iommu hardware or infiniband to
reproduce).

The last patch is a noop for x86 and it needs further review from
maintainers of archs that implement flush_cache_range() (not in CC yet).

To avoid confusion: it's not the first patch that introduces the bug
fixed in the second patch; even before the removal of
pmdp_huge_clear_flush_notify, that _notify variant was called only
after migrate_page_copy had already run.

This patch (of 3):

This is a corollary of ced108037c2aa ("thp: fix MADV_DONTNEED vs. numa
balancing race"), 58ceeb6bec8 ("thp: fix MADV_DONTNEED vs. MADV_FREE
race") and 5b7abeae3af8c ("thp: fix MADV_DONTNEED vs clear soft dirty
race"). When the above three fixes were posted, Dave asked
https://lkml.kernel.org/r/929b3844-aec2-0111-fef7-8002f9d4e2b9@intel.com
but apparently this was missed.

The pmdp_clear_flush* in migrate_misplaced_transhuge_page() was
introduced in a54a407fbf7 ("mm: Close races between THP migration and
PMD numa clearing"). The important part of that commit is the part
where the page lock is not released until the first
do_huge_pmd_numa_page() has finished disarming the pagenuma/protnone.
The addition of pmdp_clear_flush() wasn't beneficial to that commit,
and there is no commentary about the addition either.
I guess the pmdp_clear_flush() in that commit was added just in case
for safety, but it ended up introducing the MADV_DONTNEED race
condition found by Aaron. At that point in time nobody had thought of
this kind of MADV_DONTNEED race condition yet (they were fixed later),
so the code may have looked more robust with the pmdp_clear_flush().

This specific race condition won't destabilize the kernel, but it can
confuse userland, because after MADV_DONTNEED the memory won't be
zeroed out.

This change also optimizes the code and removes a superfluous TLB flush.

[akpm@linux-foundation.org: reflow comment to 80 cols, fix grammar and typo (beacuse)]
Link: http://lkml.kernel.org/r/20181013002430.698-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli
Reported-by: Aaron Tomlin
Acked-by: Mel Gorman
Acked-by: Kirill A. Shutemov
Cc: Jerome Glisse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 mm/migrate.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 84381b55b2bd..1f634b1563b6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2029,15 +2029,26 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
 
 	/*
-	 * Clear the old entry under pagetable lock and establish the new PTE.
-	 * Any parallel GUP will either observe the old page blocking on the
-	 * page lock, block on the page table lock or observe the new page.
-	 * The SetPageUptodate on the new page and page_add_new_anon_rmap
-	 * guarantee the copy is visible before the pagetable update.
+	 * Overwrite the old entry under pagetable lock and establish
+	 * the new PTE. Any parallel GUP will either observe the old
+	 * page blocking on the page lock, block on the page table
+	 * lock or observe the new page. The SetPageUptodate on the
+	 * new page and page_add_new_anon_rmap guarantee the copy is
+	 * visible before the pagetable update.
 	 */
 	flush_cache_range(vma, mmun_start, mmun_end);
 	page_add_anon_rmap(new_page, vma, mmun_start, true);
-	pmdp_huge_clear_flush_notify(vma, mmun_start, pmd);
+	/*
+	 * At this point the pmd is numa/protnone (i.e. non present) and the TLB
+	 * has already been flushed globally. So no TLB can be currently
+	 * caching this non present pmd mapping. There's no need to clear the
+	 * pmd before doing set_pmd_at(), nor to flush the TLB after
+	 * set_pmd_at(). Clearing the pmd here would introduce a race
+	 * condition against MADV_DONTNEED, because MADV_DONTNEED only holds the
+	 * mmap_sem for reading. If the pmd is set to NULL at any given time,
+	 * MADV_DONTNEED won't wait on the pmd lock and it'll skip clearing this
+	 * pmd.
+	 */
 	set_pmd_at(mm, mmun_start, pmd, entry);
 	update_mmu_cache_pmd(vma, address, &entry);
 
@@ -2051,7 +2062,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	 * No need to double call mmu_notifier->invalidate_range() callback as
 	 * the above pmdp_huge_clear_flush_notify() did already call it.
 	 */
-	mmu_notifier_invalidate_range_only_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 
 	/* Take an "isolate" reference and put new page on the LRU. */
 	get_page(new_page);

From patchwork Tue Nov 13 05:49:44 2018
X-Patchwork-Submitter: Sasha Levin
X-Patchwork-Id: 10679567
From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Andrea Arcangeli, Jerome Glisse, Andrew Morton, Linus Torvalds, Sasha Levin, 
linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.19 38/44] mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()
Date: Tue, 13 Nov 2018 00:49:44 -0500
Message-Id: <20181113054950.77898-38-sashal@kernel.org>
In-Reply-To: <20181113054950.77898-1-sashal@kernel.org>
References: <20181113054950.77898-1-sashal@kernel.org>

From: Andrea Arcangeli

[ Upstream commit 7066f0f933a1fd707bb38781866657769cff7efc ]

change_huge_pmd() does not flush the TLB right away after arming the
numa/protnone pmd; do_huge_pmd_numa_page() flushes the TLB before
calling migrate_misplaced_transhuge_page(). Until
do_huge_pmd_numa_page() runs, some CPU could therefore still access the
page through the TLB.

change_huge_pmd(), before arming the numa/protnone transhuge pmd, calls
mmu_notifier_invalidate_range_start(). So there is no need for the
mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_only_end()
sequence in migrate_misplaced_transhuge_page() either: by the time
migrate_misplaced_transhuge_page() runs, the pmd mapping has already
been invalidated in the secondary MMUs. It has to be, or if a secondary
MMU could still write to the page, migrate_page_copy() would lose data.
However, an explicit mmu_notifier_invalidate_range() is needed before
migrate_misplaced_transhuge_page() starts copying the data of the
transhuge page, or the below can happen for MMU notifier users sharing
the primary MMU pagetables and only implementing ->invalidate_range:

	CPU0			CPU1		GPU sharing linux pagetables
						using only ->invalidate_range
	-----------		------------	---------
						GPU secondary MMU writes to
						the page mapped by the
						transhuge pmd
	change_pmd_range()
	mmu..._range_start()
	->invalidate_range_start() noop
	change_huge_pmd()
	set_pmd_at(numa/protnone)
	pmd_unlock()
				do_huge_pmd_numa_page()
				CPU TLB flush globally (1)
						CPU cannot write to page
				migrate_misplaced_transhuge_page()
						GPU writes to the page...
				migrate_page_copy()
						...GPU stops writing to
						the page
				CPU TLB flush (2)
	mmu..._range_end() (3)
	->invalidate_range_stop() noop
	->invalidate_range()
						GPU secondary MMU is
						invalidated and cannot write
						to the page anymore (too late)

Just like we need a CPU TLB flush (1) because the TLB flush (2) arrives
too late, we also need a mmu_notifier_invalidate_range() before calling
migrate_misplaced_transhuge_page(), because the ->invalidate_range() in
(3) also arrives too late.

This requirement is the result of the lazy optimization in
change_huge_pmd() that releases the pmd_lock without first flushing the
TLB and without first calling mmu_notifier_invalidate_range(). Even
converting the removed mmu_notifier_invalidate_range_only_end() into a
mmu_notifier_invalidate_range_end() would not have been enough to fix
this, because it runs after migrate_page_copy().

After the hugepage data copy is done, migrate_misplaced_transhuge_page()
can proceed and call set_pmd_at() without having to flush the TLB or any
secondary MMUs, because the secondary MMU invalidate, just like the CPU
TLB flush, has to happen before migrate_page_copy() is called, or it
would be a bug in the first place (and it was, for drivers using
->invalidate_range()).
KVM is unaffected because it doesn't implement ->invalidate_range().

The standard PAGE_SIZEd migrate_misplaced_page is less accelerated and
uses the generic migrate_pages, which transitions the pte from
numa/protnone to a migration entry in try_to_unmap_one() and flushes
TLBs and all mmu notifiers there before copying the page.

Link: http://lkml.kernel.org/r/20181013002430.698-3-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli
Acked-by: Mel Gorman
Acked-by: Kirill A. Shutemov
Reviewed-by: Aaron Tomlin
Cc: Jerome Glisse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 mm/huge_memory.c | 14 +++++++++++++-
 mm/migrate.c     | 19 ++++++-------------
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index deed97fba979..a71a5172104c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1562,8 +1562,20 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	 * We are not sure a pending tlb flush here is for a huge page
 	 * mapping or not. Hence use the tlb range variant
 	 */
-	if (mm_tlb_flush_pending(vma->vm_mm))
+	if (mm_tlb_flush_pending(vma->vm_mm)) {
 		flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
+		/*
+		 * change_huge_pmd() released the pmd lock before
+		 * invalidating the secondary MMUs sharing the primary
+		 * MMU pagetables (with ->invalidate_range()). The
+		 * mmu_notifier_invalidate_range_end() (which
+		 * internally calls ->invalidate_range()) in
+		 * change_pmd_range() will run after us, so we can't
+		 * rely on it here and we need an explicit invalidate.
+		 */
+		mmu_notifier_invalidate_range(vma->vm_mm, haddr,
+					      haddr + HPAGE_PMD_SIZE);
+	}
 
 	/*
 	 * Migrate the THP to the requested node, returns with page unlocked
diff --git a/mm/migrate.c b/mm/migrate.c
index 1f634b1563b6..1637a32f3dd7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1973,8 +1973,8 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	int isolated = 0;
 	struct page *new_page = NULL;
 	int page_lru = page_is_file_cache(page);
-	unsigned long mmun_start = address & HPAGE_PMD_MASK;
-	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
+	unsigned long start = address & HPAGE_PMD_MASK;
+	unsigned long end = start + HPAGE_PMD_SIZE;
 
 	new_page = alloc_pages_node(node,
 		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
@@ -2001,11 +2001,9 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	WARN_ON(PageLRU(new_page));
 
 	/* Recheck the target PMD */
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	ptl = pmd_lock(mm, pmd);
 	if (unlikely(!pmd_same(*pmd, entry) || !page_ref_freeze(page, 2))) {
 		spin_unlock(ptl);
-		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 
 		/* Reverse changes made by migrate_page_copy() */
 		if (TestClearPageActive(new_page))
@@ -2036,8 +2034,8 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	 * new page and page_add_new_anon_rmap guarantee the copy is
 	 * visible before the pagetable update.
 	 */
-	flush_cache_range(vma, mmun_start, mmun_end);
-	page_add_anon_rmap(new_page, vma, mmun_start, true);
+	flush_cache_range(vma, start, end);
+	page_add_anon_rmap(new_page, vma, start, true);
 	/*
 	 * At this point the pmd is numa/protnone (i.e. non present) and the TLB
 	 * has already been flushed globally. So no TLB can be currently
@@ -2049,7 +2047,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	 * MADV_DONTNEED won't wait on the pmd lock and it'll skip clearing this
 	 * pmd.
 	 */
-	set_pmd_at(mm, mmun_start, pmd, entry);
+	set_pmd_at(mm, start, pmd, entry);
 	update_mmu_cache_pmd(vma, address, &entry);
 
 	page_ref_unfreeze(page, 2);
@@ -2058,11 +2056,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED);
 	spin_unlock(ptl);
 
-	/*
-	 * No need to double call mmu_notifier->invalidate_range() callback as
-	 * the above pmdp_huge_clear_flush_notify() did already call it.
-	 */
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 
 	/* Take an "isolate" reference and put new page on the LRU. */
 	get_page(new_page);
@@ -2086,7 +2079,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	ptl = pmd_lock(mm, pmd);
 	if (pmd_same(*pmd, entry)) {
 		entry = pmd_modify(entry, vma->vm_page_prot);
-		set_pmd_at(mm, mmun_start, pmd, entry);
+		set_pmd_at(mm, start, pmd, entry);
 		update_mmu_cache_pmd(vma, address, &entry);
 	}
 	spin_unlock(ptl);

From patchwork Tue Nov 13 05:49:45 2018
X-Patchwork-Submitter: Sasha Levin
X-Patchwork-Id: 10679569
From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Pavel Tatashin, Abdul Haleem, Baoquan He, Daniel Jordan, Dan Williams, Dave Hansen, David Rientjes, Greg Kroah-Hartman, Ingo Molnar, Jan Kara, Jérôme Glisse, "Kirill A. 
Shutemov", Michael Ellerman, Michal Hocko, Souptick Joarder, Steven Sistare, Vlastimil Babka, Wei Yang, Pasha Tatashin, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.19 39/44] mm: calculate deferred pages after skipping mirrored memory
Date: Tue, 13 Nov 2018 00:49:45 -0500
Message-Id: <20181113054950.77898-39-sashal@kernel.org>
In-Reply-To: <20181113054950.77898-1-sashal@kernel.org>
References: <20181113054950.77898-1-sashal@kernel.org>

From: Pavel Tatashin

[ Upstream commit d3035be4ce2345d98633a45f93a74e526e94b802 ]

update_defer_init() should be called only when a struct page is about
to be initialized: it counts the number of initialized struct pages,
but we may skip struct pages when there is mirrored memory. So move
update_defer_init() after the check for mirrored memory.

Also, rename update_defer_init() to defer_init() and invert the
returned boolean to emphasize that this is a predicate which tells
whether the rest of memmap initialization should be deferred.

Make this function self-contained: do not pass the number of already
initialized pages in this zone through the caller; use static counters
instead.

I found this bug by reading the code. The effect is that fewer struct
pages than expected are initialized early in boot, and in some corner
cases we may fail to boot when mirrored pages are used. The deferred
on-demand code should somewhat mitigate this, but it still introduces
inconsistencies compared to booting without mirrored pages, so it is
better to fix.
[pasha.tatashin@oracle.com: add comment about defer_init's lack of locking]
Link: http://lkml.kernel.org/r/20180726193509.3326-3-pasha.tatashin@oracle.com
[akpm@linux-foundation.org: make defer_init non-inline, __meminit]
Link: http://lkml.kernel.org/r/20180724235520.10200-3-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin
Reviewed-by: Oscar Salvador
Cc: Abdul Haleem
Cc: Baoquan He
Cc: Daniel Jordan
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Rientjes
Cc: Greg Kroah-Hartman
Cc: Ingo Molnar
Cc: Jan Kara
Cc: Jérôme Glisse
Cc: Kirill A. Shutemov
Cc: Michael Ellerman
Cc: Michal Hocko
Cc: Souptick Joarder
Cc: Steven Sistare
Cc: Vlastimil Babka
Cc: Wei Yang
Cc: Pasha Tatashin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 mm/page_alloc.c | 45 +++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e2ef1c17942f..63f990b73750 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -306,24 +306,33 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
 }
 
 /*
- * Returns false when the remaining initialisation should be deferred until
+ * Returns true when the remaining initialisation should be deferred until
  * later in the boot cycle when it can be parallelised.
  */
-static inline bool update_defer_init(pg_data_t *pgdat,
-				unsigned long pfn, unsigned long zone_end,
-				unsigned long *nr_initialised)
+static bool __meminit
+defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
 {
+	static unsigned long prev_end_pfn, nr_initialised;
+
+	/*
+	 * prev_end_pfn static that contains the end of previous zone
+	 * No need to protect because called very early in boot before smp_init.
+	 */
+	if (prev_end_pfn != end_pfn) {
+		prev_end_pfn = end_pfn;
+		nr_initialised = 0;
+	}
+
 	/* Always populate low zones for address-constrained allocations */
-	if (zone_end < pgdat_end_pfn(pgdat))
-		return true;
-	(*nr_initialised)++;
-	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
-	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
-		pgdat->first_deferred_pfn = pfn;
+	if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
 		return false;
+	nr_initialised++;
+	if ((nr_initialised > NODE_DATA(nid)->static_init_pgcnt) &&
+	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
+		NODE_DATA(nid)->first_deferred_pfn = pfn;
+		return true;
 	}
-
-	return true;
+	return false;
 }
 #else
 static inline bool early_page_uninitialised(unsigned long pfn)
@@ -331,11 +340,9 @@ static inline bool early_page_uninitialised(unsigned long pfn)
 	return false;
 }
 
-static inline bool update_defer_init(pg_data_t *pgdat,
-				unsigned long pfn, unsigned long zone_end,
-				unsigned long *nr_initialised)
+static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
 {
-	return true;
+	return false;
 }
 #endif
 
@@ -5459,9 +5466,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		struct vmem_altmap *altmap)
 {
 	unsigned long end_pfn = start_pfn + size;
-	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long pfn;
-	unsigned long nr_initialised = 0;
 	struct page *page;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	struct memblock_region *r = NULL, *tmp;
@@ -5489,8 +5494,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			continue;
 		if (!early_pfn_in_nid(pfn, nid))
 			continue;
-		if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
-			break;
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 		/*
@@ -5513,6 +5516,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			}
 		}
 #endif
+		if (defer_init(nid, pfn, end_pfn))
+			break;
 not_early:
 		page = pfn_to_page(pfn);

From patchwork Tue Nov 13 05:49:46 2018
From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Roman Gushchin, Vladimir Davydov, Andrew Morton, Linus Torvalds, Sasha Levin, linux-doc@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.19 40/44] mm: don't raise MEMCG_OOM event
  due to failed high-order allocation
Date: Tue, 13 Nov 2018 00:49:46 -0500
Message-Id: <20181113054950.77898-40-sashal@kernel.org>
In-Reply-To: <20181113054950.77898-1-sashal@kernel.org>
References: <20181113054950.77898-1-sashal@kernel.org>

From: Roman Gushchin

[ Upstream commit 7a1adfddaf0d11a39fdcaf6e82a88e9c0586e08b ]

It was reported that on some of our machines containers were restarted with OOM symptoms without an obvious reason. Although there was almost no memory pressure and plenty of page cache, the MEMCG_OOM event was raised occasionally, causing the container management software to think that an OOM had happened. However, no tasks had been killed.

The investigation showed that the problem is caused by a failing attempt to charge a high-order page. In that case the OOM killer is never invoked. As shown below, it can happen under conditions which are very far from a real OOM: e.g. there is plenty of clean page cache and no memory pressure. There is no sense in raising an OOM event in this case, as it might confuse the user and lead to wrong and excessive actions (e.g. restarting the workload, as in my case).

Let's look at the charging path in try_charge(). If the memory usage is close to memory.max, which is absolutely natural for most memory cgroups, we try to reclaim some pages. Even if we were able to reclaim enough memory for the allocation, the following check can fail due to a race with another concurrent allocation:

    if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
        goto retry;

For regular pages the following condition will save us from triggering the OOM:

    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
        goto retry;

But for a high-order allocation this condition will intentionally fail.
The reasoning is that we'll likely fall back to regular pages anyway, so it's ok and even preferred to return -ENOMEM. In this case the idea of raising MEMCG_OOM looks dubious.

Fix this by moving the raising of MEMCG_OOM into mem_cgroup_oom(), after the allocation order check, so that the event won't be raised for high-order allocations. This change doesn't affect regular page allocation and charging.

Link: http://lkml.kernel.org/r/20181004214050.7417-1-guro@fb.com
Signed-off-by: Roman Gushchin
Acked-by: David Rientjes
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Cc: Vladimir Davydov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 Documentation/admin-guide/cgroup-v2.rst | 4 ++++
 mm/memcontrol.c                         | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 184193bcb262..5d9939388a78 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1127,6 +1127,10 @@ PAGE_SIZE multiple when read back.
 		disk readahead.  For now OOM in memory cgroup kills
 		tasks iff shortage has happened inside page fault.

+		This event is not raised if the OOM killer is not
+		considered as an option, e.g. for failed high-order
+		allocations.
+
 	  oom_kill
 		The number of processes belonging to this cgroup
 		killed by any kind of OOM killer.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e79cb59552d9..07c7af6f5e59 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1669,6 +1669,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
 	if (order > PAGE_ALLOC_COSTLY_ORDER)
 		return OOM_SKIPPED;

+	memcg_memory_event(memcg, MEMCG_OOM);
+
 	/*
 	 * We are in the middle of the charge context here, so we
 	 * don't want to block when potentially sitting on a callstack
@@ -2250,8 +2252,6 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (fatal_signal_pending(current))
 		goto force;

-	memcg_memory_event(mem_over_limit, MEMCG_OOM);
-
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
 	 * a forward progress or bypass the charge if the oom killer

From patchwork Tue Nov 13 05:49:47 2018
From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jann Horn, Davidlohr Bueso, Oleg Nesterov, Christoph Lameter, Kemi Wang, Andy Lutomirski, Ingo Molnar, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.19 41/44] mm/vmstat.c: assert that vmstat_text is in sync with stat_items_size
Date: Tue, 13 Nov 2018 00:49:47 -0500
Message-Id: <20181113054950.77898-41-sashal@kernel.org>
In-Reply-To: <20181113054950.77898-1-sashal@kernel.org>
References: <20181113054950.77898-1-sashal@kernel.org>
From: Jann Horn

[ Upstream commit f0ecf25a093fc0589f0a6bc4c1ea068bbb67d220 ]

Having two gigantic arrays that must manually be kept in sync, including ifdefs, isn't exactly robust. To make it easier to catch such issues in the future, add a BUILD_BUG_ON().

Link: http://lkml.kernel.org/r/20181001143138.95119-3-jannh@google.com
Signed-off-by: Jann Horn
Reviewed-by: Kees Cook
Reviewed-by: Andrew Morton
Acked-by: Roman Gushchin
Acked-by: Michal Hocko
Cc: Davidlohr Bueso
Cc: Oleg Nesterov
Cc: Christoph Lameter
Cc: Kemi Wang
Cc: Andy Lutomirski
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 mm/vmstat.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7878da76abf2..b678c607e490 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1663,6 +1663,8 @@ static void *vmstat_start(struct seq_file *m, loff_t *pos)
 		stat_items_size += sizeof(struct vm_event_state);
 #endif

+	BUILD_BUG_ON(stat_items_size !=
+		     ARRAY_SIZE(vmstat_text) * sizeof(unsigned long));
 	v = kmalloc(stat_items_size, GFP_KERNEL);
 	m->private = v;
 	if (!v)

From patchwork Tue Nov 13 05:49:48 2018
From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Andrea Arcangeli, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.19 42/44] userfaultfd: allow get_mempolicy(MPOL_F_NODE|MPOL_F_ADDR) to trigger userfaults
Date: Tue, 13 Nov 2018 00:49:48 -0500
Message-Id: <20181113054950.77898-42-sashal@kernel.org>
In-Reply-To: <20181113054950.77898-1-sashal@kernel.org>
References: <20181113054950.77898-1-sashal@kernel.org>

From: Andrea
Arcangeli

[ Upstream commit 3b9aadf7278d16d7bed4d5d808501065f70898d8 ]

get_mempolicy(MPOL_F_NODE|MPOL_F_ADDR) called get_user_pages() in a way that would not wait for userfaults before failing, so it hit a SIGBUS instead. Using get_user_pages_locked/unlocked instead allows get_mempolicy() to let userfaults resolve the fault and fill the hole before grabbing the node id of the page.

If the user calls get_mempolicy() with MPOL_F_ADDR | MPOL_F_NODE for an address inside an area managed by uffd, and there is no page at that address, the page allocation from within get_mempolicy() will fail, because get_user_pages() does not allow the page-fault retry required by uffd; the user will get SIGBUS. With this patch, the page fault will be resolved by uffd and get_mempolicy() will continue normally.

Background (via code review): previously the syscall would have returned -EFAULT (vm_fault_to_errno); now it will block and wait for a userfault (if it is woken before the fault is resolved, it will still return -EFAULT). This way get_mempolicy() gives an "unaware" app a chance to be compliant with userfaults. The reason this visible change is acceptable is that becoming "userfault compliant" cannot regress anything: all other syscalls, including read(2)/write(2), had to become "userfault compliant" long ago (that's one of the things userfaultfd can do that PROT_NONE and trapping segfaults can't). So this is just one more syscall becoming "userfault compliant", as all other major ones already are.

This has been happening on a virtio-bridge dpdk process which called get_mempolicy() on the guest space post live migration, but before the memory had a chance to be migrated to destination.
I didn't run strace to be able to show the -EFAULT going away, but I have confirmation that the debug aid information below (only visible with CONFIG_DEBUG_VM=y) goes away with the patch:

[20116.371461] FAULT_FLAG_ALLOW_RETRY missing 0
[20116.371464] CPU: 1 PID: 13381 Comm: vhost-events Not tainted 4.17.12-200.fc28.x86_64 #1
[20116.371465] Hardware name: LENOVO 20FAS2BN0A/20FAS2BN0A, BIOS N1CET54W (1.22 ) 02/10/2017
[20116.371466] Call Trace:
[20116.371473]  dump_stack+0x5c/0x80
[20116.371476]  handle_userfault.cold.37+0x1b/0x22
[20116.371479]  ? remove_wait_queue+0x20/0x60
[20116.371481]  ? poll_freewait+0x45/0xa0
[20116.371483]  ? do_sys_poll+0x31c/0x520
[20116.371485]  ? radix_tree_lookup_slot+0x1e/0x50
[20116.371488]  shmem_getpage_gfp+0xce7/0xe50
[20116.371491]  ? page_add_file_rmap+0x1a/0x2c0
[20116.371493]  shmem_fault+0x78/0x1e0
[20116.371495]  ? filemap_map_pages+0x3a1/0x450
[20116.371498]  __do_fault+0x1f/0xc0
[20116.371500]  __handle_mm_fault+0xe2e/0x12f0
[20116.371502]  handle_mm_fault+0xda/0x200
[20116.371504]  __get_user_pages+0x238/0x790
[20116.371506]  get_user_pages+0x3e/0x50
[20116.371510]  kernel_get_mempolicy+0x40b/0x700
[20116.371512]  ? vfs_write+0x170/0x1a0
[20116.371515]  __x64_sys_get_mempolicy+0x21/0x30
[20116.371517]  do_syscall_64+0x5b/0x160
[20116.371520]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

The above harmless debug message (not a kernel crash, just a dump_stack()) is shown with CONFIG_DEBUG_VM=y to more quickly identify and improve kernel spots that may have to become "userfaultfd compliant", like this one (without having to run strace and search for syscall misbehavior). Spots like the above are closer to a kernel bug for the non-cooperative usages that Mike focuses on than for the dpdk qemu-cooperative usages that reproduced it, but it's still nicer to get this fixed for dpdk too.
The only part of the patch that gave me pause is the implementation detail around mpol_get(), but it looks like it should be safe no matter the kind of mempolicy structure involved (the default static policy also starts at 1, so it will go to 2 and back to 1 without crashing everything at 0).

[rppt@linux.vnet.ibm.com: changelog addition]
  Link: http://lkml.kernel.org/r/20180904073718.GA26916@rapoport-lnx
Link: http://lkml.kernel.org/r/20180831214848.23676-1-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli
Reported-by: Maxime Coquelin
Tested-by: Dr. David Alan Gilbert
Reviewed-by: Mike Rapoport
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 mm/mempolicy.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index da858f794eb6..2e76a8f65e94 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -797,16 +797,19 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
 	}
 }

-static int lookup_node(unsigned long addr)
+static int lookup_node(struct mm_struct *mm, unsigned long addr)
 {
 	struct page *p;
 	int err;

-	err = get_user_pages(addr & PAGE_MASK, 1, 0, &p, NULL);
+	int locked = 1;
+	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
 	if (err >= 0) {
 		err = page_to_nid(p);
 		put_page(p);
 	}
+	if (locked)
+		up_read(&mm->mmap_sem);
 	return err;
 }

@@ -817,7 +820,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
 	int err;
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma = NULL;
-	struct mempolicy *pol = current->mempolicy;
+	struct mempolicy *pol = current->mempolicy, *pol_refcount = NULL;

 	if (flags &
 	    ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR|MPOL_F_MEMS_ALLOWED))
@@ -857,7 +860,16 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,

 	if (flags & MPOL_F_NODE) {
 		if (flags & MPOL_F_ADDR) {
-			err = lookup_node(addr);
+			/*
+			 * Take a refcount on the mpol, lookup_node()
+			 * will drop the mmap_sem, so after calling
+			 * lookup_node() only "pol" remains valid, "vma"
+			 * is stale.
+			 */
+			pol_refcount = pol;
+			vma = NULL;
+			mpol_get(pol);
+			err = lookup_node(mm, addr);
 			if (err < 0)
 				goto out;
 			*policy = err;
@@ -892,7 +904,9 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
 out:
 	mpol_cond_put(pol);
 	if (vma)
-		up_read(&current->mm->mmap_sem);
+		up_read(&mm->mmap_sem);
+	if (pol_refcount)
+		mpol_put(pol_refcount);
 	return err;
 }

From patchwork Tue Nov 13 05:49:49 2018
From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Tejun Heo, Rik van Riel, Konstantin Khlebnikov, Matthew Wilcox, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.19 43/44] mm: don't miss the last page because of round-off error
Date: Tue, 13 Nov 2018 00:49:49 -0500
Message-Id: <20181113054950.77898-43-sashal@kernel.org>
In-Reply-To: <20181113054950.77898-1-sashal@kernel.org>
References: <20181113054950.77898-1-sashal@kernel.org>

From: Roman Gushchin

[ Upstream commit 68600f623d69da428c6163275f97ca126e1a8ec5 ]

I've noticed that dying memory cgroups are often pinned in memory by a
single pagecache page. Even under moderate memory pressure they sometimes
stayed in that state for a long time. That looked strange.

My investigation showed that the problem is caused by applying the LRU
pressure balancing math:

  scan = div64_u64(scan * fraction[lru], denominator),

where

  denominator = fraction[anon] + fraction[file] + 1.

Because fraction[lru] is always less than denominator, if the initial scan
size is 1, the result is always 0. This means the last page is not scanned
and has no chance of being reclaimed.

Fix this by rounding up the result of the division. In practice this
change significantly improves the speed of dying cgroup reclaim.
[guro@fb.com: prevent double calculation of DIV64_U64_ROUND_UP() arguments]
Link: http://lkml.kernel.org/r/20180829213311.GA13501@castle
Link: http://lkml.kernel.org/r/20180827162621.30187-3-guro@fb.com
Signed-off-by: Roman Gushchin
Reviewed-by: Andrew Morton
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Tejun Heo
Cc: Rik van Riel
Cc: Konstantin Khlebnikov
Cc: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 include/linux/math64.h | 3 +++
 mm/vmscan.c            | 6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/math64.h b/include/linux/math64.h
index 837f2f2d1d34..bb2c84afb80c 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -281,4 +281,7 @@ static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor)
 }
 #endif /* mul_u64_u32_div */
 
+#define DIV64_U64_ROUND_UP(ll, d)	\
+	({ u64 _tmp = (d); div64_u64((ll) + _tmp - 1, _tmp); })
+
 #endif /* _LINUX_MATH64_H */

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c5ef7240cbcb..961401c46334 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2456,9 +2456,11 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 			/*
 			 * Scan types proportional to swappiness and
 			 * their relative recent reclaim efficiency.
+			 * Make sure we don't miss the last page
+			 * because of a round-off error.
 			 */
-			scan = div64_u64(scan * fraction[file],
-					 denominator);
+			scan = DIV64_U64_ROUND_UP(scan * fraction[file],
+						  denominator);
 			break;
 		case SCAN_FILE:
 		case SCAN_ANON:

From patchwork Tue Nov 13 05:49:50 2018
From: Sasha Levin
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Dmitry Vyukov, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.19 44/44] mm: don't warn about large allocations for slab
Date: Tue, 13 Nov 2018 00:49:50 -0500
Message-Id: <20181113054950.77898-44-sashal@kernel.org>
In-Reply-To: <20181113054950.77898-1-sashal@kernel.org>
References: <20181113054950.77898-1-sashal@kernel.org>

From: Dmitry Vyukov

[ Upstream commit 61448479a9f2c954cde0cfe778cb6bec5d0a748d ]

Slub does not call kmalloc_slab() for sizes > KMALLOC_MAX_CACHE_SIZE;
instead it falls back to kmalloc_large(). For slab,
KMALLOC_MAX_CACHE_SIZE == KMALLOC_MAX_SIZE and it calls kmalloc_slab()
for all allocations, relying on a NULL return value for over-sized
allocations. This inconsistency leads to unwanted warnings from
kmalloc_slab() for over-sized allocations for slab. Returning NULL for
failed allocations is the expected behavior.

Make slub and slab code consistent by checking size >
KMALLOC_MAX_CACHE_SIZE in slab before calling kmalloc_slab().

While we are here, also fix the check in kmalloc_slab(). We should check
against KMALLOC_MAX_CACHE_SIZE rather than KMALLOC_MAX_SIZE. It all kinda
worked because for slab the constants are the same, and slub always checks
the size against KMALLOC_MAX_CACHE_SIZE before kmalloc_slab().
But if we get there with size > KMALLOC_MAX_CACHE_SIZE anyhow, bad things
will happen, for example in the case of a newly introduced bug in slub
code.

Also move the check in kmalloc_slab() from the function entry to the
size > 192 case. This partially compensates for the additional check in
slab code and makes the slub code a bit faster (at least theoretically).

Also drop __GFP_NOWARN from the warning check. This warning means a bug
in slab code itself; user-passed flags have nothing to do with it.

None of this affects slob.

Link: http://lkml.kernel.org/r/20180927171502.226522-1-dvyukov@gmail.com
Signed-off-by: Dmitry Vyukov
Reported-by: syzbot+87829a10073277282ad1@syzkaller.appspotmail.com
Reported-by: syzbot+ef4e8fc3a06e9019bb40@syzkaller.appspotmail.com
Reported-by: syzbot+6e438f4036df52cbb863@syzkaller.appspotmail.com
Reported-by: syzbot+8574471d8734457d98aa@syzkaller.appspotmail.com
Reported-by: syzbot+af1504df0807a083dbd9@syzkaller.appspotmail.com
Acked-by: Christoph Lameter
Acked-by: Vlastimil Babka
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 mm/slab.c        |  4 ++++
 mm/slab_common.c | 12 ++++++------
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index aa76a70e087e..d73c7a4820a4 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3675,6 +3675,8 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 	struct kmem_cache *cachep;
 	void *ret;
 
+	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
+		return NULL;
 	cachep = kmalloc_slab(size, flags);
 	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
 		return cachep;
@@ -3710,6 +3712,8 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
 	struct kmem_cache *cachep;
 	void *ret;
 
+	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
+		return NULL;
 	cachep = kmalloc_slab(size, flags);
 	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
 		return cachep;

diff --git a/mm/slab_common.c b/mm/slab_common.c
index fea3376f9816..3a7ac4f15194 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1027,18 +1027,18 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 {
 	unsigned int index;
 
-	if (unlikely(size > KMALLOC_MAX_SIZE)) {
-		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
-		return NULL;
-	}
-
 	if (size <= 192) {
 		if (!size)
 			return ZERO_SIZE_PTR;
 
 		index = size_index[size_index_elem(size)];
-	} else
+	} else {
+		if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
+			WARN_ON(1);
+			return NULL;
+		}
 		index = fls(size - 1);
+	}
 
 #ifdef CONFIG_ZONE_DMA
 	if (unlikely((flags & GFP_DMA)))