From patchwork Sun Aug 25 00:54:40 2019
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 11113173
Date: Sat, 24 Aug 2019 17:54:40 -0700
From: akpm@linux-foundation.org
To: akpm@linux-foundation.org, linux-mm@kvack.org, m.mizuma@jp.fujitsu.com, mgorman@techsingularity.net, mm-commits@vger.kernel.org, n-horiguchi@ah.jp.nec.com, osalvador@suse.de, pavel.tatashin@microsoft.com, rientjes@google.com, stable@vger.kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz
Subject: [patch 02/11] mm, page_alloc: move_freepages should not examine struct page of reserved memory
Message-ID: <20190825005440.tdSQS605E%akpm@linux-foundation.org>

From: David Rientjes
Subject: mm, page_alloc: move_freepages should not examine struct page of reserved memory

After commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages"),
the struct page of reserved memory is zeroed.  This sets page->flags to 0
and fixes issues such as reading /proc/kpageflags for reserved memory.

The VM_BUG_ON() in move_freepages_block(), however, assumes that
page_zone() is meaningful even for reserved memory.  That assumption is no
longer true after the aforementioned commit.

There's no reason why move_freepages_block() should be testing the
legitimacy of page_zone() for reserved memory; its scope is limited to
pages on the zone's freelist.

Note that pfn_valid() can be true for reserved memory: there is a backing
struct page.  The check for page_to_nid(page) is also buggy, but reserved
memory normally only appears on node 0, so the zeroing doesn't affect it.

Move the debug checks to after verifying PageBuddy is true.  This
restricts the checks to buddy pages on the zone's freelist, which is
exactly what move_freepages_block() operates on.  In this case, an
incorrect node or zone is a bug worth warning about (and examining the
struct page is acceptable because this memory is not reserved).

Why does move_freepages_block() get called on reserved memory?  It's
simply math after finding a valid free page from the per-zone free area to
use as fallback.  We find the beginning and end of the pageblock of the
valid page, and that can bring us into memory that was reserved per the
e820 (see the sketch below).  pfn_valid() is still true (it's backed by a
struct page), but since that struct page is zeroed we shouldn't make any
inferences about its node or zone.  The current node check just happens to
succeed most of the time by luck because reserved memory typically appears
on node 0.

The fix here is to validate that we actually have buddy pages before
testing for any zone or node strangeness.

We noticed this almost immediately after bringing 907ec5fca3dc in on
CONFIG_DEBUG_VM builds.  It depends on finding specific free pages in the
per-zone free area where the math in move_freepages() brings the start or
end pfn into reserved memory, while wanting to claim the entire pageblock
as a new migratetype.  So the path will be rare, require CONFIG_DEBUG_VM,
and require fallback to a different migratetype.

Some struct pages were already zeroed from reserve pages before
907ec5fca3dc, so it theoretically could have triggered even before that
commit.
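For illustration only (not part of this patch), the pageblock math
described above amounts to roughly the following sketch; pageblock_span()
is a made-up name, while page_to_pfn() and pageblock_nr_pages are the
real kernel helpers:

/*
 * Sketch: how the fallback path derives a pageblock-aligned pfn span
 * from one valid free page, the way move_freepages_block() does.
 */
static void pageblock_span(struct page *page,
			   unsigned long *start_pfn, unsigned long *end_pfn)
{
	unsigned long pfn = page_to_pfn(page);

	*start_pfn = pfn & ~(pageblock_nr_pages - 1);
	*end_pfn = *start_pfn + pageblock_nr_pages - 1;

	/*
	 * Either endpoint can land in e820-reserved memory: pfn_valid()
	 * is still true there (a struct page exists), but that struct
	 * page is zeroed, so its node and zone bits carry no information.
	 */
}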
I think it's rare enough, under a config option that most people don't
run, that others may simply not have noticed.  I wouldn't argue against a
stable tag, and the backport should be easy enough, but I probably
wouldn't single out a commit that this is fixing.

Mel said:

: The overhead of the debugging check is higher with this patch although
: it'll only affect debug builds and the path is not particularly hot.
: If this was a concern, I think it would be reasonable to simply remove
: the debugging check as the zone boundaries are checked in
: move_freepages_block and we never expect a zone/node to be smaller than
: a pageblock and stuck in the middle of another zone.

Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1908122036560.10779@chino.kir.corp.google.com
Signed-off-by: David Rientjes
Acked-by: Mel Gorman
Cc: Naoya Horiguchi
Cc: Masayoshi Mizuma
Cc: Oscar Salvador
Cc: Pavel Tatashin
Cc: Vlastimil Babka
Cc:
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c |   19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-move_freepages-should-not-examine-struct-page-of-reserved-memory
+++ a/mm/page_alloc.c
@@ -2238,27 +2238,12 @@ static int move_freepages(struct zone *z
 	unsigned int order;
 	int pages_moved = 0;
 
-#ifndef CONFIG_HOLES_IN_ZONE
-	/*
-	 * page_zone is not safe to call in this context when
-	 * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
-	 * anyway as we check zone boundaries in move_freepages_block().
-	 * Remove at a later date when no bug reports exist related to
-	 * grouping pages by mobility
-	 */
-	VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
-		  pfn_valid(page_to_pfn(end_page)) &&
-		  page_zone(start_page) != page_zone(end_page));
-#endif
 	for (page = start_page; page <= end_page;) {
 		if (!pfn_valid_within(page_to_pfn(page))) {
 			page++;
 			continue;
 		}
 
-		/* Make sure we are not inadvertently changing nodes */
-		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
-
 		if (!PageBuddy(page)) {
 			/*
 			 * We assume that pages that could be isolated for
@@ -2273,6 +2258,10 @@ static int move_freepages(struct zone *z
 			continue;
 		}
 
+		/* Make sure we are not inadvertently changing nodes */
+		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
+		VM_BUG_ON_PAGE(page_zone(page) != zone, page);
+
 		order = page_order(page);
 		move_to_free_area(page, &zone->free_area[order], migratetype);
 		page += 1 << order;
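As a footnote to the "succeeds by luck" observation above (again, not
part of the patch): the node and zone ids are decoded from bitfields
packed into page->flags, so a zeroed struct page decodes to node 0 and
zone 0.  A simplified sketch modeled on page_zonenum()/page_to_nid() from
include/linux/mm.h (the real shift/mask macros vary by config):

/*
 * Simplified decoders: with page->flags == 0 both return 0, which is
 * why the old page_to_nid() check only "passed" on reserved memory
 * because that memory typically sits on node 0.
 */
static inline enum zone_type sketch_page_zonenum(const struct page *page)
{
	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
}

static inline int sketch_page_to_nid(const struct page *page)
{
	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
}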