From patchwork Thu Mar 25 04:36:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12162883 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_RED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB764C433DB for ; Thu, 25 Mar 2021 04:36:42 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0E30C61A1B for ; Thu, 25 Mar 2021 04:36:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0E30C61A1B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9407E6B0036; Thu, 25 Mar 2021 00:36:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8F02B6B006C; Thu, 25 Mar 2021 00:36:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 76A716B006E; Thu, 25 Mar 2021 00:36:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0145.hostedemail.com [216.40.44.145]) by kanga.kvack.org (Postfix) with ESMTP id 5693F6B0036 for ; Thu, 25 Mar 2021 00:36:41 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 0EC4E181B9DCB for ; Thu, 25 Mar 2021 04:36:41 +0000 (UTC) X-FDA: 77957135802.22.4E2EB23 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf24.hostedemail.com (Postfix) with ESMTP id 8E4CFA0009DC for ; Thu, 25 Mar 2021 04:36:38 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 652D761A14; Thu, 25 Mar 2021 04:36:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1616646997; bh=oKiOArs+2ovbbi/TVmam7HAv9xsCvhDi0QwNFevHP1o=; h=Date:From:To:Subject:In-Reply-To:From; b=COXGtEKSgsOHtyWFr7hDJ7OpHGIPqQwmIRsdXVSWHdaXs1fMa0O1J5GE0oIrYzfXi YDCWsyNUsIHHqSg2U3ZWQ3zBcFZ3aggas7OvyEYcJK7AM/2+gj3cN5JKuz72Pv+xUR 3z8MREWSyY6DXkRaf/wg8DTnLCihZ2dMqEpIGBho= Date: Wed, 24 Mar 2021 21:36:36 -0700 From: Andrew Morton To: akpm@linux-foundation.org, almasrymina@google.com, aneesh.kumar@linux.vnet.ibm.com, linmiaohe@huawei.com, linux-mm@kvack.org, liwp.linux@gmail.com, lkp@intel.com, mike.kravetz@oracle.com, mm-commits@vger.kernel.org, stable@vger.kernel.org, torvalds@linux-foundation.org Subject: [patch 01/14] hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings Message-ID: <20210325043636.q6-Nhymsz%akpm@linux-foundation.org> In-Reply-To: <20210324213324.fe1ba237e471b5de17a08775@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 8E4CFA0009DC X-Stat-Signature: er8gwsjcjdcg6afp97xfhhhojbitih6n Received-SPF: none (linux-foundation.org>: No applicable sender policy available) receiver=imf24; identity=mailfrom; envelope-from=""; helo=mail.kernel.org; client-ip=198.145.29.99 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616646998-441132 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Miaohe Lin Subject: hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings The current implementation of hugetlb_cgroup for shared mappings could have different behavior. Consider the following two scenarios: 1.Assume initial css reference count of hugetlb_cgroup is 1: 1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference count is 2 associated with 1 file_region. 1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference count is 3 associated with 2 file_region. 1.3 coalesce_file_region will coalesce these two file_regions into one. So css reference count is 3 associated with 1 file_region now. 2.Assume initial css reference count of hugetlb_cgroup is 1 again: 2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference count is 2 associated with 1 file_region. Therefore, we might have one file_region while holding one or more css reference counts. This inconsistency could lead to imbalanced css_get() and css_put() pair. If we do css_put one by one (i.g. hole punch case), scenario 2 would put one more css reference. If we do css_put all together (i.g. truncate case), scenario 1 will leak one css reference. The imbalanced css_get() and css_put() pair would result in a non-zero reference when we try to destroy the hugetlb cgroup. The hugetlb cgroup directory is removed __but__ associated resource is not freed. This might result in OOM or can not create a new hugetlb cgroup in a busy workload ultimately. In order to fix this, we have to make sure that one file_region must hold exactly one css reference. So in coalesce_file_region case, we should release one css reference before coalescence. Also only put css reference when the entire file_region is removed. The last thing to note is that the caller of region_add() will only hold one reference to h_cg->css for the whole contiguous reservation region. But this area might be scattered when there are already some file_regions reside in it. As a result, many file_regions may share only one h_cg->css reference. In order to ensure that one file_region must hold exactly one css reference, we should do css_get() for each file_region and release the reference held by caller when they are done. [linmiaohe@huawei.com: fix imbalanced css_get and css_put pair for shared mappings] Link: https://lkml.kernel.org/r/20210316023002.53921-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20210301120540.37076-1-linmiaohe@huawei.com Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings") Reported-by: kernel test robot (auto build test ERROR) Signed-off-by: Miaohe Lin Reviewed-by: Mike Kravetz Cc: Aneesh Kumar K.V Cc: Wanpeng Li Cc: Mina Almasry Cc: Signed-off-by: Andrew Morton --- include/linux/hugetlb_cgroup.h | 15 +++++++++-- mm/hugetlb.c | 41 +++++++++++++++++++++++++++---- mm/hugetlb_cgroup.c | 10 ++++++- 3 files changed, 58 insertions(+), 8 deletions(-) --- a/include/linux/hugetlb_cgroup.h~hugetlb_cgroup-fix-imbalanced-css_get-and-css_put-pair-for-shared-mappings +++ a/include/linux/hugetlb_cgroup.h @@ -113,6 +113,11 @@ static inline bool hugetlb_cgroup_disabl return !cgroup_subsys_enabled(hugetlb_cgrp_subsys); } +static inline void hugetlb_cgroup_put_rsvd_cgroup(struct hugetlb_cgroup *h_cg) +{ + css_put(&h_cg->css); +} + extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup **ptr); extern int hugetlb_cgroup_charge_cgroup_rsvd(int idx, unsigned long nr_pages, @@ -138,7 +143,8 @@ extern void hugetlb_cgroup_uncharge_coun extern void hugetlb_cgroup_uncharge_file_region(struct resv_map *resv, struct file_region *rg, - unsigned long nr_pages); + unsigned long nr_pages, + bool region_del); extern void hugetlb_cgroup_file_init(void) __init; extern void hugetlb_cgroup_migrate(struct page *oldhpage, @@ -147,7 +153,8 @@ extern void hugetlb_cgroup_migrate(struc #else static inline void hugetlb_cgroup_uncharge_file_region(struct resv_map *resv, struct file_region *rg, - unsigned long nr_pages) + unsigned long nr_pages, + bool region_del) { } @@ -185,6 +192,10 @@ static inline bool hugetlb_cgroup_disabl return true; } +static inline void hugetlb_cgroup_put_rsvd_cgroup(struct hugetlb_cgroup *h_cg) +{ +} + static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup **ptr) { --- a/mm/hugetlb.c~hugetlb_cgroup-fix-imbalanced-css_get-and-css_put-pair-for-shared-mappings +++ a/mm/hugetlb.c @@ -280,6 +280,17 @@ static void record_hugetlb_cgroup_unchar nrg->reservation_counter = &h_cg->rsvd_hugepage[hstate_index(h)]; nrg->css = &h_cg->css; + /* + * The caller will hold exactly one h_cg->css reference for the + * whole contiguous reservation region. But this area might be + * scattered when there are already some file_regions reside in + * it. As a result, many file_regions may share only one css + * reference. In order to ensure that one file_region must hold + * exactly one h_cg->css reference, we should do css_get for + * each file_region and leave the reference held by caller + * untouched. + */ + css_get(&h_cg->css); if (!resv->pages_per_hpage) resv->pages_per_hpage = pages_per_huge_page(h); /* pages_per_hpage should be the same for all entries in @@ -293,6 +304,14 @@ static void record_hugetlb_cgroup_unchar #endif } +static void put_uncharge_info(struct file_region *rg) +{ +#ifdef CONFIG_CGROUP_HUGETLB + if (rg->css) + css_put(rg->css); +#endif +} + static bool has_same_uncharge_info(struct file_region *rg, struct file_region *org) { @@ -316,6 +335,7 @@ static void coalesce_file_region(struct prg->to = rg->to; list_del(&rg->link); + put_uncharge_info(rg); kfree(rg); rg = prg; @@ -327,6 +347,7 @@ static void coalesce_file_region(struct nrg->from = rg->from; list_del(&rg->link); + put_uncharge_info(rg); kfree(rg); } } @@ -662,7 +683,7 @@ retry: del += t - f; hugetlb_cgroup_uncharge_file_region( - resv, rg, t - f); + resv, rg, t - f, false); /* New entry for end of split region */ nrg->from = t; @@ -683,7 +704,7 @@ retry: if (f <= rg->from && t >= rg->to) { /* Remove entire region */ del += rg->to - rg->from; hugetlb_cgroup_uncharge_file_region(resv, rg, - rg->to - rg->from); + rg->to - rg->from, true); list_del(&rg->link); kfree(rg); continue; @@ -691,13 +712,13 @@ retry: if (f <= rg->from) { /* Trim beginning of region */ hugetlb_cgroup_uncharge_file_region(resv, rg, - t - rg->from); + t - rg->from, false); del += t - rg->from; rg->from = t; } else { /* Trim end of region */ hugetlb_cgroup_uncharge_file_region(resv, rg, - rg->to - f); + rg->to - f, false); del += rg->to - f; rg->to = f; @@ -5187,6 +5208,10 @@ bool hugetlb_reserve_pages(struct inode */ long rsv_adjust; + /* + * hugetlb_cgroup_uncharge_cgroup_rsvd() will put the + * reference to h_cg->css. See comment below for detail. + */ hugetlb_cgroup_uncharge_cgroup_rsvd( hstate_index(h), (chg - add) * pages_per_huge_page(h), h_cg); @@ -5194,6 +5219,14 @@ bool hugetlb_reserve_pages(struct inode rsv_adjust = hugepage_subpool_put_pages(spool, chg - add); hugetlb_acct_memory(h, -rsv_adjust); + } else if (h_cg) { + /* + * The file_regions will hold their own reference to + * h_cg->css. So we should release the reference held + * via hugetlb_cgroup_charge_cgroup_rsvd() when we are + * done. + */ + hugetlb_cgroup_put_rsvd_cgroup(h_cg); } } return true; --- a/mm/hugetlb_cgroup.c~hugetlb_cgroup-fix-imbalanced-css_get-and-css_put-pair-for-shared-mappings +++ a/mm/hugetlb_cgroup.c @@ -391,7 +391,8 @@ void hugetlb_cgroup_uncharge_counter(str void hugetlb_cgroup_uncharge_file_region(struct resv_map *resv, struct file_region *rg, - unsigned long nr_pages) + unsigned long nr_pages, + bool region_del) { if (hugetlb_cgroup_disabled() || !resv || !rg || !nr_pages) return; @@ -400,7 +401,12 @@ void hugetlb_cgroup_uncharge_file_region !resv->reservation_counter) { page_counter_uncharge(rg->reservation_counter, nr_pages * resv->pages_per_hpage); - css_put(rg->css); + /* + * Only do css_put(rg->css) when we delete the entire region + * because one file_region must hold exactly one css reference. + */ + if (region_del) + css_put(rg->css); } } From patchwork Thu Mar 25 04:36:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12162885 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_RED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B6D4FC433C1 for ; Thu, 25 Mar 2021 04:36:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3711261A1B for ; Thu, 25 Mar 2021 04:36:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3711261A1B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C9D426B006C; Thu, 25 Mar 2021 00:36:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BFE1B6B006E; Thu, 25 Mar 2021 00:36:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A2B426B0070; Thu, 25 Mar 2021 00:36:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0245.hostedemail.com [216.40.44.245]) by kanga.kvack.org (Postfix) with ESMTP id 83A2D6B006C for ; Thu, 25 Mar 2021 00:36:42 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 32F2E8248D51 for ; Thu, 25 Mar 2021 04:36:42 +0000 (UTC) X-FDA: 77957135844.09.45453CE Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf30.hostedemail.com (Postfix) with ESMTP id 5EA9DE0001B4 for ; Thu, 25 Mar 2021 04:36:39 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 89CB560238; Thu, 25 Mar 2021 04:36:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1616647001; bh=gaT8IEf8GSnOVlU+YJoOeQRqSYnOUvFmFL7h2HDMNqY=; h=Date:From:To:Subject:In-Reply-To:From; b=iQioKkKyNc5hdNybfs+HuI7x3F9xdYwvyegSBHn6T+Grld3943xYsK1N4lGCnalVZ PqVv6MX0SkRaITPNQlkukgmrnHJY6n9nIFDyjG0GiLokjwW6NMDF8Tb6yNs0deTVuN 0h+uxITDlp6XxwPTcaS6raya4XZcJTabaRQmamFg= Date: Wed, 24 Mar 2021 21:36:40 -0700 From: Andrew Morton To: akpm@linux-foundation.org, andreyknvl@google.com, aryabinin@virtuozzo.com, Branislav.Rankov@arm.com, catalin.marinas@arm.com, dvyukov@google.com, elver@google.com, eugenis@google.com, glider@google.com, kevin.brodsky@arm.com, linux-mm@kvack.org, mm-commits@vger.kernel.org, pcc@google.com, stable@vger.kernel.org, torvalds@linux-foundation.org, vincenzo.frascino@arm.com, will.deacon@arm.com Subject: [patch 02/14] kasan: fix per-page tags for non-page_alloc pages Message-ID: <20210325043640.Et_g_98cn%akpm@linux-foundation.org> In-Reply-To: <20210324213324.fe1ba237e471b5de17a08775@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Stat-Signature: te41iwt1yuux9qi5izun6yr4tguz1ii5 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 5EA9DE0001B4 Received-SPF: none (linux-foundation.org>: No applicable sender policy available) receiver=imf30; identity=mailfrom; envelope-from=""; helo=mail.kernel.org; client-ip=198.145.29.99 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616646999-16918 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Andrey Konovalov Subject: kasan: fix per-page tags for non-page_alloc pages To allow performing tag checks on page_alloc addresses obtained via page_address(), tag-based KASAN modes store tags for page_alloc allocations in page->flags. Currently, the default tag value stored in page->flags is 0x00. Therefore, page_address() returns a 0x00ffff... address for pages that were not allocated via page_alloc. This might cause problems. A particular case we encountered is a conflict with KFENCE. If a KFENCE-allocated slab object is being freed via kfree(page_address(page) + offset), the address passed to kfree() will get tagged with 0x00 (as slab pages keep the default per-page tags). This leads to is_kfence_address() check failing, and a KFENCE object ending up in normal slab freelist, which causes memory corruptions. This patch changes the way KASAN stores tag in page-flags: they are now stored xor'ed with 0xff. This way, KASAN doesn't need to initialize per-page flags for every created page, which might be slow. With this change, page_address() returns natively-tagged (with 0xff) pointers for pages that didn't have tags set explicitly. This patch fixes the encountered conflict with KFENCE and prevents more similar issues that can occur in the future. Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc") Signed-off-by: Andrey Konovalov Reviewed-by: Marco Elver Cc: Catalin Marinas Cc: Will Deacon Cc: Vincenzo Frascino Cc: Dmitry Vyukov Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Peter Collingbourne Cc: Evgenii Stepanov Cc: Branislav Rankov Cc: Kevin Brodsky Cc: Signed-off-by: Andrew Morton --- include/linux/mm.h | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) --- a/include/linux/mm.h~kasan-fix-per-page-tags-for-non-page_alloc-pages +++ a/include/linux/mm.h @@ -1461,16 +1461,28 @@ static inline bool cpupid_match_pid(stru #if defined(CONFIG_KASAN_SW_TAGS) || defined(CONFIG_KASAN_HW_TAGS) +/* + * KASAN per-page tags are stored xor'ed with 0xff. This allows to avoid + * setting tags for all pages to native kernel tag value 0xff, as the default + * value 0x00 maps to 0xff. + */ + static inline u8 page_kasan_tag(const struct page *page) { - if (kasan_enabled()) - return (page->flags >> KASAN_TAG_PGSHIFT) & KASAN_TAG_MASK; - return 0xff; + u8 tag = 0xff; + + if (kasan_enabled()) { + tag = (page->flags >> KASAN_TAG_PGSHIFT) & KASAN_TAG_MASK; + tag ^= 0xff; + } + + return tag; } static inline void page_kasan_tag_set(struct page *page, u8 tag) { if (kasan_enabled()) { + tag ^= 0xff; page->flags &= ~(KASAN_TAG_MASK << KASAN_TAG_PGSHIFT); page->flags |= (tag & KASAN_TAG_MASK) << KASAN_TAG_PGSHIFT; } From patchwork Thu Mar 25 04:36:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12162887 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_RED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA435C433DB for ; Thu, 25 Mar 2021 04:36:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9398060238 for ; Thu, 25 Mar 2021 04:36:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9398060238 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 34F146B006E; Thu, 25 Mar 2021 00:36:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 300096B0070; Thu, 25 Mar 2021 00:36:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0B3A56B0071; Thu, 25 Mar 2021 00:36:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0159.hostedemail.com [216.40.44.159]) by kanga.kvack.org (Postfix) with ESMTP id D82CD6B006E for ; Thu, 25 Mar 2021 00:36:45 -0400 (EDT) Received: from smtpin38.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 95EB51814064E for ; Thu, 25 Mar 2021 04:36:45 +0000 (UTC) X-FDA: 77957135970.38.467CAFC Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf27.hostedemail.com (Postfix) with ESMTP id C0C7980192C0 for ; Thu, 25 Mar 2021 04:36:42 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id D2D1561A1A; Thu, 25 Mar 2021 04:36:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1616647004; bh=PUcF3KgHrzGiBtVp9wFO0dBtzI7xyYmREqwbxHQEUvQ=; h=Date:From:To:Subject:In-Reply-To:From; b=GwM9R8uzoniZV/WgrVnx1R8ejtH3mS1fOv/MBfU7+FDQIqydhUFxcSG5X0yZO6fbx QMrDFerayDX9coLgDwTcOr4p7awuTvNYVVdVV5ZyOkIIY6h/k9o29d70Li8KX7DjEG OJ2bYblYjr49i0Wz9UsnR2nh6TRMtGXM9PsFbJzI= Date: Wed, 24 Mar 2021 21:36:43 -0700 From: Andrew Morton To: aarcange@redhat.com, akpm@linux-foundation.org, bgardon@google.com, dimitri.sivanich@hpe.com, hannes@cmpxchg.org, jgg@nvidia.com, jgg@ziepe.ca, jglisse@redhat.com, linux-mm@kvack.org, mhocko@suse.com, mm-commits@vger.kernel.org, rientjes@google.com, seanjc@google.com, torvalds@linux-foundation.org Subject: [patch 03/14] mm/mmu_notifiers: ensure range_end() is paired with range_start() Message-ID: <20210325043643.dkjDXH1Bc%akpm@linux-foundation.org> In-Reply-To: <20210324213324.fe1ba237e471b5de17a08775@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Stat-Signature: fpia6jcb5m69whnjenhspymif4q8sbg6 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: C0C7980192C0 Received-SPF: none (linux-foundation.org>: No applicable sender policy available) receiver=imf27; identity=mailfrom; envelope-from=""; helo=mail.kernel.org; client-ip=198.145.29.99 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616647002-549243 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Sean Christopherson Subject: mm/mmu_notifiers: ensure range_end() is paired with range_start() If one or more notifiers fails .invalidate_range_start(), invoke .invalidate_range_end() for "all" notifiers. If there are multiple notifiers, those that did not fail are expecting _start() and _end() to be paired, e.g. KVM's mmu_notifier_count would become imbalanced. Disallow notifiers that can fail _start() from implementing _end() so that it's unnecessary to either track which notifiers rejected _start(), or had already succeeded prior to a failed _start(). Note, the existing behavior of calling _start() on all notifiers even after a previous notifier failed _start() was an unintented "feature". Make it canon now that the behavior is depended on for correctness. As of today, the bug is likely benign: 1. The only caller of the non-blocking notifier is OOM kill. 2. The only notifiers that can fail _start() are the i915 and Nouveau drivers. 3. The only notifiers that utilize _end() are the SGI UV GRU driver and KVM. 4. The GRU driver will never coincide with the i195/Nouveau drivers. 5. An imbalanced kvm->mmu_notifier_count only causes soft lockup in the _guest_, and the guest is already doomed due to being an OOM victim. Fix the bug now to play nice with future usage, e.g. KVM has a potential use case for blocking memslot updates in KVM while an invalidation is in-progress, and failure to unblock would result in said updates being blocked indefinitely and hanging. Found by inspection. Verified by adding a second notifier in KVM that periodically returns -EAGAIN on non-blockable ranges, triggering OOM, and observing that KVM exits with an elevated notifier count. Link: https://lkml.kernel.org/r/20210311180057.1582638-1-seanjc@google.com Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") Signed-off-by: Sean Christopherson Suggested-by: Jason Gunthorpe Reviewed-by: Jason Gunthorpe Cc: David Rientjes Cc: Ben Gardon Cc: Michal Hocko Cc: "Jérôme Glisse" Cc: Andrea Arcangeli Cc: Johannes Weiner Cc: Dimitri Sivanich Signed-off-by: Andrew Morton --- include/linux/mmu_notifier.h | 10 +++++----- mm/mmu_notifier.c | 23 +++++++++++++++++++++++ 2 files changed, 28 insertions(+), 5 deletions(-) --- a/include/linux/mmu_notifier.h~mm-mmu_notifiers-esnure-range_end-is-paired-with-range_start +++ a/include/linux/mmu_notifier.h @@ -169,11 +169,11 @@ struct mmu_notifier_ops { * the last refcount is dropped. * * If blockable argument is set to false then the callback cannot - * sleep and has to return with -EAGAIN. 0 should be returned - * otherwise. Please note that if invalidate_range_start approves - * a non-blocking behavior then the same applies to - * invalidate_range_end. - * + * sleep and has to return with -EAGAIN if sleeping would be required. + * 0 should be returned otherwise. Please note that notifiers that can + * fail invalidate_range_start are not allowed to implement + * invalidate_range_end, as there is no mechanism for informing the + * notifier that its start failed. */ int (*invalidate_range_start)(struct mmu_notifier *subscription, const struct mmu_notifier_range *range); --- a/mm/mmu_notifier.c~mm-mmu_notifiers-esnure-range_end-is-paired-with-range_start +++ a/mm/mmu_notifier.c @@ -501,10 +501,33 @@ static int mn_hlist_invalidate_range_sta ""); WARN_ON(mmu_notifier_range_blockable(range) || _ret != -EAGAIN); + /* + * We call all the notifiers on any EAGAIN, + * there is no way for a notifier to know if + * its start method failed, thus a start that + * does EAGAIN can't also do end. + */ + WARN_ON(ops->invalidate_range_end); ret = _ret; } } } + + if (ret) { + /* + * Must be non-blocking to get here. If there are multiple + * notifiers and one or more failed start, any that succeeded + * start are expecting their end to be called. Do so now. + */ + hlist_for_each_entry_rcu(subscription, &subscriptions->list, + hlist, srcu_read_lock_held(&srcu)) { + if (!subscription->ops->invalidate_range_end) + continue; + + subscription->ops->invalidate_range_end(subscription, + range); + } + } srcu_read_unlock(&srcu, id); return ret; From patchwork Thu Mar 25 04:36:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12162889 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_RED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B6C71C433C1 for ; Thu, 25 Mar 2021 04:36:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 59E6461A1B for ; Thu, 25 Mar 2021 04:36:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 59E6461A1B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id EE5DA6B0071; Thu, 25 Mar 2021 00:36:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E4ED16B0072; Thu, 25 Mar 2021 00:36:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C30426B0073; Thu, 25 Mar 2021 00:36:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0173.hostedemail.com [216.40.44.173]) by kanga.kvack.org (Postfix) with ESMTP id 939BB6B0071 for ; Thu, 25 Mar 2021 00:36:48 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 4B0468248D51 for ; Thu, 25 Mar 2021 04:36:48 +0000 (UTC) X-FDA: 77957136096.07.07BF429 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf13.hostedemail.com (Postfix) with ESMTP id 5588AE0011C9 for ; Thu, 25 Mar 2021 04:36:47 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id DF27C61A14; Thu, 25 Mar 2021 04:36:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1616647007; bh=b4ShdLbawH1da2mMX3GriiEiSupRdXsPs1YK7OK2YV0=; h=Date:From:To:Subject:In-Reply-To:From; b=Iru9u6oh+qflhrLKAYy4x3f59CUty3S2ySK3K4CZjWEbnzMudS59xv+a2z9/Qswz3 hmDEpLhEdR4sORUHCtOok6IWB+p4e68DUPKBZ5LWIPBxqUW2mMGxGKtSBCfgYRBBTw hJgxPZsWqDJ8zVYgYxtUWB7hVwcnxkj8FVbRA+S0= Date: Wed, 24 Mar 2021 21:36:46 -0700 From: Andrew Morton To: akpm@linux-foundation.org, linux-mm@kvack.org, lkp@intel.com, mm-commits@vger.kernel.org, rong.a.chen@intel.com, shuah@kernel.org, torvalds@linux-foundation.org Subject: [patch 04/14] selftests/vm: fix out-of-tree build Message-ID: <20210325043646.zWXA9MYSn%akpm@linux-foundation.org> In-Reply-To: <20210324213324.fe1ba237e471b5de17a08775@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Stat-Signature: x8mzt3yf955w89ozqixnrxcoq5s6y6ic X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 5588AE0011C9 Received-SPF: none (linux-foundation.org>: No applicable sender policy available) receiver=imf13; identity=mailfrom; envelope-from=""; helo=mail.kernel.org; client-ip=198.145.29.99 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616647007-757311 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Rong Chen Subject: selftests/vm: fix out-of-tree build When building out-of-tree, attempting to make target from $(OUTPUT) directory: make[1]: *** No rule to make target '$(OUTPUT)/protection_keys.c', needed by '$(OUTPUT)/protection_keys_32'. Link: https://lkml.kernel.org/r/20210315094700.522753-1-rong.a.chen@intel.com Signed-off-by: Rong Chen Reported-by: kernel test robot Cc: Shuah Khan Signed-off-by: Andrew Morton --- tools/testing/selftests/vm/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/tools/testing/selftests/vm/Makefile~selftests-vm-fix-out-of-tree-build +++ a/tools/testing/selftests/vm/Makefile @@ -101,7 +101,7 @@ endef ifeq ($(CAN_BUILD_I386),1) $(BINARIES_32): CFLAGS += -m32 $(BINARIES_32): LDLIBS += -lrt -ldl -lm -$(BINARIES_32): %_32: %.c +$(BINARIES_32): $(OUTPUT)/%_32: %.c $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@ $(foreach t,$(TARGETS),$(eval $(call gen-target-rule-32,$(t)))) endif @@ -109,7 +109,7 @@ endif ifeq ($(CAN_BUILD_X86_64),1) $(BINARIES_64): CFLAGS += -m64 $(BINARIES_64): LDLIBS += -lrt -ldl -$(BINARIES_64): %_64: %.c +$(BINARIES_64): $(OUTPUT)/%_64: %.c $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@ $(foreach t,$(TARGETS),$(eval $(call gen-target-rule-64,$(t)))) endif From patchwork Thu Mar 25 04:36:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12162891 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, URIBL_RED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40328C433DB for ; Thu, 25 Mar 2021 04:36:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D464F61A1B for ; Thu, 25 Mar 2021 04:36:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D464F61A1B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6F6B76B0072; Thu, 25 Mar 2021 00:36:52 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6AF416B0073; Thu, 25 Mar 2021 00:36:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 521C46B0074; Thu, 25 Mar 2021 00:36:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0156.hostedemail.com [216.40.44.156]) by kanga.kvack.org (Postfix) with ESMTP id 264A26B0072 for ; Thu, 25 Mar 2021 00:36:52 -0400 (EDT) Received: from smtpin31.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id E221775AA for ; Thu, 25 Mar 2021 04:36:51 +0000 (UTC) X-FDA: 77957136222.31.3D1DD81 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf21.hostedemail.com (Postfix) with ESMTP id 63FB1E0001B2 for ; Thu, 25 Mar 2021 04:36:50 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 4425261A1A; Thu, 25 Mar 2021 04:36:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1616647010; bh=X+a93unu8h61Wnule1KcshcFs3MjMECqef6bYX1cimE=; h=Date:From:To:Subject:In-Reply-To:From; b=F3fxO1BtSmpnQcccyuJpdqkNZx+Hdcg4lqxcxOZqf2Mh8RuIoyaCh47tP8GovS47x i58XjR9vPmc0M02m24+oQpCO9AqrJ1Eq8R3K3VTHEDcAotAoPIeEwDd6kRIS8tnswY c97ThM/SSoFyYaa7b7c7Or0p/6b/5NR+V/i7EMVY= Date: Wed, 24 Mar 2021 21:36:49 -0700 From: Andrew Morton To: akpm@linux-foundation.org, ks77sj@gmail.com, linux-mm@kvack.org, mm-commits@vger.kernel.org, snild@sony.com, stable@vger.kernel.org, tommyhebb@gmail.com, torvalds@linux-foundation.org, vitaly.wool@konsulko.com Subject: [patch 05/14] z3fold: prevent reclaim/free race for headless pages Message-ID: <20210325043649.utT2m0hPX%akpm@linux-foundation.org> In-Reply-To: <20210324213324.fe1ba237e471b5de17a08775@linux-foundation.org> User-Agent: s-nail v14.8.16 MIME-Version: 1.0 X-Stat-Signature: 8t7o8mqdtk8tquz6nwooxi56cxfh6mf1 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 63FB1E0001B2 Received-SPF: none (linux-foundation.org>: No applicable sender policy available) receiver=imf21; identity=mailfrom; envelope-from=""; helo=mail.kernel.org; client-ip=198.145.29.99 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616647010-681072 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Thomas Hebb Subject: z3fold: prevent reclaim/free race for headless pages commit ca0246bb97c2 ("z3fold: fix possible reclaim races") introduced the PAGE_CLAIMED flag "to avoid racing on a z3fold 'headless' page release." By atomically testing and setting the bit in each of z3fold_free() and z3fold_reclaim_page(), a double-free was avoided. However, commit dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim") appears to have unintentionally broken this behavior by moving the PAGE_CLAIMED check in z3fold_reclaim_page() to after the page lock gets taken, which only happens for non-headless pages. For headless pages, the check is now skipped entirely and races can occur again. I have observed such a race on my system: page:00000000ffbd76b7 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x165316 flags: 0x2ffff0000000000() raw: 02ffff0000000000 ffffea0004535f48 ffff8881d553a170 0000000000000000 raw: 0000000000000000 0000000000000011 00000000ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0) ------------[ cut here ]------------ kernel BUG at include/linux/mm.h:707! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 2 PID: 291928 Comm: kworker/2:0 Tainted: G B 5.10.7-arch1-1-kasan #1 Hardware name: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F9b 03/03/2016 Workqueue: zswap-shrink shrink_worker RIP: 0010:__free_pages+0x10a/0x130 Code: c1 e7 06 48 01 ef 45 85 e4 74 d1 44 89 e6 31 d2 41 83 ec 01 e8 e7 b0 ff ff eb da 48 c7 c6 e0 32 91 88 48 89 ef e8 a6 89 f8 ff <0f> 0b 4c 89 e7 e8 fc 79 07 00 e9 33 ff ff ff 48 89 ef e8 ff 79 07 RSP: 0000:ffff88819a2ffb98 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffffea000594c5a8 RCX: 0000000000000000 RDX: 1ffffd4000b298b7 RSI: 0000000000000000 RDI: ffffea000594c5b8 RBP: ffffea000594c580 R08: 000000000000003e R09: ffff8881d5520bbb R10: ffffed103aaa4177 R11: 0000000000000001 R12: ffffea000594c5b4 R13: 0000000000000000 R14: ffff888165316000 R15: ffffea000594c588 FS: 0000000000000000(0000) GS:ffff8881d5500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7c8c3654d8 CR3: 0000000103f42004 CR4: 00000000001706e0 Call Trace: z3fold_zpool_shrink+0x9b6/0x1240 ? sugov_update_single+0x357/0x990 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x180 ? z3fold_zpool_map+0x490/0x490 ? _raw_spin_lock_irq+0x88/0xe0 shrink_worker+0x35/0x90 process_one_work+0x70c/0x1210 ? pwq_dec_nr_in_flight+0x15b/0x2a0 worker_thread+0x539/0x1200 ? __kthread_parkme+0x73/0x120 ? rescuer_thread+0x1000/0x1000 kthread+0x330/0x400 ? __kthread_bind_mask+0x90/0x90 ret_from_fork+0x22/0x30 Modules linked in: rfcomm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ccm algif_aead des_generic libdes ecb algif_skcipher cmac bnep md4 algif_hash af_alg vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvm hid_logitech_hidpp kvm at24 mac80211 snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic intel_pmc_bxt snd_hda_codec_hdmi ledtrig_audio iTCO_vendor_support mei_wdt mei_hdcp snd_hda_intel snd_intel_dspcfg libarc4 soundwire_intel irqbypass iwlwifi soundwire_generic_allocation rapl soundwire_cadence intel_cstate snd_hda_codec intel_uncore btusb joydev mousedev snd_usb_audio pcspkr btrtl uvcvideo nouveau btbcm i2c_i801 btintel snd_hda_core videobuf2_vmalloc i2c_smbus snd_usbmidi_lib videobuf2_memops bluetooth snd_hwdep soundwire_bus snd_soc_rt5640 videobuf2_v4l2 cfg80211 snd_soc_rl6231 videobuf2_common snd_rawmidi lpc_ich alx videodev mdio snd_seq_device snd_soc_core mc ecdh_generic mxm_wmi mei_me hid_logitech_dj wmi snd_compress e1000e ac97_bus mei ttm rfkill snd_pcm_dmaengine ecc snd_pcm snd_timer snd soundcore mac_hid acpi_pad pkcs8_key_parser it87 hwmon_vid crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted tpm rng_core usbhid dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas i915 video intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart ---[ end trace 126d646fc3dc0ad8 ]--- To fix the issue, re-add the earlier test and set in the case where we have a headless page. Link: https://lkml.kernel.org/r/c8106dbe6d8390b290cd1d7f873a2942e805349e.1615452048.git.tommyhebb@gmail.com Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim") Signed-off-by: Thomas Hebb Reviewed-by: Vitaly Wool Cc: Jongseok Kim Cc: Snild Dolkow Cc: Signed-off-by: Andrew Morton --- mm/z3fold.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) --- a/mm/z3fold.c~z3fold-prevent-reclaim-free-race-for-headless-pages +++ a/mm/z3fold.c @@ -1346,8 +1346,22 @@ static int z3fold_reclaim_page(struct z3 page = list_entry(pos, struct page, lru); zhdr = page_address(page); - if (test_bit(PAGE_HEADLESS, &page->private)) + if (test_bit(PAGE_HEADLESS, &page->private)) { + /* + * For non-headless pages, we wait to do this + * until we have the page lock to avoid racing + * with __z3fold_alloc(). Headless pages don't + * have a lock (and __z3fold_alloc() will never + * see them), but we still need to test and set + * PAGE_CLAIMED to avoid racing with + * z3fold_free(), so just do it now before + * leaving the loop. + */ + if (test_and_set_bit(PAGE_CLAIMED, &page->private)) + continue; + break; + } if (kref_get_unless_zero(&zhdr->refcount) == 0) { zhdr = NULL;