From patchwork Thu Dec 12 19:04:27 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Davidlohr Bueso
X-Patchwork-Id: 11289311
Date: Thu, 12 Dec 2019 11:04:27 -0800
From: Davidlohr Bueso
To: Andrew Morton
Cc: Mike Kravetz, Waiman Long, Matthew Wilcox,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Michal Hocko,
	aneesh.kumar@linux.ibm.com
Subject: [PATCH v2] mm/hugetlb: defer free_huge_page() to a workqueue
Message-ID: <20191212190427.ouyohviijf5inhur@linux-p48b>
References: <20191211194615.18502-1-longman@redhat.com>
	<4fbc39a9-2c9c-4c2c-2b13-a548afe6083c@oracle.com>
	<32d2d4f2-83b9-2e40-05e2-71cd07e01b80@redhat.com>
	<0fcce71f-bc20-0ea3-b075-46592c8d533d@oracle.com>
	<20191212060650.ftqq27ftutxpc5hq@linux-p48b>
	<20191212063050.ufrpij6s6jkv7g7j@linux-p48b>
In-Reply-To: <20191212063050.ufrpij6s6jkv7g7j@linux-p48b>

There have been
deadlock reports[1, 2] where put_page() is called from softirq context,
which causes trouble with the hugetlb_lock and potentially the subpool
lock as well.

For such an unlikely scenario, let's not add irq dancing overhead to
the lock+unlock operations, which could incur expensive instruction
dependencies, particularly when considering hard-irq safety (for
example, PUSHF+POPF on x86).

Instead, just use a workqueue and do the free_huge_page() in regular
task context.

[1] https://lore.kernel.org/lkml/20191211194615.18502-1-longman@redhat.com/
[2] https://lore.kernel.org/lkml/20180905112341.21355-1-aneesh.kumar@linux.ibm.com/

Reported-by: Waiman Long
Reported-by: Aneesh Kumar K.V
Signed-off-by: Davidlohr Bueso
---
 - Changes from v1: Only use wq when in_interrupt(), otherwise business
   as usual. Also include the proper header file.

 - While I have not reproduced this issue, the v1 using wq passes all
   hugetlb related tests in ltp.

 mm/hugetlb.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

--
2.16.4

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ac65bb5e38ac..f28cf601938d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -1136,7 +1137,13 @@ static inline void ClearPageHugeTemporary(struct page *page)
 	page[2].mapping = NULL;
 }

-void free_huge_page(struct page *page)
+static struct workqueue_struct *hugetlb_free_page_wq;
+struct hugetlb_free_page_work {
+	struct page *page;
+	struct work_struct work;
+};
+
+static void __free_huge_page(struct page *page)
 {
 	/*
 	 * Can't pass hstate in here because it is called from the
@@ -1199,6 +1206,36 @@ void free_huge_page(struct page *page)
 	spin_unlock(&hugetlb_lock);
 }

+static void free_huge_page_workfn(struct work_struct *work)
+{
+	struct page *page;
+
+	page = container_of(work, struct hugetlb_free_page_work, work)->page;
+	__free_huge_page(page);
+}
+
+void free_huge_page(struct page *page)
+{
+	if (unlikely(in_interrupt())) {
+		/*
+		 * While uncommon, free_huge_page() can at least be
+		 * called from softirq context; defer the freeing so
+		 * that the hugetlb_lock and spool->lock need not
+		 * deal with irq dances just for this.
+		 */
+		struct hugetlb_free_page_work work;
+
+		work.page = page;
+		INIT_WORK_ONSTACK(&work.work, free_huge_page_workfn);
+		queue_work(hugetlb_free_page_wq, &work.work);
+
+		/* wait until the huge page freeing is done */
+		flush_work(&work.work);
+		destroy_work_on_stack(&work.work);
+	} else
+		__free_huge_page(page);
+}
+
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
 	INIT_LIST_HEAD(&page->lru);
@@ -2816,6 +2853,12 @@ static int __init hugetlb_init(void)

 	for (i = 0; i < num_fault_mutexes; i++)
 		mutex_init(&hugetlb_fault_mutex_table[i]);
+
+	hugetlb_free_page_wq = alloc_workqueue("hugetlb_free_page_wq",
+					       WQ_MEM_RECLAIM, 0);
+	if (!hugetlb_free_page_wq)
+		return -ENOMEM;
+
 	return 0;
 }
 subsys_initcall(hugetlb_init);
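
For readers who want to experiment with the deferral idea outside the
kernel, here is a minimal user-space sketch. It is not the code from
this patch: instead of an on-stack work item that the caller waits on
with flush_work(), it uses a lock-free pending list that a worker
drains later, so the enqueueing side never blocks. That matters because
waiting for a work item can sleep, which atomic contexts do not allow.
Everything here is an assumption for illustration: the names
pending_node, defer_free(), drain_pending() and demo() are hypothetical,
and C11 atomics stand in for kernel primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an object whose release must be deferred. */
struct pending_node {
	struct pending_node *next;
	int id;
};

/* Head of a lock-free LIFO list of objects waiting to be released. */
static _Atomic(struct pending_node *) pending_head;

/*
 * Enqueue side: usable where blocking is not allowed, since it only
 * retries a compare-and-swap. Returns 1 if the list was empty before
 * the push, which is when a real implementation would kick its worker.
 */
static int defer_free(struct pending_node *node)
{
	struct pending_node *old = atomic_load(&pending_head);

	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak(&pending_head, &old, node));

	return old == NULL;
}

/*
 * Worker side: detach the whole list with one atomic exchange, then
 * release each entry. Returns the number of entries handled.
 */
static int drain_pending(void (*release)(struct pending_node *))
{
	struct pending_node *node = atomic_exchange(&pending_head, NULL);
	int handled = 0;

	while (node) {
		struct pending_node *next = node->next; /* read before release */

		release(node);
		node = next;
		handled++;
	}
	return handled;
}

/* Demo-only release callback that just counts its invocations. */
static int released_count;

static void count_release(struct pending_node *node)
{
	(void)node;
	released_count++;
}

/* Defer three nodes, then drain them as a worker would. */
static int demo(void)
{
	struct pending_node a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };
	int first = defer_free(&a);	/* list was empty: worker needed */
	int second = defer_free(&b);	/* list already populated */

	defer_free(&c);
	assert(first == 1 && second == 0);
	return drain_pending(count_release);
}
```

In a kernel setting, the return value of defer_free() would presumably
decide whether to schedule the worker, and drain_pending() would run
from the work function in task context, where taking a plain spinlock
such as hugetlb_lock is safe.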