From patchwork Fri Feb 14 19:29:50 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11383169
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, Jan Kara, Matthew Wilcox, Josef Bacik, Johannes Weiner, Minchan Kim
Subject: [PATCH v2 1/2] mm: make PageReadahead more strict
Date: Fri, 14 Feb 2020 11:29:50 -0800
Message-Id: <20200214192951.29430-1-minchan@kernel.org>
X-Mailer: git-send-email 2.25.0.265.gbab2e86ba0-goog

Recently, I got some bug reports that a major page fault sometimes takes
several seconds. When I reviewed the logic that drops mmap_sem, I found
several bugs.

         CPU 1                                 CPU 2
mm_populate
 for ()
   ..
   ret = populate_vma_page_range
     __get_user_pages
       faultin_page
         handle_mm_fault
           filemap_fault
             do_async_mmap_readahead
                                               shrink_page_list
                                                 pageout
                                                   SetPageReclaim(=SetPageReadahead)
                                                     writepage
                                                       SetPageWriteback
               if (PageReadahead(page))
                 maybe_unlock_mmap_for_io
                   up_read(mmap_sem)
                 page_cache_async_readahead()
                   if (PageWriteback(page))
                     return;

Here, since ret from populate_vma_page_range is zero, the loop continues
to run with the same address as the previous iteration. It will repeat
the loop until the page's writeout is done (i.e., PG_writeback or
PG_reclaim is clear).

We could fix the above specific case by adding a PageWriteback check:

   ret = populate_vma_page_range
           ...
           ...
           filemap_fault
             do_async_mmap_readahead
               if (!PageWriteback(page) && PageReadahead(page))
                 maybe_unlock_mmap_for_io
                   up_read(mmap_sem)
                 page_cache_async_readahead()
                   if (PageWriteback(page))
                     return;

Furthermore, to prevent potential issues caused by sharing PG_readahead
with PG_reclaim, let's make a page flag wrapper for PageReadahead with a
description. With that, we could remove the PageWriteback check in
page_cache_async_readahead, which is clearer for maintenance and
readability.
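
The difference between the old and the stricter readahead test can be modeled in plain userspace C. This is only a sketch under stated assumptions: `struct page_model`, the two bit positions, and all three function names are hypothetical stand-ins, not the kernel's definitions. The test-and-clear helper combines its two conditions with a logical AND, which is an assumption about the intent described by the "clear only if it's PG_readahead" wording, not a copy of the posted code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions; the kernel takes these from enum pageflags. */
enum { PG_reclaim = 0, PG_writeback = 1 };

/* Hypothetical stand-in for struct page: only the flags word is modeled. */
struct page_model {
	unsigned long flags;
};

/* Loose test: any page with the shared bit set looks like a readahead marker. */
bool page_readahead_loose(const struct page_model *page)
{
	assert(page);
	return (page->flags & (1UL << PG_reclaim)) != 0;
}

/* Strict test: the shared bit is set and writeback is clear. */
bool page_readahead_strict(const struct page_model *page)
{
	unsigned long mask = (1UL << PG_reclaim) | (1UL << PG_writeback);

	assert(page);
	return (page->flags & mask) == (1UL << PG_reclaim);
}

/*
 * Assumed intent: clear the shared bit only when it means "readahead".
 * A page under writeback keeps the bit, because there it means PG_reclaim.
 */
bool test_clear_page_readahead(struct page_model *page)
{
	if (!page_readahead_strict(page))
		return false;
	page->flags &= ~(1UL << PG_reclaim);
	return true;
}
```

For a page with both bits set, the loose test reports a readahead marker while the strict one does not, which is the case that kept the populate loop spinning in the trace above.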

Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Signed-off-by: Minchan Kim
---
 include/linux/page-flags.h | 28 ++++++++++++++++++++++++++--
 mm/readahead.c             |  6 ------
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1bf83c8fcaa7..f91a9b2a49bd 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -363,8 +363,32 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
 /* PG_readahead is only used for reads; PG_reclaim is only for writes */
 PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
 	TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
-PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
-	TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+
+SETPAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+CLEARPAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+
+/*
+ * Since PG_readahead is shared with PG_reclaim of the page flags,
+ * PageReadahead should double check whether it's readahead marker
+ * or PG_reclaim. It could be done by PageWriteback check because
+ * PG_reclaim is always with PG_writeback.
+ */
+static inline int PageReadahead(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageCompound(page), page);
+
+	return (page->flags & (1UL << PG_reclaim | 1UL << PG_writeback)) ==
+		(1UL << PG_reclaim);
+}
+
+/* Clear PG_readahead only if it's PG_readahead, not PG_reclaim */
+static inline int TestClearPageReadahead(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageCompound(page), page);
+
+	return !PageWriteback(page) ||
+		test_and_clear_bit(PG_reclaim, &page->flags);
+}
 
 #ifdef CONFIG_HIGHMEM
 /*
diff --git a/mm/readahead.c b/mm/readahead.c
index 2fe72cd29b47..85b15e5a1d7b 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -553,12 +553,6 @@ page_cache_async_readahead(struct address_space *mapping,
 	if (!ra->ra_pages)
 		return;
 
-	/*
-	 * Same bit is used for PG_readahead and PG_reclaim.
-	 */
-	if (PageWriteback(page))
-		return;
-
 	ClearPageReadahead(page);
 
 	/*

From patchwork Fri Feb 14 19:29:51 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11383171
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, Jan Kara, Matthew Wilcox, Josef Bacik, Johannes Weiner, Minchan Kim
Subject: [PATCH v2 2/2] mm: fix long time stall from mm_populate
Date: Fri, 14 Feb 2020 11:29:51 -0800
Message-Id: <20200214192951.29430-2-minchan@kernel.org>
X-Mailer: git-send-email 2.25.0.265.gbab2e86ba0-goog
In-Reply-To: <20200214192951.29430-1-minchan@kernel.org>
References: <20200214192951.29430-1-minchan@kernel.org>

Basically, the fault handler releases mmap_sem before requesting
readahead and is then supposed to retry the page cache lookup with
FAULT_FLAG_TRIED, so that it avoids the livelock of infinite retry.
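
The retry-once protocol described above can be sketched as a small userspace model. Everything here is an assumption made for illustration: the `MODEL_` flag and return values only borrow the kernel's naming, and `struct fault_model` and both functions are hypothetical. The point is that once the retry permission is withdrawn, the handler can no longer bail out, so the caller's loop is bounded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, named after the kernel's but not its definitions. */
enum {
	MODEL_FLAG_ALLOW_RETRY = 1 << 0,
	MODEL_FLAG_TRIED       = 1 << 1,
};

/* Hypothetical return codes of one fault attempt. */
enum { MODEL_FAULT_DONE = 0, MODEL_FAULT_RETRY = 1 };

struct fault_model {
	int attempts;    /* how many times the handler ran */
	bool page_busy;  /* page would make a retryable fault bail out */
};

/* One pass of the handler: it may only bail out when retry is allowed. */
int handle_fault_once(struct fault_model *f, unsigned int flags)
{
	assert(f);
	f->attempts++;
	if (f->page_busy && (flags & MODEL_FLAG_ALLOW_RETRY))
		return MODEL_FAULT_RETRY;  /* lock dropped, caller must retry */
	return MODEL_FAULT_DONE;       /* waits with the lock held if needed */
}

/* Caller side: permit one retry, then mark the fault as already tried. */
int fault_with_single_retry(struct fault_model *f)
{
	unsigned int flags = MODEL_FLAG_ALLOW_RETRY;

	while (handle_fault_once(f, flags) == MODEL_FAULT_RETRY) {
		flags &= ~MODEL_FLAG_ALLOW_RETRY;
		flags |= MODEL_FLAG_TRIED;
	}
	return f->attempts;
}
```

In this model a busy page costs exactly two attempts and an idle page costs one, no matter how long the page stays busy.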

However, what happens if the fault handler finds a page in the page
cache and the page has the readahead marker but is waiting under
writeback? Add one more condition: it happens under mm_populate, which
repeats faulting unless it encounters an error. So let's assemble the
conditions below.

       CPU 1                                         CPU 2

- first loop
    mm_populate
     for ()
       ..
       ret = populate_vma_page_range
         __get_user_pages
           faultin_page
             handle_mm_fault
               filemap_fault
                 do_async_mmap_readahead
                   if (PageReadahead(pageA))
                       maybe_unlock_mmap_for_io
                         up_read(mmap_sem)
                                                     shrink_page_list
                                                       pageout
                                                         SetPageReclaim(=SetPageReadahead)(pageA)
                                                         writepage
                                                           SetPageWriteback(pageA)

                       page_cache_async_readahead()
                         ClearPageReadahead(pageA)
                 do_async_mmap_readahead
                 lock_page_maybe_drop_mmap
                   goto out_retry

                                                     the pageA is reclaimed
                                                     and new pageB is populated to the file offset
                                                     and finally has become PG_readahead

- second loop

         __get_user_pages
           faultin_page
             handle_mm_fault
               filemap_fault
                 do_async_mmap_readahead
                   if (PageReadahead(pageB))
                       maybe_unlock_mmap_for_io
                         up_read(mmap_sem)
                                                     shrink_page_list
                                                       pageout
                                                         SetPageReclaim(=SetPageReadahead)(pageB)
                                                         writepage
                                                           SetPageWriteback(pageB)

                       page_cache_async_readahead()
                         ClearPageReadahead(pageB)
                 do_async_mmap_readahead
                 lock_page_maybe_drop_mmap
                   goto out_retry

It could be repeated forever, so it's a livelock. Even without involving
reclaim, it could happen if ra_pages becomes zero because of fadvise or
other threads sharing the same fd (one doing random access while the
other is sequential), because page_cache_async_readahead has the
following condition check, similar to the PageWriteback one, and
ra_pages is never synchronized with fadvise and
shrink_readahead_size_eio from other threads.

void page_cache_async_readahead(struct address_space *mapping,
                                unsigned long req_size)
{
        /* no read-ahead */
        if (!ra->ra_pages)
                return;

Thus, we need to limit fault retry from mm_populate, like the page fault
handler does.
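
The effect of limiting the retry in the populate loop can be illustrated with a userspace model. This is a sketch under loudly stated assumptions: `struct fake_vma`, `fake_populate()`, and `populate_model()` are hypothetical, and the model simply assumes that a call made without a `locked` pointer cannot bail out and therefore makes progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state behind populate_vma_page_range(). */
struct fake_vma {
	long stalls_left;  /* times a retryable fault would still bail out */
	long calls;        /* how often the populate helper was invoked */
};

/* Returns pages populated; 0 means "lock dropped, nothing done, retry". */
long fake_populate(struct fake_vma *vma, long npages, int *locked)
{
	assert(vma);
	vma->calls++;
	if (locked && vma->stalls_left > 0) {
		vma->stalls_left--;
		*locked = 0;  /* models the fault path releasing mmap_sem */
		return 0;
	}
	/* Nothing in the way, or retry is not permitted and the fault waits. */
	return npages;
}

/* Models the populate loop; limit_retry selects the patched behavior. */
long populate_model(struct fake_vma *vma, long npages, bool limit_retry)
{
	long done = 0;
	int locked = 0;
	bool tried = false;

	while (done < npages) {
		long ret;

		if (!locked)
			locked = 1;  /* models taking mmap_sem again */
		ret = fake_populate(vma, npages - done,
				    (limit_retry && tried) ? NULL : &locked);
		tried = (ret == 0);  /* no progress: forbid retry next time */
		done += ret;
	}
	return vma->calls;
}
```

With the limit, a range that would otherwise bail out a thousand times completes on the second call; without it, the loop runs once per bail-out, which is the unbounded behavior described above when the condition never clears.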

Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Reviewed-by: Jan Kara
Signed-off-by: Minchan Kim
---
 mm/gup.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 1b521e0ac1de..6f6548c63ad5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1133,7 +1133,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
  *
  * This takes care of mlocking the pages too if VM_LOCKED is set.
  *
- * return 0 on success, negative error code on error.
+ * return number of pages pinned on success, negative error code on error.
  *
  * vma->vm_mm->mmap_sem must be held.
  *
@@ -1196,6 +1196,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 	struct vm_area_struct *vma = NULL;
 	int locked = 0;
 	long ret = 0;
+	bool tried = false;
 
 	end = start + len;
 
@@ -1226,14 +1227,18 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 		 * double checks the vma flags, so that it won't mlock pages
 		 * if the vma was already munlocked.
 		 */
-		ret = populate_vma_page_range(vma, nstart, nend, &locked);
+		ret = populate_vma_page_range(vma, nstart, nend,
+						tried ? NULL : &locked);
 		if (ret < 0) {
 			if (ignore_errors) {
 				ret = 0;
 				continue;	/* continue at next VMA */
 			}
 			break;
-		}
+		} else if (ret == 0)
+			tried = true;
+		else
+			tried = false;
 		nend = nstart + ret * PAGE_SIZE;
 		ret = 0;
 	}