From patchwork Wed Oct 23 17:05:04 2019
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 11207247
From: Yang Shi
To: hughd@google.com, aarcange@redhat.com, kirill.shutemov@linux.intel.com,
    gavin.dg@linux.alibaba.com, akpm@linux-foundation.org
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: [v2 PATCH] mm: thp: handle page cache THP correctly in PageTransCompoundMap
Date: Thu, 24 Oct 2019 01:05:04 +0800
Message-Id: <1571850304-82802-1-git-send-email-yang.shi@linux.alibaba.com>

We have a use case that uses tmpfs as the QEMU memory backend, and we
would like to take advantage of THP as well.  But our test shows the
EPT is not PMD mapped even though the underlying THPs are PMD mapped on
the host.  The number shown by /sys/kernel/debug/kvm/largepages is much
less than the number of PMD mapped shmem pages, as below:

7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
Size:            4194304 kB
[snip]
AnonHugePages:         0 kB
ShmemPmdMapped:   579584 kB
[snip]
Locked:                0 kB

cat /sys/kernel/debug/kvm/largepages
12

And some benchmarks do worse than with anonymous THPs.

By digging into the code we figured out that commit 127393fbe597 ("mm:
thp: kvm: fix memory corruption in KVM with THP enabled") checks if
there is a single PTE mapping on the page for anonymous THP when
setting up the EPT map.  But the _mapcount < 0 check doesn't fit page
cache THP, since every subpage of a page cache THP gets its _mapcount
inc'ed once it is PMD mapped, so PageTransCompoundMap() always returns
false for page cache THP.  This prevents KVM from setting up PMD mapped
EPT entries, so we need to handle page cache THP correctly.

However, when a page cache THP's PMD gets split, the kernel just
removes the map instead of setting up a PTE map the way anonymous THP
does.  Before KVM calls get_user_pages() the subpages may get PTE
mapped, even though the page is still a THP, since the page cache THP
may be mapped by other processes in the meantime.  So check the
subpage's _mapcount against the head page's compound_mapcount to tell
whether the THP also has PTE mappings.  Although this may report some
false negatives (PTE mapped by other processes), making the check fully
accurate does not look trivial.

With this fix /sys/kernel/debug/kvm/largepages shows a reasonable
number of pages PMD mapped by EPT, as below:

7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
Size:            4194304 kB
[snip]
AnonHugePages:         0 kB
ShmemPmdMapped:   557056 kB
[snip]
Locked:                0 kB

cat /sys/kernel/debug/kvm/largepages
271

And the benchmarks perform the same as with anonymous THPs.

Signed-off-by: Yang Shi
Reported-by: Gang Deng
Tested-by: Gang Deng
Suggested-by: Hugh Dickins
Cc: Andrea Arcangeli
Cc: Kirill A. Shutemov
Cc: 4.8+
---
v2: Adopted the suggestion from Hugh to use _mapcount and
    compound_mapcount, but open coded compound_mapcount to avoid
    duplicating the definition of compound_mapcount_ptr in two
    different files.  Since "compound_mapcount" looks self-explanatory,
    I suppose the open coding does not hurt readability.
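
For reference, the open coded atomic_read(&head[1].compound_mapcount)
below is meant to match the existing helpers; roughly (quoted from
memory of this kernel era, so treat the exact form and location as
approximate), include/linux/mm.h has:

	/* approximate sketch of the existing helpers, not part of this patch */
	static inline atomic_t *compound_mapcount_ptr(struct page *page)
	{
		return &page[1].compound_mapcount;
	}

	static inline int compound_mapcount(struct page *page)
	{
		VM_BUG_ON_PAGE(!PageCompound(page), page);
		page = compound_head(page);
		return atomic_read(compound_mapcount_ptr(page)) + 1;
	}

Both page->_mapcount and head[1].compound_mapcount are stored with a -1
bias, so the patch compares the two raw atomic_read() values directly
instead of calling compound_mapcount(), which would add the +1 back.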
 include/linux/page-flags.h | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f91cb88..954a877 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -622,12 +622,30 @@ static inline int PageTransCompound(struct page *page)
  *
  * Unlike PageTransCompound, this is safe to be called only while
  * split_huge_pmd() cannot run from under us, like if protected by the
- * MMU notifier, otherwise it may result in page->_mapcount < 0 false
+ * MMU notifier, otherwise it may result in page->_mapcount check false
  * positives.
+ *
+ * We have to treat page cache THP differently since every subpage of it
+ * would get _mapcount inc'ed once it is PMD mapped.  But, it may be PTE
+ * mapped in the current process so comparing subpage's _mapcount to
+ * compound_mapcount to filter out PTE mapped case.
  */
 static inline int PageTransCompoundMap(struct page *page)
 {
-	return PageTransCompound(page) && atomic_read(&page->_mapcount) < 0;
+	struct page *head;
+	int map_count;
+
+	if (!PageTransCompound(page))
+		return 0;
+
+	if (PageAnon(page))
+		return atomic_read(&page->_mapcount) < 0;
+
+	head = compound_head(page);
+	map_count = atomic_read(&page->_mapcount);
+	/* File THP is PMD mapped and not double mapped */
+	return map_count >= 0 &&
+	       map_count == atomic_read(&head[1].compound_mapcount);
 }
 
 /*
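
For context, the consumer this change is aimed at is the KVM MMU's
transparent_hugepage_adjust(), which consults PageTransCompoundMap()
when deciding whether a 4K host fault can be backed by a huge EPT/NPT
mapping.  An illustrative sketch, paraphrased loosely from the
v5.4-era arch/x86/kvm/mmu.c (conditions simplified and elided; the
exact upstream code may differ):

	/* not part of this patch: simplified sketch of the KVM caller */
	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
	    level == PT_PAGE_TABLE_LEVEL &&
	    PageTransCompoundMap(pfn_to_page(pfn))) {
		/*
		 * The pfn belongs to a PMD mapped THP, so back the guest
		 * page with a 2MB mapping: align gfn/pfn down to the huge
		 * page boundary and bump the mapping level.
		 */
		*levelp = PT_DIRECTORY_LEVEL;
	}

With the old _mapcount < 0 test this condition was never true for
shmem/tmpfs backed guest memory, which is why
/sys/kernel/debug/kvm/largepages stayed near zero above.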