From patchwork Fri Sep 4 11:31:13 2020
X-Patchwork-Submitter: Adalbert Lazăr
X-Patchwork-Id: 11756761
From: Adalbert Lazăr <alazar@bitdefender.com>
To: linux-mm@kvack.org
Cc: linux-api@vger.kernel.org, Andrew Morton, Alexander Graf,
	Stefan Hajnoczi, Jerome Glisse, Paolo Bonzini, Mihai Donțu,
	Mircea Cirjaliu, Andy Lutomirski, Arnd Bergmann, Sargun Dhillon,
	Aleksa Sarai, Oleg Nesterov, Jann Horn, Kees Cook,
	Matthew Wilcox, Christian Brauner, Adalbert Lazăr
Subject: [RESEND RFC PATCH 2/5] mm: let the VMA decide how zap_pte_range() acts on mapped pages
Date: Fri, 4 Sep 2020 14:31:13 +0300
Message-Id: <20200904113116.20648-3-alazar@bitdefender.com>
In-Reply-To: <20200904113116.20648-1-alazar@bitdefender.com>
References: <20200904113116.20648-1-alazar@bitdefender.com>

From: Mircea Cirjaliu

Instead of having one big function that handles all cases of page
unmapping, let each VMA type provide its own implementation-defined
callback. In the future, exotic VMA implementations won't have to bloat
the common zapping function with yet another special case of mapping.

Signed-off-by: Mircea Cirjaliu
Signed-off-by: Adalbert Lazăr
---
 include/linux/mm.h |  16 ++++
 mm/memory.c        | 182 +++++++++++++++++++++++++--------------------
 2 files changed, 116 insertions(+), 82 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1be4482a7b81..39e55467aa49 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -36,6 +36,7 @@ struct file_ra_state;
 struct user_struct;
 struct writeback_control;
 struct bdi_writeback;
+struct zap_details;
 
 void init_mm_internals(void);
 
@@ -601,6 +602,14 @@ struct vm_operations_struct {
 	 */
 	struct page *(*find_special_page)(struct vm_area_struct *vma,
						unsigned long addr);
+
+	/*
+	 * Called by zap_pte_range() for use by special VMAs that implement
+	 * custom zapping behavior.
+	 */
+	int (*zap_pte)(struct vm_area_struct *vma, unsigned long addr,
+		       pte_t *pte, int rss[], struct mmu_gather *tlb,
+		       struct zap_details *details);
 };
 
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
@@ -1594,6 +1603,13 @@ static inline bool can_do_mlock(void) { return false; }
 extern int user_shm_lock(size_t, struct user_struct *);
 extern void user_shm_unlock(size_t, struct user_struct *);
 
+/*
+ * Flags returned by zap_pte implementations
+ */
+#define ZAP_PTE_CONTINUE	0
+#define ZAP_PTE_FLUSH		(1 << 0)	/* Ask for TLB flush. */
+#define ZAP_PTE_BREAK		(1 << 1)	/* Break PTE iteration. */
+
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
  */
diff --git a/mm/memory.c b/mm/memory.c
index 8e78fb151f8f..a225bfd01417 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1031,18 +1031,109 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	return ret;
 }
 
+static int zap_pte_common(struct vm_area_struct *vma, unsigned long addr,
+			  pte_t *pte, int rss[], struct mmu_gather *tlb,
+			  struct zap_details *details)
+{
+	struct mm_struct *mm = tlb->mm;
+	pte_t ptent = *pte;
+	swp_entry_t entry;
+	int flags = 0;
+
+	if (pte_present(ptent)) {
+		struct page *page;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (unlikely(details) && page) {
+			/*
+			 * unmap_shared_mapping_pages() wants to
+			 * invalidate cache without truncating:
+			 * unmap shared but keep private pages.
+			 */
+			if (details->check_mapping &&
+			    details->check_mapping != page_rmapping(page))
+				return 0;
+		}
+		ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
+		tlb_remove_tlb_entry(tlb, pte, addr);
+		if (unlikely(!page))
+			return 0;
+
+		if (!PageAnon(page)) {
+			if (pte_dirty(ptent)) {
+				flags |= ZAP_PTE_FLUSH;
+				set_page_dirty(page);
+			}
+			if (pte_young(ptent) &&
+			    likely(!(vma->vm_flags & VM_SEQ_READ)))
+				mark_page_accessed(page);
+		}
+		rss[mm_counter(page)]--;
+		page_remove_rmap(page, false);
+		if (unlikely(page_mapcount(page) < 0))
+			print_bad_pte(vma, addr, ptent, page);
+		if (unlikely(__tlb_remove_page(tlb, page)))
+			flags |= ZAP_PTE_FLUSH | ZAP_PTE_BREAK;
+		return flags;
+	}
+
+	entry = pte_to_swp_entry(ptent);
+	if (non_swap_entry(entry) && is_device_private_entry(entry)) {
+		struct page *page = device_private_entry_to_page(entry);
+
+		if (unlikely(details && details->check_mapping)) {
+			/*
+			 * unmap_shared_mapping_pages() wants to
+			 * invalidate cache without truncating:
+			 * unmap shared but keep private pages.
+			 */
+			if (details->check_mapping != page_rmapping(page))
+				return 0;
+		}
+
+		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+		rss[mm_counter(page)]--;
+		page_remove_rmap(page, false);
+		put_page(page);
+		return 0;
+	}
+
+	/* If details->check_mapping, we leave swap entries. */
+	if (unlikely(details))
+		return 0;
+
+	if (!non_swap_entry(entry))
+		rss[MM_SWAPENTS]--;
+	else if (is_migration_entry(entry)) {
+		struct page *page;
+
+		page = migration_entry_to_page(entry);
+		rss[mm_counter(page)]--;
+	}
+	if (unlikely(!free_swap_and_cache(entry)))
+		print_bad_pte(vma, addr, ptent, NULL);
+	pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+
+	return flags;
+}
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
 				struct zap_details *details)
 {
 	struct mm_struct *mm = tlb->mm;
-	int force_flush = 0;
+	int flags = 0;
 	int rss[NR_MM_COUNTERS];
 	spinlock_t *ptl;
 	pte_t *start_pte;
 	pte_t *pte;
-	swp_entry_t entry;
+
+	int (*zap_pte)(struct vm_area_struct *vma, unsigned long addr,
+		       pte_t *pte, int rss[], struct mmu_gather *tlb,
+		       struct zap_details *details) = zap_pte_common;
+	if (vma->vm_ops && vma->vm_ops->zap_pte)
+		zap_pte = vma->vm_ops->zap_pte;
 
 	tlb_change_page_size(tlb, PAGE_SIZE);
 again:
@@ -1058,92 +1149,19 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 
 		if (!zap_is_atomic(details) && need_resched())
 			break;
-
-		if (pte_present(ptent)) {
-			struct page *page;
-
-			page = vm_normal_page(vma, addr, ptent);
-			if (unlikely(details) && page) {
-				/*
-				 * unmap_shared_mapping_pages() wants to
-				 * invalidate cache without truncating:
-				 * unmap shared but keep private pages.
-				 */
-				if (details->check_mapping &&
-				    details->check_mapping != page_rmapping(page))
-					continue;
-			}
-			ptent = ptep_get_and_clear_full(mm, addr, pte,
-							tlb->fullmm);
-			tlb_remove_tlb_entry(tlb, pte, addr);
-			if (unlikely(!page))
-				continue;
-
-			if (!PageAnon(page)) {
-				if (pte_dirty(ptent)) {
-					force_flush = 1;
-					set_page_dirty(page);
-				}
-				if (pte_young(ptent) &&
-				    likely(!(vma->vm_flags & VM_SEQ_READ)))
-					mark_page_accessed(page);
-			}
-			rss[mm_counter(page)]--;
-			page_remove_rmap(page, false);
-			if (unlikely(page_mapcount(page) < 0))
-				print_bad_pte(vma, addr, ptent, page);
-			if (unlikely(__tlb_remove_page(tlb, page))) {
-				force_flush = 1;
-				addr += PAGE_SIZE;
-				break;
-			}
-			continue;
-		}
-
-		entry = pte_to_swp_entry(ptent);
-		if (non_swap_entry(entry) && is_device_private_entry(entry)) {
-			struct page *page = device_private_entry_to_page(entry);
-
-			if (unlikely(details && details->check_mapping)) {
-				/*
-				 * unmap_shared_mapping_pages() wants to
-				 * invalidate cache without truncating:
-				 * unmap shared but keep private pages.
-				 */
-				if (details->check_mapping !=
-						page_rmapping(page))
-					continue;
-			}
-
-			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
-			rss[mm_counter(page)]--;
-			page_remove_rmap(page, false);
-			put_page(page);
-			continue;
+		if (flags & ZAP_PTE_BREAK) {
+			flags &= ~ZAP_PTE_BREAK;
+			break;
 		}
 
-		/* If details->check_mapping, we leave swap entries. */
-		if (unlikely(details))
-			continue;
-
-		if (!non_swap_entry(entry))
-			rss[MM_SWAPENTS]--;
-		else if (is_migration_entry(entry)) {
-			struct page *page;
-
-			page = migration_entry_to_page(entry);
-			rss[mm_counter(page)]--;
-		}
-		if (unlikely(!free_swap_and_cache(entry)))
-			print_bad_pte(vma, addr, ptent, NULL);
-		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+		flags |= zap_pte(vma, addr, pte, rss, tlb, details);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
 	add_mm_rss_vec(mm, rss);
 	arch_leave_lazy_mmu_mode();
 
 	/* Do the actual TLB flush before dropping ptl */
-	if (force_flush)
+	if (flags & ZAP_PTE_FLUSH)
 		tlb_flush_mmu_tlbonly(tlb);
 	pte_unmap_unlock(start_pte, ptl);
 
@@ -1153,8 +1171,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	 * entries before releasing the ptl), free the batched
 	 * memory too. Restart if we didn't do everything.
 	 */
-	if (force_flush) {
-		force_flush = 0;
+	if (flags & ZAP_PTE_FLUSH) {
+		flags &= ~ZAP_PTE_FLUSH;
 		tlb_flush_mmu(tlb);
 	}
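
To illustrate how the new hook is meant to be consumed, here is a minimal
sketch of a special VMA wiring up .zap_pte. It is not part of the patch: the
myvma_* names are hypothetical, and the sketch assumes a driver whose VMA only
ever installs present PTEs over pages the driver manages itself, so the
callback merely clears the entry, asks for a TLB flush, and skips the rss/rmap
accounting that zap_pte_common() performs for ordinary pages.

/* Hypothetical example only: not part of this patch. */
static int myvma_zap_pte(struct vm_area_struct *vma, unsigned long addr,
			 pte_t *pte, int rss[], struct mmu_gather *tlb,
			 struct zap_details *details)
{
	pte_t ptent = *pte;

	/* This VMA only installs present PTEs; leave anything else alone. */
	if (!pte_present(ptent))
		return ZAP_PTE_CONTINUE;

	/* Clear the entry and queue a TLB invalidation for this address. */
	ptep_get_and_clear_full(tlb->mm, addr, pte, tlb->fullmm);
	tlb_remove_tlb_entry(tlb, pte, addr);

	/* Pages are driver-owned: no rss or rmap bookkeeping to undo. */
	return ZAP_PTE_FLUSH;
}

static const struct vm_operations_struct myvma_vm_ops = {
	.zap_pte	= myvma_zap_pte,
};

Because such a driver does not feed its pages into the mmu_gather batch via
__tlb_remove_page(), returning ZAP_PTE_FLUSH keeps the TLB invalidation
ordered before the page-table lock is dropped, the same guarantee
zap_pte_range() provides when zap_pte_common() sets the flag for dirty file
pages. A callback that does batch pages would also return ZAP_PTE_BREAK once
__tlb_remove_page() reports a full batch, mirroring the common implementation.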