From patchwork Fri Aug 26 06:45:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muhammad Usama Anjum X-Patchwork-Id: 12955587 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B4D0AECAAD2 for ; Fri, 26 Aug 2022 06:46:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245128AbiHZGqT (ORCPT ); Fri, 26 Aug 2022 02:46:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49822 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245096AbiHZGqE (ORCPT ); Fri, 26 Aug 2022 02:46:04 -0400 Received: from madras.collabora.co.uk (madras.collabora.co.uk [IPv6:2a00:1098:0:82:1000:25:2eeb:e5ab]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A4C4AD1E0C; Thu, 25 Aug 2022 23:45:59 -0700 (PDT) Received: from lenovo.Home (unknown [39.53.61.43]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id 183626601EC7; Fri, 26 Aug 2022 07:45:54 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1661496358; bh=BJ+PAJ4nk2Sl0y5d43VYaapFmr5r01iee9mww5IRQaA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=epiXSowL2JFvY9+pSjuspUHsOPiZbsNKk4hciCfqcXEFiMVTjoPWAOqGHM1y9UW3H BlFDymydMLlOyu10hibgXlGQq04fMXtkJ2M+x9knz1Ghs9b4FVwQNsHQte1eN70eSs 20iFfSDEiHiAg7S0dh4+YHvddBHSs+0C7xK448/p3Twx6iNfpf6Gm26dSZAaDraPVN 2liwPrICiKOF27w5aNxLD3FKXF7Af6u6d8iTV7qCRV4vHpcobJvEWIIzUFlESIFuml hcg7/yw6efkLVY0FGCxGLf/EBfSAPIyCoUtrQ5+UX7HItqHlRfOO86/+4/6hlVwpA6 Cnt4cMePaX11g== From: Muhammad Usama Anjum To: Jonathan Corbet , Alexander Viro , Andrew Morton , Shuah Khan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list), linux-fsdevel@vger.kernel.org (open list:PROC FILESYSTEM), linux-mm@kvack.org (open list:MEMORY MANAGEMENT), linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Cc: Muhammad Usama Anjum , kernel@collabora.com, Gabriel Krisman Bertazi , David Hildenbrand , Peter Enderborg , Greg KH Subject: [PATCH v3 1/4] fs/proc/task_mmu: update functions to clear the soft-dirty PTE bit Date: Fri, 26 Aug 2022 11:45:32 +0500 Message-Id: <20220826064535.1941190-2-usama.anjum@collabora.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220826064535.1941190-1-usama.anjum@collabora.com> References: <20220826064535.1941190-1-usama.anjum@collabora.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Update the clear_soft_dirty() and clear_soft_dirty_pmd() to optionally clear and return the status if page is dirty. Signed-off-by: Muhammad Usama Anjum --- Changes in v2: - Move back the functions back to their original file --- fs/proc/task_mmu.c | 82 ++++++++++++++++++++++++++++------------------ 1 file changed, 51 insertions(+), 31 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 8b4f3073f8f5..f66674033207 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1095,8 +1095,8 @@ static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr, return page_maybe_dma_pinned(page); } -static inline void clear_soft_dirty(struct vm_area_struct *vma, - unsigned long addr, pte_t *pte) +static inline bool check_soft_dirty(struct vm_area_struct *vma, + unsigned long addr, pte_t *pte, bool clear) { /* * The soft-dirty tracker uses #PF-s to catch writes @@ -1105,55 +1105,75 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma, * of how soft-dirty works. */ pte_t ptent = *pte; + int dirty = 0; if (pte_present(ptent)) { pte_t old_pte; - if (pte_is_pinned(vma, addr, ptent)) - return; - old_pte = ptep_modify_prot_start(vma, addr, pte); - ptent = pte_wrprotect(old_pte); - ptent = pte_clear_soft_dirty(ptent); - ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent); + dirty = pte_soft_dirty(ptent); + + if (dirty && clear && !pte_is_pinned(vma, addr, ptent)) { + old_pte = ptep_modify_prot_start(vma, addr, pte); + ptent = pte_wrprotect(old_pte); + ptent = pte_clear_soft_dirty(ptent); + ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent); + } } else if (is_swap_pte(ptent)) { - ptent = pte_swp_clear_soft_dirty(ptent); - set_pte_at(vma->vm_mm, addr, pte, ptent); + dirty = pte_swp_soft_dirty(ptent); + + if (dirty && clear) { + ptent = pte_swp_clear_soft_dirty(ptent); + set_pte_at(vma->vm_mm, addr, pte, ptent); + } } + + return !!dirty; } #else -static inline void clear_soft_dirty(struct vm_area_struct *vma, - unsigned long addr, pte_t *pte) +static inline bool check_soft_dirty(struct vm_area_struct *vma, + unsigned long addr, pte_t *pte, bool clear) { + return false; } #endif #if defined(CONFIG_MEM_SOFT_DIRTY) && defined(CONFIG_TRANSPARENT_HUGEPAGE) -static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma, - unsigned long addr, pmd_t *pmdp) +static inline bool check_soft_dirty_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmdp, bool clear) { pmd_t old, pmd = *pmdp; + int dirty = 0; if (pmd_present(pmd)) { - /* See comment in change_huge_pmd() */ - old = pmdp_invalidate(vma, addr, pmdp); - if (pmd_dirty(old)) - pmd = pmd_mkdirty(pmd); - if (pmd_young(old)) - pmd = pmd_mkyoung(pmd); - - pmd = pmd_wrprotect(pmd); - pmd = pmd_clear_soft_dirty(pmd); - - set_pmd_at(vma->vm_mm, addr, pmdp, pmd); + dirty = pmd_soft_dirty(pmd); + if (dirty && clear) { + /* See comment in change_huge_pmd() */ + old = pmdp_invalidate(vma, addr, pmdp); + if (pmd_dirty(old)) + pmd = pmd_mkdirty(pmd); + if (pmd_young(old)) + pmd = pmd_mkyoung(pmd); + + pmd = pmd_wrprotect(pmd); + pmd = pmd_clear_soft_dirty(pmd); + + set_pmd_at(vma->vm_mm, addr, pmdp, pmd); + } } else if (is_migration_entry(pmd_to_swp_entry(pmd))) { - pmd = pmd_swp_clear_soft_dirty(pmd); - set_pmd_at(vma->vm_mm, addr, pmdp, pmd); + dirty = pmd_swp_soft_dirty(pmd); + + if (dirty && clear) { + pmd = pmd_swp_clear_soft_dirty(pmd); + set_pmd_at(vma->vm_mm, addr, pmdp, pmd); + } } + return !!dirty; } #else -static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma, - unsigned long addr, pmd_t *pmdp) +static inline bool check_soft_dirty_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmdp, bool clear) { + return false; } #endif @@ -1169,7 +1189,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr, ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { if (cp->type == CLEAR_REFS_SOFT_DIRTY) { - clear_soft_dirty_pmd(vma, addr, pmd); + check_soft_dirty_pmd(vma, addr, pmd, true); goto out; } @@ -1195,7 +1215,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr, ptent = *pte; if (cp->type == CLEAR_REFS_SOFT_DIRTY) { - clear_soft_dirty(vma, addr, pte); + check_soft_dirty(vma, addr, pte, true); continue; } From patchwork Fri Aug 26 06:45:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muhammad Usama Anjum X-Patchwork-Id: 12955589 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED3EAECAAD8 for ; Fri, 26 Aug 2022 06:46:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245174AbiHZGqX (ORCPT ); Fri, 26 Aug 2022 02:46:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49824 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245108AbiHZGqR (ORCPT ); Fri, 26 Aug 2022 02:46:17 -0400 Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4B3A1D1E2D; Thu, 25 Aug 2022 23:46:03 -0700 (PDT) Received: from lenovo.Home (unknown [39.53.61.43]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id CF8946601EC8; Fri, 26 Aug 2022 07:45:58 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1661496362; bh=1jHw5sa9fTu8NOc1dlqFzISRY1ZlcDb3kYV3xpKJeTk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=XfGOdhKFwYd0TiLxAexIHGbGSvw0lmUiYdoKcDv2Ecga3r6OM2J0nZ+xn8dRa9o6H /NDzorWLIIy0X8arNr1NM3PIqLhghL14q7iJouwp0AMDvqr6aV8iN8UzX2H+0XvPhx by0zS13MJKFrMhmuelxfwedd56Rf+BNi/AtbYqsJb1cKpeHJJ9LwKhJmRNijjOZM5P 3x6LrWPI6EPfnPy4bBuxyCcsJVH17Hg7QHcxB8bPqOYFcdct47ysc0/Euj4pAZ71IU Um744KAE7cSeuGbrftvVPQCAhpvhRv9AqHeHbfQaMHe+RptUAzwMKjr9yOjV/g70Sm 2VY2kuxPXtB9w== From: Muhammad Usama Anjum To: Jonathan Corbet , Alexander Viro , Andrew Morton , Shuah Khan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list), linux-fsdevel@vger.kernel.org (open list:PROC FILESYSTEM), linux-mm@kvack.org (open list:MEMORY MANAGEMENT), linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Cc: Muhammad Usama Anjum , kernel@collabora.com, Gabriel Krisman Bertazi , David Hildenbrand , Peter Enderborg , Greg KH Subject: [PATCH v3 2/4] fs/proc/task_mmu: Implement IOCTL to get and clear soft dirty PTE bit Date: Fri, 26 Aug 2022 11:45:33 +0500 Message-Id: <20220826064535.1941190-3-usama.anjum@collabora.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220826064535.1941190-1-usama.anjum@collabora.com> References: <20220826064535.1941190-1-usama.anjum@collabora.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This ioctl can be used to watch the process's memory and perform atomic operations which aren't possible through procfs. Three operations have been implemented: - PAGEMAP_SD_GET gets the soft dirty pages in a address range. - PAGEMAP_SD_CLEAR clears the soft dirty bit from dirty pages in a address range. - PAGEMAP_SD_GET_AND_CLEAR gets and clears the soft dirty bit in a address range. struct pagemap_sd_args is used as the argument of the IOCTL. In this struct: - The range is specified through start and len. - The output buffer and size is specified as vec and vec_len. - The flags can be specified in the flags field. Currently only one PAGEMAP_SD_NO_REUSED_REGIONS is supported which can be specified to ignore the VMA dirty flags. This is based on a patch from Gabriel Krisman Bertazi. Signed-off-by: Muhammad Usama Anjum --- Changes in v3: - Tighten the user-kernel interface by using explicit types and add more error checking Changes in v2: - Convert the interface from syscall to ioctl - Remove pidfd support as it doesn't make sense in ioctl --- fs/proc/task_mmu.c | 260 ++++++++++++++++++++++++++++++++++ include/uapi/linux/fs.h | 23 +++ tools/include/uapi/linux/fs.h | 23 +++ 3 files changed, 306 insertions(+) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index f66674033207..33d3d5c2ab40 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -19,6 +19,8 @@ #include #include #include +#include +#include #include #include @@ -1775,11 +1777,269 @@ static int pagemap_release(struct inode *inode, struct file *file) return 0; } +#ifdef CONFIG_MEM_SOFT_DIRTY +#define IS_CLEAR_SD_OP(op) (op == PAGEMAP_SD_CLEAR || op == PAGEMAP_SD_GET_AND_CLEAR) +#define IS_GET_SD_OP(op) (op == PAGEMAP_SD_GET || op == PAGEMAP_SD_GET_AND_CLEAR) +#define PAGEMAP_SD_FLAGS_MASK (PAGEMAP_SD_NO_REUSED_REGIONS) + +struct pagemap_sd_private { + unsigned long start; + __u64 *vec; + unsigned long vec_len; + unsigned long index; + unsigned int op; + unsigned int flags; +}; + +static int pagemap_sd_pmd_entry(pmd_t *pmd, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + struct pagemap_sd_private *p = walk->private; + struct vm_area_struct *vma = walk->vma; + unsigned long start = addr; + spinlock_t *ptl; + pte_t *pte; + int dirty; + bool dirty_vma = (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS) ? 0 : + (vma->vm_flags & VM_SOFTDIRTY); + + end = min(end, walk->vma->vm_end); + ptl = pmd_trans_huge_lock(pmd, vma); + if (ptl) { + if (dirty_vma || check_soft_dirty_pmd(vma, addr, pmd, false)) { + /* + * Break huge page into small pages if operation needs to be performed is + * on a portion of the huge page or the return buffer cannot store complete + * data. Then process this PMD as having normal pages. + */ + if ((IS_CLEAR_SD_OP(p->op) && (end - addr < HPAGE_SIZE)) || + (IS_GET_SD_OP(p->op) && (p->index + HPAGE_SIZE/PAGE_SIZE > p->vec_len))) { + spin_unlock(ptl); + split_huge_pmd(vma, pmd, addr); + goto process_smaller_pages; + } else { + dirty = check_soft_dirty_pmd(vma, addr, pmd, IS_CLEAR_SD_OP(p->op)); + if (IS_GET_SD_OP(p->op) && (dirty_vma || dirty)) { + for (; addr != end && p->index < p->vec_len; + addr += PAGE_SIZE) + p->vec[p->index++] = addr - p->start; + } + } + } + spin_unlock(ptl); + return 0; + } + +process_smaller_pages: + if (pmd_trans_unstable(pmd)) + return 0; + + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + for (; addr != end; pte++, addr += PAGE_SIZE) { + dirty = check_soft_dirty(vma, addr, pte, IS_CLEAR_SD_OP(p->op)); + + if (IS_GET_SD_OP(p->op) && (dirty_vma || dirty)) { + p->vec[p->index++] = addr - p->start; + WARN_ON(p->index > p->vec_len); + } + } + pte_unmap_unlock(pte - 1, ptl); + cond_resched(); + + if (IS_CLEAR_SD_OP(p->op)) + flush_tlb_mm_range(vma->vm_mm, start, end, PAGE_SHIFT, false); + + return 0; +} + +static int pagemap_sd_pte_hole(unsigned long addr, unsigned long end, int depth, + struct mm_walk *walk) +{ + struct pagemap_sd_private *p = walk->private; + struct vm_area_struct *vma = walk->vma; + + if (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS) + return 0; + + if (vma && (vma->vm_flags & VM_SOFTDIRTY) && IS_GET_SD_OP(p->op)) { + for (; addr != end && p->index < p->vec_len; addr += PAGE_SIZE) + p->vec[p->index++] = addr - p->start; + } + + return 0; +} + +static int pagemap_sd_pre_vma(unsigned long start, unsigned long end, struct mm_walk *walk) +{ + struct pagemap_sd_private *p = walk->private; + struct vm_area_struct *vma = walk->vma; + int ret; + unsigned long end_cut = end; + + if (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS) + return 0; + + if (IS_CLEAR_SD_OP(p->op) && (vma->vm_flags & VM_SOFTDIRTY)) { + if (vma->vm_start < start) { + ret = split_vma(vma->vm_mm, vma, start, 1); + if (ret) + return ret; + } + + if (IS_GET_SD_OP(p->op)) + end_cut = min(start + p->vec_len * PAGE_SIZE, end); + + if (vma->vm_end > end_cut) { + ret = split_vma(vma->vm_mm, vma, end_cut, 0); + if (ret) + return ret; + } + } + + return 0; +} + +static void pagemap_sd_post_vma(struct mm_walk *walk) +{ + struct pagemap_sd_private *p = walk->private; + struct vm_area_struct *vma = walk->vma; + + if (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS) + return; + + if (IS_CLEAR_SD_OP(p->op) && (vma->vm_flags & VM_SOFTDIRTY)) { + vma->vm_flags &= ~VM_SOFTDIRTY; + vma_set_page_prot(vma); + } +} + +static int pagemap_sd_pmd_test_walk(unsigned long start, unsigned long end, + struct mm_walk *walk) +{ + struct pagemap_sd_private *p = walk->private; + struct vm_area_struct *vma = walk->vma; + + if (IS_GET_SD_OP(p->op) && (p->index == p->vec_len)) + return -1; + + if (vma->vm_flags & VM_PFNMAP) + return 1; + + return 0; +} + +static const struct mm_walk_ops pagemap_sd_ops = { + .test_walk = pagemap_sd_pmd_test_walk, + .pre_vma = pagemap_sd_pre_vma, + .pmd_entry = pagemap_sd_pmd_entry, + .pte_hole = pagemap_sd_pte_hole, + .post_vma = pagemap_sd_post_vma, +}; + +static long do_pagemap_sd_cmd(struct mm_struct *mm, unsigned int cmd, struct pagemap_sd_args *arg) +{ + struct pagemap_sd_private sd_data; + struct mmu_notifier_range range; + unsigned long start, end; + int ret; + + start = (unsigned long)untagged_addr(arg->start); + if ((!IS_ALIGNED(start, PAGE_SIZE)) || (!access_ok((void __user *)start, arg->len))) + return -EINVAL; + + if (IS_GET_SD_OP(cmd) && + ((arg->vec_len == 0) || (!arg->vec) || (!access_ok((loff_t *)arg->vec, arg->vec_len)))) + return -EINVAL; + + if ((arg->flags & ~PAGEMAP_SD_FLAGS_MASK) || (arg->__reserved)) + return -EINVAL; + + end = start + arg->len; + sd_data.start = start; + sd_data.op = cmd; + sd_data.flags = arg->flags; + sd_data.index = 0; + sd_data.vec_len = arg->vec_len; + + if (IS_GET_SD_OP(cmd)) { + sd_data.vec = vzalloc(arg->vec_len * sizeof(loff_t)); + if (!sd_data.vec) + return -ENOMEM; + } + + if (IS_CLEAR_SD_OP(cmd)) { + mmap_write_lock(mm); + + mmu_notifier_range_init(&range, MMU_NOTIFY_SOFT_DIRTY, 0, NULL, + mm, start, end); + mmu_notifier_invalidate_range_start(&range); + inc_tlb_flush_pending(mm); + } else { + mmap_read_lock(mm); + } + + ret = walk_page_range(mm, start, end, &pagemap_sd_ops, &sd_data); + + if (IS_CLEAR_SD_OP(cmd)) { + mmu_notifier_invalidate_range_end(&range); + dec_tlb_flush_pending(mm); + + mmap_write_unlock(mm); + } else { + mmap_read_unlock(mm); + } + + if (ret < 0) + goto free_sd_data; + + if (IS_GET_SD_OP(cmd)) { + ret = copy_to_user((loff_t *)arg->vec, sd_data.vec, sd_data.index * sizeof(loff_t)); + if (ret) { + ret = -EIO; + goto free_sd_data; + } + ret = sd_data.index; + } else { + ret = 0; + } + +free_sd_data: + if (IS_GET_SD_OP(cmd)) + vfree(sd_data.vec); + + return ret; +} + +static long pagemap_sd_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct pagemap_sd_args __user *uarg = (struct pagemap_sd_args __user *)arg; + struct mm_struct *mm = file->private_data; + struct pagemap_sd_args arguments; + + switch (cmd) { + case PAGEMAP_SD_GET: + fallthrough; + case PAGEMAP_SD_CLEAR: + fallthrough; + case PAGEMAP_SD_GET_AND_CLEAR: + if (copy_from_user(&arguments, uarg, sizeof(struct pagemap_sd_args))) + return -EFAULT; + return do_pagemap_sd_cmd(mm, cmd, &arguments); + default: + return -EINVAL; + } +} +#endif /* CONFIG_MEM_SOFT_DIRTY */ + const struct file_operations proc_pagemap_operations = { .llseek = mem_lseek, /* borrow this */ .read = pagemap_read, .open = pagemap_open, .release = pagemap_release, +#ifdef CONFIG_MEM_SOFT_DIRTY + .unlocked_ioctl = pagemap_sd_ioctl, + .compat_ioctl = pagemap_sd_ioctl, +#endif /* CONFIG_MEM_SOFT_DIRTY */ }; #endif /* CONFIG_PROC_PAGE_MONITOR */ diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index b7b56871029c..4f6d1c0ae524 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -305,4 +305,27 @@ typedef int __bitwise __kernel_rwf_t; #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\ RWF_APPEND) +/** + * struct pagemap_sd_args - Soft-dirty IOCTL argument + * @start: Starting address + * @len: Length of the region + * @vec: Output buffer address + * @vec_len: Length of the output buffer + * @flags: Special flags for the IOCTL + * @__reserved: Reserved member to preserve data alignment. Should be 0. + */ +struct pagemap_sd_args { + __u64 __user start; + __u64 len; + __u64 __user vec; + __u64 vec_len; + __u32 flags; + __u32 __reserved; +}; + +#define PAGEMAP_SD_GET _IOWR('f', 16, struct pagemap_sd_args) +#define PAGEMAP_SD_CLEAR _IOWR('f', 17, struct pagemap_sd_args) +#define PAGEMAP_SD_GET_AND_CLEAR _IOWR('f', 18, struct pagemap_sd_args) +#define PAGEMAP_SD_NO_REUSED_REGIONS 0x1 + #endif /* _UAPI_LINUX_FS_H */ diff --git a/tools/include/uapi/linux/fs.h b/tools/include/uapi/linux/fs.h index b7b56871029c..4f6d1c0ae524 100644 --- a/tools/include/uapi/linux/fs.h +++ b/tools/include/uapi/linux/fs.h @@ -305,4 +305,27 @@ typedef int __bitwise __kernel_rwf_t; #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\ RWF_APPEND) +/** + * struct pagemap_sd_args - Soft-dirty IOCTL argument + * @start: Starting address + * @len: Length of the region + * @vec: Output buffer address + * @vec_len: Length of the output buffer + * @flags: Special flags for the IOCTL + * @__reserved: Reserved member to preserve data alignment. Should be 0. + */ +struct pagemap_sd_args { + __u64 __user start; + __u64 len; + __u64 __user vec; + __u64 vec_len; + __u32 flags; + __u32 __reserved; +}; + +#define PAGEMAP_SD_GET _IOWR('f', 16, struct pagemap_sd_args) +#define PAGEMAP_SD_CLEAR _IOWR('f', 17, struct pagemap_sd_args) +#define PAGEMAP_SD_GET_AND_CLEAR _IOWR('f', 18, struct pagemap_sd_args) +#define PAGEMAP_SD_NO_REUSED_REGIONS 0x1 + #endif /* _UAPI_LINUX_FS_H */ From patchwork Fri Aug 26 06:45:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muhammad Usama Anjum X-Patchwork-Id: 12955588 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2116FECAAD9 for ; Fri, 26 Aug 2022 06:46:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245168AbiHZGqW (ORCPT ); Fri, 26 Aug 2022 02:46:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50186 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245114AbiHZGqS (ORCPT ); Fri, 26 Aug 2022 02:46:18 -0400 Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF607D21C4; Thu, 25 Aug 2022 23:46:06 -0700 (PDT) Received: from lenovo.Home (unknown [39.53.61.43]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id 74A016601EC5; Fri, 26 Aug 2022 07:46:02 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1661496365; bh=4yvSwL7fk15687RDM21em8u33dCjiZyepG9X3yowxF0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=UJ731p7BYeJ6Kai5kB5cooRilHpcsGymeCDygp3UhG784Aj8+XlOuZ0ywcanrGW58 KAAItzV1w0jVDmEJ4qkIdkAMLgLpdWgdzeU7z25o52qyr0YtTBXqj1j6Cu5HNqcXJL D9QOHuCaMGRFf/Flp8NDns2y2P5vl+ch+sAN4jl4RusqNKrYPYdChbByghxrdTJZ0y bjIZbUDV/oQoeWGfodRcx+8NyylssXy5ePh9XNoigZ+7mXeZQqZ7ktSjczCSvcRcdW Hwq55NzJn+3OOchn6wGxJ5jIan7t46QeVD8k4DrwBO9LyZYblSaxysW87cRoWYytiW O20FAxj2AzyMw== From: Muhammad Usama Anjum To: Jonathan Corbet , Alexander Viro , Andrew Morton , Shuah Khan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list), linux-fsdevel@vger.kernel.org (open list:PROC FILESYSTEM), linux-mm@kvack.org (open list:MEMORY MANAGEMENT), linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Cc: Muhammad Usama Anjum , kernel@collabora.com, Gabriel Krisman Bertazi , David Hildenbrand , Peter Enderborg , Greg KH Subject: [PATCH v3 3/4] selftests: vm: add pagemap ioctl tests Date: Fri, 26 Aug 2022 11:45:34 +0500 Message-Id: <20220826064535.1941190-4-usama.anjum@collabora.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220826064535.1941190-1-usama.anjum@collabora.com> References: <20220826064535.1941190-1-usama.anjum@collabora.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add pagemap ioctl tests. Add several different types of tests to judge the correction of the interface. Signed-off-by: Muhammad Usama Anjum --- TAP version 13 1..44 ok 1 sanity_tests no cmd specified ok 2 sanity_tests wrong flag specified ok 3 sanity_tests reserved field specified specified ok 4 sanity_tests wrong cmd specified ok 5 sanity_tests mixture of correct and wrong cmds ok 6 sanity_tests Clear area with larger vec size 1 22 ok 7 Page testing: all new pages must be soft dirty ok 8 Page testing: all pages must not be soft dirty ok 9 Page testing: all pages dirty other than first and the last one ok 10 Page testing: only middle page dirty ok 11 Page testing: only two middle pages dirty ok 12 Page testing: only get 2 dirty pages and clear them as well ok 13 Page testing: Range clear only ok 14 Large Page testing: all new pages must be soft dirty ok 15 Large Page testing: all pages must not be soft dirty ok 16 Large Page testing: all pages dirty other than first and the last one ok 17 Large Page testing: only middle page dirty ok 18 Large Page testing: only two middle pages dirty ok 19 Large Page testing: only get 2 dirty pages and clear them as well ok 20 Large Page testing: Range clear only ok 21 Huge page testing: all new pages must be soft dirty ok 22 Huge page testing: all pages must not be soft dirty ok 23 Huge page testing: all pages dirty other than first and the last one ok 24 Huge page testing: only middle page dirty ok 25 Huge page testing: only two middle pages dirty ok 26 Huge page testing: only get 2 dirty pages and clear them as well ok 27 Huge page testing: Range clear only ok 28 Performance Page testing: page isn't dirty ok 29 Performance Page testing: all pages must not be soft dirty ok 30 Performance Page testing: all pages dirty other than first and the last one ok 31 Performance Page testing: only middle page dirty ok 32 Performance Page testing: only two middle pages dirty ok 33 Performance Page testing: only get 2 dirty pages and clear them as well ok 34 Performance Page testing: Range clear only ok 35 hpage_unit_tests all new huge page must be dirty ok 36 hpage_unit_tests all the huge page must not be dirty ok 37 hpage_unit_tests all the huge page must be dirty and clear ok 38 hpage_unit_tests only middle page dirty ok 39 hpage_unit_tests clear first half of huge page ok 40 hpage_unit_tests clear first half of huge page with limited buffer ok 41 hpage_unit_tests clear second half huge page ok 42 unmapped_region_tests Get dirty pages ok 43 unmapped_region_tests Get dirty pages ok 44 Test test_simple # Totals: pass:44 fail:0 xfail:0 xpass:0 skip:0 error:0 Changes in v3: - Add another test to do sanity of flags Changes in v2: - Update the tests to use the ioctl interface instead of syscall --- tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 2 + tools/testing/selftests/vm/pagemap_ioctl.c | 649 +++++++++++++++++++++ 3 files changed, 652 insertions(+) create mode 100644 tools/testing/selftests/vm/pagemap_ioctl.c diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore index 31e5eea2a9b9..334fde556499 100644 --- a/tools/testing/selftests/vm/.gitignore +++ b/tools/testing/selftests/vm/.gitignore @@ -16,6 +16,7 @@ mremap_dontunmap mremap_test on-fault-limit transhuge-stress +pagemap_ioctl protection_keys protection_keys_32 protection_keys_64 diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 266e965e724c..4296c3268f64 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -51,6 +51,7 @@ TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd +TEST_GEN_PROGS += pagemap_ioctl TEST_GEN_PROGS += soft-dirty TEST_GEN_PROGS += split_huge_page_test TEST_GEN_FILES += ksm_tests @@ -98,6 +99,7 @@ TEST_FILES += va_128TBswitch.sh include ../lib.mk $(OUTPUT)/madv_populate: vm_util.c +$(OUTPUT)/pagemap_ioctl: vm_util.c $(OUTPUT)/soft-dirty: vm_util.c $(OUTPUT)/split_huge_page_test: vm_util.c diff --git a/tools/testing/selftests/vm/pagemap_ioctl.c b/tools/testing/selftests/vm/pagemap_ioctl.c new file mode 100644 index 000000000000..7775247b9cdc --- /dev/null +++ b/tools/testing/selftests/vm/pagemap_ioctl.c @@ -0,0 +1,649 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include "vm_util.h" +#include "../kselftest.h" +#include +#include +#include + +#define TEST_ITERATIONS 10000 +#define PAGEMAP "/proc/self/pagemap" +int pagemap_fd; + +static long pagemap_ioctl(void *start, int len, unsigned int cmd, loff_t *vec, + int vec_len, int flag) +{ + struct pagemap_sd_args arg; + int ret; + + arg.start = (uintptr_t)start; + arg.len = len; + arg.vec = (uintptr_t)vec; + arg.vec_len = vec_len; + arg.flags = flag; + arg.__reserved = 0; + + ret = ioctl(pagemap_fd, cmd, &arg); + + return ret; +} + +static long pagemap_ioctl_res(void *start, int len, unsigned int cmd, loff_t *vec, + int vec_len, int flag, int res) +{ + struct pagemap_sd_args arg; + int ret; + + arg.start = (uintptr_t)start; + arg.len = len; + arg.vec = (uintptr_t)vec; + arg.vec_len = vec_len; + arg.flags = flag; + arg.__reserved = res; + + ret = ioctl(pagemap_fd, cmd, &arg); + + return ret; +} + +int sanity_tests(int page_size) +{ + char *mem; + int mem_size, vec_size, ret; + loff_t *vec; + + /* 1. wrong operation */ + vec_size = 100; + mem_size = page_size; + + vec = malloc(sizeof(loff_t) * vec_size); + mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (!mem || !vec) + ksft_exit_fail_msg("error nomem\n"); + + ksft_test_result(pagemap_ioctl(mem, mem_size, 0, vec, vec_size, 0) < 0, + "%s no cmd specified\n", __func__); + ksft_test_result(pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size, 8) + < 0, "%s wrong flag specified\n", __func__); + ksft_test_result(pagemap_ioctl_res(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size, 8, 10) + < 0, "%s reserved field specified specified\n", __func__); + ksft_test_result(pagemap_ioctl(mem, mem_size, 0x01000000, vec, vec_size, 0) < 0, + "%s wrong cmd specified\n", __func__); + ksft_test_result(pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET | 0xFF, + vec, vec_size, 0) < 0, + "%s mixture of correct and wrong cmds\n", __func__); + + /* 2. Clear area with larger vec size */ + ret = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, + vec, vec_size, 0); + ksft_test_result(ret >= 0, "%s Clear area with larger vec size %d %d\n", __func__, ret, errno); + + free(vec); + munmap(mem, mem_size); + return 0; +} + +void *gethugepage(int map_size) +{ + int ret; + char *map; + size_t hpage_len = read_pmd_pagesize(); + + map = memalign(hpage_len, map_size); + if (!map) + ksft_exit_fail_msg("memalign failed %d %s\n", errno, strerror(errno)); + + ret = madvise(map, map_size, MADV_HUGEPAGE); + if (ret) + ksft_exit_fail_msg("madvise failed %d %d %s\n", ret, errno, strerror(errno)); + + memset(map, 0, map_size); + + if (check_huge(map)) + return map; + + free(map); + return NULL; + +} + +int hpage_unit_tests(int page_size) +{ + char *map; + int i, ret; + size_t hpage_len = read_pmd_pagesize(); + size_t num_pages = 1; + int map_size = hpage_len * num_pages; + int vec_size = map_size/page_size; + loff_t *vec, *vec2; + + vec = malloc(sizeof(loff_t) * vec_size); + vec2 = malloc(sizeof(loff_t) * vec_size); + if (!vec || !vec2) + ksft_exit_fail_msg("malloc failed\n"); + + map = gethugepage(map_size); + if (map) { + // 1. all new huge page must be dirty + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size; i++) + if (vec[i] != i * page_size) + break; + + ksft_test_result(i == vec_size, "%s all new huge page must be dirty\n", __func__); + + // 2. all the huge page must not be dirty + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + ksft_test_result(ret == 0, "%s all the huge page must not be dirty\n", __func__); + + // 3. all the huge page must be dirty and clear dirty as well + memset(map, -1, map_size); + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size; i++) + if (vec[i] != i * page_size) + break; + + ksft_test_result(ret == vec_size && i == vec_size, + "%s all the huge page must be dirty and clear\n", __func__); + + // 4. only middle page dirty + free(map); + map = gethugepage(map_size); + clear_softdirty(); + map[vec_size/2 * page_size]++; + + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size; i++) { + if (vec[i] == vec_size/2 * page_size) + break; + } + ksft_test_result(vec[i] == vec_size/2 * page_size, + "%s only middle page dirty\n", __func__); + + free(map); + } else { + ksft_test_result_skip("all new huge page must be dirty\n"); + ksft_test_result_skip("all the huge page must not be dirty\n"); + ksft_test_result_skip("all the huge page must be dirty and clear\n"); + ksft_test_result_skip("only middle page dirty\n"); + } + + // 5. clear first half of huge page + map = gethugepage(map_size); + if (map) { + ret = pagemap_ioctl(map, map_size/2, PAGEMAP_SD_CLEAR, NULL, 0, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size/2; i++) + if (vec[i] != (i + vec_size/2) * page_size) + break; + + ksft_test_result(i == vec_size/2 && ret == vec_size/2, + "%s clear first half of huge page\n", __func__); + free(map); + } else { + ksft_test_result_skip("clear first half of huge page\n"); + } + + // 6. clear first half of huge page with limited buffer + map = gethugepage(map_size); + if (map) { + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size/2, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size/2; i++) + if (vec[i] != (i + vec_size/2) * page_size) + break; + + ksft_test_result(i == vec_size/2 && ret == vec_size/2, + "%s clear first half of huge page with limited buffer\n", + __func__); + free(map); + } else { + ksft_test_result_skip("clear first half of huge page with limited buffer\n"); + } + + // 7. clear second half of huge page + map = gethugepage(map_size); + if (map) { + memset(map, -1, map_size); + ret = pagemap_ioctl(map + map_size/2, map_size/2, PAGEMAP_SD_CLEAR, NULL, 0, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size/2; i++) + if (vec[i] != i * page_size) + break; + + ksft_test_result(i == vec_size/2, "%s clear second half huge page\n", __func__); + free(map); + } else { + ksft_test_result_skip("clear second half huge page\n"); + } + + free(vec); + free(vec2); + return 0; +} + +int base_tests(char *prefix, char *mem, int mem_size, int page_size, int skip) +{ + int vec_size, i, j, ret, dirty_pages, dirty_pages2; + loff_t *vec, *vec2; + + if (skip) { + ksft_test_result_skip("%s all new pages must be soft dirty\n", prefix); + ksft_test_result_skip("%s all pages must not be soft dirty\n", prefix); + ksft_test_result_skip("%s all pages dirty other than first and the last one\n", + prefix); + ksft_test_result_skip("%s only middle page dirty\n", prefix); + ksft_test_result_skip("%s only two middle pages dirty\n", prefix); + ksft_test_result_skip("%s only get 2 dirty pages and clear them as well\n", prefix); + ksft_test_result_skip("%s Range clear only\n", prefix); + return 0; + } + + vec_size = mem_size/page_size; + vec = malloc(sizeof(loff_t) * vec_size); + vec2 = malloc(sizeof(loff_t) * vec_size); + + /* 1. all new pages must be soft dirty and clear the range for next test */ + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size - 2, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, vec2, vec_size, 0); + if (dirty_pages2 < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno)); + + for (i = 0; i < dirty_pages; i++) + if (vec[i] != i * page_size) + break; + for (j = 0; j < dirty_pages2; j++) + if (vec2[j] != (j + vec_size - 2) * page_size) + break; + + ksft_test_result(dirty_pages == vec_size - 2 && i == dirty_pages && + dirty_pages2 == 2 && j == dirty_pages2, + "%s all new pages must be soft dirty\n", prefix); + + // 2. all pages must not be soft dirty + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0, "%s all pages must not be soft dirty\n", prefix); + + // 3. all pages dirty other than first and the last one + memset(mem + page_size, -1, (mem_size - (2 * page_size))); + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages >= vec_size - 2 && dirty_pages <= vec_size, + "%s all pages dirty other than first and the last one\n", prefix); + + // 4. only middle page dirty + clear_softdirty(); + mem[vec_size/2 * page_size]++; + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < vec_size; i++) { + if (vec[i] == vec_size/2 * page_size) + break; + } + ksft_test_result(vec[i] == vec_size/2 * page_size, + "%s only middle page dirty\n", prefix); + + // 5. only two middle pages dirty and walk over only middle pages + clear_softdirty(); + mem[vec_size/2 * page_size]++; + mem[(vec_size/2 + 1) * page_size]++; + + dirty_pages = pagemap_ioctl(&mem[vec_size/2 * page_size], 2 * page_size, + PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 2 && vec[0] == 0 && vec[1] == page_size, + "%s only two middle pages dirty\n", prefix); + + /* 6. only get 2 dirty pages and clear them as well */ + memset(mem, -1, mem_size); + + /* get and clear second and third pages */ + ret = pagemap_ioctl(mem + page_size, 2 * page_size, PAGEMAP_SD_GET_AND_CLEAR, vec, 2, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec2, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < vec_size - 2; i++) { + if (i == 0 && (vec[i] != 0 || vec2[i] != 0)) + break; + else if (i == 1 && (vec[i] != page_size || vec2[i] != (i + 2) * page_size)) + break; + else if (i > 1 && (vec2[i] != (i + 2) * page_size)) + break; + } + + ksft_test_result(dirty_pages == vec_size - 2 && i == vec_size - 2, + "%s only get 2 dirty pages and clear them as well\n", prefix); + /* 7. Range clear only */ + memset(mem, -1, mem_size); + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_CLEAR, NULL, 0, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages2 < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0 && dirty_pages2 == 0, "%s Range clear only\n", + prefix); + + free(vec); + free(vec2); + return 0; +} + +int performance_base_tests(char *prefix, char *mem, int mem_size, int page_size, int skip) +{ + int vec_size, i, ret, dirty_pages, dirty_pages2; + loff_t *vec, *vec2; + + if (skip) { + ksft_test_result_skip("%s all new pages must be soft dirty\n", prefix); + ksft_test_result_skip("%s all pages must not be soft dirty\n", prefix); + ksft_test_result_skip("%s all pages dirty other than first and the last one\n", + prefix); + ksft_test_result_skip("%s only middle page dirty\n", prefix); + ksft_test_result_skip("%s only two middle pages dirty\n", prefix); + ksft_test_result_skip("%s only get 2 dirty pages and clear them as well\n", prefix); + ksft_test_result_skip("%s Range clear only\n", prefix); + return 0; + } + + vec_size = mem_size/page_size; + vec = malloc(sizeof(loff_t) * vec_size); + vec2 = malloc(sizeof(loff_t) * vec_size); + + /* 1. all new pages must be soft dirty and clear the range for next test */ + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, + vec, vec_size - 2, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, + vec2, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages2 < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0 && dirty_pages2 == 0, + "%s page isn't dirty\n", prefix); + + // 2. all pages must not be soft dirty + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0, "%s all pages must not be soft dirty\n", prefix); + + // 3. all pages dirty other than first and the last one + memset(mem + page_size, -1, (mem_size - 2 * page_size)); + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < dirty_pages; i++) { + if (vec[i] != (i + 1) * page_size) + break; + } + + ksft_test_result(dirty_pages == vec_size - 2 && i == vec_size - 2, + "%s all pages dirty other than first and the last one\n", prefix); + + // 4. only middle page dirty + clear_softdirty(); + mem[vec_size/2 * page_size]++; + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < vec_size; i++) { + if (vec[i] == vec_size/2 * page_size) + break; + } + ksft_test_result(vec[i] == vec_size/2 * page_size, + "%s only middle page dirty\n", prefix); + + // 5. only two middle pages dirty and walk over only middle pages + clear_softdirty(); + mem[vec_size/2 * page_size]++; + mem[(vec_size/2 + 1) * page_size]++; + + dirty_pages = pagemap_ioctl(&mem[vec_size/2 * page_size], 2 * page_size, + PAGEMAP_SD_GET, vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 2 && vec[0] == 0 && vec[1] == page_size, + "%s only two middle pages dirty\n", prefix); + + /* 6. only get 2 dirty pages and clear them as well */ + memset(mem, -1, mem_size); + + /* get and clear second and third pages */ + ret = pagemap_ioctl(mem + page_size, 2 * page_size, PAGEMAP_SD_GET_AND_CLEAR, + vec, 2, PAGEMAP_SD_NO_REUSED_REGIONS); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec2, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < vec_size - 2; i++) { + if (i == 0 && (vec[i] != 0 || vec2[i] != 0)) + break; + else if (i == 1 && (vec[i] != page_size || vec2[i] != (i + 2) * page_size)) + break; + else if (i > 1 && (vec2[i] != (i + 2) * page_size)) + break; + } + + ksft_test_result(dirty_pages == vec_size - 2 && i == vec_size - 2, + "%s only get 2 dirty pages and clear them as well\n", prefix); + + /* 7. Range clear only */ + memset(mem, -1, mem_size); + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_CLEAR, + NULL, 0, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages2 < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0 && dirty_pages2 == 0, "%s Range clear only\n", + prefix); + + free(vec); + free(vec2); + return 0; +} + +int unmapped_region_tests(int page_size) +{ + void *start = (void *)0x10000000; + int dirty_pages, len = 0x00040000; + int vec_size = len / page_size; + loff_t *vec = malloc(sizeof(loff_t) * vec_size); + + /* 1. Get dirty pages */ + dirty_pages = pagemap_ioctl(start, len, PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages >= 0, "%s Get dirty pages\n", __func__); + + /* 2. Clear dirty bit of whole address space */ + dirty_pages = pagemap_ioctl(0, 0x7FFFFFFF, PAGEMAP_SD_CLEAR, NULL, 0, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0, "%s Get dirty pages\n", __func__); + + free(vec); + return 0; +} + +static void test_simple(int page_size) +{ + int i; + char *map; + loff_t *vec = NULL; + + map = aligned_alloc(page_size, page_size); + if (!map) + ksft_exit_fail_msg("mmap failed\n"); + + clear_softdirty(); + + for (i = 0 ; i < TEST_ITERATIONS; i++) { + if (pagemap_ioctl(map, page_size, PAGEMAP_SD_GET, vec, 1, 0) == 1) { + ksft_print_msg("dirty bit was 1, but should be 0 (i=%d)\n", i); + break; + } + + clear_softdirty(); + // Write something to the page to get the dirty bit enabled on the page + map[0]++; + + if (pagemap_ioctl(map, page_size, PAGEMAP_SD_GET, vec, 1, 0) == 0) { + ksft_print_msg("dirty bit was 0, but should be 1 (i=%d)\n", i); + break; + } + + clear_softdirty(); + } + free(map); + + ksft_test_result(i == TEST_ITERATIONS, "Test %s\n", __func__); +} + +int main(int argc, char **argv) +{ + int page_size = getpagesize(); + size_t hpage_len = read_pmd_pagesize(); + char *mem, *map; + int mem_size; + + ksft_print_header(); + ksft_set_plan(44); + + pagemap_fd = open(PAGEMAP, O_RDWR); + if (pagemap_fd < 0) + return -EINVAL; + + /* 1. Sanity testing */ + sanity_tests(page_size); + + /* 2. Normal page testing */ + mem_size = 10 * page_size; + mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (!mem) + ksft_exit_fail_msg("error nomem\n"); + + base_tests("Page testing:", mem, mem_size, page_size, 0); + + munmap(mem, mem_size); + + /* 3. Large page testing */ + mem_size = 512 * 10 * page_size; + mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (!mem) + ksft_exit_fail_msg("error nomem\n"); + + base_tests("Large Page testing:", mem, mem_size, page_size, 0); + + munmap(mem, mem_size); + + /* 4. Huge page testing */ + map = gethugepage(hpage_len); + if (check_huge(map)) + base_tests("Huge page testing:", map, hpage_len, page_size, 0); + else + base_tests("Huge page testing:", NULL, 0, 0, 1); + + free(map); + + /* 5. Normal page testing */ + mem_size = 10 * page_size; + mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (!mem) + ksft_exit_fail_msg("error nomem\n"); + + performance_base_tests("Performance Page testing:", mem, mem_size, page_size, 0); + + munmap(mem, mem_size); + + /* 6. Huge page tests */ + hpage_unit_tests(page_size); + + /* 7. Unmapped address test */ + unmapped_region_tests(page_size); + + /* 8. Iterative test */ + test_simple(page_size); + + close(pagemap_fd); + return ksft_exit_pass(); +} From patchwork Fri Aug 26 06:45:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muhammad Usama Anjum X-Patchwork-Id: 12955590 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C7F5ECAAD8 for ; Fri, 26 Aug 2022 06:46:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245189AbiHZGqb (ORCPT ); Fri, 26 Aug 2022 02:46:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50208 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236722AbiHZGqS (ORCPT ); Fri, 26 Aug 2022 02:46:18 -0400 Received: from madras.collabora.co.uk (madras.collabora.co.uk [IPv6:2a00:1098:0:82:1000:25:2eeb:e5ab]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C634D21D2; Thu, 25 Aug 2022 23:46:10 -0700 (PDT) Received: from lenovo.Home (unknown [39.53.61.43]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id 28ACA6601EC7; Fri, 26 Aug 2022 07:46:05 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1661496369; bh=qFwfAW+YdGQMEoUt1Jsqu4meeyL+SBRRa4jSpOBxxag=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=MqLhLBD000uR8bos55kq6B0zd6KcI+zVA3+3+1lh9CHVeLA7Xtm5TlApu0RsP5K6Q GaTS146h+PQSc60dfjK0trnYUuQOrRDaO9xp9SZwkO26Fwo6/AXsB68X7nL6ejn2xN sbI/9tPhriK2OPjmRvSqjE99pvEhHSjGwslyTNLwsJ3HTxVrAQVWKYgarxEMYKm+0l Sgty4ZQl62sAs2T9NeauCEYI6N5UOeifcCNSLx+EGyN5ZtKTej96u354C53osstFIm 3A8U9UByjTVrOhh56q9iP0+NkyA9dw5QqnJ9qx6XihHoVtVxicMbVyfE8C40X1Q7TP ampV4YcnPyCtg== From: Muhammad Usama Anjum To: Jonathan Corbet , Alexander Viro , Andrew Morton , Shuah Khan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list), linux-fsdevel@vger.kernel.org (open list:PROC FILESYSTEM), linux-mm@kvack.org (open list:MEMORY MANAGEMENT), linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Cc: Muhammad Usama Anjum , kernel@collabora.com, Gabriel Krisman Bertazi , David Hildenbrand , Peter Enderborg , Greg KH Subject: [PATCH v3 4/4] mm: add documentation of the new ioctl on pagemap Date: Fri, 26 Aug 2022 11:45:35 +0500 Message-Id: <20220826064535.1941190-5-usama.anjum@collabora.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220826064535.1941190-1-usama.anjum@collabora.com> References: <20220826064535.1941190-1-usama.anjum@collabora.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add the explanation of the added ioctl on pagemap file. It can be used to get, clear or both soft-dirty PTE bit of the specified range. or both at the same time. Signed-off-by: Muhammad Usama Anjum --- Changes in v2: - Update documentation to mention ioctl instead of the syscall --- Documentation/admin-guide/mm/soft-dirty.rst | 42 ++++++++++++++++++++- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/mm/soft-dirty.rst b/Documentation/admin-guide/mm/soft-dirty.rst index cb0cfd6672fa..d3d33e63a965 100644 --- a/Documentation/admin-guide/mm/soft-dirty.rst +++ b/Documentation/admin-guide/mm/soft-dirty.rst @@ -5,7 +5,12 @@ Soft-Dirty PTEs =============== The soft-dirty is a bit on a PTE which helps to track which pages a task -writes to. In order to do this tracking one should +writes to. + +Using Proc FS +------------- + +In order to do this tracking one should 1. Clear soft-dirty bits from the task's PTEs. @@ -20,6 +25,41 @@ writes to. In order to do this tracking one should 64-bit qword is the soft-dirty one. If set, the respective PTE was written to since step 1. +Using IOCTL +----------- + +The IOCTL on the ``/proc/PID/pagemap`` can be can be used to find the dirty pages +atomically. The following commands are supported:: + + MEMWATCH_SD_GET + Get the page offsets which are soft dirty. + + MEMWATCH_SD_CLEAR + Clear the pages which are soft dirty. + + MEMWATCH_SD_GET_AND_CLEAR + Get and clear the pages which are soft dirty. + +The struct :c:type:`pagemap_sd_args` is used as the argument. In this struct: + + 1. The range is specified through start and len. The len argument need not be + the multiple of the page size, but since the information is returned for the + whole pages, len is effectively rounded up to the next multiple of the page + size. + + 2. The output buffer and size is specified in vec and vec_len. The offsets of + the dirty pages from start are returned in vec. The ioctl returns when the + whole range has been searched or vec is completely filled. The whole range + isn't cleared if vec fills up completely. + + 3. The flags can be specified in flags field. Currently only one flag, + PAGEMAP_SD_NO_REUSED_REGIONS is supported which can be specified to ignore + the VMA dirty flags for better performance. This flag shows only those pages + dirty which have been written to by the user. All new allocations aren't returned + to be dirty. + +Explanation +----------- Internally, to do this tracking, the writable bit is cleared from PTEs when the soft-dirty bit is cleared. So, after this, when the task tries to