From patchwork Tue Apr 13 09:14:43 2021
X-Patchwork-Submitter: zhukeqian
X-Patchwork-Id: 12199745
From: Keqian Zhu
To: Alex Williamson, Kirti Wankhede, Cornelia Huck, Yi Sun, Tian Kevin
Cc: Robin Murphy, Will Deacon, Joerg Roedel, Jean-Philippe Brucker,
    Jonathan Cameron, Lu Baolu
Subject: [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintenance
Date: Tue, 13 Apr 2021 17:14:43 +0800
Message-ID: <20210413091445.7448-2-zhukeqian1@huawei.com>
In-Reply-To: <20210413091445.7448-1-zhukeqian1@huawei.com>
References: <20210413091445.7448-1-zhukeqian1@huawei.com>
X-Mailing-List: kvm@vger.kernel.org

From: Kunkun Jiang

We are going to optimize dirty log tracking based on the IOMMU HWDBM
feature, but the dirty log from the IOMMU is useful only when all
IOMMU-backed groups have the HWDBM feature.

This patch maintains a counter in vfio_iommu, which is used by the dirty
bitmap population policy in the next patch, and a counter in vfio_domain,
which is used by the dirty log switching policy in the next patch.
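For clarity, the policy these counters enable reduces to a single predicate;
a rough sketch with a hypothetical helper name (not part of this patch):

	/*
	 * Sketch: the IOMMU dirty log is trusted only while no attached
	 * IOMMU-backed group lacks HWDBM.
	 */
	static bool vfio_iommu_can_use_hw_dirty_log(struct vfio_iommu *iommu)
	{
		return iommu->num_non_hwdbm_groups == 0;
	}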
Co-developed-by: Keqian Zhu
Signed-off-by: Kunkun Jiang
Reported-by: kernel test robot
---
 drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 45cbfd4879a5..9cb9ce021b22 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -73,6 +73,7 @@ struct vfio_iommu {
 	unsigned int		vaddr_invalid_count;
 	uint64_t		pgsize_bitmap;
 	uint64_t		num_non_pinned_groups;
+	uint64_t		num_non_hwdbm_groups;
 	wait_queue_head_t	vaddr_wait;
 	bool			v2;
 	bool			nesting;
@@ -85,6 +86,7 @@ struct vfio_domain {
 	struct iommu_domain	*domain;
 	struct list_head	next;
 	struct list_head	group_list;
+	uint64_t		num_non_hwdbm_groups;
 	int			prot;		/* IOMMU_CACHE */
 	bool			fgsp;		/* Fine-grained super pages */
 };
@@ -116,6 +118,7 @@ struct vfio_group {
 	struct list_head	next;
 	bool			mdev_group;	/* An mdev group */
 	bool			pinned_page_dirty_scope;
+	bool			iommu_hwdbm;	/* For iommu-backed group */
 };

 struct vfio_iova {
@@ -2252,6 +2255,44 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
 	list_splice_tail(iova_copy, iova);
 }

+static int vfio_dev_enable_feature(struct device *dev, void *data)
+{
+	enum iommu_dev_features *feat = data;
+
+	if (iommu_dev_feature_enabled(dev, *feat))
+		return 0;
+
+	return iommu_dev_enable_feature(dev, *feat);
+}
+
+static bool vfio_group_supports_hwdbm(struct vfio_group *group)
+{
+	enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
+
+	return !iommu_group_for_each_dev(group->iommu_group, &feat,
+					 vfio_dev_enable_feature);
+}
+
+/*
+ * Called after a new group is added to the group_list of domain, or before an
+ * old group is removed from the group_list of domain.
+ */
+static void vfio_iommu_update_hwdbm(struct vfio_iommu *iommu,
+				    struct vfio_domain *domain,
+				    struct vfio_group *group,
+				    bool attach)
+{
+	/* Update the HWDBM status of group, domain and iommu */
+	group->iommu_hwdbm = vfio_group_supports_hwdbm(group);
+	if (!group->iommu_hwdbm && attach) {
+		domain->num_non_hwdbm_groups++;
+		iommu->num_non_hwdbm_groups++;
+	} else if (!group->iommu_hwdbm && !attach) {
+		domain->num_non_hwdbm_groups--;
+		iommu->num_non_hwdbm_groups--;
+	}
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 					 struct iommu_group *iommu_group)
 {
@@ -2409,6 +2450,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 			vfio_iommu_detach_group(domain, group);
 			if (!vfio_iommu_attach_group(d, group)) {
 				list_add(&group->next, &d->group_list);
+				vfio_iommu_update_hwdbm(iommu, d, group, true);
 				iommu_domain_free(domain->domain);
 				kfree(domain);
 				goto done;
@@ -2435,6 +2477,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	list_add(&domain->next, &iommu->domain_list);

 	vfio_update_pgsize_bitmap(iommu);
+	vfio_iommu_update_hwdbm(iommu, domain, group, true);
 done:
 	/* Delete the old one and insert new iova list */
 	vfio_iommu_iova_insert_copy(iommu, &iova_copy);
@@ -2618,6 +2661,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 			continue;

 		vfio_iommu_detach_group(domain, group);
+		vfio_iommu_update_hwdbm(iommu, domain, group, false);
 		update_dirty_scope = !group->pinned_page_dirty_scope;
 		list_del(&group->next);
 		kfree(group);

From patchwork Tue Apr 13 09:14:44 2021
X-Patchwork-Submitter: zhukeqian
X-Patchwork-Id: 12199747
From: Keqian Zhu
To: Alex Williamson, Kirti Wankhede, Cornelia Huck, Yi Sun, Tian Kevin
Cc: Robin Murphy, Will Deacon, Joerg Roedel, Jean-Philippe Brucker,
    Jonathan Cameron, Lu Baolu
Subject: [PATCH 2/3] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM
Date: Tue, 13 Apr 2021 17:14:44 +0800
Message-ID: <20210413091445.7448-3-zhukeqian1@huawei.com>
In-Reply-To: <20210413091445.7448-1-zhukeqian1@huawei.com>
References: <20210413091445.7448-1-zhukeqian1@huawei.com>
X-Mailing-List: kvm@vger.kernel.org

From: Kunkun Jiang

Previously, if the vfio_iommu is not of pinned_page_dirty_scope and a
vfio_dma is iommu_mapped, we populate a full dirty bitmap for that
vfio_dma. Now we can try to get the dirty log from the IOMMU before
falling back to that pessimistic choice.

Bitmap population: if all vfio_groups are of pinned_page_dirty_scope,
dirty bitmap population is not affected. If some vfio_groups are not of
pinned_page_dirty_scope and all of their domains support HWDBM, we try to
get the dirty log from the IOMMU. Otherwise, we fall back to a full dirty
bitmap.

DMA and group hotplug: start dirty log for a newly added DMA range, and
stop dirty log for a DMA range that is about to be removed. A domain may
not support HWDBM at first but gain support after group hotplug (the
first group with HWDBM is attached, or all groups without HWDBM are
detached), and a domain that supports HWDBM at first may lose it after
group hotplug (a group without HWDBM is attached). So the policy is to
switch dirty log for domains dynamically.
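In rough pseudo-form, the per-vfio_dma population decision described above
is the following (an illustrative sketch of the policy, not the exact hunk
below):

	if (!iommu->num_non_pinned_groups || !dma->iommu_mapped) {
		/* only pinned pages can be dirty; bitmap already set */
	} else if (!iommu->num_non_hwdbm_groups) {
		/* all IOMMU-backed groups support HWDBM: sync HW dirty log */
		vfio_iommu_dirty_log_sync(iommu, dma, pgshift);
	} else {
		/* no reliable dirty source: mark everything dirty */
		bitmap_set(dma->bitmap, 0, nbits);
	}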
Co-developed-by: Keqian Zhu
Signed-off-by: Kunkun Jiang
Reported-by: kernel test robot
---
 drivers/vfio/vfio_iommu_type1.c | 166 ++++++++++++++++++++++++++++++--
 1 file changed, 159 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9cb9ce021b22..77950e47f56f 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1202,6 +1202,46 @@ static void vfio_update_pgsize_bitmap(struct vfio_iommu *iommu)
 	}
 }

+static int vfio_iommu_dirty_log_clear(struct vfio_iommu *iommu,
+				      dma_addr_t start_iova, size_t size,
+				      unsigned long *bitmap_buffer,
+				      dma_addr_t base_iova,
+				      unsigned long pgshift)
+{
+	struct vfio_domain *d;
+	int ret = 0;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		ret = iommu_clear_dirty_log(d->domain, start_iova, size,
+					    bitmap_buffer, base_iova, pgshift);
+		if (ret) {
+			pr_warn("vfio_iommu dirty log clear failed!\n");
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static int vfio_iommu_dirty_log_sync(struct vfio_iommu *iommu,
+				     struct vfio_dma *dma,
+				     unsigned long pgshift)
+{
+	struct vfio_domain *d;
+	int ret = 0;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		ret = iommu_sync_dirty_log(d->domain, dma->iova, dma->size,
+					   dma->bitmap, dma->iova, pgshift);
+		if (ret) {
+			pr_warn("vfio_iommu dirty log sync failed!\n");
+			break;
+		}
+	}
+
+	return ret;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 			      struct vfio_dma *dma, dma_addr_t base_iova,
 			      size_t pgsize)
@@ -1212,13 +1252,22 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 	unsigned long copy_offset = bit_offset / BITS_PER_LONG;
 	unsigned long shift = bit_offset % BITS_PER_LONG;
 	unsigned long leftover;
+	int ret;

-	/*
-	 * mark all pages dirty if any IOMMU capable device is not able
-	 * to report dirty pages and all pages are pinned and mapped.
-	 */
-	if (iommu->num_non_pinned_groups && dma->iommu_mapped)
+	if (!iommu->num_non_pinned_groups || !dma->iommu_mapped) {
+		/* nothing to do */
+	} else if (!iommu->num_non_hwdbm_groups) {
+		/* try to get dirty log from IOMMU */
+		ret = vfio_iommu_dirty_log_sync(iommu, dma, pgshift);
+		if (ret)
+			return ret;
+	} else {
+		/*
+		 * mark all pages dirty if any IOMMU capable device is not able
+		 * to report dirty pages and all pages are pinned and mapped.
+		 */
 		bitmap_set(dma->bitmap, 0, nbits);
+	}

 	if (shift) {
 		bitmap_shift_left(dma->bitmap, dma->bitmap, shift,
@@ -1236,6 +1285,12 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 			 DIRTY_BITMAP_BYTES(nbits + shift)))
 		return -EFAULT;

+	/* Recover the bitmap if it'll be used to clear hardware dirty log */
+	if (shift && iommu->num_non_pinned_groups && dma->iommu_mapped &&
+	    !iommu->num_non_hwdbm_groups)
+		bitmap_shift_right(dma->bitmap, dma->bitmap, shift,
+				   nbits + shift);
+
 	return 0;
 }

@@ -1274,6 +1329,16 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 		if (ret)
 			return ret;

+		/* Clear iommu dirty log to re-enable dirty log tracking */
+		if (iommu->num_non_pinned_groups && dma->iommu_mapped &&
+		    !iommu->num_non_hwdbm_groups) {
+			ret = vfio_iommu_dirty_log_clear(iommu, dma->iova,
+					dma->size, dma->bitmap, dma->iova,
+					pgshift);
+			if (ret)
+				return ret;
+		}
+
 		/*
 		 * Re-populate bitmap to include all pinned pages which are
 		 * considered as dirty but exclude pages which are unpinned and
@@ -1294,6 +1359,22 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
 	return 0;
 }

+static void vfio_dma_dirty_log_switch(struct vfio_iommu *iommu,
+				      struct vfio_dma *dma, bool enable)
+{
+	struct vfio_domain *d;
+
+	if (!dma->iommu_mapped)
+		return;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		if (d->num_non_hwdbm_groups)
+			continue;
+		WARN_ON(iommu_switch_dirty_log(d->domain, enable, dma->iova,
+					       dma->size, d->prot | dma->prot));
+	}
+}
+
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 			     struct vfio_iommu_type1_dma_unmap *unmap,
 			     struct vfio_bitmap *bitmap)
@@ -1446,6 +1527,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 			break;
 		}

+		/* Stop log for removed dma */
+		if (iommu->dirty_page_tracking)
+			vfio_dma_dirty_log_switch(iommu, dma, false);
+
 		unmapped += dma->size;
 		n = rb_next(n);
 		vfio_remove_dma(iommu, dma);
@@ -1677,8 +1762,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,

 	if (!ret && iommu->dirty_page_tracking) {
 		ret = vfio_dma_bitmap_alloc(dma, pgsize);
-		if (ret)
+		if (ret) {
 			vfio_remove_dma(iommu, dma);
+			goto out_unlock;
+		}
+
+		/* Start dirty log for newly added dma */
+		vfio_dma_dirty_log_switch(iommu, dma, true);
 	}

 out_unlock:
@@ -2273,6 +2363,21 @@ static bool vfio_group_supports_hwdbm(struct vfio_group *group)
 					vfio_dev_enable_feature);
 }

+static void vfio_domain_dirty_log_switch(struct vfio_iommu *iommu,
+					 struct vfio_domain *d, bool enable)
+{
+	struct rb_node *n;
+	struct vfio_dma *dma;
+
+	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
+		dma = rb_entry(n, struct vfio_dma, node);
+		if (!dma->iommu_mapped)
+			continue;
+		WARN_ON(iommu_switch_dirty_log(d->domain, enable, dma->iova,
+					       dma->size, d->prot | dma->prot));
+	}
+}
+
 /*
  * Called after a new group is added to the group_list of domain, or before an
  * old group is removed from the group_list of domain.
@@ -2282,6 +2387,10 @@ static void vfio_iommu_update_hwdbm(struct vfio_iommu *iommu,
 				    struct vfio_group *group,
 				    bool attach)
 {
+	uint64_t old_num_non_hwdbm = domain->num_non_hwdbm_groups;
+	bool singular = list_is_singular(&domain->group_list);
+	bool log_enabled, should_enable;
+
 	/* Update the HWDBM status of group, domain and iommu */
 	group->iommu_hwdbm = vfio_group_supports_hwdbm(group);
 	if (!group->iommu_hwdbm && attach) {
@@ -2291,6 +2400,30 @@ static void vfio_iommu_update_hwdbm(struct vfio_iommu *iommu,
 		domain->num_non_hwdbm_groups--;
 		iommu->num_non_hwdbm_groups--;
 	}
+
+	if (!iommu->dirty_page_tracking)
+		return;
+
+	/*
+	 * The vfio_domain can switch dirty log tracking dynamically due to
+	 * group attach/detach. The basic idea is to convert current dirty log
+	 * status to desired dirty log status.
+	 *
+	 * If num_non_hwdbm_groups is zero then dirty log has been enabled. One
+	 * exception is that this is the first group attached to a domain.
+	 *
+	 * If the updated num_non_hwdbm_groups is zero then dirty log should be
+	 * enabled. One exception is that this is the last group detached from
+	 * a domain.
+	 */
+	log_enabled = !old_num_non_hwdbm && !(attach && singular);
+	should_enable = !domain->num_non_hwdbm_groups && !(!attach && singular);
+
+	/* Switch dirty log tracking when status changed */
+	if (should_enable && !log_enabled)
+		vfio_domain_dirty_log_switch(iommu, domain, true);
+	else if (!should_enable && log_enabled)
+		vfio_domain_dirty_log_switch(iommu, domain, false);
 }

 static int vfio_iommu_type1_attach_group(void *iommu_data,
@@ -3046,6 +3179,22 @@ static int vfio_iommu_type1_unmap_dma(struct vfio_iommu *iommu,
 			-EFAULT : 0;
 }

+static void vfio_iommu_dirty_log_switch(struct vfio_iommu *iommu, bool enable)
+{
+	struct vfio_domain *d;
+
+	/*
+	 * We enable dirty log tracking for those vfio_domains that support
+	 * HWDBM. Even if no iommu domain supports HWDBM for now, they may
+	 * support it after some groups are detached.
+	 */
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		if (d->num_non_hwdbm_groups)
+			continue;
+		vfio_domain_dirty_log_switch(iommu, d, enable);
+	}
+}
+
 static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 					unsigned long arg)
 {
@@ -3078,8 +3227,10 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 		pgsize = 1 << __ffs(iommu->pgsize_bitmap);
 		if (!iommu->dirty_page_tracking) {
 			ret = vfio_dma_bitmap_alloc_all(iommu, pgsize);
-			if (!ret)
+			if (!ret) {
 				iommu->dirty_page_tracking = true;
+				vfio_iommu_dirty_log_switch(iommu, true);
+			}
 		}
 		mutex_unlock(&iommu->lock);
 		return ret;
@@ -3088,6 +3239,7 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 		if (iommu->dirty_page_tracking) {
 			iommu->dirty_page_tracking = false;
 			vfio_dma_bitmap_free_all(iommu);
+			vfio_iommu_dirty_log_switch(iommu, false);
 		}
 		mutex_unlock(&iommu->lock);
 		return 0;

From patchwork Tue Apr 13 09:14:45 2021
X-Patchwork-Submitter: zhukeqian
X-Patchwork-Id: 12199741
From: Keqian Zhu
To: Alex Williamson, Kirti Wankhede, Cornelia Huck, Yi Sun, Tian Kevin
Cc: Robin Murphy, Will Deacon, Joerg Roedel, Jean-Philippe Brucker,
    Jonathan Cameron, Lu Baolu
Subject: [PATCH 3/3] vfio/iommu_type1: Add support for manual dirty log clear
Date: Tue, 13 Apr 2021 17:14:45 +0800
Message-ID: <20210413091445.7448-4-zhukeqian1@huawei.com>
In-Reply-To: <20210413091445.7448-1-zhukeqian1@huawei.com>
References: <20210413091445.7448-1-zhukeqian1@huawei.com>
X-Mailing-List: kvm@vger.kernel.org

From: Kunkun Jiang

In the past, we cleared the dirty log immediately after syncing it to
userspace. This may cause redundant dirty handling when userspace handles
the dirty log iteratively: after vfio clears the dirty log, new dirty
entries start to be generated, and these new entries are reported to
userspace even if they were generated before userspace handled the same
dirty pages.
That is to say, the time gap between dirty log clearing and dirty log
handling should be minimized. To that end, give userspace an interface to
clear the dirty log itself.

Co-developed-by: Keqian Zhu
Signed-off-by: Kunkun Jiang
---
 drivers/vfio/vfio_iommu_type1.c | 100 ++++++++++++++++++++++++++++++--
 include/uapi/linux/vfio.h       |  28 ++++++++-
 2 files changed, 123 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 77950e47f56f..d9c4a27b3c4e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -78,6 +78,7 @@ struct vfio_iommu {
 	bool			v2;
 	bool			nesting;
 	bool			dirty_page_tracking;
+	bool			dirty_log_manual_clear;
 	bool			pinned_page_dirty_scope;
 	bool			container_open;
 };
@@ -1242,6 +1243,78 @@ static int vfio_iommu_dirty_log_sync(struct vfio_iommu *iommu,
 	return ret;
 }

+static int vfio_iova_dirty_log_clear(u64 __user *bitmap,
+				     struct vfio_iommu *iommu,
+				     dma_addr_t iova, size_t size,
+				     size_t pgsize)
+{
+	struct vfio_dma *dma;
+	struct rb_node *n;
+	dma_addr_t start_iova, end_iova, riova;
+	unsigned long pgshift = __ffs(pgsize);
+	unsigned long bitmap_size;
+	unsigned long *bitmap_buffer = NULL;
+	bool clear_valid;
+	int rs, re, start, end, dma_offset;
+	int ret = 0;
+
+	bitmap_size = DIRTY_BITMAP_BYTES(size >> pgshift);
+	bitmap_buffer = kvmalloc(bitmap_size, GFP_KERNEL);
+	if (!bitmap_buffer) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (copy_from_user(bitmap_buffer, bitmap, bitmap_size)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
+		dma = rb_entry(n, struct vfio_dma, node);
+		if (!dma->iommu_mapped)
+			continue;
+		if ((dma->iova + dma->size - 1) < iova)
+			continue;
+		if (dma->iova > iova + size - 1)
+			break;
+
+		start_iova = max(iova, dma->iova);
+		end_iova = min(iova + size, dma->iova + dma->size);
+
+		/* Similar logic as the tail of vfio_iova_dirty_bitmap */
+
+		clear_valid = false;
+		start = (start_iova - iova) >> pgshift;
+		end = (end_iova - iova) >> pgshift;
+		bitmap_for_each_set_region(bitmap_buffer, rs, re, start, end) {
+			clear_valid = true;
+			riova = iova + (rs << pgshift);
+			dma_offset = (riova - dma->iova) >> pgshift;
+			bitmap_clear(dma->bitmap, dma_offset, re - rs);
+		}
+
+		if (clear_valid)
+			vfio_dma_populate_bitmap(dma, pgsize);
+
+		if (clear_valid && !iommu->pinned_page_dirty_scope &&
+		    dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
+			ret = vfio_iommu_dirty_log_clear(iommu, start_iova,
+					end_iova - start_iova, bitmap_buffer,
+					iova, pgshift);
+			if (ret) {
+				pr_warn("dma dirty log clear failed!\n");
+				goto out;
+			}
+		}
+
+	}
+
+out:
+	kfree(bitmap_buffer);
+	return ret;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 			      struct vfio_dma *dma, dma_addr_t base_iova,
 			      size_t pgsize)
@@ -1329,6 +1402,10 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 		if (ret)
 			return ret;

+		/* Do not clear dirty automatically when manual_clear enabled */
+		if (iommu->dirty_log_manual_clear)
+			continue;
+
 		/* Clear iommu dirty log to re-enable dirty log tracking */
 		if (iommu->num_non_pinned_groups && dma->iommu_mapped &&
 		    !iommu->num_non_hwdbm_groups) {
@@ -2946,6 +3023,11 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
 		if (!iommu)
 			return 0;
 		return vfio_domains_have_iommu_cache(iommu);
+	case VFIO_DIRTY_LOG_MANUAL_CLEAR:
+		if (!iommu)
+			return 0;
+		iommu->dirty_log_manual_clear = true;
+		return 1;
 	default:
 		return 0;
 	}
@@ -3201,7 +3283,8 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 	struct vfio_iommu_type1_dirty_bitmap dirty;
 	uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
 			VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
-			VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
+			VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP |
+			VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP;
 	unsigned long minsz;
 	int ret = 0;
@@ -3243,7 +3326,8 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 		}
 		mutex_unlock(&iommu->lock);
 		return 0;
-	} else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
+	} else if (dirty.flags & (VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP |
+				  VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP)) {
 		struct vfio_iommu_type1_dirty_bitmap_get range;
 		unsigned long pgshift;
 		size_t data_size = dirty.argsz - minsz;
@@ -3286,13 +3370,21 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 			goto out_unlock;
 		}

-		if (iommu->dirty_page_tracking)
+		if (!iommu->dirty_page_tracking) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP)
 			ret = vfio_iova_dirty_bitmap(range.bitmap.data,
 						     iommu, range.iova,
 						     range.size,
 						     range.bitmap.pgsize);
 		else
-			ret = -EINVAL;
+			ret = vfio_iova_dirty_log_clear(range.bitmap.data,
+							iommu, range.iova,
+							range.size,
+							range.bitmap.pgsize);

 out_unlock:
 		mutex_unlock(&iommu->lock);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8ce36c1d53ca..784dc3cf2a8f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -52,6 +52,14 @@
 /* Supports the vaddr flag for DMA map and unmap */
 #define VFIO_UPDATE_VADDR		10

+/*
+ * The vfio_iommu driver may support manual clearing of the dirty log by
+ * userspace: the dirty log is not cleared automatically after it is copied
+ * to userspace, and it is the user's duty to clear it. Note: this extension
+ * is enabled as soon as the user queries it and the vfio_iommu driver
+ * supports it.
+ */
+#define VFIO_DIRTY_LOG_MANUAL_CLEAR	11
+
 /*
  * The IOCTL interface is designed for extensibility by embedding the
  * structure length (argsz) and flags into structures passed between
@@ -1188,7 +1196,24 @@ struct vfio_iommu_type1_dma_unmap {
  * actual bitmap. If dirty pages logging is not enabled, an error will be
  * returned.
  *
- * Only one of the flags _START, _STOP and _GET may be specified at a time.
+ * Calling the IOCTL with the VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP flag set
+ * instructs the IOMMU driver to clear the dirty status of pages in a bitmap
+ * for the IOMMU container for a given IOVA range. The user must specify the
+ * IOVA range, the bitmap and the pgsize through the structure
+ * vfio_iommu_type1_dirty_bitmap_get in the data[] portion. This interface
+ * supports clearing a bitmap of the smallest supported pgsize only and can be
+ * modified in future to clear a bitmap of any specified supported pgsize. The
+ * user must provide a memory area for the bitmap memory and specify its size
+ * in bitmap.size. One bit is used to represent one page consecutively starting
+ * from iova offset. The user should provide the page size in the bitmap.pgsize
+ * field. A bit set in the bitmap indicates that the page at that offset from
+ * iova has its dirty status cleared, and dirty tracking is re-enabled for that
+ * page. The caller must set argsz to a value including the size of structure
+ * vfio_iommu_dirty_bitmap_get, but excluding the size of the actual bitmap. If
+ * dirty pages logging is not enabled, an error will be returned.
+ *
+ * Only one of the flags _START, _STOP, _GET and _CLEAR may be specified at a
* */ struct vfio_iommu_type1_dirty_bitmap { @@ -1197,6 +1222,7 @@ struct vfio_iommu_type1_dirty_bitmap { #define VFIO_IOMMU_DIRTY_PAGES_FLAG_START (1 << 0) #define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP (1 << 1) #define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP (1 << 2) +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP (1 << 3) __u8 data[]; };