diff mbox series

[v3,02/12] iommu: Add iommu_split_block interface

Message ID 20210413085457.25400-3-zhukeqian1@huawei.com (mailing list archive)
State New, archived
Headers show
Series iommu/smmuv3: Implement hardware dirty log tracking | expand

Commit Message

zhukeqian April 13, 2021, 8:54 a.m. UTC
Block(largepage) mapping is not a proper granule for dirty log tracking.
Take an extreme example, if DMA writes one byte, under 1G mapping, the
dirty amount reported is 1G, but under 4K mapping, the dirty amount is
just 4K.

This adds a new interface named iommu_split_block in IOMMU base layer.
A specific IOMMU driver can invoke it during start dirty log. If so, the
driver also need to realize the split_block iommu ops.

We flush all iotlbs after the whole procedure is completed to ease the
pressure of IOMMU, as we will hanle a huge range of mapping in general.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/iommu/iommu.c | 41 +++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h | 11 +++++++++++
 2 files changed, 52 insertions(+)

Comments

Baolu Lu April 14, 2021, 7:14 a.m. UTC | #1
On 4/13/21 4:54 PM, Keqian Zhu wrote:
> Block(largepage) mapping is not a proper granule for dirty log tracking.
> Take an extreme example, if DMA writes one byte, under 1G mapping, the
> dirty amount reported is 1G, but under 4K mapping, the dirty amount is
> just 4K.
> 
> This adds a new interface named iommu_split_block in IOMMU base layer.
> A specific IOMMU driver can invoke it during start dirty log. If so, the
> driver also need to realize the split_block iommu ops.
> 
> We flush all iotlbs after the whole procedure is completed to ease the
> pressure of IOMMU, as we will hanle a huge range of mapping in general.
> 
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> ---
>   drivers/iommu/iommu.c | 41 +++++++++++++++++++++++++++++++++++++++++
>   include/linux/iommu.h | 11 +++++++++++
>   2 files changed, 52 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 667b2d6d2fc0..bb413a927870 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2721,6 +2721,47 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
>   }
>   EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
>   
> +int iommu_split_block(struct iommu_domain *domain, unsigned long iova,
> +		      size_t size)
> +{
> +	const struct iommu_ops *ops = domain->ops;
> +	unsigned int min_pagesz;
> +	size_t pgsize;
> +	bool flush = false;
> +	int ret = 0;
> +
> +	if (unlikely(!ops || !ops->split_block))
> +		return -ENODEV;
> +
> +	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
> +	if (!IS_ALIGNED(iova | size, min_pagesz)) {
> +		pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
> +		       iova, size, min_pagesz);
> +		return -EINVAL;
> +	}
> +
> +	while (size) {
> +		flush = true;
> +
> +		pgsize = iommu_pgsize(domain, iova, size);
> +
> +		ret = ops->split_block(domain, iova, pgsize);
> +		if (ret)
> +			break;
> +
> +		pr_debug("split handled: iova 0x%lx size 0x%zx\n", iova, pgsize);
> +
> +		iova += pgsize;
> +		size -= pgsize;
> +	}
> +
> +	if (flush)
> +		iommu_flush_iotlb_all(domain);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_split_block);

Do you really have any consumers of this interface other than the dirty
bit tracking? If not, I don't suggest to make this as a generic IOMMU
interface.

There is an implicit requirement for such interfaces. The
iommu_map/unmap(iova, size) shouldn't be called at the same time.
Currently there's no such sanity check in the iommu core. A poorly
written driver could mess up the kernel by misusing this interface.

This also applies to iommu_merge_page().

Best regards,
baolu
zhukeqian April 19, 2021, 9:32 a.m. UTC | #2
Hi Baolu,

On 2021/4/14 15:14, Lu Baolu wrote:
> On 4/13/21 4:54 PM, Keqian Zhu wrote:
>> Block(largepage) mapping is not a proper granule for dirty log tracking.
>> Take an extreme example, if DMA writes one byte, under 1G mapping, the
>> dirty amount reported is 1G, but under 4K mapping, the dirty amount is
>> just 4K.
>>
>> This adds a new interface named iommu_split_block in IOMMU base layer.
>> A specific IOMMU driver can invoke it during start dirty log. If so, the
>> driver also need to realize the split_block iommu ops.
>>
>> We flush all iotlbs after the whole procedure is completed to ease the
>> pressure of IOMMU, as we will hanle a huge range of mapping in general.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> ---
>>   drivers/iommu/iommu.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>   include/linux/iommu.h | 11 +++++++++++
>>   2 files changed, 52 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 667b2d6d2fc0..bb413a927870 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2721,6 +2721,47 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
>>   +int iommu_split_block(struct iommu_domain *domain, unsigned long iova,
>> +              size_t size)
>> +{
>> +    const struct iommu_ops *ops = domain->ops;
>> +    unsigned int min_pagesz;
>> +    size_t pgsize;
>> +    bool flush = false;
>> +    int ret = 0;
>> +
>> +    if (unlikely(!ops || !ops->split_block))
>> +        return -ENODEV;
>> +
>> +    min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
>> +    if (!IS_ALIGNED(iova | size, min_pagesz)) {
>> +        pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
>> +               iova, size, min_pagesz);
>> +        return -EINVAL;
>> +    }
>> +
>> +    while (size) {
>> +        flush = true;
>> +
>> +        pgsize = iommu_pgsize(domain, iova, size);
>> +
>> +        ret = ops->split_block(domain, iova, pgsize);
>> +        if (ret)
>> +            break;
>> +
>> +        pr_debug("split handled: iova 0x%lx size 0x%zx\n", iova, pgsize);
>> +
>> +        iova += pgsize;
>> +        size -= pgsize;
>> +    }
>> +
>> +    if (flush)
>> +        iommu_flush_iotlb_all(domain);
>> +
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_split_block);
> 
> Do you really have any consumers of this interface other than the dirty
> bit tracking? If not, I don't suggest to make this as a generic IOMMU
> interface.
> 
> There is an implicit requirement for such interfaces. The
> iommu_map/unmap(iova, size) shouldn't be called at the same time.
> Currently there's no such sanity check in the iommu core. A poorly
> written driver could mess up the kernel by misusing this interface.

Yes, I don't think up a scenario except dirty tracking.

Indeed, we'd better not make them as a generic interface.

Do you have any suggestion that underlying iommu drivers can share these code but
not make it as a generic iommu interface?

I have a not so good idea. Make the "split" interfaces as a static function, and
transfer the function pointer to start_dirty_log. But it looks weird and inflexible.

On the other hand, if a driver calls map/unmap with split/merge at the same time,
it's a bug of driver, it should follow the rule.

> 
> This also applies to iommu_merge_page().
>

Thanks,
Keqian
Baolu Lu April 19, 2021, 1:33 p.m. UTC | #3
Hi Keqian,

On 2021/4/19 17:32, Keqian Zhu wrote:
>>> +EXPORT_SYMBOL_GPL(iommu_split_block);
>> Do you really have any consumers of this interface other than the dirty
>> bit tracking? If not, I don't suggest to make this as a generic IOMMU
>> interface.
>>
>> There is an implicit requirement for such interfaces. The
>> iommu_map/unmap(iova, size) shouldn't be called at the same time.
>> Currently there's no such sanity check in the iommu core. A poorly
>> written driver could mess up the kernel by misusing this interface.
> Yes, I don't think up a scenario except dirty tracking.
> 
> Indeed, we'd better not make them as a generic interface.
> 
> Do you have any suggestion that underlying iommu drivers can share these code but
> not make it as a generic iommu interface?
> 
> I have a not so good idea. Make the "split" interfaces as a static function, and
> transfer the function pointer to start_dirty_log. But it looks weird and inflexible.

I understand splitting/merging super pages is an optimization, but not a
functional requirement. So is it possible to let the vendor iommu driver
decide whether splitting super pages when starting dirty bit tracking
and the opposite operation during when stopping it? The requirement for
upper layer is that starting/stopping dirty bit tracking and
mapping/unmapping are mutually exclusive.

> 
> On the other hand, if a driver calls map/unmap with split/merge at the same time,
> it's a bug of driver, it should follow the rule.
> 

Best regards,
baolu
zhukeqian April 20, 2021, 1:25 a.m. UTC | #4
Hi Baolu,

On 2021/4/19 21:33, Lu Baolu wrote:
> Hi Keqian,
> 
> On 2021/4/19 17:32, Keqian Zhu wrote:
>>>> +EXPORT_SYMBOL_GPL(iommu_split_block);
>>> Do you really have any consumers of this interface other than the dirty
>>> bit tracking? If not, I don't suggest to make this as a generic IOMMU
>>> interface.
>>>
>>> There is an implicit requirement for such interfaces. The
>>> iommu_map/unmap(iova, size) shouldn't be called at the same time.
>>> Currently there's no such sanity check in the iommu core. A poorly
>>> written driver could mess up the kernel by misusing this interface.
>> Yes, I don't think up a scenario except dirty tracking.
>>
>> Indeed, we'd better not make them as a generic interface.
>>
>> Do you have any suggestion that underlying iommu drivers can share these code but
>> not make it as a generic iommu interface?
>>
>> I have a not so good idea. Make the "split" interfaces as a static function, and
>> transfer the function pointer to start_dirty_log. But it looks weird and inflexible.
> 
> I understand splitting/merging super pages is an optimization, but not a
> functional requirement. So is it possible to let the vendor iommu driver
> decide whether splitting super pages when starting dirty bit tracking
> and the opposite operation during when stopping it? The requirement for
Right. If I understand you correct, actually that is what this series does.
We realized split/merge in IOMMU core layer, but don't force vendor driver to use it.

The problem is that when we expose these interfaces to vendor IOMMU driver, will also
expose them to upper driver.

> upper layer is that starting/stopping dirty bit tracking and
> mapping/unmapping are mutually exclusive.
OK, I will explicitly add the hints. Thanks.

Thanks,
Keqian
> 
>>
>> On the other hand, if a driver calls map/unmap with split/merge at the same time,
>> it's a bug of driver, it should follow the rule.
>>
> 
> Best regards,
> baolu
> .
>
Baolu Lu April 20, 2021, 2:09 a.m. UTC | #5
Hi Keqian,

On 4/20/21 9:25 AM, Keqian Zhu wrote:
> Hi Baolu,
> 
> On 2021/4/19 21:33, Lu Baolu wrote:
>> Hi Keqian,
>>
>> On 2021/4/19 17:32, Keqian Zhu wrote:
>>>>> +EXPORT_SYMBOL_GPL(iommu_split_block);
>>>> Do you really have any consumers of this interface other than the dirty
>>>> bit tracking? If not, I don't suggest to make this as a generic IOMMU
>>>> interface.
>>>>
>>>> There is an implicit requirement for such interfaces. The
>>>> iommu_map/unmap(iova, size) shouldn't be called at the same time.
>>>> Currently there's no such sanity check in the iommu core. A poorly
>>>> written driver could mess up the kernel by misusing this interface.
>>> Yes, I don't think up a scenario except dirty tracking.
>>>
>>> Indeed, we'd better not make them as a generic interface.
>>>
>>> Do you have any suggestion that underlying iommu drivers can share these code but
>>> not make it as a generic iommu interface?
>>>
>>> I have a not so good idea. Make the "split" interfaces as a static function, and
>>> transfer the function pointer to start_dirty_log. But it looks weird and inflexible.
>>
>> I understand splitting/merging super pages is an optimization, but not a
>> functional requirement. So is it possible to let the vendor iommu driver
>> decide whether splitting super pages when starting dirty bit tracking
>> and the opposite operation during when stopping it? The requirement for
> Right. If I understand you correct, actually that is what this series does.

I mean to say no generic APIs, jut do it by the iommu subsystem itself.
It's totally transparent to the upper level, just like what map() does.
The upper layer doesn't care about either super page or small page is
in use when do a mapping, right?

If you want to consolidate some code, how about putting them in
start/stop_tracking()?

Best regards,
baolu

> We realized split/merge in IOMMU core layer, but don't force vendor driver to use it.
> 
> The problem is that when we expose these interfaces to vendor IOMMU driver, will also
> expose them to upper driver.
> 
>> upper layer is that starting/stopping dirty bit tracking and
>> mapping/unmapping are mutually exclusive.
> OK, I will explicitly add the hints. Thanks.
> 
> Thanks,
> Keqian
>>
>>>
>>> On the other hand, if a driver calls map/unmap with split/merge at the same time,
>>> it's a bug of driver, it should follow the rule.
>>>
>>
>> Best regards,
>> baolu
>> .
>>
zhukeqian April 20, 2021, 7:32 a.m. UTC | #6
Hi Baolu,

Cheers for the your quick reply.

On 2021/4/20 10:09, Lu Baolu wrote:
> Hi Keqian,
> 
> On 4/20/21 9:25 AM, Keqian Zhu wrote:
>> Hi Baolu,
>>
>> On 2021/4/19 21:33, Lu Baolu wrote:
>>> Hi Keqian,
>>>
>>> On 2021/4/19 17:32, Keqian Zhu wrote:
>>>>>> +EXPORT_SYMBOL_GPL(iommu_split_block);
>>>>> Do you really have any consumers of this interface other than the dirty
>>>>> bit tracking? If not, I don't suggest to make this as a generic IOMMU
>>>>> interface.
>>>>>
>>>>> There is an implicit requirement for such interfaces. The
>>>>> iommu_map/unmap(iova, size) shouldn't be called at the same time.
>>>>> Currently there's no such sanity check in the iommu core. A poorly
>>>>> written driver could mess up the kernel by misusing this interface.
>>>> Yes, I don't think up a scenario except dirty tracking.
>>>>
>>>> Indeed, we'd better not make them as a generic interface.
>>>>
>>>> Do you have any suggestion that underlying iommu drivers can share these code but
>>>> not make it as a generic iommu interface?
>>>>
>>>> I have a not so good idea. Make the "split" interfaces as a static function, and
>>>> transfer the function pointer to start_dirty_log. But it looks weird and inflexible.
>>>
>>> I understand splitting/merging super pages is an optimization, but not a
>>> functional requirement. So is it possible to let the vendor iommu driver
>>> decide whether splitting super pages when starting dirty bit tracking
>>> and the opposite operation during when stopping it? The requirement for
>> Right. If I understand you correct, actually that is what this series does.
> 
> I mean to say no generic APIs, jut do it by the iommu subsystem itself.
> It's totally transparent to the upper level, just like what map() does.
> The upper layer doesn't care about either super page or small page is
> in use when do a mapping, right?
> 
> If you want to consolidate some code, how about putting them in
> start/stop_tracking()?

Yep, this reminds me. What we want to reuse is the logic of "chunk by chunk" in split().
We can implement switch_dirty_log to be "chunk by chunk" too (just the same as sync/clear),
then the vendor iommu driver can invoke it's own private implementation of split().
So we can completely remove split() in the IOMMU core layer.

example code logic

iommu.c:
switch_dirty_log(big range) {
    for_each_iommu_page(big range) {
          ops->switch_dirty_log(iommu_pgsize)
    }
}

vendor iommu driver:
switch_dirty_log(iommu_pgsize) {

    if (enable) {
        ops->split_block(iommu_pgsize)
        /* And other actions, such as enable hardware capability */
    } else {
        for_each_continuous_physical_address(iommu_pgsize)
            ops->merge_page()
    }
}

Besides, vendor iommu driver can invoke split() in clear_dirty_log instead of in switch_dirty_log.
The benefit is that we usually clear dirty log gradually during dirty tracking, then we can split
large page mapping gradually, which speedup start_dirty_log and make less side effect on DMA performance.

Does it looks good for you?

Thanks,
Keqian
Baolu Lu April 20, 2021, 7:53 a.m. UTC | #7
On 4/20/21 3:32 PM, Keqian Zhu wrote:
> Hi Baolu,
> 
> Cheers for the your quick reply.
> 
> On 2021/4/20 10:09, Lu Baolu wrote:
>> Hi Keqian,
>>
>> On 4/20/21 9:25 AM, Keqian Zhu wrote:
>>> Hi Baolu,
>>>
>>> On 2021/4/19 21:33, Lu Baolu wrote:
>>>> Hi Keqian,
>>>>
>>>> On 2021/4/19 17:32, Keqian Zhu wrote:
>>>>>>> +EXPORT_SYMBOL_GPL(iommu_split_block);
>>>>>> Do you really have any consumers of this interface other than the dirty
>>>>>> bit tracking? If not, I don't suggest to make this as a generic IOMMU
>>>>>> interface.
>>>>>>
>>>>>> There is an implicit requirement for such interfaces. The
>>>>>> iommu_map/unmap(iova, size) shouldn't be called at the same time.
>>>>>> Currently there's no such sanity check in the iommu core. A poorly
>>>>>> written driver could mess up the kernel by misusing this interface.
>>>>> Yes, I don't think up a scenario except dirty tracking.
>>>>>
>>>>> Indeed, we'd better not make them as a generic interface.
>>>>>
>>>>> Do you have any suggestion that underlying iommu drivers can share these code but
>>>>> not make it as a generic iommu interface?
>>>>>
>>>>> I have a not so good idea. Make the "split" interfaces as a static function, and
>>>>> transfer the function pointer to start_dirty_log. But it looks weird and inflexible.
>>>>
>>>> I understand splitting/merging super pages is an optimization, but not a
>>>> functional requirement. So is it possible to let the vendor iommu driver
>>>> decide whether splitting super pages when starting dirty bit tracking
>>>> and the opposite operation during when stopping it? The requirement for
>>> Right. If I understand you correct, actually that is what this series does.
>>
>> I mean to say no generic APIs, jut do it by the iommu subsystem itself.
>> It's totally transparent to the upper level, just like what map() does.
>> The upper layer doesn't care about either super page or small page is
>> in use when do a mapping, right?
>>
>> If you want to consolidate some code, how about putting them in
>> start/stop_tracking()?
> 
> Yep, this reminds me. What we want to reuse is the logic of "chunk by chunk" in split().
> We can implement switch_dirty_log to be "chunk by chunk" too (just the same as sync/clear),
> then the vendor iommu driver can invoke it's own private implementation of split().
> So we can completely remove split() in the IOMMU core layer.
> 
> example code logic
> 
> iommu.c:
> switch_dirty_log(big range) {
>      for_each_iommu_page(big range) {
>            ops->switch_dirty_log(iommu_pgsize)
>      }
> }
> 
> vendor iommu driver:
> switch_dirty_log(iommu_pgsize) {
> 
>      if (enable) {
>          ops->split_block(iommu_pgsize)
>          /* And other actions, such as enable hardware capability */
>      } else {
>          for_each_continuous_physical_address(iommu_pgsize)
>              ops->merge_page()
>      }
> }
> 
> Besides, vendor iommu driver can invoke split() in clear_dirty_log instead of in switch_dirty_log.
> The benefit is that we usually clear dirty log gradually during dirty tracking, then we can split
> large page mapping gradually, which speedup start_dirty_log and make less side effect on DMA performance.
> 
> Does it looks good for you?

Yes. It's clearer now.

> 
> Thanks,
> Keqian
> 

Best regards,
baolu
diff mbox series

Patch

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 667b2d6d2fc0..bb413a927870 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2721,6 +2721,47 @@  int iommu_domain_set_attr(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
 
+int iommu_split_block(struct iommu_domain *domain, unsigned long iova,
+		      size_t size)
+{
+	const struct iommu_ops *ops = domain->ops;
+	unsigned int min_pagesz;
+	size_t pgsize;
+	bool flush = false;
+	int ret = 0;
+
+	if (unlikely(!ops || !ops->split_block))
+		return -ENODEV;
+
+	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+	if (!IS_ALIGNED(iova | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+		       iova, size, min_pagesz);
+		return -EINVAL;
+	}
+
+	while (size) {
+		flush = true;
+
+		pgsize = iommu_pgsize(domain, iova, size);
+
+		ret = ops->split_block(domain, iova, pgsize);
+		if (ret)
+			break;
+
+		pr_debug("split handled: iova 0x%lx size 0x%zx\n", iova, pgsize);
+
+		iova += pgsize;
+		size -= pgsize;
+	}
+
+	if (flush)
+		iommu_flush_iotlb_all(domain);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_split_block);
+
 int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
 			   unsigned long iova, size_t size, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7f9ed9f520e2..c6c90ac069e3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -208,6 +208,7 @@  struct iommu_iotlb_gather {
  * @device_group: find iommu group for a particular device
  * @domain_get_attr: Query domain attributes
  * @domain_set_attr: Change domain attributes
+ * @split_block: Split block mapping into page mapping
  * @switch_dirty_log: Perform actions to start|stop dirty log tracking
  * @sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
  * @clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
@@ -267,6 +268,8 @@  struct iommu_ops {
 			       enum iommu_attr attr, void *data);
 
 	/* Track dirty log */
+	int (*split_block)(struct iommu_domain *domain, unsigned long iova,
+			   size_t size);
 	int (*switch_dirty_log)(struct iommu_domain *domain, bool enable,
 				unsigned long iova, size_t size, int prot);
 	int (*sync_dirty_log)(struct iommu_domain *domain,
@@ -529,6 +532,8 @@  extern int iommu_domain_get_attr(struct iommu_domain *domain, enum iommu_attr,
 				 void *data);
 extern int iommu_domain_set_attr(struct iommu_domain *domain, enum iommu_attr,
 				 void *data);
+extern int iommu_split_block(struct iommu_domain *domain, unsigned long iova,
+			     size_t size);
 extern int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
 				  unsigned long iova, size_t size, int prot);
 extern int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
@@ -929,6 +934,12 @@  static inline int iommu_domain_set_attr(struct iommu_domain *domain,
 	return -EINVAL;
 }
 
+static inline int iommu_split_block(struct iommu_domain *domain,
+				    unsigned long iova, size_t size)
+{
+	return -EINVAL;
+}
+
 static inline int iommu_switch_dirty_log(struct iommu_domain *domain,
 					 bool enable, unsigned long iova,
 					 size_t size, int prot)