
[v3,5/5] udmabuf: remove udmabuf_folio

Message ID 20240813090518.3252469-6-link@vivo.com (mailing list archive)
State New, archived
Series udmbuf bug fix and some improvements

Commit Message

Huan Yang Aug. 13, 2024, 9:05 a.m. UTC
Currently, udmabuf handles folios by keeping an unpin list that records
every folio obtained during creation and unpins them on release.
Supporting this approach requires an extra data structure
(struct udmabuf_folio) for each pinned folio.

However, maintaining these list nodes costs a significant amount of
memory, and traversing a linked list is a substantial overhead that is
not friendly to the CPU cache.

During creation we already fill the folios array in pin order and set
the offsets according to the page count, so the unpin list is not needed
at release time. Instead, we can iterate over the folios array on
release and unpin a folio only when it differs from the previously
visited entry.

This not only removes the overhead of the udmabuf_folio data structure
but also makes the release path a cache-friendly array walk.
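
As an illustration (folio names and sizes made up for this example):
with two pinned folios F0 and F1, of which 4 pages of F0 and 2 pages of
F1 are used, the arrays end up as

  folios[]  = { F0,  F0,  F0,  F0,  F1,  F1 }
  offsets[] = {  0,  4K,  8K, 12K,   0,  4K }

so the release walk unpins F0 at index 0 and F1 at index 4, skipping
every entry that merely repeats the previous folio.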

Signed-off-by: Huan Yang <link@vivo.com>
---
 drivers/dma-buf/udmabuf.c | 68 +++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 38 deletions(-)

Comments

kernel test robot Aug. 16, 2024, 12:54 p.m. UTC | #1
Hi Huan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 033a4691702cdca3a613256b0623b8eeacb4985e]

url:    https://github.com/intel-lab-lkp/linux/commits/Huan-Yang/udmabuf-cancel-mmap-page-fault-direct-map-it/20240814-231504
base:   033a4691702cdca3a613256b0623b8eeacb4985e
patch link:    https://lore.kernel.org/r/20240813090518.3252469-6-link%40vivo.com
patch subject: [PATCH v3 5/5] udmabuf: remove udmabuf_folio
config: s390-allyesconfig (https://download.01.org/0day-ci/archive/20240816/202408162012.cL9pnFSm-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240816/202408162012.cL9pnFSm-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408162012.cL9pnFSm-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/dma-buf/udmabuf.c:175: warning: Function parameter or struct member 'ubuf' not described in 'unpin_all_folios'


vim +175 drivers/dma-buf/udmabuf.c

17a7ce20349045 Gurchetan Singh 2019-12-02  165  
d934739404652b Huan Yang       2024-08-13  166  /**
d934739404652b Huan Yang       2024-08-13  167   * unpin_all_folios:		unpin each folio we pinned in create
d934739404652b Huan Yang       2024-08-13  168   * The udmabuf set all folio in folios and pinned it, but for large folio,
d934739404652b Huan Yang       2024-08-13  169   * We may have only used a small portion of the physical in the folio.
d934739404652b Huan Yang       2024-08-13  170   * we will repeatedly, sequentially set the folio into the array to ensure
d934739404652b Huan Yang       2024-08-13  171   * that the offset can index the correct folio at the corresponding index.
d934739404652b Huan Yang       2024-08-13  172   * Hence, we only need to unpin the first iterred folio.
d934739404652b Huan Yang       2024-08-13  173   */
d934739404652b Huan Yang       2024-08-13  174  static void unpin_all_folios(struct udmabuf *ubuf)
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23 @175  {
d934739404652b Huan Yang       2024-08-13  176  	pgoff_t pg;
d934739404652b Huan Yang       2024-08-13  177  	struct folio *last = NULL;
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23  178  
d934739404652b Huan Yang       2024-08-13  179  	for (pg = 0; pg < ubuf->pagecount; pg++) {
d934739404652b Huan Yang       2024-08-13  180  		struct folio *tmp = ubuf->folios[pg];
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23  181  
d934739404652b Huan Yang       2024-08-13  182  		if (tmp == last)
d934739404652b Huan Yang       2024-08-13  183  			continue;
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23  184  
d934739404652b Huan Yang       2024-08-13  185  		last = tmp;
d934739404652b Huan Yang       2024-08-13  186  		unpin_folio(tmp);
d934739404652b Huan Yang       2024-08-13  187  	}
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23  188  }
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23  189
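
The warning points at the kernel-doc header of unpin_all_folios() shown
above, which never describes the 'ubuf' parameter; a minimal fix (a
sketch against the v3 code, not something posted in this series) would
be a header like:

/**
 * unpin_all_folios - unpin each folio pinned at creation time
 * @ubuf: the udmabuf whose folios are to be unpinned
 *
 * The folios array may hold the same folio in several consecutive slots
 * (one slot per PAGE_SIZE chunk of the buffer), so only the first entry
 * of each run is unpinned.
 */
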
Vivek Kasireddy Aug. 17, 2024, 1:05 a.m. UTC | #2
Hi Huan,

> 
> Currently, udmabuf handles folios by keeping an unpin list that records
> every folio obtained during creation and unpins them on release.
> Supporting this approach requires an extra data structure
> (struct udmabuf_folio) for each pinned folio.
>
> However, maintaining these list nodes costs a significant amount of
> memory, and traversing a linked list is a substantial overhead that is
> not friendly to the CPU cache.
>
> During creation we already fill the folios array in pin order and set
> the offsets according to the page count, so the unpin list is not needed
> at release time. Instead, we can iterate over the folios array on
> release and unpin a folio only when it differs from the previously
> visited entry.
No, that won't work because iterating the folios array doesn't tell you
anything about how many times a folio was pinned (via memfd_pin_folios()),
as a folio could be part of multiple ranges.

For example, if userspace provides ranges 64..128 and 256..512 (assuming
these are 4k sized subpage offsets and we have a 2MB hugetlb folio), then
the same folio would cover both ranges and there would be 2 entries for
this folio in unpin_list. But, with your logic, we'd be erroneously unpinning
it only once.

Not sure if there are any great solutions available to address this situation,
but one option I can think of is to convert unpin_list to unpin array (dynamically
resized with krealloc?) and track its length separately. Or, as suggested earlier,
another way is to not use unpin_list for memfds backed by shmem, but I suspect
this may not work if THP is enabled.
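
A rough sketch of that first option, assuming struct udmabuf grows two
fields, struct folio **unpin_array and pgoff_t nr_unpin (made-up names),
in place of unpin_list:

static int add_to_unpin_array(struct udmabuf *ubuf, struct folio *folio)
{
        struct folio **tmp;

        /* grow by one entry for simplicity; could grow in bigger steps */
        tmp = krealloc_array(ubuf->unpin_array, ubuf->nr_unpin + 1,
                             sizeof(*tmp), GFP_KERNEL);
        if (!tmp)
                return -ENOMEM;

        ubuf->unpin_array = tmp;
        ubuf->unpin_array[ubuf->nr_unpin++] = folio;
        return 0;
}

static void unpin_all_folios(struct udmabuf *ubuf)
{
        /* one unpin per create-time pin, matching what unpin_list guarantees today */
        unpin_folios(ubuf->unpin_array, ubuf->nr_unpin);
        kfree(ubuf->unpin_array);
        ubuf->unpin_array = NULL;
        ubuf->nr_unpin = 0;
}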

Thanks,
Vivek

> 
> This not only removes the overhead of the udmabuf_folio data structure
> but also makes the release path a cache-friendly array walk.
> 
> Signed-off-by: Huan Yang <link@vivo.com>
> ---
>  drivers/dma-buf/udmabuf.c | 68 +++++++++++++++++----------------------
>  1 file changed, 30 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 8f9cb0e2e71a..1e7f46c33d1a 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -26,16 +26,19 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a
> dmabuf, in megabytes. Default is
> 
>  struct udmabuf {
>  	pgoff_t pagecount;
> -	struct folio **folios;
>  	struct sg_table *sg;
>  	struct miscdevice *device;
> +	struct folio **folios;
> +	/**
> +	 * offset in folios array's folio, byte unit.
> +	 * udmabuf can use either shmem or hugetlb pages, an array based
> on
> +	 * pages may not be suitable.
> +	 * Especially when HVO is enabled, the tail page will be released,
> +	 * so our reference to the page will no longer be correct.
> +	 * Hence, it's necessary to record the offset in order to reference
> +	 * the correct PFN within the folio.
> +	 */
>  	pgoff_t *offsets;
> -	struct list_head unpin_list;
> -};
> -
> -struct udmabuf_folio {
> -	struct folio *folio;
> -	struct list_head list;
>  };
> 
>  static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct
> *vma)
> @@ -160,32 +163,28 @@ static void unmap_udmabuf(struct
> dma_buf_attachment *at,
>  	return put_sg_table(at->dev, sg, direction);
>  }
> 
> -static void unpin_all_folios(struct list_head *unpin_list)
> +/**
> + * unpin_all_folios:		unpin each folio we pinned in create
> + * The udmabuf set all folio in folios and pinned it, but for large folio,
> + * We may have only used a small portion of the physical in the folio.
> + * we will repeatedly, sequentially set the folio into the array to ensure
> + * that the offset can index the correct folio at the corresponding index.
> + * Hence, we only need to unpin the first iterred folio.
> + */
> +static void unpin_all_folios(struct udmabuf *ubuf)
>  {
> -	struct udmabuf_folio *ubuf_folio;
> -
> -	while (!list_empty(unpin_list)) {
> -		ubuf_folio = list_first_entry(unpin_list,
> -					      struct udmabuf_folio, list);
> -		unpin_folio(ubuf_folio->folio);
> -
> -		list_del(&ubuf_folio->list);
> -		kfree(ubuf_folio);
> -	}
> -}
> +	pgoff_t pg;
> +	struct folio *last = NULL;
> 
> -static int add_to_unpin_list(struct list_head *unpin_list,
> -			     struct folio *folio)
> -{
> -	struct udmabuf_folio *ubuf_folio;
> +	for (pg = 0; pg < ubuf->pagecount; pg++) {
> +		struct folio *tmp = ubuf->folios[pg];
> 
> -	ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL);
> -	if (!ubuf_folio)
> -		return -ENOMEM;
> +		if (tmp == last)
> +			continue;
> 
> -	ubuf_folio->folio = folio;
> -	list_add_tail(&ubuf_folio->list, unpin_list);
> -	return 0;
> +		last = tmp;
> +		unpin_folio(tmp);
> +	}
>  }
> 
>  static void release_udmabuf(struct dma_buf *buf)
> @@ -196,7 +195,7 @@ static void release_udmabuf(struct dma_buf *buf)
>  	if (ubuf->sg)
>  		put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
> 
> -	unpin_all_folios(&ubuf->unpin_list);
> +	unpin_all_folios(ubuf);
>  	kvfree(ubuf->offsets);
>  	kvfree(ubuf->folios);
>  	kfree(ubuf);
> @@ -308,7 +307,6 @@ static long udmabuf_create(struct miscdevice
> *device,
>  	if (!ubuf)
>  		return -ENOMEM;
> 
> -	INIT_LIST_HEAD(&ubuf->unpin_list);
>  	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
>  	for (i = 0; i < head->count; i++) {
>  		if (!IS_ALIGNED(list[i].offset, PAGE_SIZE))
> @@ -366,12 +364,6 @@ static long udmabuf_create(struct miscdevice
> *device,
>  			u32 k;
>  			long fsize = folio_size(folios[j]);
> 
> -			ret = add_to_unpin_list(&ubuf->unpin_list, folios[j]);
> -			if (ret < 0) {
> -				kfree(folios);
> -				goto err;
> -			}
> -
>  			for (k = pgoff; k < fsize; k += PAGE_SIZE) {
>  				ubuf->folios[pgbuf] = folios[j];
>  				ubuf->offsets[pgbuf] = k;
> @@ -399,7 +391,7 @@ static long udmabuf_create(struct miscdevice
> *device,
>  err:
>  	if (memfd)
>  		fput(memfd);
> -	unpin_all_folios(&ubuf->unpin_list);
> +	unpin_all_folios(ubuf);
>  	kvfree(ubuf->offsets);
>  	kvfree(ubuf->folios);
>  	kfree(ubuf);
> --
> 2.45.2
Huan Yang Aug. 20, 2024, 1:41 a.m. UTC | #3
On 2024/8/17 9:05, Kasireddy, Vivek wrote:
> Hi Huan,
>
>> Currently, udmabuf handles folios by keeping an unpin list that records
>> every folio obtained during creation and unpins them on release.
>> Supporting this approach requires an extra data structure
>> (struct udmabuf_folio) for each pinned folio.
>>
>> However, maintaining these list nodes costs a significant amount of
>> memory, and traversing a linked list is a substantial overhead that is
>> not friendly to the CPU cache.
>>
>> During creation we already fill the folios array in pin order and set
>> the offsets according to the page count, so the unpin list is not needed
>> at release time. Instead, we can iterate over the folios array on
>> release and unpin a folio only when it differs from the previously
>> visited entry.
> No, that won't work because iterating the folios array doesn't tell you
> anything about how many times a folio was pinned (via memfd_pin_folios()),
> as a folio could be part of multiple ranges.
>
> For example, if userspace provides ranges 64..128 and 256..512 (assuming
> these are 4k sized subpage offsets and we have a 2MB hugetlb folio), then
> the same folio would cover both ranges and there would be 2 entries for
> this folio in unpin_list. But, with your logic, we'd be erroneously unpinning
> it only once.
:( this is more complex than I expected; I misunderstood it, thank you.
>
> Not sure if there are any great solutions available to address this situation,
> but one option I can think of is to convert unpin_list to unpin array (dynamically
> resized with krealloc?) and track its length separately. Or, as suggested earlier,

Maybe just use a folio array (of size pagecount) and record each pinned
folio in it, the same way the unpin list does?

Even if that wastes some memory, array access will be faster than the
list (and the overhead should still be lower than unpin_list).
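
Roughly what I have in mind (only a sketch; unpin_array and nr_unpin are
made-up field names, and error handling is omitted):

        /* allocated in udmabuf_create() next to ubuf->folios and ubuf->offsets */
        ubuf->unpin_array = kvmalloc_array(ubuf->pagecount,
                                           sizeof(*ubuf->unpin_array),
                                           GFP_KERNEL);

        /* in the pin loop: one slot per folio returned by memfd_pin_folios() */
        ubuf->unpin_array[ubuf->nr_unpin++] = folios[j];

        /* on release: still one unpin per create-time pin */
        unpin_folios(ubuf->unpin_array, ubuf->nr_unpin);
        kvfree(ubuf->unpin_array);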

> another way is to not use unpin_list for memfds backed by shmem, but I suspect
> this may not work if THP is enabled.
>
> Thanks,
> Vivek
>
>> This not only removes the overhead of the udmabuf_folio data structure
>> but also makes the release path a cache-friendly array walk.
>>
>> Signed-off-by: Huan Yang <link@vivo.com>
>> ---
>>   drivers/dma-buf/udmabuf.c | 68 +++++++++++++++++----------------------
>>   1 file changed, 30 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
>> index 8f9cb0e2e71a..1e7f46c33d1a 100644
>> --- a/drivers/dma-buf/udmabuf.c
>> +++ b/drivers/dma-buf/udmabuf.c
>> @@ -26,16 +26,19 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a
>> dmabuf, in megabytes. Default is
>>
>>   struct udmabuf {
>>   	pgoff_t pagecount;
>> -	struct folio **folios;
>>   	struct sg_table *sg;
>>   	struct miscdevice *device;
>> +	struct folio **folios;
>> +	/**
>> +	 * offset in folios array's folio, byte unit.
>> +	 * udmabuf can use either shmem or hugetlb pages, an array based
>> on
>> +	 * pages may not be suitable.
>> +	 * Especially when HVO is enabled, the tail page will be released,
>> +	 * so our reference to the page will no longer be correct.
>> +	 * Hence, it's necessary to record the offset in order to reference
>> +	 * the correct PFN within the folio.
>> +	 */
>>   	pgoff_t *offsets;
>> -	struct list_head unpin_list;
>> -};
>> -
>> -struct udmabuf_folio {
>> -	struct folio *folio;
>> -	struct list_head list;
>>   };
>>
>>   static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct
>> *vma)
>> @@ -160,32 +163,28 @@ static void unmap_udmabuf(struct
>> dma_buf_attachment *at,
>>   	return put_sg_table(at->dev, sg, direction);
>>   }
>>
>> -static void unpin_all_folios(struct list_head *unpin_list)
>> +/**
>> + * unpin_all_folios:		unpin each folio we pinned in create
>> + * The udmabuf set all folio in folios and pinned it, but for large folio,
>> + * We may have only used a small portion of the physical in the folio.
>> + * we will repeatedly, sequentially set the folio into the array to ensure
>> + * that the offset can index the correct folio at the corresponding index.
>> + * Hence, we only need to unpin the first iterred folio.
>> + */
>> +static void unpin_all_folios(struct udmabuf *ubuf)
>>   {
>> -	struct udmabuf_folio *ubuf_folio;
>> -
>> -	while (!list_empty(unpin_list)) {
>> -		ubuf_folio = list_first_entry(unpin_list,
>> -					      struct udmabuf_folio, list);
>> -		unpin_folio(ubuf_folio->folio);
>> -
>> -		list_del(&ubuf_folio->list);
>> -		kfree(ubuf_folio);
>> -	}
>> -}
>> +	pgoff_t pg;
>> +	struct folio *last = NULL;
>>
>> -static int add_to_unpin_list(struct list_head *unpin_list,
>> -			     struct folio *folio)
>> -{
>> -	struct udmabuf_folio *ubuf_folio;
>> +	for (pg = 0; pg < ubuf->pagecount; pg++) {
>> +		struct folio *tmp = ubuf->folios[pg];
>>
>> -	ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL);
>> -	if (!ubuf_folio)
>> -		return -ENOMEM;
>> +		if (tmp == last)
>> +			continue;
>>
>> -	ubuf_folio->folio = folio;
>> -	list_add_tail(&ubuf_folio->list, unpin_list);
>> -	return 0;
>> +		last = tmp;
>> +		unpin_folio(tmp);
>> +	}
>>   }
>>
>>   static void release_udmabuf(struct dma_buf *buf)
>> @@ -196,7 +195,7 @@ static void release_udmabuf(struct dma_buf *buf)
>>   	if (ubuf->sg)
>>   		put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
>>
>> -	unpin_all_folios(&ubuf->unpin_list);
>> +	unpin_all_folios(ubuf);
>>   	kvfree(ubuf->offsets);
>>   	kvfree(ubuf->folios);
>>   	kfree(ubuf);
>> @@ -308,7 +307,6 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   	if (!ubuf)
>>   		return -ENOMEM;
>>
>> -	INIT_LIST_HEAD(&ubuf->unpin_list);
>>   	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
>>   	for (i = 0; i < head->count; i++) {
>>   		if (!IS_ALIGNED(list[i].offset, PAGE_SIZE))
>> @@ -366,12 +364,6 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   			u32 k;
>>   			long fsize = folio_size(folios[j]);
>>
>> -			ret = add_to_unpin_list(&ubuf->unpin_list, folios[j]);
>> -			if (ret < 0) {
>> -				kfree(folios);
>> -				goto err;
>> -			}
>> -
>>   			for (k = pgoff; k < fsize; k += PAGE_SIZE) {
>>   				ubuf->folios[pgbuf] = folios[j];
>>   				ubuf->offsets[pgbuf] = k;
>> @@ -399,7 +391,7 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   err:
>>   	if (memfd)
>>   		fput(memfd);
>> -	unpin_all_folios(&ubuf->unpin_list);
>> +	unpin_all_folios(ubuf);
>>   	kvfree(ubuf->offsets);
>>   	kvfree(ubuf->folios);
>>   	kfree(ubuf);
>> --
>> 2.45.2

Patch

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 8f9cb0e2e71a..1e7f46c33d1a 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -26,16 +26,19 @@  MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is
 
 struct udmabuf {
 	pgoff_t pagecount;
-	struct folio **folios;
 	struct sg_table *sg;
 	struct miscdevice *device;
+	struct folio **folios;
+	/**
+	 * offset in folios array's folio, byte unit.
+	 * udmabuf can use either shmem or hugetlb pages, an array based on
+	 * pages may not be suitable.
+	 * Especially when HVO is enabled, the tail page will be released,
+	 * so our reference to the page will no longer be correct.
+	 * Hence, it's necessary to record the offset in order to reference
+	 * the correct PFN within the folio.
+	 */
 	pgoff_t *offsets;
-	struct list_head unpin_list;
-};
-
-struct udmabuf_folio {
-	struct folio *folio;
-	struct list_head list;
 };
 
 static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
@@ -160,32 +163,28 @@  static void unmap_udmabuf(struct dma_buf_attachment *at,
 	return put_sg_table(at->dev, sg, direction);
 }
 
-static void unpin_all_folios(struct list_head *unpin_list)
+/**
+ * unpin_all_folios:		unpin each folio we pinned in create
+ * The udmabuf set all folio in folios and pinned it, but for large folio,
+ * We may have only used a small portion of the physical in the folio.
+ * we will repeatedly, sequentially set the folio into the array to ensure
+ * that the offset can index the correct folio at the corresponding index.
+ * Hence, we only need to unpin the first iterred folio.
+ */
+static void unpin_all_folios(struct udmabuf *ubuf)
 {
-	struct udmabuf_folio *ubuf_folio;
-
-	while (!list_empty(unpin_list)) {
-		ubuf_folio = list_first_entry(unpin_list,
-					      struct udmabuf_folio, list);
-		unpin_folio(ubuf_folio->folio);
-
-		list_del(&ubuf_folio->list);
-		kfree(ubuf_folio);
-	}
-}
+	pgoff_t pg;
+	struct folio *last = NULL;
 
-static int add_to_unpin_list(struct list_head *unpin_list,
-			     struct folio *folio)
-{
-	struct udmabuf_folio *ubuf_folio;
+	for (pg = 0; pg < ubuf->pagecount; pg++) {
+		struct folio *tmp = ubuf->folios[pg];
 
-	ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL);
-	if (!ubuf_folio)
-		return -ENOMEM;
+		if (tmp == last)
+			continue;
 
-	ubuf_folio->folio = folio;
-	list_add_tail(&ubuf_folio->list, unpin_list);
-	return 0;
+		last = tmp;
+		unpin_folio(tmp);
+	}
 }
 
 static void release_udmabuf(struct dma_buf *buf)
@@ -196,7 +195,7 @@  static void release_udmabuf(struct dma_buf *buf)
 	if (ubuf->sg)
 		put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
 
-	unpin_all_folios(&ubuf->unpin_list);
+	unpin_all_folios(ubuf);
 	kvfree(ubuf->offsets);
 	kvfree(ubuf->folios);
 	kfree(ubuf);
@@ -308,7 +307,6 @@  static long udmabuf_create(struct miscdevice *device,
 	if (!ubuf)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&ubuf->unpin_list);
 	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
 	for (i = 0; i < head->count; i++) {
 		if (!IS_ALIGNED(list[i].offset, PAGE_SIZE))
@@ -366,12 +364,6 @@  static long udmabuf_create(struct miscdevice *device,
 			u32 k;
 			long fsize = folio_size(folios[j]);
 
-			ret = add_to_unpin_list(&ubuf->unpin_list, folios[j]);
-			if (ret < 0) {
-				kfree(folios);
-				goto err;
-			}
-
 			for (k = pgoff; k < fsize; k += PAGE_SIZE) {
 				ubuf->folios[pgbuf] = folios[j];
 				ubuf->offsets[pgbuf] = k;
@@ -399,7 +391,7 @@  static long udmabuf_create(struct miscdevice *device,
 err:
 	if (memfd)
 		fput(memfd);
-	unpin_all_folios(&ubuf->unpin_list);
+	unpin_all_folios(ubuf);
 	kvfree(ubuf->offsets);
 	kvfree(ubuf->folios);
 	kfree(ubuf);