diff mbox series

[v2] zram: easy the allocation of zcomp_strm's buffers through vmalloc

Message ID 20240206202511.4799-1-21cnbao@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v2] zram: easy the allocation of zcomp_strm's buffers through vmalloc | expand

Commit Message

Barry Song Feb. 6, 2024, 8:25 p.m. UTC
From: Barry Song <v-songbaohua@oppo.com>

Firstly, there is no need to keep zcomp_strm's buffers contiguous
physically.

Secondly, The recent mTHP project has provided the possibility to
swapout and swapin large folios. Compressing/decompressing large
blocks can hugely decrease CPU consumption and improve compression
ratio. This requires us to make zRAM support the compression and
decompression for large objects.
With the support of large objects in zRAM of our out-of-tree code,
we have observed many allocation failures during CPU hotplug as
large objects need larger buffers. So this change is also more
future-proof once we begin to bring up multiple sizes in zRAM.

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 drivers/block/zram/zcomp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Sergey Senozhatsky Feb. 7, 2024, 1:44 a.m. UTC | #1
On (24/02/07 09:25), Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> Firstly, there is no need to keep zcomp_strm's buffers contiguous
> physically.
> 
> Secondly, The recent mTHP project has provided the possibility to
> swapout and swapin large folios. Compressing/decompressing large
> blocks can hugely decrease CPU consumption and improve compression
> ratio. This requires us to make zRAM support the compression and
> decompression for large objects.
> With the support of large objects in zRAM of our out-of-tree code,
> we have observed many allocation failures during CPU hotplug as
> large objects need larger buffers. So this change is also more
> future-proof once we begin to bring up multiple sizes in zRAM.
> 
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

Note:
Taking it in NOT because of the out-of-tree code (we don't really
do that), but because this is executed from CPU offline/online
paths, which can happen on devices with fragmented memory (a valid
concern IMHO).

Minchan, if you have any objections, please chime in.

> @@ -37,7 +38,7 @@ static void zcomp_strm_free(struct zcomp_strm *zstrm)
>  {
>  	if (!IS_ERR_OR_NULL(zstrm->tfm))
>  		crypto_free_comp(zstrm->tfm);
> -	free_pages((unsigned long)zstrm->buffer, 1);
> +	vfree(zstrm->buffer);
>  	zstrm->tfm = NULL;
>  	zstrm->buffer = NULL;
>  }
> @@ -53,7 +54,7 @@ static int zcomp_strm_init(struct zcomp_strm *zstrm, struct zcomp *comp)
>  	 * allocate 2 pages. 1 for compressed data, plus 1 extra for the
>  	 * case when compressed size is larger than the original one
>  	 */
> -	zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
> +	zstrm->buffer = vzalloc(2 * PAGE_SIZE);
>  	if (IS_ERR_OR_NULL(zstrm->tfm) || !zstrm->buffer) {
>  		zcomp_strm_free(zstrm);
>  		return -ENOMEM;
> -- 
> 2.34.1
>
Jens Axboe Feb. 7, 2024, 2:40 a.m. UTC | #2
On 2/6/24 6:44 PM, Sergey Senozhatsky wrote:
> On (24/02/07 09:25), Barry Song wrote:
>> From: Barry Song <v-songbaohua@oppo.com>
>>
>> Firstly, there is no need to keep zcomp_strm's buffers contiguous
>> physically.
>>
>> Secondly, The recent mTHP project has provided the possibility to
>> swapout and swapin large folios. Compressing/decompressing large
>> blocks can hugely decrease CPU consumption and improve compression
>> ratio. This requires us to make zRAM support the compression and
>> decompression for large objects.
>> With the support of large objects in zRAM of our out-of-tree code,
>> we have observed many allocation failures during CPU hotplug as
>> large objects need larger buffers. So this change is also more
>> future-proof once we begin to bring up multiple sizes in zRAM.
>>
>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> 
> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> 
> Note:
> Taking it in NOT because of the out-of-tree code (we don't really
> do that), but because this is executed from CPU offline/online
> paths, which can happen on devices with fragmented memory (a valid
> concern IMHO).
> 
> Minchan, if you have any objections, please chime in.

Not Minchan, but I do have an issue with the title of the commit, it
doesn't make any sense. Can the maintainer please re-write that to be
something that is appropriate and actually describes what the patch
does?
Sergey Senozhatsky Feb. 7, 2024, 3:14 a.m. UTC | #3
On (24/02/06 19:40), Jens Axboe wrote:
> On 2/6/24 6:44 PM, Sergey Senozhatsky wrote:
> > On (24/02/07 09:25), Barry Song wrote:
> >> From: Barry Song <v-songbaohua@oppo.com>
> >>
> >> Firstly, there is no need to keep zcomp_strm's buffers contiguous
> >> physically.
> >>
> >> Secondly, The recent mTHP project has provided the possibility to
> >> swapout and swapin large folios. Compressing/decompressing large
> >> blocks can hugely decrease CPU consumption and improve compression
> >> ratio. This requires us to make zRAM support the compression and
> >> decompression for large objects.
> >> With the support of large objects in zRAM of our out-of-tree code,
> >> we have observed many allocation failures during CPU hotplug as
> >> large objects need larger buffers. So this change is also more
> >> future-proof once we begin to bring up multiple sizes in zRAM.
> >>
> >> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > 
> > Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> > 
> > Note:
> > Taking it in NOT because of the out-of-tree code (we don't really
> > do that), but because this is executed from CPU offline/online
> > paths, which can happen on devices with fragmented memory (a valid
> > concern IMHO).
> > 
> > Minchan, if you have any objections, please chime in.
> 
> Not Minchan, but I do have an issue with the title of the commit, it
> doesn't make any sense. Can the maintainer please re-write that to be
> something that is appropriate and actually describes what the patch
> does?

Thanks Jens. I fully agree, I requested a better commit message in
v1 feedback, we probably still can improve on this.


Something like this?

---

zram: do not allocate physically contiguous strm buffers

Currently zram allocates 2 physically contigous pages per-CPU's
compression stream (we may have up to 3 streams per-CPU). Since
those buffers are per-CPU we allocate them from CPU hotplug path,
which may have higher risks of failed allocations on devices with
fragmented memory.

Switch to virtually contiguos allocations - crypto comp does not
seem impose requirements on compression working buffers to be
physically contiguous.
Jens Axboe Feb. 7, 2024, 3:17 a.m. UTC | #4
On 2/6/24 8:14 PM, Sergey Senozhatsky wrote:
> On (24/02/06 19:40), Jens Axboe wrote:
>> On 2/6/24 6:44 PM, Sergey Senozhatsky wrote:
>>> On (24/02/07 09:25), Barry Song wrote:
>>>> From: Barry Song <v-songbaohua@oppo.com>
>>>>
>>>> Firstly, there is no need to keep zcomp_strm's buffers contiguous
>>>> physically.
>>>>
>>>> Secondly, The recent mTHP project has provided the possibility to
>>>> swapout and swapin large folios. Compressing/decompressing large
>>>> blocks can hugely decrease CPU consumption and improve compression
>>>> ratio. This requires us to make zRAM support the compression and
>>>> decompression for large objects.
>>>> With the support of large objects in zRAM of our out-of-tree code,
>>>> we have observed many allocation failures during CPU hotplug as
>>>> large objects need larger buffers. So this change is also more
>>>> future-proof once we begin to bring up multiple sizes in zRAM.
>>>>
>>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>>
>>> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
>>>
>>> Note:
>>> Taking it in NOT because of the out-of-tree code (we don't really
>>> do that), but because this is executed from CPU offline/online
>>> paths, which can happen on devices with fragmented memory (a valid
>>> concern IMHO).
>>>
>>> Minchan, if you have any objections, please chime in.
>>
>> Not Minchan, but I do have an issue with the title of the commit, it
>> doesn't make any sense. Can the maintainer please re-write that to be
>> something that is appropriate and actually describes what the patch
>> does?
> 
> Thanks Jens. I fully agree, I requested a better commit message in
> v1 feedback, we probably still can improve on this.
> 
> 
> Something like this?
> 
> ---
> 
> zram: do not allocate physically contiguous strm buffers
> 
> Currently zram allocates 2 physically contigous pages per-CPU's
> compression stream (we may have up to 3 streams per-CPU). Since
> those buffers are per-CPU we allocate them from CPU hotplug path,
> which may have higher risks of failed allocations on devices with
> fragmented memory.
> 
> Switch to virtually contiguos allocations - crypto comp does not
> seem impose requirements on compression working buffers to be
> physically contiguous.

Yep, this is much better! Thanks.
Sergey Senozhatsky Feb. 7, 2024, 3:21 a.m. UTC | #5
On (24/02/06 20:17), Jens Axboe wrote:
[..]
> >>> Minchan, if you have any objections, please chime in.
> >>
> >> Not Minchan, but I do have an issue with the title of the commit, it
> >> doesn't make any sense. Can the maintainer please re-write that to be
> >> something that is appropriate and actually describes what the patch
> >> does?
> > 
> > Thanks Jens. I fully agree, I requested a better commit message in
> > v1 feedback, we probably still can improve on this.
> > 
> > 
> > Something like this?
> > 
> > ---
> > 
> > zram: do not allocate physically contiguous strm buffers
> > 
> > Currently zram allocates 2 physically contigous pages per-CPU's
> > compression stream (we may have up to 3 streams per-CPU). Since
> > those buffers are per-CPU we allocate them from CPU hotplug path,
> > which may have higher risks of failed allocations on devices with
> > fragmented memory.
> > 
> > Switch to virtually contiguos allocations - crypto comp does not
> > seem impose requirements on compression working buffers to be
> > physically contiguous.
> 
> Yep, this is much better! Thanks.

Thanks.

Barry, can you please send v3 with the suggested subject and commit
message?
Sergey Senozhatsky Feb. 7, 2024, 4:36 a.m. UTC | #6
On (24/02/07 12:14), Sergey Senozhatsky wrote:
> ---
> 
> zram: do not allocate physically contiguous strm buffers
> 
> Currently zram allocates 2 physically contigous pages per-CPU's
> compression stream (we may have up to 3 streams per-CPU). Since

Correction:
								^ up to 4

> those buffers are per-CPU we allocate them from CPU hotplug path,
> which may have higher risks of failed allocations on devices with
> fragmented memory.
> 
> Switch to virtually contiguos allocations - crypto comp does not
> seem impose requirements on compression working buffers to be
> physically contiguous.
Barry Song Feb. 7, 2024, 5 a.m. UTC | #7
On Wed, Feb 7, 2024 at 4:21 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/02/06 20:17), Jens Axboe wrote:
> [..]
> > >>> Minchan, if you have any objections, please chime in.
> > >>
> > >> Not Minchan, but I do have an issue with the title of the commit, it
> > >> doesn't make any sense. Can the maintainer please re-write that to be
> > >> something that is appropriate and actually describes what the patch
> > >> does?
> > >
> > > Thanks Jens. I fully agree, I requested a better commit message in
> > > v1 feedback, we probably still can improve on this.
> > >
> > >
> > > Something like this?
> > >
> > > ---
> > >
> > > zram: do not allocate physically contiguous strm buffers
> > >
> > > Currently zram allocates 2 physically contigous pages per-CPU's
> > > compression stream (we may have up to 3 streams per-CPU). Since
> > > those buffers are per-CPU we allocate them from CPU hotplug path,
> > > which may have higher risks of failed allocations on devices with
> > > fragmented memory.
> > >
> > > Switch to virtually contiguos allocations - crypto comp does not
> > > seem impose requirements on compression working buffers to be
> > > physically contiguous.
> >
> > Yep, this is much better! Thanks.
>
> Thanks.
>

Hi Sergey, Jens,

> Barry, can you please send v3 with the suggested subject and commit
> message?

Thanks for your comments and improvements. will send v3 accordingly.

Best regard
Barry
diff mbox series

Patch

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 55af4efd7983..8237b08c49d8 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -11,6 +11,7 @@ 
 #include <linux/sched.h>
 #include <linux/cpu.h>
 #include <linux/crypto.h>
+#include <linux/vmalloc.h>
 
 #include "zcomp.h"
 
@@ -37,7 +38,7 @@  static void zcomp_strm_free(struct zcomp_strm *zstrm)
 {
 	if (!IS_ERR_OR_NULL(zstrm->tfm))
 		crypto_free_comp(zstrm->tfm);
-	free_pages((unsigned long)zstrm->buffer, 1);
+	vfree(zstrm->buffer);
 	zstrm->tfm = NULL;
 	zstrm->buffer = NULL;
 }
@@ -53,7 +54,7 @@  static int zcomp_strm_init(struct zcomp_strm *zstrm, struct zcomp *comp)
 	 * allocate 2 pages. 1 for compressed data, plus 1 extra for the
 	 * case when compressed size is larger than the original one
 	 */
-	zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
+	zstrm->buffer = vzalloc(2 * PAGE_SIZE);
 	if (IS_ERR_OR_NULL(zstrm->tfm) || !zstrm->buffer) {
 		zcomp_strm_free(zstrm);
 		return -ENOMEM;