f2fs: Reduce zoned block device memory usage

Message ID 20190304070416.13429-1-damien.lemoal@wdc.com (mailing list archive)
State New, archived

Commit Message

Damien Le Moal March 4, 2019, 7:04 a.m. UTC
For zoned block devices, an array of zone types is allocated and
initialized for each device in order to determine if a section is stored
on a sequential zone (zone reset needed) or a conventional zone (no
zone reset needed and regular discard applies). Given this usage, the
zone types stored in memory can be replaced with a bitmap carrying
equivalent information, that is, whether a zone is sequential or not.
This reduces the memory usage for each zoned device by roughly a factor
of 8: on a 14TB disk with zones of 256 MB, the zone type array consumes
13x4KB pages while the bitmap uses only 2x4KB pages.
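
The page counts above can be checked with a little arithmetic. The
following is a userspace sketch, not kernel code. It assumes that "14TB"
means 14 * 10^12 bytes and that a "256 MB" zone is 256 MiB; the helper
names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u
#define BITS_PER_ULONG (8 * sizeof(unsigned long))

/* Zones needed to cover the disk; a partial last zone counts as one. */
static uint64_t zone_count(uint64_t disk_bytes, uint64_t zone_bytes)
{
	assert(zone_bytes != 0);
	return (disk_bytes + zone_bytes - 1) / zone_bytes;
}

/* Old layout: one u8 per zone (the blkz_type array). */
static uint64_t type_array_bytes(uint64_t nr_zones)
{
	return nr_zones;
}

/* New layout: one bit per zone, rounded up to whole unsigned longs. */
static uint64_t seq_bitmap_bytes(uint64_t nr_zones)
{
	uint64_t longs = (nr_zones + BITS_PER_ULONG - 1) / BITS_PER_ULONG;

	return longs * sizeof(unsigned long);
}

/* Number of 4KB pages needed to hold the given byte count. */
static uint64_t pages_for(uint64_t bytes)
{
	return (bytes + PAGE_BYTES - 1) / PAGE_BYTES;
}
```

Under those assumptions the disk has 52155 zones, so the byte array
needs 13 pages and the bitmap (6520 bytes) needs 2.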

This patch changes the f2fs_dev_info structure blkz_type field to the
bitmap blkz_seq. Access to this bitmap is done using the helper
function f2fs_blkz_is_seq(), which is a rewrite of the function
get_blkz_type().
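
As a rough illustration of the lookup that f2fs_blkz_is_seq() performs,
here is a userspace model of a per-device sequential-zone bitmap. The
struct and function names (zone_map, zone_map_is_seq, ...) are
hypothetical, calloc() stands in for the kernel allocator, and the
out-of-range check is an addition that the patch itself does not have.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

struct zone_map {
	unsigned int nr_zones;            /* models nr_blkz */
	unsigned int log_blocks_per_zone; /* models sbi->log_blocks_per_blkz */
	unsigned long *seq;               /* models blkz_seq: 1 bit per zone */
};

/* Allocate a zeroed bitmap: every zone starts out as conventional. */
static int zone_map_init(struct zone_map *zm, unsigned int nr_zones,
			 unsigned int log_blocks_per_zone)
{
	size_t longs = (nr_zones + ULONG_BITS - 1) / ULONG_BITS;

	zm->seq = calloc(longs ? longs : 1, sizeof(unsigned long));
	if (!zm->seq)
		return -1;
	zm->nr_zones = nr_zones;
	zm->log_blocks_per_zone = log_blocks_per_zone;
	return 0;
}

/* Equivalent of set_bit(zno, blkz_seq), without the atomicity. */
static void zone_map_mark_seq(struct zone_map *zm, unsigned int zno)
{
	assert(zno < zm->nr_zones);
	zm->seq[zno / ULONG_BITS] |= 1UL << (zno % ULONG_BITS);
}

/* Map a block address to its zone number, then test that zone's bit. */
static bool zone_map_is_seq(const struct zone_map *zm, uint32_t blkaddr)
{
	unsigned int zno = blkaddr >> zm->log_blocks_per_zone;

	if (zno >= zm->nr_zones) /* not in the patch; added for safety */
		return false;
	return (zm->seq[zno / ULONG_BITS] >> (zno % ULONG_BITS)) & 1UL;
}

static void zone_map_destroy(struct zone_map *zm)
{
	free(zm->seq);
	zm->seq = NULL;
}
```

With 4KB blocks and 256 MiB zones, log_blocks_per_zone would be 16, so
every block address in the range [70 << 16, 71 << 16) maps to zone 70.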

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 fs/f2fs/f2fs.h    | 13 +++++++------
 fs/f2fs/segment.c | 23 +++++++----------------
 fs/f2fs/super.c   | 13 ++++++++-----
 3 files changed, 22 insertions(+), 27 deletions(-)

Comments

Johannes Thumshirn March 4, 2019, 9:46 a.m. UTC | #1
Hi Damien,

On 04/03/2019 08:04, Damien Le Moal wrote:
> @@ -2765,9 +2765,11 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>  	if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
>  		FDEV(devi).nr_blkz++;
>  
> -	FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz,
> -								GFP_KERNEL);
> -	if (!FDEV(devi).blkz_type)
> +	FDEV(devi).blkz_seq = f2fs_kzalloc(sbi,
> +					BITS_TO_LONGS(FDEV(devi).nr_blkz)
> +					* sizeof(unsigned long),
> +					GFP_KERNEL);
> +	if (!FDEV(devi).blkz_seq)
>  		return -ENOMEM;

Not so sure about F2FS internals, but there is a bitmap_zalloc() in the
normal kernel library.

Byte,
	Johannes
Damien Le Moal March 5, 2019, 2:56 a.m. UTC | #2
Johannes,

On 2019/03/04 18:46, Johannes Thumshirn wrote:
> Hi Damien,
> 
> On 04/03/2019 08:04, Damien Le Moal wrote:
>> @@ -2765,9 +2765,11 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>>  	if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
>>  		FDEV(devi).nr_blkz++;
>>  
>> -	FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz,
>> -								GFP_KERNEL);
>> -	if (!FDEV(devi).blkz_type)
>> +	FDEV(devi).blkz_seq = f2fs_kzalloc(sbi,
>> +					BITS_TO_LONGS(FDEV(devi).nr_blkz)
>> +					* sizeof(unsigned long),
>> +					GFP_KERNEL);
>> +	if (!FDEV(devi).blkz_seq)
>>  		return -ENOMEM;
> 
> Not so sure about F2FS internals, but there is a bitmap_zalloc() in the
> normal kernel library.

Yes indeed... f2fs_kzalloc uses f2fs_kmalloc(__GFP_ZERO) and f2fs_kmalloc is
basically kmalloc() or kvmalloc() but with error injection for tests. So I used
that instead of bitmap_zalloc() to preserve the error injection test.
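
To make the point about error injection concrete, here is a minimal
userspace sketch of a zeroing allocator that can be told to fail on
purpose. It is only a model: the fault_ctl structure, the "fail every
Nth call" policy and the function names are assumptions made for the
example, not the actual f2fs fault injection code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a filesystem's fault injection state. */
struct fault_ctl {
	bool enabled;
	unsigned int inject_rate; /* fail every inject_rate-th call */
	unsigned int calls;       /* calls seen since the last failure */
};

/* Decide whether this allocation should be forced to fail. */
static bool should_fail(struct fault_ctl *fc)
{
	if (!fc->enabled || fc->inject_rate == 0)
		return false;
	if (++fc->calls < fc->inject_rate)
		return false;
	fc->calls = 0;
	return true;
}

/*
 * Zeroing allocation routed through the injection hook. A caller that
 * used a plain library allocator instead would bypass should_fail(),
 * and its -ENOMEM path would never be exercised by the tests.
 */
static void *model_kzalloc(struct fault_ctl *fc, size_t size)
{
	if (should_fail(fc))
		return NULL;
	return calloc(1, size);
}
```

With inject_rate set to 3, the first two calls succeed and the third
returns NULL, which is what lets a test reach the caller's error
handling.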

Jaegeuk,

Which do you prefer? The patch as it is, or switching to bitmap_zalloc()?
Either is fine with me; I can send a v2 if you prefer bitmap_zalloc().

Best regards.
Jaegeuk Kim March 5, 2019, 4:55 p.m. UTC | #3
On 03/05, Damien Le Moal wrote:
> Johannes,
> 
> On 2019/03/04 18:46, Johannes Thumshirn wrote:
> > Hi Damien,
> > 
> > On 04/03/2019 08:04, Damien Le Moal wrote:
> >> @@ -2765,9 +2765,11 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> >>  	if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
> >>  		FDEV(devi).nr_blkz++;
> >>  
> >> -	FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz,
> >> -								GFP_KERNEL);
> >> -	if (!FDEV(devi).blkz_type)
> >> +	FDEV(devi).blkz_seq = f2fs_kzalloc(sbi,
> >> +					BITS_TO_LONGS(FDEV(devi).nr_blkz)
> >> +					* sizeof(unsigned long),
> >> +					GFP_KERNEL);
> >> +	if (!FDEV(devi).blkz_seq)
> >>  		return -ENOMEM;
> > 
> > Not so sure about F2FS internals, but there is a bitmap_zalloc() in the
> > normal kernel library.
> 
> Yes indeed... f2fs_kzalloc uses f2fs_kmalloc(__GFP_ZERO) and f2fs_kmalloc is
> basically kmalloc() or kvmalloc() but with error injection for tests. So I used
> that instead of bitmap_zalloc() to preserve the error injection test.
> 
> Jaegeuk,
> 
> Which do you prefer ? The patch as it is or switching to bitmap_zalloc() ?
> Whichever is fine with me so I can send a v2 if you prefer bitmap_zalloc().

Hi Damien,

I think f2fs_kmalloc would be fine for fault injection tests. It seems it'd
be better to write a clean-up patch which replaces all the bitmap allocations
in f2fs with a single f2fs_bitmap_zalloc() at once.

I'll take a look at this.
Thanks,

> 
> Best regards.
> 
> -- 
> Damien Le Moal
> Western Digital Research
Damien Le Moal March 5, 2019, 11:53 p.m. UTC | #4
On 2019/03/06 1:55, Jaegeuk Kim wrote:
> On 03/05, Damien Le Moal wrote:
>> Johannes,
>>
>> On 2019/03/04 18:46, Johannes Thumshirn wrote:
>>> Hi Damien,
>>>
>>> On 04/03/2019 08:04, Damien Le Moal wrote:
>>>> @@ -2765,9 +2765,11 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>>>>  	if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
>>>>  		FDEV(devi).nr_blkz++;
>>>>  
>>>> -	FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz,
>>>> -								GFP_KERNEL);
>>>> -	if (!FDEV(devi).blkz_type)
>>>> +	FDEV(devi).blkz_seq = f2fs_kzalloc(sbi,
>>>> +					BITS_TO_LONGS(FDEV(devi).nr_blkz)
>>>> +					* sizeof(unsigned long),
>>>> +					GFP_KERNEL);
>>>> +	if (!FDEV(devi).blkz_seq)
>>>>  		return -ENOMEM;
>>>
>>> Not so sure about F2FS internals, but there is a bitmap_zalloc() in the
>>> normal kernel library.
>>
>> Yes indeed... f2fs_kzalloc uses f2fs_kmalloc(__GFP_ZERO) and f2fs_kmalloc is
>> basically kmalloc() or kvmalloc() but with error injection for tests. So I used
>> that instead of bitmap_zalloc() to preserve the error injection test.
>>
>> Jaegeuk,
>>
>> Which do you prefer ? The patch as it is or switching to bitmap_zalloc() ?
>> Whichever is fine with me so I can send a v2 if you prefer bitmap_zalloc().
> 
> Hi Damien,
> 
> I think f2fs_kmalloc would be fine for fault injection tests. It seems it'd
> better to write a clean-up patch which replaces all the bitmap allocations
> in f2fs with single f2fs_bitmap_zalloc() at once.

Sounds good to me. So will you take the patch as is? Any comments on it?
This change was tested on a 15TB SMR disk.

> 
> I'll take a look at this.
> Thanks,
> 
>>
>> Best regards.
>>
>> -- 
>> Damien Le Moal
>> Western Digital Research
>
Jaegeuk Kim March 6, 2019, 3:22 a.m. UTC | #5
On 03/04, Damien Le Moal wrote:
> For zoned block devices, an array of zone types for each device is
> allocated and initialized in order to determine if a section is stored
> on a sequential zone (zone reset needed) or a conventional zone (no
> zone reset needed and regular discard applies). Considering this usage,
> the zone types stored in memory can be replaced with a bitmap to
> indicate an equivalent information, that is, if a zone is sequential or
> not. This reduces the memory usage for each zoned device by roughly 8:
> on a 14TB disk with zones of 256 MB, the zone type array consumes
> 13x4KB pages while the bitmap uses only 2x4KB pages.
> 
> This patch changes the f2fs_dev_info structure blkz_type field to the
> bitmap blkz_seq. Access to this bitmap is done using the helper
> function f2fs_blkz_is_seq(), which is a rewrite of the function
> get_blkz_type().
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  fs/f2fs/f2fs.h    | 13 +++++++------
>  fs/f2fs/segment.c | 23 +++++++----------------
>  fs/f2fs/super.c   | 13 ++++++++-----
>  3 files changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 12fabd6735dd..d7b2de930352 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1067,8 +1067,8 @@ struct f2fs_dev_info {
>  	block_t start_blk;
>  	block_t end_blk;
>  #ifdef CONFIG_BLK_DEV_ZONED
> -	unsigned int nr_blkz;			/* Total number of zones */
> -	u8 *blkz_type;				/* Array of zones type */
> +	unsigned int nr_blkz;		/* Total number of zones */
> +	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
>  #endif
>  };
>  
> @@ -3508,16 +3508,17 @@ F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND);
>  F2FS_FEATURE_FUNCS(sb_chksum, SB_CHKSUM);
>  
>  #ifdef CONFIG_BLK_DEV_ZONED
> -static inline int get_blkz_type(struct f2fs_sb_info *sbi,
> -			struct block_device *bdev, block_t blkaddr)
> +static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi,
> +				    struct block_device *bdev, block_t blkaddr)
>  {
>  	unsigned int zno = blkaddr >> sbi->log_blocks_per_blkz;
>  	int i;
>  
>  	for (i = 0; i < sbi->s_ndevs; i++)
>  		if (FDEV(i).bdev == bdev)
> -			return FDEV(i).blkz_type[zno];
> -	return -EINVAL;
> +			return test_bit(zno, FDEV(i).blkz_seq);
> +	WARN_ON_ONCE(1);
> +	return false;
>  }
>  #endif
>  
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 9b79056d705d..65941070776c 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1703,19 +1703,8 @@ static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi,
>  		blkstart -= FDEV(devi).start_blk;
>  	}
>  
> -	/*
> -	 * We need to know the type of the zone: for conventional zones,
> -	 * use regular discard if the drive supports it. For sequential
> -	 * zones, reset the zone write pointer.
> -	 */
> -	switch (get_blkz_type(sbi, bdev, blkstart)) {
> -
> -	case BLK_ZONE_TYPE_CONVENTIONAL:
> -		if (!blk_queue_discard(bdev_get_queue(bdev)))
> -			return 0;
> -		return __queue_discard_cmd(sbi, bdev, lblkstart, blklen);
> -	case BLK_ZONE_TYPE_SEQWRITE_REQ:
> -	case BLK_ZONE_TYPE_SEQWRITE_PREF:
> +	/* For sequential zones, reset the zone write pointer */
> +	if (f2fs_blkz_is_seq(sbi, bdev, blkstart)) {
>  		sector = SECTOR_FROM_BLOCK(blkstart);
>  		nr_sects = SECTOR_FROM_BLOCK(blklen);
>  
> @@ -1730,10 +1719,12 @@ static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi,
>  		trace_f2fs_issue_reset_zone(bdev, blkstart);
>  		return blkdev_reset_zones(bdev, sector,
>  					  nr_sects, GFP_NOFS);
> -	default:
> -		/* Unknown zone type: broken device ? */
> -		return -EIO;
>  	}
> +
> +	 /* For conventional zones, use regular discard if supported */
> +	if (!blk_queue_discard(bdev_get_queue(bdev)))
> +		return 0;
> +	return __queue_discard_cmd(sbi, bdev, lblkstart, blklen);
>  }
>  #endif
>  
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index c46a1d4318d4..44860b4285b9 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1017,7 +1017,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
>  	for (i = 0; i < sbi->s_ndevs; i++) {
>  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
>  #ifdef CONFIG_BLK_DEV_ZONED
> -		kvfree(FDEV(i).blkz_type);
> +		kfree(FDEV(i).blkz_seq);

We need to use kvfree() since f2fs_kzalloc() can do kvmalloc().

Thanks,
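
The reason the free side has to match can be sketched in userspace. The
model below tags each allocation with the backend that served it. The
size threshold, the header trick and all names are assumptions for the
illustration; the real kvmalloc() decides differently, and kvfree()
inspects the address rather than a header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_KMALLOC_LIMIT 4096u /* arbitrary cut-off for the sketch */

enum backend { BACKEND_KMALLOC, BACKEND_VMALLOC };

/* Header stored in front of the payload; the union keeps it aligned. */
union alloc_hdr {
	enum backend backend;
	max_align_t align;
};

/* Zeroed allocation that may be served by either backend. */
static void *model_kvzalloc(size_t size)
{
	union alloc_hdr *h = calloc(1, sizeof(*h) + size);

	if (!h)
		return NULL;
	h->backend = size <= MODEL_KMALLOC_LIMIT ? BACKEND_KMALLOC
						 : BACKEND_VMALLOC;
	return h + 1;
}

static enum backend model_backend_of(const void *p)
{
	return ((const union alloc_hdr *)p - 1)->backend;
}

/* Models kfree(): only correct for the kmalloc backend. */
static bool model_kfree(void *p)
{
	if (!p)
		return true;
	if (model_backend_of(p) != BACKEND_KMALLOC)
		return false; /* wrong free routine for this memory */
	free((union alloc_hdr *)p - 1);
	return true;
}

/* Models kvfree(): correct whichever backend served the request. */
static void model_kvfree(void *p)
{
	if (p)
		free((union alloc_hdr *)p - 1);
}
```

A small bitmap would happen to work with the kfree() model, which is why
the mismatch is easy to miss in testing; only the kvfree() model is
correct for every size.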

>  #endif
>  	}
>  	kvfree(sbi->devs);
> @@ -2765,9 +2765,11 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>  	if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
>  		FDEV(devi).nr_blkz++;
>  
> -	FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz,
> -								GFP_KERNEL);
> -	if (!FDEV(devi).blkz_type)
> +	FDEV(devi).blkz_seq = f2fs_kzalloc(sbi,
> +					BITS_TO_LONGS(FDEV(devi).nr_blkz)
> +					* sizeof(unsigned long),
> +					GFP_KERNEL);
> +	if (!FDEV(devi).blkz_seq)
>  		return -ENOMEM;
>  
>  #define F2FS_REPORT_NR_ZONES   4096
> @@ -2794,7 +2796,8 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>  		}
>  
>  		for (i = 0; i < nr_zones; i++) {
> -			FDEV(devi).blkz_type[n] = zones[i].type;
> +			if (zones[i].type != BLK_ZONE_TYPE_CONVENTIONAL)
> +				set_bit(n, FDEV(devi).blkz_seq);
>  			sector += zones[i].len;
>  			n++;
>  		}
> -- 
> 2.20.1
Patch

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 12fabd6735dd..d7b2de930352 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1067,8 +1067,8 @@  struct f2fs_dev_info {
 	block_t start_blk;
 	block_t end_blk;
 #ifdef CONFIG_BLK_DEV_ZONED
-	unsigned int nr_blkz;			/* Total number of zones */
-	u8 *blkz_type;				/* Array of zones type */
+	unsigned int nr_blkz;		/* Total number of zones */
+	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
 #endif
 };
 
@@ -3508,16 +3508,17 @@  F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND);
 F2FS_FEATURE_FUNCS(sb_chksum, SB_CHKSUM);
 
 #ifdef CONFIG_BLK_DEV_ZONED
-static inline int get_blkz_type(struct f2fs_sb_info *sbi,
-			struct block_device *bdev, block_t blkaddr)
+static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi,
+				    struct block_device *bdev, block_t blkaddr)
 {
 	unsigned int zno = blkaddr >> sbi->log_blocks_per_blkz;
 	int i;
 
 	for (i = 0; i < sbi->s_ndevs; i++)
 		if (FDEV(i).bdev == bdev)
-			return FDEV(i).blkz_type[zno];
-	return -EINVAL;
+			return test_bit(zno, FDEV(i).blkz_seq);
+	WARN_ON_ONCE(1);
+	return false;
 }
 #endif
 
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 9b79056d705d..65941070776c 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1703,19 +1703,8 @@  static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi,
 		blkstart -= FDEV(devi).start_blk;
 	}
 
-	/*
-	 * We need to know the type of the zone: for conventional zones,
-	 * use regular discard if the drive supports it. For sequential
-	 * zones, reset the zone write pointer.
-	 */
-	switch (get_blkz_type(sbi, bdev, blkstart)) {
-
-	case BLK_ZONE_TYPE_CONVENTIONAL:
-		if (!blk_queue_discard(bdev_get_queue(bdev)))
-			return 0;
-		return __queue_discard_cmd(sbi, bdev, lblkstart, blklen);
-	case BLK_ZONE_TYPE_SEQWRITE_REQ:
-	case BLK_ZONE_TYPE_SEQWRITE_PREF:
+	/* For sequential zones, reset the zone write pointer */
+	if (f2fs_blkz_is_seq(sbi, bdev, blkstart)) {
 		sector = SECTOR_FROM_BLOCK(blkstart);
 		nr_sects = SECTOR_FROM_BLOCK(blklen);
 
@@ -1730,10 +1719,12 @@  static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi,
 		trace_f2fs_issue_reset_zone(bdev, blkstart);
 		return blkdev_reset_zones(bdev, sector,
 					  nr_sects, GFP_NOFS);
-	default:
-		/* Unknown zone type: broken device ? */
-		return -EIO;
 	}
+
+	/* For conventional zones, use regular discard if supported */
+	if (!blk_queue_discard(bdev_get_queue(bdev)))
+		return 0;
+	return __queue_discard_cmd(sbi, bdev, lblkstart, blklen);
 }
 #endif
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index c46a1d4318d4..44860b4285b9 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1017,7 +1017,7 @@  static void destroy_device_list(struct f2fs_sb_info *sbi)
 	for (i = 0; i < sbi->s_ndevs; i++) {
 		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
 #ifdef CONFIG_BLK_DEV_ZONED
-		kvfree(FDEV(i).blkz_type);
+		kfree(FDEV(i).blkz_seq);
 #endif
 	}
 	kvfree(sbi->devs);
@@ -2765,9 +2765,11 @@  static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
 	if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
 		FDEV(devi).nr_blkz++;
 
-	FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz,
-								GFP_KERNEL);
-	if (!FDEV(devi).blkz_type)
+	FDEV(devi).blkz_seq = f2fs_kzalloc(sbi,
+					BITS_TO_LONGS(FDEV(devi).nr_blkz)
+					* sizeof(unsigned long),
+					GFP_KERNEL);
+	if (!FDEV(devi).blkz_seq)
 		return -ENOMEM;
 
 #define F2FS_REPORT_NR_ZONES   4096
@@ -2794,7 +2796,8 @@  static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
 		}
 
 		for (i = 0; i < nr_zones; i++) {
-			FDEV(devi).blkz_type[n] = zones[i].type;
+			if (zones[i].type != BLK_ZONE_TYPE_CONVENTIONAL)
+				set_bit(n, FDEV(devi).blkz_seq);
 			sector += zones[i].len;
 			n++;
 		}
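
Taken together, the hunks above reduce the zone type to a single bit at
mount time and then branch on that bit when discarding. The following
userspace sketch models just that decision. The enum names are made up,
the numeric zone type values should be treated as illustrative, and the
checks that sit in the context lines elided between the segment.c hunks
are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative zone types standing in for BLK_ZONE_TYPE_*. */
enum zone_type {
	ZT_CONVENTIONAL = 1,
	ZT_SEQWRITE_REQ = 2,
	ZT_SEQWRITE_PREF = 3,
};

enum discard_action {
	ACTION_RESET_ZONE, /* reset the zone write pointer */
	ACTION_DISCARD,    /* queue a regular discard */
	ACTION_NONE,       /* nothing to do, report success */
};

/* What init_blkz_info() records: anything non-conventional is "seq". */
static bool zone_type_is_seq(enum zone_type type)
{
	return type != ZT_CONVENTIONAL;
}

/* The branch structure of __f2fs_issue_discard_zone() after the patch. */
static enum discard_action pick_discard_action(bool zone_is_seq,
					       bool queue_supports_discard)
{
	if (zone_is_seq)
		return ACTION_RESET_ZONE;
	if (!queue_supports_discard)
		return ACTION_NONE;
	return ACTION_DISCARD;
}
```

One consequence worth noting: the old switch returned -EIO for an
unknown zone type, whereas with one bit per zone any type other than
conventional is treated as sequential.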