[f2fs-dev,v2] f2fs: New victim selection for GC

Message ID	20231208090444epcms2p33884216391931d04c1771dfb51a08a44@epcms2p3 (mailing list archive)
State	Superseded
Headers	show Return-Path: <linux-f2fs-devel-bounces@lists.sourceforge.net> Mime-Version: 1.0 From: Yonggil Song <yonggil.song@samsung.com> To: "jaegeuk@kernel.org" <jaegeuk@kernel.org>, "chao@kernel.org" <chao@kernel.org>, "linux-f2fs-devel@lists.sourceforge.net" <linux-f2fs-devel@lists.sourceforge.net>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org> Message-ID: <20231208090444epcms2p33884216391931d04c1771dfb51a08a44@epcms2p3> Date: Fri, 08 Dec 2023 18:04:44 +0900 CMS-TYPE: 102P DLP-Filter: Pass References: <CGME20231208090444epcms2p33884216391931d04c1771dfb51a08a44@epcms2p3> Subject: [f2fs-dev] [PATCH v2] f2fs: New victim selection for GC Precedence: list Reply-To: yonggil.song@samsung.com Cc: Seokhwan Kim <sukka.kim@samsung.com>, Siwoo Jung <siu.jung@samsung.com> Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: linux-f2fs-devel-bounces@lists.sourceforge.net
Series	[f2fs-dev,v2] f2fs: New victim selection for GC \| expand [f2fs-dev,v2] f2fs: New victim selection for GC

Message ID

20231208090444epcms2p33884216391931d04c1771dfb51a08a44@epcms2p3 (mailing list archive)

State

Superseded

Headers

Mime-Version: 1.0
From: Yonggil Song <yonggil.song@samsung.com>
To: "jaegeuk@kernel.org" <jaegeuk@kernel.org>, "chao@kernel.org"
 <chao@kernel.org>, "linux-f2fs-devel@lists.sourceforge.net"
 <linux-f2fs-devel@lists.sourceforge.net>, "linux-kernel@vger.kernel.org"
 <linux-kernel@vger.kernel.org>
Message-ID: <20231208090444epcms2p33884216391931d04c1771dfb51a08a44@epcms2p3>
Date: Fri, 08 Dec 2023 18:04:44 +0900
CMS-TYPE: 102P
DLP-Filter: Pass
References: 
 <CGME20231208090444epcms2p33884216391931d04c1771dfb51a08a44@epcms2p3>
Subject: [f2fs-dev] [PATCH v2] f2fs: New victim selection for GC
Precedence: list
Reply-To: yonggil.song@samsung.com
Cc: Seokhwan Kim <sukka.kim@samsung.com>, Siwoo Jung <siu.jung@samsung.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: linux-f2fs-devel-bounces@lists.sourceforge.net

Series

[f2fs-dev,v2] f2fs: New victim selection for GC | expand

Commit Message

Yonggil Song Dec. 8, 2023, 9:04 a.m. UTC

Overview
========

This patch introduces a new way to preference data sections when selecting
GC victims. Migration of data blocks causes invalidation of node blocks.
Therefore, in situations where GC is frequent, selecting data blocks as
victims can reduce unnecessary block migration by invalidating node blocks.
For exceptional situations where free sections are insufficient, node blocks
are selected as victims instead of data blocks to get extra free sections.

Problem
=======

If the total amount of nodes is larger than the size of one section, nodes
occupy multiple sections, and node victims are often selected because the
gc cost is lowered by data block migration in GC. Since moving the data
section causes frequent node victim selection, victim threshing occurs in
the node section. This results in an increase in WAF.

Experiment
==========

Test environment is as follows.

        System info
          - 3.6GHz, 16 core CPU
          - 36GiB Memory
        Device info
          - a conventional null_blk with 228MiB
          - a sequential null_blk with 4068 zones of 8MiB
        Format
          - mkfs.f2fs <conv null_blk> -c <seq null_blk> -m -Z 8 -o 3.89
        Mount
          - mount <conv null_blk> <mount point>
        Fio script
	  - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
	WAF calculation
	  - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs

Conclusion
==========

This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
the data section was selected first when selecting GC victims. This was
achieved by reducing the migration of the node blocks by 69.4%
(253,131,743 blks -> 77,463,278 blks). It is possible to achieve low WAF
performance with the GC victim selection method in environments where the
section size is relatively small.

Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
---
 fs/f2fs/f2fs.h |  1 +
 fs/f2fs/gc.c   | 98 ++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 77 insertions(+), 22 deletions(-)

Comments

Jaegeuk Kim Dec. 11, 2023, 10:47 p.m. UTC | #1

On 12/08, Yonggil Song wrote:
> Overview
> ========
> 
> This patch introduces a new way to preference data sections when selecting
> GC victims. Migration of data blocks causes invalidation of node blocks.
> Therefore, in situations where GC is frequent, selecting data blocks as
> victims can reduce unnecessary block migration by invalidating node blocks.
> For exceptional situations where free sections are insufficient, node blocks
> are selected as victims instead of data blocks to get extra free sections.
> 
> Problem
> =======
> 
> If the total amount of nodes is larger than the size of one section, nodes
> occupy multiple sections, and node victims are often selected because the
> gc cost is lowered by data block migration in GC. Since moving the data
> section causes frequent node victim selection, victim threshing occurs in
> the node section. This results in an increase in WAF.
> 
> Experiment
> ==========
> 
> Test environment is as follows.
> 
>         System info
>           - 3.6GHz, 16 core CPU
>           - 36GiB Memory
>         Device info
>           - a conventional null_blk with 228MiB
>           - a sequential null_blk with 4068 zones of 8MiB
>         Format
>           - mkfs.f2fs <conv null_blk> -c <seq null_blk> -m -Z 8 -o 3.89
>         Mount
>           - mount <conv null_blk> <mount point>
>         Fio script
> 	  - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
> 	WAF calculation
> 	  - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
> 
> Conclusion
> ==========
> 
> This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
> the data section was selected first when selecting GC victims. This was
> achieved by reducing the migration of the node blocks by 69.4%
> (253,131,743 blks -> 77,463,278 blks). It is possible to achieve low WAF
> performance with the GC victim selection method in environments where the
> section size is relatively small.
> 
> Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
> ---
>  fs/f2fs/f2fs.h |  1 +
>  fs/f2fs/gc.c   | 98 ++++++++++++++++++++++++++++++++++++++------------
>  2 files changed, 77 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 9043cedfa12b..578d57f6022f 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1649,6 +1649,7 @@ struct f2fs_sb_info {
>  	struct f2fs_mount_info mount_opt;	/* mount options */
>  
>  	/* for cleaning operations */
> +	bool need_node_clean;			/* need to clean dirty nodes */
>  	struct f2fs_rwsem gc_lock;		/*
>  						 * semaphore for GC, avoid
>  						 * race between GC and GC or CP
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index f550cdeaa663..682dcf0de59e 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -368,6 +368,14 @@ static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi,
>  	if (p->alloc_mode == SSR)
>  		return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
>  
> +	/*
> +	 * If we don't need to clean dirty nodes,
> +	 * we just skip node victims.
> +	 */
> +	if (IS_NODESEG(get_seg_entry(sbi, segno)->type) &&
> +		!sbi->need_node_clean)
> +		return get_max_cost(sbi, p);

How about differentiating the gc cost between data vs. node by adding some
weights? By default, data is preferred, while node is better in the worst case?

> +
>  	/* alloc_mode == LFS */
>  	if (p->gc_mode == GC_GREEDY)
>  		return get_valid_blocks(sbi, segno, true);
> @@ -557,6 +565,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
>  	if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
>  		goto skip;
>  
> +	/*
> +	 * If we don't need to clean dirty nodes,
> +	 * we just skip node victims.
> +	 */
> +	if (IS_NODESEG(get_seg_entry(sbi, ve->segno)->type) &&
> +	    !sbi->need_node_clean)
> +		goto skip;
> +
>  	/* age = 10000 * x% * 60 */
>  	age = div64_u64(accu * (max_mtime - ve->mtime), total_time) *
>  								age_weight;
> @@ -913,7 +929,21 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int *result,
>  		goto retry;
>  	}
>  
> +
>  	if (p.min_segno != NULL_SEGNO) {
> +		if (sbi->need_node_clean &&
> +		    IS_DATASEG(get_seg_entry(sbi, p.min_segno)->type)) {
> +			 /*
> +			  * we need to clean node sections.
> +			  * but, data victim cost is the lowest.
> +			  * if free sections are enough, stop cleaning node victim.
> +			  * if not, it goes on by GCing data victims.
> +			  */
> +			if (has_enough_free_secs(sbi, prefree_segments(sbi), 0)) {
> +				p.min_segno = NULL_SEGNO;
> +				goto out;
> +			}
> +		}
>  got_it:
>  		*result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
>  got_result:
> @@ -1830,8 +1860,27 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>  		goto stop;
>  	}
>  
> +	__get_secs_required(sbi, NULL, &upper_secs, NULL);
> +
> +	/*
> +	 * Write checkpoint to reclaim prefree segments.
> +	 * We need more three extra sections for writer's data/node/dentry.
> +	 */
> +	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS) {
> +		sbi->need_node_clean = true;
> +
> +		if (prefree_segments(sbi)) {
> +			stat_inc_cp_call_count(sbi, TOTAL_CALL);
> +			ret = f2fs_write_checkpoint(sbi, &cpc);
> +			if (ret)
> +				goto stop;
> +			/* Reset due to checkpoint */
> +			sec_freed = 0;
> +		}
> +	}
> +
>  	/* Let's run FG_GC, if we don't have enough space. */
> -	if (has_not_enough_free_secs(sbi, 0, 0)) {
> +	if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) {
>  		gc_type = FG_GC;
>  
>  		/*
> @@ -1858,10 +1907,22 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>  	ret = __get_victim(sbi, &segno, gc_type);
>  	if (ret) {
>  		/* allow to search victim from sections has pinned data */
> -		if (ret == -ENODATA && gc_type == FG_GC &&
> -				f2fs_pinned_section_exists(DIRTY_I(sbi))) {
> -			f2fs_unpin_all_sections(sbi, false);
> -			goto retry;
> +		if (ret == -ENODATA && gc_type == FG_GC) {
> +			if (f2fs_pinned_section_exists(DIRTY_I(sbi))) {
> +				f2fs_unpin_all_sections(sbi, false);
> +				goto retry;
> +			}
> +			/*
> +			 * If we have no more data victims, let's start to
> +			 * clean dirty nodes.
> +			 */
> +			if (!sbi->need_node_clean) {
> +				sbi->need_node_clean = true;
> +				goto retry;
> +			}
> +			/* node cleaning is over */
> +			else if (sbi->need_node_clean)
> +				goto stop;
>  		}
>  		goto stop;
>  	}
> @@ -1882,7 +1943,13 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>  			if (!gc_control->no_bg_gc &&
>  			    total_sec_freed < gc_control->nr_free_secs)
>  				goto go_gc_more;
> -			goto stop;
> +			/*
> +			 * If the need_node_clean flag is set
> +			 * even though there are enough free
> +			 * sections, node cleaning will continue.
> +			 */
> +			if (!sbi->need_node_clean)
> +				goto stop;
>  		}
>  		if (sbi->skipped_gc_rwsem)
>  			skipped_round++;
> @@ -1897,21 +1964,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>  		goto stop;
>  	}
>  
> -	__get_secs_required(sbi, NULL, &upper_secs, NULL);
> -
> -	/*
> -	 * Write checkpoint to reclaim prefree segments.
> -	 * We need more three extra sections for writer's data/node/dentry.
> -	 */
> -	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS &&
> -				prefree_segments(sbi)) {
> -		stat_inc_cp_call_count(sbi, TOTAL_CALL);
> -		ret = f2fs_write_checkpoint(sbi, &cpc);
> -		if (ret)
> -			goto stop;
> -		/* Reset due to checkpoint */
> -		sec_freed = 0;
> -	}
>  go_gc_more:
>  	segno = NULL_SEGNO;
>  	goto gc_more;
> @@ -1920,8 +1972,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>  	SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
>  	SIT_I(sbi)->last_victim[FLUSH_DEVICE] = gc_control->victim_segno;
>  
> -	if (gc_type == FG_GC)
> +	if (gc_type == FG_GC) {
>  		f2fs_unpin_all_sections(sbi, true);
> +		sbi->need_node_clean = false;
> +	}
>  
>  	trace_f2fs_gc_end(sbi->sb, ret, total_freed, total_sec_freed,
>  				get_pages(sbi, F2FS_DIRTY_NODES),
> -- 
> 2.34.1

Yonggil Song Dec. 14, 2023, 2:26 a.m. UTC | #2

> On 12/08, Yonggil Song wrote:
> > Overview
> > ========
> > 
> > This patch introduces a new way to preference data sections when selecting
> > GC victims. Migration of data blocks causes invalidation of node blocks.
> > Therefore, in situations where GC is frequent, selecting data blocks as
> > victims can reduce unnecessary block migration by invalidating node blocks.
> > For exceptional situations where free sections are insufficient, node blocks
> > are selected as victims instead of data blocks to get extra free sections.
> > 
> > Problem
> > =======
> > 
> > If the total amount of nodes is larger than the size of one section, nodes
> > occupy multiple sections, and node victims are often selected because the
> > gc cost is lowered by data block migration in GC. Since moving the data
> > section causes frequent node victim selection, victim threshing occurs in
> > the node section. This results in an increase in WAF.
> > 
> > Experiment
> > ==========
> > 
> > Test environment is as follows.
> > 
> >         System info
> >           - 3.6GHz, 16 core CPU
> >           - 36GiB Memory
> >         Device info
> >           - a conventional null_blk with 228MiB
> >           - a sequential null_blk with 4068 zones of 8MiB
> >         Format
> >           - mkfs.f2fs <conv null_blk> -c <seq null_blk> -m -Z 8 -o 3.89
> >         Mount
> >           - mount <conv null_blk> <mount point>
> >         Fio script
> > 	  - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
> > 	WAF calculation
> > 	  - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
> > 
> > Conclusion
> > ==========
> > 
> > This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
> > the data section was selected first when selecting GC victims. This was
> > achieved by reducing the migration of the node blocks by 69.4%
> > (253,131,743 blks -> 77,463,278 blks). It is possible to achieve low WAF
> > performance with the GC victim selection method in environments where the
> > section size is relatively small.
> > 
> > Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
> > ---
> >  fs/f2fs/f2fs.h |  1 +
> >  fs/f2fs/gc.c   | 98 ++++++++++++++++++++++++++++++++++++++------------
> >  2 files changed, 77 insertions(+), 22 deletions(-)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 9043cedfa12b..578d57f6022f 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1649,6 +1649,7 @@ struct f2fs_sb_info {
> >  	struct f2fs_mount_info mount_opt;	/* mount options */
> >  
> >  	/* for cleaning operations */
> > +	bool need_node_clean;			/* need to clean dirty nodes */
> >  	struct f2fs_rwsem gc_lock;		/*
> >  						 * semaphore for GC, avoid
> >  						 * race between GC and GC or CP
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index f550cdeaa663..682dcf0de59e 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -368,6 +368,14 @@ static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi,
> >  	if (p->alloc_mode == SSR)
> >  		return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
> >  
> > +	/*
> > +	 * If we don't need to clean dirty nodes,
> > +	 * we just skip node victims.
> > +	 */
> > +	if (IS_NODESEG(get_seg_entry(sbi, segno)->type) &&
> > +		!sbi->need_node_clean)
> > +		return get_max_cost(sbi, p);
> 
> How about differentiating the gc cost between data vs. node by adding some
> weights? By default, data is preferred, while node is better in the worst case?
> 

Okay, I will work on v3 with your comments

> > +
> >  	/* alloc_mode == LFS */
> >  	if (p->gc_mode == GC_GREEDY)
> >  		return get_valid_blocks(sbi, segno, true);
> > @@ -557,6 +565,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
> >  	if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
> >  		goto skip;
> >  
> > +	/*
> > +	 * If we don't need to clean dirty nodes,
> > +	 * we just skip node victims.
> > +	 */
> > +	if (IS_NODESEG(get_seg_entry(sbi, ve->segno)->type) &&
> > +	    !sbi->need_node_clean)
> > +		goto skip;
> > +
> >  	/* age = 10000 * x% * 60 */
> >  	age = div64_u64(accu * (max_mtime - ve->mtime), total_time) *
> >  								age_weight;
> > @@ -913,7 +929,21 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int *result,
> >  		goto retry;
> >  	}
> >  
> > +
> >  	if (p.min_segno != NULL_SEGNO) {
> > +		if (sbi->need_node_clean &&
> > +		    IS_DATASEG(get_seg_entry(sbi, p.min_segno)->type)) {
> > +			 /*
> > +			  * we need to clean node sections.
> > +			  * but, data victim cost is the lowest.
> > +			  * if free sections are enough, stop cleaning node victim.
> > +			  * if not, it goes on by GCing data victims.
> > +			  */
> > +			if (has_enough_free_secs(sbi, prefree_segments(sbi), 0)) {
> > +				p.min_segno = NULL_SEGNO;
> > +				goto out;
> > +			}
> > +		}
> >  got_it:
> >  		*result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
> >  got_result:
> > @@ -1830,8 +1860,27 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
> >  		goto stop;
> >  	}
> >  
> > +	__get_secs_required(sbi, NULL, &upper_secs, NULL);
> > +
> > +	/*
> > +	 * Write checkpoint to reclaim prefree segments.
> > +	 * We need more three extra sections for writer's data/node/dentry.
> > +	 */
> > +	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS) {
> > +		sbi->need_node_clean = true;
> > +
> > +		if (prefree_segments(sbi)) {
> > +			stat_inc_cp_call_count(sbi, TOTAL_CALL);
> > +			ret = f2fs_write_checkpoint(sbi, &cpc);
> > +			if (ret)
> > +				goto stop;
> > +			/* Reset due to checkpoint */
> > +			sec_freed = 0;
> > +		}
> > +	}
> > +
> >  	/* Let's run FG_GC, if we don't have enough space. */
> > -	if (has_not_enough_free_secs(sbi, 0, 0)) {
> > +	if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) {
> >  		gc_type = FG_GC;
> >  
> >  		/*
> > @@ -1858,10 +1907,22 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
> >  	ret = __get_victim(sbi, &segno, gc_type);
> >  	if (ret) {
> >  		/* allow to search victim from sections has pinned data */
> > -		if (ret == -ENODATA && gc_type == FG_GC &&
> > -				f2fs_pinned_section_exists(DIRTY_I(sbi))) {
> > -			f2fs_unpin_all_sections(sbi, false);
> > -			goto retry;
> > +		if (ret == -ENODATA && gc_type == FG_GC) {
> > +			if (f2fs_pinned_section_exists(DIRTY_I(sbi))) {
> > +				f2fs_unpin_all_sections(sbi, false);
> > +				goto retry;
> > +			}
> > +			/*
> > +			 * If we have no more data victims, let's start to
> > +			 * clean dirty nodes.
> > +			 */
> > +			if (!sbi->need_node_clean) {
> > +				sbi->need_node_clean = true;
> > +				goto retry;
> > +			}
> > +			/* node cleaning is over */
> > +			else if (sbi->need_node_clean)
> > +				goto stop;
> >  		}
> >  		goto stop;
> >  	}
> > @@ -1882,7 +1943,13 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
> >  			if (!gc_control->no_bg_gc &&
> >  			    total_sec_freed < gc_control->nr_free_secs)
> >  				goto go_gc_more;
> > -			goto stop;
> > +			/*
> > +			 * If the need_node_clean flag is set
> > +			 * even though there are enough free
> > +			 * sections, node cleaning will continue.
> > +			 */
> > +			if (!sbi->need_node_clean)
> > +				goto stop;
> >  		}
> >  		if (sbi->skipped_gc_rwsem)
> >  			skipped_round++;
> > @@ -1897,21 +1964,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
> >  		goto stop;
> >  	}
> >  
> > -	__get_secs_required(sbi, NULL, &upper_secs, NULL);
> > -
> > -	/*
> > -	 * Write checkpoint to reclaim prefree segments.
> > -	 * We need more three extra sections for writer's data/node/dentry.
> > -	 */
> > -	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS &&
> > -				prefree_segments(sbi)) {
> > -		stat_inc_cp_call_count(sbi, TOTAL_CALL);
> > -		ret = f2fs_write_checkpoint(sbi, &cpc);
> > -		if (ret)
> > -			goto stop;
> > -		/* Reset due to checkpoint */
> > -		sec_freed = 0;
> > -	}
> >  go_gc_more:
> >  	segno = NULL_SEGNO;
> >  	goto gc_more;
> > @@ -1920,8 +1972,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
> >  	SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
> >  	SIT_I(sbi)->last_victim[FLUSH_DEVICE] = gc_control->victim_segno;
> >  
> > -	if (gc_type == FG_GC)
> > +	if (gc_type == FG_GC) {
> >  		f2fs_unpin_all_sections(sbi, true);
> > +		sbi->need_node_clean = false;
> > +	}
> >  
> >  	trace_f2fs_gc_end(sbi->sb, ret, total_freed, total_sec_freed,
> >  				get_pages(sbi, F2FS_DIRTY_NODES),
> > -- 
> > 2.34.1

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9043cedfa12b..578d57f6022f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1649,6 +1649,7 @@  struct f2fs_sb_info {
 	struct f2fs_mount_info mount_opt;	/* mount options */
 
 	/* for cleaning operations */
+	bool need_node_clean;			/* need to clean dirty nodes */
 	struct f2fs_rwsem gc_lock;		/*
 						 * semaphore for GC, avoid
 						 * race between GC and GC or CP
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index f550cdeaa663..682dcf0de59e 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -368,6 +368,14 @@  static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi,
 	if (p->alloc_mode == SSR)
 		return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
 
+	/*
+	 * If we don't need to clean dirty nodes,
+	 * we just skip node victims.
+	 */
+	if (IS_NODESEG(get_seg_entry(sbi, segno)->type) &&
+		!sbi->need_node_clean)
+		return get_max_cost(sbi, p);
+
 	/* alloc_mode == LFS */
 	if (p->gc_mode == GC_GREEDY)
 		return get_valid_blocks(sbi, segno, true);
@@ -557,6 +565,14 @@  static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
 	if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
 		goto skip;
 
+	/*
+	 * If we don't need to clean dirty nodes,
+	 * we just skip node victims.
+	 */
+	if (IS_NODESEG(get_seg_entry(sbi, ve->segno)->type) &&
+	    !sbi->need_node_clean)
+		goto skip;
+
 	/* age = 10000 * x% * 60 */
 	age = div64_u64(accu * (max_mtime - ve->mtime), total_time) *
 								age_weight;
@@ -913,7 +929,21 @@  int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int *result,
 		goto retry;
 	}
 
+
 	if (p.min_segno != NULL_SEGNO) {
+		if (sbi->need_node_clean &&
+		    IS_DATASEG(get_seg_entry(sbi, p.min_segno)->type)) {
+			 /*
+			  * we need to clean node sections.
+			  * but, data victim cost is the lowest.
+			  * if free sections are enough, stop cleaning node victim.
+			  * if not, it goes on by GCing data victims.
+			  */
+			if (has_enough_free_secs(sbi, prefree_segments(sbi), 0)) {
+				p.min_segno = NULL_SEGNO;
+				goto out;
+			}
+		}
 got_it:
 		*result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
 got_result:
@@ -1830,8 +1860,27 @@  int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 		goto stop;
 	}
 
+	__get_secs_required(sbi, NULL, &upper_secs, NULL);
+
+	/*
+	 * Write checkpoint to reclaim prefree segments.
+	 * We need more three extra sections for writer's data/node/dentry.
+	 */
+	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS) {
+		sbi->need_node_clean = true;
+
+		if (prefree_segments(sbi)) {
+			stat_inc_cp_call_count(sbi, TOTAL_CALL);
+			ret = f2fs_write_checkpoint(sbi, &cpc);
+			if (ret)
+				goto stop;
+			/* Reset due to checkpoint */
+			sec_freed = 0;
+		}
+	}
+
 	/* Let's run FG_GC, if we don't have enough space. */
-	if (has_not_enough_free_secs(sbi, 0, 0)) {
+	if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) {
 		gc_type = FG_GC;
 
 		/*
@@ -1858,10 +1907,22 @@  int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 	ret = __get_victim(sbi, &segno, gc_type);
 	if (ret) {
 		/* allow to search victim from sections has pinned data */
-		if (ret == -ENODATA && gc_type == FG_GC &&
-				f2fs_pinned_section_exists(DIRTY_I(sbi))) {
-			f2fs_unpin_all_sections(sbi, false);
-			goto retry;
+		if (ret == -ENODATA && gc_type == FG_GC) {
+			if (f2fs_pinned_section_exists(DIRTY_I(sbi))) {
+				f2fs_unpin_all_sections(sbi, false);
+				goto retry;
+			}
+			/*
+			 * If we have no more data victims, let's start to
+			 * clean dirty nodes.
+			 */
+			if (!sbi->need_node_clean) {
+				sbi->need_node_clean = true;
+				goto retry;
+			}
+			/* node cleaning is over */
+			else if (sbi->need_node_clean)
+				goto stop;
 		}
 		goto stop;
 	}
@@ -1882,7 +1943,13 @@  int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 			if (!gc_control->no_bg_gc &&
 			    total_sec_freed < gc_control->nr_free_secs)
 				goto go_gc_more;
-			goto stop;
+			/*
+			 * If the need_node_clean flag is set
+			 * even though there are enough free
+			 * sections, node cleaning will continue.
+			 */
+			if (!sbi->need_node_clean)
+				goto stop;
 		}
 		if (sbi->skipped_gc_rwsem)
 			skipped_round++;
@@ -1897,21 +1964,6 @@  int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 		goto stop;
 	}
 
-	__get_secs_required(sbi, NULL, &upper_secs, NULL);
-
-	/*
-	 * Write checkpoint to reclaim prefree segments.
-	 * We need more three extra sections for writer's data/node/dentry.
-	 */
-	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS &&
-				prefree_segments(sbi)) {
-		stat_inc_cp_call_count(sbi, TOTAL_CALL);
-		ret = f2fs_write_checkpoint(sbi, &cpc);
-		if (ret)
-			goto stop;
-		/* Reset due to checkpoint */
-		sec_freed = 0;
-	}
 go_gc_more:
 	segno = NULL_SEGNO;
 	goto gc_more;
@@ -1920,8 +1972,10 @@  int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 	SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
 	SIT_I(sbi)->last_victim[FLUSH_DEVICE] = gc_control->victim_segno;
 
-	if (gc_type == FG_GC)
+	if (gc_type == FG_GC) {
 		f2fs_unpin_all_sections(sbi, true);
+		sbi->need_node_clean = false;
+	}
 
 	trace_f2fs_gc_end(sbi->sb, ret, total_freed, total_sec_freed,
 				get_pages(sbi, F2FS_DIRTY_NODES),

[f2fs-dev,v2] f2fs: New victim selection for GC

Commit Message

Comments

Patch