[12/12] btrfs: add correction to handle -1 edge case in async discard
diff mbox series

Message ID f00ffdb40462c1dd9b611ee06cf19b2d495e398b.1577999991.git.dennis@kernel.org
State New
Headers show
Series
  • btrfs: async discard follow up
Related show

Commit Message

Dennis Zhou Jan. 2, 2020, 9:26 p.m. UTC
From Dave's testing, it's possible to drive a file system to have -1
discardable_extents and a corresponding negative discardable_bytes. As
btrfs_discard_calc_delay() is the only user of discardable_extents, we
can correct here for any negative discardable_extents/discardable_bytes.

Reported-by: David Sterba <dsterba@suse.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 fs/btrfs/discard.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

Comments

David Sterba Jan. 3, 2020, 2:42 p.m. UTC | #1
On Thu, Jan 02, 2020 at 04:26:46PM -0500, Dennis Zhou wrote:
> >From Dave's testing, it's possible to drive a file system to have -1
> discardable_extents and a corresponding negative discardable_bytes. As
> btrfs_discard_calc_delay() is the only user of discardable_extents, we
> can correct here for any negative discardable_extents/discardable_bytes.

Changelog updated with the rough description of the workload.
Nikolay Borisov Jan. 5, 2020, 8:35 p.m. UTC | #2
On 2.01.20 г. 23:26 ч., Dennis Zhou wrote:
> From Dave's testing, it's possible to drive a file system to have -1
> discardable_extents and a corresponding negative discardable_bytes. As
> btrfs_discard_calc_delay() is the only user of discardable_extents, we
> can correct here for any negative discardable_extents/discardable_bytes.
> 
> Reported-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> ---
>  fs/btrfs/discard.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> index d5a89e3755ed..d2c7851e31de 100644
> --- a/fs/btrfs/discard.c
> +++ b/fs/btrfs/discard.c
> @@ -518,14 +518,32 @@ void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
>  {
>  	s32 discardable_extents =
>  		atomic_read(&discard_ctl->discardable_extents);
> +	s64 discardable_bytes = atomic64_read(&discard_ctl->discardable_bytes);
>  	unsigned iops_limit;
>  	unsigned long delay, lower_limit = BTRFS_DISCARD_MIN_DELAY_MSEC;
>  
> -	if (!discardable_extents)
> -		return;
> -
>  	spin_lock(&discard_ctl->lock);
>  
> +	/*
> +	 * The following is to fix a potential -1 discrepenancy that I'm not
> +	 * sure how to reproduce.  But given that this is the only place that
> +	 * utilizes these numbers and this is only called by from
> +	 * btrfs_finish_extent_commit() which is synchronized, we can correct
> +	 * here.
> +	 */
> +	if (discardable_extents < 0)
> +		atomic_add(-discardable_extents,
> +			   &discard_ctl->discardable_extents);
> +
> +	if (discardable_bytes < 0)
> +		atomic64_add(-discardable_bytes,
> +			     &discard_ctl->discardable_bytes);
> +
> +	if (discardable_extents <= 0) {
> +		spin_unlock(&discard_ctl->lock);
> +		return;
> +	}

Perhaps a WARN_ON for each of those conditions? AFAIU this is papering
over a real issue which is still not fully diagnosed, no? In this case
if someone hits it in the wild they could come back with some stack traces?

> +
>  	iops_limit = READ_ONCE(discard_ctl->iops_limit);
>  	if (iops_limit)
>  		lower_limit = max_t(unsigned long, lower_limit,
>
David Sterba Jan. 6, 2020, 1:44 p.m. UTC | #3
On Sun, Jan 05, 2020 at 10:35:56PM +0200, Nikolay Borisov wrote:
> > +	/*
> > +	 * The following is to fix a potential -1 discrepenancy that I'm not
> > +	 * sure how to reproduce.  But given that this is the only place that
> > +	 * utilizes these numbers and this is only called by from
> > +	 * btrfs_finish_extent_commit() which is synchronized, we can correct
> > +	 * here.
> > +	 */
> > +	if (discardable_extents < 0)
> > +		atomic_add(-discardable_extents,
> > +			   &discard_ctl->discardable_extents);
> > +
> > +	if (discardable_bytes < 0)
> > +		atomic64_add(-discardable_bytes,
> > +			     &discard_ctl->discardable_bytes);
> > +
> > +	if (discardable_extents <= 0) {
> > +		spin_unlock(&discard_ctl->lock);
> > +		return;
> > +	}
> 
> Perhaps a WARN_ON for each of those conditions? AFAIU this is papering
> over a real issue which is still not fully diagnosed, no? In this case
> if someone hits it in the wild they could come back with some stack traces?

I don't think the stacktrace itself would help us, the call chain will
be always the same. We need a reproducer for it and random user reports
would likely not help either, besides that the issue happens. Some sort
of developer-only warning would be desirable though.

Patch
diff mbox series

diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index d5a89e3755ed..d2c7851e31de 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -518,14 +518,32 @@  void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
 {
 	s32 discardable_extents =
 		atomic_read(&discard_ctl->discardable_extents);
+	s64 discardable_bytes = atomic64_read(&discard_ctl->discardable_bytes);
 	unsigned iops_limit;
 	unsigned long delay, lower_limit = BTRFS_DISCARD_MIN_DELAY_MSEC;
 
-	if (!discardable_extents)
-		return;
-
 	spin_lock(&discard_ctl->lock);
 
+	/*
+	 * The following is to fix a potential -1 discrepenancy that I'm not
+	 * sure how to reproduce.  But given that this is the only place that
+	 * utilizes these numbers and this is only called by from
+	 * btrfs_finish_extent_commit() which is synchronized, we can correct
+	 * here.
+	 */
+	if (discardable_extents < 0)
+		atomic_add(-discardable_extents,
+			   &discard_ctl->discardable_extents);
+
+	if (discardable_bytes < 0)
+		atomic64_add(-discardable_bytes,
+			     &discard_ctl->discardable_bytes);
+
+	if (discardable_extents <= 0) {
+		spin_unlock(&discard_ctl->lock);
+		return;
+	}
+
 	iops_limit = READ_ONCE(discard_ctl->iops_limit);
 	if (iops_limit)
 		lower_limit = max_t(unsigned long, lower_limit,