diff mbox

[1/1] btrfs: Fix NO_SPACE bug caused by delayed-iput

Message ID 1424920823-12176-2-git-send-email-zhaolei@cn.fujitsu.com (mailing list archive)
State Superseded
Headers show

Commit Message

Zhaolei Feb. 26, 2015, 3:20 a.m. UTC
From: Zhao Lei <zhaolei@cn.fujitsu.com>

Steps to reproduce:
  while true; do
    dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
    rm /btrfs_dir/file
    sync
  done

  And we'll see dd failed because btrfs return NO_SPACE.

Reason:
  Normally, btrfs_commit_transaction() call btrfs_run_delayed_iputs()
  in end to free fs space for next write, but sometimes it hadn't
  done work on time, because btrfs-cleaner thread get delayed-iputs
  from list before, but do iput() after next write.

  This is log:
  [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin

  [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
  [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
  [ 2569.087554] comm=sync func=btrfs_commit_transaction() end

  [ 2569.191081] comm=dd begin
  [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28

  [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
  [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
  ...
  [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
  [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end

Fix:
  Make btrfs_commit_transaction() wait current running delayed-iputs()
  done in end.

Test:
  Use script similar to above(more complex),
  before patch:
    7 failed in 100 * 20 loop.
  after patch:
    0 failed in 100 * 20 loop.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       | 1 +
 fs/btrfs/disk-io.c     | 3 ++-
 fs/btrfs/inode.c       | 4 ++++
 fs/btrfs/transaction.c | 6 +++++-
 4 files changed, 12 insertions(+), 2 deletions(-)

Comments

David Sterba Feb. 26, 2015, 10:29 p.m. UTC | #1
On Thu, Feb 26, 2015 at 11:20:23AM +0800, Zhaolei wrote:
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -2068,8 +2068,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>  
>  	kmem_cache_free(btrfs_trans_handle_cachep, trans);
>  
> -	if (current != root->fs_info->transaction_kthread)
> +	if (current != root->fs_info->transaction_kthread) {
>  		btrfs_run_delayed_iputs(root);

Using a mutex for this kind of synchronization is not entirely correct
though it works.

> +		/* make sure that all running delayed iput are done */
> +		down_write(&root->fs_info->delayed_iput_sem);
> +		up_write(&root->fs_info->delayed_iput_sem);
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhaolei Feb. 27, 2015, 1:21 a.m. UTC | #2
Hi, David Sterba

* From: David Sterba [mailto:dsterba@suse.cz]
> Sent: Friday, February 27, 2015 6:29 AM
> To: Zhaolei
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 1/1] btrfs: Fix NO_SPACE bug caused by delayed-iput
> 
> On Thu, Feb 26, 2015 at 11:20:23AM +0800, Zhaolei wrote:
> > --- a/fs/btrfs/transaction.c
> > +++ b/fs/btrfs/transaction.c
> > @@ -2068,8 +2068,12 @@ int btrfs_commit_transaction(struct
> > btrfs_trans_handle *trans,
> >
> >  	kmem_cache_free(btrfs_trans_handle_cachep, trans);
> >
> > -	if (current != root->fs_info->transaction_kthread)
> > +	if (current != root->fs_info->transaction_kthread) {
> >  		btrfs_run_delayed_iputs(root);
> 
> Using a mutex for this kind of synchronization is not entirely correct though it
> works.
> 
rw sem is best way I thought which was choosed from 
mutex, rw_sem, and wait_queue, although the function name is not
self documented...
Do you have some suggestion for this kind of synchronization?

Thanks
Zhaolei

> > +		/* make sure that all running delayed iput are done */
> > +		down_write(&root->fs_info->delayed_iput_sem);
> > +		up_write(&root->fs_info->delayed_iput_sem);


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba March 18, 2015, 3:59 p.m. UTC | #3
On Fri, Feb 27, 2015 at 09:21:43AM +0800, Zhao Lei wrote:
> Hi, David Sterba
> 
> * From: David Sterba [mailto:dsterba@suse.cz]
> > Sent: Friday, February 27, 2015 6:29 AM
> > To: Zhaolei
> > Cc: linux-btrfs@vger.kernel.org
> > Subject: Re: [PATCH 1/1] btrfs: Fix NO_SPACE bug caused by delayed-iput
> > 
> > On Thu, Feb 26, 2015 at 11:20:23AM +0800, Zhaolei wrote:
> > > --- a/fs/btrfs/transaction.c
> > > +++ b/fs/btrfs/transaction.c
> > > @@ -2068,8 +2068,12 @@ int btrfs_commit_transaction(struct
> > > btrfs_trans_handle *trans,
> > >
> > >  	kmem_cache_free(btrfs_trans_handle_cachep, trans);
> > >
> > > -	if (current != root->fs_info->transaction_kthread)
> > > +	if (current != root->fs_info->transaction_kthread) {
> > >  		btrfs_run_delayed_iputs(root);
> > 
> > Using a mutex for this kind of synchronization is not entirely correct though it
> > works.
> > 
> rw sem is best way I thought which was choosed from 
> mutex, rw_sem, and wait_queue, although the function name is not
> self documented...
> Do you have some suggestion for this kind of synchronization?

No better ideas at the moment, please proceed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 84c3b00..ec2dac0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1513,6 +1513,7 @@  struct btrfs_fs_info {
 
 	spinlock_t delayed_iput_lock;
 	struct list_head delayed_iputs;
+	struct rw_semaphore delayed_iput_sem;
 
 	/* this protects tree_mod_seq_list */
 	spinlock_t tree_mod_seq_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f79f385..1c0d8ec 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2241,11 +2241,12 @@  int open_ctree(struct super_block *sb,
 	spin_lock_init(&fs_info->qgroup_op_lock);
 	spin_lock_init(&fs_info->buffer_lock);
 	spin_lock_init(&fs_info->unused_bgs_lock);
-	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	rwlock_init(&fs_info->tree_mod_log_lock);
+	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
 	seqlock_init(&fs_info->profiles_lock);
+	init_rwsem(&fs_info->delayed_iput_sem);
 
 	init_completion(&fs_info->kobj_unregister);
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a85c23d..a396bb9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3087,6 +3087,8 @@  void btrfs_run_delayed_iputs(struct btrfs_root *root)
 	if (empty)
 		return;
 
+	down_read(&fs_info->delayed_iput_sem);
+
 	spin_lock(&fs_info->delayed_iput_lock);
 	list_splice_init(&fs_info->delayed_iputs, &list);
 	spin_unlock(&fs_info->delayed_iput_lock);
@@ -3097,6 +3099,8 @@  void btrfs_run_delayed_iputs(struct btrfs_root *root)
 		iput(delayed->inode);
 		kfree(delayed);
 	}
+
+	up_read(&fs_info->delayed_iput_sem);
 }
 
 /*
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 7e80f32..175cdef 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2068,8 +2068,12 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
 
-	if (current != root->fs_info->transaction_kthread)
+	if (current != root->fs_info->transaction_kthread) {
 		btrfs_run_delayed_iputs(root);
+		/* make sure that all running delayed iput are done */
+		down_write(&root->fs_info->delayed_iput_sem);
+		up_write(&root->fs_info->delayed_iput_sem);
+	}
 
 	return ret;