diff mbox

[1/2] btrfs: Fix out-of-space bug caused by searching commit_root in find_free_dev_extent()

Message ID 1423212122-12952-2-git-send-email-zhaolei@cn.fujitsu.com (mailing list archive)
State New, archived
Headers show

Commit Message

Zhaolei Feb. 6, 2015, 8:42 a.m. UTC
From: Zhao Lei <zhaolei@cn.fujitsu.com>

Btrfs will report NO_SPACE when we create and remove files for several times:
1: Create a single-dev btrfs fs with default option
2: Write a file into it to take up most fs space
3: Delete above file
4: Wait about 100s to let chunk removed
5: goto 2

Script is like following:
 #!/bin/bash

 # Recommend 1.2G space, too large disk will make test slow
 DEV="/dev/sda16"
 MNT="/mnt/tmp"

 dev_size="$(lsblk -bn -o SIZE "$DEV")" || exit 2
 file_size_m=$((dev_size * 75 / 100 / 1024 / 1024))

 echo "Loop write ${file_size_m}M file on $((dev_size / 1024 / 1024))M dev"

 for ((i = 0; i < 10; i++)); do umount "$MNT" 2>/dev/null; done
 echo "mkfs $DEV"
 mkfs.btrfs -f "$DEV" >/dev/null || exit 2
 echo "mount $DEV $MNT"
 mount "$DEV" "$MNT" || exit 2

 for ((loop_i = 0; loop_i < 20; loop_i++)); do
     echo
     echo "loop $loop_i"

     echo "dd file..."
     cmd=(dd if=/dev/zero of="$MNT"/file0 bs=1M count="$file_size_m")
     "${cmd[@]}" 2>/dev/null || {
         # NO_SPACE error triggered
         echo "dd failed: ${cmd[*]}"
         exit 1
     }

     echo "rm file..."
     rm -f "$MNT"/file0 || exit 2

     for ((i = 0; i < 10; i++)); do
         df "$MNT" | tail -1
         sleep 10
     done
 done

Reason:
Btrfs get free disk space by calling find_free_dev_extent() when
re-create chunk for write, but current code set search_commit_root
flag in searching tree, so disk spaces which was freed from dev_tree
but not commited can not get from find_free_dev_extent(), then
btrfs report NO_SPACE on above case.

Fix:
Remove "path->search_commit_root = 1" in find_free_dev_extent().

Tested by above script, and confirmed action with many printk.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Filipe Manana Feb. 6, 2015, 9:54 a.m. UTC | #1
On Fri, Feb 6, 2015 at 8:42 AM, Zhaolei <zhaolei@cn.fujitsu.com> wrote:
> From: Zhao Lei <zhaolei@cn.fujitsu.com>
>
> Btrfs will report NO_SPACE when we create and remove files for several times:
> 1: Create a single-dev btrfs fs with default option
> 2: Write a file into it to take up most fs space
> 3: Delete above file
> 4: Wait about 100s to let chunk removed
> 5: goto 2
>
> Script is like following:
>  #!/bin/bash
>
>  # Recommend 1.2G space, too large disk will make test slow
>  DEV="/dev/sda16"
>  MNT="/mnt/tmp"
>
>  dev_size="$(lsblk -bn -o SIZE "$DEV")" || exit 2
>  file_size_m=$((dev_size * 75 / 100 / 1024 / 1024))
>
>  echo "Loop write ${file_size_m}M file on $((dev_size / 1024 / 1024))M dev"
>
>  for ((i = 0; i < 10; i++)); do umount "$MNT" 2>/dev/null; done
>  echo "mkfs $DEV"
>  mkfs.btrfs -f "$DEV" >/dev/null || exit 2
>  echo "mount $DEV $MNT"
>  mount "$DEV" "$MNT" || exit 2
>
>  for ((loop_i = 0; loop_i < 20; loop_i++)); do
>      echo
>      echo "loop $loop_i"
>
>      echo "dd file..."
>      cmd=(dd if=/dev/zero of="$MNT"/file0 bs=1M count="$file_size_m")
>      "${cmd[@]}" 2>/dev/null || {
>          # NO_SPACE error triggered
>          echo "dd failed: ${cmd[*]}"
>          exit 1
>      }
>
>      echo "rm file..."
>      rm -f "$MNT"/file0 || exit 2
>
>      for ((i = 0; i < 10; i++)); do
>          df "$MNT" | tail -1
>          sleep 10
>      done
>  done
>
> Reason:
> Btrfs get free disk space by calling find_free_dev_extent() when
> re-create chunk for write, but current code set search_commit_root
> flag in searching tree, so disk spaces which was freed from dev_tree
> but not commited can not get from find_free_dev_extent(), then
> btrfs report NO_SPACE on above case.

Yes, and that is correct.
If you re-use the same physical space within the same transaction,
write to it and a crash happens before writing the next super block
(transaction commit), the next time the fs is mounted it will have
data or metadata corrupted.

With your change this could happen:

1) Delete metadata block group X, which takes physical space [2G, 3G]
(assuming single metadata profile);

2) A chunk allocation for a data block group happens, and it gets that
same physical space in the range [2G, 3G];

3) Data is written into it - for e.g. due to memory pressure, the VM
calls writepages() for some inodes;

4) Crash/reboot happens before the transaction commits;

5) On the next mount, you have metadata corrupted - there's now file
data instead of btree nodes/leafs in metadata the block group at the
physical range [2G, 3G]

That said, unless I missed something, I disagree with this patch.

>
> Fix:
> Remove "path->search_commit_root = 1" in find_free_dev_extent().
>
> Tested by above script, and confirmed action with many printk.
>
> Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
>
> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
> ---
>  fs/btrfs/volumes.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 50c5a87..5cd0930 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1147,7 +1147,6 @@ again:
>         }
>
>         path->reada = 2;
> -       path->search_commit_root = 1;
>         path->skip_locking = 1;
>
>         key.objectid = device->devid;
> --
> 1.8.5.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhaolei Feb. 6, 2015, 10:50 a.m. UTC | #2
Hi, Filipe

> > Reason:
> > Btrfs get free disk space by calling find_free_dev_extent() when
> > re-create chunk for write, but current code set search_commit_root
> > flag in searching tree, so disk spaces which was freed from dev_tree
> > but not commited can not get from find_free_dev_extent(), then
> > btrfs report NO_SPACE on above case.
> 
> Yes, and that is correct.
> If you re-use the same physical space within the same transaction,
> write to it and a crash happens before writing the next super block
> (transaction commit), the next time the fs is mounted it will have
> data or metadata corrupted.
> 
> With your change this could happen:
> 
> 1) Delete metadata block group X, which takes physical space [2G, 3G]
> (assuming single metadata profile);
> 
> 2) A chunk allocation for a data block group happens, and it gets that
> same physical space in the range [2G, 3G];
> 
> 3) Data is written into it - for e.g. due to memory pressure, the VM
> calls writepages() for some inodes;
> 
> 4) Crash/reboot happens before the transaction commits;
> 
> 5) On the next mount, you have metadata corrupted - there's now file
> data instead of btree nodes/leafs in metadata the block group at the
> physical range [2G, 3G]
> 
Thanks for notice.

Since above NO_SPACE bug is still exist and need to be fixed, maybe we
need to do a commit transaction after remove_bgs.
I'll test this way and resend.

Thanks
Zhaolei

> That said, unless I missed something, I disagree with this patch.
> 
> >
> > Fix:
> > Remove "path->search_commit_root = 1" in find_free_dev_extent().
> >
> > Tested by above script, and confirmed action with many printk.
> >
> > Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
> >
> > Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
> > ---
> >  fs/btrfs/volumes.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 50c5a87..5cd0930 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -1147,7 +1147,6 @@ again:
> >         }
> >
> >         path->reada = 2;
> > -       path->search_commit_root = 1;
> >         path->skip_locking = 1;
> >
> >         key.objectid = device->devid;
> > --
> > 1.8.5.1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> --
> Filipe David Manana,
> 
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 50c5a87..5cd0930 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1147,7 +1147,6 @@  again:
 	}
 
 	path->reada = 2;
-	path->search_commit_root = 1;
 	path->skip_locking = 1;
 
 	key.objectid = device->devid;