
xfstests/btrfs: add test for quota groups and drop snapshot

Message ID 20140709224150.GA5484@wotan.suse.de (mailing list archive)
State Not Applicable

Commit Message

Mark Fasheh July 9, 2014, 10:41 p.m. UTC
Test btrfs quota group consistency operations during snapshot delete.  Btrfs
has had long standing issues with drop snapshot failing to properly account
for quota groups. This test crafts a snapshot tree with shared and exclusive
elements. The tree is removed and then quota group consistency is checked.

This issue is fixed by the linux kernel btrfs patch series:
   [PATCH 0/3] btrfs: qgroup fixes for btrfs_drop_snapshot
   [PATCH 1/3] btrfs: add trace for qgroup accounting
   [PATCH 2/3] btrfs: qgroup: account shared subtrees during snapshot delete
   [PATCH 3/3] Btrfs: __btrfs_mod_ref should always use no_quota

The following btrfsprogs patch set is needed for the actual check of qgroup
consistency:
   [PATCH 1/5] btrfs-progs: print qgroup excl as unsigned
   [PATCH 2/5] btrfs-progs: import ulist
   [PATCH 3/5] btrfs-progs: add quota group verify code
   [PATCH 4/5] btrfs-progs: show extent state for a subvolume
   [PATCH 5/5] btrfs-progs: ignore orphaned qgroups by default

The btrfsprogs patches can be found in the following repo:

https://github.com/markfasheh/btrfs-progs-patches/tree/qgroup-verify

This patch to xfstests can be found in the following repo:

https://github.com/markfasheh/xfstests-patches/tree/qgroup-drop-snapshot

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 tests/btrfs/057     | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/057.out |   7 ++++
 tests/btrfs/group   |   1 +
 3 files changed, 125 insertions(+)
 create mode 100755 tests/btrfs/057
 create mode 100644 tests/btrfs/057.out

Comments

Dave Chinner July 10, 2014, 12:43 a.m. UTC | #1
On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
> Test btrfs quota group consistency operations during snapshot delete.  Btrfs
> has had long standing issues with drop snapshot failing to properly account
> for quota groups. This test crafts a snapshot tree with shared and exclusive
> elements. The tree is removed and then quota group consistency is checked.
> 
> This issue is fixed by the linux kernel btrfs patch series:
>    [PATCH 0/3] btrfs: qgroup fixes for btrfs_drop_snapshot
>    [PATCH 1/3] btrfs: add trace for qgroup accounting
>    [PATCH 2/3] btrfs: qgroup: account shared subtrees during snapshot delete
>    [PATCH 3/3] Btrfs: __btrfs_mod_ref should always use no_quota
> 
> The following btrfsprogs patch set is needed for the actual check of qgroup
> consistency:
>    [PATCH 1/5] btrfs-progs: print qgroup excl as unsigned
>    [PATCH 2/5] btrfs-progs: import ulist
>    [PATCH 3/5] btrfs-progs: add quota group verify code
>    [PATCH 4/5] btrfs-progs: show extent state for a subvolume
>    [PATCH 5/5] btrfs-progs: ignore orphaned qgroups by default
> 
> The btrfsprogs patches can be found in the following repo:
> 
> https://github.com/markfasheh/btrfs-progs-patches/tree/qgroup-verify
> 
> This patch to xfstests can be found in the following repo:
> 
> https://github.com/markfasheh/xfstests-patches/tree/qgroup-drop-snapshot
> 
....
> +rm -f $seqres.full
> +
> +_scratch_mkfs >/dev/null 2>&1
> +_scratch_mount
> +
> +# This always reproduces level 1 trees
> +maxfiles=100
> +
> +echo "create file set"
> +
> +# Make a bunch of small files in a directory. This is designed to expand
> +# the filesystem tree to something more than zero levels.
> +mkdir $SCRATCH_MNT/files
> +for i in `seq -w 0 $maxfiles`;
> +do
> +    dd status=none if=/dev/zero of=$SCRATCH_MNT/files/file$i bs=4096 count=4
> +done

	$XFS_IO_PROG -f -c "pwrite 0 16384" $SCRATCH_MNT/files/file$i > /dev/null
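
i.e. the loop would then read roughly like this (a sketch; xfs_io writes the
same 16KiB per file in a single call):

	for i in `seq -w 0 $maxfiles`; do
		$XFS_IO_PROG -f -c "pwrite 0 16384" \
			$SCRATCH_MNT/files/file$i > /dev/null
	done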

> +
> +# create a snapshot of what we just did
> +$BTRFS_UTIL_PROG fi sy $SCRATCH_MNT
> +$BTRFS_UTIL_PROG su sna $SCRATCH_MNT $SCRATCH_MNT/snap1
> +mv $SCRATCH_MNT/snap1/files $SCRATCH_MNT/snap1/old

You need to filter the output. i.e. _filter_scratch
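
For instance, a sketch of piping the snapshot command through the filter so
the mountpoint gets normalised:

	$BTRFS_UTIL_PROG su sna $SCRATCH_MNT $SCRATCH_MNT/snap1 | _filter_scratch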

> +# same thing as before but on the snapshot. this way we can generate
> +# some exclusively owned tree nodes.
> +echo "create file set on snapshot"
> +mkdir $SCRATCH_MNT/snap1/files
> +for i in `seq -w 0 $maxfiles`;
> +do
> +    dd status=none if=/dev/zero of=$SCRATCH_MNT/snap1/files/file$i bs=4096 count=4
> +done

Same again.

> +
> +# Enable qgroups now that we have our filesystem prepared. This
> +# will kick off a scan which we will have to wait for below.
> +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
> +sleep 30

That seems rather arbitrary. The sleeps you are adding add well over
a minute to the runtime, and a quota scan of a filesystem with 200
files should be almost instantaneous.

> +_scratch_unmount
> +_scratch_mount

What is the purpose of this?

> +# Ok, delete the snapshot we made previously. Since btrfs drop
> +# snapshot is a delayed action with no way to force it, we have to
> +# impose another sleep here.
> +$BTRFS_UTIL_PROG su de $SCRATCH_MNT/snap1
> +sleep 45

That's indicative of a bug, yes?

> +_scratch_unmount
> +
> +# generate a qgroup report and look for inconsistent groups
> +$BTRFS_UTIL_PROG check --qgroup-report $SCRATCH_DEV | grep -q "Counts for qgroup.*different"
> +RETVAL=$?
> +if [ $RETVAL -eq 0 ]; then
> +    status=1
> +fi

RETVAL! Get your RETVAL here! RETVAL!

No need to shout ;)
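
One way to avoid the intermediate variable altogether is to let the grep exit
status drive the branch directly, along these lines:

	if $BTRFS_UTIL_PROG check --qgroup-report $SCRATCH_DEV | \
		grep -q "Counts for qgroup.*different"; then
		status=1
	fi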

> new file mode 100644
> index 0000000..b8a146c
> --- /dev/null
> +++ b/tests/btrfs/057.out
> @@ -0,0 +1,7 @@
> +QA output created by 057
> +create file set
> +FSSync '/xfstest2'
> +Create a snapshot of '/xfstest2' in '/xfstest2/snap1'
> +create file set on snapshot
> +Transaction commit: none (default)
> +Delete subvolume '/xfstest2/snap1'

The scratch mountpoint output is what requires filtering - it's
different for everyone, and so needs to be anonymised to SCRATCH_MNT....
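
Once filtered, the golden output would presumably read along these lines
(sketch only):

	FSSync 'SCRATCH_MNT'
	Create a snapshot of 'SCRATCH_MNT' in 'SCRATCH_MNT/snap1'
	Delete subvolume 'SCRATCH_MNT/snap1'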

> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index 2da7127..ebc38c5 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -59,3 +59,4 @@
>  054 auto quick
>  055 auto quick
>  056 auto quick
> +057 auto quick

"quick" means the test takes less than a few seconds to execute.
This test takes a couple of minutes, so it should not be in the quick
group.
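
i.e. the group file entry would become something like:

	057 auto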

Cheers,

Dave.
Mark Fasheh July 10, 2014, 5:36 p.m. UTC | #2
Hey Dave, thanks for the patch review! Pretty much all of what you wrote
sounds good to me, there's just one or two items I wanted to clarify - those
comments are inline. Thanks again,

On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote:
> On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
> > +
> > +# Enable qgroups now that we have our filesystem prepared. This
> > +# will kick off a scan which we will have to wait for below.
> > +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
> > +sleep 30
> 
> That seems rather arbitrary. The sleeps you are adding add well over
> a minute to the runtime, and a quota scan of a filesystem with 200
> files should be almost instantenous.

Yeah I'll bring that back down to 5 seconds? It's 30 from my testing because
I was being paranoid and neglected to update it for the rest of the world.


> > +_scratch_unmount
> > +_scratch_mount
> 
> What is the purpose of this?

This is kind of 'maximum paranoia' again from my own test script. The idea
was to make _absolutely_ certain that all metadata found it's way to disk
and won't be optimized in layout any more. There's a decent chance it
doesn't do anything but it doesn't seem a huge deal. I wasn't clear though -
do you want it removed or can I comment it for clarity?


> > +# Ok, delete the snapshot we made previously. Since btrfs drop
> > +# snapshot is a delayed action with no way to force it, we have to
> > +# impose another sleep here.
> > +$BTRFS_UTIL_PROG su de $SCRATCH_MNT/snap1
> > +sleep 45
> 
> That's indicative of a bug, yes?

No, that's just how it happens. In fact, if you unmount while a snapshot is
being dropped, progress of the drop will be recorded and it will be
continued on next mount. However, since we *must* have the drop_snapshot
complete for this test I have the large sleep. Unlike the previous sleep I
don't think this can be reduced by much :(
	--Mark

--
Mark Fasheh
Zach Brown July 10, 2014, 6:32 p.m. UTC | #3
On Thu, Jul 10, 2014 at 10:36:14AM -0700, Mark Fasheh wrote:
> On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote:
> > On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
> > > +
> > > +# Enable qgroups now that we have our filesystem prepared. This
> > > +# will kick off a scan which we will have to wait for below.
> > > +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
> > > +sleep 30
> > 
> > That seems rather arbitrary. The sleeps you are adding add well over
> > a minute to the runtime, and a quota scan of a filesystem with 200
> > files should be almost instantenous.
> 
> Yeah I'll bring that back down to 5 seconds?

How long does it usually take?

What interfaces would be needed for this to work precisely so we don't
have to play this game ever again?

- z
Mark Fasheh July 10, 2014, 7 p.m. UTC | #4
On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote:
> On Thu, Jul 10, 2014 at 10:36:14AM -0700, Mark Fasheh wrote:
> > On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote:
> > > On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
> > > > +
> > > > +# Enable qgroups now that we have our filesystem prepared. This
> > > > +# will kick off a scan which we will have to wait for below.
> > > > +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
> > > > +sleep 30
> > > 
> > > That seems rather arbitrary. The sleeps you are adding add well over
> > > a minute to the runtime, and a quota scan of a filesystem with 200
> > > files should be almost instantenous.
> > 
> > Yeah I'll bring that back down to 5 seconds?
> 
> How long does it usually take?
> 
> What interfaces would be needed for this to work precisely so we don't
> have to play this game ever again?

Well there's also the 'sleep 45' below because we need to be certain that
btrfs_drop_snapshot gets run. This was all a bit of a pain during debugging
to be honest.

So in my experience, an interface to make debugging easier would involve
running every delayed action in the file system to completion, including a
sync of dirty blocks to disk. In theory, this would include any delayed
actions that were kicked off as a result of the actions you are syncing.
You'd do it all from a point in time of course so that we don't spin forever
on a busy filesystem. I do not know whether this is feasible.

Given something like that, you'd just replace the calls to sleep with 'btrfs
fi synctheworldandwait' and know that on return, the actions you just queued
up completed.
	--Mark

--
Mark Fasheh
Zach Brown July 10, 2014, 7:05 p.m. UTC | #5
On Thu, Jul 10, 2014 at 12:00:55PM -0700, Mark Fasheh wrote:
> On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote:
> > On Thu, Jul 10, 2014 at 10:36:14AM -0700, Mark Fasheh wrote:
> > > On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote:
> > > > On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
> > > > > +
> > > > > +# Enable qgroups now that we have our filesystem prepared. This
> > > > > +# will kick off a scan which we will have to wait for below.
> > > > > +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
> > > > > +sleep 30
> > > > 
> > > > That seems rather arbitrary. The sleeps you are adding add well over
> > > > a minute to the runtime, and a quota scan of a filesystem with 200
> > > > files should be almost instantenous.
> > > 
> > > Yeah I'll bring that back down to 5 seconds?
> > 
> > How long does it usually take?
> > 
> > What interfaces would be needed for this to work precisely so we don't
> > have to play this game ever again?
> 
> Well there's also the 'sleep 45' below because we need to be certain that
> btrfs_drop_snapshot gets run. This was all a bit of a pain during debugging
> to be honest.

Yeah.  It seems like there's an opportunity for sync flags in the
commands.

- z
Mark Fasheh July 10, 2014, 7:24 p.m. UTC | #6
On Thu, Jul 10, 2014 at 12:05:05PM -0700, Zach Brown wrote:
> On Thu, Jul 10, 2014 at 12:00:55PM -0700, Mark Fasheh wrote:
> > On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote:
> > > On Thu, Jul 10, 2014 at 10:36:14AM -0700, Mark Fasheh wrote:
> > > > On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote:
> > > > > On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
> > > > > > +
> > > > > > +# Enable qgroups now that we have our filesystem prepared. This
> > > > > > +# will kick off a scan which we will have to wait for below.
> > > > > > +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
> > > > > > +sleep 30
> > > > > 
> > > > > That seems rather arbitrary. The sleeps you are adding add well over
> > > > > a minute to the runtime, and a quota scan of a filesystem with 200
> > > > > files should be almost instantenous.
> > > > 
> > > > Yeah I'll bring that back down to 5 seconds?
> > > 
> > > How long does it usually take?
> > > 
> > > What interfaces would be needed for this to work precisely so we don't
> > > have to play this game ever again?
> > 
> > Well there's also the 'sleep 45' below because we need to be certain that
> > btrfs_drop_snapshot gets run. This was all a bit of a pain during debugging
> > to be honest.
> 
> Yeah.  It seems like there's an opportunity for sync flags in the
> commands.

Yep, that would've helped.
	--Mark

--
Mark Fasheh
Dave Chinner July 10, 2014, 9:10 p.m. UTC | #7
On Thu, Jul 10, 2014 at 10:36:14AM -0700, Mark Fasheh wrote:
> Hey Dave, thanks for the patch review! Pretty much all of what you wrote
> sounds good to me, there's just one or two items I wanted to clarify - those
> comments are inline. Thanks again,
> 
> On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote:
> > On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
> > > +
> > > +# Enable qgroups now that we have our filesystem prepared. This
> > > +# will kick off a scan which we will have to wait for below.
> > > +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
> > > +sleep 30
> > 
> > That seems rather arbitrary. The sleeps you are adding add well over
> > a minute to the runtime, and a quota scan of a filesystem with 200
> > files should be almost instantenous.
> 
> Yeah I'll bring that back down to 5 seconds? It's 30 from my testing because
> I was being paranoid and neglected to update it for the rest of the world.

Be nice to have the btrfs command wait for it to complete. Not being
able to query the status of background work or wait for it is
somewhat user unfriendly. If you could poll, then a 1s sleep in a
poll loop would be fine. Short of that, then I guess sleep 5 is the
best we can do.

> 
> > > +_scratch_unmount
> > > +_scratch_mount
> > 
> > What is the purpose of this?
> 
> This is kind of 'maximum paranoia' again from my own test script. The idea
> was to make _absolutely_ certain that all metadata found it's way to disk
> and won't be optimized in layout any more. There's a decent chance it
> doesn't do anything but it doesn't seem a huge deal. I wasn't clear though -
> do you want it removed or can I comment it for clarity?

Comment. If someone reads the test in 2 years' time they won't
have to ask "wtf?"...

> > > +# Ok, delete the snapshot we made previously. Since btrfs drop
> > > +# snapshot is a delayed action with no way to force it, we have to
> > > +# impose another sleep here.
> > > +$BTRFS_UTIL_PROG su de $SCRATCH_MNT/snap1
> > > +sleep 45
> > 
> > That's indicative of a bug, yes?
> 
> No, that's just how it happens. In fact, if you unmount while a snapshot is
> being dropped, progress of the drop will be recorded and it will be
> continued on next mount. However, since we *must* have the drop_snapshot
> complete for this test I have the large sleep. Unlike the previous sleep I
> don't think this can be reduced by much :(

Right, again the "can't wait or poll for status of background work"
issue comes up.  That's the bug in the UI I was referring to. I guess
that we'll just have to wait for a long time here. Pretty hacky,
though...

Cheers,

Dave.
Duncan July 10, 2014, 9:31 p.m. UTC | #8
Mark Fasheh posted on Thu, 10 Jul 2014 12:00:55 -0700 as excerpted:

> Given something like that, you'd just replace the calls to sleep with
> 'btrfs fi synctheworldandwait' and know that on return, the actions you
> just queued up completed.

I'll admit to not really knowing what I'm talking about here, but on 
first intuition, what about either btrfs filesystem sync calls, or
mounting with the synctime (don't have time to look up the specific 
option ATM) mount option?  Normal sync is 30 seconds, but the mount 
option can be used to make that 5 seconds or whatever.  And I don't know 
whether btrfs filesystem sync is synchronous or not.

But that might help reduce it below 30 and 45 seconds, anyway, even if 
some sleep is still required.
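
If the option in question is btrfs's commit= mount interval (an assumption
here, I haven't checked the name), the test could shorten the periodic commit
along these lines:

	# assumed option: commit=<seconds> sets the periodic commit interval
	_scratch_mount "-o commit=5"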
David Sterba July 22, 2014, 6:01 p.m. UTC | #9
On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
> +$BTRFS_UTIL_PROG fi sy $SCRATCH_MNT
> +$BTRFS_UTIL_PROG su sna $SCRATCH_MNT $SCRATCH_MNT/snap1

> +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT

> +$BTRFS_UTIL_PROG su de $SCRATCH_MNT/snap1

Please spell the full names of the subcommands.
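
i.e. the long forms of the same commands:

	$BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
	$BTRFS_UTIL_PROG subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap1
	$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
	$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/snap1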
David Sterba July 22, 2014, 6:10 p.m. UTC | #10
On Thu, Jul 10, 2014 at 12:00:55PM -0700, Mark Fasheh wrote:
> On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote:
> > On Thu, Jul 10, 2014 at 10:36:14AM -0700, Mark Fasheh wrote:
> > > On Thu, Jul 10, 2014 at 10:43:30AM +1000, Dave Chinner wrote:
> > > > On Wed, Jul 09, 2014 at 03:41:50PM -0700, Mark Fasheh wrote:
> > > > > +
> > > > > +# Enable qgroups now that we have our filesystem prepared. This
> > > > > +# will kick off a scan which we will have to wait for below.
> > > > > +$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
> > > > > +sleep 30
> > > > 
> > > > That seems rather arbitrary. The sleeps you are adding add well over
> > > > a minute to the runtime, and a quota scan of a filesystem with 200
> > > > files should be almost instantenous.
> > > 
> > > Yeah I'll bring that back down to 5 seconds?
> > 
> > How long does it usually take?
> > 
> > What interfaces would be needed for this to work precisely so we don't
> > have to play this game ever again?
> 
> Well there's also the 'sleep 45' below because we need to be certain that
> btrfs_drop_snapshot gets run. This was all a bit of a pain during debugging
> to be honest.

If you need to wait just for the quota scan, then use
'btrfs quota rescan -w' that will wait until the rescan finishes.
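
So the enable-and-wait step could become roughly (assuming rescan -w copes
with the scan that quota enable already kicked off):

	$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
	$BTRFS_UTIL_PROG quota rescan -w $SCRATCH_MNT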
David Sterba July 22, 2014, 7:05 p.m. UTC | #11
On Thu, Jul 10, 2014 at 12:00:55PM -0700, Mark Fasheh wrote:
> On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote:
> > What interfaces would be needed for this to work precisely so we don't
> > have to play this game ever again?
> 
> Well there's also the 'sleep 45' below because we need to be certain that
> btrfs_drop_snapshot gets run. This was all a bit of a pain during debugging
> to be honest.
> 
> So in my experience, an interface to make debugging easier would involve
> running every delayed action in the file system to completion, including a
> sync of dirty blocks to disk. In theory, this would include any delayed
> actions that were kicked off as a result of the actions you are syncing.
> You'd do it all from a point in time of course so that we don't spin forever
> on a busy filesystem. I do not know whether this is feasible.
> 
> Given something like that, you'd just replace the calls to sleep with 'btrfs
> fi synctheworldandwait' and know that on return, the actions you just queued
> up completed.

Waiting until some subvolume gets completely removed needs some work. In
your case the cleaner thread sleeps and is woken up at the transaction
commit time. As there is no other activity on the filesystem this
happens at the periodic commit time, usually 30 seconds. 'sync' will not
help, because it needs a transaction in progress.

I have a patchset in the works that addresses a different problem but
introduces functionality that keeps track of some global pending
actions. This could easily be enhanced to trigger the commit with sync
if there was a snapshot deletion since the last commit, regardless of
transaction running status.

This still does not cover the part where we want a command that waits
until a given subvolume is completely removed, but I have a draft for
that as well.

Unfortunately, until both parts are in place the sleep is the only
reliable way. Oh well.
David Sterba July 23, 2014, 12:30 p.m. UTC | #12
On Tue, Jul 22, 2014 at 09:05:32PM +0200, David Sterba wrote:
> On Thu, Jul 10, 2014 at 12:00:55PM -0700, Mark Fasheh wrote:
> > On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote:
> > > What interfaces would be needed for this to work precisely so we don't
> > > have to play this game ever again?
> > 
> > Well there's also the 'sleep 45' below because we need to be certain that
> > btrfs_drop_snapshot gets run. This was all a bit of a pain during debugging
> > to be honest.
> > 
> > So in my experience, an interface to make debugging easier would involve
> > running every delayed action in the file system to completion, including a
> > sync of dirty blocks to disk. In theory, this would include any delayed
> > actions that were kicked off as a result of the actions you are syncing.
> > You'd do it all from a point in time of course so that we don't spin forever
> > on a busy filesystem. I do not know whether this is feasible.
> > 
> > Given something like that, you'd just replace the calls to sleep with 'btrfs
> > fi synctheworldandwait' and know that on return, the actions you just queued
> > up completed.
> 
> Waiting until some subvolume gets completely removed needs some work. In
> your case the cleaner thread sleeps and is woken up at the transaction
> commit time. As there is no other activity on the filesystem this
> happens at the periodic commit time, usually 30 seconds. 'sync' will not
> help, because it needs a transaction in progress.
> 
> I have patchset in works that addresses a different problem but
> introduces a functionality that keeps track of some global pending
> actions. This could be easily enhanced to trigger the commit with sync
> if there was a snapshot deletion since last commit regardless of a
> transaction running status.
> 
> This still does not cover the part where we want a command that waits
> until a given subvolume is completely removed, but I have a draft for
> that as well.
> 
> Unfortunatelly until both parts are in place the sleep is the only
> reliable way. Oh well.

And it turned out to be much simpler; I was wrong about transaction
commit poking the cleaner thread. It's actually the transaction commit
thread that wakes up the cleaner, and that's why it's always at the
periodic commit time.

The proposed fix is to wake up the transaction thread from the sync ioctl
(not the generic sync call, though).

For the qgroup test's needs it's enough to delete the subvolumes, issue 'btrfs
fi sync' and wait until there are no subvols listed (btrfs subvolume list -d).

Without the fix this will take up to 30 seconds; with the fix it's
immediate. The test script will not need any change.
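
In the test that would amount to something like this sketch:

	$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/snap1
	$BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
	# wait until no deleted subvolumes are still pending cleanup
	while [ -n "$($BTRFS_UTIL_PROG subvolume list -d $SCRATCH_MNT)" ]; do
		sleep 1
	done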
Josef Bacik July 23, 2014, 1:25 p.m. UTC | #13
On 07/23/2014 08:30 AM, David Sterba wrote:
> On Tue, Jul 22, 2014 at 09:05:32PM +0200, David Sterba wrote:
>> On Thu, Jul 10, 2014 at 12:00:55PM -0700, Mark Fasheh wrote:
>>> On Thu, Jul 10, 2014 at 11:32:28AM -0700, Zach Brown wrote:
>>>> What interfaces would be needed for this to work precisely so we don't
>>>> have to play this game ever again?
>>>
>>> Well there's also the 'sleep 45' below because we need to be certain that
>>> btrfs_drop_snapshot gets run. This was all a bit of a pain during debugging
>>> to be honest.
>>>
>>> So in my experience, an interface to make debugging easier would involve
>>> running every delayed action in the file system to completion, including a
>>> sync of dirty blocks to disk. In theory, this would include any delayed
>>> actions that were kicked off as a result of the actions you are syncing.
>>> You'd do it all from a point in time of course so that we don't spin forever
>>> on a busy filesystem. I do not know whether this is feasible.
>>>
>>> Given something like that, you'd just replace the calls to sleep with 'btrfs
>>> fi synctheworldandwait' and know that on return, the actions you just queued
>>> up completed.
>>
>> Waiting until some subvolume gets completely removed needs some work. In
>> your case the cleaner thread sleeps and is woken up at the transaction
>> commit time. As there is no other activity on the filesystem this
>> happens at the periodic commit time, usually 30 seconds. 'sync' will not
>> help, because it needs a transaction in progress.
>>
>> I have patchset in works that addresses a different problem but
>> introduces a functionality that keeps track of some global pending
>> actions. This could be easily enhanced to trigger the commit with sync
>> if there was a snapshot deletion since last commit regardless of a
>> transaction running status.
>>
>> This still does not cover the part where we want a command that waits
>> until a given subvolume is completely removed, but I have a draft for
>> that as well.
>>
>> Unfortunatelly until both parts are in place the sleep is the only
>> reliable way. Oh well.
>
> And it turned out to be much simpler, I was wrong about transaction
> commit poking the cleaner thread. It's actually the transaction commit
> thread that wakes up cleaner and that's why it's always at the periodic
> commit time.
>
> The proposed fix is to wake up transaction thread from the sync ioctl
> (not the generic sync call through).
>
> For the qgroup test needs it's enough to delete subvolumes, issue 'btrfs
> fi sync' and wait until there are no subvols listed (btrfs list -d).
>
> Without the fix, this will take up to 30 seconds, with the fix it's
> immediate. The test script will not need any change.
>

I'll just add a sync flag to the snapshot creation and then Mark can 
redo this test to use the sync flag once that stuff is in.  Thanks,

Josef

Patch

diff --git a/tests/btrfs/057 b/tests/btrfs/057
new file mode 100755
index 0000000..0aca2ca
--- /dev/null
+++ b/tests/btrfs/057
@@ -0,0 +1,117 @@ 
+#! /bin/bash
+# FS QA Test No. btrfs/057
+#
+# Test btrfs quota group consistency operations during snapshot
+# delete. Btrfs has had long standing issues with drop snapshot
+# failing to properly account for quota groups. This test crafts a
+# snapshot tree with shared and exclusive elements. The tree is
+# removed and then quota group consistency is checked.
+#
+# This issue is fixed by the linux kernel btrfs patch series:
+#
+#    [PATCH 0/3] btrfs: qgroup fixes for btrfs_drop_snapshot
+#    [PATCH 1/3] btrfs: add trace for qgroup accounting
+#    [PATCH 2/3] btrfs: qgroup: account shared subtrees during snapshot delete
+#    [PATCH 3/3] Btrfs: __btrfs_mod_ref should always use no_quota
+#
+#-----------------------------------------------------------------------
+# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	_cleanup_flakey
+	rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmflakey
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+rm -f $seqres.full
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+# This always reproduces level 1 trees
+maxfiles=100
+
+echo "create file set"
+
+# Make a bunch of small files in a directory. This is designed to expand
+# the filesystem tree to something more than zero levels.
+mkdir $SCRATCH_MNT/files
+for i in `seq -w 0 $maxfiles`;
+do
+    dd status=none if=/dev/zero of=$SCRATCH_MNT/files/file$i bs=4096 count=4
+done
+
+# create a snapshot of what we just did
+$BTRFS_UTIL_PROG fi sy $SCRATCH_MNT
+$BTRFS_UTIL_PROG su sna $SCRATCH_MNT $SCRATCH_MNT/snap1
+mv $SCRATCH_MNT/snap1/files $SCRATCH_MNT/snap1/old
+
+# same thing as before but on the snapshot. this way we can generate
+# some exclusively owned tree nodes.
+echo "create file set on snapshot"
+mkdir $SCRATCH_MNT/snap1/files
+for i in `seq -w 0 $maxfiles`;
+do
+    dd status=none if=/dev/zero of=$SCRATCH_MNT/snap1/files/file$i bs=4096 count=4
+done
+
+# Enable qgroups now that we have our filesystem prepared. This
+# will kick off a scan which we will have to wait for below.
+$BTRFS_UTIL_PROG qu en $SCRATCH_MNT
+sleep 30
+
+_scratch_unmount
+_scratch_mount
+
+# Ok, delete the snapshot we made previously. Since btrfs drop
+# snapshot is a delayed action with no way to force it, we have to
+# impose another sleep here.
+$BTRFS_UTIL_PROG su de $SCRATCH_MNT/snap1
+sleep 45
+
+_scratch_unmount
+
+# generate a qgroup report and look for inconsistent groups
+$BTRFS_UTIL_PROG check --qgroup-report $SCRATCH_DEV | grep -q "Counts for qgroup.*different"
+RETVAL=$?
+if [ $RETVAL -eq 0 ]; then
+    status=1
+fi
+
+status=0
+exit
diff --git a/tests/btrfs/057.out b/tests/btrfs/057.out
new file mode 100644
index 0000000..b8a146c
--- /dev/null
+++ b/tests/btrfs/057.out
@@ -0,0 +1,7 @@ 
+QA output created by 057
+create file set
+FSSync '/xfstest2'
+Create a snapshot of '/xfstest2' in '/xfstest2/snap1'
+create file set on snapshot
+Transaction commit: none (default)
+Delete subvolume '/xfstest2/snap1'
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 2da7127..ebc38c5 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -59,3 +59,4 @@ 
 054 auto quick
 055 auto quick
 056 auto quick
+057 auto quick