fstests: btrfs/179 call quota rescan

Message ID: 1581076895-6688-1-git-send-email-anand.jain@oracle.com
State: New, archived
Series: fstests: btrfs/179 call quota rescan

Commit Message

Anand Jain Feb. 7, 2020, 12:01 p.m. UTC
On some systems btrfs/179 fails because the post-test check finds a
difference in the qgroup counts.

Due to the async nature of the qgroup tree scan, the qgroup counts at the
time of umount might not be up to date; if they aren't, the check reports
the difference. The difference in qgroup counts is reconciled on the
following mount anyway, so it is not a real issue and not what this test
case is trying to verify. So make sure the qgroup counts are updated
before the unmount happens, which keeps the check happy.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 tests/btrfs/179 | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Qu Wenruo Feb. 7, 2020, 12:15 p.m. UTC | #1
On 2020/2/7 8:01 PM, Anand Jain wrote:
> On some systems btrfs/179 fails as the check finds that there is
> difference in the qgroup counts.
> 
> By the async nature of qgroup tree scan, the latest qgroup counts at the
> time of umount might not be upto date,

Yes, so far so good.

> if it isn't then the check will
> report the difference in count. The difference in qgroup counts are anyway
> updated in the following mount, so it is not a real issue that this test
> case is trying to verify.

No problem either.

> So make sure the qgroup counts are updated
> before unmount happens and make the check happy.

But the solution doesn't look correct to me.

We should either make btrfs check handle such a half-dropped case
better, or find a way for the test case to wait until all subvolume
drops have finished.

Papering over the test with a rescan is not a good idea at all.
If one day we really hit a qgroup accounting problem, papering over it
this way could hugely reduce the coverage.

Thanks,
Qu

> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  tests/btrfs/179 | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/tests/btrfs/179 b/tests/btrfs/179
> index 4a24ea419a7e..74e91841eaa6 100755
> --- a/tests/btrfs/179
> +++ b/tests/btrfs/179
> @@ -109,6 +109,14 @@ wait $snapshot_pid
>  kill $delete_pid
>  wait $delete_pid
>  
> +# By the async nature of qgroup tree scan, the latest qgroup counts at the time
> +# of umount might not be upto date, if it isn't then the check will report the
> +# difference in count. The difference in qgroup counts are anyway updated in the
> +# following mount, so it is not a real issue that this test case is trying to
> +# verify. So make sure the qgroup counts are updated before unmount happens.
> +
> +$BTRFS_UTIL_PROG quota rescan -w $SCRATCH_MNT >> $seqres.full
> +
>  # success, all done
>  echo "Silence is golden"
>  
>
Anand Jain Feb. 7, 2020, 3:59 p.m. UTC | #2
On 7/2/20 8:15 PM, Qu Wenruo wrote:
> 
> 
> On 2020/2/7 下午8:01, Anand Jain wrote:
>> On some systems btrfs/179 fails as the check finds that there is
>> difference in the qgroup counts.
>>
>> By the async nature of qgroup tree scan, the latest qgroup counts at the
>> time of umount might not be upto date,
> 
> Yes, so far so good.
> 
>> if it isn't then the check will
>> report the difference in count. The difference in qgroup counts are anyway
>> updated in the following mount, so it is not a real issue that this test
>> case is trying to verify.
> 
> No problem either.
> 
>> So make sure the qgroup counts are updated
>> before unmount happens and make the check happy.
> 
> But the solution doesn't look correct to me.
> 
> We should either make btrfs-check to handle such half-dropped case
> better,

  Check is OK. The count that check computes matches the count after the
following mount, so what is recorded in the qgroup tree is simply not up
to date.

> or find a way to wait for all subvolume drop to be finished in
> test case.

Yes, this is one way. Just waiting for a few seconds does the job and the
test passes. Do you know any better way?

Thanks, Anand

> Papering the test by rescan is not a good idea at all.
> If one day we really hit some qgroup accounting problem, this papering
> way could hugely reduce the coverage.
> 


> Thanks,
> Qu
> 
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   tests/btrfs/179 | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/tests/btrfs/179 b/tests/btrfs/179
>> index 4a24ea419a7e..74e91841eaa6 100755
>> --- a/tests/btrfs/179
>> +++ b/tests/btrfs/179
>> @@ -109,6 +109,14 @@ wait $snapshot_pid
>>   kill $delete_pid
>>   wait $delete_pid
>>   
>> +# By the async nature of qgroup tree scan, the latest qgroup counts at the time
>> +# of umount might not be upto date, if it isn't then the check will report the
>> +# difference in count. The difference in qgroup counts are anyway updated in the
>> +# following mount, so it is not a real issue that this test case is trying to
>> +# verify. So make sure the qgroup counts are updated before unmount happens.
>> +
>> +$BTRFS_UTIL_PROG quota rescan -w $SCRATCH_MNT >> $seqres.full
>> +
>>   # success, all done
>>   echo "Silence is golden"
>>   
>>
>
Qu Wenruo Feb. 7, 2020, 11:28 p.m. UTC | #3
On 2020/2/7 11:59 PM, Anand Jain wrote:
> 
> 
> On 7/2/20 8:15 PM, Qu Wenruo wrote:
>>
>>
>> On 2020/2/7 下午8:01, Anand Jain wrote:
>>> On some systems btrfs/179 fails as the check finds that there is
>>> difference in the qgroup counts.
>>>
>>> By the async nature of qgroup tree scan, the latest qgroup counts at the
>>> time of umount might not be upto date,
>>
>> Yes, so far so good.
>>
>>> if it isn't then the check will
>>> report the difference in count. The difference in qgroup counts are
>>> anyway
>>> updated in the following mount, so it is not a real issue that this test
>>> case is trying to verify.
>>
>> No problem either.
>>
>>> So make sure the qgroup counts are updated
>>> before unmount happens and make the check happy.
>>
>> But the solution doesn't look correct to me.
>>
>> We should either make btrfs-check to handle such half-dropped case
>> better,
> 
>  Check is ok. The count as check counts matches with the count after the
> mount. So what is recorded in the qgroup is not upto date.

Nope. Qgroup records what's in the commit tree. For an unmounted fs, there
is no difference between the commit tree and the current tree.

Thus the qgroup scan in btrfs-progs is different from the kernel's.
Please go check the btrfs-progs code to see where the difference comes from.

> 
>> or find a way to wait for all subvolume drop to be finished in
>> test case.
> 
> Yes this is one way. Just wait for few seconds will do, test passes. Do
> you know any better way?

I don't remember when, but it looks like `btrfs fi sync` used to wait
for snapshot drop.
Not anymore, though. If we had a way to wait for the cleaner to finish, we
could solve this pretty easily.

Thanks,
Qu

> 
> Thanks, Anand
> 
>> Papering the test by rescan is not a good idea at all.
>> If one day we really hit some qgroup accounting problem, this papering
>> way could hugely reduce the coverage.
>>
> 
> 
>> Thanks,
>> Qu
>>
>>>
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>> ---
>>>   tests/btrfs/179 | 8 ++++++++
>>>   1 file changed, 8 insertions(+)
>>>
>>> diff --git a/tests/btrfs/179 b/tests/btrfs/179
>>> index 4a24ea419a7e..74e91841eaa6 100755
>>> --- a/tests/btrfs/179
>>> +++ b/tests/btrfs/179
>>> @@ -109,6 +109,14 @@ wait $snapshot_pid
>>>   kill $delete_pid
>>>   wait $delete_pid
>>>   +# By the async nature of qgroup tree scan, the latest qgroup
>>> counts at the time
>>> +# of umount might not be upto date, if it isn't then the check will
>>> report the
>>> +# difference in count. The difference in qgroup counts are anyway
>>> updated in the
>>> +# following mount, so it is not a real issue that this test case is
>>> trying to
>>> +# verify. So make sure the qgroup counts are updated before unmount
>>> happens.
>>> +
>>> +$BTRFS_UTIL_PROG quota rescan -w $SCRATCH_MNT >> $seqres.full
>>> +
>>>   # success, all done
>>>   echo "Silence is golden"
>>>  
>>
Anand Jain Feb. 8, 2020, 9:06 a.m. UTC | #4
On 2/8/20 7:28 AM, Qu Wenruo wrote:
> 
> 
> On 2020/2/7 下午11:59, Anand Jain wrote:
>>
>>
>> On 7/2/20 8:15 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/2/7 下午8:01, Anand Jain wrote:
>>>> On some systems btrfs/179 fails as the check finds that there is
>>>> difference in the qgroup counts.
>>>>
>>>> By the async nature of qgroup tree scan, the latest qgroup counts at the
>>>> time of umount might not be upto date,
>>>
>>> Yes, so far so good.
>>>
>>>> if it isn't then the check will
>>>> report the difference in count. The difference in qgroup counts are
>>>> anyway
>>>> updated in the following mount, so it is not a real issue that this test
>>>> case is trying to verify.
>>>
>>> No problem either.
>>>
>>>> So make sure the qgroup counts are updated
>>>> before unmount happens and make the check happy.
>>>
>>> But the solution doesn't look correct to me.
>>>
>>> We should either make btrfs-check to handle such half-dropped case
>>> better,
>>
>>   Check is ok. The count as check counts matches with the count after the
>> mount. So what is recorded in the qgroup is not upto date.
> 
> Nope. Qgroup records what's in commit tree. For unmounted fs, there is
> no difference in commit tree and current tree.
> 
> Thus the qgroup scan in btrfs-progs is different from kernel.
> Please go check how the btrfs-progs code to see how the difference comes.
> 
>>
>>> or find a way to wait for all subvolume drop to be finished in
>>> test case.
>>
>> Yes this is one way. Just wait for few seconds will do, test passes. Do
>> you know any better way?
> 
> I didn't remember when, but it looks like `btrfs fi sync` used to wait
> for snapshot drop.
> But not now. If we have a way to wait for cleaner to finish, we can
> solve it pretty easily.

A sleep at the end of the test case also makes the counts consistent.
Since the intention of the test case is to test for the hang, a sleep 5
at the end of the test case is reasonable.
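
For reference, a minimal sketch of that workaround, assuming the sleep 5
value mentioned above (the duration is a guess for one environment, not a
validated bound):

    # Give the cleaner thread time to finish dropping the deleted
    # snapshots before the post-test check compares the qgroup counts.
    sleep 5

    # success, all done
    echo "Silence is golden"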

Thanks, Anand

> Thanks,
> Qu
> 
>>
>> Thanks, Anand
>>
>>> Papering the test by rescan is not a good idea at all.
>>> If one day we really hit some qgroup accounting problem, this papering
>>> way could hugely reduce the coverage.
>>>
>>
>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>> ---
>>>>    tests/btrfs/179 | 8 ++++++++
>>>>    1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/tests/btrfs/179 b/tests/btrfs/179
>>>> index 4a24ea419a7e..74e91841eaa6 100755
>>>> --- a/tests/btrfs/179
>>>> +++ b/tests/btrfs/179
>>>> @@ -109,6 +109,14 @@ wait $snapshot_pid
>>>>    kill $delete_pid
>>>>    wait $delete_pid
>>>>    +# By the async nature of qgroup tree scan, the latest qgroup
>>>> counts at the time
>>>> +# of umount might not be upto date, if it isn't then the check will
>>>> report the
>>>> +# difference in count. The difference in qgroup counts are anyway
>>>> updated in the
>>>> +# following mount, so it is not a real issue that this test case is
>>>> trying to
>>>> +# verify. So make sure the qgroup counts are updated before unmount
>>>> happens.
>>>> +
>>>> +$BTRFS_UTIL_PROG quota rescan -w $SCRATCH_MNT >> $seqres.full
>>>> +
>>>>    # success, all done
>>>>    echo "Silence is golden"
>>>>   
>>>
>
Qu Wenruo Feb. 10, 2020, 1:36 a.m. UTC | #5
On 2020/2/8 5:06 PM, Anand Jain wrote:
> 
> 
> On 2/8/20 7:28 AM, Qu Wenruo wrote:
>>
>>
>> On 2020/2/7 下午11:59, Anand Jain wrote:
>>>
>>>
>>> On 7/2/20 8:15 PM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2020/2/7 下午8:01, Anand Jain wrote:
>>>>> On some systems btrfs/179 fails as the check finds that there is
>>>>> difference in the qgroup counts.
>>>>>
>>>>> By the async nature of qgroup tree scan, the latest qgroup counts
>>>>> at the
>>>>> time of umount might not be upto date,
>>>>
>>>> Yes, so far so good.
>>>>
>>>>> if it isn't then the check will
>>>>> report the difference in count. The difference in qgroup counts are
>>>>> anyway
>>>>> updated in the following mount, so it is not a real issue that this
>>>>> test
>>>>> case is trying to verify.
>>>>
>>>> No problem either.
>>>>
>>>>> So make sure the qgroup counts are updated
>>>>> before unmount happens and make the check happy.
>>>>
>>>> But the solution doesn't look correct to me.
>>>>
>>>> We should either make btrfs-check to handle such half-dropped case
>>>> better,
>>>
>>>   Check is ok. The count as check counts matches with the count after
>>> the
>>> mount. So what is recorded in the qgroup is not upto date.
>>
>> Nope. Qgroup records what's in commit tree. For unmounted fs, there is
>> no difference in commit tree and current tree.
>>
>> Thus the qgroup scan in btrfs-progs is different from kernel.
>> Please go check how the btrfs-progs code to see how the difference comes.
>>
>>>
>>>> or find a way to wait for all subvolume drop to be finished in
>>>> test case.
>>>
>>> Yes this is one way. Just wait for few seconds will do, test passes. Do
>>> you know any better way?
>>
>> I didn't remember when, but it looks like `btrfs fi sync` used to wait
>> for snapshot drop.
>> But not now. If we have a way to wait for cleaner to finish, we can
>> solve it pretty easily.
> 
> A sleep at the end of the test case also makes it count consistent.
> As the intention of the test case is to test for the hang, so sleep 5
> at the end of the test case is reasonable.

That looks like a valid workaround.

Although the hard-coded number 5 doesn't look all that generic across test
environments.

I really hope to find a stable way to wait for all subvolume drops rather
than relying on some hard-coded number.

Thanks,
Qu

> 
> Thanks, Anand
> 
>> Thanks,
>> Qu
>>
>>>
>>> Thanks, Anand
>>>
>>>> Papering the test by rescan is not a good idea at all.
>>>> If one day we really hit some qgroup accounting problem, this papering
>>>> way could hugely reduce the coverage.
>>>>
>>>
>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>>> ---
>>>>>    tests/btrfs/179 | 8 ++++++++
>>>>>    1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/tests/btrfs/179 b/tests/btrfs/179
>>>>> index 4a24ea419a7e..74e91841eaa6 100755
>>>>> --- a/tests/btrfs/179
>>>>> +++ b/tests/btrfs/179
>>>>> @@ -109,6 +109,14 @@ wait $snapshot_pid
>>>>>    kill $delete_pid
>>>>>    wait $delete_pid
>>>>>    +# By the async nature of qgroup tree scan, the latest qgroup
>>>>> counts at the time
>>>>> +# of umount might not be upto date, if it isn't then the check will
>>>>> report the
>>>>> +# difference in count. The difference in qgroup counts are anyway
>>>>> updated in the
>>>>> +# following mount, so it is not a real issue that this test case is
>>>>> trying to
>>>>> +# verify. So make sure the qgroup counts are updated before unmount
>>>>> happens.
>>>>> +
>>>>> +$BTRFS_UTIL_PROG quota rescan -w $SCRATCH_MNT >> $seqres.full
>>>>> +
>>>>>    # success, all done
>>>>>    echo "Silence is golden"
>>>>>   
>>>>
>>
Nikolay Borisov Feb. 10, 2020, 7:45 a.m. UTC | #6
On 10.02.20 at 3:36, Qu Wenruo wrote:
> 
> 
> On 2020/2/8 下午5:06, Anand Jain wrote:
>>
>>
>> On 2/8/20 7:28 AM, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/2/7 下午11:59, Anand Jain wrote:
>>>>
>>>>
>>>> On 7/2/20 8:15 PM, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2020/2/7 下午8:01, Anand Jain wrote:
>>>>>> On some systems btrfs/179 fails as the check finds that there is
>>>>>> difference in the qgroup counts.
>>>>>>
>>>>>> By the async nature of qgroup tree scan, the latest qgroup counts
>>>>>> at the
>>>>>> time of umount might not be upto date,
>>>>>
>>>>> Yes, so far so good.
>>>>>
>>>>>> if it isn't then the check will
>>>>>> report the difference in count. The difference in qgroup counts are
>>>>>> anyway
>>>>>> updated in the following mount, so it is not a real issue that this
>>>>>> test
>>>>>> case is trying to verify.
>>>>>
>>>>> No problem either.
>>>>>
>>>>>> So make sure the qgroup counts are updated
>>>>>> before unmount happens and make the check happy.
>>>>>
>>>>> But the solution doesn't look correct to me.
>>>>>
>>>>> We should either make btrfs-check to handle such half-dropped case
>>>>> better,
>>>>
>>>>   Check is ok. The count as check counts matches with the count after
>>>> the
>>>> mount. So what is recorded in the qgroup is not upto date.
>>>
>>> Nope. Qgroup records what's in commit tree. For unmounted fs, there is
>>> no difference in commit tree and current tree.
>>>
>>> Thus the qgroup scan in btrfs-progs is different from kernel.
>>> Please go check how the btrfs-progs code to see how the difference comes.
>>>
>>>>
>>>>> or find a way to wait for all subvolume drop to be finished in
>>>>> test case.
>>>>
>>>> Yes this is one way. Just wait for few seconds will do, test passes. Do
>>>> you know any better way?
>>>
>>> I didn't remember when, but it looks like `btrfs fi sync` used to wait
>>> for snapshot drop.
>>> But not now. If we have a way to wait for cleaner to finish, we can
>>> solve it pretty easily.
>>
>> A sleep at the end of the test case also makes it count consistent.
>> As the intention of the test case is to test for the hang, so sleep 5
>> at the end of the test case is reasonable.
> 
> That looks like a valid workaround.
> 
> Although the immediate number 5 looks no that generic for all test
> environments.
> 
> I really hope to find a stable way to wait for all subvolume drops other
> than rely on some hard coded numbers.

 What about `btrfs filesystem sync`?


<snip>
Qu Wenruo Feb. 10, 2020, 7:55 a.m. UTC | #7
On 2020/2/10 3:45 PM, Nikolay Borisov wrote:
>
>
> On 10.02.20 г. 3:36 ч., Qu Wenruo wrote:
>>
>>
>> On 2020/2/8 下午5:06, Anand Jain wrote:
>>>
>>>
>>> On 2/8/20 7:28 AM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2020/2/7 下午11:59, Anand Jain wrote:
>>>>>
>>>>>
>>>>> On 7/2/20 8:15 PM, Qu Wenruo wrote:
>>>>>>
>>>>>>
>>>>>> On 2020/2/7 下午8:01, Anand Jain wrote:
>>>>>>> On some systems btrfs/179 fails as the check finds that there is
>>>>>>> difference in the qgroup counts.
>>>>>>>
>>>>>>> By the async nature of qgroup tree scan, the latest qgroup counts
>>>>>>> at the
>>>>>>> time of umount might not be upto date,
>>>>>>
>>>>>> Yes, so far so good.
>>>>>>
>>>>>>> if it isn't then the check will
>>>>>>> report the difference in count. The difference in qgroup counts are
>>>>>>> anyway
>>>>>>> updated in the following mount, so it is not a real issue that this
>>>>>>> test
>>>>>>> case is trying to verify.
>>>>>>
>>>>>> No problem either.
>>>>>>
>>>>>>> So make sure the qgroup counts are updated
>>>>>>> before unmount happens and make the check happy.
>>>>>>
>>>>>> But the solution doesn't look correct to me.
>>>>>>
>>>>>> We should either make btrfs-check to handle such half-dropped case
>>>>>> better,
>>>>>
>>>>>   Check is ok. The count as check counts matches with the count after
>>>>> the
>>>>> mount. So what is recorded in the qgroup is not upto date.
>>>>
>>>> Nope. Qgroup records what's in commit tree. For unmounted fs, there is
>>>> no difference in commit tree and current tree.
>>>>
>>>> Thus the qgroup scan in btrfs-progs is different from kernel.
>>>> Please go check how the btrfs-progs code to see how the difference comes.
>>>>
>>>>>
>>>>>> or find a way to wait for all subvolume drop to be finished in
>>>>>> test case.
>>>>>
>>>>> Yes this is one way. Just wait for few seconds will do, test passes. Do
>>>>> you know any better way?
>>>>
>>>> I didn't remember when, but it looks like `btrfs fi sync` used to wait
>>>> for snapshot drop.
>>>> But not now. If we have a way to wait for cleaner to finish, we can
>>>> solve it pretty easily.
>>>
>>> A sleep at the end of the test case also makes it count consistent.
>>> As the intention of the test case is to test for the hang, so sleep 5
>>> at the end of the test case is reasonable.
>>
>> That looks like a valid workaround.
>>
>> Although the immediate number 5 looks no that generic for all test
>> environments.
>>
>> I really hope to find a stable way to wait for all subvolume drops other
>> than rely on some hard coded numbers.
>
>  what about btrfs filesystem sync?

The only cleaner-related work that ioctl does is waking up
transaction_kthread, which will also wake up cleaner_kthread.

It triggers the cleanup, but doesn't wait for it.

And my first run with such an 'fi sync' added failed too, so I guess it's
not good enough.

Thanks,
Qu


>
>
> <snip>
>
Qu Wenruo Feb. 10, 2020, 8:47 a.m. UTC | #8
On 2020/2/10 3:55 PM, Qu Wenruo wrote:

>>>
>>> That looks like a valid workaround.
>>>
>>> Although the immediate number 5 looks no that generic for all test
>>> environments.
>>>
>>> I really hope to find a stable way to wait for all subvolume drops other
>>> than rely on some hard coded numbers.
>>
>>  what about btrfs filesystem sync?
>
> The only cleaner related work of that ioctl is waking up
> transaction_kthread, which will also wake up cleaner_kthread.
>
> It triggers clean up, but not wait for it.
>
> And my first run of such added fi sync failed too, so not good enough I
> guess.

Although 'fi sync' does nothing better than a vanilla sync, Nikolay also
mentioned 'subv sync', which does the trick!

Nikolay rocks!

That would be the proper way to solve the problem.
And it's time to update the man page of `btrfs-filesystem`.
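
As a sketch of how the test could use that instead of the quota rescan,
assuming `btrfs subvolume sync` on the scratch mount waits for all queued
snapshot deletions (this reflects the approach discussed here, not
necessarily the exact hunk that eventually landed):

    # Wait until the cleaner thread has fully dropped the deleted
    # snapshots, so the on-disk qgroup numbers are final before the
    # post-test check runs.
    $BTRFS_UTIL_PROG subvolume sync $SCRATCH_MNT >> $seqres.full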

Thanks,
Qu

>
> Thanks,
> Qu
>
>
>>
>>
>> <snip>
>>

Patch

diff --git a/tests/btrfs/179 b/tests/btrfs/179
index 4a24ea419a7e..74e91841eaa6 100755
--- a/tests/btrfs/179
+++ b/tests/btrfs/179
@@ -109,6 +109,14 @@  wait $snapshot_pid
 kill $delete_pid
 wait $delete_pid
 
+# Due to the async nature of the qgroup tree scan, the qgroup counts at the
+# time of umount might not be up to date; if they aren't, the check will report
+# the difference in counts. The difference is reconciled on the following mount
+# anyway, so it is not a real issue that this test case is trying to verify.
+# Make sure the qgroup counts are updated before the unmount happens.
+
+$BTRFS_UTIL_PROG quota rescan -w $SCRATCH_MNT >> $seqres.full
+
 # success, all done
 echo "Silence is golden"