Message ID | 1581076895-6688-1-git-send-email-anand.jain@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Series | fstests: btrfs/179 call quota rescan |
On 2020/2/7 8:01 PM, Anand Jain wrote:
> On some systems btrfs/179 fails as the check finds that there is
> difference in the qgroup counts.
>
> By the async nature of qgroup tree scan, the latest qgroup counts at the
> time of umount might not be upto date,

Yes, so far so good.

> if it isn't then the check will
> report the difference in count. The difference in qgroup counts are anyway
> updated in the following mount, so it is not a real issue that this test
> case is trying to verify.

No problem either.

> So make sure the qgroup counts are updated
> before unmount happens and make the check happy.

But the solution doesn't look correct to me.

We should either make btrfs-check to handle such half-dropped case
better, or find a way to wait for all subvolume drop to be finished in
test case.

Papering the test by rescan is not a good idea at all.
If one day we really hit some qgroup accounting problem, this papering
way could hugely reduce the coverage.

Thanks,
Qu

>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  tests/btrfs/179 | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/tests/btrfs/179 b/tests/btrfs/179
> index 4a24ea419a7e..74e91841eaa6 100755
> --- a/tests/btrfs/179
> +++ b/tests/btrfs/179
> @@ -109,6 +109,14 @@ wait $snapshot_pid
>  kill $delete_pid
>  wait $delete_pid
>
> +# By the async nature of qgroup tree scan, the latest qgroup counts at the time
> +# of umount might not be upto date, if it isn't then the check will report the
> +# difference in count. The difference in qgroup counts are anyway updated in the
> +# following mount, so it is not a real issue that this test case is trying to
> +# verify. So make sure the qgroup counts are updated before unmount happens.
> +
> +$BTRFS_UTIL_PROG quota rescan -w $SCRATCH_MNT >> $seqres.full
> +
>  # success, all done
>  echo "Silence is golden"
On 7/2/20 8:15 PM, Qu Wenruo wrote:
>
>
> On 2020/2/7 8:01 PM, Anand Jain wrote:
>> On some systems btrfs/179 fails as the check finds that there is
>> difference in the qgroup counts.
>>
>> By the async nature of qgroup tree scan, the latest qgroup counts at the
>> time of umount might not be upto date,
>
> Yes, so far so good.
>
>> if it isn't then the check will
>> report the difference in count. The difference in qgroup counts are anyway
>> updated in the following mount, so it is not a real issue that this test
>> case is trying to verify.
>
> No problem either.
>
>> So make sure the qgroup counts are updated
>> before unmount happens and make the check happy.
>
> But the solution doesn't look correct to me.
>
> We should either make btrfs-check to handle such half-dropped case
> better,

Check is ok. The count as check counts matches with the count after the
mount. So what is recorded in the qgroup is not upto date.

> or find a way to wait for all subvolume drop to be finished in
> test case.

Yes this is one way. Just wait for few seconds will do, test passes. Do
you know any better way?

Thanks, Anand

> Papering the test by rescan is not a good idea at all.
> If one day we really hit some qgroup accounting problem, this papering
> way could hugely reduce the coverage.
>
> Thanks,
> Qu
>

<snip>
On 2020/2/7 11:59 PM, Anand Jain wrote:
>
>
> On 7/2/20 8:15 PM, Qu Wenruo wrote:
>>
>>
>> On 2020/2/7 8:01 PM, Anand Jain wrote:
>>> On some systems btrfs/179 fails as the check finds that there is
>>> difference in the qgroup counts.
>>>
>>> By the async nature of qgroup tree scan, the latest qgroup counts at the
>>> time of umount might not be upto date,
>>
>> Yes, so far so good.
>>
>>> if it isn't then the check will
>>> report the difference in count. The difference in qgroup counts are
>>> anyway
>>> updated in the following mount, so it is not a real issue that this test
>>> case is trying to verify.
>>
>> No problem either.
>>
>>> So make sure the qgroup counts are updated
>>> before unmount happens and make the check happy.
>>
>> But the solution doesn't look correct to me.
>>
>> We should either make btrfs-check to handle such half-dropped case
>> better,
>
> Check is ok. The count as check counts matches with the count after the
> mount. So what is recorded in the qgroup is not upto date.

Nope. Qgroup records what's in commit tree. For unmounted fs, there is
no difference in commit tree and current tree.

Thus the qgroup scan in btrfs-progs is different from kernel.
Please go check how the btrfs-progs code to see how the difference comes.

>
>> or find a way to wait for all subvolume drop to be finished in
>> test case.
>
> Yes this is one way. Just wait for few seconds will do, test passes. Do
> you know any better way?

I didn't remember when, but it looks like `btrfs fi sync` used to wait
for snapshot drop.
But not now. If we have a way to wait for cleaner to finish, we can
solve it pretty easily.

Thanks,
Qu

>
> Thanks, Anand
>
>> Papering the test by rescan is not a good idea at all.
>> If one day we really hit some qgroup accounting problem, this papering
>> way could hugely reduce the coverage.
>>
>
>
>> Thanks,
>> Qu
>>

<snip>
On 2/8/20 7:28 AM, Qu Wenruo wrote:
>
>
> On 2020/2/7 11:59 PM, Anand Jain wrote:
>>
>>
>> On 7/2/20 8:15 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/2/7 8:01 PM, Anand Jain wrote:
>>>> On some systems btrfs/179 fails as the check finds that there is
>>>> difference in the qgroup counts.
>>>>
>>>> By the async nature of qgroup tree scan, the latest qgroup counts at the
>>>> time of umount might not be upto date,
>>>
>>> Yes, so far so good.
>>>
>>>> if it isn't then the check will
>>>> report the difference in count. The difference in qgroup counts are
>>>> anyway
>>>> updated in the following mount, so it is not a real issue that this test
>>>> case is trying to verify.
>>>
>>> No problem either.
>>>
>>>> So make sure the qgroup counts are updated
>>>> before unmount happens and make the check happy.
>>>
>>> But the solution doesn't look correct to me.
>>>
>>> We should either make btrfs-check to handle such half-dropped case
>>> better,
>>
>> Check is ok. The count as check counts matches with the count after the
>> mount. So what is recorded in the qgroup is not upto date.
>
> Nope. Qgroup records what's in commit tree. For unmounted fs, there is
> no difference in commit tree and current tree.
>
> Thus the qgroup scan in btrfs-progs is different from kernel.
> Please go check how the btrfs-progs code to see how the difference comes.
>
>>
>>> or find a way to wait for all subvolume drop to be finished in
>>> test case.
>>
>> Yes this is one way. Just wait for few seconds will do, test passes. Do
>> you know any better way?
>
> I didn't remember when, but it looks like `btrfs fi sync` used to wait
> for snapshot drop.
> But not now. If we have a way to wait for cleaner to finish, we can
> solve it pretty easily.

A sleep at the end of the test case also makes it count consistent.
As the intention of the test case is to test for the hang, so sleep 5
at the end of the test case is reasonable.

Thanks, Anand

> Thanks,
> Qu
>
>>
>> Thanks, Anand
>>
>>> Papering the test by rescan is not a good idea at all.
>>> If one day we really hit some qgroup accounting problem, this papering
>>> way could hugely reduce the coverage.
>>>
>>
>>
>>> Thanks,
>>> Qu
>>>

<snip>
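A minimal sketch of the sleep-based workaround described above, assuming the tail of tests/btrfs/179 as shown in the patch hunk; the 5-second value is an arbitrary choice, not something the test or the discussion guarantees to be long enough on every machine:

# Hypothetical tail of btrfs/179 using a fixed delay instead of quota rescan.
# The sleep only gives the cleaner thread time to finish dropping the deleted
# snapshots before the post-test fsck runs; it does not confirm completion.
kill $delete_pid
wait $delete_pid

sleep 5

# success, all done
echo "Silence is golden"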
On 2020/2/8 5:06 PM, Anand Jain wrote:
>
>
> On 2/8/20 7:28 AM, Qu Wenruo wrote:
>>
>>
>> On 2020/2/7 11:59 PM, Anand Jain wrote:
>>>
>>>
>>> On 7/2/20 8:15 PM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2020/2/7 8:01 PM, Anand Jain wrote:
>>>>> On some systems btrfs/179 fails as the check finds that there is
>>>>> difference in the qgroup counts.
>>>>>
>>>>> By the async nature of qgroup tree scan, the latest qgroup counts
>>>>> at the
>>>>> time of umount might not be upto date,
>>>>
>>>> Yes, so far so good.
>>>>
>>>>> if it isn't then the check will
>>>>> report the difference in count. The difference in qgroup counts are
>>>>> anyway
>>>>> updated in the following mount, so it is not a real issue that this
>>>>> test
>>>>> case is trying to verify.
>>>>
>>>> No problem either.
>>>>
>>>>> So make sure the qgroup counts are updated
>>>>> before unmount happens and make the check happy.
>>>>
>>>> But the solution doesn't look correct to me.
>>>>
>>>> We should either make btrfs-check to handle such half-dropped case
>>>> better,
>>>
>>> Check is ok. The count as check counts matches with the count after
>>> the
>>> mount. So what is recorded in the qgroup is not upto date.
>>
>> Nope. Qgroup records what's in commit tree. For unmounted fs, there is
>> no difference in commit tree and current tree.
>>
>> Thus the qgroup scan in btrfs-progs is different from kernel.
>> Please go check how the btrfs-progs code to see how the difference comes.
>>
>>>
>>>> or find a way to wait for all subvolume drop to be finished in
>>>> test case.
>>>
>>> Yes this is one way. Just wait for few seconds will do, test passes. Do
>>> you know any better way?
>>
>> I didn't remember when, but it looks like `btrfs fi sync` used to wait
>> for snapshot drop.
>> But not now. If we have a way to wait for cleaner to finish, we can
>> solve it pretty easily.
>
> A sleep at the end of the test case also makes it count consistent.
> As the intention of the test case is to test for the hang, so sleep 5
> at the end of the test case is reasonable.

That looks like a valid workaround.

Although the immediate number 5 looks not that generic for all test
environments.

I really hope to find a stable way to wait for all subvolume drops other
than rely on some hard coded numbers.

Thanks,
Qu

>
> Thanks, Anand
>
>> Thanks,
>> Qu
>>
>>>
>>> Thanks, Anand
>>>
>>>> Papering the test by rescan is not a good idea at all.
>>>> If one day we really hit some qgroup accounting problem, this papering
>>>> way could hugely reduce the coverage.
>>>>
>>>
>>>
>>>> Thanks,
>>>> Qu
>>>>

<snip>
On 10.02.20, 3:36, Qu Wenruo wrote:
>
>
> On 2020/2/8 5:06 PM, Anand Jain wrote:
>>
>>
>> On 2/8/20 7:28 AM, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/2/7 11:59 PM, Anand Jain wrote:
>>>>
>>>>
>>>> On 7/2/20 8:15 PM, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2020/2/7 8:01 PM, Anand Jain wrote:
>>>>>> On some systems btrfs/179 fails as the check finds that there is
>>>>>> difference in the qgroup counts.
>>>>>>
>>>>>> By the async nature of qgroup tree scan, the latest qgroup counts
>>>>>> at the
>>>>>> time of umount might not be upto date,
>>>>>
>>>>> Yes, so far so good.
>>>>>
>>>>>> if it isn't then the check will
>>>>>> report the difference in count. The difference in qgroup counts are
>>>>>> anyway
>>>>>> updated in the following mount, so it is not a real issue that this
>>>>>> test
>>>>>> case is trying to verify.
>>>>>
>>>>> No problem either.
>>>>>
>>>>>> So make sure the qgroup counts are updated
>>>>>> before unmount happens and make the check happy.
>>>>>
>>>>> But the solution doesn't look correct to me.
>>>>>
>>>>> We should either make btrfs-check to handle such half-dropped case
>>>>> better,
>>>>
>>>> Check is ok. The count as check counts matches with the count after
>>>> the
>>>> mount. So what is recorded in the qgroup is not upto date.
>>>
>>> Nope. Qgroup records what's in commit tree. For unmounted fs, there is
>>> no difference in commit tree and current tree.
>>>
>>> Thus the qgroup scan in btrfs-progs is different from kernel.
>>> Please go check how the btrfs-progs code to see how the difference comes.
>>>
>>>>
>>>>> or find a way to wait for all subvolume drop to be finished in
>>>>> test case.
>>>>
>>>> Yes this is one way. Just wait for few seconds will do, test passes. Do
>>>> you know any better way?
>>>
>>> I didn't remember when, but it looks like `btrfs fi sync` used to wait
>>> for snapshot drop.
>>> But not now. If we have a way to wait for cleaner to finish, we can
>>> solve it pretty easily.
>>
>> A sleep at the end of the test case also makes it count consistent.
>> As the intention of the test case is to test for the hang, so sleep 5
>> at the end of the test case is reasonable.
>
> That looks like a valid workaround.
>
> Although the immediate number 5 looks not that generic for all test
> environments.
>
> I really hope to find a stable way to wait for all subvolume drops other
> than rely on some hard coded numbers.

what about btrfs filesystem sync?

<snip>
On 2020/2/10 3:45 PM, Nikolay Borisov wrote:
>
>
> On 10.02.20, 3:36, Qu Wenruo wrote:
>>
>>
>> On 2020/2/8 5:06 PM, Anand Jain wrote:
>>>
>>>
>>> On 2/8/20 7:28 AM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2020/2/7 11:59 PM, Anand Jain wrote:
>>>>>
>>>>>
>>>>> On 7/2/20 8:15 PM, Qu Wenruo wrote:
>>>>>>
>>>>>>
>>>>>> On 2020/2/7 8:01 PM, Anand Jain wrote:
>>>>>>> On some systems btrfs/179 fails as the check finds that there is
>>>>>>> difference in the qgroup counts.
>>>>>>>
>>>>>>> By the async nature of qgroup tree scan, the latest qgroup counts
>>>>>>> at the
>>>>>>> time of umount might not be upto date,
>>>>>>
>>>>>> Yes, so far so good.
>>>>>>
>>>>>>> if it isn't then the check will
>>>>>>> report the difference in count. The difference in qgroup counts are
>>>>>>> anyway
>>>>>>> updated in the following mount, so it is not a real issue that this
>>>>>>> test
>>>>>>> case is trying to verify.
>>>>>>
>>>>>> No problem either.
>>>>>>
>>>>>>> So make sure the qgroup counts are updated
>>>>>>> before unmount happens and make the check happy.
>>>>>>
>>>>>> But the solution doesn't look correct to me.
>>>>>>
>>>>>> We should either make btrfs-check to handle such half-dropped case
>>>>>> better,
>>>>>
>>>>> Check is ok. The count as check counts matches with the count after
>>>>> the
>>>>> mount. So what is recorded in the qgroup is not upto date.
>>>>
>>>> Nope. Qgroup records what's in commit tree. For unmounted fs, there is
>>>> no difference in commit tree and current tree.
>>>>
>>>> Thus the qgroup scan in btrfs-progs is different from kernel.
>>>> Please go check how the btrfs-progs code to see how the difference comes.
>>>>
>>>>>
>>>>>> or find a way to wait for all subvolume drop to be finished in
>>>>>> test case.
>>>>>
>>>>> Yes this is one way. Just wait for few seconds will do, test passes. Do
>>>>> you know any better way?
>>>>
>>>> I didn't remember when, but it looks like `btrfs fi sync` used to wait
>>>> for snapshot drop.
>>>> But not now. If we have a way to wait for cleaner to finish, we can
>>>> solve it pretty easily.
>>>
>>> A sleep at the end of the test case also makes it count consistent.
>>> As the intention of the test case is to test for the hang, so sleep 5
>>> at the end of the test case is reasonable.
>>
>> That looks like a valid workaround.
>>
>> Although the immediate number 5 looks not that generic for all test
>> environments.
>>
>> I really hope to find a stable way to wait for all subvolume drops other
>> than rely on some hard coded numbers.
>
> what about btrfs filesystem sync?

The only cleaner related work of that ioctl is waking up
transaction_kthread, which will also wake up cleaner_kthread.

It triggers clean up, but not wait for it.

And my first run of such added fi sync failed too, so not good enough I
guess.

Thanks,
Qu

>
>
> <snip>
>
On 2020/2/10 3:55 PM, Qu Wenruo wrote:
>>>
>>> That looks like a valid workaround.
>>>
>>> Although the immediate number 5 looks not that generic for all test
>>> environments.
>>>
>>> I really hope to find a stable way to wait for all subvolume drops other
>>> than rely on some hard coded numbers.
>>
>> what about btrfs filesystem sync?
>
> The only cleaner related work of that ioctl is waking up
> transaction_kthread, which will also wake up cleaner_kthread.
>
> It triggers clean up, but not wait for it.
>
> And my first run of such added fi sync failed too, so not good enough I
> guess.

Although 'fi sync' does nothing better than vanilla sync, Nikolay also
mentioned 'subv sync', which does the trick!

Nikolay rocks! That would be the proper way to solve the problem.

And it's time to update the man page of `btrfs-filesystem`.

Thanks,
Qu

>
> Thanks,
> Qu
>
>
>>
>>
>> <snip>
>>
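A sketch of how the same hunk could wait for the pending subvolume drops instead of forcing a quota rescan, along the lines of the 'subv sync' suggestion above; this is an illustration under that assumption, not the patch under review:

# Hypothetical replacement for the quota-rescan hunk in tests/btrfs/179.
# "btrfs subvolume sync <path>" with no subvolume ids blocks until all
# currently queued subvolume deletions have been cleaned, so the qgroup
# counts on disk are final before the post-test fsck runs.
kill $delete_pid
wait $delete_pid

$BTRFS_UTIL_PROG subvolume sync $SCRATCH_MNT >> $seqres.full

# success, all done
echo "Silence is golden"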
diff --git a/tests/btrfs/179 b/tests/btrfs/179
index 4a24ea419a7e..74e91841eaa6 100755
--- a/tests/btrfs/179
+++ b/tests/btrfs/179
@@ -109,6 +109,14 @@ wait $snapshot_pid
 kill $delete_pid
 wait $delete_pid
 
+# By the async nature of qgroup tree scan, the latest qgroup counts at the time
+# of umount might not be upto date, if it isn't then the check will report the
+# difference in count. The difference in qgroup counts are anyway updated in the
+# following mount, so it is not a real issue that this test case is trying to
+# verify. So make sure the qgroup counts are updated before unmount happens.
+
+$BTRFS_UTIL_PROG quota rescan -w $SCRATCH_MNT >> $seqres.full
+
 # success, all done
 echo "Silence is golden"
On some systems btrfs/179 fails as the check finds that there is
difference in the qgroup counts.

By the async nature of qgroup tree scan, the latest qgroup counts at the
time of umount might not be upto date, if it isn't then the check will
report the difference in count. The difference in qgroup counts are anyway
updated in the following mount, so it is not a real issue that this test
case is trying to verify. So make sure the qgroup counts are updated
before unmount happens and make the check happy.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 tests/btrfs/179 | 8 ++++++++
 1 file changed, 8 insertions(+)