
[v2,00/14] Crash consistency xfstest using dm-log-writes

Message ID CAOQ4uxgbyd1FgF9GsjEwhf+KC-CMd=1QK-F=tevBpq5YByW=Zg@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Amir Goldstein Sept. 1, 2017, 6:52 a.m. UTC
[CC list, Ted]

On Thu, Aug 31, 2017 at 11:54 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> On Thu, Aug 31, 2017 at 05:02:46PM +0300, Amir Goldstein wrote:
>> On Thu, Aug 31, 2017 at 4:43 PM, Josef Bacik <josef@toxicpanda.com> wrote:
>> > On Thu, Aug 31, 2017 at 03:48:44PM +0300, Amir Goldstein wrote:
>> >>
>> >> Josef,
>> >>
>> >> I am at a loss with these log corruptions.
>> >> I see log entry bios submitted and log_end_io report success,
>> >> but then in the log I see old data on disk where that entry should be.
>> >> This happens quite randomly and I assume it also happens on
>> >> logged data, because tests sometimes fail on checksums on ext4.
>> >>
>> >> Meanwhile, I added some more log entry sanity checks and debug
>> >> prints to replay-log to debug the corruption:
>> >> https://github.com/amir73il/xfstests/commit/bb946deb0dc285867be394613ddb19ce281392cc
>> >>
>> >> This only happens to me when running in kvm, so maybe something
>> >> with the virtio devices is fishy.
>> >>
>> >> Anyway, I ran out of time to work on this for now, so if you have
>> >> any ideas and/or time to test this issue, let me know.
>> >>
>> >
...
>>
>
> Alright, I tested it and it's working fine for me.  I'm creating three LVs and
> then doing
>
> -drive file=/dev/mapper/whatever,format=raw,cache=none,if=virtio,aio=native
>
> And I get /dev/vd[bcd] which I use for my test/scratch/log dev and it works out
> fine.  What is your -drive option line and I'll duplicate what you are doing.
> Thanks,
>

I am using Ted's kvm-xfstests, so this is the qemu command line:
https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/kvm-xfstests#L104

The only difference in the -drive option is that mine lacks aio=native.
BINGO! When I add aio=native, there are no more log corruptions :)
Please try to use aio=threads to see if you also get log corruptions.
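To make the comparison clean, the two runs should differ only in the aio= setting. A minimal sketch, assuming bash; `make_drive_arg` is a hypothetical helper, not part of kvm-xfstests. It keeps cache=none in both variants because, as far as we know, qemu's aio=native path is only effective with O_DIRECT (cache=none).

```bash
#!/bin/bash
# Hypothetical helper: build a qemu -drive argument for a raw block device,
# varying only the AIO backend so the two runs can be compared directly.
make_drive_arg() {
    local dev="$1"   # e.g. an LV path such as /dev/mapper/whatever
    local aio="$2"   # "native" or "threads"
    case "$aio" in
        native|threads) ;;
        *) echo "unsupported aio mode: $aio" >&2; return 1 ;;
    esac
    printf 'file=%s,format=raw,cache=none,if=virtio,aio=%s\n' "$dev" "$aio"
}

# Example: generate both variants for an A/B run (qemu is not invoked here).
#   qemu-system-x86_64 ... -drive "$(make_drive_arg /dev/mapper/whatever threads)"
```

Everything else on the qemu command line stays identical, so any difference in log corruption can be attributed to the AIO backend.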

The thing is, we cannot change kvm-xfstests to always use aio=native because
it is not recommended for sparse images:
https://access.redhat.com/articles/41313
I will try to work something out so that kvm-xfstests will use aio=native
when using the recommended (but not default) LV setup.
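One way to express that plan is to pick the backend from the type of the backing path. The sketch below is hypothetical and is not how kvm-xfstests actually decides. It assumes an LV shows up as a block device, while a regular file may be a sparse image.

```bash
#!/bin/bash
# Hypothetical policy sketch: choose qemu's aio= backend per backing path.
# Block devices (e.g. LVM logical volumes) get aio=native; anything else,
# such as a possibly sparse image file, keeps qemu's default aio=threads.
choose_aio_mode() {
    local path="$1"
    if [ -b "$path" ]; then
        echo native
    else
        echo threads
    fi
}

# Example (IMG is a hypothetical path to a VM disk image or LV):
#   aio=$(choose_aio_mode "$IMG")
#   ... -drive "file=$IMG,format=raw,cache=none,if=virtio,aio=$aio" ...
```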

However, why would aio=threads cause log corruption?
Does it indicate a bug in kvm-qemu or in dm-log-writes??

Have you tried kvm-xfstests? It's quite convenient to deploy en masse,
so I think it would be ideal to integrate the crash tests with it.
It also helps unify the environment between us fs developers
when a bug cannot be reproduced on another system. See:
https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-xfstests.md

Anyway, if you do end up using kvm-xfstests, you'll need this
small patch to automatically define the log-writes device:

kvm-xfstests defines two sets of test/scratch devices, a small set and a large set,
and uses only one of those sets depending on the command line,
so I use the "other" scratch device as the log-writes device.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Josef Bacik Sept. 1, 2017, 7:03 a.m. UTC | #1
On Fri, Sep 01, 2017 at 09:52:18AM +0300, Amir Goldstein wrote:
> [CC list, Ted]
> 
> On Thu, Aug 31, 2017 at 11:54 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> > On Thu, Aug 31, 2017 at 05:02:46PM +0300, Amir Goldstein wrote:
> >> On Thu, Aug 31, 2017 at 4:43 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> >> > On Thu, Aug 31, 2017 at 03:48:44PM +0300, Amir Goldstein wrote:
> >> >>
> >> >> Josef,
> >> >>
> >> >> I am at a loss with these log corruptions.
> >> >> I see log entry bios submitted and log_end_io report success,
> >> >> but then in the log I see old data on disk where that entry should be.
> >> >> This happens quite randomly and I assume it also happens on
> >> >> logged data, because tests sometimes fail on checksums on ext4.
> >> >>
> >> >> Meanwhile, I added some more log entry sanity checks and debug
> >> >> prints to replay-log to debug the corruption:
> >> >> https://github.com/amir73il/xfstests/commit/bb946deb0dc285867be394613ddb19ce281392cc
> >> >>
> >> >> This only happens to me when running in kvm, so maybe something
> >> >> with the virtio devices is fishy.
> >> >>
> >> >> Anyway, I ran out of time to work on this for now, so if you have
> >> >> any ideas and/or time to test this issue, let me know.
> >> >>
> >> >
> ...
> >>
> >
> > Alright, I tested it and it's working fine for me.  I'm creating three LVs and
> > then doing
> >
> > -drive file=/dev/mapper/whatever,format=raw,cache=none,if=virtio,aio=native
> >
> > And I get /dev/vd[bcd] which I use for my test/scratch/log dev and it works out
> > fine.  What is your -drive option line and I'll duplicate what you are doing.
> > Thanks,
> >
> 
> I am using Ted's kvm-xfstests, so this is the qemu command line:
> https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/kvm-xfstests#L104
> 
> The only difference in the -drive option is that mine lacks aio=native.
> BINGO! When I add aio=native, there are no more log corruptions :)
> Please try to use aio=threads to see if you also get log corruptions.
> 
> The thing is, we cannot change kvm-xfstests to always use aio=native because
> it is not recommended for sparse images:
> https://access.redhat.com/articles/41313
> I will try to work something out so that kvm-xfstests will use aio=native
> when using the recommended (but not default) LV setup.
> 
> However, why would aio=threads cause log corruption?
> Does it indicate a bug in kvm-qemu or in dm-log-writes??
> 
> Have you tried kvm-xfstests? It's quite convenient to deploy en masse,
> so I think it would be ideal to integrate the crash tests with it.
> It also helps unify the environment between us fs developers
> when a bug cannot be reproduced on another system. See:
> https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-xfstests.md
> 
> Anyway, if you do end up using kvm-xfstests, you'll need this
> small patch to automatically define the log-writes device:
> 
> --- a/kvm-xfstests/test-appliance/files/root/runtests.sh
> +++ b/kvm-xfstests/test-appliance/files/root/runtests.sh
> @@ -269,9 +269,11 @@ do
>             if test "$SIZE" = "large" ; then
>                 export SCRATCH_DEV=$LG_SCR_DEV
>                 export SCRATCH_MNT=$LG_SCR_MNT
> +               export LOGWRITES_DEV=$SM_SCR_DEV
>             else
>                 export SCRATCH_DEV=$SM_SCR_DEV
>                 export SCRATCH_MNT=$SM_SCR_MNT
> +               export LOGWRITES_DEV=$LG_SCR_DEV
>             fi
>         fi
> 
> kvm-xfstests defines two sets of test/scratch devices, a small set and a large set,
> and uses only one of those sets depending on the command line,
> so I use the "other" scratch device as the log-writes device.
>

Cool, I didn't know about kvm-xfstests; I'll give that a whirl.  The baby just
woke me up, but when I get up for real I'll switch my config to use aio=threads
and see what happens, but I'm starting to suspect there's a bug in qemu.
Thanks,

Josef 
Josef Bacik Sept. 1, 2017, 8:07 p.m. UTC | #2
On Fri, Sep 01, 2017 at 09:52:18AM +0300, Amir Goldstein wrote:
> [CC list, Ted]
> 
> On Thu, Aug 31, 2017 at 11:54 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> > On Thu, Aug 31, 2017 at 05:02:46PM +0300, Amir Goldstein wrote:
> >> On Thu, Aug 31, 2017 at 4:43 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> >> > On Thu, Aug 31, 2017 at 03:48:44PM +0300, Amir Goldstein wrote:
> >> >>
> >> >> Josef,
> >> >>
> >> >> I am at a loss with these log corruptions.
> >> >> I see log entry bios submitted and log_end_io report success,
> >> >> but then in the log I see old data on disk where that entry should be.
> >> >> This happens quite randomly and I assume it also happens on
> >> >> logged data, because tests sometimes fail on checksums on ext4.
> >> >>
> >> >> Meanwhile, I added some more log entry sanity checks and debug
> >> >> prints to replay-log to debug the corruption:
> >> >> https://github.com/amir73il/xfstests/commit/bb946deb0dc285867be394613ddb19ce281392cc
> >> >>
> >> >> This only happens to me when running in kvm, so maybe something
> >> >> with the virtio devices is fishy.
> >> >>
> >> >> Anyway, I ran out of time to work on this for now, so if you have
> >> >> any ideas and/or time to test this issue, let me know.
> >> >>
> >> >
> ...
> >>
> >
> > Alright, I tested it and it's working fine for me.  I'm creating three LVs and
> > then doing
> >
> > -drive file=/dev/mapper/whatever,format=raw,cache=none,if=virtio,aio=native
> >
> > And I get /dev/vd[bcd] which I use for my test/scratch/log dev and it works out
> > fine.  What is your -drive option line and I'll duplicate what you are doing.
> > Thanks,
> >
> 
> I am using Ted's kvm-xfstests, so this is the qemu command line:
> https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/kvm-xfstests#L104
> 
> The only difference in the -drive option is that mine lacks aio=native.
> BINGO! When I add aio=native, there are no more log corruptions :)
> Please try to use aio=threads to see if you also get log corruptions.
> 
> The thing is, we cannot change kvm-xfstests to always use aio=native because
> it is not recommended for sparse images:
> https://access.redhat.com/articles/41313
> I will try to work something out so that kvm-xfstests will use aio=native
> when using the recommended (but not default) LV setup.
> 
> However, why would aio=threads cause log corruption?
> Does it indicate a bug in kvm-qemu or in dm-log-writes??

So I've been running this in a loop all day with aio=threads and it's not
blowing up.  This is my qemu version

QEMU emulator version 2.9.0(qemu-2.9.0-1.fb1)

Maybe it has to do with the version of qemu?  Thanks,
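Running a test "in a loop all day" boils down to repeating one command until it first fails. A minimal bash sketch follows. `run_until_failure` is a hypothetical helper and `CRASH_TEST` is a hypothetical variable naming the dm-log-writes test. `./check` is the usual xfstests entry point.

```bash
#!/bin/bash
# Hypothetical stress helper: rerun a command up to MAX times and stop at the
# first failure, printing which iteration failed.
run_until_failure() {
    local max="$1"
    shift
    local i
    for ((i = 1; i <= max; i++)); do
        if ! "$@"; then
            echo "FAIL $i"
            return 1
        fi
    done
    echo "PASS $max"
}

# Example (CRASH_TEST is hypothetical):
#   run_until_failure 1000 ./check "$CRASH_TEST"
```

Stopping at the first failure preserves the state of the log device for inspection with replay-log, instead of having the next iteration overwrite it.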

Josef
Amir Goldstein Sept. 3, 2017, 1:39 p.m. UTC | #3
On Fri, Sep 1, 2017 at 11:07 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> On Fri, Sep 01, 2017 at 09:52:18AM +0300, Amir Goldstein wrote:
>> [CC list, Ted]
>>
>> On Thu, Aug 31, 2017 at 11:54 PM, Josef Bacik <josef@toxicpanda.com> wrote:
>> > On Thu, Aug 31, 2017 at 05:02:46PM +0300, Amir Goldstein wrote:
>> >> On Thu, Aug 31, 2017 at 4:43 PM, Josef Bacik <josef@toxicpanda.com> wrote:
>> >> > On Thu, Aug 31, 2017 at 03:48:44PM +0300, Amir Goldstein wrote:
>> >> >>
>> >> >> Josef,
>> >> >>
>> >> >> I am at a loss with these log corruptions.
>> >> >> I see log entry bios submitted and log_end_io report success,
>> >> >> but then in the log I see old data on disk where that entry should be.
>> >> >> This happens quite randomly and I assume it also happens on
>> >> >> logged data, because tests sometimes fail on checksums on ext4.
>> >> >>
>> >> >> Meanwhile, I added some more log entry sanity checks and debug
>> >> >> prints to replay-log to debug the corruption:
>> >> >> https://github.com/amir73il/xfstests/commit/bb946deb0dc285867be394613ddb19ce281392cc
>> >> >>
>> >> >> This only happens to me when running in kvm, so maybe something
>> >> >> with the virtio devices is fishy.
>> >> >>
>> >> >> Anyway, I ran out of time to work on this for now, so if you have
>> >> >> any ideas and/or time to test this issue, let me know.
>> >> >>
>> >> >
>> ...
>> >>
>> >
>> > Alright, I tested it and it's working fine for me.  I'm creating three LVs and
>> > then doing
>> >
>> > -drive file=/dev/mapper/whatever,format=raw,cache=none,if=virtio,aio=native
>> >
>> > And I get /dev/vd[bcd] which I use for my test/scratch/log dev and it works out
>> > fine.  What is your -drive option line and I'll duplicate what you are doing.
>> > Thanks,
>> >
>>
>> I am using Ted's kvm-xfstests, so this is the qemu command line:
>> https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/kvm-xfstests#L104
>>
>> The only difference in the -drive option is that mine lacks aio=native.
>> BINGO! When I add aio=native, there are no more log corruptions :)
>> Please try to use aio=threads to see if you also get log corruptions.
>>
>> The thing is, we cannot change kvm-xfstests to always use aio=native because
>> it is not recommended for sparse images:
>> https://access.redhat.com/articles/41313
>> I will try to work something out so that kvm-xfstests will use aio=native
>> when using the recommended (but not default) LV setup.
>>
>> However, why would aio=threads cause log corruption?
>> Does it indicate a bug in kvm-qemu or in dm-log-writes??
>
> So I've been running this in a loop all day with aio=threads and it's not
> blowing up.  This is my qemu version
>
> QEMU emulator version 2.9.0(qemu-2.9.0-1.fb1)
>
> Maybe it has to do with the version of qemu?  Thanks,
>

Maybe. I am running QEMU 2.5.0
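With results differing between QEMU 2.9.0 and 2.5.0, it helps to record the exact QEMU version next to each run. The sketch below assumes bash and GNU coreutils `sort -V`. The helper names `qemu_version_from_banner` and `version_ge` are hypothetical. The banner format matches the version line Josef quoted. Nothing here claims which QEMU version, if any, fixes the problem.

```bash
#!/bin/bash
# Hypothetical helper: pull "X.Y.Z" out of a `qemu-system-* --version` banner,
# e.g. "QEMU emulator version 2.9.0(qemu-2.9.0-1.fb1)" -> "2.9.0".
qemu_version_from_banner() {
    local re='QEMU emulator version ([0-9]+(\.[0-9]+)+)'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on GNU `sort -V` so that 2.10.0 sorts after 2.9.0.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: log the host's QEMU version alongside a test result.
#   ver=$(qemu_version_from_banner "$(qemu-system-x86_64 --version | head -n 1)")
```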
Dave Chinner Sept. 4, 2017, 6:42 a.m. UTC | #4
On Fri, Sep 01, 2017 at 09:52:18AM +0300, Amir Goldstein wrote:
> [CC list, Ted]
> 
> On Thu, Aug 31, 2017 at 11:54 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> > On Thu, Aug 31, 2017 at 05:02:46PM +0300, Amir Goldstein wrote:
> >> On Thu, Aug 31, 2017 at 4:43 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> >> > On Thu, Aug 31, 2017 at 03:48:44PM +0300, Amir Goldstein wrote:
> >> >>
> >> >> Josef,
> >> >>
> >> >> I am at a loss with these log corruptions.
> >> >> I see log entry bios submitted and log_end_io report success,
> >> >> but then in the log I see old data on disk where that entry should be.
> >> >> This happens quite randomly and I assume it also happens on
> >> >> logged data, because tests sometimes fail on checksums on ext4.
> >> >>
> >> >> Meanwhile, I added some more log entry sanity checks and debug
> >> >> prints to replay-log to debug the corruption:
> >> >> https://github.com/amir73il/xfstests/commit/bb946deb0dc285867be394613ddb19ce281392cc
> >> >>
> >> >> This only happens to me when running in kvm, so maybe something
> >> >> with the virtio devices is fishy.
> >> >>
> >> >> Anyway, I ran out of time to work on this for now, so if you have
> >> >> any ideas and/or time to test this issue, let me know.
> >> >>
> >> >
> ...
> >>
> >
> > Alright, I tested it and it's working fine for me.  I'm creating three LVs and
> > then doing
> >
> > -drive file=/dev/mapper/whatever,format=raw,cache=none,if=virtio,aio=native
> >
> > And I get /dev/vd[bcd] which I use for my test/scratch/log dev and it works out
> > fine.  What is your -drive option line and I'll duplicate what you are doing.
> > Thanks,
> >
> 
> I am using Ted's kvm-xfstests, so this is the qemu command line:
> https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/kvm-xfstests#L104
> 
> The only difference in the -drive option is that mine lacks aio=native.
> BINGO! When I add aio=native, there are no more log corruptions :)
> Please try to use aio=threads to see if you also get log corruptions.
> 
> The thing is, we cannot change kvm-xfstests to always use aio=native because
> it is not recommended for sparse images:
> https://access.redhat.com/articles/41313

Hmmmm. I think you're looking at an article that's at least 6 years
out of date. It was last updated at:

	Updated September 16 2012 at 2:04 AM

Looking at the bug it references, there was a heap of problems in the
DIO code, the AIO code and the filesystem code that we fixed in
upstream kernels in late 2010/early 2011, e.g.:

http://oss.sgi.com/archives/xfs/2011-01/msg00156.html

Those took some time to get back into vendor kernels, but the
aio=native kvm problems described in that kbase article were fixed
in a RHEL 6.1 point release in May 2011.

IOWs, if qemu w/ aio=native doesn't work these days, the article
you've quoted is not the reason.


Cheers,

Dave.
Amir Goldstein Sept. 4, 2017, 6:49 a.m. UTC | #5
On Mon, Sep 4, 2017 at 9:42 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Sep 01, 2017 at 09:52:18AM +0300, Amir Goldstein wrote:
>> [CC list, Ted]
>>
>> ...
>> >>
>> >
>> > Alright, I tested it and it's working fine for me.  I'm creating three LVs and
>> > then doing
>> >
>> > -drive file=/dev/mapper/whatever,format=raw,cache=none,if=virtio,aio=native
>> >
>> > And I get /dev/vd[bcd] which I use for my test/scratch/log dev and it works out
>> > fine.  What is your -drive option line and I'll duplicate what you are doing.
>> > Thanks,
>> >
>>
>> I am using Ted's kvm-xfstests, so this is the qemu command line:
>> https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/kvm-xfstests#L104
>>
>> The only difference in the -drive option is that mine lacks aio=native.
>> BINGO! When I add aio=native, there are no more log corruptions :)
>> Please try to use aio=threads to see if you also get log corruptions.
>>
>> The thing is, we cannot change kvm-xfstests to always use aio=native because
>> it is not recommended for sparse images:
>> https://access.redhat.com/articles/41313
>
> Hmmmm. I think you're looking at an article that's at least 6 years
> out of date. It was last updated at:
>
>         Updated September 16 2012 at 2:04 AM
>
> Looking at the bug it references, there was a heap of problems in the
> DIO code, the AIO code and the filesystem code that we fixed in
> upstream kernels in late 2010/early 2011, e.g.:
>
> http://oss.sgi.com/archives/xfs/2011-01/msg00156.html
>
> Those took some time to get back into vendor kernels, but the
> aio=native kvm problems described in that kbase article were fixed
> in a RHEL 6.1 point release in May 2011.
>
> IOWs, if qemu w/ aio=native doesn't work these days, the article
> you've quoted is not the reason.
>
>

In that case, I'll post a patch to have kvm-xfstests use aio=native.
I suppose it is the proper way to run xfstests inside kvm anyway.

Thanks,
Amir.
Amir Goldstein May 25, 2018, 8:58 a.m. UTC | #6
On Fri, Sep 1, 2017 at 9:52 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> [CC list, Ted]
>
> On Thu, Aug 31, 2017 at 11:54 PM, Josef Bacik <josef@toxicpanda.com> wrote:
>> On Thu, Aug 31, 2017 at 05:02:46PM +0300, Amir Goldstein wrote:
>>> On Thu, Aug 31, 2017 at 4:43 PM, Josef Bacik <josef@toxicpanda.com> wrote:
>>> > On Thu, Aug 31, 2017 at 03:48:44PM +0300, Amir Goldstein wrote:
>>> >>
>>> >> Josef,
>>> >>
>>> >> I am at a loss with these log corruptions.
>>> >> I see log entry bios submitted and log_end_io report success,
>>> >> but then in the log I see old data on disk where that entry should be.
>>> >> This happens quite randomly and I assume it also happens on
>>> >> logged data, because tests sometimes fail on checksums on ext4.
>>> >>
>>> >> Meanwhile, I added some more log entry sanity checks and debug
>>> >> prints to replay-log to debug the corruption:
>>> >> https://github.com/amir73il/xfstests/commit/bb946deb0dc285867be394613ddb19ce281392cc
>>> >>
>>> >> This only happens to me when running in kvm, so maybe something
>>> >> with the virtio devices is fishy.
>>> >>
>>> >> Anyway, I ran out of time to work on this for now, so if you have
>>> >> any ideas and/or time to test this issue, let me know.
>>> >>
>>> >
> ...
>>>
>>
>> Alright, I tested it and it's working fine for me.  I'm creating three LVs and
>> then doing
>>
>> -drive file=/dev/mapper/whatever,format=raw,cache=none,if=virtio,aio=native
>>
>> And I get /dev/vd[bcd] which I use for my test/scratch/log dev and it works out
>> fine.  What is your -drive option line and I'll duplicate what you are doing.
>> Thanks,
>>
>
> I am using Ted's kvm-xfstests, so this is the qemu command line:
> https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/kvm-xfstests#L104
>
> The only difference in the -drive option is that mine lacks aio=native.
> BINGO! When I add aio=native, there are no more log corruptions :)
> Please try to use aio=threads to see if you also get log corruptions.
>
> The thing is, we cannot change kvm-xfstests to always use aio=native because
> it is not recommended for sparse images:
> https://access.redhat.com/articles/41313
> I will try to work something out so that kvm-xfstests will use aio=native
> when using the recommended (but not default) LV setup.
>
> However, why would aio=threads cause log corruption?
> Does it indicate a bug in kvm-qemu or in dm-log-writes??
>
> Have you tried kvm-xfstests? It's quite convenient to deploy en masse,
> so I think it would be ideal to integrate the crash tests with it.
> It also helps unify the environment between us fs developers
> when a bug cannot be reproduced on another system. See:
> https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-xfstests.md
>
> Anyway, if you do end up using kvm-xfstests, you'll need this
> small patch to automatically define the log-writes device:
>
> --- a/kvm-xfstests/test-appliance/files/root/runtests.sh
> +++ b/kvm-xfstests/test-appliance/files/root/runtests.sh
> @@ -269,9 +269,11 @@ do
>             if test "$SIZE" = "large" ; then
>                 export SCRATCH_DEV=$LG_SCR_DEV
>                 export SCRATCH_MNT=$LG_SCR_MNT
> +               export LOGWRITES_DEV=$SM_SCR_DEV
>             else
>                 export SCRATCH_DEV=$SM_SCR_DEV
>                 export SCRATCH_MNT=$SM_SCR_MNT
> +               export LOGWRITES_DEV=$LG_SCR_DEV
>             fi
>         fi
>
> kvm-xfstests defines two sets of test/scratch devices, a small set and a large set,
> and uses only one of those sets depending on the command line,
> so I use the "other" scratch device as the log-writes device.
>

Ted,

I just realized that I am still carrying this patch, so please pull:
https://github.com/tytso/xfstests-bld/pull/8

Thanks,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/kvm-xfstests/test-appliance/files/root/runtests.sh
+++ b/kvm-xfstests/test-appliance/files/root/runtests.sh
@@ -269,9 +269,11 @@  do
            if test "$SIZE" = "large" ; then
                export SCRATCH_DEV=$LG_SCR_DEV
                export SCRATCH_MNT=$LG_SCR_MNT
+               export LOGWRITES_DEV=$SM_SCR_DEV
            else
                export SCRATCH_DEV=$SM_SCR_DEV
                export SCRATCH_MNT=$SM_SCR_MNT
+               export LOGWRITES_DEV=$LG_SCR_DEV
            fi
        fi
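The patch only decides which device becomes LOGWRITES_DEV. A small pre-flight check could confirm that the choice is sane before any log-writes test runs. This is a hypothetical sketch and is not part of the patch or of kvm-xfstests. It assumes TEST_DEV and SCRATCH_DEV have already been set by runtests.sh.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of the patch above): make sure the
# log-writes device is set and does not alias the scratch or test device.
# Pointing the log at the device being logged would overwrite the very
# writes dm-log-writes is trying to record.
check_logwrites_dev() {
    if [ -z "$LOGWRITES_DEV" ]; then
        echo "LOGWRITES_DEV is not set" >&2
        return 1
    fi
    if [ "$LOGWRITES_DEV" = "$SCRATCH_DEV" ] || [ "$LOGWRITES_DEV" = "$TEST_DEV" ]; then
        echo "LOGWRITES_DEV must differ from SCRATCH_DEV and TEST_DEV" >&2
        return 1
    fi
}
```

Called right after the SIZE-dependent block, such a check would catch a misconfigured device mapping before any test runs.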