
[Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

Message ID CAF6-1L502OZGbPHp37vWzEU3H-wsK5z904ZtS7SEUmuM-GhLyA@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Sylvain Munaut Aug. 12, 2013, 2:13 p.m. UTC
Hi,

>> > > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp
>> > 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
>> > > tapdisk:9180 blocked for more than 120 seconds.
>> > > tapdisk         D ffff88043fc13540     0  9180      1 0x00000000

You can try generating a core file by changing the ulimit on the running process

http://superuser.com/questions/404239/setting-ulimit-on-a-running-process

A backtrace would be useful :)
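
For reference, the trick in that link amounts to attaching gdb to the running tapdisk and raising RLIMIT_CORE from inside it. A minimal sketch of the call involved, assuming one could simply patch it into tapdisk instead (hypothetical helper, not existing code):

#include <stdio.h>
#include <sys/resource.h>

/* Raise the core-dump size limit so a segfault leaves a core file.
 * The gdb trick above does the same thing by calling setrlimit()
 * inside the already-running process. */
static void enable_core_dumps(void)
{
        struct rlimit rl = { RLIM_INFINITY, RLIM_INFINITY };

        if (setrlimit(RLIMIT_CORE, &rl))
                perror("setrlimit(RLIMIT_CORE)");
}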


> Actually maybe not. What I was reading only applies for large number of bytes written to the pipe, and even then I got confused by the double negatives. Sorry for the noise.

Yes, as you discovered, for size < PIPE_BUF they should be atomic even
in non-blocking mode. But I could still add an assert() there to make
sure it is.
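
A minimal sketch of the pattern under discussion (names are illustrative, not the actual driver symbols): the librbd completion thread hands a request pointer back to the tapdisk event loop through a pipe, and POSIX guarantees that writes of <= PIPE_BUF bytes are atomic even on a non-blocking pipe, so a bare pointer can never arrive torn.

#include <assert.h>
#include <limits.h>
#include <unistd.h>

struct rbd_request;     /* opaque; stands in for the real request type */

static void signal_completion(int pipe_wr_fd, struct rbd_request *req)
{
        ssize_t n;

        assert(sizeof(req) <= PIPE_BUF);        /* PIPE_BUF is at least 512 */

        n = write(pipe_wr_fd, &req, sizeof(req));

        /* Either the whole pointer was written or nothing was. */
        assert(n == (ssize_t)sizeof(req) || n == -1);
}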


I did find a bug where it could "leak" requests, which may lead to a
hang. But it shouldn't crash ...

Here's an (untested yet) patch in the rbd error path:



Cheers,

     Sylvain

Comments

James Harper Aug. 12, 2013, 11:26 p.m. UTC | #1
> >> > > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp
> >> > 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
> >> > > tapdisk:9180 blocked for more than 120 seconds.
> >> > > tapdisk         D ffff88043fc13540     0  9180      1 0x00000000
> 
> You can try generating a core file by changing the ulimit on the running
> process
> 
> A backtrace would be useful :)
> 

I found it was actually dumping core in /, but gdb doesn't seem to work nicely and all I get is this:

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Cannot find new threads: generic error
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:163
163     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.

Even when I attach to a running process.

One VM segfaults on startup pretty much every time, except never when I attach strace to it, meaning it's probably a race condition and may not actually be in your code...

> 
> > Actually maybe not. What I was reading only applies for large number of
> > bytes written to the pipe, and even then I got confused by the double
> > negatives. Sorry for the noise.
> 
> Yes, as you discovered but size < PIPE_BUF, they should be atomic even
> in non-blocking mode. But I could still add assert() there to make
> sure it is.

Nah I got that completely backwards. I see now you are only passing a pointer so yes it should never be non-atomic.

> I did find a bug where it could "leak" requests which may lead to
> hang. But it shouldn't crash ...
> 
> Here's an (untested yet) patch in the rbd error path:
> 

I'll try that later this morning when I get a minute.

I've done the poor-man's-debugger thing and riddled the code with printf's, but as far as I can determine every routine starts and ends. My thinking at the moment is that it's either a race (the VMs most likely to crash have multiple disks), or a buffer overflow that trips it up either immediately or later.

I have definitely observed multiple VMs crash when something in ceph hiccups (e.g. I bring a mon up or down), if that helps.

I also followed through the rbd_aio_release idea on the weekend - I can see that if the read returns failure it means the callback was never called so the release is then the responsibility of the caller.
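
For reference, a minimal sketch of that ownership rule using the public librbd C API (not the actual driver code; error handling abbreviated):

#include <stddef.h>
#include <stdint.h>
#include <rbd/librbd.h>

static void read_done(rbd_completion_t c, void *arg)
{
        /* Normal path: the callback ran, so whoever drains completions
         * releases c later with rbd_aio_release(). */
        (void)c;
        (void)arg;
}

static int submit_read(rbd_image_t image, uint64_t off, size_t len, char *buf)
{
        rbd_completion_t c;
        int rv;

        rv = rbd_aio_create_completion(buf, read_done, &c);
        if (rv < 0)
                return rv;

        rv = rbd_aio_read(image, off, len, buf, c);
        if (rv < 0) {
                /* Submission failed: read_done will never fire, so releasing
                 * the completion is the caller's responsibility. */
                rbd_aio_release(c);
                return rv;
        }

        return 0;
}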

Thanks

James

James Harper Aug. 13, 2013, 12:39 a.m. UTC | #2
> Here's an (untested yet) patch in the rbd error path:
> 
> diff --git a/drivers/block-rbd.c b/drivers/block-rbd.c
> index 68fbed7..ab2d2c5 100644
> --- a/drivers/block-rbd.c
> +++ b/drivers/block-rbd.c
> @@ -560,6 +560,9 @@ err:
>         if (c)
>                 rbd_aio_release(c);
> 
> +       list_move(&req->queue, &prv->reqs_free);
> +       prv->reqs_free_count++;
> +
>         return rv;
>  }
> 

FWIW, I can confirm via printf's that this error path is never hit in at least some of the crashes I'm seeing.

James
Sylvain Munaut Aug. 13, 2013, 8:32 a.m. UTC | #3
> FWIW, I can confirm via printf's that this error path is never hit in at least some of the crashes I'm seeing.

Ok thanks.

Are you using cache btw ?

Cheers,

    Sylvain
James Harper Aug. 13, 2013, 9:12 a.m. UTC | #4
> 
> > FWIW, I can confirm via printf's that this error path is never hit in at least
> some of the crashes I'm seeing.
> 
> Ok thanks.
> 
> Are you using cache btw ?
> 

I hope not. How could I tell? It's not something I've explicitly enabled.

Thanks

James
Sylvain Munaut Aug. 13, 2013, 9:20 a.m. UTC | #5
Hi,

> I hope not. How could I tell? It's not something I've explicitly enabled.

It's disabled by default.

So you'd have to have enabled it either in ceph.conf  or directly in
the device path in the xen config. (option is 'rbd cache',
http://ceph.com/docs/next/rbd/rbd-config-ref/ )
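
For reference, the ceph.conf form from the linked docs is just:

[client]
    rbd cache = true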

Cheers,

    Sylvain
Frederik Thuysbaert Aug. 13, 2013, 2:59 p.m. UTC | #6
Hi,

I have been testing this for a while now, and just finished testing your
untested patch. The rbd caching problem still persists.

The system I am testing on has the following characteristics:

Dom0:
     - Linux xen-001 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64
     - Most recent git checkout of blktap rbd branch

DomU:
     - Same kernel as dom0
     - Root (xvda1) is a logical volume on dom0
     - xvda2 is a Rados Block Device format 1

Let me start by saying that the errors only occur with RBD client 
caching ON.
I will give the error messages of both dom0 and domU before and after I 
applied the patch.

Actions in domU to trigger errors:

~# mkfs.xfs -f /dev/xvda2
~# mount /dev/xvda2 /mnt
~# bonnie -u 0 -g 0 /mnt


Error messages:

BEFORE patch:

Without RBD cache:

dom0: no errors
domU: no errors

With RBD cache:

dom0: no errors

domU:
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960475] lost page write due 
to I/O error on xvda2
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960488] lost page write due 
to I/O error on xvda2
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960501] lost page write due 
to I/O error on xvda2
...
Aug 13 18:18:52 debian-vm-101 kernel: [   56.394645] XFS (xvda2): 
xfs_do_force_shutdown(0x2) called from line 1007 of file 
/build/linux-s5x2oE/linux-3.2.46/fs/xfs/xfs_log.c.  Return address = 
0xffffffffa013ced5
Aug 13 18:19:19 debian-vm-101 kernel: [   83.941539] XFS (xvda2): 
xfs_log_force: error 5 returned.
Aug 13 18:19:19 debian-vm-101 kernel: [   83.941565] XFS (xvda2): 
xfs_log_force: error 5 returned.
...

AFTER patch:

Without RBD cache:

dom0: no errors
domU: no errors

With RBD cache:

dom0:
Aug 13 16:40:49 xen-001 kernel: [   94.954734] tapdisk[3075]: segfault 
at 7f749ee86da0 ip 00007f749d060776 sp 00007f748ea7a460 error 7 in 
libpthread-2.13.so[7f749d059000+17000]


domU:
Same as before patch.



I would like to add that I have the time to test this; we are happy to
help you in any way possible. However, since I am no C developer, I
won't be able to do much more than testing.


Regards

Frederik


On 13-08-13 11:20, Sylvain Munaut wrote:
> Hi,
>
>> I hope not. How could I tell? It's not something I've explicitly enabled.
> It's disabled by default.
>
> So you'd have to have enabled it either in ceph.conf  or directly in
> the device path in the xen config. (option is 'rbd cache',
> http://ceph.com/docs/next/rbd/rbd-config-ref/ )
>
> Cheers,
>
>      Sylvain

Sylvain Munaut Aug. 13, 2013, 3:39 p.m. UTC | #7
Hi,

> I have been testing this a while now, and just finished testing your
> untested patch. The rbd caching problem still persists.

Yes, I wouldn't expect it to change anything for caching. But I still
don't understand why caching would change anything at all ... all of
it should be handled within the librbd lib.


Note that I would recommend against caching anyway. The blktap layer
doesn't pass through the FLUSH commands, and so this makes it completely
unsafe: the VM will think things are committed to disk durably
even though they are not ...



> I will give the error messages of both dom0 and domU before and after I
> applied the patch.

It's actually strange that it changes anything at all.

Can you try adding an ERROR("HERE\n"); in that error path
and check syslog to see if it's triggered at all?

A traceback would be great if you can get a core file. And possibly
compile tapdisk with debug symbols.


Cheers,

    Sylvain
James Harper Aug. 13, 2013, 9:47 p.m. UTC | #8
Just noticed email subject "qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]" where Sage noted that he has seen a completion called twice in the logs the OP posted. If that is actually happening (and not just an artefact of logging ring buffer overflowing or something) then I think that could easily cause a segfault in tapdisk rbd.

I'll try and see if I can log when that happens.
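
One way to catch it, if it does happen, would be a check along these lines in the rbd completion callback (a sketch only; the struct and function names are hypothetical, not the actual driver symbols):

#include <assert.h>
#include <rbd/librbd.h>

struct tap_rbd_req {
        int completed;                  /* 0 while the request is in flight */
        /* ... rest of the per-request state ... */
};

static void tap_rbd_complete(rbd_completion_t c, void *arg)
{
        struct tap_rbd_req *req = arg;

        /* Fires (or could log instead) if librbd ever completes the same
         * request twice, which would explain the request being recycled or
         * released twice and the resulting segfault. */
        assert(!req->completed);
        req->completed = 1;

        (void)c;        /* release/requeue handled elsewhere */
}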

James

> -----Original Message-----
> From: Sylvain Munaut [mailto:s.munaut@whatever-company.com]
> Sent: Tuesday, 13 August 2013 7:20 PM
> To: James Harper
> Cc: Pasi Kärkkäinen; ceph-devel@vger.kernel.org; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to
> test ? :p
> 
> Hi,
> 
> > I hope not. How could I tell? It's not something I've explicitly enabled.
> 
> It's disabled by default.
> 
> So you'd have to have enabled it either in ceph.conf  or directly in
> the device path in the xen config. (option is 'rbd cache',
> http://ceph.com/docs/next/rbd/rbd-config-ref/ )
> 
> Cheers,
> 
>     Sylvain
James Harper Aug. 13, 2013, 11:39 p.m. UTC | #9
I think I have a separate problem too - tapdisk will segfault almost immediately upon starting but seemingly only for Linux PV DomU's. Once it has started doing this I have to wait a few hours to a day before it starts working again. My Windows DomU's appear to be able to start normally though.

James
Sylvain Munaut Aug. 13, 2013, 11:43 p.m. UTC | #10
On Wed, Aug 14, 2013 at 1:39 AM, James Harper
<james.harper@bendigoit.com.au> wrote:
> I think I have a separate problem too - tapdisk will segfault almost immediately upon starting but seemingly only for Linux PV DomU's. Once it has started doing this I have to wait a few hours to a day before it starts working again. My Windows DomU's appear to be able to start normally though.

What about other blktap driver ? like using blktap raw driver, does
that work without issue ?

Cheers,

    Sylvain
James Harper Aug. 13, 2013, 11:51 p.m. UTC | #11
> 
> On Wed, Aug 14, 2013 at 1:39 AM, James Harper
> <james.harper@bendigoit.com.au> wrote:
> > I think I have a separate problem too - tapdisk will segfault almost
> immediately upon starting but seemingly only for Linux PV DomU's. Once it
> has started doing this I have to wait a few hours to a day before it starts
> working again. My Windows DomU's appear to be able to start normally
> though.
> 
> What about other blktap driver ? like using blktap raw driver, does
> that work without issue ?
> 

What's the syntax for that? I use tap2:tapdisk:rbd for rbd, but I don't know how to specify raw and anything I try just says it doesn't understand

Thanks

James
James Harper Aug. 13, 2013, 11:59 p.m. UTC | #12
> 
> >
> > On Wed, Aug 14, 2013 at 1:39 AM, James Harper
> > <james.harper@bendigoit.com.au> wrote:
> > > I think I have a separate problem too - tapdisk will segfault almost
> > immediately upon starting but seemingly only for Linux PV DomU's. Once it
> > has started doing this I have to wait a few hours to a day before it starts
> > working again. My Windows DomU's appear to be able to start normally
> > though.
> >
> > What about other blktap driver ? like using blktap raw driver, does
> > that work without issue ?
> >
> 
> What's the syntax for that? I use tap2:tapdisk:rbd for rbd, but I don't know
> how to specify raw and anything I try just says it doesn't understand
> 

I just tested with tap2:aio and that worked (had an old image of the VM on lvm still so just tested with that). Switching back to rbd and it crashes every time, just as postgres is starting in the vm. Booting into single user mode, waiting 30 seconds, then letting the boot continue it still crashes at the same point so I think it's not a timing thing - maybe postgres has a disk access pattern that is triggering the bug?

Putting printf's in seems to make the problem go away sometimes, so it's hard to debug.

James

Frederik Thuysbaert Aug. 14, 2013, 8:43 a.m. UTC | #13
On 13-08-13 17:39, Sylvain Munaut wrote:
> Hi,
>
>> I have been testing this a while now, and just finished testing your
>> untested patch. The rbd caching problem still persists.
> Yes, I wouldn't expect to change anything for caching. But I still
> don't understand why caching would change anything at all ... all of
> it should be handled within the librbd lib.
>
>
> Note that I would recommend against caching anyway. The blktap layer
> doesn't pass through the FLUSH commands and so this make it completely
> unsafe because the VM will think things are commited to disk durably
> even though they are not ...
>
>
>
>> I will give the error messages of both dom0 and domU before and after I
>> applied the patch.
> It's actually strange that it changes anything at all.
>
> Can you try adding a ERROR("HERE\n");  in that error path processing
> and check syslog to see if it's triggered at all ?
I did this, and can confirm that it is not triggered.
>
> A traceback would be great if you can get a core file. And possibly
> compile tapdisk with debug symbols.
I'm not quite sure what you mean, can you give some more information on how
I do this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm not
sure this is what you meant.
>
> Cheers,
>
>      Sylvain
Regards

- Frederik
Frederik Thuysbaert Aug. 14, 2013, 8:47 a.m. UTC | #14
On 13-08-13 17:39, Sylvain Munaut wrote:
>
> It's actually strange that it changes anything at all.
>
> Can you try adding a ERROR("HERE\n");  in that error path processing
> and check syslog to see if it's triggered at all ?
>
> A traceback would be great if you can get a core file. And possibly
> compile tapdisk with debug symbols.
When halting the domU after the errors, I get the following in dom0 syslog:

Aug 14 10:43:57 xen-001 kernel: [ 5041.338756] INFO: task tapdisk:9690 
blocked for more than 120 seconds.
Aug 14 10:43:57 xen-001 kernel: [ 5041.338817] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 10:43:57 xen-001 kernel: [ 5041.338903] tapdisk         D 
ffff8800bf213780     0  9690      1 0x00000000
Aug 14 10:43:57 xen-001 kernel: [ 5041.338908]  ffff8800b4b0e730 
0000000000000246 ffff880000000000 ffffffff8160d020
Aug 14 10:43:57 xen-001 kernel: [ 5041.338912]  0000000000013780 
ffff8800b4ebffd8 ffff8800b4ebffd8 ffff8800b4b0e730
Aug 14 10:43:57 xen-001 kernel: [ 5041.338916]  ffff8800b4d36190 
0000000181199c37 ffff8800b5798c00 ffff8800b5798c00
Aug 14 10:43:57 xen-001 kernel: [ 5041.338921] Call Trace:
Aug 14 10:43:57 xen-001 kernel: [ 5041.338929] [<ffffffffa0308411>] ? 
blktap_device_destroy_sync+0x85/0x9b [blktap]
Aug 14 10:43:57 xen-001 kernel: [ 5041.338936] [<ffffffff8105fadf>] ? 
add_wait_queue+0x3c/0x3c
Aug 14 10:43:57 xen-001 kernel: [ 5041.338940] [<ffffffffa0307444>] ? 
blktap_ring_release+0x10/0x2d [blktap]
Aug 14 10:43:57 xen-001 kernel: [ 5041.338945] [<ffffffff810fb141>] ? 
fput+0xf9/0x1a1
Aug 14 10:43:57 xen-001 kernel: [ 5041.338949] [<ffffffff810f8e6c>] ? 
filp_close+0x62/0x6a
Aug 14 10:43:57 xen-001 kernel: [ 5041.338954] [<ffffffff81049831>] ? 
put_files_struct+0x60/0xad
Aug 14 10:43:57 xen-001 kernel: [ 5041.338958] [<ffffffff81049e38>] ? 
do_exit+0x292/0x713
Aug 14 10:43:57 xen-001 kernel: [ 5041.338961] [<ffffffff8104a539>] ? 
do_group_exit+0x74/0x9e
Aug 14 10:43:57 xen-001 kernel: [ 5041.338965] [<ffffffff81055f94>] ? 
get_signal_to_deliver+0x46d/0x48f
Aug 14 10:43:57 xen-001 kernel: [ 5041.338970] [<ffffffff81347759>] ? 
force_sig_info_fault+0x5b/0x63
Aug 14 10:43:57 xen-001 kernel: [ 5041.338975] [<ffffffff8100de27>] ? 
do_signal+0x38/0x610
Aug 14 10:43:57 xen-001 kernel: [ 5041.338979] [<ffffffff81070deb>] ? 
arch_local_irq_restore+0x7/0x8
Aug 14 10:43:57 xen-001 kernel: [ 5041.338983] [<ffffffff8134eb77>] ? 
_raw_spin_unlock_irqrestore+0xe/0xf
Aug 14 10:43:57 xen-001 kernel: [ 5041.338987] [<ffffffff8103f944>] ? 
wake_up_new_task+0xb9/0xc2
Aug 14 10:43:57 xen-001 kernel: [ 5041.338992] [<ffffffff8106f987>] ? 
sys_futex+0x120/0x151
Aug 14 10:43:57 xen-001 kernel: [ 5041.338995] [<ffffffff8100e435>] ? 
do_notify_resume+0x25/0x68
Aug 14 10:43:57 xen-001 kernel: [ 5041.338999] [<ffffffff8134ef3c>] ? 
retint_signal+0x48/0x8c
...
Aug 14 10:44:17 xen-001 tap-ctl: tap-err:tap_ctl_connect: couldn't 
connect to /var/run/blktap-control/ctl9478: 111

>
>
> Cheers,
>
>      Sylvain
Regards

- Frederik
Sylvain Munaut Aug. 14, 2013, 1:13 p.m. UTC | #15
Hi,

> I just tested with tap2:aio and that worked (had an old image of the VM on lvm still so just tested with that). Switching back to rbd and it crashes every time, just as postgres is starting in the vm. Booting into single user mode, waiting 30 seconds, then letting the boot continue it still crashes at the same point so I think it's not a timing thing - maybe postgres has a disk access pattern that is triggering the bug?

Mmm, that's really interesting.

Could you try to disable request merging ? Just give option
max_merge_size=0 in the tap2 disk description. Something like
'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'

Cheers,

     Sylvain
James Harper Aug. 14, 2013, 1:16 p.m. UTC | #16
> 
> Hi,
> 
> > I just tested with tap2:aio and that worked (had an old image of the VM on
> lvm still so just tested with that). Switching back to rbd and it crashes every
> time, just as postgres is starting in the vm. Booting into single user mode,
> waiting 30 seconds, then letting the boot continue it still crashes at the same
> point so I think it's not a timing thing - maybe postgres has a disk access
> pattern that is triggering the bug?
> 
> Mmm, that's really interesting.
> 
> Could you try to disable request merging ? Just give option
> max_merge_size=0 in the tap2 disk description. Something like
> 'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'
> 

Just as suddenly, the problem went away and I can no longer reproduce the crash on startup. Very frustrating. Most likely it will still crash during heavy use, but that can take days.

I've just upgraded librbd to dumpling (from cuttlefish) on that one server and will see what it's doing by morning. I'll disable merging when I can reproduce it next.

Thanks

James
Sylvain Munaut Aug. 14, 2013, 3:03 p.m. UTC | #17
Hi Frederik,

>> A traceback would be great if you can get a core file. And possibly
>> compile tapdisk with debug symbols.
>
> I'm not quite sure what u mean, can u give some more information on how I do
> this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm not sure this
> is what u meant.

Yes, ./configure CFLAGS=-g LDFLAGS=-g  is a good start.

Then when it crashes, it will leave a 'core' file somewhere (not sure
where, maybe in / or in /tmp).
If it doesn't, you may have to enable it. While the process is running,
use this on the tapdisk PID:

http://superuser.com/questions/404239/setting-ulimit-on-a-running-process

Then once you have a core file, you can use gdb along with the tapdisk
executable to generate a meaningful backtrace of where the crash
happened:

See for example http://publib.boulder.ibm.com/httpserv/ihsdiag/get_backtrace.html
for how to do it.


> When halting the domU after the errors, I get the following in dom0 syslog:

It's not really unexpected. If tapdisk crashes the IO ring is going to
be left hanging and god knows what weird behaviour will happen ...


Cheers,

    Sylvain
James Harper Aug. 15, 2013, 7:20 a.m. UTC | #18
> >
> > Hi,
> >
> > > I just tested with tap2:aio and that worked (had an old image of the VM
> on
> > lvm still so just tested with that). Switching back to rbd and it crashes every
> > time, just as postgres is starting in the vm. Booting into single user mode,
> > waiting 30 seconds, then letting the boot continue it still crashes at the
> same
> > point so I think it's not a timing thing - maybe postgres has a disk access
> > pattern that is triggering the bug?
> >
> > Mmm, that's really interesting.
> >
> > Could you try to disable request merging ? Just give option
> > max_merge_size=0 in the tap2 disk description. Something like
> > 'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'
> >
> 
> Just as suddenly the problem went away and I can no longer reproduce the
> crash on startup. Very frustrating. Most likely it still crashed during heavy use
> but that can take days.
> 
> I've just upgraded librbd to dumpling (from cuttlefish) on that one server and
> will see what it's doing by morning. I'll disable merging when I can reproduce
> it next.
> 

I just had a crash since upgrading to dumpling, and will disable merging tonight.

James
James Harper Aug. 16, 2013, 1:02 a.m. UTC | #19
> 
> I just had a crash since upgrading to dumpling, and will disable merging
> tonight.
> 

Still crashes with merging disabled.

James
Frederik Thuysbaert Aug. 16, 2013, 8:27 a.m. UTC | #20
Hi Sylvain,

>> I'm not quite sure what u mean, can u give some more information on how I do
>> this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm not sure this
>> is what u meant.
>
> Yes, ./configure CFLAGS=-g LDFLAGS=-g  is a good start.
>
>...
>
> Then once you have a core file, you can use gdb along with the tapdisk
> executable to generate a meaningful backtrace of where the crash
>

I did 2 runs, with a cold reboot in between just to be sure. I don't
think I'm getting a lot of valuable information, but I will post it
anyway. The reason for the cold reboot was a 'Cannot access memory at
address ...' in gdb after the first frame; I thought it could help.

Here's what I got:

try 1:
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0
Cannot access memory at address 0x7fb42f081c38
(gdb) frame 0
#0  0x00007fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0
(gdb) list
77	}
78	
79	int
80	main(int argc, char *argv[])
81	{
82		char *control;
83		int c, err, nodaemon;
84		FILE *out;
85	
86		control  = NULL;
(gdb) info locals
No symbol table info available.

try 2:
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
Cannot access memory at address 0x7fe05c2ba518
(gdb) frame 0
#0  0x00007fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) list
77	}
78	
79	int
80	main(int argc, char *argv[])
81	{
82		char *control;
83		int c, err, nodaemon;
84		FILE *out;
85	
86		control  = NULL;
(gdb) info locals
No symbol table info available.

Regards,

- Frederik


Patch

diff --git a/drivers/block-rbd.c b/drivers/block-rbd.c
index 68fbed7..ab2d2c5 100644
--- a/drivers/block-rbd.c
+++ b/drivers/block-rbd.c
@@ -560,6 +560,9 @@  err:
        if (c)
                rbd_aio_release(c);

+       list_move(&req->queue, &prv->reqs_free);
+       prv->reqs_free_count++;
+
        return rv;
 }