
Fix freezing bug in curses console

Message ID gl7g9p$jpn$2@ger.gmane.org (mailing list archive)
State New, archived

Commit Message

Matthew Bloch Jan. 21, 2009, 3:51 p.m. UTC
Hi there,

We are running lots of kvm processes in screen and found that about 1 in
5 froze shortly after startup with a backtrace like this one:

#0  0xf7c7fcd9 in pthread_exit () from /lib/tls/libc.so.6
#1  0xf7cfbe62 in wresize () from /lib/libncurses.so.5
#2  0xf7cfb7ab in is_term_resized () from /lib/libncurses.so.5
#3  0xf7cfb877 in is_term_resized () from /lib/libncurses.so.5
#4  0xf7cfba31 in resize_term () from /lib/libncurses.so.5
#5  0x080d3dd9 in vga_init ()
#6  <signal handler called>
#7  0xf7c0da5b in free () from /lib/tls/libc.so.6
#8  0xf7c0effe in calloc () from /lib/tls/libc.so.6
#9  0xf7cf222e in newpad () from /lib/libncurses.so.5
#10 0x080d3549 in vga_init ()

We're just using the lenny version of kvm from 2008-12-16.

On casual inspection, the SIGWINCH signal handling looked ropey to me -
grandpa always told me not to do any real work in a signal handler, and
the backtrace suggested re-entrancy problems in curses, so I changed the
behaviour to set a flag and do the work in the main loop instead.  Maybe
I'm reading the backtrace wrong.

So far that means that when you resize the window, the display is
corrupt until the VM outputs some text, or the user hits a key.  But I
think it has solved the freezing / crashing bug too - would appreciate
any comments on my analysis or proposed solution.
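
[Editor's sketch] The "set a flag and do the work in the main loop" idea described above could look roughly like the following. This is a minimal standalone illustration, not the code from the patch: winch_pending, on_winch(), poll_winch() and resize_count are hypothetical names, and resize_count merely stands in for the real curses work (resize_term() and friends).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Written by the handler, read and cleared by the main loop.
 * volatile sig_atomic_t is the type the C standard guarantees can be
 * safely written from a signal handler. */
static volatile sig_atomic_t winch_pending = 0;

/* Hypothetical stand-in for the real resize work. */
static int resize_count = 0;

static void on_winch(int signum)
{
    (void)signum;
    winch_pending = 1;          /* the handler does nothing else */
}

static int install_winch_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_winch;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* sigaction() keeps the handler installed */
    return sigaction(SIGWINCH, &sa, NULL);
}

/* Called once per main-loop iteration.  Returns 1 if a resize was handled. */
static int poll_winch(void)
{
    if (!winch_pending)
        return 0;
    winch_pending = 0;          /* clear first, so a signal arriving during
                                   the work below is not lost */
    resize_count++;             /* real code would resize the terminal here */
    return 1;
}
```

Two choices here differ from the patch and are assumptions of this sketch: the flag is cleared before the work rather than after it, and sigaction() is used so the handler does not need to re-arm itself with signal().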

Comments

Anthony Liguori Feb. 27, 2009, 7:49 p.m. UTC | #1
Matthew Bloch wrote:
> Hi there,
>
> We are running lots of kvm processes in screen and found that about 1 in
> 5 froze shortly after startup with a backtrace like this one:
>
> #0  0xf7c7fcd9 in pthread_exit () from /lib/tls/libc.so.6
> #1  0xf7cfbe62 in wresize () from /lib/libncurses.so.5
> #2  0xf7cfb7ab in is_term_resized () from /lib/libncurses.so.5
> #3  0xf7cfb877 in is_term_resized () from /lib/libncurses.so.5
> #4  0xf7cfba31 in resize_term () from /lib/libncurses.so.5
> #5  0x080d3dd9 in vga_init ()
> #6  <signal handler called>
> #7  0xf7c0da5b in free () from /lib/tls/libc.so.6
> #8  0xf7c0effe in calloc () from /lib/tls/libc.so.6
> #9  0xf7cf222e in newpad () from /lib/libncurses.so.5
> #10 0x080d3549 in vga_init ()
>
> We're just using the lenny version of kvm from 2008-12-16.
>
> On casual inspection, the SIGWINCH signal handling looked ropey to me -
> grandpa always told me not to do any real work in a signal handler, and
> the backtrace suggested re-entrancy problems in curses, so I changed the
> behaviour to set a flag and do the work in the main loop instead.  Maybe
> I'm reading the backtrace wrong.
>
> So far that means that when you resize the window, the display is
> corrupt until the VM outputs some text, or the user hits a key.  But I
> think it has solved the freezing / crashing bug too - would appreciate
> any comments on my analysis or proposed solution.
>   

It's racy with select().  A better fix would be to create a pipe and 
write to that pipe in the SIGWINCH handler.  You should then register an 
io callback using qemu_set_fd_handler2() that does the actions for SIGWINCH.

Regards,

Anthony Liguori
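
[Editor's sketch] The pipe approach suggested above is commonly called the self-pipe trick. A plain flag can be set just after the main loop has checked it but before select() blocks, so the resize is not noticed until some other descriptor becomes ready. Writing a byte to a pipe that select() watches closes that window. The sketch below is a standalone illustration under that assumption. wake_fd, winch_pipe_init() and wait_for_winch() are hypothetical names, and in QEMU the read end would be registered with qemu_set_fd_handler2() as suggested rather than polled with a private select().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

static int wake_fd[2] = { -1, -1 };     /* [0] = read end, [1] = write end */

static void on_winch(int signum)
{
    int saved_errno = errno;            /* write() may clobber errno */
    char byte = 0;
    ssize_t r;

    (void)signum;
    r = write(wake_fd[1], &byte, 1);    /* write() is async-signal-safe */
    (void)r;                            /* full pipe: a wakeup is already queued */
    errno = saved_errno;
}

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

static int winch_pipe_init(void)
{
    struct sigaction sa;

    if (pipe(wake_fd) < 0)
        return -1;
    if (set_nonblocking(wake_fd[0]) < 0 || set_nonblocking(wake_fd[1]) < 0)
        return -1;
    sa.sa_handler = on_winch;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGWINCH, &sa, NULL);
}

/* Wait up to timeout_ms for a resize notification.
 * Returns 1 if one (or more) arrived, 0 otherwise. */
static int wait_for_winch(int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    char buf[64];
    int seen = 0;

    FD_ZERO(&rfds);
    FD_SET(wake_fd[0], &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (select(wake_fd[0] + 1, &rfds, NULL, NULL, &tv) <= 0)
        return 0;
    /* Drain everything: several signals collapse into one notification. */
    while (read(wake_fd[0], buf, sizeof buf) > 0)
        seen = 1;
    return seen;
}
```

Both ends are made non-blocking so that neither the handler nor the drain loop can stall the process.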

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
andrzej zaborowski Feb. 27, 2009, 9:01 p.m. UTC | #2
2009/2/27 Anthony Liguori <aliguori@us.ibm.com>:
> Matthew Bloch wrote:
>>
>> Hi there,
>>
>> We are running lots of kvm processes in screen and found that about 1 in
>> 5 froze shortly after startup with a backtrace like this one:
>>
>> #0  0xf7c7fcd9 in pthread_exit () from /lib/tls/libc.so.6
>> #1  0xf7cfbe62 in wresize () from /lib/libncurses.so.5
>> #2  0xf7cfb7ab in is_term_resized () from /lib/libncurses.so.5
>> #3  0xf7cfb877 in is_term_resized () from /lib/libncurses.so.5
>> #4  0xf7cfba31 in resize_term () from /lib/libncurses.so.5
>> #5  0x080d3dd9 in vga_init ()
>> #6  <signal handler called>
>> #7  0xf7c0da5b in free () from /lib/tls/libc.so.6
>> #8  0xf7c0effe in calloc () from /lib/tls/libc.so.6
>> #9  0xf7cf222e in newpad () from /lib/libncurses.so.5
>> #10 0x080d3549 in vga_init ()
>>
>> We're just using the lenny version of kvm from 2008-12-16.
>>
>> On casual inspection, the SIGWINCH signal handling looked ropey to me -
>> grandpa always told me not to do any real work in a signal handler, and
>> the backtrace suggested re-entrancy problems in curses, so I changed the
>> behaviour to set a flag and do the work in the main loop instead.  Maybe
>> I'm reading the backtrace wrong.
>>
>> So far that means that when you resize the window, the display is
>> corrupt until the VM outputs some text, or the user hits a key.  But I
>> think it has solved the freezing / crashing bug too - would appreciate
>> any comments on my analysis or proposed solution.
>>
>
> It's racy with select().  A better fix would be to create a pipe and write
> to that pipe in the SIGWINCH handler.  You should then register an io
> callback using qemu_set_fd_handler2() that does the actions for SIGWINCH.

Maybe a bottom half would work?  The scheduling of a bh shouldn't
constitute "real work".

Cheers
Anthony Liguori Feb. 27, 2009, 9:04 p.m. UTC | #3
andrzej zaborowski wrote:
> 2009/2/27 Anthony Liguori <aliguori@us.ibm.com>:
>   
>> Matthew Bloch wrote:
>>     
>>> Hi there,
>>>
>>> We are running lots of kvm processes in screen and found that about 1 in
>>> 5 froze shortly after startup with a backtrace like this one:
>>>
>>> #0  0xf7c7fcd9 in pthread_exit () from /lib/tls/libc.so.6
>>> #1  0xf7cfbe62 in wresize () from /lib/libncurses.so.5
>>> #2  0xf7cfb7ab in is_term_resized () from /lib/libncurses.so.5
>>> #3  0xf7cfb877 in is_term_resized () from /lib/libncurses.so.5
>>> #4  0xf7cfba31 in resize_term () from /lib/libncurses.so.5
>>> #5  0x080d3dd9 in vga_init ()
>>> #6  <signal handler called>
>>> #7  0xf7c0da5b in free () from /lib/tls/libc.so.6
>>> #8  0xf7c0effe in calloc () from /lib/tls/libc.so.6
>>> #9  0xf7cf222e in newpad () from /lib/libncurses.so.5
>>> #10 0x080d3549 in vga_init ()
>>>
>>> We're just using the lenny version of kvm from 2008-12-16.
>>>
>>> On casual inspection, the SIGWINCH signal handling looked ropey to me -
>>> grandpa always told me not to do any real work in a signal handler, and
>>> the backtrace suggested re-entrancy problems in curses, so I changed the
>>> behaviour to set a flag and do the work in the main loop instead.  Maybe
>>> I'm reading the backtrace wrong.
>>>
>>> So far that means that when you resize the window, the display is
>>> corrupt until the VM outputs some text, or the user hits a key.  But I
>>> think it has solved the freezing / crashing bug too - would appreciate
>>> any comments on my analysis or proposed solution.
>>>
>>>       
>> It's racy with select().  A better fix would be to create a pipe and write
>> to that pipe in the SIGWINCH handler.  You should then register an io
>> callback using qemu_set_fd_handler2() that does the actions for SIGWINCH.
>>     
>
> Maybe a bottom half would work?  The scheduling of a bh shouldn't
> constitute "real work".
>   

I think it still suffers from the same race condition so today it 
wouldn't work.  You could fix the bottom half scheduling though so that 
you could safely schedule a bottom half from a signal handler (using 
roughly the same trick).

Regards,

Anthony Liguori

> Cheers

Jamie Lokier Feb. 28, 2009, 9:21 p.m. UTC | #4
Anthony Liguori wrote:
> >>It's racy with select().  A better fix would be to create a pipe and write
> >>to that pipe in the SIGWINCH handler.  You should then register an io
> >>    
> >
> >Maybe a bottom half would work?  The scheduling of a bh shouldn't
> >constitute "real work".
> 
> I think it still suffers from the same race condition so today it 
> wouldn't work.  You could fix the bottom half scheduling though so that 
> you could safely schedule a bottom half from a signal handler (using 
> roughly the same trick).

Fwiw, it's perfectly sensible to have a single pipe which is shared by
all signal handlers, just used to say "check for work flags set".

-- Jamie
Daniel P. Berrangé March 1, 2009, 11:36 a.m. UTC | #5
On Sat, Feb 28, 2009 at 09:21:16PM +0000, Jamie Lokier wrote:
> Anthony Liguori wrote:
> > >>It's racy with select().  A better fix would be to create a pipe and write
> > >>to that pipe in the SIGWINCH handler.  You should then register an io
> > >>    
> > >
> > >Maybe a bottom half would work?  The scheduling of a bh shouldn't
> > >constitute "real work".
> > 
> > I think it still suffers from the same race condition so today it 
> > wouldn't work.  You could fix the bottom half scheduling though so that 
> > you could safely schedule a bottom half from a signal handler (using 
> > roughly the same trick).
> 
> Fwiw, it's perfectly sensible to have a single pipe which is shared by
> all signal handlers, just used to say "check for work flags set".

And if you need the main loop to be able to distinguish signals coming
out of the pipe, then just write the signum into the pipe as a byte,
instead of a single dummy byte. Or even write the whole 'siginfo_t'
struct passed to the signal handler, and read it out in sizeof(siginfo_t)
sized chunks for processing. 

Daniel
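
[Editor's sketch] Writing the signal number into the pipe, as described above, might look like this. It is a standalone illustration only. sig_fd, forward_signal(), sig_pipe_init() and next_signal() are hypothetical names, and the sketch assumes every signal number of interest fits in one byte. Because the write end is non-blocking, a full pipe silently drops the number, which is the limitation raised in the next reply.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static int sig_fd[2] = { -1, -1 };      /* [0] = read end, [1] = write end */

/* Shared handler: forward the signal number to the main loop as one byte. */
static void forward_signal(int signum)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)signum;
    ssize_t r = write(sig_fd[1], &byte, 1);

    (void)r;                            /* dropped if the pipe is full */
    errno = saved_errno;
}

/* Create the pipe and install the shared handler for each listed signal. */
static int sig_pipe_init(const int *signals, int count)
{
    struct sigaction sa;
    int i, fl;

    if (pipe(sig_fd) < 0)
        return -1;
    for (i = 0; i < 2; i++) {
        fl = fcntl(sig_fd[i], F_GETFL);
        if (fl < 0 || fcntl(sig_fd[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    sa.sa_handler = forward_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    for (i = 0; i < count; i++)
        if (sigaction(signals[i], &sa, NULL) < 0)
            return -1;
    return 0;
}

/* Main-loop side: returns the next queued signal number, or 0 if none. */
static int next_signal(void)
{
    unsigned char byte;

    return read(sig_fd[0], &byte, 1) == 1 ? byte : 0;
}
```

Single-byte writes to a pipe are atomic, so numbers from different handlers cannot interleave. Forwarding a whole siginfo_t would need the same care about partial writes and reads.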
Paul Brook March 1, 2009, 1:03 p.m. UTC | #6
> > > I think it still suffers from the same race condition so today it
> > > wouldn't work.  You could fix the bottom half scheduling though so that
> > > you could safely schedule a bottom half from a signal handler (using
> > > roughly the same trick).
> >
> > Fwiw, it's perfectly sensible to have a single pipe which is shared by
> > all signal handlers, just used to say "check for work flags set".
>
> And if you need the main loop to be able to distinguish signals coming
> out of the pipe, then just write the signum into the pipe as a byte,
> instead of a single dummy byte. Or even write the whole 'siginfo_t'
> struct passed to the signal handler, and read it out in sizeof(siginfo_t)
> sized chunks for processing.

I don't think this will work. If the pipe buffer gets full the write will 
either block or you'll lose signals.

When using the pipe as a simple semaphore all you care about is the presence 
or absence of data. It doesn't matter if subsequent writes lose data (e.g. 
by not retrying a nonblocking write) as long as a write to an empty pipe 
succeeds.

Paul
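
[Editor's sketch] The "pipe as a simple semaphore" idea above can be combined with one flag per signal, so that the pipe only says "something happened" and the flags say what. The sketch below is an illustration under those assumptions. pending[], MAX_SIGNUM, note_signal(), wake_init() and collect_pending() are hypothetical names, and MAX_SIGNUM = 64 is an assumed upper bound on the signal numbers of interest.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

#define MAX_SIGNUM 64                   /* assumption: highest signal handled */

static int wake_fd[2] = { -1, -1 };
static volatile sig_atomic_t pending[MAX_SIGNUM + 1];

static void note_signal(int signum)
{
    int saved_errno = errno;
    char byte = 0;
    ssize_t r;

    if (signum > 0 && signum <= MAX_SIGNUM)
        pending[signum] = 1;            /* the flag records WHICH signal */
    r = write(wake_fd[1], &byte, 1);    /* the pipe only records THAT one fired */
    (void)r;                            /* EAGAIN on a full pipe is harmless */
    errno = saved_errno;
}

static int wake_init(const int *signals, int count)
{
    struct sigaction sa;
    int i, fl;

    if (pipe(wake_fd) < 0)
        return -1;
    for (i = 0; i < 2; i++) {
        fl = fcntl(wake_fd[i], F_GETFL);
        if (fl < 0 || fcntl(wake_fd[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    for (i = 0; i < count; i++)
        if (sigaction(signals[i], &sa, NULL) < 0)
            return -1;
    return 0;
}

/* Drain the pipe, then report and clear every flagged signal.
 * out must have room for MAX_SIGNUM entries.  Returns the number stored. */
static int collect_pending(int *out)
{
    char buf[4096];
    int s, n = 0;

    while (read(wake_fd[0], buf, sizeof buf) > 0)
        ;                               /* drain before looking at the flags */
    for (s = 1; s <= MAX_SIGNUM; s++) {
        if (pending[s]) {
            pending[s] = 0;
            out[n++] = s;
        }
    }
    return n;
}
```

Draining before scanning matters: a signal that arrives after the scan leaves a fresh byte in the pipe, so the main loop wakes up again instead of missing it.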
Anthony Liguori March 1, 2009, 2:07 p.m. UTC | #7
Paul Brook wrote:
>>>> I think it still suffers from the same race condition so today it
>>>> wouldn't work.  You could fix the bottom half scheduling though so that
>>>> you could safely schedule a bottom half from a signal handler (using
>>>> roughly the same trick).
>>>>         
>>> Fwiw, it's perfectly sensible to have a single pipe which is shared by
>>> all signal handlers, just used to say "check for work flags set".
>>>       
>> And if you need the main loop to be able to distinguish signals coming
>> out of the pipe, then just write the signum into the pipe as a byte,
>> instead of a single dummy byte. Or even write the whole 'siginfo_t'
>> struct passed to the signal handler, and read it out in sizeof(siginfo_t)
>> sized chunks for processing.
>>     
>
> I don't think this will work. If the pipe buffer gets full the write will 
> either block or you'll lose signals.
>
> When using the pipe as a simple semaphore all you care about is the presence 
> or absence of data. It doesn't matter if subsequent writes lose data (e.g. 
> by not retrying a nonblocking write) as long as a write to an empty pipe 
> succeeds.
>   

Yup.  You need to use a global flag to distinguish the type of signal.

Regards,

Anthony Liguori

> Paul

Jamie Lokier March 2, 2009, 4:57 p.m. UTC | #8
Anthony Liguori wrote:
> >When using the pipe as a simple semaphore all you care about is the 
> >presence or absence of data. It doesn't matter if subsequent writes lose 
> >data (e.g. by not retrying a nonblocking write) as long as a write to an 
> >empty pipe succeeds.
> 
> Yup.  You need to use a global flag to distinguish the type of signal.

If you have a set of BHs which can be scheduled from a signal handler,
set a flag in the BH when it's scheduled, prior to the non-blocking
pipe write.  The select-pipe reader can then look at all eligible BHs
looking for ones with the flag set.

If you can enqueue them in the signal handler that's even better, but
obviously beware of race conditions.

Don't forget to completely drain the pipe when reading.

Maybe use an eventfd instead of a pipe if you have eventfd. :-)

If the signal handler might be run in different threads, you'll need
to take care of memory ordering.  The flag must be set before writing
to the pipe, as observed by the pipe-reading thread.

-- Jamie
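
[Editor's sketch] The eventfd variant mentioned above, with the flag set before the wakeup is written, might look like the following on Linux. This is an illustration only. wake_efd, winch_flag, winch_eventfd_init() and take_winch() are hypothetical names. It assumes a kernel and libc that provide eventfd() with EFD_NONBLOCK, and a C11 compiler where atomic_int is lock-free and therefore usable from a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

static int wake_efd = -1;
static atomic_int winch_flag;           /* zero-initialised: nothing pending */

static void on_winch(int signum)
{
    int saved_errno = errno;
    uint64_t one = 1;
    ssize_t r;

    (void)signum;
    atomic_store(&winch_flag, 1);           /* flag first ...             */
    r = write(wake_efd, &one, sizeof one);  /* ... then wake the reader   */
    (void)r;
    errno = saved_errno;
}

static int winch_eventfd_init(void)
{
    struct sigaction sa;

    wake_efd = eventfd(0, EFD_NONBLOCK);
    if (wake_efd < 0)
        return -1;
    sa.sa_handler = on_winch;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGWINCH, &sa, NULL);
}

/* Called when the eventfd is readable, or speculatively.
 * Returns 1 if a resize is pending and clears it, 0 otherwise. */
static int take_winch(void)
{
    uint64_t count;
    ssize_t r = read(wake_efd, &count, sizeof count);

    (void)r;                            /* one read resets the counter to 0 */
    return atomic_exchange(&winch_flag, 0);
}
```

Compared with a pipe, a single read() fully drains an eventfd, and the sequentially consistent atomics used here are one way to get the ordering that matters when the handler and the reader run in different threads.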

Patch

Index: curses.c
===================================================================
--- curses.c    (revision 6374)
+++ curses.c    (working copy)
@@ -41,6 +41,7 @@ 
 #define FONT_HEIGHT 16
 #define FONT_WIDTH 8

+static int winch_flag = 0;
 static console_ch_t screen[160 * 100];
 static WINDOW *screenpad = NULL;
 static int width, height, gwidth, gheight, invalidate;
@@ -110,7 +111,7 @@ 

 #ifndef _WIN32
 #if defined(SIGWINCH) && defined(KEY_RESIZE)
-static void curses_winch_handler(int signum)
+static void curses_winch_handler_real(void)
 {
     struct winsize {
         unsigned short ws_row;
@@ -126,7 +127,13 @@ 
     resize_term(ws.ws_row, ws.ws_col);
     curses_calc_pad();
     invalidate = 1;
+    winch_flag = 0;
+}

+static void curses_winch_handler(int sig)
+{
+    winch_flag = 1;
+
     /* some systems require this */
     signal(SIGWINCH, curses_winch_handler);
 }
@@ -179,6 +186,12 @@ 
     nextchr = ERR;
     while (1) {
+
+#if !defined(_WIN32) && defined(SIGWINCH) && defined(KEY_RESIZE)
+        if (winch_flag)
+            curses_winch_handler_real();
+#endif
+
         /* while there are any pending key strokes to process */
         if (nextchr == ERR)
             chr = getch();