General protection fault on rmmod cx8800

Message ID 20090302200513.7fc3568e@hyperion.delvare (mailing list archive)
State Accepted

Commit Message

Jean Delvare March 2, 2009, 7:05 p.m. UTC
On Mon, 2 Mar 2009 17:03:49 +0100, Jean Delvare wrote:
> As far as I can see the key difference between bttv-input and
> cx88-input is that bttv-input only uses a simple self-rearming timer,
> while cx88-input uses a timer and a separate workqueue. The timer runs
> the workqueue, which rearms the timer, etc. When you flush the timer,
> the separate workqueue can be still active. I presume this is what
> happens on my system. I guess the reason for the separate workqueue is
> that the processing may take some time and we don't want to hurt the
> system's performance?
> 
> So we need to flush both the event workqueue (with
> flush_scheduled_work) and the separate workqueue (with
> flush_workqueue), at the same time, otherwise the active one may rearm
> the flushed one again. This looks tricky, as obviously we can't flush
> both at the exact same time. Alternatively, if we could get rid of one
> of the queues, we'd have only one that needs flushing, this would be a
> lot easier...

Switching to delayed_work seems to do the trick (note this is a 2.6.28
patch):

---
 drivers/media/video/cx88/cx88-input.c |   26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)


At least I didn't have any general protection fault with this patch
applied. Comments?

Thanks,

Comments

Trent Piepho March 2, 2009, 9:12 p.m. UTC | #1
On Mon, 2 Mar 2009, Jean Delvare wrote:
> On Mon, 2 Mar 2009 17:03:49 +0100, Jean Delvare wrote:
> > As far as I can see the key difference between bttv-input and
> > cx88-input is that bttv-input only uses a simple self-rearming timer,
> > while cx88-input uses a timer and a separate workqueue. The timer runs
> > the workqueue, which rearms the timer, etc. When you flush the timer,
> > the separate workqueue can be still active. I presume this is what
> > happens on my system. I guess the reason for the separate workqueue is
> > that the processing may take some time and we don't want to hurt the
> > system's performance?
> >
> > So we need to flush both the event workqueue (with
> > flush_scheduled_work) and the separate workqueue (with
> > flush_workqueue), at the same time, otherwise the active one may rearm

What are the two work queues you are talking about?  I don't see any actual
work queues created.  Just one work function that is scheduled on the
system work queue.  The timer is a softirq and doesn't run on a work queue.

> Switching to delayed_work seems to do the trick (note this is a 2.6.28
> patch):

Makes the most sense to me.  I was just about to make a patch to do the
same thing when I got your email.  Though I was going to patch the v4l-dvb
sources to avoid porting work.
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jean Delvare March 2, 2009, 9:52 p.m. UTC | #2
Hi Trent,

On Mon, 2 Mar 2009 13:12:24 -0800 (PST), Trent Piepho wrote:
> On Mon, 2 Mar 2009, Jean Delvare wrote:
> > On Mon, 2 Mar 2009 17:03:49 +0100, Jean Delvare wrote:
> > > As far as I can see the key difference between bttv-input and
> > > cx88-input is that bttv-input only uses a simple self-rearming timer,
> > > while cx88-input uses a timer and a separate workqueue. The timer runs
> > > the workqueue, which rearms the timer, etc. When you flush the timer,
> > > the separate workqueue can be still active. I presume this is what
> > > happens on my system. I guess the reason for the separate workqueue is
> > > that the processing may take some time and we don't want to hurt the
> > > system's performance?
> > >
> > > So we need to flush both the event workqueue (with
> > > flush_scheduled_work) and the separate workqueue (with
> > > flush_workqueue), at the same time, otherwise the active one may rearm
> 
> What are the two work queues you are talking about?  I don't see any actual
> work queues created.  Just one work function that is scheduled on the
> system work queue.  The timer is a softirq and doesn't run on a work queue.

Sorry, I misread the code. There's only one work queue involved (the
system one). Reading the timer code again now, I admit I am curious how
I managed to misread it to that degree...

The key point remains though: we'd need to delete the timer and flush
the system workqueue at the exact same time, which is not possible, or
to add some sort of signaling between the work and the timer. Or use
delayed_work.

> > Switching to delayed_work seems to do the trick (note this is a 2.6.28
> > patch):
> 
> Makes the most sense to me.  I was just about to make a patch to do the
> same thing when I got your email.  Though I was going to patch the v4l-dvb
> sources to avoid porting work.

It was easier for me to test on an upstream kernel. The porting should
be fairly easy, I can take care of it. The difficult part will be to
handle the compatibility with kernels < 2.6.20 because delayed_work was
introduced in 2.6.20. Probably "compatibility" here will simply mean
that the bug I've hit will only be fixed for kernels >= 2.6.20. Which
once again raises the question of whether we really want to keep
supporting these old kernels.
Andy Walls March 2, 2009, 10:33 p.m. UTC | #3
On Mon, 2009-03-02 at 20:05 +0100, Jean Delvare wrote:
> On Mon, 2 Mar 2009 17:03:49 +0100, Jean Delvare wrote:

From your original oops, the fault is in cx88_ir_work() but
cx88_ir_handle_key() gets inlined, so you'll need to look at both
functions:

   0:	56                   	push   %rsi
   1:	41 55                	push   %r13
   3:	41 54                	push   %r12
   5:	53                   	push   %rbx
   6:	48 83 ec 08          	sub    $0x8,%rsp
   a:	49 89 fc             	mov    %rdi,%r12
   d:	4c 8d af 40 fd ff ff 	lea    -0x2c0(%rdi),%r13   struct cx88_IR *ir = container_of(work, struct cx88_IR, work);
  14:	4c 8b b7 40 fd ff ff 	mov    -0x2c0(%rdi),%r14   struct cx88_core *core = ir->core;
  1b:	41 8b 85 48 03 00 00 	mov    0x348(%r13),%eax    fetch ir->gpio_addr [for gpio = cx_read(ir->gpio_addr);]
  22:	c1 e8 02             	shr    $0x2,%eax           ((reg)>>2)  [from cx_read() macro]
  25:	89 c0                	mov    %eax,%eax           [zero-extend the u32 index into %rax]
  27:	48 c1 e0 02          	shl    $0x2,%rax           [scale by sizeof(u32): core->lmmio is a u32 pointer]
  2b:	49 03 46 40          	add    0x40(%r14),%rax   core->lmmio + ...  <--- Oops is here  
  2f:	8b 18                	mov    (%rax),%ebx       readl(core->lmmio + ((reg)>>2)) [cx_read() definition]
  31:	41 8b 86 10 0a 00 00 	mov    0xa10(%r14),%eax  switch (core->boardnr) {
  38:	83 f8 23             	cmp    $0x23,%eax        case CX88_BOARD_WINFAST_DTV1000:

"core" is invalid as %r14 holds junk:

R14: 2f4065766f6d6572

A valid kernel address would start with 0xffff.....  In fact, %r14's value is
ASCII text: read as little-endian bytes, it spells "remove@/".

It's safe to assume that the work handler was called when the per device
instance of cx88_core was gone.



> > As far as I can see the key difference between bttv-input and
> > cx88-input is that bttv-input only uses a simple self-rearming timer,
> > while cx88-input uses a timer and a separate workqueue. The timer runs
> > the workqueue, which rearms the timer, etc. When you flush the timer,
> > the separate workqueue can be still active. I presume this is what
> > happens on my system. I guess the reason for the separate workqueue is
> > that the processing may take some time and we don't want to hurt the
> > system's performance?
 
> > So we need to flush both the event workqueue (with
> > flush_scheduled_work) and the separate workqueue (with
> > flush_workqueue), at the same time, otherwise the active one may rearm
> > the flushed one again. This looks tricky, as obviously we can't flush
> > both at the exact same time. Alternatively, if we could get rid of one
> > of the queues, we'd have only one that needs flushing, this would be a
> > lot easier...
> 
> Switching to delayed_work seems to do the trick (note this is a 2.6.28
> patch):
> 
> ---
>  drivers/media/video/cx88/cx88-input.c |   26 ++++++++------------------
>  1 file changed, 8 insertions(+), 18 deletions(-)
> 
> --- linux-2.6.28.orig/drivers/media/video/cx88/cx88-input.c	2009-03-02 19:11:24.000000000 +0100
> +++ linux-2.6.28/drivers/media/video/cx88/cx88-input.c	2009-03-02 19:49:31.000000000 +0100
> @@ -48,8 +48,7 @@ struct cx88_IR {
>  
>  	/* poll external decoder */
>  	int polling;
> -	struct work_struct work;
> -	struct timer_list timer;
> +	struct delayed_work work;
>  	u32 gpio_addr;
>  	u32 last_gpio;
>  	u32 mask_keycode;
> @@ -143,27 +142,20 @@ static void cx88_ir_handle_key(struct cx
>  	}
>  }
>  
> -static void ir_timer(unsigned long data)
> -{
> -	struct cx88_IR *ir = (struct cx88_IR *)data;
> -
> -	schedule_work(&ir->work);
> -}
> -
>  static void cx88_ir_work(struct work_struct *work)
>  {
> -	struct cx88_IR *ir = container_of(work, struct cx88_IR, work);
> +	struct delayed_work *dwork = container_of(work, struct delayed_work, work);
> +	struct cx88_IR *ir = container_of(dwork, struct cx88_IR, work);
>  
>  	cx88_ir_handle_key(ir);
> -	mod_timer(&ir->timer, jiffies + msecs_to_jiffies(ir->polling));
> +	schedule_delayed_work(dwork, msecs_to_jiffies(ir->polling));
>  }
>  
>  void cx88_ir_start(struct cx88_core *core, struct cx88_IR *ir)
>  {
>  	if (ir->polling) {
> -		setup_timer(&ir->timer, ir_timer, (unsigned long)ir);
> -		INIT_WORK(&ir->work, cx88_ir_work);
> -		schedule_work(&ir->work);
> +		INIT_DELAYED_WORK(&ir->work, cx88_ir_work);
> +		schedule_delayed_work(&ir->work, msecs_to_jiffies(ir->polling));
>  	}
>  	if (ir->sampling) {
>  		core->pci_irqmask |= PCI_INT_IR_SMPINT;
> @@ -179,10 +171,8 @@ void cx88_ir_stop(struct cx88_core *core
>  		core->pci_irqmask &= ~PCI_INT_IR_SMPINT;
>  	}
>  
> -	if (ir->polling) {
> -		del_timer_sync(&ir->timer);
> -		flush_scheduled_work();
> -	}
> +	if (ir->polling)
> +		cancel_delayed_work_sync(&ir->work);
>  }
>  
>  /* ---------------------------------------------------------------------- */
> 
> At least I didn't have any general protection fault with this patch
> applied. Comments?

Jean,

Reviewed-by: Andy Walls <awalls@radix.net>

I've done some research and this looks good.

1. It's a cleaner way to use the kernel event threads to perform a
periodic action.

2. No races with stopping the work, as cancel_delayed_work_sync()
reliably disarms even self-firing work handler functions like the one
here.  It even appears to make sure it is cancelled from every CPU for a
multithreaded handler, AFAICT.  (No flag variable is needed to signal
work is stopping to the work handler AFAICT.)



Not to start a flame war on supporting older kernels, but I must mention

3. Canceling of work is only supported on more recent kernels.

4. I'm not willing to waste brain cells on how to avoid work canceling
races for older kernels. :)

> Thanks,

Regards,
Andy

Trent Piepho March 3, 2009, 9:40 a.m. UTC | #4
On Mon, 2 Mar 2009, Jean Delvare wrote:
> > Makes the most sense to me.  I was just about to make a patch to do the
> > same thing when I got your email.  Though I was going to patch the v4l-dvb
> > sources to avoid porting work.
>
> It was easier for me to test on an upstream kernel. The porting should
> be fairly easy, I can take care of it. The difficult part will be to
> handle the compatibility with kernels < 2.6.20 because delayed_work was
> introduced in 2.6.20. Probably "compatibility" here will simply mean
> that the bug I've hit will only be fixed for kernels >= 2.6.20. Which
> once again raises the question of whether we really want to keep
> supporting these old kernels.

cancel_delayed_work_sync() was renamed from cancel_rearming_delayed_work()
in 2.6.23.  A compat.h patch can handle that one.

In 2.6.22, cancel_delayed_work_sync(work) was created from
cancel_rearming_delayed_workqueue(wq, work).  The kernel has a compat
function to turn cancel_rearming_delayed_workqueue() into the
cancel_delayed_work_sync() call.  cancel_rearming_delayed_workqueue() has
been around since 2.6.13.  Apparently it was un-exported for a while
because it had no users, see commit v2.6.12-rc2-8-g81ddef7.  Isn't it nice
that there's no commit message other than "export
cancel_rearming_delayed_workqueue"?  Let me again express my dislike for
commits with no description.

In 2.6.20 delayed_work was split from work_struct.  The concept of delayed
work was already there and schedule_delayed_work() hasn't changed.  I think
this can also be handled with a compat.h change that defines delayed_work
to work_struct.  That will only be a problem on pre 2.6.20 kernels if some
code decides to define identifiers named work_struct and delayed_work in
the same scope.  There are currently no identifiers named delayed_work in
any driver and one driver (sq905) has a structure member named
work_struct.  So I think it'll be ok.
Trent Piepho March 3, 2009, 9:49 a.m. UTC | #5
On Tue, 3 Mar 2009, Trent Piepho wrote:
> On Mon, 2 Mar 2009, Jean Delvare wrote:
> > be fairly easy, I can take care of it. The difficult part will be to
> > handle the compatibility with kernels < 2.6.20 because delayed_work was
> > introduced in 2.6.20. Probably "compatibility" here will simply mean
> > that the bug I've hit will only be fixed for kernels >= 2.6.20. Which
> > once again raises the question of whether we really want to keep
> > supporting these old kernels.
>
> cancel_delayed_work_sync() was renamed from cancel_rearming_delayed_work()
> in 2.6.23.  A compat.h patch can handle that one.

compat.h has code to handle it from a year ago.

> In 2.6.22, cancel_delayed_work_sync(work) was created from
> cancel_rearming_delayed_workqueue(wq, work).  The kernel has a compat

But cancel_rearming_delayed_work() already existed, the real change was
that cancel_rearming_delayed_workqueue() was made obsolete by making
cancel_rearming_delayed_work() not care what workqueue the work was in.
Jean Delvare March 3, 2009, 12:16 p.m. UTC | #6
On Tue, 3 Mar 2009 01:40:00 -0800 (PST), Trent Piepho wrote:
> On Mon, 2 Mar 2009, Jean Delvare wrote:
> > > Makes the most sense to me.  I was just about to make a patch to do the
> > > same thing when I got your email.  Though I was going to patch the v4l-dvb
> > > sources to avoid porting work.
> >
> > It was easier for me to test on an upstream kernel. The porting should
> > be fairly easy, I can take care of it. The difficult part will be to
> > handle the compatibility with kernels < 2.6.20 because delayed_work was
> > introduced in 2.6.20. Probably "compatibility" here will simply mean
> > that the bug I've hit will only be fixed for kernels >= 2.6.20. Which
> > once again raises the question of whether we really want to keep
> > supporting these old kernels.
> 
> cancel_delayed_work_sync() was renamed from cancel_rearming_delayed_work()
> in 2.6.23.  A compat.h patch can handle that one.
> 
> In 2.6.22, cancel_delayed_work_sync(work) was created from
> cancel_rearming_delayed_workqueue(wq, work).  The kernel has a compat
> function to turn cancel_rearming_delayed_workqueue() into the
> cancel_delayed_work_sync() call.  cancel_rearming_delayed_workqueue() has
> been around since 2.6.13.  Apparently it was un-exported for a while
> because it had no users, see commit v2.6.12-rc2-8-g81ddef7.  Isn't it nice
> that there's no commit message other than "export
> cancel_rearming_delayed_workqueue"?  Let me again express my dislike for
> commits with no description.
> 
> In 2.6.20 delayed_work was split from work_struct.  The concept of delayed
> work was already there and schedule_delayed_work() hasn't changed.  I think
> this can also be handled with a compat.h change that defines delayed_work
> to work_struct.  That will only be a problem on pre 2.6.20 kernels if some
> code decides to define identifiers named work_struct and delayed_work in
> the same scope.  There are currently no identifiers named delayed_work in
> any driver and one driver (sq905) has a structure member named
> work_struct.  So I think it'll be ok.

Wow, I didn't expect that many different compatibility issues. This
goes beyond the time I am ready to spend on it, I'm afraid.
Trent Piepho March 3, 2009, 8:14 p.m. UTC | #7
On Tue, 3 Mar 2009, Jean Delvare wrote:
> On Tue, 3 Mar 2009 01:40:00 -0800 (PST), Trent Piepho wrote:
> > On Mon, 2 Mar 2009, Jean Delvare wrote:
> > In 2.6.20 delayed_work was split from work_struct.  The concept of delayed
> > work was already there and schedule_delayed_work() hasn't changed.  I think
> > this can also be handled with a compat.h change that defines delayed_work
> > to work_struct.  That will only be a problem on pre 2.6.20 kernels if some
> > code decides to define identifiers named work_struct and delayed_work in
> > the same scope.  There are currently no identifiers named delayed_work in
> > any driver and one driver (sq905) has a structure member named
> > work_struct.  So I think it'll be ok.
>
> Wow, I didn't expect that many different compatibility issues. This
> goes beyond the time I am ready to spend on it, I'm afraid.

I already have a patch for compat.h that handles the last remaining issue.
You don't have to do anything.
Jean Delvare March 3, 2009, 9:33 p.m. UTC | #8
On Tue, 3 Mar 2009 12:14:06 -0800 (PST), Trent Piepho wrote:
> On Tue, 3 Mar 2009, Jean Delvare wrote:
> > On Tue, 3 Mar 2009 01:40:00 -0800 (PST), Trent Piepho wrote:
> > > On Mon, 2 Mar 2009, Jean Delvare wrote:
> > > In 2.6.20 delayed_work was split from work_struct.  The concept of delayed
> > > work was already there and schedule_delayed_work() hasn't changed.  I think
> > > this can also be handled with a compat.h change that defines delayed_work
> > > to work_struct.  That will only be a problem on pre 2.6.20 kernels if some
> > > code decides to define identifiers named work_struct and delayed_work in
> > > the same scope.  There are currently no identifiers named delayed_work in
> > > any driver and one driver (sq905) has a structure member named
> > > work_struct.  So I think it'll be ok.
> >
> > Wow, I didn't expect that many different compatibility issues. This
> > goes beyond the time I am ready to spend on it, I'm afraid.
> 
> I already have a patch for compat.h that handles the last remaining issue.
> You don't have to do anything.

Ah, very nice then. Please push it to the v4l-dvb repository so that I
can send my own patch. I will also fix the 3 other drivers that have
the same bug (ir-kbd-i2c, saa6588 and em28xx-input).

Patch

--- linux-2.6.28.orig/drivers/media/video/cx88/cx88-input.c	2009-03-02 19:11:24.000000000 +0100
+++ linux-2.6.28/drivers/media/video/cx88/cx88-input.c	2009-03-02 19:49:31.000000000 +0100
@@ -48,8 +48,7 @@ struct cx88_IR {
 
 	/* poll external decoder */
 	int polling;
-	struct work_struct work;
-	struct timer_list timer;
+	struct delayed_work work;
 	u32 gpio_addr;
 	u32 last_gpio;
 	u32 mask_keycode;
@@ -143,27 +142,20 @@ static void cx88_ir_handle_key(struct cx
 	}
 }
 
-static void ir_timer(unsigned long data)
-{
-	struct cx88_IR *ir = (struct cx88_IR *)data;
-
-	schedule_work(&ir->work);
-}
-
 static void cx88_ir_work(struct work_struct *work)
 {
-	struct cx88_IR *ir = container_of(work, struct cx88_IR, work);
+	struct delayed_work *dwork = container_of(work, struct delayed_work, work);
+	struct cx88_IR *ir = container_of(dwork, struct cx88_IR, work);
 
 	cx88_ir_handle_key(ir);
-	mod_timer(&ir->timer, jiffies + msecs_to_jiffies(ir->polling));
+	schedule_delayed_work(dwork, msecs_to_jiffies(ir->polling));
 }
 
 void cx88_ir_start(struct cx88_core *core, struct cx88_IR *ir)
 {
 	if (ir->polling) {
-		setup_timer(&ir->timer, ir_timer, (unsigned long)ir);
-		INIT_WORK(&ir->work, cx88_ir_work);
-		schedule_work(&ir->work);
+		INIT_DELAYED_WORK(&ir->work, cx88_ir_work);
+		schedule_delayed_work(&ir->work, msecs_to_jiffies(ir->polling));
 	}
 	if (ir->sampling) {
 		core->pci_irqmask |= PCI_INT_IR_SMPINT;
@@ -179,10 +171,8 @@ void cx88_ir_stop(struct cx88_core *core
 		core->pci_irqmask &= ~PCI_INT_IR_SMPINT;
 	}
 
-	if (ir->polling) {
-		del_timer_sync(&ir->timer);
-		flush_scheduled_work();
-	}
+	if (ir->polling)
+		cancel_delayed_work_sync(&ir->work);
 }
 
 /* ---------------------------------------------------------------------- */