Add support for sq905 based cameras to gspca

Message ID	20090203103925.25703074@free.fr (mailing list archive)
State	Superseded
Headers	show Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by demeter.kernel.org (8.14.2/8.14.2) with ESMTP id n139ih2p028410 for <patchwork-media@patchwork.kernel.org>; Tue, 3 Feb 2009 09:44:44 GMT Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751358AbZBCJon (ORCPT <rfc822;patchwork-media@patchwork.kernel.org>); Tue, 3 Feb 2009 04:44:43 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751423AbZBCJon (ORCPT <rfc822;linux-media-outgoing>); Tue, 3 Feb 2009 04:44:43 -0500 Received: from smtp4-g21.free.fr ([212.27.42.4]:49855 "EHLO smtp4-g21.free.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751358AbZBCJom (ORCPT <rfc822;linux-media@vger.kernel.org>); Tue, 3 Feb 2009 04:44:42 -0500 Received: from smtp4-g21.free.fr (localhost [127.0.0.1]) by smtp4-g21.free.fr (Postfix) with ESMTP id 95AB84C81AF; Tue, 3 Feb 2009 10:44:33 +0100 (CET) Received: from localhost (lns-bzn-22-82-249-116-124.adsl.proxad.net [82.249.116.124]) by smtp4-g21.free.fr (Postfix) with ESMTP id BA3034C81DB; Tue, 3 Feb 2009 10:44:29 +0100 (CET) Date: Tue, 3 Feb 2009 10:39:25 +0100 From: Jean-Francois Moine <moinejf@free.fr> To: kilgota@banach.math.auburn.edu, Adam Baker <linux@baker-net.org.uk> Cc: <linux-media@vger.kernel.org>, Alan Stern <stern@rowland.harvard.edu> Subject: Re: [PATCH] Add support for sq905 based cameras to gspca Message-ID: <20090203103925.25703074@free.fr> In-Reply-To: <alpine.LNX.2.00.0902022032230.1080@banach.math.auburn.edu> References: <200901192322.33362.linux@baker-net.org.uk> <200901272101.27451.linux@baker-net.org.uk> <alpine.LNX.2.00.0901271543560.21122@banach.math.auburn.edu> <200901272228.42610.linux@baker-net.org.uk> <20090128113540.25536301@free.fr> <alpine.LNX.2.00.0901281554500.22748@banach.math.auburn.edu> <20090131203650.36369153@free.fr> <alpine.LNX.2.00.0902022032230.1080@banach.math.auburn.edu> X-Mailer: Claws Mail 3.5.0 (GTK+ 2.12.11; i486-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: <linux-media.vger.kernel.org> X-Mailing-List: linux-media@vger.kernel.org

Jean-Francois Moine Feb. 3, 2009, 9:39 a.m. UTC

On Mon, 2 Feb 2009 20:36:31 -0600 (CST)
kilgota@banach.math.auburn.edu wrote:
> Just now when I logged in, a fortune came up which says:
> 
> "A little experience often upsets a lot of theory."
> 
> It struck me funny after our recent experiences, so I thought I would 
> share it with both of you.

Hello,

May be this message made me to look again at the gspca code. Well, it's
my fault: I did not check the previous patch. Sorry for all trouble.

The patch is simply:

Best regards.

kilgota@banach.math.auburn.edu Feb. 3, 2009, 5:30 p.m. UTC | #1

On Tue, 3 Feb 2009, Jean-Francois Moine wrote:

> On Mon, 2 Feb 2009 20:36:31 -0600 (CST)
> kilgota@banach.math.auburn.edu wrote:
>> Just now when I logged in, a fortune came up which says:
>>
>> "A little experience often upsets a lot of theory."
>>
>> It struck me funny after our recent experiences, so I thought I would
>> share it with both of you.
>
> Hello,
>
> May be this message made me to look again at the gspca code. Well, it's
> my fault: I did not check the previous patch. Sorry for all trouble.
>
> The patch is simply:
>
> diff -r 3f4a7bc53d8e linux/drivers/media/video/gspca/gspca.c
> --- a/linux/drivers/media/video/gspca/gspca.c	Mon Feb 02 20:25:38 2009 +0100
> +++ b/linux/drivers/media/video/gspca/gspca.c	Tue Feb 03 10:37:51 2009 +0100
> @@ -435,7 +435,7 @@
> 			break;
>
> 		gspca_dev->urb[i] = NULL;
> -		if (!gspca_dev->present)
> +		if (gspca_dev->present)
> 			usb_kill_urb(urb);
> 		if (urb->transfer_buffer != NULL)
> 			usb_buffer_free(gspca_dev->dev,
>

Well, OK. But this will solve the problem which comes from disconnect 
while streaming? Sure, I will try this and see if it does the job or not. 
But whatever the outcome of that might be, I do not follow the logic.

Theodore Kilgore
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jean-Francois Moine Feb. 3, 2009, 6:13 p.m. UTC | #2

On Tue, 3 Feb 2009 12:21:55 -0600 (CST)
kilgota@banach.math.auburn.edu wrote:

> I should add to the above that now I have tested, and indeed this
> change does not solve the problem of kernel oops after disconnect
> while streaming. It does make one change. The xterm does not go wild
> with error messages. But it is still not possible to close the svv
> window. Moreover, ps ax reveals that [svv] is running as an
> unkillable process, with [sq905/0] and [sq905/1], equally unkillable,
> in supporting roles. And dmesg reveals an oops. The problem is after
> all notorious by now, so I do not see much need for yet another log
> of debug output unless specifically asked for such.

Why is there 2 sq905 processes?

What is the oops? (Your last trace was good, because it gave the last
gspca/sq905 messages and the full oops)

> Perhaps we also need to do what Adam suggested yesterday, and add a
> call to destroy_urbs() in gspca_disconnect()?

Surely not! The destroy_urbs() must be called at the right time, i.e.
on close().

kilgota@banach.math.auburn.edu Feb. 3, 2009, 6:21 p.m. UTC | #3

On Tue, 3 Feb 2009, kilgota@banach.math.auburn.edu wrote:

>
>
> On Tue, 3 Feb 2009, Jean-Francois Moine wrote:
>
>> On Mon, 2 Feb 2009 20:36:31 -0600 (CST)
>> kilgota@banach.math.auburn.edu wrote:
>>> Just now when I logged in, a fortune came up which says:
>>> 
>>> "A little experience often upsets a lot of theory."
>>> 
>>> It struck me funny after our recent experiences, so I thought I would
>>> share it with both of you.
>> 
>> Hello,
>> 
>> May be this message made me to look again at the gspca code. Well, it's
>> my fault: I did not check the previous patch. Sorry for all trouble.
>> 
>> The patch is simply:
>> 
>> diff -r 3f4a7bc53d8e linux/drivers/media/video/gspca/gspca.c
>> --- a/linux/drivers/media/video/gspca/gspca.c	Mon Feb 02 20:25:38 2009 
>> +0100
>> +++ b/linux/drivers/media/video/gspca/gspca.c	Tue Feb 03 10:37:51 2009 
>> +0100
>> @@ -435,7 +435,7 @@
>> 			break;
>>
>> 		gspca_dev->urb[i] = NULL;
>> -		if (!gspca_dev->present)
>> +		if (gspca_dev->present)
>> 			usb_kill_urb(urb);
>> 		if (urb->transfer_buffer != NULL)
>> 			usb_buffer_free(gspca_dev->dev,
>> 
>
> Well, OK. But this will solve the problem which comes from disconnect while 
> streaming? Sure, I will try this and see if it does the job or not. But 
> whatever the outcome of that might be, I do not follow the logic.
>
> Theodore Kilgore

I should add to the above that now I have tested, and indeed this change 
does not solve the problem of kernel oops after disconnect while 
streaming. It does make one change. The xterm does not go wild with error 
messages. But it is still not possible to close the svv window. Moreover, 
ps ax reveals that [svv] is running as an unkillable process, with 
[sq905/0] and [sq905/1], equally unkillable, in supporting roles. And 
dmesg reveals an oops. The problem is after all notorious by now, so I do 
not see much need for yet another log of debug output unless specifically 
asked for such.

Perhaps we also need to do what Adam suggested yesterday, and add a call 
to destroy_urbs() in gspca_disconnect()? That really did appear to cure 
the problem. But the way I did it is

void gspca_disconnect(struct usb_interface *intf)
{
         struct gspca_dev *gspca_dev = usb_get_intfdata(intf);

         gspca_dev->present = 0;
         destroy_urbs(gspca_dev);

         usb_set_intfdata(intf, NULL);

         /* release the device */
         /* (this will call gspca_release() immediatly or on last close) */
         video_unregister_device(&gspca_dev->vdev);

         PDEBUG(D_PROBE, "disconnect complete");
}

and it would appear that with your changes above it would work better to 
do here first

         destroy_urbs(gspca_dev);

and then only after that

         gspca_dev->present = 0;

Or?

Theodore Kilgore
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kilgota@banach.math.auburn.edu Feb. 3, 2009, 7:15 p.m. UTC | #4

On Tue, 3 Feb 2009, Jean-Francois Moine wrote:

> On Tue, 3 Feb 2009 12:21:55 -0600 (CST)
> kilgota@banach.math.auburn.edu wrote:
>
>> I should add to the above that now I have tested, and indeed this
>> change does not solve the problem of kernel oops after disconnect
>> while streaming. It does make one change. The xterm does not go wild
>> with error messages. But it is still not possible to close the svv
>> window. Moreover, ps ax reveals that [svv] is running as an
>> unkillable process, with [sq905/0] and [sq905/1], equally unkillable,
>> in supporting roles. And dmesg reveals an oops. The problem is after
>> all notorious by now, so I do not see much need for yet another log
>> of debug output unless specifically asked for such.
>
> Why is there 2 sq905 processes?

I of course do not fully understand why there are two such processes. 
However, I would suspect that [sq905/0] is running on processor 0 and 
[sq905/1] is running on processor 1. As I remember, there is only one 
[sq905] process which runs on a single-core machine.

>
> What is the oops? (Your last trace was good, because it gave the last
> gspca/sq905 messages and the full oops)

Well, I can do it again, I suppose. So you get that in a few minutes. But 
my private speculation is it will look just about like the last one, 
because the problem is not addressed.

>
>> Perhaps we also need to do what Adam suggested yesterday, and add a
>> call to destroy_urbs() in gspca_disconnect()?
>
> Surely not! The destroy_urbs() must be called at the right time, i.e.
> on close().

Hmmm. well, the problem is that we are down there in a workqueue. We have 
to force the workqueue to close. I do not see a way to do that, even with 
the exit routines at the end of the workqueue, if it is not possible to 
call upon the functions there which will do the job (and it appears to me 
it is not thus possible, because those gspca functions are not public). 
And if we try to close the workqueue without doing something like 
destroy_urbs() as a consequence of a violent disconnect, then the 
workqueue just does not understand it is supposed to stop and merrily goes 
on its way in spite of all. Then it can not be closed because it is "busy" 
talking to a now-nonexistent piece of hardware, and the calling program 
can not be closed, either, because it is "busy" as well.

Unless I get interrupted by something extraneous, I should have a log sent 
along in a few minutes. God knows, it is easy enough to create one.

Theodore Kilgore
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jean-Francois Moine Feb. 3, 2009, 7:23 p.m. UTC | #5

On Tue, 3 Feb 2009 13:15:58 -0600 (CST)
kilgota@banach.math.auburn.edu wrote:

> > Why is there 2 sq905 processes?  
> 
> I of course do not fully understand why there are two such processes. 
> However, I would suspect that [sq905/0] is running on processor 0 and 
> [sq905/1] is running on processor 1. As I remember, there is only one 
> [sq905] process which runs on a single-core machine.

Indeed, the problem is there! You must have only one process reading the
webcam! I do not see how this can work with these 2 processes...

kilgota@banach.math.auburn.edu Feb. 3, 2009, 7:42 p.m. UTC | #6

On Tue, 3 Feb 2009, Jean-Francois Moine wrote:

> On Tue, 3 Feb 2009 12:21:55 -0600 (CST)
> kilgota@banach.math.auburn.edu wrote:
>
>> I should add to the above that now I have tested, and indeed this
>> change does not solve the problem of kernel oops after disconnect
>> while streaming. It does make one change. The xterm does not go wild
>> with error messages. But it is still not possible to close the svv
>> window. Moreover, ps ax reveals that [svv] is running as an
>> unkillable process, with [sq905/0] and [sq905/1], equally unkillable,
>> in supporting roles. And dmesg reveals an oops. The problem is after
>> all notorious by now, so I do not see much need for yet another log
>> of debug output unless specifically asked for such.
>
> Why is there 2 sq905 processes?
>
> What is the oops? (Your last trace was good, because it gave the last
> gspca/sq905 messages and the full oops)

Right, here is the output. I have not searched for precise differences, 
but a glance at it leaves me with the feeling that it is the same old same 
old. This was done on the Pentium 4 Dual Core, with gspca_main loaded with 
option debug=255, and this is the dmesg output which resulted when I 
pulled the cord of the camera.


Theodore Kilgore

Jean-Francois Moine Feb. 3, 2009, 7:47 p.m. UTC | #7

On Tue, 3 Feb 2009 13:54:15 -0600 (CST)
kilgota@banach.math.auburn.edu wrote:

> > Indeed, the problem is there! You must have only one process
> > reading the webcam! I do not see how this can work with these 2
> > processes...  
> 
> The problem, then, would seem to me to boil down to the question of 
> whether that is up to us. Apparently, a decision like that is not up
> to us, but rather it is up to the compiler and to the rest of the
> kernel to decide. Which, incidentally, appears to me to be a very
> logical way to arrange things. Presumably, a dual- or multi-core
> machine gives certain advantages, or it ought to, but it also
> requires certain accommodations.

Yes, a multiprocessor machine is a plus, but, you must run only one
process to handle streaming. If you have not seen it yet, this is done
changing the line 373 of sq905.c (if I have the same source as yours)
from:

	dev->work_thread = create_workqueue(MODULE_NAME);
to
	dev->work_thread = create_singlethread_workqueue(MODULE_NAME);

Alan Stern Feb. 3, 2009, 7:53 p.m. UTC | #8

On Tue, 3 Feb 2009, Jean-Francois Moine wrote:

> > Perhaps we also need to do what Adam suggested yesterday, and add a
> > call to destroy_urbs() in gspca_disconnect()?
> 
> Surely not! The destroy_urbs() must be called at the right time, i.e.
> on close().

Theodore was pointing out that destroy_urbs() must also be called 
during disconnect.  If a camera is unplugged while it is in use then 
waiting until close() is no good -- it will cause destroy_urbs() to 
pass a stale pointer to usb_buffer_free().  That's the reason for the 
oops.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kilgota@banach.math.auburn.edu Feb. 3, 2009, 7:54 p.m. UTC | #9

On Tue, 3 Feb 2009, Jean-Francois Moine wrote:

> On Tue, 3 Feb 2009 13:15:58 -0600 (CST)
> kilgota@banach.math.auburn.edu wrote:
>
>>> Why is there 2 sq905 processes?
>>
>> I of course do not fully understand why there are two such processes.
>> However, I would suspect that [sq905/0] is running on processor 0 and
>> [sq905/1] is running on processor 1. As I remember, there is only one
>> [sq905] process which runs on a single-core machine.
>
> Indeed, the problem is there! You must have only one process reading the
> webcam! I do not see how this can work with these 2 processes...

The problem, then, would seem to me to boil down to the question of 
whether that is up to us. Apparently, a decision like that is not up to 
us, but rather it is up to the compiler and to the rest of the kernel to 
decide. Which, incidentally, appears to me to be a very logical way to 
arrange things. Presumably, a dual- or multi-core machine gives certain 
advantages, or it ought to, but it also requires certain accommodations.

You know, Jean-Francois, in a way it is a lucky accident that I got this 
machine for Christmas. I would never have fired up the Pentium 4, at least 
until sometime in the unforeseeable future, because in fact I was getting 
quite adequate performance out of the old Sempron box. Thus, I would not 
have been aware of this problem, either. We would have gone right ahead, 
Adam and I, blissfully ignorant, and published a module which has a flaw 
on a dual-core machine. So, in spite of the problems I say it is better to 
face the problems now.

Theodore Kilgore
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Alan Stern Feb. 3, 2009, 7:59 p.m. UTC | #10

On Tue, 3 Feb 2009 kilgota@banach.math.auburn.edu wrote:

> 
> 
> On Tue, 3 Feb 2009, Jean-Francois Moine wrote:
> 
> > On Tue, 3 Feb 2009 13:15:58 -0600 (CST)
> > kilgota@banach.math.auburn.edu wrote:
> >
> >>> Why is there 2 sq905 processes?
> >>
> >> I of course do not fully understand why there are two such processes.
> >> However, I would suspect that [sq905/0] is running on processor 0 and
> >> [sq905/1] is running on processor 1. As I remember, there is only one
> >> [sq905] process which runs on a single-core machine.
> >
> > Indeed, the problem is there! You must have only one process reading the
> > webcam! I do not see how this can work with these 2 processes...
> 
> The problem, then, would seem to me to boil down to the question of 
> whether that is up to us. Apparently, a decision like that is not up to 
> us, but rather it is up to the compiler and to the rest of the kernel to 
> decide.

Nonsense.  It's simply a matter of how you create your workqueue.  In 
the code you sent me, you call create_workqueue().  Instead, just call 
create_singlethread_workqueue().  Or maybe even 
create_freezeable_workqueue().

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adam Baker Feb. 3, 2009, 10:09 p.m. UTC | #11

On Tuesday 03 February 2009, Jean-Francois Moine wrote:
> Indeed, the problem is there! You must have only one process reading the
> webcam! I do not see how this can work with these 2 processes...

Although 2 processes are created only one ever gets used. 
create_singlethread_workqueue would therefore be less wasteful but make no 
other difference. Google searches for create_freezeable_workqueue bring up 
suggestions to avoid it so I think I will although I guess we ought to check 
at some point how well the camera copes with suspend while streaming.

At 19:53:31 Alan Stern wrote
> If a camera is unplugged while it is in use then 
> waiting until close() is no good -- it will cause destroy_urbs() to 
> pass a stale pointer to usb_buffer_free().  That's the reason for the 
> oops.

I'd go further and suggest that gspca_disconnect should, after calling 
destroy_urbs() do gspca_dev->dev = NULL; Any use of that pointer past that 
point is a bug (the downside is that doing so would have turned the current 
bug into a hard to spot memory leak).

At 03:37:35 Alan Stern wrote
> On Mon, 2 Feb 2009, Adam Baker wrote:
>> This shouldn't be a problem for sq905 (which doesn't use these URBs) or 
>> isochronous cameras (which don't need to resubmit URBs) but the finepix 
>> driver (the other supported bulk device) will need some careful
>> consideration  to avoid a race between killing the URB and resubmitting it.
>
> You shouldn't need to take any special action.  usb_kill_urb() solves 
> these races for you; it guarantees not to return until the URB has been 
> unlinked and the completion handler has finished, and it guarantees 
> that attempts by the completion handler to resubmit the URB will fail.

The sq905 driver doesn't use the URBs provided by gspca, it uses 
usb_control_msg and usb_bulk_msg which I presume do the right thing 
internally. There would be a tiny window in between when it checks the 
dev->streaming flag and when it sends a new USB msg for the disconnect to 
occur and invalidate the dev pointer. That could be fixed by holding 
gspca_dev->usb_lock in gspca_disconnect when it sets gspca_dev->present = 0.

That would also address the race between open and disconnect.

Unfortunately the finepix driver sometimes uses calls to schedule_delayed_work 
in the completion handler which then makes the call to usb_submit_urb. Fixing 
that will require someone with access to a suitable camera to test it 
otherwise there is a significant risk of adding deadlocks. It already suffers 
from this bug so we aren't making it worse.

Calling destroy_urbs from dev_close rather than gspca_disconnect could also 
trigger the same bug on isochronous cameras. I haven't looked at them closely 
but they probably also have the race between open and disconnect which taking 
the lock in disconnect is likely to fix.

I'll do some testing and post a patch if it looks promising.

Adam
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kilgota@banach.math.auburn.edu Feb. 3, 2009, 10:23 p.m. UTC | #12

On Tue, 3 Feb 2009, Alan Stern wrote:

> On Tue, 3 Feb 2009 kilgota@banach.math.auburn.edu wrote:
>
>>
>>
>> On Tue, 3 Feb 2009, Jean-Francois Moine wrote:
>>
>>> On Tue, 3 Feb 2009 13:15:58 -0600 (CST)
>>> kilgota@banach.math.auburn.edu wrote:
>>>
>>>>> Why is there 2 sq905 processes?
>>>>
>>>> I of course do not fully understand why there are two such processes.
>>>> However, I would suspect that [sq905/0] is running on processor 0 and
>>>> [sq905/1] is running on processor 1. As I remember, there is only one
>>>> [sq905] process which runs on a single-core machine.
>>>
>>> Indeed, the problem is there! You must have only one process reading the
>>> webcam! I do not see how this can work with these 2 processes...
>>
>> The problem, then, would seem to me to boil down to the question of
>> whether that is up to us. Apparently, a decision like that is not up to
>> us, but rather it is up to the compiler and to the rest of the kernel to
>> decide.
>
> Nonsense.  It's simply a matter of how you create your workqueue.  In
> the code you sent me, you call create_workqueue().  Instead, just call
> create_singlethread_workqueue().  Or maybe even
> create_freezeable_workqueue().
>
> Alan Stern
>

OK, seems one way out, might even work. I will definitely try that.

Update. I did try it.

No it does not work, sorry. :/

I changed the line
dev->work_thread = create_workqueue(MODULE_NAME);

to read

dev->work_thread = create_singlethread_workqueue(MODULE_NAME);

As a result, I can definitely report that only _two_ processes were frozen 
when the cord was pulled, named [svv] and [sq905].

I am sure that the attached log file of the oops looks a little bit 
different from the previous ones. It must, after all.

While you have this matter on your mind, I am curious about the following:

As the code for the sq905 module evolved through various stages, the 
only occasion on which any real trouble arose was at the end, when we put 
in the mutex locks which you can see in the code now. Before they were put 
in, these problems which we are discussing now did not occur. 
Specifically, there was not any such problem about an oops caused by 
camera removal until the mutex locks were put in the code. And I strongly 
suspect -- nay, I am almost certain -- that with that the same code you 
are looking at now, the oops would go away if all those mutex locks were 
simply commented out and the code re-compiled and reinstalled. Can you 
explain this? I am just curious about why.

kilgota@banach.math.auburn.edu Feb. 3, 2009, 10:28 p.m. UTC | #13

On Tue, 3 Feb 2009, Adam Baker wrote:

> On Tuesday 03 February 2009, Jean-Francois Moine wrote:
>> Indeed, the problem is there! You must have only one process reading the
>> webcam! I do not see how this can work with these 2 processes...
>
> Although 2 processes are created only one ever gets used.

How do you know that? Just curious.

> create_singlethread_workqueue would therefore be less wasteful but make no
> other difference.

In at least one respect, you seem to be right. The oops still occurs. See 
my previous mail.

Theodore Kilgore

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Alan Stern Feb. 4, 2009, 1:59 a.m. UTC | #14

On Tue, 3 Feb 2009, Adam Baker wrote:

> The sq905 driver doesn't use the URBs provided by gspca, it uses 
> usb_control_msg and usb_bulk_msg which I presume do the right thing 
> internally. There would be a tiny window in between when it checks the 
> dev->streaming flag and when it sends a new USB msg for the disconnect to 
> occur and invalidate the dev pointer. That could be fixed by holding 
> gspca_dev->usb_lock in gspca_disconnect when it sets gspca_dev->present = 0.
> 
> That would also address the race between open and disconnect.
> 
> Unfortunately the finepix driver sometimes uses calls to schedule_delayed_work 
> in the completion handler which then makes the call to usb_submit_urb. Fixing 
> that will require someone with access to a suitable camera to test it 
> otherwise there is a significant risk of adding deadlocks. It already suffers 
> from this bug so we aren't making it worse.

If the driver submits URBs from a work routine then usb_kill_urb's
guarantees don't apply.  You'll need to synchronize all three routines:
disconnect, the completion handler, and the work routine.  That means 
you'll have to use a spinlock, since a completion handler isn't allowed 
to acquire a mutex.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Alan Stern Feb. 4, 2009, 2:02 a.m. UTC | #15

On Tue, 3 Feb 2009 kilgota@banach.math.auburn.edu wrote:

> > Nonsense.  It's simply a matter of how you create your workqueue.  In
> > the code you sent me, you call create_workqueue().  Instead, just call
> > create_singlethread_workqueue().  Or maybe even
> > create_freezeable_workqueue().
> >
> > Alan Stern
> >
> 
> OK, seems one way out, might even work. I will definitely try that.
> 
> Update. I did try it.
> 
> No it does not work, sorry. :/

Again, nonsense.  Of course it works.  It causes the kernel to create
only one workqueue thread instead of two.  That's what it's supposed to
do -- it was never intended to fix your oops.

> While you have this matter on your mind, I am curious about the following:
> 
> As the code for the sq905 module evolved through various stages, the 
> only occasion on which any real trouble arose was at the end, when we put 
> in the mutex locks which you can see in the code now. Before they were put 
> in, these problems which we are discussing now did not occur. 
> Specifically, there was not any such problem about an oops caused by 
> camera removal until the mutex locks were put in the code. And I strongly 
> suspect -- nay, I am almost certain -- that with that the same code you 
> are looking at now, the oops would go away if all those mutex locks were 
> simply commented out and the code re-compiled and reinstalled. Can you 
> explain this? I am just curious about why.

You're wrong, the oops would not go away.  It would just become a lot 
less likely to occur -- and thereby all the more insidious.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andy Walls Feb. 4, 2009, 2:33 a.m. UTC | #16

On Tue, 2009-02-03 at 22:09 +0000, Adam Baker wrote:
> On Tuesday 03 February 2009, Jean-Francois Moine wrote:
> > Indeed, the problem is there! You must have only one process reading the
> > webcam! I do not see how this can work with these 2 processes...
> 
> Although 2 processes are created only one ever gets used. 
> create_singlethread_workqueue would therefore be less wasteful but make no 
> other difference. 

This is generally not the case.  There is a single workqueue as far as a
driver is concerned.  Work items submitted to the queue by the driver
are set to be processed by the same CPU submitting the work item (unless
you call queue_work_on() to specify the CPU).  However, there is no
guarantee that the same CPU will be submitting work requests to the
workqueue every time.

For most drivers this is a moot point though, because they only ever
instantiate and submit one work object ever per device.  This means the
workqueue depth never exceeds 1 for most drivers.  So the correct
statement would be, I believe, "only one sq905 worker thread ever gets
used (per device per capture) at a time in sq905.c"

(For the cx18 driver, where the workqueue can have up to 70 items queued
for all ongoing capture streams per device, it makes quite a difference
on whether a single thread or multiple threads process the work queue.
To preserve ordering of the work items, as needed for work orders for
moving video buffers from one place to another, a singlethreaded
workqueue had to be used.)

I did look at the patch as submitted on Jan 19, and do have what I
intend to be constructive criticisms (sorry if they're overcome by
events by now):

Creating and destroying the worker thread(s) at the start and stop of
each capture is a bit overkill.  It's akin to registering and
unregistering a device's interrupt handler before and after every
capture, but it's a bit worse overhead-wise.  It's probably better to
just instantiate the workqueue when the device "appears" (I'm not a USB
guy so insert appropraite term there) and destroy the workqueue and
worker threads(s) when the device is going to "disappear".  Or if it
will meet your performance requirements, create and destroy the
workqueue when you init and remove the module.  The workqueue and its
thread(s) are essentially the bottom half of your interrupt handler to
handle deferrable work - no point in killing them off until you really
don't need them anymore.

Also, you've created the workqueue threads with a non-unique name: the
expansion of MODULE_NAME.  You're basically saying that you only need
one workqueue, with it's per CPU thread(s), for all instantiations of an
sq905 device - which *can* be a valid design choice.  However, you're
bringing them up and tearing them down on a per capture basis.  That's a
problem that needs to be corrected if you intend to support multiple
sq905 devices on a single machine.  What happens when you attempt to
have two sq905 devices do simultaneous captures?  I don't know myself;
I've never tried to create 2 separate instantiations of a workqueue
object with the same name. 

Regards,
Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kilgota@banach.math.auburn.edu Feb. 4, 2009, 3:12 a.m. UTC | #17

On Tue, 3 Feb 2009, Alan Stern wrote:

> On Tue, 3 Feb 2009 kilgota@banach.math.auburn.edu wrote:
>
>>> Nonsense.  It's simply a matter of how you create your workqueue.  In
>>> the code you sent me, you call create_workqueue().  Instead, just call
>>> create_singlethread_workqueue().  Or maybe even
>>> create_freezeable_workqueue().
>>>
>>> Alan Stern
>>>
>>
>> OK, seems one way out, might even work. I will definitely try that.
>>
>> Update. I did try it.
>>
>> No it does not work, sorry. :/
>
> Again, nonsense.  Of course it works.  It causes the kernel to create
> only one workqueue thread instead of two.

OK, yes, it did do that.

That's what it's supposed to
> do -- it was never intended to fix your oops.

OK.

>
>> While you have this matter on your mind, I am curious about the following:
>>
>> As the code for the sq905 module evolved through various stages, the
>> only occasion on which any real trouble arose was at the end, when we put
>> in the mutex locks which you can see in the code now. Before they were put
>> in, these problems which we are discussing now did not occur.
>> Specifically, there was not any such problem about an oops caused by
>> camera removal until the mutex locks were put in the code. And I strongly
>> suspect -- nay, I am almost certain -- that with that the same code you
>> are looking at now, the oops would go away if all those mutex locks were
>> simply commented out and the code re-compiled and reinstalled. Can you
>> explain this? I am just curious about why.
>
> You're wrong, the oops would not go away.  It would just become a lot
> less likely to occur -- and thereby all the more insidious.

How nice. Thanks for explaining.

>
> Alan Stern
>
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adam Baker Feb. 4, 2009, 9:38 p.m. UTC | #18

On Wednesday 04 February 2009, Andy Walls wrote:
> On Tue, 2009-02-03 at 22:09 +0000, Adam Baker wrote:
> > On Tuesday 03 February 2009, Jean-Francois Moine wrote:
> > > Indeed, the problem is there! You must have only one process reading
> > > the webcam! I do not see how this can work with these 2 processes...
> >
> > Although 2 processes are created only one ever gets used.
> > create_singlethread_workqueue would therefore be less wasteful but make
> > no other difference.
>
> This is generally not the case.  There is a single workqueue as far as a
> driver is concerned.  Work items submitted to the queue by the driver
> are set to be processed by the same CPU submitting the work item (unless
> you call queue_work_on() to specify the CPU).  However, there is no
> guarantee that the same CPU will be submitting work requests to the
> workqueue every time.
>
> For most drivers this is a moot point though, because they only ever
> instantiate and submit one work object ever per device.  This means the
> workqueue depth never exceeds 1 for most drivers.  So the correct
> statement would be, I believe, "only one sq905 worker thread ever gets
> used (per device per capture) at a time in sq905.c"

Yes, I did mean only one gets used in the case of sq905.c (because the queue 
is created per capture and only one item gets submitted to it).

<snip>
>
> I did look at the patch as submitted on Jan 19, and do have what I
> intend to be constructive criticisms (sorry if they're overcome by
> events by now):
>
> Creating and destroying the worker thread(s) at the start and stop of
> each capture is a bit overkill.  It's akin to registering and
> unregistering a device's interrupt handler before and after every
> capture, but it's a bit worse overhead-wise.  It's probably better to
> just instantiate the workqueue when the device "appears" (I'm not a USB
> guy so insert appropraite term there) and destroy the workqueue and
> worker threads(s) when the device is going to "disappear".  Or if it
> will meet your performance requirements, create and destroy the
> workqueue when you init and remove the module.  The workqueue and its
> thread(s) are essentially the bottom half of your interrupt handler to
> handle deferrable work - no point in killing them off until you really
> don't need them anymore.
>

My thought was that the camera was likely to remain plugged in even if it 
wasn't being used so it was best to clean up as much as possible when it 
wasn't in use. I don't really know how the overheads of creating a workqueue 
when you do need it compares to leaving an unused one around sitting on the 
not ready queue in the process table but starting a capture is going to take 
many ms just for the USB traffic so a little extra overhead doesn't seem too 
worrying.

> Also, you've created the workqueue threads with a non-unique name: the
> expansion of MODULE_NAME.  You're basically saying that you only need
> one workqueue, with it's per CPU thread(s), for all instantiations of an
> sq905 device - which *can* be a valid design choice.  However, you're
> bringing them up and tearing them down on a per capture basis.  That's a
> problem that needs to be corrected if you intend to support multiple
> sq905 devices on a single machine.  What happens when you attempt to
> have two sq905 devices do simultaneous captures?  I don't know myself;
> I've never tried to create 2 separate instantiations of a workqueue
> object with the same name.
>

I see multiple instance of [pdflush] and [nfsd] which seem to work fine so I 
believe the name doesn't need to be unique, just a guide to the user of what 
is eating CPU time. I don't have 2 sq905 cameras to test it but I have had 
left over workqueues caused by a driver bug stopping it shut down and new 
ones that started also worked fine.

>
> Regards,
> Andy


--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kilgota@banach.math.auburn.edu Feb. 4, 2009, 10:31 p.m. UTC | #19

On Wed, 4 Feb 2009, Adam Baker wrote:

> On Wednesday 04 February 2009, Andy Walls wrote:
>> On Tue, 2009-02-03 at 22:09 +0000, Adam Baker wrote:
>>> On Tuesday 03 February 2009, Jean-Francois Moine wrote:
>>>> Indeed, the problem is there! You must have only one process reading
>>>> the webcam! I do not see how this can work with these 2 processes...
>>>
>>> Although 2 processes are created only one ever gets used.
>>> create_singlethread_workqueue would therefore be less wasteful but make
>>> no other difference.
>>
>> This is generally not the case.  There is a single workqueue as far as a
>> driver is concerned.  Work items submitted to the queue by the driver
>> are set to be processed by the same CPU submitting the work item (unless
>> you call queue_work_on() to specify the CPU).  However, there is no
>> guarantee that the same CPU will be submitting work requests to the
>> workqueue every time.
>>
>> For most drivers this is a moot point though, because they only ever
>> instantiate and submit one work object ever per device.  This means the
>> workqueue depth never exceeds 1 for most drivers.  So the correct
>> statement would be, I believe, "only one sq905 worker thread ever gets
>> used (per device per capture) at a time in sq905.c"
>
> Yes, I did mean only one gets used in the case of sq905.c (because the queue
> is created per capture and only one item gets submitted to it).
>
> <snip>
>>
>> I did look at the patch as submitted on Jan 19, and do have what I
>> intend to be constructive criticisms (sorry if they're overcome by
>> events by now):
>>
>> Creating and destroying the worker thread(s) at the start and stop of
>> each capture is a bit overkill.  It's akin to registering and
>> unregistering a device's interrupt handler before and after every
>> capture, but it's a bit worse overhead-wise.  It's probably better to
>> just instantiate the workqueue when the device "appears" (I'm not a USB
>> guy so insert appropraite term there) and destroy the workqueue and
>> worker threads(s) when the device is going to "disappear".  Or if it
>> will meet your performance requirements, create and destroy the
>> workqueue when you init and remove the module.  The workqueue and its
>> thread(s) are essentially the bottom half of your interrupt handler to
>> handle deferrable work - no point in killing them off until you really
>> don't need them anymore.
>>
>
> My thought was that the camera was likely to remain plugged in even if it
> wasn't being used so it was best to clean up as much as possible when it
> wasn't in use. I don't really know how the overheads of creating a workqueue
> when you do need it compares to leaving an unused one around sitting on the
> not ready queue in the process table but starting a capture is going to take
> many ms just for the USB traffic so a little extra overhead doesn't seem too
> worrying.
>
>> Also, you've created the workqueue threads with a non-unique name: the
>> expansion of MODULE_NAME.  You're basically saying that you only need
>> one workqueue, with it's per CPU thread(s), for all instantiations of an
>> sq905 device - which *can* be a valid design choice.  However, you're
>> bringing them up and tearing them down on a per capture basis.  That's a
>> problem that needs to be corrected if you intend to support multiple
>> sq905 devices on a single machine.  What happens when you attempt to
>> have two sq905 devices do simultaneous captures?  I don't know myself;
>> I've never tried to create 2 separate instantiations of a workqueue
>> object with the same name.
>>
>
> I see multiple instance of [pdflush] and [nfsd] which seem to work fine so I
> believe the name doesn't need to be unique, just a guide to the user of what
> is eating CPU time. I don't have 2 sq905 cameras to test it ...

Well, I do. Here is what happens:

1. Plugged in one of them, started the streaming.

2. Plugged in the second one. First one continues to stream, no problem. 
Attempt to start a stream from the second camera while the first one is 
streaming results in the error message,

VIDIOC_REQBUFS error 16, Device or resource busy

and the command line reappears. No obvious interference with the stream 
from the first camera is apparent.

3. After the stream from the first camera is closed, the streaming can be 
started again. However, the stream is again from the first camera which 
was plugged in.

4. After removing the first camera which was plugged in, I tried to start 
the stream from the second one. The stream will not start. A message says 
that

Cannot identify 'dev/video0': 2. No such file or directory.

5. Second camera left attached, first camera plugged in again. The first 
camera can stream again. The second one can not.

6. When the first camera is removed and the second one is then re-plugged, 
the second one will stream.

So, we can safely conclude that, as the module is presently constituted, 
there can be only one stream at a time. The second camera plugged in 
cannot stream. Neither can it conflict with the first one. The only thing 
which puzzles me a little bit is item 5. Apparently, the question is about 
which port the camera was connected to. So let's try three cameras. I will 
hook up camera number one, get a stream, close it, hook up number two 
through a different port, then remove number one and replace it with 
number three on the same port where number one was. What I suspected would 
happen is what happened. Number two would not stream. Number three would 
stream just as number one did. In other words, this was determined by 
which port was in use for number one, which is now also in use for number 
three.

So, now we know what happens.

Theodore Kilgore
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adam Baker Feb. 4, 2009, 10:34 p.m. UTC | #20

On Wednesday 04 February 2009, kilgota@banach.math.auburn.edu wrote:
<snip description of attempting to stream from 2 cameras at once>
> 4. After removing the first camera which was plugged in, I tried to start
> the stream from the second one. The stream will not start. A message says
> that
>
> Cannot identify 'dev/video0': 2. No such file or directory.

This line points to an error in your test method.

You need to start the second stream with svv -d /dev/video1 to tell it to pick 
the second camera.

Adam
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kilgota@banach.math.auburn.edu Feb. 4, 2009, 10:53 p.m. UTC | #21

On Wed, 4 Feb 2009, Adam Baker wrote:

> On Wednesday 04 February 2009, kilgota@banach.math.auburn.edu wrote:
> <snip description of attempting to stream from 2 cameras at once>
>> 4. After removing the first camera which was plugged in, I tried to start
>> the stream from the second one. The stream will not start. A message says
>> that
>>
>> Cannot identify 'dev/video0': 2. No such file or directory.
>
> This line points to an error in your test method.
>
> You need to start the second stream with svv -d /dev/video1 to tell it to pick
> the second camera.
>
> Adam
>

Oops, right.

Well, in that case I have to report that two cameras work 
simultaneously just fine. No problem at all.

Theodore Kilgore
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kilgota@banach.math.auburn.edu Feb. 4, 2009, 11:09 p.m. UTC | #22

On Wed, 4 Feb 2009, kilgota@banach.math.auburn.edu wrote:

>
>
> On Wed, 4 Feb 2009, Adam Baker wrote:
>
>> On Wednesday 04 February 2009, kilgota@banach.math.auburn.edu wrote:
>> <snip description of attempting to stream from 2 cameras at once>
>>> 4. After removing the first camera which was plugged in, I tried to start
>>> the stream from the second one. The stream will not start. A message says
>>> that
>>> 
>>> Cannot identify 'dev/video0': 2. No such file or directory.
>> 
>> This line points to an error in your test method.
>> 
>> You need to start the second stream with svv -d /dev/video1 to tell it to 
>> pick
>> the second camera.
>> 
>> Adam
>> 
>
> Oops, right.
>
> Well, in that case I have to report that two cameras work simultaneously just 
> fine. No problem at all.

For completeness, I should add the following:

ps ax now shows two svv processes. One of them is svv -d /dev/video1 and 
is followed by [sq905] and the second one is svv followed by [sq905]. Why 
in reverse order? Well, probably it is that I now fired up the second 
camera first and the first one after that. The workqueue in the 
module is started with

dev->work_thread = create_singlethread_workqueue(MODULE_NAME);

so these are distinct instances of [sq905].

Now am I supposed to check what happens if I start jerking them off of 
their tethers without closing the capture windows? No thanks, not this 
time. Let's wait until that problem is solved with just one camera. :)

Theodore Kilgore
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Add support for sq905 based cameras to gspca

Commit Message

Comments

Patch