diff mbox

[BUG] Deferred probing in driver model is racy, resulting in lost probes

Message ID CACVXFVOZ=4HFAy38q_8ZUjOPffcv_mMTi7KCdh3HcX=MCLw5SQ@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ming Lei Sept. 16, 2012, 1:24 p.m. UTC
On Sun, Sep 16, 2012 at 4:25 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
>
> It isn't.  As I said, it's a race condition due to lack of locking - the
> driver hasn't been added to the list of drivers at this point:
>
> int bus_add_driver(struct device_driver *drv)
> {
> ...
>         if (drv->bus->p->drivers_autoprobe) {
>                 error = driver_attach(drv);
>                 if (error)
>                         goto out_unregister;
>         }
>         klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
> ...
> }
>
> Notice that the attaching is done _before_ the driver is added to the
> bus driver list.

Yes, it is a problem since a new device may be added to bus
and bus_probe_device() may not see the new added driver.

So looks klist_add_tail() should complete before driver_attach()
inside bus_add_driver().

The attached one line change should fix the problem because the
below way can guarantee that no probe(dev) may be lost.


CPU0				CPU1
driver_register
	...
	write(bus->driver_list)
	smp_mb()
	read(bus->device_list)
	...
				device_add
					/* bus_add_device */	
					write(bus->device_list)
					smp_mb()
					/* bus_probe_device*/
					read(bus->driver_list)

And the smp_mb() has been implicit by UNLOCK+LOCK
of 'klist' according to 'VARIETIES OF MEMORY BARRIER' part
of Documentation/memory-barriers.txt.

Could you test the below patch to see if it can fix your problem?




Thanks,

Comments

Greg Kroah-Hartman Sept. 26, 2012, 8:08 p.m. UTC | #1
On Sun, Sep 16, 2012 at 09:24:43PM +0800, Ming Lei wrote:
> On Sun, Sep 16, 2012 at 4:25 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> >
> > It isn't.  As I said, it's a race condition due to lack of locking - the
> > driver hasn't been added to the list of drivers at this point:
> >
> > int bus_add_driver(struct device_driver *drv)
> > {
> > ...
> >         if (drv->bus->p->drivers_autoprobe) {
> >                 error = driver_attach(drv);
> >                 if (error)
> >                         goto out_unregister;
> >         }
> >         klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
> > ...
> > }
> >
> > Notice that the attaching is done _before_ the driver is added to the
> > bus driver list.
> 
> Yes, it is a problem since a new device may be added to bus
> and bus_probe_device() may not see the new added driver.
> 
> So looks klist_add_tail() should complete before driver_attach()
> inside bus_add_driver().
> 
> The attached one line change should fix the problem because the
> below way can guarantee that no probe(dev) may be lost.
> 
> 
> CPU0				CPU1
> driver_register
> 	...
> 	write(bus->driver_list)
> 	smp_mb()
> 	read(bus->device_list)
> 	...
> 				device_add
> 					/* bus_add_device */	
> 					write(bus->device_list)
> 					smp_mb()
> 					/* bus_probe_device*/
> 					read(bus->driver_list)
> 
> And the smp_mb() has been implicit by UNLOCK+LOCK
> of 'klist' according to 'VARIETIES OF MEMORY BARRIER' part
> of Documentation/memory-barriers.txt.
> 
> Could you test the below patch to see if it can fix your problem?
> 
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 181ed26..17d7437 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -714,12 +714,12 @@ int bus_add_driver(struct device_driver *drv)
>  	if (error)
>  		goto out_unregister;
> 
> +	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
>  	if (drv->bus->p->drivers_autoprobe) {
>  		error = driver_attach(drv);
>  		if (error)
>  			goto out_unregister;
>  	}
> -	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
>  	module_add_driver(drv->owner, drv);
> 
>  	error = driver_create_file(drv, &driver_attr_uevent);
> 
> 
> 

Did the above patch ever prove to solve the issue or not?

thanks,

greg k-h
Russell King - ARM Linux Sept. 26, 2012, 8:23 p.m. UTC | #2
On Wed, Sep 26, 2012 at 01:08:33PM -0700, Greg Kroah-Hartman wrote:
> On Sun, Sep 16, 2012 at 09:24:43PM +0800, Ming Lei wrote:
> > diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> > index 181ed26..17d7437 100644
> > --- a/drivers/base/bus.c
> > +++ b/drivers/base/bus.c
> > @@ -714,12 +714,12 @@ int bus_add_driver(struct device_driver *drv)
> >  	if (error)
> >  		goto out_unregister;
> > 
> > +	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
> >  	if (drv->bus->p->drivers_autoprobe) {
> >  		error = driver_attach(drv);
> >  		if (error)
> >  			goto out_unregister;
> >  	}
> > -	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
> >  	module_add_driver(drv->owner, drv);
> > 
> >  	error = driver_create_file(drv, &driver_attr_uevent);
> > 
> > 
> > 
> 
> Did the above patch ever prove to solve the issue or not?

To be honest, I've not bothered to test the above patch, and now when I
look at it, I notice it's broken - in that on error it will corrupt the
driver list.  Take a look at the error path.

priv is drv->p.  We add priv->knode_bus to the driver list.  If
driver_attach() returns an error, then we go to out_unregister, which
does:

out_unregister:
        kobject_put(&priv->kobj);
        kfree(drv->p);
        drv->p = NULL;

thereby freeing the node we just added to the driver list without first
removing it.

I suspect it will fix the problem, but let's get the patch to be correct
before it gets tested...
Greg Kroah-Hartman Sept. 26, 2012, 8:36 p.m. UTC | #3
On Wed, Sep 26, 2012 at 09:23:21PM +0100, Russell King - ARM Linux wrote:
> On Wed, Sep 26, 2012 at 01:08:33PM -0700, Greg Kroah-Hartman wrote:
> > On Sun, Sep 16, 2012 at 09:24:43PM +0800, Ming Lei wrote:
> > > diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> > > index 181ed26..17d7437 100644
> > > --- a/drivers/base/bus.c
> > > +++ b/drivers/base/bus.c
> > > @@ -714,12 +714,12 @@ int bus_add_driver(struct device_driver *drv)
> > >  	if (error)
> > >  		goto out_unregister;
> > > 
> > > +	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
> > >  	if (drv->bus->p->drivers_autoprobe) {
> > >  		error = driver_attach(drv);
> > >  		if (error)
> > >  			goto out_unregister;
> > >  	}
> > > -	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
> > >  	module_add_driver(drv->owner, drv);
> > > 
> > >  	error = driver_create_file(drv, &driver_attr_uevent);
> > > 
> > > 
> > > 
> > 
> > Did the above patch ever prove to solve the issue or not?
> 
> To be honest, I've not bothered to test the above patch, and now when I
> look at it, I notice it's broken - in that on error it will corrupt the
> driver list.  Take a look at the error path.
> 
> priv is drv->p.  We add priv->knode_bus to the driver list.  If
> driver_attach() returns an error, then we go to out_unregister, which
> does:
> 
> out_unregister:
>         kobject_put(&priv->kobj);
>         kfree(drv->p);
>         drv->p = NULL;
> 
> thereby freeing the node we just added to the driver list without first
> removing it.
> 
> I suspect it will fix the problem, but let's get the patch to be correct
> before it gets tested...

Good catch.  Ming, care to redo this?

greg k-h
Ming Lei Sept. 26, 2012, 11:47 p.m. UTC | #4
On Thu, Sep 27, 2012 at 4:23 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Sep 26, 2012 at 01:08:33PM -0700, Greg Kroah-Hartman wrote:
>> On Sun, Sep 16, 2012 at 09:24:43PM +0800, Ming Lei wrote:
>> > diff --git a/drivers/base/bus.c b/drivers/base/bus.c
>> > index 181ed26..17d7437 100644
>> > --- a/drivers/base/bus.c
>> > +++ b/drivers/base/bus.c
>> > @@ -714,12 +714,12 @@ int bus_add_driver(struct device_driver *drv)
>> >     if (error)
>> >             goto out_unregister;
>> >
>> > +   klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
>> >     if (drv->bus->p->drivers_autoprobe) {
>> >             error = driver_attach(drv);
>> >             if (error)
>> >                     goto out_unregister;
>> >     }
>> > -   klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
>> >     module_add_driver(drv->owner, drv);
>> >
>> >     error = driver_create_file(drv, &driver_attr_uevent);
>> >
>> >
>> >
>>
>> Did the above patch ever prove to solve the issue or not?
>
> To be honest, I've not bothered to test the above patch, and now when I
> look at it, I notice it's broken - in that on error it will corrupt the
> driver list.  Take a look at the error path.
>
> priv is drv->p.  We add priv->knode_bus to the driver list.  If
> driver_attach() returns an error, then we go to out_unregister, which

In fact, driver_attach() always returns zero, so it does __not__ affect
your test.

I knew the failure shouldn't be handled in theory because the probe
failure on one device should not cause failure of driver_register, and
it should be fixed, IMO.

> does:
>
> out_unregister:
>         kobject_put(&priv->kobj);
>         kfree(drv->p);
>         drv->p = NULL;
>
> thereby freeing the node we just added to the driver list without first
> removing it.
>
> I suspect it will fix the problem, but let's get the patch to be correct
> before it gets tested...

Trust me, please go ahead to test and it doesn't affect it...

Thanks,
diff mbox

Patch

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 181ed26..17d7437 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -714,12 +714,12 @@  int bus_add_driver(struct device_driver *drv)
 	if (error)
 		goto out_unregister;

+	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
 	if (drv->bus->p->drivers_autoprobe) {
 		error = driver_attach(drv);
 		if (error)
 			goto out_unregister;
 	}
-	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
 	module_add_driver(drv->owner, drv);

 	error = driver_create_file(drv, &driver_attr_uevent);