
ses: Fix racy cleanup of /sys in remove_dev()

Message ID: 4912ec551a8ec01181cc3e7ad1e01d3d36758810.1463170976.git.calvinowens@fb.com
State: Accepted, archived

Commit Message

Calvin Owens May 13, 2016, 8:28 p.m. UTC
Currently we free the resources backing the enclosure device before we
call device_unregister(). This is racy: during rmmod of low-level SCSI
drivers that hook into enclosure, we end up with a small window of time
during which writing to /sys can OOPS. Example trace with mpt3sas:

  general protection fault: 0000 [#1] SMP KASAN
  Modules linked in: mpt3sas(-) <...>
  RIP: [<ffffffffa0388a98>] ses_get_page2_descriptor.isra.6+0x38/0x220 [ses]
  Call Trace:
   [<ffffffffa0389d14>] ses_set_fault+0xf4/0x400 [ses]
   [<ffffffffa0361069>] set_component_fault+0xa9/0xf0 [enclosure]
   [<ffffffff8205bffc>] dev_attr_store+0x3c/0x70
   [<ffffffff81677df5>] sysfs_kf_write+0x115/0x180
   [<ffffffff81675725>] kernfs_fop_write+0x275/0x3a0
   [<ffffffff8151f810>] __vfs_write+0xe0/0x3e0
   [<ffffffff8152281f>] vfs_write+0x13f/0x4a0
   [<ffffffff81526731>] SyS_write+0x111/0x230
   [<ffffffff828b401b>] entry_SYSCALL_64_fastpath+0x13/0x94

Fortunately the solution is extremely simple: call device_unregister()
before we free the resources, and the race no longer exists. The driver
core holds a reference over ->remove_dev(), so AFAICT this is safe.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
---
 drivers/scsi/ses.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Calvin Owens June 2, 2016, 10:50 p.m. UTC | #1
On 05/13/2016 01:28 PM, Calvin Owens wrote:
> Currently we free the resources backing the enclosure device before we
> call device_unregister(). This is racy: during rmmod of low-level SCSI
> drivers that hook into enclosure, we end up with a small window of time
> during which writing to /sys can OOPS. Example trace with mpt3sas:

Ping?

>    general protection fault: 0000 [#1] SMP KASAN
>    Modules linked in: mpt3sas(-) <...>
>    RIP: [<ffffffffa0388a98>] ses_get_page2_descriptor.isra.6+0x38/0x220 [ses]
>    Call Trace:
>     [<ffffffffa0389d14>] ses_set_fault+0xf4/0x400 [ses]
>     [<ffffffffa0361069>] set_component_fault+0xa9/0xf0 [enclosure]
>     [<ffffffff8205bffc>] dev_attr_store+0x3c/0x70
>     [<ffffffff81677df5>] sysfs_kf_write+0x115/0x180
>     [<ffffffff81675725>] kernfs_fop_write+0x275/0x3a0
>     [<ffffffff8151f810>] __vfs_write+0xe0/0x3e0
>     [<ffffffff8152281f>] vfs_write+0x13f/0x4a0
>     [<ffffffff81526731>] SyS_write+0x111/0x230
>     [<ffffffff828b401b>] entry_SYSCALL_64_fastpath+0x13/0x94
>
> Fortunately the solution is extremely simple: call device_unregister()
> before we free the resources, and the race no longer exists. The driver
> core holds a reference over ->remove_dev(), so AFAICT this is safe.
>
> Signed-off-by: Calvin Owens <calvinowens@fb.com>
> ---
>   drivers/scsi/ses.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
> index 53ef1cb..0e8601a 100644
> --- a/drivers/scsi/ses.c
> +++ b/drivers/scsi/ses.c
> @@ -778,6 +778,8 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev)
>   	if (!edev)
>   		return;
>
> +	enclosure_unregister(edev);
> +
>   	ses_dev = edev->scratch;
>   	edev->scratch = NULL;
>
> @@ -789,7 +791,6 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev)
>   	kfree(edev->component[0].scratch);
>
>   	put_device(&edev->edev);
> -	enclosure_unregister(edev);
>   }
>
>   static void ses_intf_remove(struct device *cdev,
>

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Calvin Owens June 15, 2016, 8:24 p.m. UTC | #2
On Thursday 06/02 at 15:50 -0700, Calvin Owens wrote:
> On 05/13/2016 01:28 PM, Calvin Owens wrote:
> > Currently we free the resources backing the enclosure device before we
> > call device_unregister(). This is racy: during rmmod of low-level SCSI
> > drivers that hook into enclosure, we end up with a small window of time
> > during which writing to /sys can OOPS. Example trace with mpt3sas:
> 
> Ping?

Any thoughts? Squinting at this more it still seems racy, but a narrow race
is surely better than just blatantly freeing everything while the file is
still exposed in /sys? Is there a better way you'd prefer I accomplish this?

(I have boxes that OOPS all the time from monitoring code reading the /sys
files, with this patch I haven't seen a single one.)

Thanks,
Calvin

> >    general protection fault: 0000 [#1] SMP KASAN
> >    Modules linked in: mpt3sas(-) <...>
> >    RIP: [<ffffffffa0388a98>] ses_get_page2_descriptor.isra.6+0x38/0x220 [ses]
> >    Call Trace:
> >     [<ffffffffa0389d14>] ses_set_fault+0xf4/0x400 [ses]
> >     [<ffffffffa0361069>] set_component_fault+0xa9/0xf0 [enclosure]
> >     [<ffffffff8205bffc>] dev_attr_store+0x3c/0x70
> >     [<ffffffff81677df5>] sysfs_kf_write+0x115/0x180
> >     [<ffffffff81675725>] kernfs_fop_write+0x275/0x3a0
> >     [<ffffffff8151f810>] __vfs_write+0xe0/0x3e0
> >     [<ffffffff8152281f>] vfs_write+0x13f/0x4a0
> >     [<ffffffff81526731>] SyS_write+0x111/0x230
> >     [<ffffffff828b401b>] entry_SYSCALL_64_fastpath+0x13/0x94
> > 
> > Fortunately the solution is extremely simple: call device_unregister()
> > before we free the resources, and the race no longer exists. The driver
> > core holds a reference over ->remove_dev(), so AFAICT this is safe.
> > 
> > Signed-off-by: Calvin Owens <calvinowens@fb.com>
> > ---
> >   drivers/scsi/ses.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
> > index 53ef1cb..0e8601a 100644
> > --- a/drivers/scsi/ses.c
> > +++ b/drivers/scsi/ses.c
> > @@ -778,6 +778,8 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev)
> >   	if (!edev)
> >   		return;
> > 
> > +	enclosure_unregister(edev);
> > +
> >   	ses_dev = edev->scratch;
> >   	edev->scratch = NULL;
> > 
> > @@ -789,7 +791,6 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev)
> >   	kfree(edev->component[0].scratch);
> > 
> >   	put_device(&edev->edev);
> > -	enclosure_unregister(edev);
> >   }
> > 
> >   static void ses_intf_remove(struct device *cdev,
> > 
> 
Calvin Owens July 28, 2016, 1:04 a.m. UTC | #3
On 06/15/2016 01:24 PM, Calvin Owens wrote:
> On Thursday 06/02 at 15:50 -0700, Calvin Owens wrote:
>> On 05/13/2016 01:28 PM, Calvin Owens wrote:
>>> Currently we free the resources backing the enclosure device before we
>>> call device_unregister(). This is racy: during rmmod of low-level SCSI
>>> drivers that hook into enclosure, we end up with a small window of time
>>> during which writing to /sys can OOPS. Example trace with mpt3sas:
>>
>> Ping?
>
> Any thoughts? Squinting at this more it still seems racy, but a narrow race
> is surely better than just blatantly freeing everything while the file is
> still exposed in /sys? Is there a better way you'd prefer I accomplish this?
>
> (I have boxes that OOPS all the time from monitoring code reading the /sys
> files, with this patch I haven't seen a single one.)
>
> Thanks,
> Calvin

Ping? Thoughts, comments?

>>>    general protection fault: 0000 [#1] SMP KASAN
>>>    Modules linked in: mpt3sas(-) <...>
>>>    RIP: [<ffffffffa0388a98>] ses_get_page2_descriptor.isra.6+0x38/0x220 [ses]
>>>    Call Trace:
>>>     [<ffffffffa0389d14>] ses_set_fault+0xf4/0x400 [ses]
>>>     [<ffffffffa0361069>] set_component_fault+0xa9/0xf0 [enclosure]
>>>     [<ffffffff8205bffc>] dev_attr_store+0x3c/0x70
>>>     [<ffffffff81677df5>] sysfs_kf_write+0x115/0x180
>>>     [<ffffffff81675725>] kernfs_fop_write+0x275/0x3a0
>>>     [<ffffffff8151f810>] __vfs_write+0xe0/0x3e0
>>>     [<ffffffff8152281f>] vfs_write+0x13f/0x4a0
>>>     [<ffffffff81526731>] SyS_write+0x111/0x230
>>>     [<ffffffff828b401b>] entry_SYSCALL_64_fastpath+0x13/0x94
>>>
>>> Fortunately the solution is extremely simple: call device_unregister()
>>> before we free the resources, and the race no longer exists. The driver
>>> core holds a reference over ->remove_dev(), so AFAICT this is safe.
>>>
>>> Signed-off-by: Calvin Owens <calvinowens@fb.com>
>>> ---
>>>   drivers/scsi/ses.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
>>> index 53ef1cb..0e8601a 100644
>>> --- a/drivers/scsi/ses.c
>>> +++ b/drivers/scsi/ses.c
>>> @@ -778,6 +778,8 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev)
>>>   	if (!edev)
>>>   		return;
>>>
>>> +	enclosure_unregister(edev);
>>> +
>>>   	ses_dev = edev->scratch;
>>>   	edev->scratch = NULL;
>>>
>>> @@ -789,7 +791,6 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev)
>>>   	kfree(edev->component[0].scratch);
>>>
>>>   	put_device(&edev->edev);
>>> -	enclosure_unregister(edev);
>>>   }
>>>
>>>   static void ses_intf_remove(struct device *cdev,
>>>
>>

Martin K. Petersen July 29, 2016, 1:23 a.m. UTC | #4
>>>>> "Calvin" == Calvin Owens <calvinowens@fb.com> writes:

>> Any thoughts? Squinting at this more it still seems racy, but a
>> narrow race is surely better than just blatantly freeing everything
>> while the file is still exposed in /sys? Is there a better way you'd
>> prefer I accomplish this?
>> 
>> (I have boxes that OOPS all the time from monitoring code reading the
>> /sys files, with this patch I haven't seen a single one.)

Calvin> Ping? Thoughts, comments?

James: This is your puppy...
James Bottomley Aug. 12, 2016, 5:45 p.m. UTC | #5
On Thu, 2016-07-28 at 21:23 -0400, Martin K. Petersen wrote:
> > > > > > "Calvin" == Calvin Owens <calvinowens@fb.com> writes:
> 
> > > Any thoughts? Squinting at this more it still seems racy, but a
> > > narrow race is surely better than just blatantly freeing everything
> > > while the file is still exposed in /sys? Is there a better way you'd
> > > prefer I accomplish this?
> > > 
> > > (I have boxes that OOPS all the time from monitoring code reading the
> > > /sys files, with this patch I haven't seen a single one.)
> 
> Calvin> Ping? Thoughts, comments?
> 
> James: This is your puppy...

I thought it would be bigger by now going by the early paw size
indicator ...

Anyway

Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>

James


Patch

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index 53ef1cb..0e8601a 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -778,6 +778,8 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev)
 	if (!edev)
 		return;
 
+	enclosure_unregister(edev);
+
 	ses_dev = edev->scratch;
 	edev->scratch = NULL;
 
@@ -789,7 +791,6 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev)
 	kfree(edev->component[0].scratch);
 
 	put_device(&edev->edev);
-	enclosure_unregister(edev);
 }
 
 static void ses_intf_remove(struct device *cdev,