mbox series

[0/6] usbip fixes to crashes found by syzbot

Message ID cover.1615171203.git.skhan@linuxfoundation.org (mailing list archive)
Headers show
Series usbip fixes to crashes found by syzbot | expand

Message

Shuah Khan March 8, 2021, 3:53 a.m. UTC
This patch series fixes the following problems founds in syzbot
fuzzing.

1. The first 3 patches fix usbip-host, vhci_hcd, vudc sub-drivers to
   validate the passed in file descriptor is a stream socket. If the
   file descriptor passed was a SOCK_DGRAM socket, sock_recvmsg()
   can't detect end of stream. Reported and fix suggested by Tetsuo Handa
2. All 3 sub-drivers use a common kthread_get_run() to create and 
   start threads. There are races in updating the local and shared status
   in the current stub-up (usbip-host, vudc) and attach (vhci) sequences
   resulting in crashes. These stem from starting rx and tx threads before
   local and shared state is updated correctly to be in sync.
    
    1. Doesn't handle kthread_create() error and saves invalid ptr in local
       state that drives rx and tx threads. Reported and fix suggested by
       Tetsuo Handa.
    2. Updates tcp_socket and sockfd,  starts stub_rx and stub_tx threads
       before updating usbip_device status to correct state. This opens up
       a race condition between the threads and tear down sequences.

TODO: Once these fixes are in, kthread_get_run() macro can be removed
      in a cleanup patch.

Credit goes to syzbot and Tetsuo Handa for finding and root-causing the
kthread_get_run() improper error handling problem and others. This is a
hard problem to find and debug since the races aren't seen in a normal
case. Fuzzing forces the race window to be small enough for the
kthread_get_run() error path bug and starting threads before updating the
local and shared state bug in the stub-up sequence.

Shuah Khan (6):
  usbip: fix stub_dev to check for stream socket
  usbip: fix vhci_hcd to check for stream socket
  usbip: fix vudc to check for stream socket
  usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
  usbip: fix vhci_hcd attach_store() races leading to gpf
  usbip: fix vudc usbip_sockfd_store races leading to gpf

 drivers/usb/usbip/stub_dev.c   | 42 ++++++++++++++++++++++++-----
 drivers/usb/usbip/vhci_sysfs.c | 39 +++++++++++++++++++++++----
 drivers/usb/usbip/vudc_sysfs.c | 49 +++++++++++++++++++++++++++++-----
 3 files changed, 111 insertions(+), 19 deletions(-)

Comments

Greg Kroah-Hartman March 10, 2021, 6:33 p.m. UTC | #1
On Sun, Mar 07, 2021 at 08:53:25PM -0700, Shuah Khan wrote:
> This patch series fixes the following problems founds in syzbot
> fuzzing.

Thanks for these, all now queued up.

greg k-h
Tetsuo Handa March 11, 2021, 12:34 p.m. UTC | #2
On 2021/03/11 3:33, Greg KH wrote:
> On Sun, Mar 07, 2021 at 08:53:25PM -0700, Shuah Khan wrote:
>> This patch series fixes the following problems founds in syzbot
>> fuzzing.
> 
> Thanks for these, all now queued up.

I send SIGSTOP to

  [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
  [PATCH 5/6] usbip: fix vhci_hcd attach_store() races leading to gpf
  [PATCH 6/6] usbip: fix vudc usbip_sockfd_store races leading to gpf

because these patches merely converted NULL pointer dererefence bug to use-after-free bug
by breaking kthread_get_run() into kthread_create()/get_task_struct()/wake_up_process().
Greg Kroah-Hartman March 11, 2021, 12:57 p.m. UTC | #3
On Thu, Mar 11, 2021 at 09:34:38PM +0900, Tetsuo Handa wrote:
> On 2021/03/11 3:33, Greg KH wrote:
> > On Sun, Mar 07, 2021 at 08:53:25PM -0700, Shuah Khan wrote:
> >> This patch series fixes the following problems founds in syzbot
> >> fuzzing.
> > 
> > Thanks for these, all now queued up.
> 
> I send SIGSTOP to
> 
>   [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
>   [PATCH 5/6] usbip: fix vhci_hcd attach_store() races leading to gpf
>   [PATCH 6/6] usbip: fix vudc usbip_sockfd_store races leading to gpf
> 
> because these patches merely converted NULL pointer dererefence bug to use-after-free bug
> by breaking kthread_get_run() into kthread_create()/get_task_struct()/wake_up_process().

I'll take follow-on patches to fix that other issue, if it's proven to
be valid.  It's nice to fix up NULL dereference issues as soon as
possible :)

thanks,

greg k-h
Tetsuo Handa March 11, 2021, 1:24 p.m. UTC | #4
On 2021/03/11 21:57, Greg KH wrote:
> On Thu, Mar 11, 2021 at 09:34:38PM +0900, Tetsuo Handa wrote:
>> On 2021/03/11 3:33, Greg KH wrote:
>>> On Sun, Mar 07, 2021 at 08:53:25PM -0700, Shuah Khan wrote:
>>>> This patch series fixes the following problems founds in syzbot
>>>> fuzzing.
>>>
>>> Thanks for these, all now queued up.
>>
>> I send SIGSTOP to
>>
>>   [PATCH 4/6] usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
>>   [PATCH 5/6] usbip: fix vhci_hcd attach_store() races leading to gpf
>>   [PATCH 6/6] usbip: fix vudc usbip_sockfd_store races leading to gpf
>>
>> because these patches merely converted NULL pointer dererefence bug to use-after-free bug
>> by breaking kthread_get_run() into kthread_create()/get_task_struct()/wake_up_process().
> 
> I'll take follow-on patches to fix that other issue, if it's proven to
> be valid.  It's nice to fix up NULL dereference issues as soon as
> possible :)

Not an "other issue". Shuah's [PATCH 4,5,6/6] is failing to fix NULL pointer dereference issue.
These patches simply replaces NULL pointer dereference issue (caused by preemption) with
use after free issue (caused by exactly same preemption) issue. Shuah has to understand
the consequence of calling wake_up_process() on rx thread in order to fix this NULL pointer
dereference issue.

The only fix we can safely apply now is
https://lkml.kernel.org/r/20210205135707.4574-1-penguin-kernel@I-love.SAKURA.ne.jp .
Since I and Shuah agreed that we will remove kthread_get_run(), it is nice to fix up
frequently happening -EINTR pointer dereference issue as soon as possible.
Tetsuo Handa March 12, 2021, 5:44 a.m. UTC | #5
I cloned git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux.git as you are testing changes there.

> commit 09e4522c87ff54c655c09f318a68b012eda8eb01 (HEAD -> usbip_test, origin/usbip_test)
> Author: Shuah Khan <skhan@linuxfoundation.org>
> Date:   Thu Mar 11 11:18:25 2021 -0700
>
>    usbip: fix vhci races in connection tear down
>
>    - Change vhci_device_reset() to reset tcp_socket and sockfd.
>      if !in_disconnect

How it can happen? vhci_device_reset() can be called only after vhci_shutdown_connection()
completed, and vhci_shutdown_connection() from subsequent requests cannot be called until
vhci_device_reset() completes. I consider it as a dead code which should be removed by
my "[PATCH v4 05/12] usb: usbip: don't reset tcp_socket at vhci_device_reset()".

And what you are missing in your [PATCH 4,5,6/6] is

  diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
  index c4457026d5ad..3c64bd06ab53 100644
  --- a/drivers/usb/usbip/vhci_sysfs.c
  +++ b/drivers/usb/usbip/vhci_sysfs.c
  @@ -423,6 +423,7 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
          /* end the lock */
  
          wake_up_process(vdev->ud.tcp_rx);
  +       schedule_timeout_uninterruptible(HZ); // Consider being preempted here.
          wake_up_process(vdev->ud.tcp_tx);
  
          rh_port_connect(vdev, speed);

. wake_up_process(tcp_tx) can call vhci_shutdown_connection() before wake_up_process(tcp_tx) is called.
Since vhci_shutdown_connection() destroys tcp_tx thread and releases tcp_tx memory via kthread_stop_put(tcp_tx),
wake_up_process(tcp_tx) will access already freed memory. Your patch converted "NULL pointer dereference caused by
failing to call kthread_stop_put(tcp_tx)" into "use after free caused by succeeding to call kthread_stop_put(tcp_tx)".
Tetsuo Handa March 13, 2021, 12:48 a.m. UTC | #6
On 2021/03/12 14:44, Tetsuo Handa wrote:
> And what you are missing in your [PATCH 4,5,6/6] is
> 
>   diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
>   index c4457026d5ad..3c64bd06ab53 100644
>   --- a/drivers/usb/usbip/vhci_sysfs.c
>   +++ b/drivers/usb/usbip/vhci_sysfs.c
>   @@ -423,6 +423,7 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
>           /* end the lock */
>   
>           wake_up_process(vdev->ud.tcp_rx);
>   +       schedule_timeout_uninterruptible(HZ); // Consider being preempted here.
>           wake_up_process(vdev->ud.tcp_tx);
>   
>           rh_port_connect(vdev, speed);
> 
> . wake_up_process(tcp_tx) can call vhci_shutdown_connection() before wake_up_process(tcp_tx) is called.

wake_up_process(tcp_rx) can call vhci_shutdown_connection() before wake_up_process(tcp_tx) is called.

> Since vhci_shutdown_connection() destroys tcp_tx thread and releases tcp_tx memory via kthread_stop_put(tcp_tx),
> wake_up_process(tcp_tx) will access already freed memory. Your patch converted "NULL pointer dereference caused by
> failing to call kthread_stop_put(tcp_tx)" into "use after free caused by succeeding to call kthread_stop_put(tcp_tx)".
> 

And note that

  diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
  index c4457026d5ad..0e1a81d4632c 100644
  --- a/drivers/usb/usbip/vhci_sysfs.c
  +++ b/drivers/usb/usbip/vhci_sysfs.c
  @@ -422,11 +422,11 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
          spin_unlock_irqrestore(&vhci->lock, flags);
          /* end the lock */
  
  -       wake_up_process(vdev->ud.tcp_rx);
  -       wake_up_process(vdev->ud.tcp_tx);
  -
          rh_port_connect(vdev, speed);
  
  +       wake_up_process(vdev->ud.tcp_tx);
  +       wake_up_process(vdev->ud.tcp_rx);
  +
          return count;
   }
   static DEVICE_ATTR_WO(attach);

is as well not sufficient, for you are still missing

  diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
  index c4457026d5ad..c958f89a9196 100644
  --- a/drivers/usb/usbip/vhci_sysfs.c
  +++ b/drivers/usb/usbip/vhci_sysfs.c
  @@ -422,11 +422,13 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
          spin_unlock_irqrestore(&vhci->lock, flags);
          /* end the lock */
  
  -       wake_up_process(vdev->ud.tcp_rx);
  -       wake_up_process(vdev->ud.tcp_tx);
  +       schedule_timeout_uninterruptible(HZ); // Consider being preempted here.
  
          rh_port_connect(vdev, speed);
  
  +       wake_up_process(vdev->ud.tcp_tx);
  +       wake_up_process(vdev->ud.tcp_rx);
  +
          return count;
   }
   static DEVICE_ATTR_WO(attach);

because vhci_port_disconnect() from detach_store() can call usbip_event_add(&vdev->ud, VDEV_EVENT_DOWN)
(same use after free bug regarding tcp_tx and tcp_rx) as soon as all shared states are set up and
spinlocks are released.

What you had better consider first is how to protect event_handler()/usbip_sockfd_store()/attach_store()/detach_store() functions
 from concurrent calls. Please respond to https://lkml.kernel.org/r/3dab66dc-2981-bc88-a370-4b3178dfd390@i-love.sakura.ne.jp
before you try to make further changes.
Tetsuo Handa March 14, 2021, 11:38 a.m. UTC | #7
On 2021/03/13 9:48, Tetsuo Handa wrote:
> On 2021/03/12 14:44, Tetsuo Handa wrote:
>> And what you are missing in your [PATCH 4,5,6/6] is
>>
>>   diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
>>   index c4457026d5ad..3c64bd06ab53 100644
>>   --- a/drivers/usb/usbip/vhci_sysfs.c
>>   +++ b/drivers/usb/usbip/vhci_sysfs.c
>>   @@ -423,6 +423,7 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
>>           /* end the lock */
>>   
>>           wake_up_process(vdev->ud.tcp_rx);
>>   +       schedule_timeout_uninterruptible(HZ); // Consider being preempted here.
>>           wake_up_process(vdev->ud.tcp_tx);
>>   
>>           rh_port_connect(vdev, speed);
>>
>> . wake_up_process(tcp_tx) can call vhci_shutdown_connection() before wake_up_process(tcp_tx) is called.
> 
> wake_up_process(tcp_rx) can call vhci_shutdown_connection() before wake_up_process(tcp_tx) is called.
> 
>> Since vhci_shutdown_connection() destroys tcp_tx thread and releases tcp_tx memory via kthread_stop_put(tcp_tx),
>> wake_up_process(tcp_tx) will access already freed memory. Your patch converted "NULL pointer dereference caused by
>> failing to call kthread_stop_put(tcp_tx)" into "use after free caused by succeeding to call kthread_stop_put(tcp_tx)".
>>
> 
> And note that
> 
>   diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
>   index c4457026d5ad..0e1a81d4632c 100644
>   --- a/drivers/usb/usbip/vhci_sysfs.c
>   +++ b/drivers/usb/usbip/vhci_sysfs.c
>   @@ -422,11 +422,11 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
>           spin_unlock_irqrestore(&vhci->lock, flags);
>           /* end the lock */
>   
>   -       wake_up_process(vdev->ud.tcp_rx);
>   -       wake_up_process(vdev->ud.tcp_tx);
>   -
>           rh_port_connect(vdev, speed);
>   
>   +       wake_up_process(vdev->ud.tcp_tx);
>   +       wake_up_process(vdev->ud.tcp_rx);
>   +
>           return count;
>    }
>    static DEVICE_ATTR_WO(attach);
> 
> is as well not sufficient, for you are still missing

Well, since tx thread can as well call usbip_event_add(USBIP_EH_SHUTDOWN), reversing
the order of wake_up_process(tcp_tx) and wake_up_process(tcp_rx) will not help.

> 
>   diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
>   index c4457026d5ad..c958f89a9196 100644
>   --- a/drivers/usb/usbip/vhci_sysfs.c
>   +++ b/drivers/usb/usbip/vhci_sysfs.c
>   @@ -422,11 +422,13 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
>           spin_unlock_irqrestore(&vhci->lock, flags);
>           /* end the lock */
>   
>   -       wake_up_process(vdev->ud.tcp_rx);
>   -       wake_up_process(vdev->ud.tcp_tx);
>   +       schedule_timeout_uninterruptible(HZ); // Consider being preempted here.
>   
>           rh_port_connect(vdev, speed);
>   
>   +       wake_up_process(vdev->ud.tcp_tx);
>   +       wake_up_process(vdev->ud.tcp_rx);
>   +
>           return count;
>    }
>    static DEVICE_ATTR_WO(attach);
> 
> because vhci_port_disconnect() from detach_store() can call usbip_event_add(&vdev->ud, VDEV_EVENT_DOWN)
> (same use after free bug regarding tcp_tx and tcp_rx) as soon as all shared states are set up and
> spinlocks are released.
> 
> What you had better consider first is how to protect event_handler()/usbip_sockfd_store()/attach_store()/detach_store() functions
>  from concurrent calls. Please respond to https://lkml.kernel.org/r/3dab66dc-2981-bc88-a370-4b3178dfd390@i-love.sakura.ne.jp
> before you try to make further changes.
> 

After all, I believe that there is no choice but introduce a mutex for serialization.

Greg, please pick up https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux.git/commit/?h=usbip_test&id=f345de0d2e51a20a2a1c30fc22fa1527670d2095
and below patch.

From e0579aa776e4a3568c06f767c193d2204b64679d Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sun, 14 Mar 2021 20:24:16 +0900
Subject: [PATCH v5] usb: usbip: serialize attach/detach operations

The root problem syzbot has found [1] is that usbip module is not using
serialization between attach/detach operations and event_handler().
This results in the following race windows.

  (1) two userspace processes can perform attach operation on the same
      device by writing the same content to the same attach interface
      file

  (2) one userspace process can perform detach operation on a device by
      writing to detach interface file while the other userspace process
      is performing attach operation on that device by writing to attach
      interface file

  (3) event_handler() kernel workqueue thread can perform detach operation
      on a device while some userspace process is still performing attach
      operation on that device

What syzbot is reporting is (3), and what commits 46613c9dfa964c0c,
718ad9693e365612 and 9380afd6df70e24e did not take into account is

  As soon as one side of {tx,rx} kernel threads starts from attach
  operation, that kernel thread is allowed to call
  usbip_event_add(USBIP_EH_SHUTDOWN) which in turn allows event_handler()
  to call kthread_stop_put() on both {tx,rx} kernel threads via
  ud->eh_ops.shutdown(ud), before the other side of {tx,rx} kernel threads
  starts from attach operation.

which will be reported as either NULL pointer dereference or use-after-free
bug on the other side of {tx,rx} kernel threads.

Since this race window cannot be closed without introducing serialization,
this patch introduces usbip_event_mutex for serializing attach/detach
operations and event_handler().

[1] https://syzkaller.appspot.com/bug?extid=95ce4b142579611ef0a9

Reported-by: syzbot <syzbot+95ce4b142579611ef0a9@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 46613c9dfa964c0c ("usbip: fix vudc usbip_sockfd_store races leading to gpf")
Fixes: 718ad9693e365612 ("usbip: fix vhci_hcd attach_store() races leading to gpf")
Fixes: 9380afd6df70e24e ("usbip: fix stub_dev usbip_sockfd_store() races leading to gpf")
---
 drivers/usb/usbip/stub_dev.c     | 15 +++++++++++++--
 drivers/usb/usbip/usbip_common.h |  2 ++
 drivers/usb/usbip/usbip_event.c  | 15 +++++++++++++++
 drivers/usb/usbip/vhci_sysfs.c   | 30 ++++++++++++++++++++++++++----
 drivers/usb/usbip/vudc_sysfs.c   | 16 +++++++++++++---
 5 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/drivers/usb/usbip/stub_dev.c b/drivers/usb/usbip/stub_dev.c
index 8f1de1fbbeed..79ebc9795b4a 100644
--- a/drivers/usb/usbip/stub_dev.c
+++ b/drivers/usb/usbip/stub_dev.c
@@ -39,8 +39,8 @@ static DEVICE_ATTR_RO(usbip_status);
  * is used to transfer usbip requests by kernel threads. -1 is a magic number
  * by which usbip connection is finished.
  */
-static ssize_t usbip_sockfd_store(struct device *dev, struct device_attribute *attr,
-			    const char *buf, size_t count)
+static ssize_t __usbip_sockfd_store(struct device *dev, struct device_attribute *attr,
+				    const char *buf, size_t count)
 {
 	struct stub_device *sdev = dev_get_drvdata(dev);
 	int sockfd = 0;
@@ -132,6 +132,17 @@ static ssize_t usbip_sockfd_store(struct device *dev, struct device_attribute *a
 	spin_unlock_irq(&sdev->ud.lock);
 	return -EINVAL;
 }
+static ssize_t usbip_sockfd_store(struct device *dev, struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	ssize_t ret = usbip_event_lock_killable();
+
+	if (ret)
+		return ret;
+	ret = __usbip_sockfd_store(dev, attr, buf, count);
+	usbip_event_unlock();
+	return ret;
+}
 static DEVICE_ATTR_WO(usbip_sockfd);
 
 static struct attribute *usbip_attrs[] = {
diff --git a/drivers/usb/usbip/usbip_common.h b/drivers/usb/usbip/usbip_common.h
index d60ce17d3dd2..ad7de3773e06 100644
--- a/drivers/usb/usbip/usbip_common.h
+++ b/drivers/usb/usbip/usbip_common.h
@@ -326,6 +326,8 @@ void usbip_stop_eh(struct usbip_device *ud);
 void usbip_event_add(struct usbip_device *ud, unsigned long event);
 int usbip_event_happened(struct usbip_device *ud);
 int usbip_in_eh(struct task_struct *task);
+int usbip_event_lock_killable(void);
+void usbip_event_unlock(void);
 
 static inline int interface_to_busnum(struct usb_interface *interface)
 {
diff --git a/drivers/usb/usbip/usbip_event.c b/drivers/usb/usbip/usbip_event.c
index 5d88917c9631..e05b858f346d 100644
--- a/drivers/usb/usbip/usbip_event.c
+++ b/drivers/usb/usbip/usbip_event.c
@@ -58,6 +58,19 @@ static struct usbip_device *get_event(void)
 }
 
 static struct task_struct *worker_context;
+static DEFINE_MUTEX(usbip_event_mutex);
+
+int usbip_event_lock_killable(void)
+{
+	return mutex_lock_killable(&usbip_event_mutex);
+}
+EXPORT_SYMBOL_GPL(usbip_event_lock_killable);
+
+void usbip_event_unlock(void)
+{
+	mutex_unlock(&usbip_event_mutex);
+}
+EXPORT_SYMBOL_GPL(usbip_event_unlock);
 
 static void event_handler(struct work_struct *work)
 {
@@ -68,6 +81,7 @@ static void event_handler(struct work_struct *work)
 	}
 
 	while ((ud = get_event()) != NULL) {
+		mutex_lock(&usbip_event_mutex);
 		usbip_dbg_eh("pending event %lx\n", ud->event);
 
 		/*
@@ -91,6 +105,7 @@ static void event_handler(struct work_struct *work)
 			unset_event(ud, USBIP_EH_UNUSABLE);
 		}
 
+		mutex_unlock(&usbip_event_mutex);
 		wake_up(&ud->eh_waitq);
 	}
 }
diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
index c4b4256e5dad..d06087e4e29b 100644
--- a/drivers/usb/usbip/vhci_sysfs.c
+++ b/drivers/usb/usbip/vhci_sysfs.c
@@ -225,8 +225,8 @@ static int valid_port(__u32 *pdev_nr, __u32 *rhport)
 	return 1;
 }
 
-static ssize_t detach_store(struct device *dev, struct device_attribute *attr,
-			    const char *buf, size_t count)
+static ssize_t __detach_store(struct device *dev, struct device_attribute *attr,
+			      const char *buf, size_t count)
 {
 	__u32 port = 0, pdev_nr = 0, rhport = 0;
 	struct usb_hcd *hcd;
@@ -263,6 +263,17 @@ static ssize_t detach_store(struct device *dev, struct device_attribute *attr,
 
 	return count;
 }
+static ssize_t detach_store(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t count)
+{
+	ssize_t ret = usbip_event_lock_killable();
+
+	if (ret)
+		return ret;
+	ret = __detach_store(dev, attr, buf, count);
+	usbip_event_unlock();
+	return ret;
+}
 static DEVICE_ATTR_WO(detach);
 
 static int valid_args(__u32 *pdev_nr, __u32 *rhport,
@@ -300,8 +311,8 @@ static int valid_args(__u32 *pdev_nr, __u32 *rhport,
  *
  * write() returns 0 on success, else negative errno.
  */
-static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
-			    const char *buf, size_t count)
+static ssize_t __attach_store(struct device *dev, struct device_attribute *attr,
+			      const char *buf, size_t count)
 {
 	struct socket *socket;
 	int sockfd = 0;
@@ -425,6 +436,17 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
 
 	return count;
 }
+static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t count)
+{
+	ssize_t ret = usbip_event_lock_killable();
+
+	if (ret)
+		return ret;
+	ret = __attach_store(dev, attr, buf, count);
+	usbip_event_unlock();
+	return ret;
+}
 static DEVICE_ATTR_WO(attach);
 
 #define MAX_STATUS_NAME 16
diff --git a/drivers/usb/usbip/vudc_sysfs.c b/drivers/usb/usbip/vudc_sysfs.c
index a3ec39fc6177..3e3e4ef298d2 100644
--- a/drivers/usb/usbip/vudc_sysfs.c
+++ b/drivers/usb/usbip/vudc_sysfs.c
@@ -90,9 +90,8 @@ static ssize_t dev_desc_read(struct file *file, struct kobject *kobj,
 }
 static BIN_ATTR_RO(dev_desc, sizeof(struct usb_device_descriptor));
 
-static ssize_t usbip_sockfd_store(struct device *dev,
-				  struct device_attribute *attr,
-				  const char *in, size_t count)
+static ssize_t __usbip_sockfd_store(struct device *dev, struct device_attribute *attr,
+				    const char *in, size_t count)
 {
 	struct vudc *udc = (struct vudc *) dev_get_drvdata(dev);
 	int rv;
@@ -219,6 +218,17 @@ static ssize_t usbip_sockfd_store(struct device *dev,
 
 	return ret;
 }
+static ssize_t usbip_sockfd_store(struct device *dev, struct device_attribute *attr,
+				  const char *in, size_t count)
+{
+	ssize_t ret = usbip_event_lock_killable();
+
+	if (ret)
+		return ret;
+	ret = __usbip_sockfd_store(dev, attr, in, count);
+	usbip_event_unlock();
+	return ret;
+}
 static DEVICE_ATTR_WO(usbip_sockfd);
 
 static ssize_t usbip_status_show(struct device *dev,
Tetsuo Handa March 17, 2021, 6:21 a.m. UTC | #8
Shuah, this driver is getting more and more cryptic and buggy.
Please explain the strategy for serialization before you write patches.

> - Fix attach_store() to check usbip_event_happened() before
>   waking up threads.

No, this helps nothing.

> diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
> index c4b4256e5dad3..f0a770adebd97 100644
> --- a/drivers/usb/usbip/vhci_sysfs.c
> +++ b/drivers/usb/usbip/vhci_sysfs.c
> @@ -418,6 +418,15 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
>  	spin_unlock_irqrestore(&vhci->lock, flags);
>  	/* end the lock */
>  
> +	if (usbip_event_happened(&vdev->ud)) {
> +		/*
> +		 * something went wrong - event handler shutting
> +		 * the connection and doing reset - bail out
> +		 */
> +		dev_err(dev, "Event happended - handler is active\n");
> +		return -EAGAIN;
> +	}
> +

detach_store() can queue shutdown event as soon as reaching "/* end the lock */" line
but attach_store() might be preempted immediately after verifying that
usbip_event_happened() was false (i.e. at this location) in order to wait for
shutdown event posted by detach_store() to be processed.

>  	wake_up_process(vdev->ud.tcp_rx);
>  	wake_up_process(vdev->ud.tcp_tx);
>
Shuah Khan March 17, 2021, 3:06 p.m. UTC | #9
On 3/17/21 12:21 AM, Tetsuo Handa wrote:
> Shuah, this driver is getting more and more cryptic and buggy.
> Please explain the strategy for serialization before you write patches.
> 
>> - Fix attach_store() to check usbip_event_happened() before
>>    waking up threads.
> 
> No, this helps nothing.
> 
>> diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
>> index c4b4256e5dad3..f0a770adebd97 100644
>> --- a/drivers/usb/usbip/vhci_sysfs.c
>> +++ b/drivers/usb/usbip/vhci_sysfs.c
>> @@ -418,6 +418,15 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr,
>>   	spin_unlock_irqrestore(&vhci->lock, flags);
>>   	/* end the lock */
>>   
>> +	if (usbip_event_happened(&vdev->ud)) {
>> +		/*
>> +		 * something went wrong - event handler shutting
>> +		 * the connection and doing reset - bail out
>> +		 */
>> +		dev_err(dev, "Event happended - handler is active\n");
>> +		return -EAGAIN;
>> +	}
>> +
> 
> detach_store() can queue shutdown event as soon as reaching "/* end the lock */" line
> but attach_store() might be preempted immediately after verifying that
> usbip_event_happened() was false (i.e. at this location) in order to wait for
> shutdown event posted by detach_store() to be processed.
> 

Yes. I haven't sent the patch for that reason. I am trying to test a
solution. I haven't come up with a solution yet.

Holding event_lock isn't the right solution. I am not going to accept
that. This is a window that gets triggered by syzbot injecting errors
in a sequence. Fixing this should be done taking other moving parts of
the driver into account.

thanks,
-- Shuah
Tetsuo Handa March 17, 2021, 3:38 p.m. UTC | #10
On 2021/03/18 0:06, Shuah Khan wrote:
> Yes. I haven't sent the patch for that reason. I am trying to test a
> solution. I haven't come up with a solution yet.
> 
> Holding event_lock isn't the right solution. I am not going to accept
> that. This is a window that gets triggered by syzbot injecting errors
> in a sequence. Fixing this should be done taking other moving parts of
> the driver into account.

What is event_lock ? I questioned you what the event_lock is at
https://lkml.kernel.org/r/3dab66dc-2981-bc88-a370-4b3178dfd390@i-love.sakura.ne.jp ,
but you haven't responded to that post.

Also, you need to send https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux.git/commit/?h=usbip_test&id=f345de0d2e51a20a2a1c30fc22fa1527670d2095
because Greg already sent this regression into upstream and stable kernels.
Shuah Khan March 17, 2021, 5:09 p.m. UTC | #11
On 3/17/21 9:38 AM, Tetsuo Handa wrote:
> On 2021/03/18 0:06, Shuah Khan wrote:
>> Yes. I haven't sent the patch for that reason. I am trying to test a
>> solution. I haven't come up with a solution yet.
>>
>> Holding event_lock isn't the right solution. I am not going to accept
>> that. This is a window that gets triggered by syzbot injecting errors
>> in a sequence. Fixing this should be done taking other moving parts of
>> the driver into account.
> 
> What is event_lock ? I questioned you what the event_lock is at
> https://lkml.kernel.org/r/3dab66dc-2981-bc88-a370-4b3178dfd390@i-love.sakura.ne.jp ,
> but you haven't responded to that post.
> 
> Also, you need to send https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux.git/commit/?h=usbip_test&id=f345de0d2e51a20a2a1c30fc22fa1527670d2095
> because Greg already sent this regression into upstream and stable kernels.
> 

I acked it when it came in and it will be picked up.

thanks,
-- Shuah
Shuah Khan March 18, 2021, 1:13 p.m. UTC | #12
On 3/17/21 9:06 AM, Shuah Khan wrote:
> On 3/17/21 12:21 AM, Tetsuo Handa wrote:
>> Shuah, this driver is getting more and more cryptic and buggy.
>> Please explain the strategy for serialization before you write patches.
>>
>>> - Fix attach_store() to check usbip_event_happened() before
>>>    waking up threads.
>>
>> No, this helps nothing.
>>
>>> diff --git a/drivers/usb/usbip/vhci_sysfs.c 
>>> b/drivers/usb/usbip/vhci_sysfs.c
>>> index c4b4256e5dad3..f0a770adebd97 100644
>>> --- a/drivers/usb/usbip/vhci_sysfs.c
>>> +++ b/drivers/usb/usbip/vhci_sysfs.c
>>> @@ -418,6 +418,15 @@ static ssize_t attach_store(struct device *dev, 
>>> struct device_attribute *attr,
>>>       spin_unlock_irqrestore(&vhci->lock, flags);
>>>       /* end the lock */
>>> +    if (usbip_event_happened(&vdev->ud)) {
>>> +        /*
>>> +         * something went wrong - event handler shutting
>>> +         * the connection and doing reset - bail out
>>> +         */
>>> +        dev_err(dev, "Event happended - handler is active\n");
>>> +        return -EAGAIN;
>>> +    }
>>> +
>>
>> detach_store() can queue shutdown event as soon as reaching "/* end 
>> the lock */" line
>> but attach_store() might be preempted immediately after verifying that
>> usbip_event_happened() was false (i.e. at this location) in order to 
>> wait for
>> shutdown event posted by detach_store() to be processed.
>>
> 

Please don't review code that isn't sent upstream. This repo you are
looking at is a private branch created just to verify fixes on syzbot.

I understand the race window you are talking about. I have my way of
working to resolve it.

thanks,
-- Shuah
Tetsuo Handa March 18, 2021, 1:39 p.m. UTC | #13
On 2021/03/18 22:13, Shuah Khan wrote:
> Please don't review code that isn't sent upstream. This repo you are
> looking at is a private branch created just to verify fixes on syzbot.

But nobody was able to review this series when sent to ML (except you simply
ignored my questions), and this series was merged to upstream and stable kernels
despite there was an obvious assignment error which results in NULL pointer
dereference.

> 
> I understand the race window you are talking about. I have my way of
> working to resolve it.

Without an effort to share your understanding/idea and my understanding/idea,
nobody can test/review what you call a solution.