diff mbox series

USB: core: hub: fix usb_hub worker blocking drain_all_pages() worker issue

Message ID 20230203072819.3408-1-zhongjiezhu1@gmail.com (mailing list archive)
State Superseded
Headers show
Series USB: core: hub: fix usb_hub worker blocking drain_all_pages() worker issue | expand

Commit Message

Zhongjie Zhu Feb. 3, 2023, 7:28 a.m. UTC
From: Zhongjie Zhu <zhongjiezhu1@gmail.com>

When disconnecting a usb mass storege, if there are a lot of inodes
like 10 thousands files need to be freed, the invalidate_inodes() will
run for a loog time to freeing all inodes, this will block other worker
to run in the cpu, so mark the usb_hub workqueue to WQ_CPU_INTENSIVE to
avoid this situation.

Sometimes the flowing call stack will hanppen: the cma_alloc() will be
blocked at __flush_work() by the hub_event worker.

usb_hub worker:
Call trace:
[<ffffffc010335e84>] invalidate_inodes+0xc0/0x294
<ffffffc0103659cc>] __invalidate_device+0x44/0xbc
[<ffffffc01059efe4>] invalidate_partition+0x7c/0xac
[<ffffffc01059ed54>] del_gendisk+0x148/0x35c
[<ffffffc0107620e4>] sd_remove+0x58/0xc4
[<ffffffc01070cb3c>] device_release_driver_internal+0x184/0x248
[<ffffffc010709e40>] bus_remove_device+0xdc/0x104
[<ffffffc0107068dc>] device_del+0x2b0/0x534
[<ffffffc01075c1a8>] __scsi_remove_device+0xc8/0x14c
[<ffffffc01075b788>] scsi_forget_host+0x60/0x7c
[<ffffffc010750040>] scsi_remove_host+0x84/0x130
[<ffffffc010882d60>] usb_stor_disconnect+0x7c/0xd0
[<ffffffc010828430>] usb_unbind_interface+0xc0/0x25c
[<ffffffc01070cb3c>] device_release_driver_internal+0x184/0x248
[<ffffffc010709e40>] bus_remove_device+0xdc/0x104
[<ffffffc0107068dc>] device_del+0x2b0/0x534
[<ffffffc010825b38>] usb_disable_device+0x80/0x180
[<ffffffc010817ef0>] usb_disconnect+0x128/0x314
[<ffffffc01081c95c>] hub_event+0x99c/0x16c8
[<ffffffc01011ed94>] process_one_work+0x2e8/0x574
[<ffffffc01011f2cc>] worker_thread+0x2ac/0x5e8
[<ffffffc0101252b4>] kthread+0x14c/0x15c
[<ffffffc010083ff0>] ret_from_fork+0x10/0x18

cma_alloc():
Call trace
[<ffffffc010090c18>] __switch_to+0x244/0x264
[<ffffffc0112153e0>] __schedule+0x564/0x754
[<ffffffc011215660>] schedule+0x90/0xc4
[<ffffffc011219468>] schedule_timeout+0x4c/0x1d0
[<ffffffc011216814>] do_wait_for_common+0xf8/0x1b4
[<ffffffc0112163bc>] wait_for_completion+0x50/0x68
[<ffffffc010119d94>] __flush_work.llvm.11756653060903382828+0x270/0x324
[<ffffffc0102d51e4>] drain_all_pages+0x224/0x2a8
[<ffffffc0103031fc>] start_isolate_page_range+0x1e4/0x2dc
[<ffffffc010e25624>] aml_cma_alloc_range+0xfc/0x3ec
[<ffffffc0103072e0>] cma_alloc+0x1ac/0x6d8

Signed-off-by: Zhongjie Zhu <zhongjiezhu1@gmail.com>
---
 drivers/usb/core/hub.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Alan Stern Feb. 3, 2023, 2:47 p.m. UTC | #1
On Fri, Feb 03, 2023 at 03:28:19PM +0800, Zhu Zhongjie wrote:
> From: Zhongjie Zhu <zhongjiezhu1@gmail.com>
> 
> When disconnecting a usb mass storege, if there are a lot of inodes
> like 10 thousands files need to be freed, the invalidate_inodes() will
> run for a loog time to freeing all inodes, this will block other worker
> to run in the cpu, so mark the usb_hub workqueue to WQ_CPU_INTENSIVE to
> avoid this situation.

Very infrequently this will happen.  In the vast majority of cases, the 
usb_hub workqueue uses very little CPU time.  Marking it 
WQ_CPU_INTENSIVE seems inappropriate.

Alan Stern
Zhongjie Zhu Feb. 6, 2023, 3:33 a.m. UTC | #2
Yes, this is a very special case.

It will happen only when disconnecting the mass storage if there are
too many files in the storage, and the scanning operation is running,
and the file system is not unmounted.
It looks like this issue should be fixed in the usb mass storage
driver, but I don't find an appropriate place.

On Fri, Feb 3, 2023 at 10:47 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Fri, Feb 03, 2023 at 03:28:19PM +0800, Zhu Zhongjie wrote:
> > From: Zhongjie Zhu <zhongjiezhu1@gmail.com>
> >
> > When disconnecting a usb mass storege, if there are a lot of inodes
> > like 10 thousands files need to be freed, the invalidate_inodes() will
> > run for a loog time to freeing all inodes, this will block other worker
> > to run in the cpu, so mark the usb_hub workqueue to WQ_CPU_INTENSIVE to
> > avoid this situation.
>
> Very infrequently this will happen.  In the vast majority of cases, the
> usb_hub workqueue uses very little CPU time.  Marking it
> WQ_CPU_INTENSIVE seems inappropriate.
>
> Alan Stern
Alan Stern Feb. 6, 2023, 3:17 p.m. UTC | #3
On Mon, Feb 06, 2023 at 11:33:15AM +0800, 朱忠杰 wrote:
> Yes, this is a very special case.
> 
> It will happen only when disconnecting the mass storage if there are
> too many files in the storage, and the scanning operation is running,
> and the file system is not unmounted.
> It looks like this issue should be fixed in the usb mass storage
> driver, but I don't find an appropriate place.

That's not surprising, because usb-storage doesn't know anything about 
what's happening on the mass-storage device it connects to.  All it does 
is send the commands that it gets from the SCSI subsystem to the device 
and receive the results back.  It has no idea whether there is a mounted 
filesystem on the device, if the filesystem contains any files, or 
whether a scanning operation is running,

A better place to look for fixing this might be the filesystem code.  
That's where the information about mounting, files, and scanning can be 
found.

Alan Stern
Zhongjie Zhu Feb. 7, 2023, 2:02 a.m. UTC | #4
On Mon, Feb 6, 2023 at 11:17 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Mon, Feb 06, 2023 at 11:33:15AM +0800, 朱忠杰 wrote:
> > Yes, this is a very special case.
> >
> > It will happen only when disconnecting the mass storage if there are
> > too many files in the storage, and the scanning operation is running,
> > and the file system is not unmounted.
> > It looks like this issue should be fixed in the usb mass storage
> > driver, but I don't find an appropriate place.
>
> That's not surprising, because usb-storage doesn't know anything about
> what's happening on the mass-storage device it connects to.  All it does
> is send the commands that it gets from the SCSI subsystem to the device
> and receive the results back.  It has no idea whether there is a mounted
> filesystem on the device, if the filesystem contains any files, or
> whether a scanning operation is running,
>
> A better place to look for fixing this might be the filesystem code.
> That's where the information about mounting, files, and scanning can be
> found.
>
> Alan Stern

The problem is there is a for loop in the invalidate_inodes(), this
function is in the block device driver. when the usb_disconnect is
called, the filesystem is not umounted, userspace applications will be
noticed the usb storage is disconnected, and then do the umounting
work.
the invalidate_inodes() is called in the usb hub worker, and will run
for a long time. To fix this issue, the long running loop need to be
moved out from the usb hub worker.
Alan Stern Feb. 7, 2023, 4:07 a.m. UTC | #5
On Tue, Feb 07, 2023 at 10:02:51AM +0800, Zhongjie Zhu wrote:
> On Mon, Feb 6, 2023 at 11:17 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > On Mon, Feb 06, 2023 at 11:33:15AM +0800, 朱忠杰 wrote:
> > > Yes, this is a very special case.
> > >
> > > It will happen only when disconnecting the mass storage if there are
> > > too many files in the storage, and the scanning operation is running,
> > > and the file system is not unmounted.
> > > It looks like this issue should be fixed in the usb mass storage
> > > driver, but I don't find an appropriate place.
> >
> > That's not surprising, because usb-storage doesn't know anything about
> > what's happening on the mass-storage device it connects to.  All it does
> > is send the commands that it gets from the SCSI subsystem to the device
> > and receive the results back.  It has no idea whether there is a mounted
> > filesystem on the device, if the filesystem contains any files, or
> > whether a scanning operation is running,
> >
> > A better place to look for fixing this might be the filesystem code.
> > That's where the information about mounting, files, and scanning can be
> > found.
> >
> > Alan Stern
> 
> The problem is there is a for loop in the invalidate_inodes(), this
> function is in the block device driver. when the usb_disconnect is
> called, the filesystem is not umounted, userspace applications will be
> noticed the usb storage is disconnected, and then do the umounting
> work.
> the invalidate_inodes() is called in the usb hub worker, and will run
> for a long time. To fix this issue, the long running loop need to be
> moved out from the usb hub worker.

Oh, maybe I didn't understand.

You've got a USB mass-storage device with a mounted filesystem and a lot 
of dirty inodes, right?  Then a USB disconnect happens, and as part of 
the disconnect processing, invalidate_inodes() runs for a long time.

Do you know why it takes so long?  The I/O operations shouldn't need any 
time; they will all fail immediately because the device has been 
disconnected and so there is no way to communicate with it.

Alan Stern
Zhongjie Zhu Feb. 7, 2023, 8:23 a.m. UTC | #6
On Tue, Feb 7, 2023 at 12:07 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Tue, Feb 07, 2023 at 10:02:51AM +0800, Zhongjie Zhu wrote:
> > On Mon, Feb 6, 2023 at 11:17 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> > >
> > > On Mon, Feb 06, 2023 at 11:33:15AM +0800, 朱忠杰 wrote:
> > > > Yes, this is a very special case.
> > > >
> > > > It will happen only when disconnecting the mass storage if there are
> > > > too many files in the storage, and the scanning operation is running,
> > > > and the file system is not unmounted.
> > > > It looks like this issue should be fixed in the usb mass storage
> > > > driver, but I don't find an appropriate place.
> > >
> > > That's not surprising, because usb-storage doesn't know anything about
> > > what's happening on the mass-storage device it connects to.  All it does
> > > is send the commands that it gets from the SCSI subsystem to the device
> > > and receive the results back.  It has no idea whether there is a mounted
> > > filesystem on the device, if the filesystem contains any files, or
> > > whether a scanning operation is running,
> > >
> > > A better place to look for fixing this might be the filesystem code.
> > > That's where the information about mounting, files, and scanning can be
> > > found.
> > >
> > > Alan Stern
> >
> > The problem is there is a for loop in the invalidate_inodes(), this
> > function is in the block device driver. when the usb_disconnect is
> > called, the filesystem is not umounted, userspace applications will be
> > noticed the usb storage is disconnected, and then do the umounting
> > work.
> > the invalidate_inodes() is called in the usb hub worker, and will run
> > for a long time. To fix this issue, the long running loop need to be
> > moved out from the usb hub worker.
>
> Oh, maybe I didn't understand.
>
> You've got a USB mass-storage device with a mounted filesystem and a lot
> of dirty inodes, right?  Then a USB disconnect happens, and as part of
> the disconnect processing, invalidate_inodes() runs for a long time.
>
> Do you know why it takes so long?  The I/O operations shouldn't need any
> time; they will all fail immediately because the device has been
> disconnected and so there is no way to communicate with it.
>
> Alan Stern

Yes, invalidate_inodes() will free all the inodes related to the
supper_block, there are more
than 20 thousands inodes (some times more) need to be freed, the perf
record shows the
cpu is busy running the spin_lock and spin_unlock in the
invalidate_inodes(). The work in
this function is to free all the inodes with the super_block.
Maybe I need to find out why the spin_lock is running so much first.
diff mbox series

Patch

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 9eca403af2a8..850549b30277 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5904,7 +5904,8 @@  int usb_hub_init(void)
 	 * device was gone before the EHCI controller had handed its port
 	 * over to the companion full-speed controller.
 	 */
-	hub_wq = alloc_workqueue("usb_hub_wq", WQ_FREEZABLE, 0);
+	hub_wq = alloc_workqueue("usb_hub_wq",
+			WQ_FREEZABLE | WQ_CPU_INTENSIVE, 0);
 	if (hub_wq)
 		return 0;