Message ID | CAJrd-UuMRdWHky4gkmiR0QYozfXW0O35Ohv6mJPFx2TLa8hRKg@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] usb: host: xhci: allow __GFP_FS in dma allocation | expand |
On Sat, May 18, 2019 at 01:02:28AM +0900, Jaewon Kim wrote: > Hello I don't have enough knowledge on USB core but I've wondered > why GFP_NOIO has been used in xhci_alloc_dev for > xhci_alloc_virt_device. I found commit ("a6d940dd759b xhci: Use > GFP_NOIO during device reset"). But can we just change GFP_NOIO > to __GFP_RECLAIM | __GFP_FS ? No. __GFP_FS implies __GFP_IO; you can't set __GFP_FS and clear __GFP_IO. It seems like the problem you have is using the CMA to do DMA allocation. Why would you do such a thing? > Please refer to below case. > > I got a report from Lee YongTaek <ytk.lee@samsung.com> that the > xhci_alloc_virt_device was too slow over 2 seconds only for one page > allocation. > > 1) It was because kernel version was v4.14 and DMA allocation was > done from CMA(Contiguous Memory Allocator) where CMA region was > almost filled with file page and CMA passes GFP down to page > isolation. And the page isolation only allows file page isolation only to > requests having __GFP_FS. > > 2) Historically CMA was changed at v4.19 to use GFP_KERNEL > regardless of GFP passed to DMA allocation through the > commit 6518202970c1 "(mm/cma: remove unsupported gfp_mask > parameter from cma_alloc()". > > I think pre v4.19 the xhci_alloc_virt_device could be very slow > depending on CMA situation but free to USB deadlock issue. But as of > v4.19, I think, it will be fast but can face the deadlock issue. > Consequently I think to meet the both cases, I think USB can pass > __GFP_FS without __GFP_IO. > > If __GFP_FS is passed from USB core, of course, the CMA patch also > need to be changed to pass GFP.
Thank you for your comment. In ARM64 architecture, default CMA region is commonly activated and it could be used if no specific memory region is defined. The USB driver in my platform also uses the default CMA region as DMA allocation. If using the CMA to do DMA allocation is improper, then do you think that the USB driver for my platform should be changed not to use CMA? According to my understanding, in CONFIG_DMA_CMA on both v4.14 and v5.0, __GFP_DIRECT_RECLAIM will try CMA allocation first instead of normal buddy allocation. Then it will get default CMA region through dev_get_cma_area API. Please refer to dev_get_cma_area code below though I am using v4.14 for my platform. git show v5.0:include/linux/dma-contiguous.h 61 #ifdef CONFIG_DMA_CMA 62 63 extern struct cma *dma_contiguous_default_area; 64 65 static inline struct cma *dev_get_cma_area(struct device *dev) 66 { 67 if (dev && dev->cma_area) 68 return dev->cma_area; 69 return dma_contiguous_default_area; 70 } Thank you 2019년 5월 18일 (토) 오전 1:34, Matthew Wilcox <willy@infradead.org>님이 작성: > > On Sat, May 18, 2019 at 01:02:28AM +0900, Jaewon Kim wrote: > > Hello I don't have enough knowledge on USB core but I've wondered > > why GFP_NOIO has been used in xhci_alloc_dev for > > xhci_alloc_virt_device. I found commit ("a6d940dd759b xhci: Use > > GFP_NOIO during device reset"). But can we just change GFP_NOIO > > to __GFP_RECLAIM | __GFP_FS ? > > No. __GFP_FS implies __GFP_IO; you can't set __GFP_FS and clear __GFP_IO. > > It seems like the problem you have is using the CMA to do DMA allocation. > Why would you do such a thing? > > > Please refer to below case. > > > > I got a report from Lee YongTaek <ytk.lee@samsung.com> that the > > xhci_alloc_virt_device was too slow over 2 seconds only for one page > > allocation. > > > > 1) It was because kernel version was v4.14 and DMA allocation was > > done from CMA(Contiguous Memory Allocator) where CMA region was > > almost filled with file page and CMA passes GFP down to page > > isolation. And the page isolation only allows file page isolation only to > > requests having __GFP_FS. > > > > 2) Historically CMA was changed at v4.19 to use GFP_KERNEL > > regardless of GFP passed to DMA allocation through the > > commit 6518202970c1 "(mm/cma: remove unsupported gfp_mask > > parameter from cma_alloc()". > > > > I think pre v4.19 the xhci_alloc_virt_device could be very slow > > depending on CMA situation but free to USB deadlock issue. But as of > > v4.19, I think, it will be fast but can face the deadlock issue. > > Consequently I think to meet the both cases, I think USB can pass > > __GFP_FS without __GFP_IO. > > > > If __GFP_FS is passed from USB core, of course, the CMA patch also > > need to be changed to pass GFP. > >
Folks, you can't just pass arbitary GFP_ flags to dma allocation routines, beause very often they are not just wrappers around the page allocator. So no, you can't just fine grained control the flags, but the existing code is just as buggy. Please switch to use memalloc_noio_save() instead.
On So, 2019-05-19 at 22:56 -0700, Christoph Hellwig wrote: > Folks, you can't just pass arbitary GFP_ flags to dma allocation > routines, beause very often they are not just wrappers around > the page allocator. > > So no, you can't just fine grained control the flags, but the > existing code is just as buggy. > > Please switch to use memalloc_noio_save() instead. > Hi, we actually do. It is just higher up in the calling path: int usb_reset_device(struct usb_device *udev) { int ret; int i; unsigned int noio_flag; struct usb_port *port_dev; struct usb_host_config *config = udev->actconfig; struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); if (udev->state == USB_STATE_NOTATTACHED || udev->state == USB_STATE_SUSPENDED) { dev_dbg(&udev->dev, "device reset not allowed in state %d\n", udev->state); return -EINVAL; } if (!udev->parent) { /* this requires hcd-specific logic; see ohci_restart() */ dev_dbg(&udev->dev, "%s for root hub!\n", __func__); return -EISDIR; } port_dev = hub->ports[udev->portnum - 1]; /* * Don't allocate memory with GFP_KERNEL in current * context to avoid possible deadlock if usb mass * storage interface or usbnet interface(iSCSI case) * is included in current configuration. The easist * approach is to do it for every device reset, * because the device 'memalloc_noio' flag may have * not been set before reseting the usb device. */ noio_flag = memalloc_noio_save(); So, do we need to audit the mem_flags again? What are we supposed to use? GFP_KERNEL? Regards Oliver
On Mon, May 20, 2019 at 11:09:25AM +0200, Oliver Neukum wrote: > we actually do. It is just higher up in the calling path: Perfect! > So, do we need to audit the mem_flags again? > What are we supposed to use? GFP_KERNEL? GFP_KERNEL if you can block, GFP_ATOMIC if you can't for a good reason, that is the allocation is from irq context or under a spinlock. If you think you have a case where you think you don't want to block, but it is not because of the above reasons we need to have a chat about the details.
On Mon, 20 May 2019, Christoph Hellwig wrote: > On Mon, May 20, 2019 at 11:09:25AM +0200, Oliver Neukum wrote: > > we actually do. It is just higher up in the calling path: > > Perfect! > > > So, do we need to audit the mem_flags again? > > What are we supposed to use? GFP_KERNEL? > > GFP_KERNEL if you can block, GFP_ATOMIC if you can't for a good reason, > that is the allocation is from irq context or under a spinlock. If you > think you have a case where you think you don't want to block, but it > is not because of the above reasons we need to have a chat about the > details. What if the allocation requires the kernel to swap some old pages out to the backing store, but the backing store is on the device that the driver is managing? The swap can't take place until the current I/O operation is complete (assuming the driver can handle only one I/O operation at a time), and the current operation can't complete until the old pages are swapped out. Result: deadlock. Isn't that the whole reason for using GFP_NOIO in the first place? Alan Stern
On Mon, May 20, 2019 at 10:16:57AM -0400, Alan Stern wrote: > What if the allocation requires the kernel to swap some old pages out > to the backing store, but the backing store is on the device that the > driver is managing? The swap can't take place until the current I/O > operation is complete (assuming the driver can handle only one I/O > operation at a time), and the current operation can't complete until > the old pages are swapped out. Result: deadlock. > > Isn't that the whole reason for using GFP_NOIO in the first place? It is, or rather was. As it has been incredibly painful to wire up the gfp_t argument through some callstacks, most notably the vmalloc allocator which is used by a lot of the DMA allocators on non-coherent platforms, we now have the memalloc_noio_save and memalloc_nofs_save functions that mark a thread as not beeing to go into I/O / FS reclaim. So even if you use GFP_KERNEL you will not dip into reclaim with those flags set on the thread.
On Mo, 2019-05-20 at 07:23 -0700, Christoph Hellwig wrote: > On Mon, May 20, 2019 at 10:16:57AM -0400, Alan Stern wrote: > > What if the allocation requires the kernel to swap some old pages out > > to the backing store, but the backing store is on the device that the > > driver is managing? The swap can't take place until the current I/O > > operation is complete (assuming the driver can handle only one I/O > > operation at a time), and the current operation can't complete until > > the old pages are swapped out. Result: deadlock. > > > > Isn't that the whole reason for using GFP_NOIO in the first place? > > It is, or rather was. As it has been incredibly painful to wire > up the gfp_t argument through some callstacks, most notably the > vmalloc allocator which is used by a lot of the DMA allocators on > non-coherent platforms, we now have the memalloc_noio_save and > memalloc_nofs_save functions that mark a thread as not beeing to > go into I/O / FS reclaim. So even if you use GFP_KERNEL you will > not dip into reclaim with those flags set on the thread. OK, but this leaves a question open. Will the GFP_NOIO actually hurt, if it is used after memalloc_noio_save()? Regards Oliver
On Mo, 2019-05-20 at 10:16 -0400, Alan Stern wrote: > On Mon, 20 May 2019, Christoph Hellwig wrote: > > > GFP_KERNEL if you can block, GFP_ATOMIC if you can't for a good reason, > > that is the allocation is from irq context or under a spinlock. If you > > think you have a case where you think you don't want to block, but it > > is not because of the above reasons we need to have a chat about the > > details. > > What if the allocation requires the kernel to swap some old pages out > to the backing store, but the backing store is on the device that the > driver is managing? The swap can't take place until the current I/O > operation is complete (assuming the driver can handle only one I/O > operation at a time), and the current operation can't complete until > the old pages are swapped out. Result: deadlock. > > Isn't that the whole reason for using GFP_NOIO in the first place? Hi, lookig at this it seems to me that we are in danger of a deadlock - during reset - devices cannot do IO while being reset covered by the USB layer in usb_reset_device - resume & restore - devices cannot do IO while suspended covered by driver core in rpm_callback - disconnect - a disconnected device cannot do IO is this a theoretical case or should I do something to the driver core? How about changing configurations on USB? Regards Oliver
On Tue, May 21, 2019 at 10:54:37AM +0200, Oliver Neukum wrote: > OK, but this leaves a question open. Will the GFP_NOIO actually > hurt, if it is used after memalloc_noio_save()? Unless we have a bug somewhere it should not make any difference, neither positively nor negatively.
On Tue, 21 May 2019, Oliver Neukum wrote: > On Mo, 2019-05-20 at 10:16 -0400, Alan Stern wrote: > > On Mon, 20 May 2019, Christoph Hellwig wrote: > > > > > GFP_KERNEL if you can block, GFP_ATOMIC if you can't for a good reason, > > > that is the allocation is from irq context or under a spinlock. If you > > > think you have a case where you think you don't want to block, but it > > > is not because of the above reasons we need to have a chat about the > > > details. > > > > What if the allocation requires the kernel to swap some old pages out > > to the backing store, but the backing store is on the device that the > > driver is managing? The swap can't take place until the current I/O > > operation is complete (assuming the driver can handle only one I/O > > operation at a time), and the current operation can't complete until > > the old pages are swapped out. Result: deadlock. > > > > Isn't that the whole reason for using GFP_NOIO in the first place? > > Hi, > > lookig at this it seems to me that we are in danger of a deadlock > > - during reset - devices cannot do IO while being reset > covered by the USB layer in usb_reset_device > - resume & restore - devices cannot do IO while suspended > covered by driver core in rpm_callback > - disconnect - a disconnected device cannot do IO > is this a theoretical case or should I do something to > the driver core? > > How about changing configurations on USB? Changing configurations amounts to much the same as disconnecting, because both operations destroy all the existing interfaces. Disconnect can arise in two different ways. Physical hot-unplug: All I/O operations will fail. Rmmod or unbind: I/O operations will succeed. The second case is probably okay. The first we can do nothing about. However, in either case we do need to make sure that memory allocations do not require any writebacks. This suggests that we need to call memalloc_noio_save() from within usb_unbind_interface(). Alan Stern
On Di, 2019-05-21 at 10:00 -0400, Alan Stern wrote: > > Changing configurations amounts to much the same as disconnecting, > because both operations destroy all the existing interfaces. > > Disconnect can arise in two different ways. > > Physical hot-unplug: All I/O operations will fail. > > Rmmod or unbind: I/O operations will succeed. > > The second case is probably okay. The first we can do nothing about. > However, in either case we do need to make sure that memory allocations > do not require any writebacks. This suggests that we need to call > memalloc_noio_save() from within usb_unbind_interface(). I agree with the problem, but I fail to see why this issue would be specific to USB. Shouldn't this be done in the device core layer? Regards Oliver
On Wed, 22 May 2019, Oliver Neukum wrote: > On Di, 2019-05-21 at 10:00 -0400, Alan Stern wrote: > > > > Changing configurations amounts to much the same as disconnecting, > > because both operations destroy all the existing interfaces. > > > > Disconnect can arise in two different ways. > > > > Physical hot-unplug: All I/O operations will fail. > > > > Rmmod or unbind: I/O operations will succeed. > > > > The second case is probably okay. The first we can do nothing about. > > However, in either case we do need to make sure that memory allocations > > do not require any writebacks. This suggests that we need to call > > memalloc_noio_save() from within usb_unbind_interface(). > > I agree with the problem, but I fail to see why this issue would be > specific to USB. Shouldn't this be done in the device core layer? Only for drivers that are on the block-device writeback path. The device core doesn't know which drivers these are. Alan Stern
On Mi, 2019-05-22 at 10:56 -0400, Alan Stern wrote: > On Wed, 22 May 2019, Oliver Neukum wrote: > > > I agree with the problem, but I fail to see why this issue would be > > specific to USB. Shouldn't this be done in the device core layer? > > Only for drivers that are on the block-device writeback path. The > device core doesn't know which drivers these are. Neither does USB know. It is very hard to predict or even tell which devices are block device drivers. I think we must assume that any device may be affected. Regards Oliver
On So, 2019-05-19 at 22:56 -0700, Christoph Hellwig wrote: > Folks, you can't just pass arbitary GFP_ flags to dma allocation > routines, beause very often they are not just wrappers around > the page allocator. > > So no, you can't just fine grained control the flags, but the > existing code is just as buggy. > > Please switch to use memalloc_noio_save() instead. Thinking about this again, we have a problem. We introduced memalloc_noio_save() in 3.10 . Hence the code should have been correct in v4.14. Which means that either 6518202970c1 "(mm/cma: remove unsupported gfp_mask parameter from cma_alloc()" is buggy, or the original issue with a delay of 2 seconds still exist. Do we need to do something? Regards Oliver
On Wed, 22 May 2019, Oliver Neukum wrote: > On Mi, 2019-05-22 at 10:56 -0400, Alan Stern wrote: > > On Wed, 22 May 2019, Oliver Neukum wrote: > > > > > I agree with the problem, but I fail to see why this issue would be > > > specific to USB. Shouldn't this be done in the device core layer? > > > > Only for drivers that are on the block-device writeback path. The > > device core doesn't know which drivers these are. > > Neither does USB know. It is very hard to predict or even tell which > devices are block device drivers. I think we must assume that > any device may be affected. All right. Would you like to submit a patch? Alan Stern
On Thu, May 23, 2019 at 02:32:09PM +0200, Oliver Neukum wrote: > > Please switch to use memalloc_noio_save() instead. > > Thinking about this again, we have a problem. We introduced > memalloc_noio_save() in 3.10 . Hence the code should have been > correct in v4.14. Which means that either > 6518202970c1 "(mm/cma: remove unsupported gfp_mask > parameter from cma_alloc()" > is buggy, or the original issue with a delay of 2 seconds > still exist. > > Do we need to do something? cma_alloc calls into alloc_contig_range to do the actual allocation, which then calls current_gfp_context() to pick up the adjustments from memalloc_noio_save and friends. So at least in current mainline we should be fine.
Am Donnerstag, den 23.05.2019, 10:01 -0400 schrieb Alan Stern: > On Wed, 22 May 2019, Oliver Neukum wrote: > > > On Mi, 2019-05-22 at 10:56 -0400, Alan Stern wrote: > > > On Wed, 22 May 2019, Oliver Neukum wrote: > > > > > > > I agree with the problem, but I fail to see why this issue would be > > > > specific to USB. Shouldn't this be done in the device core layer? > > > > > > Only for drivers that are on the block-device writeback path. The > > > device core doesn't know which drivers these are. > > > > Neither does USB know. It is very hard to predict or even tell which > > devices are block device drivers. I think we must assume that > > any device may be affected. > > All right. Would you like to submit a patch? Do you like this one? Regards Oliver From 0dc9c7dfe994fc9c28a63ba283e4442c237f6989 Mon Sep 17 00:00:00 2001 From: Oliver Neukum <oneukum@suse.com> Date: Tue, 28 May 2019 11:43:02 +0200 Subject: [PATCH] base: force NOIO allocations during unplug There is one overlooked situation under which a driver must not do IO to allocate memory. You cannot do that while disconnecting a device. A device being disconnected is no longer functional in most cases, yet IO may fail only when the handler runs. Signed-off-by: Oliver Neukum <oneukum@suse.com> --- drivers/base/core.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/base/core.c b/drivers/base/core.c index fd7511e04e62..a7f5f45bd761 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2229,6 +2229,7 @@ void device_del(struct device *dev) struct device *parent = dev->parent; struct kobject *glue_dir = NULL; struct class_interface *class_intf; + unsigned int noio_flag; /* * Hold the device lock and set the "dead" flag to guarantee that @@ -2256,6 +2257,7 @@ void device_del(struct device *dev) device_remove_sys_dev_entry(dev); device_remove_file(dev, &dev_attr_dev); } + noio_flag = memalloc_noio_save(); if (dev->class) { device_remove_class_symlinks(dev); @@ -2277,6 +2279,8 @@ void device_del(struct device *dev) device_platform_notify(dev, KOBJ_REMOVE); device_remove_properties(dev); device_links_purge(dev); + memalloc_noio_restore(noio_flag); + if (dev->bus) blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
On Tue, 28 May 2019, Oliver Neukum wrote: > Am Donnerstag, den 23.05.2019, 10:01 -0400 schrieb Alan Stern: > > On Wed, 22 May 2019, Oliver Neukum wrote: > > > > > On Mi, 2019-05-22 at 10:56 -0400, Alan Stern wrote: > > > > On Wed, 22 May 2019, Oliver Neukum wrote: > > > > > > > > > I agree with the problem, but I fail to see why this issue would be > > > > > specific to USB. Shouldn't this be done in the device core layer? > > > > > > > > Only for drivers that are on the block-device writeback path. The > > > > device core doesn't know which drivers these are. > > > > > > Neither does USB know. It is very hard to predict or even tell which > > > devices are block device drivers. I think we must assume that > > > any device may be affected. > > > > All right. Would you like to submit a patch? > > Do you like this one? Hmmm. I might be inclined to move the start of the I/O-protected region a little earlier. For example, the first blocking_notifier_call_chain() might result in some memory allocations. The end is okay; once bus_remove_device() has returned the driver will be completely unbound, so there shouldn't be any pending I/O through the device. Alan Stern
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 005e65922608..38abcd03a1a2 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -3893,7 +3893,7 @@ int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev) * xhci_discover_or_reset_device(), which may be called as part of * mass storage driver error handling. */ - if (!xhci_alloc_virt_device(xhci, slot_id, udev, GFP_NOIO)) { + if (!xhci_alloc_virt_device(xhci, slot_id, udev, __GFP_RECLAIM | __GFP_FS)) { xhci_warn(xhci, "Could not allocate xHCI USB device